COMPUTATIONAL LINGUISTIC ANALYSIS OF GENETIC INFORMATION

Information

Research Project
3431546

ApplicationId
3431546
Core Project Number
R03RR004522
Full Project Number
1R03RR004522-01
Serial Number
4522
FOA Number
Sub Project Id

Project Start Date
7/15/1988
Project End Date
7/14/1989
Program Officer Name
Budget Start Date
7/15/1988
Budget End Date
7/14/1989
Fiscal Year
1988
Support Year
1
Suffix
Award Notice Date
7/6/1988

Organizations

Unisys Corporation

Information

COMPUTATIONAL LINGUISTIC ANALYSIS OF GENETIC INFORMATION

Computerized DNA sequence analysis is currently accomplished through the use of a large set of tools, ranging from generic regular expression search algorithms for pattern-matching on large sequence databases, to specialized similarity algorithms for discovering longer sets of sequences with potential evolutionary relatedness, to sophisticated ad hoc programs for search and analysis based on higher-order properties of DNA sequences. The proposed work would attempt to consolidate this wide range of approaches by treating the genome as a language, bringing to bear the tools of computational linguistics to establish a formal basis for describing genetic information. This will be done using the formalism of logic grammars (or Definite Clause Grammars), an extensible, Prolog-based system for specifying languages of greater than context-free power. This will extend DNA search capabilities well beyond the known limitations of current regular expression search programs, and should in addition subsume many specialized programs, because of the increased linguistic power available. The unified conceptual framework of such a system would give a clear, hierarchical presentation of varying levels of abstraction on the genome, presenting the opportunity for (1) specifying searches for more sophisticated genetic elements over large sequence databases (such as those likely to be produced by the Human Genome Sequencing Project); (2) an interactive system for adjusting definitions of such elements to account for data; and (3) the foundation for an experiment-planning system based on a procedural interpretation of the declarative grammar. To achieve these goals, it will be necessary to address issues of computational efficiency and systematic extensions in linguistic power, making use of current approaches to parsing and natural language processing.
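To make concrete the gap between regular expression search and grammar-based search described in the abstract, the following is a minimal Python sketch. It is not the project's Prolog/DCG system. The function names (`is_hairpin`, `is_tandem_repeat`, `scan`), the stem and loop length limits, and the example sequences are all hypothetical choices made for illustration. A hairpin (a stem followed by a loop and the stem's reverse complement) is a nested, context-free pattern, and an exact tandem repeat (a unit followed by a copy of itself) is a copy pattern that falls outside the context-free languages. A classical regular expression cannot describe either one for arbitrary lengths.

```python
# Illustrative sketch only; all names and thresholds here are assumptions,
# not part of the proposed logic grammar system.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_hairpin(seq, min_stem=3, min_loop=3, max_loop=8):
    """True if seq = stem + loop + reverse_complement(stem).

    The stem length is unbounded, so this nested dependency needs at
    least context-free power.
    """
    n = len(seq)
    for stem in range(min_stem, n // 2 + 1):
        loop = n - 2 * stem
        if min_loop <= loop <= max_loop and \
                seq[-stem:] == reverse_complement(seq[:stem]):
            return True
    return False


def is_tandem_repeat(seq, min_unit=2):
    """True if seq = unit + unit (a copy pattern, beyond context-free)."""
    half, remainder = divmod(len(seq), 2)
    return remainder == 0 and half >= min_unit and seq[:half] == seq[half:]


def scan(seq, predicate, min_len, max_len):
    """Return (start, substring) for every window that satisfies predicate."""
    hits = []
    for start in range(len(seq)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(seq):
                break
            if predicate(seq[start:end]):
                hits.append((start, seq[start:end]))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: a 3-base stem, 6-base loop, flanked by AA.
    print(scan("AAGGCATTTTTGCCAA", is_hairpin, 12, 12))
    print(is_tandem_repeat("ACGACG"))
```

A logic grammar would state each of these patterns declaratively as a rule and compose them into higher-level elements. The brute-force window scan here is only a stand-in, and it also hints at the computational efficiency issues the abstract says must be addressed.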

IC Name

NATIONAL CENTER FOR RESEARCH RESOURCES

Activity
R03
Administering IC
RR
Application Type
1

Direct Cost Amount
Indirect Cost Amount
Total Cost
Sub Project Total Cost

ARRA Funded
CFDA Code
371
Ed Inst. Type
Funding ICs
Funding Mechanism
Study Section
BRC
Study Section Name
Biotechnology Resources Review Committee

Organization Name
UNISYS
Organization Department
Organization DUNS
Organization City
PAOLI
Organization State
PA
Organization Country
UNITED STATES
Organization Zip Code
19301
Organization District
UNITED STATES
