COMPUTATIONAL LINGUISTIC ANALYSIS OF GENETIC INFORMATION

Information

  • Research Project
  • 3431546
  • ApplicationId
    3431546
  • Core Project Number
    R03RR004522
  • Full Project Number
    1R03RR004522-01
  • Serial Number
    4522
  • FOA Number
  • Sub Project Id
  • Project Start Date
    7/15/1988
  • Project End Date
    7/14/1989
  • Program Officer Name
  • Budget Start Date
    7/15/1988
  • Budget End Date
    7/14/1989
  • Fiscal Year
    1988
  • Support Year
    1
  • Suffix
  • Award Notice Date
    7/6/1988

COMPUTATIONAL LINGUISTIC ANALYSIS OF GENETIC INFORMATION

Computerized DNA sequence analysis is currently accomplished with a large set of tools, ranging from generic regular-expression search algorithms for pattern matching on large sequence databases, to specialized similarity algorithms for discovering longer sets of sequences with potential evolutionary relatedness, to sophisticated ad hoc programs for search and analysis based on higher-order properties of DNA sequences. The proposed work would attempt to consolidate this wide range of approaches by treating the genome as a language, bringing the tools of computational linguistics to bear to establish a formal basis for describing genetic information. This will be done using the formalism of logic grammars (or Definite Clause Grammars), an extensible, Prolog-based system for specifying languages of greater than context-free power. This will extend DNA search capabilities well beyond the known limitations of current regular-expression search programs and, because of the increased linguistic power available, should also subsume many specialized programs. The unified conceptual framework provided by such a system would offer a clear, hierarchical presentation of varying levels of abstraction on the genome, presenting the opportunity for (1) specifying searches for more sophisticated genetic elements over large sequence databases (such as those likely to be produced by the Human Genome Sequencing Project); (2) an interactive system for adjusting definitions of such elements to account for data; and (3) the foundation for an experiment-planning system based on a procedural interpretation of the declarative grammar. To achieve these goals, it will be necessary to address issues of computational efficiency and systematic extensions in linguistic power, making use of current approaches to parsing and natural language processing.
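The difference in grammatical power that the abstract relies on can be sketched concretely. The Python below is a minimal illustration under stated assumptions, not the project's Prolog system: each hypothetical rule function plays the part of a DCG nonterminal and yields every position where it could stop, imitating Prolog backtracking. `hairpin` describes an inverted repeat (a stem whose two arms are reverse complements, around an unpaired loop); with unbounded stem length this pattern is context-free but not regular, so a strict regular expression cannot express it. `direct_repeat` describes a tandem copy W W, which is not context-free; a DCG can handle it by passing the matched word as an argument. The names `hairpin`, `direct_repeat`, `search` and all length parameters are assumptions chosen for illustration.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical stand-in for a DCG nonterminal: maps (sequence, start index)
# to every end index it can reach, imitating Prolog's backtracking.
Rule = Callable[[str, int], Iterator[int]]

# Watson-Crick pairing used by the hairpin rule.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def hairpin(min_stem: int, max_stem: int, loop_min: int, loop_max: int) -> Rule:
    """Context-free rule, roughly the DCG
        hairpin(D) --> {D >= MinStem}, loop.
        hairpin(D) --> [X], hairpin(D + 1), [Y], {pairs(X, Y)}.
    """
    def rule(seq: str, i: int, depth: int = 0) -> Iterator[int]:
        # Alternative 1: the stem is long enough, so close it with a loop
        # of any loop_min..loop_max bases (loop contents are unconstrained).
        if depth >= min_stem:
            for n in range(loop_min, loop_max + 1):
                if i + n <= len(seq):
                    yield i + n
        # Alternative 2: wrap one more base pair around an inner hairpin.
        # max_stem bounds the recursion depth.
        if depth < max_stem and i < len(seq) and seq[i] in COMPLEMENT:
            partner = COMPLEMENT[seq[i]]
            for j in rule(seq, i + 1, depth + 1):
                if j < len(seq) and seq[j] == partner:
                    yield j + 1

    return rule


def direct_repeat(min_len: int, max_len: int) -> Rule:
    """Non-context-free 'copy' rule W W; in a DCG the matched word is
    passed as an argument:  repeat --> word(W), word(W).
    """
    def rule(seq: str, i: int) -> Iterator[int]:
        for n in range(min_len, max_len + 1):
            unit = seq[i:i + n]
            # The second copy must repeat the first one exactly.
            if len(unit) == n and seq.startswith(unit, i + n):
                yield i + 2 * n

    return rule


def search(rule: Rule, seq: str) -> List[Tuple[int, int]]:
    """Try the rule at every start position; collect each (start, end) span."""
    return [(i, j) for i in range(len(seq)) for j in rule(seq, i)]


if __name__ == "__main__":
    # Stem GCAT, loop TTTT, arm ATGC (reverse complement of GCAT).
    print(search(hairpin(4, 8, 3, 5), "GCATTTTTATGC"))
    print(search(direct_repeat(3, 4), "CAGCAGTT"))  # [(0, 6)]
```

In Prolog the backtracking and the argument passing come from the language itself; the generator-based Python version makes them explicit. Trying every rule at every start position, as `search` does here, is also the kind of computational-efficiency cost the abstract says must be addressed for large sequence databases.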

  • IC Name
    NATIONAL CENTER FOR RESEARCH RESOURCES
  • Activity
    R03
  • Administering IC
    RR
  • Application Type
    1
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
  • Sub Project Total Cost
  • ARRA Funded
  • CFDA Code
    371
  • Ed Inst. Type
  • Funding ICs
  • Funding Mechanism
  • Study Section
    BRC
  • Study Section Name
    Biotechnology Resources Review Committee
  • Organization Name
    UNISYS
  • Organization Department
  • Organization DUNS
  • Organization City
    PAOLI
  • Organization State
    PA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    19301
  • Organization District
    UNITED STATES