PIR Classification Database for Genomic Research

Information

  • NSF Award
  • 9974855
  • Award Id
    9974855
  • Award Effective Date
    10/1/1999
  • Award Expiration Date
    9/30/2002
  • Award Amount
    $283,147.00
  • Award Instrument
    Standard Grant


DBI 9974855; Wu, Cathy. National Biomedical Research Foundation. PIR Classification Database for Genomic Research

Project Summary

As molecular sequence data continue to grow exponentially due to the Human Genome Project and other large-scale sequencing projects, gaining a full understanding of the genome has become a major challenge in computational molecular biology. Advanced databases are needed to facilitate the retrieval of relevant information from the voluminous data and to provide insight into protein structure and function. The major objective of this research is to develop a prototype relational classification database that will provide an integrated platform for describing comprehensive family relationships and the structural and functional features of proteins. The database will combine the classification schemes of several widely used sequence, structure, and function databases, allowing users to retrieve conveniently displayed summaries for any given sequence and to query for sequences and families with selected properties. The database itself will be generated automatically from the source databases and updated frequently. To achieve complete and non-overlapping placement of all proteins, the PIR superfamily/family classification will be used as the underlying organization scheme. The basic superfamily entity will have attributes such as membership, annotation, and relationships to families, domains, and motifs defined by other classification schemes. In addition, a supplementary motif collection will be compiled to provide flexible, diagnostic patterns for functional and structural motifs not found in other motif databases. The proposed research will build upon and extend our current databases. The ORACLE relational implementation will support queries on combinations of attributes of protein sequences or families and their relationships.
Users will be able to perform sequence searches, query the databases, and submit feedback. The report will include graphical displays of features and relationships, as well as hypertext links to the underlying databases. The framework of the proposed classification database has several distinctive features. The PIR superfamily is the only existing classification scheme that supports inclusive and unique clustering of all proteins, which is a prerequisite for complete database organization. The design supports genomic annotation based on both global and motif sequence similarities, as well as sequence annotation. Each superfamily record in the classification database will document family relationships and features more comprehensively than any other single information resource. This database will be both a useful tool for researchers and a prototype for constructing a more extensive database that includes more types of information and connections to additional structure and function classification databases. The classification database will be a significant informational resource to assist in the identification of new genomic sequences, the inference of new biological knowledge, and the improvement of database integrity.
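The superfamily-centered relational organization described in the summary can be illustrated with a small sketch. It is a minimal illustration under stated assumptions and is not the project's actual schema: SQLite (through Python's standard sqlite3 module) stands in for the ORACLE implementation, and the table names (superfamily, protein, sf_xref), column names, source database names (DomainDB, MotifDB), and identifiers (SF-A, D1, M1, and so on) are all hypothetical placeholders. Giving each protein exactly one row in the protein table mirrors the complete, non-overlapping placement of proteins into superfamilies, and the query shows how a combination of attributes (a domain plus a motif) can select superfamilies.

```python
import sqlite3

# Hypothetical schema sketch; SQLite stands in for the ORACLE implementation.
SCHEMA = """
CREATE TABLE superfamily (
    sf_id      TEXT PRIMARY KEY,
    annotation TEXT
);
-- One row per protein, so each protein is placed in exactly one superfamily.
CREATE TABLE protein (
    protein_id TEXT PRIMARY KEY,
    sf_id      TEXT NOT NULL REFERENCES superfamily(sf_id)
);
-- Links from a superfamily to families, domains, and motifs from other schemes.
CREATE TABLE sf_xref (
    sf_id     TEXT NOT NULL REFERENCES superfamily(sf_id),
    kind      TEXT NOT NULL CHECK (kind IN ('family', 'domain', 'motif')),
    source_db TEXT NOT NULL,
    xref_id   TEXT NOT NULL,
    PRIMARY KEY (sf_id, kind, source_db, xref_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder (not real) records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO superfamily VALUES (?, ?)",
        [("SF-A", "placeholder annotation A"), ("SF-B", "placeholder annotation B")],
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [("P1", "SF-A"), ("P2", "SF-A"), ("P3", "SF-B")],
    )
    conn.executemany(
        "INSERT INTO sf_xref VALUES (?, ?, ?, ?)",
        [
            ("SF-A", "domain", "DomainDB", "D1"),
            ("SF-A", "motif", "MotifDB", "M1"),
            ("SF-B", "domain", "DomainDB", "D1"),
        ],
    )
    return conn


def superfamilies_with(conn, domain_id, motif_id):
    """Return (sf_id, member_count) for superfamilies linked to both the domain and the motif."""
    return conn.execute(
        """
        SELECT s.sf_id, COUNT(DISTINCT p.protein_id)
        FROM superfamily s
        JOIN sf_xref d ON d.sf_id = s.sf_id AND d.kind = 'domain' AND d.xref_id = ?
        JOIN sf_xref m ON m.sf_id = s.sf_id AND m.kind = 'motif' AND m.xref_id = ?
        LEFT JOIN protein p ON p.sf_id = s.sf_id
        GROUP BY s.sf_id
        ORDER BY s.sf_id
        """,
        (domain_id, motif_id),
    ).fetchall()


if __name__ == "__main__":
    print(superfamilies_with(build_demo_db(), "D1", "M1"))  # [('SF-A', 2)]
```

In this toy data, SF-B shares domain D1 but lacks motif M1, so only SF-A (with its two member proteins) satisfies the combined query.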

  • Program Officer
    Sylvia J. Spengler
  • Min Amd Letter Date
    9/23/1999
  • Max Amd Letter Date
    9/23/1999

Institutions

  • Name
    National Biomedical Research Foundation
  • City
    Washington
  • State
    DC
  • Country
    United States
  • Address
    3900 Reservoir Rd NW
  • Postal Code
    20007-2187
  • Phone Number
    (202) 687-2121

Investigators

  • First Name
    Winona
  • Last Name
    Barker
  • Email Address
    barker@nbrf.georgetown.edu
  • Start Date
    9/23/1999
  • First Name
    Cathy
  • Last Name
    Wu
  • Email Address
    wuc@udel.edu
  • Start Date
    9/23/1999

FOA Information

  • Name
    Data Banks & Software Design
  • Code
    510204
  • Name
    Structure & Function
  • Code
    510301