Computational Gene Modeling and Genome Sequence Assembly

Information

  • Research Project
  • ApplicationId
    6802700
  • Core Project Number
    R01LM006845
  • Full Project Number
    5R01LM006845-06
  • Serial Number
    6845
  • FOA Number
  • Sub Project Id
  • Project Start Date
    9/1/1999
  • Project End Date
    9/29/2005
  • Program Officer Name
    YE, JANE
  • Budget Start Date
    9/30/2004
  • Budget End Date
    9/29/2005
  • Fiscal Year
    2004
  • Support Year
    6
  • Suffix
  • Award Notice Date
    9/10/2004

DESCRIPTION (provided by applicant): This project will address two major bioinformatics problems: the development of new and improved software for finding genes in eukaryotic genome sequences, and the development of a sequence assembler that is capable of assembling very large genomes.

The gene finding project will pursue three tracks. First, we will improve our existing eukaryotic gene finding system, GlimmerM, adding the ability to recognize new sequence patterns and enhancing the ease with which the system can be adapted to new organisms. Second, we will develop a new gene finder, based on Pair Hidden Markov Models (PHMMs), which will use the sequence similarity between two related organisms to find genes in both species simultaneously. Third, we will develop a system for integrating the output from multiple gene finders and from sequence alignment programs in order to produce gene models that incorporate all available evidence.

The assembler project will include the development of several major components. The overall goal is to build a sequence assembler that will be able to assemble data from whole-genome shotgun sequencing projects for genomes ranging from a few million base pairs up to billions of base pairs. The assembler will have the ability to accept as input both raw sequencing reads and a mixture of reads and already-assembled sequences. A separate scaffold-building program will create larger scaffolds from a set of assemblies by using information from paired-end sequences. In addition, this project will develop and distribute a genome assembler benchmark set, containing sequences from shotgun sequencing projects for which the correct assembly is known.

For all of the software development projects, the source code will be made freely available to investigators in the scientific research community worldwide.

IC Name
NATIONAL LIBRARY OF MEDICINE
  • Activity
    R01
  • Administering IC
    LM
  • Application Type
    5
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    671976
  • Sub Project Total Cost
  • ARRA Funded
  • CFDA Code
    879
  • Ed Inst. Type
  • Funding ICs
    NLM:671976
  • Funding Mechanism
  • Study Section
    BLR
  • Study Section Name
    Biomedical Library Review Committee
  • Organization Name
    INSTITUTE FOR GENOMIC RESEARCH
  • Organization Department
  • Organization DUNS
  • Organization City
    ROCKVILLE
  • Organization State
    MD
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    20850
  • Organization District
    UNITED STATES