Computational Gene Modeling and Genome Sequence Assembly

Information

Research Project
6802700

ApplicationId
6802700
Core Project Number
R01LM006845
Full Project Number
5R01LM006845-06
Serial Number
6845
FOA Number
Sub Project Id

Project Start Date
9/1/1999
Project End Date
9/29/2005
Program Officer Name
YE, JANE
Budget Start Date
9/30/2004
Budget End Date
9/29/2005
Fiscal Year
2004
Support Year
6
Suffix
Award Notice Date
9/10/2004

Organizations

Institute for Genomic Research

Information

DESCRIPTION (provided by applicant): This project will address two major bioinformatics problems: the development of new and improved software for finding genes in eukaryotic genome sequences, and the development of a sequence assembler capable of assembling very large genomes.

The gene finding project will pursue three tracks. First, we will improve our existing eukaryotic gene finding system, GlimmerM, adding the ability to recognize new sequence patterns and making the system easier to adapt to new organisms. Second, we will develop a new gene finder, based on Pair Hidden Markov Models (PHMMs), which will use the sequence similarity between two related organisms to find genes in both species simultaneously. Third, we will develop a system for integrating the output from multiple gene finders and from sequence alignment programs in order to produce gene models that incorporate all available evidence.

The assembler project will include the development of several major components. The overall goal is to build a sequence assembler that can assemble data from whole-genome shotgun sequencing projects for genomes ranging from a few million base pairs up to billions of base pairs. The assembler will accept as input either raw sequencing reads alone or a mixture of reads and already-assembled sequences. A separate scaffold-building program will create larger scaffolds from a set of assemblies by using information from paired-end sequences. In addition, this project will develop and distribute a genome assembler benchmark set, containing sequences from shotgun sequencing projects for which the correct assembly is known.

For all of the software development projects, the source code will be made freely available to investigators in the scientific research community worldwide.
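The scaffold-building idea mentioned in the description (joining assembled contigs by using paired-end information) can be illustrated with a minimal sketch. This is not the project's software: the function name `build_scaffolds`, the `min_links` threshold, and the input format (a list of contig names plus a list of contig pairs, one pair per mate pair whose two reads landed in those contigs) are all assumptions made for illustration. The sketch only groups contigs that are connected by enough links; it does not order or orient them, which a real scaffolder would also have to do.

```python
from collections import defaultdict


def build_scaffolds(contigs, mate_links, min_links=2):
    """Group contigs into scaffolds when enough paired-end links join them.

    contigs    -- list of contig names (hypothetical input format)
    mate_links -- list of (contig_a, contig_b) tuples, one per mate pair
    min_links  -- minimum number of links required to join two contigs
    """
    # Count mate-pair links between each unordered pair of contigs.
    link_counts = defaultdict(int)
    for a, b in mate_links:
        if a != b:  # mates inside one contig carry no scaffolding information
            link_counts[frozenset((a, b))] += 1

    # Union-find structure: every contig starts as its own scaffold.
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Merge the scaffolds of any two contigs supported by enough links.
    for pair, count in link_counts.items():
        if count >= min_links:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    # Collect contigs by scaffold root and return them in a stable order.
    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


example_contigs = ["c1", "c2", "c3", "c4"]
example_links = [
    ("c1", "c2"), ("c2", "c1"),  # two links: c1 and c2 are joined
    ("c2", "c3"),                # one link: below the threshold
    ("c3", "c4"), ("c4", "c3"),  # two links: c3 and c4 are joined
    ("c1", "c1"),                # both mates in one contig: ignored
]
scaffolds = build_scaffolds(example_contigs, example_links)
```

Requiring more than one supporting link before joining two contigs is a common guard against chimeric or misplaced mate pairs; the threshold of 2 here is an arbitrary illustrative default.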

IC Name

NATIONAL LIBRARY OF MEDICINE

Activity
R01
Administering IC
LM
Application Type
5

Direct Cost Amount
Indirect Cost Amount
Total Cost
671976
Sub Project Total Cost

ARRA Funded
CFDA Code
879
Ed Inst. Type
Funding ICs
NLM:671976
Funding Mechanism
Study Section
BLR
Study Section Name
Biomedical Library Review Committee

Organization Name
INSTITUTE FOR GENOMIC RESEARCH
Organization Department
Organization DUNS
Organization City
ROCKVILLE
Organization State
MD
Organization Country
UNITED STATES
Organization Zip Code
20850
Organization District
UNITED STATES
