HMMER and Infernal: Finding distant homologs of sequences and RNA structures

Information

Research Project
10296226

ApplicationId
10296226
Core Project Number
R01HG009116
Full Project Number
2R01HG009116-05
Serial Number
009116
FOA Number
PA-20-185
Sub Project Id

Project Start Date
9/16/2016
Project End Date
6/30/2026
Program Officer Name
SEN, SHURJO KUMAR
Budget Start Date
9/10/2021
Budget End Date
6/30/2022
Fiscal Year
2021
Support Year
05
Suffix
Award Notice Date
9/10/2021

Organizations

Harvard University

Information

Project Summary/Abstract

Genome sequence data is now available for hundreds of thousands of species. Our ability to exploit this vast trove of information about the molecular basis and evolution of life depends on sophisticated computational analysis tools. One important class of tools is profile analysis software, for making consensus statistical models of multiple alignments of biological sequence families, and for using those models to sensitively detect homologs and make deep multiple alignments. Profile analysis derives its power from the fact that despite the unbounded growth of sequence data, the majority of functional sequences can be condensed into a manageably small number of conserved families. Profile software underlies numerous protein, RNA, and DNA sequence family databases. The systematic availability of deep multiple alignments (of many thousands of sequences) is enabling revolutionary advances in predicting molecular function and 3D structure by comparative sequence analysis. The HMMER and Infernal software packages from our laboratory are some of the most widely used tools for profile analysis. HMMER implements profile hidden Markov models (profile HMMs) of primary sequence consensus, typically for protein domains and conserved DNA elements. Infernal implements profile stochastic context-free grammars (profile SCFGs) of RNA secondary structure and sequence consensus. In the context of the continued development of these packages, this proposal has three specific aims for new lines of research that we expect to lead to major improvements in the accuracy, utility, and computational efficiency of profile analysis. The first aim proposes to develop a discontinuous Markov model of nonhomologous sequences, to improve the ability to distinguish homologs from nonhomologs and reduce the false positive rate of database searches. The second aim proposes to develop sketching methods for efficiently representing the voluminous results of a database homology search with a subset of the most phylogenetically informative hits. The third aim proposes to develop adaptive computation methods to flexibly harness the complex mix of CPU/GPU processors, memory, and storage in modern hardware architectures, enabling efficient scalable computation and near-interactive database search times.

IC Name

NATIONAL HUMAN GENOME RESEARCH INSTITUTE

Activity
R01
Administering IC
HG
Application Type
2

Direct Cost Amount
315651
Indirect Cost Amount
217799
Total Cost
533450
Sub Project Total Cost

ARRA Funded
False
CFDA Code
172
Ed Inst. Type
SCHOOLS OF ARTS AND SCIENCES
Funding ICs
NHGRI:533450
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
GCAT
Study Section Name
Genomics, Computational Biology and Technology Study Section

Organization Name
HARVARD UNIVERSITY
Organization Department
MICROBIOLOGY/IMMUN/VIROLOGY
Organization DUNS
082359691
Organization City
CAMBRIDGE
Organization State
MA
Organization Country
UNITED STATES
Organization Zip Code
021385319
Organization District
UNITED STATES
