Novel Use of Genome Information to Understand Mutations

Information

Research Project
10303852

ApplicationId
10303852
Core Project Number
R01HG012117
Full Project Number
1R01HG012117-01
Serial Number
012117
FOA Number
PAR-18-844
Sub Project Id

Project Start Date
9/13/2021
Project End Date
6/30/2026
Program Officer Name
BROOKS, LISA
Budget Start Date
9/13/2021
Budget End Date
6/30/2022
Fiscal Year
2021
Support Year
01
Suffix
Award Notice Date
9/13/2021

Organizations

Iowa State University

There are significant advantages to translating genome sequences into proteins, for which a large body of knowledge has accumulated about the relationships among sequence, structure and function. Advances in genome sequencing are producing a deluge of data that can be used to train and test prediction methods that identify the characteristics of mutants, building on the extensive existing data on protein function. Clinicians need to know the functional behavior of mutants: whether they are neutral or deleterious, and whether they affect protein structure, protein dynamics or protein binding specificity. In a protein structure, each amino acid in the sequence sits in a local environment, and the amino acids observed at each position are usually compatible with that environment. This leads to strong correlations among amino acids, as manifested in multiple sequence alignments. This project will combine protein sequence and structure data with amino acid properties and their correlations to characterize each site in the protein structure, in order to test the hypothesis that outliers in the distributions of the important amino acid properties at each position will impair function, i.e. that they will be deleterious mutants. The project will also probe deeply into the nature of the impaired mechanism. Two different approaches will be taken in the two aims. Aim 1 will investigate the amino acid property distributions to identify the properties that best characterize each position in the sequence and structure, and will determine how the outliers negatively impact functional structures, dynamics and binding characteristics. Preliminary results show that deleterious mutants usually span a significantly broader range of values for single amino acid properties.

Data from these analyses will be fed into Aim 2, where two types of machine learning approaches, Extreme Learning Machines and Random Forests, will be applied jointly. Preliminary results show that incorporating just one amino acid property yields significant gains over existing methods. A major strength of this project is that results from the two aims will be exchanged frequently to improve the predictions of both approaches. The project builds on the PIs' long experience in data mining of protein structures and sequences, as well as on their previous machine learning applications. Important potential outcomes include a more reliable and better informed understanding of how mutants affect function. In addition, the project aims to predict connections between mutants and specific diseases. The results will be important for drug development: because the specific part of the protein where function is impaired will be identified, drug developers will be able to narrow their focus to more limited regions of a protein targeted for drug design. The predictors established by this project will also have the potential to screen large numbers of previously unknown mutations, which could help identify specific regions of a protein structure that are susceptible to further disease-related mutations.
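As a rough illustration of the central hypothesis (a substitution whose property value falls outside the distribution observed at its aligned position is likely to be deleterious), the Python sketch below scores one substitution against one column of a multiple sequence alignment. It is not the project's method. The Kyte-Doolittle hydrophobicity scale stands in for "an important amino acid property", the z-score is an assumed outlier measure, and the names `column_values`, `outlier_score`, `is_outlier` and the threshold of 2.0 are hypothetical.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydrophobicity scale, used here only as one example of an
# amino acid property (an assumption; the abstract does not name the properties).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_values(alignment, position, scale=KYTE_DOOLITTLE):
    """Property values of the residues seen at one alignment column; gaps are skipped."""
    return [scale[seq[position]] for seq in alignment if seq[position] in scale]


def outlier_score(alignment, position, mutant_residue, scale=KYTE_DOOLITTLE):
    """Distance of the mutant's property value from the column mean, in standard deviations."""
    values = column_values(alignment, position, scale)
    if not values:
        raise ValueError(f"column {position} contains only gaps or unknown residues")
    mu = mean(values)
    sigma = pstdev(values)
    deviation = abs(scale[mutant_residue] - mu)
    if sigma == 0.0:
        # Property fully conserved at this column: any change is an extreme outlier.
        return 0.0 if deviation == 0.0 else math.inf
    return deviation / sigma


def is_outlier(alignment, position, mutant_residue, threshold=2.0):
    """Hypothetical decision rule: flag substitutions lying beyond `threshold` SDs."""
    return outlier_score(alignment, position, mutant_residue) > threshold


# Toy alignment for illustration only (not project data).
alignment = ["MKLV", "MKIV", "MRLV", "MKVV"]
print(is_outlier(alignment, 2, "D"))  # True: aspartate is far less hydrophobic than L/I/V
print(is_outlier(alignment, 2, "I"))  # False: isoleucine fits the column's distribution
```

The z-score is simply the most direct way to express "outlier in a distribution"; a robust alternative such as median and median absolute deviation, or scores over several properties at once, would fit the same structure. A per-position score of this kind is also the sort of single-property feature that could, under these assumptions, be passed to classifiers such as Random Forests.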

IC Name

NATIONAL HUMAN GENOME RESEARCH INSTITUTE

Activity
R01
Administering IC
HG
Application Type
1

Direct Cost Amount
395986
Indirect Cost Amount
84622
Total Cost
480608
Sub Project Total Cost

ARRA Funded
False
CFDA Code
172
Ed Inst. Type
EARTH SCIENCES/RESOURCES
Funding ICs
NHGRI:480608
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
IOWA STATE UNIVERSITY
Organization Department
BIOCHEMISTRY
Organization DUNS
005309844
Organization City
AMES
Organization State
IA
Organization Country
UNITED STATES
Organization Zip Code
500112025
Organization District
UNITED STATES
