New Computational Methods for Data-driven Protein Structure Prediction

Information

Research Project
7764110

ApplicationId
7764110
Core Project Number
R01GM089753
Full Project Number
1R01GM089753-01
Serial Number
89753
FOA Number
PA-07-070
Sub Project Id

Project Start Date
5/14/2010
Project End Date
4/30/2015
Program Officer Name
LYSTER, PETER
Budget Start Date
5/14/2010
Budget End Date
4/30/2011
Fiscal Year
2010
Support Year
1
Suffix
Award Notice Date
5/12/2010

Organizations

Toyota Jidosha Kabushiki Kaisha

Information

DESCRIPTION (provided by applicant): Proteins play a central role in all biological processes. Akin to the complete sequencing of genomes, a complete description of protein structures is a fundamental step towards understanding biological life, and is also highly relevant medically in the development of therapeutics and drugs. The broad, long-term goal of the project is to develop machine learning methods for data-driven protein structure prediction through two independent but complementary strategies: 1) much more accurate template-based modeling for proteins with remote homologs in the Protein Data Bank (PDB) and 2) a better template-free modeling method for proteins without detectable templates and for improving template-based models.

The specific aims are: Aim 1) to greatly improve template-based modeling by 1a) improving protein sequence-template alignment using a regression-tree-based nonlinear scoring function, especially when good sequence profiles are unavailable; and 1b) improving fold recognition using a machine learning method that combines residue-level and atom-level features; Aim 2) to improve protein conformation sampling in a continuous space, and thus template-free modeling, by three independent but complementary approaches: 2a) modeling the nonlinear sequence-structure relationship using Conditional (Markov) Random Field (CRF) models; 2b) simultaneously sampling secondary and tertiary structure; and 2c) learning structure information from templates. The core of the project is to develop various CRF models for data-driven protein structure prediction by learning the protein sequence-structure relationship from existing sequence/structure databases.

The products of this research include a regression-tree-based CRF model for accurate protein alignment, especially for proteins without close homologs in the PDB or without very good sequence profiles; an SVM model for protein fold recognition; a few CRF models for efficient protein conformation sampling in a continuous space; and a complete protein structure prediction software package. The project will also produce a publicly available web server for academic and biomedical users. Protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes, and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

PUBLIC HEALTH RELEVANCE: Novel protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes, and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

IC Name

NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES

Activity
R01
Administering IC
GM
Application Type
1

Direct Cost Amount
Indirect Cost Amount
Total Cost
268555
Sub Project Total Cost

ARRA Funded
False
CFDA Code
859
Ed Inst. Type
Funding ICs
NIGMS:268555
Funding Mechanism
Research Projects
Study Section
BDMA
Study Section Name
Biodata Management and Analysis Study Section

Organization Name
TOYOTA TECHNOLOGICAL INSTITUTE / CHICAGO
Organization Department
Organization DUNS
127228927
Organization City
CHICAGO
Organization State
IL
Organization Country
UNITED STATES
Organization Zip Code
606372803
Organization District
UNITED STATES
