New Computational Methods for Data-driven Protein Structure Prediction

Information

  • Research Project
  • ApplicationId
    8269822
  • Core Project Number
    R01GM089753
  • Full Project Number
    5R01GM089753-03
  • Serial Number
    089753
  • FOA Number
    PA-07-070
  • Sub Project Id
  • Project Start Date
    5/14/2010
  • Project End Date
    4/30/2015
  • Program Officer Name
    LYSTER, PETER
  • Budget Start Date
    5/1/2012
  • Budget End Date
    4/30/2013
  • Fiscal Year
    2012
  • Support Year
    03
  • Suffix
  • Award Notice Date
    4/27/2012

DESCRIPTION (provided by applicant): Proteins play a central role in all biological processes. Akin to the complete sequencing of genomes, a complete description of protein structures is a fundamental step towards understanding biological life, and is also highly relevant medically to the development of therapeutics and drugs. The broad, long-term goal of the project is to develop machine learning methods for data-driven protein structure prediction through two independent but complementary strategies: 1) much more accurate template-based modeling for proteins with remote homologs in the Protein Data Bank (PDB), and 2) a better template-free modeling method for proteins without detectable templates and for improving template-based models.

The specific aims are: Aim 1) to greatly improve template-based modeling by 1a) improving protein sequence-template alignment using a regression-tree-based nonlinear scoring function, especially when good sequence profiles are unavailable; and 1b) improving fold recognition using a machine learning method that combines residue-level and atom-level features; Aim 2) to improve protein conformation sampling in a continuous space, and thus template-free modeling, by three independent but complementary approaches: 2a) modeling the nonlinear sequence-structure relationship using Conditional (Markov) Random Field (CRF) models; 2b) simultaneously sampling secondary and tertiary structure; and 2c) learning structure information from templates. The core of the project is to develop various CRF models for data-driven protein structure prediction by learning the protein sequence-structure relationship from existing sequence/structure databases.
The products of this research include a regression-tree-based CRF model for accurate protein alignment, especially for proteins without close homologs in the PDB or without very good sequence profiles; an SVM model for protein fold recognition; a few CRF models for efficient protein conformation sampling in a continuous space; and a complete protein structure prediction software package. The project will also produce a publicly available web server for academic and biomedical users. Protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

PUBLIC HEALTH RELEVANCE: Novel protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

  • IC Name
    NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R01
  • Administering IC
    GM
  • Application Type
    5
  • Direct Cost Amount
    193050
  • Indirect Cost Amount
    72819
  • Total Cost
    265869
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    859
  • Ed Inst. Type
  • Funding ICs
    NIGMS:265869
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    BDMA
  • Study Section Name
    Biodata Management and Analysis Study Section
  • Organization Name
    TOYOTA TECHNOLOGICAL INSTITUTE / CHICAGO
  • Organization Department
  • Organization DUNS
    127228927
  • Organization City
    CHICAGO
  • Organization State
    IL
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    606372803
  • Organization District
    UNITED STATES