New Computational Methods for Data-driven Protein Structure Prediction

Information

  • Research Project
  • ApplicationId
    9149287
  • Core Project Number
    R01GM089753
  • Full Project Number
    5R01GM089753-07
  • Serial Number
    089753
  • FOA Number
    PA-13-302
  • Sub Project Id
  • Project Start Date
    5/14/2010
  • Project End Date
    8/31/2019
  • Program Officer Name
    KREPKIY, DMITRIY
  • Budget Start Date
    9/1/2016
  • Budget End Date
    8/31/2017
  • Fiscal Year
    2016
  • Support Year
    07
  • Suffix
  • Award Notice Date
    8/24/2016

Proteins and their interactions play fundamental roles in all biological processes. Accurate description of protein structures and interactions is a fundamental step towards understanding biological life and is highly relevant to the development of therapeutics and drugs. However, there is a large gap between the number of available protein sequences and the number of proteins (and complexes) with solved structures and accurately described interactions, and this gap has to be filled by computational prediction. The long-term goal of this project is to apply statistical machine learning and optimization algorithms to understand the protein sequence-structure-function relationship by analyzing low- and high-throughput sequence, structure, and functional data, and to develop algorithms for structure and function prediction. Our hypothesis is that by developing sophisticated algorithms that take advantage of the growing sequence and structure data, we can model the sequence-structure relationship much more accurately and significantly improve structure and function prediction; for this proposal in particular, that means residue (atomic) interaction strength prediction and remote homology detection. This project has produced a few CASP-winning, widely used data-driven algorithms and a web server (http://raptorx.uchicago.edu) for monomer protein modeling. This renewal will not only further develop machine learning algorithms (especially Deep Learning and probabilistic graphical models) for monomer proteins, but also branch out to protein interactions (complexes).
The specific aims are: (1) develop novel structure learning algorithms to predict inter-residue contacts and coevolved residues; (2) develop context-specific, coevolution-based, and distance-dependent statistical potentials using a new machine learning model called Deep Conditional (Markov) Neural Fields (DeepCNF); (3) develop Markov Random Fields (MRF) and DeepCNF methods for remote protein (interface/complex) homology detection and fold recognition that make use of the long-range residue interactions predicted by the first two aims. This renewal will lead to further understanding and new models of the protein sequence-structure-function relationship and will yield publicly available software and servers for automated, accurate, quantitative analysis of a wide range of proteins and their interactions. The impact will be multiplied by tens of thousands of users worldwide employing the resulting software and servers to study a wide variety of proteins and interactions relevant to basic biological research and human diseases, in both low- and high-throughput experiments.

IC Name
NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R01
  • Administering IC
    GM
  • Application Type
    5
  • Direct Cost Amount
    227500
  • Indirect Cost Amount
    78925
  • Total Cost
    306425
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    859
  • Ed Inst. Type
  • Funding ICs
    NIGMS:306425
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    BDMA
  • Study Section Name
    Biodata Management and Analysis Study Section
  • Organization Name
    TOYOTA TECHNOLOGICAL INSTITUTE / CHICAGO
  • Organization Department
  • Organization DUNS
    127228927
  • Organization City
    CHICAGO
  • Organization State
    IL
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    606372803
  • Organization District
    UNITED STATES