New Computational Methods for Data-driven Protein Structure Prediction

Information

  • Research Project
  • ApplicationId
    10246779
  • Core Project Number
    R01GM089753
  • Full Project Number
    5R01GM089753-11
  • Serial Number
    089753
  • FOA Number
    PA-18-484
  • Sub Project Id
  • Project Start Date
    5/14/2010
  • Project End Date
    8/31/2024
  • Program Officer Name
    LYSTER, PETER
  • Budget Start Date
    9/1/2021
  • Budget End Date
    8/31/2022
  • Fiscal Year
    2021
  • Support Year
    11
  • Suffix
  • Award Notice Date
    8/30/2021

Proteins play fundamental roles in all biological processes. An accurate description of protein structure is an important step toward understanding biological life and is highly relevant to the development of therapeutics and drugs. Although experimental structure determination has improved greatly, there is still a very large gap between the number of available protein sequences and the number of solved protein structures, and this gap can only be filled by computational prediction. The long-term goal of this project is to apply machine learning and optimization algorithms to sequence, structure and functional data in order to understand protein sequence-structure-function relationships, and to develop data-driven computational methods and tools for structure and function prediction. We believe that sophisticated algorithms that extract knowledge from the growing body of sequence and structure data can model protein sequence-structure relationships very accurately and greatly improve structure and function prediction. This project has already produced a few CASP-winning, widely used data-driven algorithms and web servers (http://raptorx.uchicago.edu) for protein structure modeling. This renewal will further develop machine learning (especially deep learning) algorithms for modeling proteins that lack good templates. The specific aims are: (1) developing deep learning (DL) algorithms to predict protein contact and distance matrices; (2) developing distance-based algorithms for fast and accurate ab initio folding of proteins without templates; and (3) developing DL algorithms for template-based modeling when only weakly similar templates are available. This renewal will lead to further understanding and new models of protein sequence-structure relationships and will yield publicly available resources for automated, accurate, quantitative analysis of a wide range of proteins.
The impact will be multiplied by the tens of thousands of users worldwide who employ our web servers to study a wide variety of proteins relevant to basic biological research and human diseases, in both low- and high-throughput experiments.
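Aims (1) and (2) revolve around two objects: a residue-residue distance matrix and the contact map derived from it. The Python sketch below (assuming NumPy) is illustrative only and is not the project's method. It thresholds a distance matrix into a binary contact map using a commonly used CASP-style convention (distance below 8 Å, assumed here to be between Cβ atoms, with an assumed minimum sequence separation of 6), and it recovers 3D coordinates from a complete, noise-free distance matrix using classical multidimensional scaling (MDS). Classical MDS is swapped in only to show that a full Euclidean distance matrix fixes a structure up to rotation, translation and mirror image; real distance-based folding has to work from predicted, noisy and incomplete distances. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(xyz: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n, 3) array of residue coordinates."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_map(dist: np.ndarray, threshold: float = 8.0, min_sep: int = 6) -> np.ndarray:
    """Boolean contact map (assumed convention): residues i and j are in contact
    if their distance is below `threshold` (Angstrom) and |i - j| >= `min_sep`."""
    idx = np.arange(dist.shape[0])
    seq_sep = np.abs(idx[:, None] - idx[None, :])
    return (dist < threshold) & (seq_sep >= min_sep)


def coords_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical MDS: coordinates, up to rigid motion and reflection,
    that reproduce a full Euclidean distance matrix."""
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # B is a symmetric Gram matrix; keep its `dim` largest eigenpairs.
    vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]
    top = np.clip(vals[order], 0.0, None)  # guard against tiny negative round-off
    return vecs[:, order] * np.sqrt(top)


# Usage on a hypothetical toy random-walk "chain" of 30 points (not a real protein).
rng = np.random.default_rng(0)
true_xyz = np.cumsum(rng.normal(scale=2.2, size=(30, 3)), axis=0)
d = pairwise_distances(true_xyz)
cmap = contact_map(d)
rebuilt = coords_from_distances(d)
```

Because MDS cannot tell a structure from its mirror image, a folding pipeline built this way would still need a separate step to choose the correct chirality.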

  • IC Name
    NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R01
  • Administering IC
    GM
  • Application Type
    5 (non-competing continuation)
  • Direct Cost Amount
    $240,000
  • Indirect Cost Amount
    $81,599
  • Total Cost
    $321,599
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    859
  • Ed Inst. Type
  • Funding ICs
    NIGMS: $321,599
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    MSFD
  • Study Section Name
    Macromolecular Structure and Function D Study Section
  • Organization Name
    TOYOTA TECHNOLOGICAL INSTITUTE / CHICAGO
  • Organization Department
  • Organization DUNS
    127228927
  • Organization City
    CHICAGO
  • Organization State
    IL
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    60637-2803
  • Organization District
    UNITED STATES