BCSP: ABI Innovation: Collaborative Research: Predicting changes in protein activity from changes in sequence by identifying the underlying Biophysical Conditional Random Field

Information

  • NSF Award
    1262457
Owner
  • Award Id
    1262457
  • Award Effective Date
    6/1/2014
  • Award Expiration Date
    5/31/2017
  • Award Amount
    $ 255,276.00
  • Award Instrument
    Standard Grant

Proteins are the molecular machines responsible for a vast array of functions necessary for life. Understanding how they work is critical both to a better scientific understanding of the fundamental processes of life and to modifying or improving their function. Although proteins are physically three-dimensional structures of cooperating parts, the current state of the art for representing and studying proteins uses a description that is simply a sequential list of the parts used in their assembly. This sequential-list description has biased the development of protein-analysis tools toward the sequential properties of these molecules, and it ignores the fact that the parts must work together for the protein to function. This work will broadly impact the study of proteins, improving a range of activities from basic scientific studies of function to endeavors in protein engineering. The products of this project will be made freely available to the research community as online tools, and the methods will be incorporated into coursework and made available as lesson-plan material appropriate for both primary and secondary education.

This project will adapt two recently developed tools to the task of describing proteins: a statistical technique, the Conditional Random Field (CRF), that can quantitatively represent densely connected networks of features, and a visualization tool that enables interactive exploration of these networks. Structurally, CRFs appear to recapitulate the process by which evolution has selected for parts that cooperate in proteins, and protein descriptions based on CRFs will be able to predict whether a change to a protein (a mutation) would have been tolerated by evolution or selected against as non-functional.
This information will aid in predicting the effect of one or more mutations to a protein, using much more of the available information than current state-of-the-art tools do. The "change in protein sequence to change in protein function" problem is a "model organism" for many other biological and non-biological systems in which rich interactions between parts of the system demand a sophisticated statistical approach. To date, in most of these fields, models with limitations similar to those of the models currently used for proteins are the de facto standard. Developing the tools necessary for applying CRFs to protein data, and methods for establishing testable ground truth in this system, will enhance the application of CRFs to many other domains where they may provide a significant advantage over current methods. This tool may make interdependencies between features visually explorable and modifications of these dependencies quantifiably predictable, and may promote more thorough consideration of the true complexity of data and systems in many domains. The products of this project will be made freely available to the research community as online tools. As the teachable component matures, products will be made available as lesson-plan material appropriate for both primary and secondary education.
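To make the contrast between a sequential-list description and a network of interacting features concrete, the following is a minimal sketch of a pairwise (Potts-style) scoring model, a simplified relative of the CRF. It is not the project's method: the function names `score` and `mutation_effect`, the toy parameters `h` (per-position terms) and `J` (pairwise coupling terms), and the convention that a higher score means a sequence more likely to be tolerated by evolution are all illustrative assumptions. The sketch shows how a coupling between two positions can make a double mutant's effect differ from the sum of its single-mutant effects, which a site-independent model cannot capture.

```python
from itertools import combinations

# Hypothetical toy model (illustrative assumption, not the project's CRF):
#   h[i][a]            per-position term for residue a at position i
#   J[(i, j)][(a, b)]  coupling term for residues a, b at positions i < j
# Higher scores are assumed to mean "more tolerated by evolution".

def score(seq, h, J):
    """Unnormalized log-score of a sequence under a pairwise model."""
    total = sum(h[i][a] for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        # Missing position pairs or residue combinations contribute nothing.
        total += J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return total

def mutation_effect(wt, mutations, h, J):
    """Score change when applying {position: new_residue} to wild type wt."""
    mutant = list(wt)
    for pos, residue in mutations.items():
        mutant[pos] = residue
    return score("".join(mutant), h, J) - score(wt, h, J)

if __name__ == "__main__":
    # Three positions, two residue types: each A->G substitution is
    # penalized on its own, but G at positions 0 and 1 together is favored.
    h = [{"A": 0.0, "G": -1.0} for _ in range(3)]
    J = {(0, 1): {("G", "G"): 3.0}}

    single = mutation_effect("AAA", {0: "G"}, h, J)
    double = mutation_effect("AAA", {0: "G", 1: "G"}, h, J)
    independent = mutation_effect("AAA", {0: "G", 1: "G"}, h, {})
    print(single, double, independent)  # -1.0 1.0 -2.0
```

With the coupling removed (an empty `J`, i.e. a site-independent model), the two substitutions are predicted to be additively deleterious (-2.0); with the coupling, the double mutant is predicted to be favored (+1.0). In practice such parameters would have to be learned from data, and how this project does so is not specified here.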

  • Program Officer
    Jennifer Weller
  • Min Amd Letter Date
    5/19/2014
  • Max Amd Letter Date
    5/19/2014
  • ARRA Amount

Institutions

  • Name
    The Research Institute at Nationwide Children's Hospital
  • City
    Columbus
  • State
    OH
  • Country
    United States
  • Address
    700 Childrens Drive
  • Postal Code
    43205-2664

Investigators

  • First Name
    William
  • Last Name
    Ray
  • Email Address
    Ray.29@osu.edu
  • Start Date
    5/19/2014 12:00:00 AM

Program Element

  • Text
    ADVANCES IN BIO INFORMATICS
  • Code
    1165

Program Reference

  • Text
    GRADUATE INVOLVEMENT
  • Code
    9179