MOLECULAR INTERACTION PREDICTORS

Information

  • Patent Application
    20070192037
  • Publication Number
    20070192037
  • Date Filed
    October 03, 2006
  • Date Published
    August 16, 2007
Abstract
Adaptive threading models for predicting an interaction between two or more molecules such as proteins are provided. The adaptive threading models have one or more learnable parameters that can be learned from all or some of the available data. The available data can include data relating to known interactions between the two or more molecules, the composition of the molecules and the geometry of the molecular complex.
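The kind of model the abstract describes can be illustrated with a minimal sketch. It assumes a binding energy that sums weighted pairwise contact potentials over residue pairs, with each pair gated by a sigmoid soft step on the inter-residue distance. The functional form, the function names (`soft_step`, `binding_energy`), the default threshold and smoothness, and all toy values are illustrative assumptions rather than figures taken from the application.

```python
import math

def soft_step(distance, threshold, smoothness):
    """Sigmoid soft step: close to 1 below the threshold, close to 0 above it."""
    return 1.0 / (1.0 + math.exp(smoothness * (distance - threshold)))

def binding_energy(peptide, partner, distances, potentials, weights,
                   threshold=4.0, smoothness=5.0):
    """Sum weighted contact potentials over all residue pairs.

    peptide, partner: amino acid sequences (strings of one-letter codes)
    distances[i][j]:  distance between peptide residue i and partner residue j,
                      taken from a structural template
    potentials:       dict mapping (residue, residue) to a contact potential
    weights[i][j]:    position-specific weight for the pair (i, j)
    """
    energy = 0.0
    for i, p in enumerate(peptide):
        for j, m in enumerate(partner):
            contact = soft_step(distances[i][j], threshold, smoothness)
            energy += weights[i][j] * potentials[(p, m)] * contact
    return energy

# Toy example (all values hypothetical): two peptide residues, two partner residues.
toy_potentials = {("A", "Y"): -1.0, ("A", "W"): -0.5,
                  ("L", "Y"): -0.2, ("L", "W"): -2.0}
toy_distances = [[3.0, 10.0],
                 [10.0, 3.0]]
toy_weights = [[1.0, 1.0],
               [1.0, 1.0]]
toy_energy = binding_energy("AL", "YW", toy_distances, toy_potentials, toy_weights)
```

In this toy case only the two close pairs (A with Y, L with W) contribute appreciably, so `toy_energy` is close to the sum of their potentials. In an adaptive model, the potentials, the weights, and the soft step parameters would be learned from binding data rather than fixed by hand.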
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 schematically illustrates the 3-D structure of MHC A0201 bound to a peptide.



FIG. 2 is a block diagram of one example of a system that facilitates making a prediction relating to a molecular interaction.



FIG. 3 is a block diagram of another example of a system that facilitates making a prediction relating to a molecular interaction.



FIG. 4 is a block diagram of another example of a system that facilitates making a prediction relating to a molecular interaction.



FIG. 5 is a block diagram of another example of a system that facilitates making a prediction relating to a molecular interaction.



FIG. 6 is a block diagram of yet another example of a system that facilitates making a prediction relating to a molecular interaction.



FIG. 7 is a flowchart illustrating one example of a method to evaluate a molecular contact.



FIG. 8 is a block diagram of an exemplary system that facilitates determining the binding free energies of protein-protein complexes.



FIG. 9 shows ROC curves comparing the performance of a bilinear predictor having MHC-specific weights (Bil) to the standard threading approach employing two previously published pairwise potential matrices (Miy and Bet).



FIG. 10 shows ROC curves comparing the performance of a bilinear predictor having MHC-specific weights (Bil) to the standard threading approach employing two previously published pairwise potential matrices (Miy and Bet).



FIG. 11 shows ROC curves comparing the performance of a bilinear predictor having MHC-independent weights (Bil) to the standard threading approach employing two previously published pairwise potential matrices (Miy and Bet).



FIG. 12 shows ROC curves comparing the performance of a bilinear predictor having MHC-independent weights (Bil) to the standard threading approach employing two previously published pairwise potential matrices (Miy and Bet).



FIG. 13 shows ROC curves comparing the performance of a bilinear predictor having MHC-specific weights (Bil) trained on data from different MHC molecules to the standard threading approach employing two previously published pairwise potential matrices (Miy and Bet).



FIG. 14 shows ROC curves comparing the performance of a bilinear predictor having MHC-independent weights (Bil) trained on data from different MHC molecules to the standard threading approach employing two previously published pairwise potential matrices (Miy and Bet).



FIG. 15 shows ROC curves demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 16 shows ROC curves demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 17 shows ROC curves demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 18 shows ROC curves demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 19 shows ROC curves demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 20 shows ROC curves demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 21 shows ROC curves demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 22 is an overall ROC curve demonstrating the performance of an adaptive threading predictor trained on data from over 50 MHC molecules.



FIG. 23A is a graph showing HIV peptide-MHC A0201 binding energy trends as a function of viral load in individuals infected with HIV.



FIG. 23B is a graph showing the average binding energy of MHC A0201 to HIV peptides over the last 23 years.



FIG. 24 schematically illustrates an exemplary computing architecture.



FIG. 25 schematically illustrates an exemplary networking environment.


Claims
  • 1. A system for determining binding free energies of protein-protein complexes, the system comprising: machine learning means for machine estimating amino acid contact potentials and their corresponding weights from at least some available data; and machine learning means for determining a binding free energy of one protein to another protein, the machine learning means for determining a binding free energy utilizing an optimized soft step function defining an amino acid distance criterion, the amino acid contact potentials and their corresponding weights to determine the binding free energy of the proteins.
  • 2. The system of claim 1, wherein one of the proteins is an MHC molecule and the other protein is a peptide from about 8 to about 11 amino acids in length.
  • 3. The system of claim 1, wherein at least one of the proteins is a synthetic protein.
  • 4. A system for determining a binding free energy between two proteins, the system comprising: a prediction component employing an adjusted threading model having one or more learnable parameters, information about the proteins' sequences and a geometry of a protein-protein complex to predict the binding free energy between the two proteins; the one or more learnable parameters machine learned at least in part utilizing data relating to known binding energies.
  • 5. The system of claim 4, further comprising an inference component to machine infer the geometry of the protein-protein complex.
  • 6. The system of claim 4, wherein two of the one or more learnable parameters comprise pairwise contact potentials and weights.
  • 7. The system of claim 6, wherein the adjusted threading model has a soft step function.
  • 8. The system of claim 7, wherein the soft step function is a sigmoid function.
  • 9. The system of claim 7, wherein the soft step function has at least one learnable parameter.
  • 10. The system of claim 9, wherein the at least one learnable parameter comprises a threshold distance and a smoothness of the soft step function.
  • 11. The system of claim 4, wherein one of the proteins is an MHC molecule and one of the proteins is a peptide of about 8-11 amino acids in length.
  • 12. The system of claim 11, wherein the MHC molecule is a synthetic molecule.
  • 13. Computer-executable instructions stored on computer-readable media, the computer-executable instructions encoding a method for evaluating a binding energy between two proteins, the method comprising: providing an optimized set of weighted contact potentials, the set of optimized weighted contact potentials optimized utilizing one or more machine learning algorithms; choosing a set of distances according to a structural template, the set of distances defining a minimum distance of contact between amino acids of the proteins; and determining a score that rates the contact between the two proteins, the score determined by evaluating sequence data according to the set of distances and the optimized set of weighted contact potentials.
  • 14. The computer-executable instructions of claim 13, wherein choosing a set of distances according to the structural template comprises inferring an identity of the structural template by making a Bayesian inference.
  • 15. The computer-executable instructions of claim 14, wherein at least one of the proteins is a mutated naturally occurring protein.
  • 16. The computer-executable instructions of claim 13, wherein the one or more machine learning algorithms comprise an iterative optimization.
  • 17. The computer-executable instructions of claim 14, wherein the iterative optimization is iterative least squares.
  • 18. The computer-executable instructions of claim 13, wherein evaluating sequence data according to the set of distances comprises utilizing an optimized soft step distance function.
  • 19. The computer-executable instructions of claim 14, wherein the optimized soft step distance function was optimized using gradient descent.
  • 20. The computer-executable instructions of claim 13, wherein one of the proteins is an MHC molecule and one of the proteins is an amino acid sequence from about 8 to about 11 amino acids in length.
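Claims 9, 10, 18 and 19 refer to a soft step function with a learnable threshold distance and smoothness, optimized using gradient descent. The sketch below shows one way such a fit could look, assuming a mean squared error loss against known binding energies and holding the weighted contact potentials fixed. The loss, the learning rate, the helper names (`predict`, `mse`, `fit_soft_step`) and the synthetic data are all assumptions made for illustration and are not specified by the claims.

```python
import math

def soft_step(d, threshold, smoothness):
    """Sigmoid soft step on distance d."""
    return 1.0 / (1.0 + math.exp(smoothness * (d - threshold)))

def predict(contacts, threshold, smoothness):
    """contacts: list of (weighted_potential, distance) pairs for one complex."""
    return sum(c * soft_step(d, threshold, smoothness) for c, d in contacts)

def mse(data, threshold, smoothness):
    """Mean squared error between predicted and known energies."""
    return sum((predict(x, threshold, smoothness) - y) ** 2
               for x, y in data) / len(data)

def fit_soft_step(data, threshold, smoothness, lr=0.01, steps=2000):
    """Plain gradient descent on the two soft step parameters."""
    n = len(data)
    for _ in range(steps):
        g_thr = 0.0
        g_smooth = 0.0
        for contacts, y in data:
            residual = predict(contacts, threshold, smoothness) - y
            for c, d in contacts:
                h = soft_step(d, threshold, smoothness)
                slope = h * (1.0 - h)
                # dh/d(threshold) = smoothness * h * (1 - h)
                g_thr += 2.0 * residual * c * smoothness * slope / n
                # dh/d(smoothness) = -(d - threshold) * h * (1 - h)
                g_smooth += 2.0 * residual * c * (-(d - threshold)) * slope / n
        threshold -= lr * g_thr
        smoothness -= lr * g_smooth
    return threshold, smoothness

# Synthetic data (hypothetical): energies generated from known parameters.
TRUE_THRESHOLD, TRUE_SMOOTHNESS = 4.0, 2.0
toy_contacts = [
    [(-1.0, 2.5), (-0.5, 4.5), (-2.0, 6.0)],
    [(-1.5, 3.5), (-1.0, 5.0), (-0.5, 3.0)],
    [(-2.0, 4.2), (-1.0, 3.8), (-1.0, 7.0)],
    [(-0.5, 2.0), (-2.0, 5.5), (-1.5, 4.0)],
]
toy_data = [(c, predict(c, TRUE_THRESHOLD, TRUE_SMOOTHNESS)) for c in toy_contacts]

start = (5.0, 1.0)  # deliberately wrong initial threshold and smoothness
fitted = fit_soft_step(toy_data, *start)
```

Comparing `mse(toy_data, *start)` with `mse(toy_data, *fitted)` shows whether the descent reduced the error. A fuller treatment would alternate this step with re-estimating the contact potentials and weights, for example by the iterative least squares mentioned in claim 17.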
Divisions (1)
Relation  Number    Date      Country
Parent    11356196  Feb 2006  US
Child     11538413            US