Claims
- 1. A computer-implemented method to generate a model representation of a class of proteins specified by the length, three-dimensional orientation, and connectivity of secondary structures, comprising the steps of:
generating a set of three-dimensional protein model configurations each composed of an arrangement of a connected sequence of predetermined secondary structural elements selected from the group of alpha-helices and beta-sheet elements, with predetermined lengths and orders of these elements along a protein chain; computing the free energy of each residue sequence for each model configuration, wherein the energy functions are selected from (i) the interaction energies between residues nearby in the configuration but not adjacent along the chain, and (ii) the hydrophobic solvation energy; considering the set of all possible residue sequences for a given model configuration, and determining the configuration with the lowest free energy among the set of protein model configurations for each residue sequence, wherein a residue sequence consists of residues selected from the amino-acid residues found in natural proteins; determining the protein model configurations that have the lowest energy for a large number of sequences to produce a list of model configurations that satisfy predetermined criteria; and screening the model configurations that satisfy said criteria against a database of known existing protein configurations to select those configurations not known in nature as a class model representation.
- 2. The method of claim 1, wherein said predetermined length of each secondary structure is 10 to 50 amino acids.
- 3. The method of claim 2, wherein the length of alpha-helices is about 15 amino acids.
- 4. The method of claim 1, wherein said set of three-dimensional protein configurations excludes self-intersecting configurations.
- 5. The method of claim 1, wherein said set of three-dimensional protein configurations excludes open configurations.
- 6. The method of claim 1, wherein said set of three-dimensional protein configurations includes stacks of secondary structural elements.
- 7. The method of claim 6, wherein said secondary structural elements are selected from the group consisting of alpha-helices and beta sheets.
- 8. The method of claim 7, wherein said secondary structural element comprises alpha-carbon atom positions and amino acid side-chain centroids.
- 9. The method of claim 8, wherein said alpha-carbon atom positions and amino acid side-chain centroids are determined by backbone dihedral angles.
- 10. The method of claim 9, wherein said backbone dihedral angles are about {φ, ψ}={−60, −50}.
- 11. The method of claim 1, wherein each configuration is determined by minimizing a packing energy with a conjugate gradient method, wherein $E_{\text{packing}} = E_1 + E_2 + E_3 + E_4$, and wherein $E_1 = \sum_i s_i$, where $s_i$ is the surface exposure of the $i$th amino acid along the chain; $E_2 = V_0 \sum_{i,j} \left[ \left( \frac{2 r_{CA}}{r^{A}_{i,j}} \right)^{12} + \left( \frac{2 r_{CB}}{r^{B}_{i,j}} \right)^{12} + \left( \frac{r_{CA} + r_{CB}}{r^{AB}_{i,j}} \right)^{12} \right]$, where $r_{CA}$ and $r_{CB}$ are the sphere sizes for the backbone alpha-carbon atoms and the centroids, respectively, $r^{A}_{i,j}$ is the distance between backbone alpha-carbon atoms $i$ and $j$, $r^{B}_{i,j}$ is the distance between centroids $i$ and $j$, $r^{AB}_{i,j}$ is the distance between backbone alpha-carbon atom $i$ and centroid $j$, and $V_0$ is the scale of the repulsive energy; $E_3 = \tfrac{1}{2} K r_g^2$, where $r_g$ is the radius of gyration of the entire stack; and $E_4 = \sum_i \tfrac{1}{2} K_s \left( d_{i,j} - d^{0}_{i,j} \right)^2$, with $E_4 = 0$ if $d_{i,j} < d^{0}_{i,j}$, where $d_{i,j}$ is the distance between the connected ends of elements $i$ and $j$, and $d^{0}_{i,j}$ is a specified equilibrium length.
- 12. The method of claim 6, wherein additional stacks are generated by a symmetry operation.
- 13. The method of claim 12, wherein said symmetry operation generates additional stacks wherein each stack is based on a set of selected starting coordinates.
- 14. The method of claim 13, wherein said selected starting coordinates comprise center of mass coordinates and Euler angles for secondary structural elements.
- 15. The method of claim 12, wherein said symmetry operation comprises at least one screw operation.
- 16. The method of claim 6, wherein generation of stacks is stopped when a specified fraction of generated stacks lies within a specified coordinate root-mean-square distance (crms) of at least one stack already in the set.
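- 17. The method of claim 16, wherein the coordinate root-mean-square distance is given by $\mathrm{crms}^2 = \frac{1}{N} \sum_i \left[ \mathbf{r}_i(s) - \mathbf{r}_i(s_1) \right]^2$, wherein $\mathbf{r}_i(s)$ and $\mathbf{r}_i(s_1)$ are the positions of the $i$th alpha-carbon in stacks $s$ and $s_1$, respectively, and $N$ is the number of backbone alpha-carbons.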
- 18. The method of claim 12, wherein said additional stacks are clustered.
- 19. The method of claim 18, wherein said clustering is performed by sorting said stacks from a most compact stack to a least compact stack.
- 20. The method of claim 19, wherein all stacks closer than 1.5 Angstroms crms to the most compact stack are eliminated and said clustering is continued iteratively until all stacks have been considered.
- 21. The method of claim 1, wherein, for the model configurations within a cluster, the numbers of sequences for which each configuration has the lowest energy are summed, and the total is the designability of the cluster.
- 22. The method of claim 21, wherein the energy function is $E_{\text{designability}} = -\sum_i h_i s_i$, wherein $h_i$ is the hydrophobicity of the $i$th element of the sequence and $s_i$ is the surface exposure of the $i$th amino acid in the particular stack.
- 23. A protein comprising three helices coiled with a right-handed twist atop a fourth helix, all having backbone dihedral angles of about {φ, ψ}={−60, −50}.
- 24. A protein comprising four helices, with a cloverleaf connectivity and helices 1 and 3 atop helices 2 and 4, all having backbone dihedral angles of about {φ, ψ}={−60, −50}.
- 25. A computer system for providing a database of protein structures useful for chemical and pharmaceutical laboratory research comprising:
a memory for storing (i) a database of naturally occurring protein structures, and (ii) a set of generated protein structures satisfying predetermined designability criteria; a thermally novel protein structure identification procedure including machine executable instructions for (a) generating a class of proteins based upon predetermined designability criteria; (b) comparing the structural characteristics of each member of said class with said database to identify similar generated protein structures; and (c) storing the identified protein structures which satisfy predetermined satisfiability criteria.
- 26. Computer software resident on a computer-readable medium including a library of protein designs specified by composition, length, and three-dimensional orientation of the chains of amino acid sequences, the software comprising instructions for:
processing a three-dimensional configuration of receptor and ligand molecules that is of laboratory testing interest; determining a class of designable proteins satisfying the three-dimensional configuration criteria; computing the potential energy of each model configuration; for a given model configuration, considering the set of all possible residue sequences and determining the lowest-energy configuration among the set of protein model configurations, wherein a sequence consists of residues selected from the amino-acid residues found in natural proteins, and wherein the energy functions are selected from the interaction energies between residues nearby in the configuration but not adjacent along the polymer, and the hydrophobic solvation energy; and determining the protein model configurations that have the lowest energy for a large number of sequences.
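The packing energy of claim 11 can be evaluated as in the following minimal Python sketch. It assumes that each residue is reduced to one alpha-carbon and one side-chain centroid given as 3-tuples, that the sum in E_2 runs over unordered pairs i < j for the alpha-carbon/alpha-carbon and centroid/centroid terms and over ordered pairs i ≠ j for the alpha-carbon/centroid term, and that r_g is taken over the alpha-carbons only. The function name `packing_energy` and its parameters are hypothetical, and the conjugate-gradient minimization over element positions and orientations is not shown.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def packing_energy(
    ca: Sequence[Vec],          # backbone alpha-carbon positions, one per residue
    cb: Sequence[Vec],          # side-chain centroid positions, one per residue
    exposure: Sequence[float],  # surface exposure s_i of each residue
    links: Sequence[Tuple[Vec, Vec, float]],  # (end of element i, end of element j, d0_ij)
    *, r_ca: float, r_cb: float, v0: float, k: float, k_s: float,
) -> float:
    """Hypothetical evaluation of E_packing = E1 + E2 + E3 + E4 (claim 11)."""
    n = len(ca)

    # E1: total surface exposure of the chain.
    e1 = sum(exposure)

    # E2: steep r^-12 repulsion between the backbone and centroid spheres.
    e2 = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if i < j:  # like-like pairs counted once (assumption)
                e2 += (2 * r_ca / math.dist(ca[i], ca[j])) ** 12
                e2 += (2 * r_cb / math.dist(cb[i], cb[j])) ** 12
            # alpha-carbon i versus centroid j, ordered pairs (assumption)
            e2 += ((r_ca + r_cb) / math.dist(ca[i], cb[j])) ** 12
    e2 *= v0

    # E3: harmonic compaction term on the radius of gyration (alpha-carbons only).
    centre = tuple(sum(p[d] for p in ca) / n for d in range(3))
    rg2 = sum(math.dist(p, centre) ** 2 for p in ca) / n
    e3 = 0.5 * k * rg2

    # E4: one-sided springs on connected element ends; no penalty when d < d0.
    e4 = 0.0
    for end_i, end_j, d0 in links:
        d = math.dist(end_i, end_j)
        if d > d0:
            e4 += 0.5 * k_s * (d - d0) ** 2

    return e1 + e2 + e3 + e4
```

The one-sided form of E4 lets a connecting loop be shorter than its equilibrium length at no cost while penalizing stretched connections, which is how the condition "E4 = 0 if d < d0" is read here.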
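The crms measure of claim 17 and the iterative clustering of claims 19 and 20 can be sketched in Python as follows. Because the crms formula contains no superposition step, the sketch assumes all stacks are already expressed in a common reference frame; it also assumes that "most compact" means smallest alpha-carbon radius of gyration, which the claims do not define. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Stack = Sequence[Vec]  # alpha-carbon positions of one stack


def crms(s: Stack, s1: Stack) -> float:
    """crms^2 = (1/N) * sum_i |r_i(s) - r_i(s1)|^2, with no superposition."""
    n = len(s)
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(s, s1)) / n)


def radius_of_gyration(s: Stack) -> float:
    """Compactness measure used for sorting (assumption)."""
    n = len(s)
    centre = tuple(sum(p[d] for p in s) / n for d in range(3))
    return math.sqrt(sum(math.dist(p, centre) ** 2 for p in s) / n)


def cluster_stacks(stacks: List[Stack], cutoff: float = 1.5) -> List[List[int]]:
    """Greedy clustering: the most compact remaining stack seeds a cluster,
    every remaining stack within `cutoff` crms of it joins that cluster and is
    removed, and the process repeats until all stacks have been considered."""
    remaining = sorted(range(len(stacks)), key=lambda i: radius_of_gyration(stacks[i]))
    clusters: List[List[int]] = []
    while remaining:
        seed = remaining[0]
        members = [i for i in remaining if crms(stacks[seed], stacks[i]) < cutoff]
        clusters.append(members)  # seed is listed first
        remaining = [i for i in remaining if i not in members]
    return clusters
```

Each returned cluster lists stack indices with its seed first, so the seed can serve as the cluster representative in later designability counting.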
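The designability count of claims 21 and 22 can be illustrated with the minimal Python sketch below. It assumes a two-letter hydrophobic/polar alphabet with h_i in {0, 1} in place of the full set of natural amino acids (exhaustive enumeration is then feasible only for short chains), takes the per-residue values s_i of each stack as precomputed input, and credits a sequence only when its lowest-energy stack is unique. The function names and the tie rule are assumptions; the claims do not state how ties are handled.

```python
from itertools import product
from typing import List, Sequence


def designability_energy(h: Sequence[float], s: Sequence[float]) -> float:
    """E = -sum_i h_i * s_i (claim 22)."""
    return -sum(hi * si for hi, si in zip(h, s))


def stack_designabilities(
    exposures: List[Sequence[float]],     # s_i values for each candidate stack
    alphabet: Sequence[float] = (0.0, 1.0),  # hypothetical two-letter hydrophobicities
    tol: float = 1e-9,
) -> List[int]:
    """For every sequence, find the stack with the lowest energy and credit it."""
    n = len(exposures[0])
    counts = [0] * len(exposures)
    for seq in product(alphabet, repeat=n):
        energies = [designability_energy(seq, s) for s in exposures]
        best = min(energies)
        winners = [k for k, e in enumerate(energies) if e - best < tol]
        if len(winners) == 1:  # only sequences with a unique ground state count
            counts[winners[0]] += 1
    return counts


def cluster_designability(counts: Sequence[int], clusters: List[List[int]]) -> List[int]:
    """Sum the per-stack counts over the members of each cluster (claim 21)."""
    return [sum(counts[k] for k in members) for members in clusters]
```

The cluster lists can come from any clustering of the stacks, for example a greedy crms-based clustering as in claims 19 and 20.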
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a continuation-in-part of U.S. patent application Ser. No. 09/730,214 filed Dec. 5, 2000 and claims benefit of U.S. Provisional Application No. 60/371,947 filed Apr. 11, 2002.
Provisional Applications (1)

| Number | Date | Country |
|---|---|---|
| 60371947 | Apr 2002 | US |

Continuation in Parts (1)

| Relationship | Number | Date | Country |
|---|---|---|---|
| Parent | 09730214 | Dec 2000 | US |
| Child | 10411839 | Apr 2003 | US |