Claims
- 1. A computer-assisted method for assigning an amino acid probe sequence to a known three-dimensional protein structure, including the steps of:
  (a) receiving into a computer system a string describing the amino acid sequence of the probe sequence;
  (b) generating, from the amino acid string, at least one sequence-derived property for the amino acid sequence, defined as $p_1, p_2, \ldots, p_n$, wherein each $p$ represents the sequence-derived property of a corresponding amino acid of the probe sequence;
  (c) inputting into the computer system a string $t_1, t_2, \ldots, t_m$ of structural properties, wherein each $t$ represents the structural property of a corresponding amino acid of the known three-dimensional protein, for each member of a library of known three-dimensional protein structures;
  (d) executing an alignment algorithm in the computer system to compute an alignment score indicating the optimal alignment of the string $p_1, p_2, \ldots, p_n$ to the string $t_1, t_2, \ldots, t_m$ by applying a combined compatibility function $g(p_i, t_j)$;
  (e) determining the statistical significance of each alignment score to determine a best-fit alignment score; and
  (f) applying the best-fit alignment score to indicate or select the corresponding three-dimensional protein structure from the library for output to a user.
- 2. The computer-assisted method of claim 1, wherein each string $t_1, t_2, \ldots, t_m$ of structural properties is determined from a known one-dimensional and three-dimensional structure of a protein.
- 3. The computer-assisted method of claim 1, wherein the combined compatibility function $g(p_i, t_j)$ is of the form: $g(p_i, t_j) = f(aa_{p_i}, t_j) + w \times h(p_i, t_j)$, where $aa_{p_i}$ is the amino acid type for $p_i$, $w$ is a weighting factor, the function $f(aa_{p_i}, t_j)$ is a compatibility function that relates the amino acid sequence of the probe sequence to structural properties of known three-dimensional protein structures, and the function $h(p_i, t_j)$ is an extended compatibility function that relates sequence-derived properties of the probe sequence to structural properties of known three-dimensional protein structures.
- 4. The computer-assisted method of claim 1, wherein the combined compatibility function $g(p_i, t_j)$ is of the form: $g(p_i, t_j) = w' \times \sum_{k=1}^{20} ma_{p_i}[k]\, f(k, t_j) + w \times h(p_i, t_j)$, where $k$ is the amino acid type for $p_i$, $w$ and $w'$ are weighting factors, $ma_{p_i}[k]$ denotes a weight assigned to amino acid of type $k$ in $p_i$, the function $f(k, t_j)$ is a compatibility function that relates the amino acid sequence of the probe sequence to structural properties of known three-dimensional protein structures, and the function $h(p_i, t_j)$ is an extended compatibility function that relates sequence-derived properties of the probe sequence to structural properties of known three-dimensional protein structures.
- 5. The computer-assisted method of claim 1, wherein the alignment algorithm includes a dynamic programming algorithm.
- 6. The computer-assisted method of claim 5, wherein the dynamic programming algorithm includes a global-local alignment algorithm.
- 7. The computer-assisted method of claim 1, wherein the step of determining the statistical significance of each alignment score to determine a best-fit alignment score includes the step of computing a z-score as the number of standard deviations above a mean score.
- 8. The computer-assisted method of claim 1, wherein the step of applying the best-fit alignment score to indicate or select the corresponding three-dimensional protein structure from the library for output to a user includes the step of displaying the three-dimensional protein structure on a visual display device.
- 9. A computer-assisted method for predicting the three-dimensional structure of an amino acid probe sequence using a programmed computer comprising a processor, a data storage system, at least one input device, and at least one output device, including the steps of:
  (a) generating input data for the programmed computer, including the steps of:
    (1) determining a string $p_1, p_2, \ldots, p_n$ describing the amino acid sequence of the probe sequence and at least one sequence-derived property for the probe sequence, wherein each $p$ represents the sequence-derived property of a corresponding amino acid of the probe sequence;
    (2) determining a string $t_1, t_2, \ldots, t_m$ of structural properties, wherein each $t$ represents the structural property of a corresponding amino acid of the known three-dimensional protein, for each member of a library of known three-dimensional protein structures;
  (b) inputting the generated input data into the programmed computer through one of the input devices;
  (c) applying, by means of the processor, an alignment algorithm to compute an alignment score indicating the optimal alignment of the string $p_1, p_2, \ldots, p_n$ to the string $t_1, t_2, \ldots, t_m$ by applying a combined compatibility function $g(p_i, t_j)$;
  (d) determining the statistical significance of each alignment score to determine a best-fit alignment score;
  (e) applying, using computer methods, the best-fit alignment score to identify the corresponding three-dimensional protein structure from the library; and
  (f) outputting to at least one output device the identity of such three-dimensional protein structure.
- 10. The computer-assisted method of claim 9, wherein the step of determining each string $t_1, t_2, \ldots, t_m$ of structural properties includes the step of determining such structural properties from a known one-dimensional and three-dimensional structure of a protein.
- 11. The computer-assisted method of claim 9, wherein the combined compatibility function $g(p_i, t_j)$ is of the form: $g(p_i, t_j) = f(aa_{p_i}, t_j) + w \times h(p_i, t_j)$, where $aa_{p_i}$ is the amino acid type for $p_i$, $w$ is a weighting factor, the function $f(aa_{p_i}, t_j)$ is a compatibility function that relates the amino acid sequence of the probe sequence to structural properties of known three-dimensional protein structures, and the function $h(p_i, t_j)$ is an extended compatibility function that relates sequence-derived properties of the probe sequence to structural properties of known three-dimensional protein structures.
- 12. The computer-assisted method of claim 9, wherein the combined compatibility function $g(p_i, t_j)$ is of the form: $g(p_i, t_j) = w' \times \sum_{k=1}^{20} ma_{p_i}[k]\, f(k, t_j) + w \times h(p_i, t_j)$, where $k$ is the amino acid type for $p_i$, $w$ and $w'$ are weighting factors, $ma_{p_i}[k]$ denotes a weight assigned to amino acid of type $k$ in $p_i$, the function $f(k, t_j)$ is a compatibility function that relates the amino acid sequence of the probe sequence to structural properties of known three-dimensional protein structures, and the function $h(p_i, t_j)$ is an extended compatibility function that relates sequence-derived properties of the probe sequence to structural properties of known three-dimensional protein structures.
- 13. The computer-assisted method of claim 9, wherein the alignment algorithm includes a dynamic programming algorithm.
- 14. The computer-assisted method of claim 13, wherein the dynamic programming algorithm includes a global-local alignment algorithm.
- 15. The computer-assisted method of claim 9, wherein the step of determining the statistical significance of each alignment score to determine a best-fit alignment score includes the step of computing a z-score as the number of standard deviations above a mean score.
- 16. The computer-assisted method of claim 9, wherein the step of outputting to at least one output device the identity of such three-dimensional protein structure includes the step of displaying the three-dimensional protein structure on a visual display device.
- 17. A computer program, residing on a computer-readable medium, for assigning an amino acid probe sequence to a known three-dimensional protein structure, the computer program comprising instructions for causing a computer to:
  (a) access a string describing the amino acid sequence of the probe sequence;
  (b) generate, from the amino acid string, at least one sequence-derived property for the probe sequence, defined as $p_1, p_2, \ldots, p_n$, wherein each $p$ represents the sequence-derived property of a corresponding amino acid of the probe sequence;
  (c) access a string $t_1, t_2, \ldots, t_m$ of structural properties, wherein each $t$ represents the structural property of a corresponding amino acid of the known three-dimensional protein, for each member of a library of known three-dimensional protein structures;
  (d) execute an alignment algorithm to compute an alignment score indicating the optimal alignment of the string $p_1, p_2, \ldots, p_n$ to the string $t_1, t_2, \ldots, t_m$ by applying a combined compatibility function $g(p_i, t_j)$;
  (e) determine the statistical significance of each alignment score to determine a best-fit alignment score; and
  (f) apply the best-fit alignment score to indicate or select the corresponding three-dimensional protein structure from the library for output to a user.
- 18. The computer program of claim 17, wherein each string $t_1, t_2, \ldots, t_m$ of structural properties is determined from a known one-dimensional and three-dimensional structure of a protein.
- 19. The computer program of claim 17, wherein the combined compatibility function $g(p_i, t_j)$ is of the form: $g(p_i, t_j) = f(aa_{p_i}, t_j) + w \times h(p_i, t_j)$, where $aa_{p_i}$ is the amino acid type for $p_i$, $w$ is a weighting factor, the function $f(aa_{p_i}, t_j)$ is a compatibility function that relates the amino acid sequence of the probe sequence to structural properties of known three-dimensional protein structures, and the function $h(p_i, t_j)$ is an extended compatibility function that relates sequence-derived properties of the probe sequence to structural properties of known three-dimensional protein structures.
- 20. The computer program of claim 17, wherein the combined compatibility function $g(p_i, t_j)$ is of the form: $g(p_i, t_j) = w' \times \sum_{k=1}^{20} ma_{p_i}[k]\, f(k, t_j) + w \times h(p_i, t_j)$, where $k$ is the amino acid type for $p_i$, $w$ and $w'$ are weighting factors, $ma_{p_i}[k]$ denotes a weight assigned to amino acid of type $k$ in $p_i$, the function $f(k, t_j)$ is a compatibility function that relates the amino acid sequence of the probe sequence to structural properties of known three-dimensional protein structures, and the function $h(p_i, t_j)$ is an extended compatibility function that relates sequence-derived properties of the probe sequence to structural properties of known three-dimensional protein structures.
- 21. The computer program of claim 17, wherein the alignment algorithm includes a dynamic programming algorithm.
- 22. The computer program of claim 21, wherein the dynamic programming algorithm includes a global-local alignment algorithm.
- 23. The computer program of claim 17, wherein the instructions for causing a computer to determine the statistical significance of each alignment score to determine a best-fit alignment score include instructions for causing the computer to compute a z-score as the number of standard deviations above a mean score.
- 24. The computer program of claim 17, wherein the instructions for causing a computer to apply the best-fit alignment score to indicate or select the corresponding three-dimensional protein structure from the library for output to a user include instructions for causing the computer to display the three-dimensional protein structure on a visual display device.
- 25. The computer-assisted method of claim 1, wherein the at least one sequence-derived property is a secondary structure of the amino acid sequence.
- 26. The computer assisted method of claim 9, wherein the at least one sequence-derived property is a secondary structure of the probe sequence.
- 27. The computer program of claim 17, wherein the at least one sequence-derived property is a secondary structure of the probe sequence.
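
The steps recited in the claims above (a combined compatibility function, a global-local dynamic programming alignment, and a z-score over the library) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the claims: the toy scoring rules inside `f` and `h`, the linear gap penalty `gap`, the encoding of each structural property as a (burial, secondary structure) pair, the reading of "global-local" as aligning the whole probe while leaving template overhangs unpenalized, the use of the library's own scores as the reference distribution for the z-score, and all function and fold names.

```python
from statistics import mean, pstdev

HYDROPHOBIC = set("AVLIMFWC")  # assumed toy grouping, not from the claims

def f(aa, t):
    """Toy compatibility of amino acid type with burial class ('b' or 'e')."""
    burial, _ = t
    return 1.0 if (aa in HYDROPHOBIC) == (burial == "b") else -1.0

def h(pred_ss, t):
    """Toy extended compatibility: predicted vs. observed secondary structure."""
    _, ss = t
    return 1.0 if pred_ss == ss else -1.0

def g(p, t, w=1.0):
    """Combined compatibility g(p_i, t_j) = f(aa_i, t_j) + w * h(p_i, t_j)."""
    aa, pred_ss = p
    return f(aa, t) + w * h(pred_ss, t)

def global_local_score(probe, template, w=1.0, gap=2.0):
    """Align the whole probe (global) against any stretch of the template (local)."""
    m = len(template)
    prev = [0.0] * (m + 1)  # row 0: leading template positions cost nothing
    for i in range(1, len(probe) + 1):
        cur = [prev[0] - gap] + [0.0] * m  # an unmatched probe residue always costs
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + g(probe[i - 1], template[j - 1], w),  # match
                prev[j] - gap,     # probe residue left unmatched
                cur[j - 1] - gap,  # template position skipped inside the alignment
            )
        prev = cur
    return max(prev)  # trailing template positions cost nothing

def z_scores(scores):
    """Standard deviations above the mean score; all zeros if there is no spread."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma if sigma else 0.0 for s in scores]

def best_fit(probe, library, w=1.0):
    """Return the (name, z-score) of the library entry with the highest z-score."""
    names = list(library)
    z = z_scores([global_local_score(probe, library[n], w) for n in names])
    k = max(range(len(names)), key=z.__getitem__)
    return names[k], z[k]

# Hypothetical probe: (amino acid, predicted secondary structure) per residue.
probe = list(zip("LVAKDE", "HHHCCC"))
# Hypothetical library: (burial class, observed secondary structure) per position.
library = {
    "fold_a": list(zip("ebbbeeee", "CHHHCCCC")),
    "fold_b": list(zip("eeeebbbb", "EEEEEEEE")),
    "fold_c": list(zip("bebebebe", "CCCCHHHH")),
}
name, z = best_fit(probe, library)
print(name, round(z, 2))
```

In this toy library only `fold_a` contains a buried helical stretch followed by exposed coil, so it receives the highest z-score and is the structure selected. Raising or lowering `w` shifts how much the predicted secondary structure counts relative to the amino acid compatibility term.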
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Patent Application Serial No. 60/045,273, filed May 1, 1997.
US Referenced Citations (2)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 5436850 | Eisenberg et al. | Jul 1995 | A |
| 5878373 | Cohen et al. | Mar 1999 | A |
Non-Patent Literature Citations (2)
- Russell et al., “Protein fold recognition by mapping predicted secondary structures,” J. Mol. Biol., vol. 259, pp. 349-365, 1996.*
- Kreisberg et al., “Paired natural cysteine mutation mapping: Aid to constraining models of protein structure,” Protein Science, vol. 4, pp. 2405-2410, 1995.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60/045273 | May 1997 | US |