Claims
- 1. A computer-based method for identifying a region in a query sequence corresponding to a non-canonical protein conformation, the method comprising the steps of:
forming a composite pattern descriptor for sequence patterns corresponding to instances of non-canonical protein conformations; and using the composite pattern descriptor to identify positions of a query sequence corresponding to the sequence patterns.
- 2. The method of claim 1, wherein the sequence patterns comprise amino acid sequence patterns.
- 3. The method of claim 1, wherein the sequence patterns comprise nucleotide sequence patterns.
- 4. The method of claim 1, wherein the forming step further comprises the steps of:
constructing a set of sequence patterns from a set of sequence fragments, the sequence fragments corresponding to instances of non-canonical conformations; and selecting sequence patterns for a plurality of non-canonical conformations.
- 5. The method of claim 4, wherein the sequence fragments comprise amino acid sequence fragments.
- 6. The method of claim 4, wherein the sequence fragments comprise nucleotide sequence fragments.
- 7. The method of claim 5, wherein the step of selecting amino acid sequence patterns comprises selecting those patterns wherein a residue at a fixed position coincides with a signature amino acid residue.
- 8. The method of claim 7, wherein the non-canonical conformations are at least one of π-like helices, 3₁₀-like helices, and proline-induced kinks.
- 9. The method of claim 8, wherein a number of selected sequence patterns is controlled by a probability value.
- 10. The method of claim 4, wherein the set of sequence fragments comprises patterns that characterize a respective non-canonical conformation.
- 11. The method of claim 2, further comprising sub-selecting amino acid sequence patterns, wherein the amino acid sequence patterns each comprise about seven to about nine amino acid residues.
- 12. The method of claim 1, wherein searching is conducted through use of a search engine comprising the composite pattern descriptor.
- 13. The method of claim 5, wherein the set of amino acid sequence patterns comprises classes: {A,G}, {D,E}, {K,R}, {I,L,M,V}, {S,T}, {Q,N} and {F,Y}, and wherein amino acids within each class are permitted to replace each other.
- 14. The method of claim 5, wherein the set of amino acid sequence patterns comprises wild card positions which are replaced by a regular expression of the type {X1X2 . . . XN}, wherein each Xi is an amino acid represented by the wild card, and N is a maximal number of amino acids.
- 15. The method of claim 14, wherein those patterns with a number of amino acids greater than or equal to N are discarded.
- 16. The method of claim 15, wherein N is seven.
- 17. The method of claim 1, wherein the positions of a query sequence corresponding to sequence patterns are assigned an amount.
- 18. The method of claim 17, wherein the positions are labeled with x1, x2 and x3 to denote the amounts assigned.
- 19. The method of claim 1, further comprising the step of:
using the identified positions of the query sequence to predict a protein structure.
- 20. The method of claim 19, wherein unit vector (u1, u2, u3)=(x1, x2, x3)/∥(x1, x2, x3)∥ is used to determine the membership of the position in a category.
- 21. An apparatus for identifying a region in a query sequence corresponding to a non-canonical protein conformation, the apparatus comprising:
a memory; and at least one processor, coupled to the memory, operative to: form a composite pattern descriptor for sequence patterns corresponding to instances of non-canonical protein conformations; and use the composite pattern descriptor to identify positions of a query sequence corresponding to the sequence patterns.
- 22. The apparatus of claim 21, wherein the at least one processor is further operative to:
construct a set of sequence patterns from a set of sequence fragments, the sequence fragments corresponding to instances of non-canonical conformations; and select sequence patterns for a plurality of non-canonical conformations.
- 23. The apparatus of claim 22, wherein the non-canonical conformations are at least one of π-like helices, 3₁₀-like helices, and proline-induced kinks.
- 24. An article of manufacture for identifying a region in a query sequence corresponding to a non-canonical protein conformation, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
forming a composite pattern descriptor for sequence patterns corresponding to instances of non-canonical protein conformations; and using the composite pattern descriptor to identify positions of a query sequence corresponding to the sequence patterns.
- 25. The article of manufacture of claim 24, further comprising the steps of:
constructing a set of sequence patterns from a set of sequence fragments, the sequence fragments corresponding to instances of non-canonical conformations; and selecting sequence patterns for a plurality of non-canonical conformations.
- 26. The article of manufacture of claim 25, wherein the non-canonical conformations are at least one of π-like helices, 3₁₀-like helices, and proline-induced kinks.
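By way of illustration only, the matching and labeling recited in claims 1, 13, 17, 18 and 20 might be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: the function names (`pattern_to_regex`, `score_positions`, `unit_vector`), the toy patterns, the toy query, and the rule that the amount assigned to a position is the number of pattern matches covering it are all hypothetical. The three category indices stand in for the three non-canonical conformations named in claim 8, and each wild card is taken to match any single residue.

```python
import math
import re

# Equivalence classes from claim 13; residues in one class may replace each other.
CLASSES = ["AG", "DE", "KR", "ILMV", "ST", "QN", "FY"]
CLASS_OF = {aa: cls for cls in CLASSES for aa in cls}


def pattern_to_regex(pattern):
    """Compile a toy pattern such as 'A.KL' into a regular expression.

    '.' is a wild card; any other residue matches every member of its class
    (or only itself if it belongs to no class).
    """
    parts = []
    for ch in pattern:
        if ch == ".":
            parts.append(".")
        else:
            parts.append("[" + CLASS_OF.get(ch, ch) + "]")
    # A lookahead with a capture group reports overlapping matches too.
    return re.compile("(?=(" + "".join(parts) + "))")


def score_positions(query, descriptor):
    """Assign amounts (x1, x2, x3) to every position of the query.

    descriptor maps a category index (0, 1 or 2) to a list of patterns.
    Assumption: the amount is the count of matches covering the position.
    """
    amounts = [[0, 0, 0] for _ in query]
    for category, patterns in descriptor.items():
        for pattern in patterns:
            for m in pattern_to_regex(pattern).finditer(query):
                start = m.start()
                for i in range(start, start + len(m.group(1))):
                    amounts[i][category] += 1
    return amounts


def unit_vector(x):
    """Return x / ||x|| as in claim 20, or None when x is the zero vector."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        return None
    return tuple(v / norm for v in x)


# Hypothetical composite descriptor and query, for illustration only.
descriptor = {0: ["A.KL"], 1: ["P.S"], 2: ["DE.F"]}
query = "GSRIAPTSDDWY"
amounts = score_positions(query, descriptor)
labels = [unit_vector(x) for x in amounts]
```

In this sketch a position covered by no pattern receives no unit vector, so it is left uncategorized; how such positions are treated, and how the unit vector maps to a category, are not specified by the claims above.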
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 60/356,439, filed Feb. 12, 2002.
Provisional Applications (1)
Number | Date | Country
60356439 | Feb 2002 | US