Claims
- 1. A method of representing a polymer sequence, the method comprising:
obtaining a position vector descriptor (PVD) for one or more positions in the polymer; and replacing the monomer(s) with the corresponding PVD(s) in the representation of the polymer.
- 2. The method of claim 1, wherein obtaining a PVD comprises:
calculating functional descriptors (FDPs) for each position in the polymer, wherein the FDPs are calculated with respect to a specific pre-selected monomer, P; and combining the calculated FDPs into a single vector having m elements, where m is equal to the number of different types of monomers in the polymer.
- 3. The method of claim 2, wherein the FDPs are calculated using the formula:
FDP=I*D*F, if the associated monomer is at a position other than P; and FDP=I*F, if the associated monomer is at position P, wherein I is an impulse function, D is a distance function, and F is either a function describing a physical parameter of each monomer in the polymer or F=1.
- 4. The method of claim 1, wherein the PVD(s) is/are simplified to include only a subset of elements.
- 5. The method of claim 4, wherein the PVD(s) is/are simplified to include only a single element, the context leading monomer (CLM).
- 6. The method of claim 1, wherein the polymer is a protein.
- 7. A method of predicting the effects of a change in the sequence of a protein, the method comprising:
obtaining a mathematical relationship that predicts the effects of a change in the sequence of a protein, wherein the input variable for the mathematical relationship is the difference between the value of a PVD element corresponding to the changed monomer and the value of a PVD element corresponding to the original monomer, and wherein the two PVD elements are from the same PVD and the PVD represents the position at which the change is located in the protein; obtaining a PVD representing a position of interest in the protein; and using (i) the difference between elements of the PVD representing the position of interest in the protein and (ii) the mathematical relationship to calculate the predicted effects of a change in sequence of the protein.
- 8. The method of claim 7, wherein the effect being predicted is protein stability.
- 9. A method of predicting secondary structure boundaries in a protein sequence, the method comprising:
obtaining PVDs for some or all amino acid positions in the protein sequence; constructing a leading monomer distribution map (LMDM) for the protein; and dividing the LMDM into segments representing predicted units of secondary structure.
- 10. The method of claim 9, wherein a fixed number of context centers on the LMDM define each segment of secondary structure.
- 11. A method for identifying structural homologs of a protein, the method comprising:
obtaining PVDs for some or all amino acid positions in the protein sequence; determining the effective primary sequence of the protein; and searching a protein database for sequences homologous to the effective primary sequence of the protein.
- 12. The method of claim 11, wherein the sequences present in the protein database are effective primary sequences.
- 13. A method of identifying positions of contextual similarity in a pair of polymers, the method comprising:
a) obtaining a first set of PVDs describing one or more positions in the first polymer and a second set of PVDs describing one or more positions in the second polymer; b) calculating a difference matrix for the first set of PVDs with respect to the second set of PVDs; c) identifying the elements in the resulting difference matrix that are within a pre-selected range; and d) optionally, graphing the identified elements.
- 14. A method of identifying positions of contextual similarity in a polymer, the method comprising:
a) obtaining a set of PVDs describing one or more positions in the polymer, wherein the set of PVDs has been simplified to include a reduced number of elements, X; b) performing pairwise comparisons of each PVD (CLXPVD) from the set of PVDs, wherein two PVDs that have a threshold number, t, of CLMs in common are identified as representing monomer positions that are contextually similar; and, c) optionally, generating a matrix (E-MAAP™) representing the results of step (b).
- 15. The method of claim 14, further comprising the steps:
d) repeating steps (a), (b), and (c) using PVDs constructed for multiple impulse function widths, W; and e) summing the matrices resulting from step (d) to produce a global matrix (E-MAAP™).
- 16. A method of identifying proteins that have similar structural folds, the method comprising:
obtaining a first scaled E-MAAP™, wherein the E-MAAP™ is scaled using amino acid cohesion energies; obtaining a second scaled E-MAAP™, wherein the E-MAAP™ is scaled using amino acid cohesion energies, and wherein the polymer sequence of the second scaled E-MAAP™ is different from the polymer sequence of the first scaled E-MAAP™; and determining the similarity of the second scaled E-MAAP™ with respect to the first scaled E-MAAP™.
- 17. The method of claim 16, comprising:
repeating the method with the same first scaled E-MAAP™ but different second scaled E-MAAP™s from a database, and optionally, ranking the E-MAAP™s of the database with respect to their similarity to the first scaled E-MAAP™.
- 18. A method of estimating the folding rate of a protein, the method comprising:
obtaining a scaled E-MAAP™, wherein the E-MAAP™ is scaled using the Richardson hydrophobicity scale; making a three-dimensional representation of the scaled E-MAAP™; integrating the positive volume of the three-dimensional representation; and using the value resulting from the integration to estimate the folding rate of the protein.
- 19. A method of identifying positions of contextual similarity in a pair of polymers, the method comprising:
a) obtaining a first set of PVDs describing one or more positions in the first polymer and a second set of PVDs describing one or more positions in the second polymer, wherein the PVDs of the first and second set of PVDs have been simplified to include a limited number of elements, X; b) performing pairwise comparisons of each PVD (CLXPVD) from the first set of PVDs with each PVD (CLXPVD) from the second set of PVDs, wherein two PVDs that have a threshold number, t, of CLMs in common are identified as representing monomer positions that are contextually similar; and, c) optionally, generating a matrix (E-MAAP™) representing the results of step (b).
- 20. The method of claim 19, further comprising the steps:
d) repeating steps (a), (b), and (c) using PVDs constructed for multiple impulse function widths, W; and e) summing the matrices resulting from step (d) to produce a global matrix (E-MAAP™).
- 21. A method of predicting an interaction between two polymers, the method comprising:
scaling the values of the matrix produced by the method of claim 20 using amino acid cohesion energies; and identifying positive peaks in the values of the matrix.
- 22. A method of representing a polymer sequence, the method comprising:
obtaining a PVD representing a position in the polymer sequence; and using the elements of the PVD to construct a Context Functional Surface (CFS) for one or more positions in the polymer sequence.
- 23. The method of claim 22, wherein the CFSs corresponding to some or all of the monomer positions in the polymer are combined to generate a CFS having an additional dimension.
- 24. A method of characterizing secondary structure segments in a protein, the method comprising:
a) obtaining a PVD representing a particular monomer position, R, in the protein; b) using the PVD of step a) to generate a CFS for some or all monomer positions in the protein; c) plotting the positive values of the CFSs of step b) on a single graph to produce a G-profile; and d) analyzing the G-profile.
- 25. A method of characterizing the contextual similarity of different positions in a polymer, the method comprising:
a) obtaining a PVD representing a particular monomer position, R, in the polymer; b) using the PVD to generate a set of CFSs for some or all positions in the polymer; c) calculating a correlation matrix, rR, for the set of CFSs generated in step b); d) repeating steps a) through c) for some or all positions, R, in the polymer; and e) using the correlation matrices of step d) to generate a GCD for the polymer.
- 26. A method of identifying contextually unique positions in a polymer, the method comprising:
obtaining a GCD for the polymer; identifying elements in the GCD that are greater than or equal to a predetermined threshold value; and identifying correlated islands in the set of GCD elements identified as meeting or exceeding the threshold value.
- 27. A method of predicting the effects of mutations on the structure of a protein, the method comprising:
a) obtaining a GCD for the protein; b) identifying a position P in the GCD; c) identifying a position R in the GCD; d) plotting the row vector of the GCD at position P and the column vector of the GCD at position R on the same graph; and e) identifying peaks in the graph, thereby identifying positions in the protein that are predicted to disrupt the structural stability of the protein when mutated.
- 28. A method of identifying positions in a nucleic acid sequence, the method comprising:
a) obtaining a GCD for a protein encoded by the nucleic acid sequence; b) identifying a position P in the GCD; c) identifying a position R in the GCD; d) plotting the row vector of the GCD at position P and the column vector of the GCD at position R on the same graph; e) identifying positions in the graph corresponding to positions in the protein that are predicted to influence the structural stability of the protein; and f) identifying regions of the nucleic acid sequence that encode the amino acids identified in step e), thereby identifying positions in the nucleic acid sequence that are likely to contain SNPs.
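The PVD construction of claims 2 and 3 can be illustrated with a short sketch. The claims do not fix the impulse function I, the distance function D, or how the FDPs are "combined", so this example assumes a rectangular impulse of half-width `width`, D = 1/|i - p|, F = 1 by default, and summation of each FDP into the element of the monomer found at that position. The helper names `rect_impulse`, `inverse_distance` and `compute_pvd` are hypothetical.

```python
from typing import Callable, List


def rect_impulse(i: int, p: int, width: int) -> float:
    """Assumed impulse function I: 1 within `width` positions of p, else 0."""
    return 1.0 if abs(i - p) <= width else 0.0


def inverse_distance(i: int, p: int) -> float:
    """Assumed distance function D: decays as 1/|i - p| (never called with i == p)."""
    return 1.0 / abs(i - p)


def compute_pvd(seq: str, p: int, alphabet: str, width: int,
                f: Callable[[str], float] = lambda monomer: 1.0) -> List[float]:
    """Build an m-element PVD for reference position p, where m = len(alphabet).

    FDP = I*D*F for positions other than p and FDP = I*F at p (claim 3).
    Summing each FDP into the element of the monomer found at that position
    is an assumption about how the FDPs are combined (claim 2).
    """
    index = {monomer: k for k, monomer in enumerate(alphabet)}
    pvd = [0.0] * len(alphabet)
    for i, monomer in enumerate(seq):
        if i == p:
            fdp = rect_impulse(i, p, width) * f(monomer)
        else:
            fdp = rect_impulse(i, p, width) * inverse_distance(i, p) * f(monomer)
        pvd[index[monomer]] += fdp
    return pvd


# Toy two-letter alphabet, F = 1, window half-width 2 around position 0.
print(compute_pvd("ABAB", p=0, alphabet="AB", width=2))  # [1.5, 1.0]
```

Passing a different `f` (for example, a lookup into a physical-parameter table) gives the F ≠ 1 variant of claim 3.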
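Claim 7 feeds a "mathematical relationship" with the difference between two elements of the same PVD: the element of the substituted monomer minus the element of the original monomer. The form of that relationship is not specified, so this sketch assumes an ordinary least-squares line fitted to calibration pairs; the helpers `pvd_delta` and `fit_linear` are hypothetical, and the calibration points below are toy values chosen to lie on y = 2x + 1, not measured stability data.

```python
from typing import Callable, Sequence


def pvd_delta(pvd: Sequence[float], alphabet: str, original: str, mutant: str) -> float:
    """Input variable of claim 7: PVD element of the new monomer minus that of the
    original monomer, both taken from the PVD of the mutated position."""
    return pvd[alphabet.index(mutant)] - pvd[alphabet.index(original)]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Least-squares line y = a*x + b (an assumed form of the relationship)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


# Toy calibration points lying exactly on y = 2x + 1.
relationship = fit_linear([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
delta = pvd_delta([1.5, 1.0], "AB", original="A", mutant="B")  # 1.0 - 1.5 = -0.5
print(relationship(delta))  # 0.0
```

For claim 8 the fitted outputs would be stability values; any nonlinear model could replace `fit_linear` without changing how the input variable is formed.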
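Steps b) and c) of claim 13 compare every PVD of one polymer with every PVD of the other and keep the entries that fall in a pre-selected range. The claim does not define how two PVDs are differenced, so this sketch assumes the Euclidean distance between the vectors (via the standard-library `math.dist`); `difference_matrix` and `within_range` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def difference_matrix(first: Sequence[Sequence[float]],
                      second: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry (i, j): Euclidean distance between PVD i of the first polymer
    and PVD j of the second polymer (an assumed difference measure)."""
    return [[math.dist(a, b) for b in second] for a in first]


def within_range(matrix: List[List[float]], lo: float, hi: float) -> List[Tuple[int, int]]:
    """Index pairs (i, j) whose difference lies in the pre-selected range [lo, hi]."""
    return [(i, j) for i, row in enumerate(matrix)
            for j, value in enumerate(row) if lo <= value <= hi]


first = [[0.0, 0.0], [3.0, 4.0]]
second = [[0.0, 0.0], [3.0, 4.5]]
m = difference_matrix(first, second)
print(within_range(m, 0.0, 1.0))  # [(0, 0), (1, 1)]
```

The returned index pairs are exactly the points that optional step d) would graph.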
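The simplified-PVD comparison of claims 14, 15 and 19 can be sketched as follows. The sketch assumes that the X retained elements (the CLXPVD) are the X monomers with the largest PVD values, with X = 1 giving the CLM of claim 5, and that ties are broken by alphabet order; both are assumptions. `clx`, `e_maap` and `global_e_maap` are hypothetical helpers: `e_maap` takes two PVD sets so it covers both the single-polymer case (pass the same set twice) and the two-polymer case of claim 19, and `global_e_maap` sums single-polymer matrices over several impulse widths W as in claim 15.

```python
from typing import Dict, List, Sequence, Set

Matrix = List[List[int]]


def clx(pvd: Sequence[float], alphabet: str, x: int) -> Set[str]:
    """Simplified PVD: the X monomers with the largest PVD elements (assumed ranking)."""
    ranked = sorted(range(len(pvd)), key=lambda k: (-pvd[k], k))
    return {alphabet[k] for k in ranked[:x]}


def e_maap(first: Sequence[Sequence[float]], second: Sequence[Sequence[float]],
           alphabet: str, x: int, t: int) -> Matrix:
    """Entry (i, j) is 1 when position i of the first set and position j of the
    second set share at least t leading monomers, else 0."""
    a = [clx(v, alphabet, x) for v in first]
    b = [clx(v, alphabet, x) for v in second]
    return [[1 if len(si & sj) >= t else 0 for sj in b] for si in a]


def global_e_maap(pvds_by_width: Dict[int, Sequence[Sequence[float]]],
                  alphabet: str, x: int, t: int) -> Matrix:
    """Sum the single-polymer matrices built from the PVDs for each width W."""
    total: Matrix = []
    for pvds in pvds_by_width.values():
        m = e_maap(pvds, pvds, alphabet, x, t)
        total = m if not total else [[p + q for p, q in zip(r, s)]
                                     for r, s in zip(total, m)]
    return total


pvds = [[3.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 4.0]]
print(e_maap(pvds, pvds, "ABC", x=2, t=2))  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

In practice each width W would come with its own PVD set; larger summed entries then mark position pairs that stay contextually similar across window sizes.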
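Claims 16 and 17 determine how similar two scaled E-MAAP™ matrices are and optionally rank a database by that similarity. No similarity measure is named, so this sketch assumes the Pearson correlation of the flattened matrices and requires both matrices to have the same shape; the helpers `flatten`, `pearson` and `rank_by_similarity` and the database keys are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def flatten(matrix: Matrix) -> List[float]:
    """Row-major list of the matrix values."""
    return [v for row in matrix for v in row]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_by_similarity(query: Matrix, database: Dict[str, Matrix]) -> List[str]:
    """Names of database matrices ordered from most to least similar to the query."""
    q = flatten(query)
    scores = {name: pearson(q, flatten(m)) for name, m in database.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


query = [[1.0, 2.0], [3.0, 4.0]]
database = {"reversed": [[4.0, 3.0], [2.0, 1.0]], "same": [[1.0, 2.0], [3.0, 4.0]]}
print(rank_by_similarity(query, database))  # ['same', 'reversed']
```

Matrices from sequences of different lengths would first need to be resampled or aligned to a common shape; that step is outside this sketch.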
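Claim 18 integrates the positive volume of a scaled E-MAAP™ surface, and claim 21 looks for positive peaks in a scaled matrix. Assuming the already-scaled matrix is treated as a height field on a unit grid, both reduce to simple operations: the volume is the sum of positive entries times the cell area, and a peak is taken here to be a positive cell no smaller than any of its up-to-eight neighbours (an assumed peak definition). The scaling tables themselves and the mapping from volume to folding rate are not given in the claims and are not modelled; `positive_volume` and `positive_peaks` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def positive_volume(matrix: Sequence[Sequence[float]], cell_area: float = 1.0) -> float:
    """Discrete integral of the positive part of the surface z = matrix[i][j]."""
    return cell_area * sum(v for row in matrix for v in row if v > 0)


def positive_peaks(matrix: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Cells that are > 0 and no smaller than any of their (up to 8) neighbours."""
    rows, cols = len(matrix), len(matrix[0])
    peaks = []
    for i in range(rows):
        for j in range(cols):
            v = matrix[i][j]
            if v <= 0:
                continue
            neighbours = [matrix[a][b]
                          for a in range(max(i - 1, 0), min(i + 2, rows))
                          for b in range(max(j - 1, 0), min(j + 2, cols))
                          if (a, b) != (i, j)]
            if all(v >= nb for nb in neighbours):
                peaks.append((i, j))
    return peaks


scaled = [[-1.0, 2.0, 0.0], [0.0, 1.0, -3.0], [4.0, 0.0, 0.0]]
print(positive_volume(scaled))  # 7.0
print(positive_peaks(scaled))   # [(0, 1), (2, 0)]
```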
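Claim 26 thresholds a GCD and then finds "correlated islands" among the surviving elements. Treating the GCD as a two-dimensional matrix and an island as a 4-connected group of cells at or above the threshold (both assumptions), the islands can be collected with a breadth-first flood fill. `correlated_islands` is a hypothetical helper.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]


def correlated_islands(gcd: Sequence[Sequence[float]], threshold: float) -> List[List[Cell]]:
    """Group GCD cells with value >= threshold into 4-connected islands."""
    rows, cols = len(gcd), len(gcd[0])
    seen: Set[Cell] = set()
    islands: List[List[Cell]] = []
    for i in range(rows):
        for j in range(cols):
            if gcd[i][j] < threshold or (i, j) in seen:
                continue
            # Breadth-first flood fill from the first unvisited cell above threshold.
            island: List[Cell] = []
            queue = deque([(i, j)])
            seen.add((i, j))
            while queue:
                a, b = queue.popleft()
                island.append((a, b))
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if (0 <= na < rows and 0 <= nb < cols
                            and (na, nb) not in seen and gcd[na][nb] >= threshold):
                        seen.add((na, nb))
                        queue.append((na, nb))
            islands.append(sorted(island))
    return islands


gcd = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.1], [0.1, 0.95, 0.97]]
print(correlated_islands(gcd, 0.75))  # [[(0, 0), (0, 1)], [(2, 1), (2, 2)]]
```

Switching to 8-connectivity would merge diagonally touching cells into one island; which convention matches the intended "correlated islands" is not stated in the claims.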
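Claims 27 and 28 plot row P and column R of a GCD on the same graph and pick out peaks. Assuming a square GCD and defining a peak as an index strictly greater than each neighbouring value in either profile (an assumed definition), the candidate positions can be listed as below; `local_maxima` and `row_column_peaks` are hypothetical helpers. Mapping the resulting protein positions back to codons, as in claim 28, is not shown.

```python
from typing import List, Sequence


def local_maxima(values: Sequence[float]) -> List[int]:
    """Indices whose value is strictly greater than each existing neighbour."""
    n = len(values)
    return [k for k in range(n)
            if (k == 0 or values[k] > values[k - 1])
            and (k == n - 1 or values[k] > values[k + 1])]


def row_column_peaks(gcd: Sequence[Sequence[float]], p: int, r: int) -> List[int]:
    """Positions that are peaks in the row vector at P or the column vector at R."""
    row = list(gcd[p])
    column = [gcd[i][r] for i in range(len(gcd))]
    return sorted(set(local_maxima(row)) | set(local_maxima(column)))


gcd = [[1.0, 3.0, 2.0, 0.0],
       [0.0, 0.0, 0.0, 5.0],
       [0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0, 4.0]]
print(row_column_peaks(gcd, p=0, r=3))  # [1, 3]
```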
RELATED APPLICATIONS
[0001] This application claims priority to U.S. Ser. No. 60/299,911, filed on Jun. 21, 2001, the contents of which are incorporated herein by reference in their entirety.
Provisional Applications (1)

| Number | Date | Country |
|---|---|---|
| 60299911 | Jun 2001 | US |