Claims
- 1. A method of determining a functional site in a protein comprising the steps of:
obtaining a protein sequence; aligning the protein sequence to homologous protein sequences to generate a multiple sequence alignment; adding gap tolerance, wherein a gap in the protein sequence alignment is considered as an artificial amino acid; producing an evolutionary trace, wherein the evolutionary trace identifies residues that are trace residues; and determining cluster formation of the trace residues, wherein a cluster indicates the functional site of the protein.
- 2. The method of claim 1, wherein the determining step comprises mapping the class-specific residues onto a representative protein structure and examining for cluster formation.
- 3. The method of claim 1, wherein the obtaining step comprises obtaining the protein sequence and homologous protein sequences.
- 4. The method of claim 1 further comprising determining clustering statistics.
- 5. The method of claim 4, wherein the clustering statistics comprises the overall number of clusters.
- 6. The method of claim 4, wherein the clustering statistics comprises the size of the largest cluster.
- 7. The method of claim 4, wherein the clustering statistics comprises the overall number of clusters and the size of the largest cluster.
- 8. A method of determining a functional site in a protein comprising the steps of:
obtaining a protein sequence; aligning the protein sequence to homologous protein sequences to generate a multiple sequence alignment; adding gap tolerance, wherein a gap in the protein sequence alignment is considered as an artificial amino acid; producing an evolutionary trace, wherein the trace identifies residues that are trace residues; determining cluster formation of the residues; performing clustering statistics, wherein the statistics indicates the functional site of the protein; and mapping the trace residues.
- 9. The method of claim 8, wherein the clustering statistics comprises the overall number of clusters.
- 10. The method of claim 8, wherein the clustering statistics comprises the size of the largest cluster.
- 11. The method of claim 8, wherein the clustering statistics comprises the overall number of clusters and the size of the largest cluster.
- 12. A protein database comprising proteins having predicted functional sites.
- 13. The database of claim 12, wherein the database is produced using evolutionary trace analysis having gap tolerance.
- 14. The database of claim 12, wherein the database is produced using evolutionary trace analysis and clustering statistics.
- 15. The database of claim 12, wherein the database is produced using evolutionary trace analysis having gap tolerance and clustering statistics.
- 16. A method of producing a protein database having predicted functional sites comprising the steps of:
obtaining a protein sequence; aligning the protein sequence to homologous protein sequences to generate a multiple sequence alignment; adding gap tolerance, wherein a gap in the protein sequence alignment is considered as an artificial amino acid; producing an evolutionary trace, wherein the trace identifies residues that are trace residues; and determining cluster formation of the residues, wherein a cluster indicates the functional site of the protein.
- 17. The method of claim 16 further comprising determining clustering statistics.
- 18. The method of claim 17, wherein the clustering statistics comprises the overall number of clusters.
- 19. The method of claim 17, wherein the clustering statistics comprises the size of the largest cluster.
- 20. The method of claim 17, wherein the clustering statistics comprises the overall number of clusters and the size of the largest cluster.
- 21. A peptide database comprising peptides which are the binding sites of proteins.
- 22. The database of claim 21, wherein the database is produced using evolutionary trace analysis.
- 23. The database of claim 21, wherein the database is produced using evolutionary trace analysis having gap tolerance.
- 24. The database of claim 21, wherein the database is produced using evolutionary trace analysis having gap tolerance and clustering statistics.
- 25. The database of claim 21, wherein the database is produced using evolutionary trace analysis and clustering statistics.
- 26. The database of claim 21, wherein the binding sites are the ligand binding pockets of G-protein coupled receptors.
- 27. A method of producing a peptide database having peptides that are the binding sites of proteins comprising the steps of:
obtaining a peptide sequence; aligning the peptide sequence to homologous peptide sequences to generate a multiple sequence alignment; adding gap tolerance, wherein a gap in a peptide sequence alignment is considered as an artificial amino acid; producing an evolutionary trace, wherein the trace identifies residues that are trace residues; and determining cluster formation of the residues, wherein a cluster indicates the binding site.
- 28. A method of aligning remote protein homologs comprising the steps of:
obtaining protein sequences of at least two proteins with no sequence homology; producing a separate evolutionary trace sequence of each protein, wherein the evolutionary trace sequence identifies residues that are trace residues; assigning evolutionary rank to trace residues from each trace; assigning an order based on the evolutionary rank; determining a correlation between any two trace residues, wherein a correlation of greater than zero indicates that the trace residues have evolutionary ranks that are dependent on each other and a correlation of zero indicates that the trace residues have evolutionary ranks that are independent of each other; aligning the traces from the protein sequence, wherein aligning is performed to maximize the evolutionary rank order correlation from each trace; and determining a correlation between the two proteins with no sequence homology.
- 29. A method of determining a ligand binding pocket in a protein comprising the steps of:
determining global functional determinants of a family of proteins using quantitative evolutionary trace analysis, wherein determinants are residues that are involved in the global function of the protein; obtaining protein sequences of a subfamily of proteins within the family having a common function; aligning the protein sequences of the subfamily of proteins to generate a multiple sequence alignment; producing an evolutionary trace, wherein the trace identifies residues that are trace residues; and comparing the trace of the family to the trace of the subfamily, wherein a difference in the comparison yields the ligand binding pockets of the protein.
- 30. The method of claim 29, wherein the family of proteins is G-protein coupled receptors.
- 31. The method of claim 29, wherein the subfamily is Class A G-protein coupled receptors.
- 32. The method of claim 31, wherein the Class A G-protein coupled receptors are selected from the group consisting of opsins, adrenergic receptors, chemokine-related receptors and olfactory receptors.
- 33. The method of claim 29, wherein the subfamily is Class B G-protein coupled receptors.
- 34. The method of claim 33, wherein the Class B G-protein coupled receptors are secretin-related receptors.
- 35. A method of designing pharmaceuticals that target a protein comprising the steps of:
obtaining a protein sequence; aligning the protein sequence to homologous protein sequences to generate a multiple sequence alignment; predicting at least one residue in the protein sequence which is involved in the protein's function, wherein predicting the residues involves using quantitative evolutionary trace analysis; and synthesizing the pharmaceutical to interact with the predicted residue in the protein.
- 36. The method of claim 35, wherein the pharmaceutical is a protein, a peptide or small molecule.
- 37. The method of claim 35 further comprising mutating at least one predicted residue prior to synthesizing the pharmaceutical.
- 38. The method of claim 37 wherein mutating produces an antagonist protein pharmaceutical.
- 39. The method of claim 37 wherein mutating produces an agonist protein pharmaceutical.
- 40. A method of designing pharmaceuticals that target a protein comprising the steps of:
obtaining a protein sequence; aligning the protein sequence to homologous protein sequences to generate a multiple sequence alignment; predicting at least one residue in the protein sequence which is involved in the protein's function, wherein predicting the residues involves using quantitative evolutionary trace analysis; mutating at least one predicted residue in the protein sequence, wherein mutating modulates the protein's function; and synthesizing the pharmaceutical to interact with the predicted residue in the protein.
- 41. The method of claim 40, wherein the pharmaceutical is a protein, a peptide or small molecule.
- 42. The method of claim 40, wherein modulating the protein's function is an enhancement of the binding of the protein to the target.
- 43. The method of claim 40, wherein modulating the protein's function is an interference with the binding of the protein to the target.
- 44. A method of determining the significance of a single nucleotide polymorphism in a protein, wherein the single nucleotide polymorphism occurs in a predicted trace residue comprising the steps of:
performing a quantitative evolutionary trace analysis on a protein; performing a quantitative evolutionary trace analysis on a protein suspected of containing a single nucleotide polymorphism; comparing the analysis on the protein to the analysis of the protein suspected of containing a single nucleotide polymorphism; and assessing whether the single nucleotide polymorphism affects a residue that is predicted to be a functional site of the protein.
- 45. A method of screening compounds comprising the steps of:
obtaining a protein having predicted functional sites, wherein the functional sites are predicted using quantitative evolutionary trace analysis; contacting the protein with a candidate substance; and determining whether the candidate substance interacts with the protein, wherein interaction with the protein indicates that the candidate substance is a ligand.
- 46. A method of determining a functional site in a nucleic acid sequence comprising the steps of:
obtaining a nucleic acid sequence; aligning the sequence to homologous sequences to generate a multiple sequence alignment; adding gap tolerance, wherein a gap in a sequence alignment is considered as an artificial nucleic acid; producing an evolutionary trace, wherein the evolutionary trace identifies nucleic acids that are trace nucleic acids; and determining cluster formation of the trace nucleic acids, wherein a cluster indicates the functional site of the nucleic acid sequence.
- 47. The method of claim 46, wherein determining comprises mapping the class-specific nucleic acid onto a representative structure and examining for cluster formation.
- 48. The method of claim 46, wherein obtaining comprises obtaining the nucleic acid sequence and homologous nucleic acid sequences.
- 49. The method of claim 46, wherein the nucleic acid sequence is DNA.
- 50. The method of claim 46, wherein the nucleic acid sequence is RNA.
- 51. The method of claim 46 further comprising determining clustering statistics.
- 52. The method of claim 51, wherein the clustering statistics comprises the overall number of clusters.
- 53. The method of claim 51, wherein the clustering statistics comprises the size of the largest cluster.
- 54. The method of claim 51, wherein the clustering statistics comprises the overall number of clusters and the size of the largest cluster.
- 55. A method of designing proteins with desired protein properties comprising the steps of:
obtaining a protein sequence; aligning the protein sequence to homologous protein sequences to generate a multiple sequence alignment; predicting at least one residue in the protein sequence which is involved in the protein's functions, wherein predicting the residues involves using quantitative evolutionary trace analysis; synthesizing libraries of protein variants wherein residues at and/or around the predicted functional site are substituted with alternative amino acids; and screening the resulting libraries for mutant proteins with the desired protein properties.
- 56. The method of claim 55, wherein the desired protein properties are selected from the group consisting of enhanced binding affinity, decreased immunogenicity, increased stability, increased solubility, decreased aggregation, and decreased crystallization.
- 57. The method of claim 55, wherein the desired protein properties are selected from the group consisting of decreased binding affinity, increased immunogenicity, decreased stability, decreased solubility, increased aggregation, and increased crystallization.
- 58. A method of designing a protein having altered protein properties comprising the steps of:
obtaining a protein sequence; aligning the protein sequence to homologous protein sequences to generate a multiple sequence alignment; predicting at least one residue in the protein sequence which is not related to the protein's function, wherein predicting the residues involves using quantitative evolutionary trace analysis; synthesizing libraries of protein variants wherein at least one of the predicted residues is mutated to result in an altered protein property; and screening the resulting libraries for mutant proteins with the desired altered protein properties.
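
By way of illustration only, the gap-tolerant trace and the clustering statistics recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation. It assumes the aligned sequences have already been partitioned into groups (for example, branches of a sequence identity tree), that a gap is written as `-`, and that residue coordinates are available from a representative structure. The function names `trace_positions` and `cluster_stats`, the 4.0 distance cutoff, the counting of isolated residues as clusters of size one, and the toy sequences and coordinates are all hypothetical and chosen only for the example.

```python
from itertools import combinations
from math import dist

GAP = "-"  # assumed gap symbol; treated as an artificial amino acid when tolerant


def trace_positions(groups, gap_tolerant=True):
    """Return alignment columns that are invariant within every group.

    groups: list of groups, each a list of equal-length aligned sequences.
    With gap_tolerant=True a gap is compared like any other residue;
    with gap_tolerant=False any column containing a gap is skipped.
    """
    length = len(groups[0][0])
    traced = []
    for col in range(length):
        column_sets = [{seq[col] for seq in group} for group in groups]
        if not gap_tolerant and any(GAP in s for s in column_sets):
            continue  # gap-intolerant variant: discard the column
        if all(len(s) == 1 for s in column_sets):
            traced.append(col)  # invariant within each group
    return traced


def cluster_stats(coords, cutoff=4.0):
    """Return (number of clusters, size of largest cluster).

    coords: dict mapping a trace position to an (x, y, z) coordinate.
    Two residues are linked when their distance is <= cutoff (assumed value);
    clusters are the connected components, including single residues.
    """
    parent = {p: p for p in coords}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(coords, 2):
        if dist(coords[a], coords[b]) <= cutoff:
            parent[find(a)] = find(b)

    sizes = {}
    for p in coords:
        root = find(p)
        sizes[root] = sizes.get(root, 0) + 1
    return len(sizes), max(sizes.values(), default=0)


# Toy alignment: two groups of two sequences each (made-up data).
groups = [["AC-D", "AC-E"], ["AGKD", "AGKF"]]
tolerant = trace_positions(groups, gap_tolerant=True)
intolerant = trace_positions(groups, gap_tolerant=False)

# Made-up coordinates for the traced columns.
coords = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0), 2: (20.0, 0.0, 0.0)}
n_clusters, largest = cluster_stats({p: coords[p] for p in tolerant})
```

In this toy alignment the third column holds a gap in one group and a lysine in the other; with gap tolerance it is retained as a trace position, and without it the column is discarded. The two statistics returned by `cluster_stats` correspond to the overall number of clusters and the size of the largest cluster.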
Parent Case Info
[0001] This application claims priority to U.S. Provisional Application serial No. 60/333,796 filed on Nov. 28, 2001.
Government Interests
[0002] This invention was made with government support under NSF Grant No. DBI-0114796 awarded by the National Science Foundation. The United States Government may have certain rights in the invention.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60333796 | Nov 2001 | US |