Claims
- 1. A method for identifying a protein, the method comprising:
cleaving the protein with a proteolytic agent to produce peptide fragments; providing an array comprising a solution set of binding reagents; contacting the peptide fragments with the array to promote specific interactions between the fragments and the array; detecting the binding pattern of the peptide fragments on the array; and comparing the binding pattern of the peptide fragments to a reference set.
- 2. The method of claim 1, wherein the reference set is contained in a database.
- 3. The method of claim 1, wherein the proteolytic agent is selected from the group consisting of Arg-C proteinase, Asp-N endopeptidase, BNPS-skatole, caspase 1, caspase 2, caspase 3, caspase 4, caspase 5, caspase 6, caspase 7, caspase 8, caspase 9, caspase 10, chymotrypsin, clostripain (clostridiopeptidase B), CNBr, factor Xa, formic acid, glutamyl endopeptidase, granzyme B, hydroxylamine (NH2OH), iodosobenzoic acid, lys-C proteinase, NTCB+Ni (2-nitro-5-thiocyanobenzoic acid), pepsin, proline-endopeptidase, proteinase K, staphylococcal peptidase I, thermolysin, thrombin, and trypsin.
- 4. The method of claim 1, wherein the proteolytic agent is trypsin.
- 5. The method of claim 1, wherein the peptide fragments are labeled with a tag.
- 6. The method of claim 5, wherein either the N-terminus or the C-terminus of the peptide fragments is labeled.
- 7. The method of claim 5, wherein the tag is fluorescent.
- 8. The method of claim 1, wherein the binding reagent is selected from the group consisting of antibodies, single chain fragments (ScFv), F(ab) fragments, and aptamers.
- 9. The method of claim 8, wherein the binding reagent is an antibody.
- 10. The method of claim 9, wherein the antibody is a monoclonal antibody.
- 11. The method of claim 1, wherein the array comprises the binding reagents in a spatially-addressable form.
- 12. The method of claim 5, wherein the binding reagent interacts with the tag and at least one amino acid of the peptide fragment adjacent to the tag.
- 13. The method of claim 1, wherein the reference set is provided in computer readable form.
- 14. The method of claim 1, wherein the reference set is provided in printed form.
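As an illustration of the comparing step recited in claims 1, 2, and 13, the following minimal Python sketch matches an observed binding pattern against a reference set. The names `REFERENCE_SET` and `identify`, the representation of a binding pattern as a set of array positions, and the exact-match rule are assumptions made for illustration only and are not recited in the claims.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical reference set: protein name -> array positions expected to bind.
REFERENCE_SET: Dict[str, FrozenSet[int]] = {
    "protein_A": frozenset({0, 2, 5}),
    "protein_B": frozenset({1, 2}),
    "protein_C": frozenset({0, 3, 4}),
}


def identify(observed: Iterable[int],
             reference: Dict[str, FrozenSet[int]]) -> List[str]:
    """Return the names whose reference pattern equals the observed pattern."""
    pattern = frozenset(observed)
    return sorted(name for name, ref in reference.items() if ref == pattern)
```

A real assay would likely need a tolerance for missing or spurious spots; exact matching is used here only to keep the sketch short.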
- 15. A method for forming a solution set of at least one epitope, the solution set to identify at least two proteins, the method comprising:
forming at least one protein group by associating each of the at least two proteins based on whether the proteins are undistinguished by the solution set, and, updating the solution set with a maximum epitope that divides a maximum number of protein groups.
- 16. A method according to claim 15, wherein associating includes associating such that associated proteins are undistinguished by the solution set and unassociated proteins are distinguished by the solution set.
- 17. A method according to claim 15, wherein the solution set is initialized to the empty set.
- 18. A method according to claim 15, wherein the solution set is initialized to the empty set, and forming at least one association includes associating all of the at least two proteins based on the initialized solution set.
- 19. A method according to claim 15, further including iteratively performing the forming and the updating until each of the at least two proteins is unassociated with any of the other at least two proteins.
- 20. A method according to claim 15, wherein updating includes updating the at least one protein group based on whether the associated proteins are undistinguished by the updated solution set.
- 21. A method according to claim 15, wherein associating includes associating such that, for each associated protein in a selected one of the at least one protein group, and for a selected one of the at least one epitope in the solution set, the selected protein group proteins either include the selected epitope, or the selected protein group proteins do not include the selected epitope.
- 22. A method according to claim 15, wherein updating the solution set includes
determining that at least two epitopes divide a maximum number of the at least one protein group, and, selecting one of the at least two epitopes that divides a maximum number of the at least one protein group.
- 23. A method according to claim 15, wherein updating the solution set includes,
determining that at least two epitopes divide a maximum number of the at least one protein group, and, randomly selecting one of the at least two epitopes that divides a maximum number of the at least one protein group.
- 24. A method according to claim 15, wherein the at least two proteins include a protein catalog.
- 25. A method according to claim 15, wherein the at least one epitope is based on cleaving the at least two proteins with at least one proteolytic agent.
- 26. A method according to claim 15, wherein associating includes determining that at least one of the at least two proteins is unassociated with another of the at least two proteins.
- 27. A method according to claim 15, wherein associating includes assigning a label to the at least one protein group.
- 28. A method according to claim 15, wherein updating the solution set includes assigning to at least one of the at least one protein groups, a group score based on whether a selected epitope can distinguish associated proteins in the at least one protein group.
- 29. A method according to claim 15, wherein updating the solution set includes assigning to at least one of the at least one protein group, a group score based on whether a selected epitope is included in at least one, but not all of the protein group proteins.
- 30. A method according to claim 15, further including providing a database that associates the epitopes of the at least two proteins based on at least one proteolytic agent.
- 31. A method according to claim 30, wherein the database includes a label for associated proteins.
- 32. A method according to claim 15, wherein associated proteins are associated using at least one of at least one database, at least one linked list, at least one queue, at least one hash table, and at least one tree.
- 33. A method according to claim 15, including association means for associating the associated proteins.
- 35. A method according to claim 15, wherein updating includes determining the maximum epitope by computing a composite group score for at least one epitope that is not an element of the solution set.
- 36. A method according to claim 15, wherein updating includes,
computing at least one group score for at least one epitope, the at least one group score corresponding to at least one of the at least one protein group, and generating a composite group score for the at least one epitope based on the at least one group score.
- 37. A method according to claim 36, wherein updating includes selecting as the maximum epitope, the at least one epitope with the maximum composite group score.
- 38. A method according to claim 15, wherein updating includes computing a group score based on the number of occurrences of an epitope in at least one selected one of the at least one protein group, and a number of proteins in the selected protein group.
- 39. A method according to claim 15, further including associating the maximum epitope with a binding reagent.
- 40. A method according to claim 15, further including associating the maximum epitope with a binding reagent in a chip.
- 41. A method according to claim 15, further including forming a representation of the at least two proteins based on the maximum epitope.
- 42. A method according to claim 15, further including forming a representation of the at least two proteins based on the epitopes in the solution set.
- 43. A method according to claim 15, further including forming a binary representation of the at least two proteins based on whether the at least two proteins include the maximum epitope.
- 44. A method according to claim 15, wherein the maximum epitope is included in at least one of the at least two proteins.
- 45. A method according to claim 15, further including:
repeating forming a protein group and updating until the at least two proteins are unassociated with another of the at least two proteins, and, forming a solution set based on the maximum epitopes.
- 46. A method according to claim 45, further including
eliminating at least some of the epitopes in the solution set, and, based on the eliminated epitopes and the epitopes in the solution set, repeating forming at least one protein group and updating until the at least two proteins are unassociated with another of the at least two proteins.
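The forming and updating steps of claims 15 to 23 and 35 to 37 can be sketched in Python as follows. This is one possible reading, not a definitive implementation: the function names, the representation of each protein as a set of epitope strings, the 0/1 group score, and the stopping rule used when no epitope divides any group are all assumptions introduced for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional

Catalog = Dict[str, FrozenSet[str]]  # protein name -> epitopes it contains


def form_groups(catalog: Catalog, solution: List[str]) -> List[List[str]]:
    """Associate proteins that the current solution set cannot tell apart."""
    groups = defaultdict(list)
    for name in sorted(catalog):
        key = tuple(e in catalog[name] for e in solution)
        groups[key].append(name)
    return list(groups.values())


def composite_score(epitope: str, groups: List[List[str]], catalog: Catalog) -> int:
    """Sum of group scores: 1 for each group the epitope divides, else 0."""
    score = 0
    for group in groups:
        hits = sum(epitope in catalog[name] for name in group)
        score += 1 if 0 < hits < len(group) else 0  # in some, but not all, members
    return score


def greedy_solution_set(catalog: Catalog,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Repeat forming and updating until every protein stands alone."""
    rng = rng or random.Random()
    solution: List[str] = []  # initialized to the empty set
    candidates = sorted(set().union(*catalog.values()))
    while True:
        groups = form_groups(catalog, solution)
        if all(len(group) == 1 for group in groups):
            return solution
        scores = {e: composite_score(e, groups, catalog)
                  for e in candidates if e not in solution}
        best = max(scores.values(), default=0)
        if best == 0:
            return solution  # assumption: stop if the rest cannot be separated
        # Ties between maximum epitopes are broken at random (compare claim 23).
        solution.append(rng.choice([e for e, s in scores.items() if s == best]))
```

With the empty solution set, `form_groups` places every protein in a single group, which corresponds to the initialization described in claim 18.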
- 47. A method for identifying a solution set of epitopes to identify at least two proteins, the method comprising:
determining the epitopes in the at least two proteins based on one or more proteolytic agents, and, applying a randomized greedy algorithm to the determined epitopes to distinguish the solution set of epitopes.
- 48. A method according to claim 47, further including applying a local search algorithm to the solution set.
- 49. A method according to claim 47, further including iteratively applying a local search algorithm to the solution set.
- 50. A method according to claim 47, further including associating at least one of the epitopes in the solution set with a binding reagent.
- 51. A method according to claim 47, further including generating a binary representation for the at least two proteins based on the solution set.
- 52. A method according to claim 47, wherein applying a randomized greedy algorithm includes forming at least one protein group by associating the at least two proteins based on whether the at least two proteins are distinguished by the solution set.
- 53. A method according to claim 47, wherein applying a randomized greedy algorithm includes,
identifying a maximum epitope from the determined epitopes where the maximum epitope distinguishes at least as many pairs of the at least two proteins as at least one of the other determined epitopes, associating the maximum epitope with a solution set, removing the maximum epitope from the set of determined epitopes, and, repeating the identifying, associating, and removing until every pair of the at least two proteins are distinguished by the epitopes associated with the solution set.
- 54. A method according to claim 47, wherein the at least two proteins are undistinguished by molecular mass.
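Claims 48 and 49 recite a local search without specifying it. The Python sketch below shows one simple possibility, assumed purely for illustration: a single pass that drops any epitope whose removal still leaves every pair of proteins distinguished. The names `distinguishes_all` and `prune` are hypothetical.

```python
from typing import Dict, FrozenSet, List

Catalog = Dict[str, FrozenSet[str]]  # protein name -> epitopes it contains


def distinguishes_all(catalog: Catalog, solution: List[str]) -> bool:
    """True if no two proteins share the same presence/absence pattern."""
    patterns = [tuple(e in epitopes for e in solution)
                for epitopes in catalog.values()]
    return len(set(patterns)) == len(patterns)


def prune(catalog: Catalog, solution: List[str]) -> List[str]:
    """One local-search pass: remove epitopes that are not needed."""
    current = list(solution)
    for epitope in list(solution):
        trial = [e for e in current if e != epitope]
        if distinguishes_all(catalog, trial):
            current = trial  # the epitope was redundant, so keep it removed
    return current
```

Applying `prune` repeatedly until the solution set stops shrinking would correspond to the iterative variant of claim 49.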
- 55. A method for identifying at least one protein in a protein catalog, the method comprising:
determining epitopes in the protein catalog based on cleaving the protein catalog proteins with at least one proteolytic agent, using a randomized greedy algorithm to identify a solution set of the determined epitopes that can distinguish the protein catalog proteins, forming a chip based on binding reagents associated with the solution set of the determined epitopes, obtaining a signature from the chip based on at least one protein in the protein catalog, and, associating the signature with the at least one protein.
- 56. A method according to claim 55, further including identifying a signature for the at least one protein in the protein catalog, the signature based on the solution set of the determined epitopes.
- 57. A method according to claim 56, wherein associating the signature includes comparing the signature with the identified signature for the at least one protein in a protein catalog.
- 57. A method according to claim 55, further including using a local search algorithm with the greedy algorithm.
- 58. A method for generating an identifier for at least one protein in a protein catalog, the method comprising:
determining epitopes in the protein catalog based on cleaving the protein catalog proteins with at least one proteolytic agent, identifying a solution set that includes a solution set of determined epitopes that distinguish the proteins, and, associating an identifier with the at least one protein based on whether the at least one protein includes the epitopes in the solution set.
- 59. A method according to claim 58, wherein associating includes, for each epitope in the solution set, assigning a binary digit to the at least one protein based on whether the at least one protein includes the epitope.
- 60. A method according to claim 58, wherein identifying includes identifying based on a randomized greedy algorithm.
- 61. A method according to claim 60, further including a local search algorithm.
- 62. A method according to claim 58, wherein identifying includes:
associating protein catalog proteins based on whether the protein catalog proteins are undistinguished by the solution set, updating the solution set with a maximum epitope that divides a maximum number of the associations, and, repeating the associating and updating until the protein catalog proteins are unassociated with any other protein catalog protein.
- 63. A method according to claim 62, wherein associating an identifier is performed based on the number of repeats of the forming and the updating.
- 64. A method according to claim 58, wherein associating an identifier includes associating a binary number based on the solution set.
- 65. A method according to claim 58, further including associating the at least one protein with the identifier.
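The binary identifier of claims 59 and 64 can be illustrated with a short Python sketch. The function names and the choice of a string of "0" and "1" characters, ordered by position in the solution set, are assumptions; the claims only require one binary digit per epitope.

```python
from typing import Dict, FrozenSet, List


def binary_identifier(epitopes: FrozenSet[str], solution: List[str]) -> str:
    """One digit per solution-set epitope: 1 if the protein includes it."""
    return "".join("1" if e in epitopes else "0" for e in solution)


def catalog_identifiers(catalog: Dict[str, FrozenSet[str]],
                        solution: List[str]) -> Dict[str, str]:
    """Associate each protein in the catalog with its identifier."""
    return {name: binary_identifier(epitopes, solution)
            for name, epitopes in catalog.items()}
```

If the solution set distinguishes every protein in the catalog, the identifiers produced this way are pairwise distinct.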
- 66. A processor-readable medium for storing data regarding a protein catalog, the medium comprising,
at least one protein name associated with at least one protein catalog protein, and, for each of the at least one protein name, a protein identifier based on a solution set of epitopes for distinguishing the at least one protein catalog protein from other protein catalog proteins, wherein the at least one protein name and the protein identifier are associated.
- 67. A processor-readable medium according to claim 66, wherein the at least one protein name is alphanumeric.
- 68. A processor-readable medium according to claim 66, wherein the protein identifier is binary.
- 69. A processor-readable medium according to claim 66, wherein the protein identifier is alphanumeric.
- 70. A processor-readable medium according to claim 66, further including, for the at least one protein name, an association with at least one epitope included in the at least one protein catalog protein associated with the at least one protein name.
- 71. A processor-readable medium according to claim 66, further including an association between the at least one protein name and a protein signature, wherein the protein signature is based upon a chip that includes binding reagents, wherein the binding reagents correspond to the solution set of epitopes.
- 72. A processor-readable medium according to claim 66, wherein the at least one protein name and the protein identifier are associated by at least one of at least one database, at least one queue, at least one linked list, at least one hash table, and at least one tree.
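One way to picture the stored data of claims 66 to 72 is a hash table keyed by protein name, serialized as JSON. The sketch below is illustrative only: the record layout, the field names `identifier` and `epitopes`, and the use of JSON are assumptions and are not recited in the claims.

```python
import json
from typing import Dict, List


def build_table(identifiers: Dict[str, str],
                epitopes: Dict[str, List[str]]) -> Dict[str, dict]:
    """Associate each protein name with its identifier and known epitopes."""
    return {name: {"identifier": ident,
                   "epitopes": sorted(epitopes.get(name, []))}
            for name, ident in identifiers.items()}


def lookup_by_identifier(table: Dict[str, dict], identifier: str) -> List[str]:
    """Return the protein names stored under a given identifier."""
    return sorted(name for name, record in table.items()
                  if record["identifier"] == identifier)


def serialize(table: Dict[str, dict]) -> str:
    """Render the table as text suitable for writing to a storage medium."""
    return json.dumps(table, indent=2, sort_keys=True)


def deserialize(text: str) -> Dict[str, dict]:
    """Rebuild the table from its serialized form."""
    return json.loads(text)
```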
- 73. A chip for identifying at least one protein in a protein catalog, the chip comprising binding reagents that are associated with a solution set of epitopes, wherein the solution set of epitopes are determined by a method that includes:
determining epitopes in the protein catalog based on cleaving the protein catalog proteins with at least one proteolytic agent, initializing the solution set of epitopes to be the empty set, associating protein catalog proteins based on whether the protein catalog proteins are undistinguished by the solution set, updating the solution set with a maximum epitope that divides a maximum number of the associations, and, repeating the associating and updating until the protein catalog proteins are unassociated with any other protein catalog protein.
- 74. A method for evaluating a set of epitopes for identifying a protein in a protein catalog, the method comprising,
providing a chip including binding reagents associated with the set of epitopes, selecting at least two proteins from the protein catalog, determining a signature of the at least two proteins based on the chip, adding errors to the signature to form an augmented signature, and, computing a significance score for each unidentified protein in the protein catalog, the significance score based on binding sites in the unidentified protein catalog proteins and the augmented signature, identifying a protein from the unidentified protein catalog proteins based on the largest significance score, determining a signature of the identified protein, removing the signature of the identified protein from the augmented signature, repeating computing a significance score for each unidentified protein and identifying, until a number of proteins equal to the at least two selected proteins are identified, and, comparing the identified proteins to the at least two selected proteins.
- 75. A method according to claim 74, further including updating a counter when the identified proteins are equivalent to the at least two selected proteins.
- 76. A method according to claim 74, further including returning to selecting at least two proteins and continuing to comparing the identified proteins.
- 77. A method according to claim 74, wherein the significance score for a protein is based on the number of binding sites in the protein catalog for a selected epitope as compared to the total number of binding sites in the protein catalog.
- 78. A method according to claim 74, wherein adding errors includes adding at least one of false negatives and false positives.
- 79. A method according to claim 74, wherein adding errors includes adding errors based on at least one probability distribution.
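The evaluation loop of claims 74 to 79 can be sketched as a small simulation. Everything specific below is an assumption made for illustration: signatures are modeled as additive per-epitope counts, errors are drawn independently with fixed probabilities `p_fn` and `p_fp`, and the significance score weights each epitope by the negative logarithm of its share of all binding sites in the catalog, rewarding epitopes found in the signature and penalizing those that are missing. The claims do not fix any of these choices.

```python
import math
import random
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Optional

Catalog = Dict[str, FrozenSet[str]]  # protein name -> chip epitopes it contains


def epitope_weights(catalog: Catalog) -> Dict[str, float]:
    """Weight each epitope by how rare its binding sites are in the catalog."""
    counts = Counter(e for epitopes in catalog.values() for e in epitopes)
    total = sum(counts.values())
    return {e: -math.log(n / total) for e, n in counts.items()}


def mixture_signature(catalog: Catalog, proteins: Iterable[str]) -> Counter:
    """Assumed additive signature: per-epitope count over the selected proteins."""
    signature: Counter = Counter()
    for name in proteins:
        signature.update(catalog[name])
    return signature


def add_errors(signature: Counter, epitopes: Iterable[str], p_fn: float,
               p_fp: float, rng: random.Random) -> Counter:
    """Drop true signals with probability p_fn, add spurious ones with p_fp."""
    noisy: Counter = Counter()
    for e in sorted(epitopes):
        kept = sum(rng.random() >= p_fn for _ in range(signature[e]))
        extra = 1 if rng.random() < p_fp else 0
        if kept + extra:
            noisy[e] = kept + extra
    return noisy


def significance(epitopes: FrozenSet[str], signature: Counter,
                 weights: Dict[str, float]) -> float:
    """Reward epitopes seen in the signature, penalize those that are absent."""
    return sum(weights[e] if signature.get(e, 0) > 0 else -weights[e]
               for e in epitopes)


def identify_mixture(catalog: Catalog, signature: Counter, n_proteins: int,
                     weights: Dict[str, float]) -> List[str]:
    """Pick the best-scoring protein, remove its signature, and repeat."""
    remaining = Counter(signature)
    unidentified = sorted(catalog)
    identified: List[str] = []
    for _ in range(n_proteins):
        best = max(unidentified,
                   key=lambda name: significance(catalog[name], remaining, weights))
        identified.append(best)
        unidentified.remove(best)
        remaining.subtract(catalog[best])
    return identified


def evaluate(catalog: Catalog, trials: int, k: int = 2, p_fn: float = 0.05,
             p_fp: float = 0.05, rng: Optional[random.Random] = None) -> float:
    """Fraction of trials in which the identified set equals the selected set."""
    rng = rng or random.Random()
    weights = epitope_weights(catalog)
    correct = 0  # the counter of claim 75
    for _ in range(trials):
        chosen = rng.sample(sorted(catalog), k)
        noisy = add_errors(mixture_signature(catalog, chosen), set(weights),
                           p_fn, p_fp, rng)
        if set(identify_mixture(catalog, noisy, k, weights)) == set(chosen):
            correct += 1
    return correct / trials
```

The returned fraction gives a rough measure of how well a candidate set of epitopes tolerates the assumed error rates; `trials` must be positive and `k` must not exceed the catalog size.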
- 80. A computer product disposed on a computer readable medium, the computer product for forming a solution set of at least one epitope, the solution set to identify at least two proteins, the computer product including instructions for causing a processor to:
form at least one protein group by associating each of the at least two proteins based on whether the proteins are undistinguished by the solution set, and, update the solution set with a maximum epitope that divides a maximum number of the protein groups.
- 81. A computer product according to claim 80, wherein the instructions to associate include instructions to form at least one protein group.
- 82. A computer product according to claim 80, wherein the instructions to associate include instructions to associate such that associated proteins are undistinguished by the solution set and unassociated proteins are distinguished by the solution set.
- 83. A computer product according to claim 80, further including instructions to iteratively perform the instructions to form and update until each of the at least two proteins is unassociated with any of the other at least two proteins.
- 84. A computer product according to claim 80, wherein the instructions to update include instructions to update the at least one protein group based on whether the associated proteins are undistinguished by the updated solution set.
- 85. A computer product according to claim 80, wherein the instructions to associate include instructions to associate such that, for each associated protein in a selected one of the at least one protein group, and for a selected one of the at least one epitope in the solution set, the selected protein group proteins either include the selected epitope, or the selected protein group proteins do not include the selected epitope.
- 86. A computer product according to claim 80, wherein the instructions to update the solution set include instructions to
determine that at least two epitopes divide a maximum number of the at least one protein group, and, select one of the at least two epitopes that divides a maximum number of the at least one protein group.
- 87. A computer product according to claim 80, wherein the instructions to update the solution set include instructions to
determine that at least two epitopes divide a maximum number of the at least one protein group, and, randomly select one of the at least two epitopes that divides a maximum number of the at least one protein group.
- 88. A computer product according to claim 80, wherein the at least two proteins include a protein catalog.
- 89. A computer product according to claim 80, wherein the at least one epitope is based on cleaving the at least two proteins with at least one proteolytic agent.
- 90. A computer product according to claim 80, wherein the instructions to associate include instructions to determine that at least one of the at least two proteins is not associated with another of the at least two proteins.
- 91. A computer product according to claim 80, wherein the instructions to associate include instructions to assign a label to the at least one protein group.
- 92. A computer product according to claim 80, wherein the instructions to update the solution set include the instructions to assign to at least one of the at least one protein group, a group score based on whether a selected epitope is included in at least one, but not all of the protein group proteins.
- 93. A computer product according to claim 80, wherein the instructions to update the solution set include instructions to assign to at least one of the at least one protein groups, a group score based on whether a selected epitope can distinguish associated proteins in the at least one protein group.
- 94. A computer product according to claim 80, further including instructions to provide a database that associates the epitopes of the at least two proteins based on at least one proteolytic agent.
- 95. A computer product according to claim 94, wherein the database includes a label for associated proteins.
- 96. A computer product according to claim 80, wherein associated proteins are associated using at least one of at least one database, at least one linked list, at least one queue, at least one hash table, and at least one tree.
- 97. A computer product according to claim 80, including association means for associating the associated proteins.
- 98. A computer product according to claim 80, wherein the instructions to update include instructions to determine the maximum epitope by computing a composite group score for an epitope.
- 99. A computer product according to claim 80, wherein the instructions to update include instructions to
compute at least one group score for at least one epitope, the at least one group score corresponding to at least one of the at least one protein group, and generate a composite group score for the at least one epitope based on the at least one group score.
- 100. A computer product according to claim 99, wherein the instructions to update include instructions to select as the maximum epitope, the at least one epitope with the maximum composite group score.
- 101. A computer product according to claim 80, wherein the instructions to update include instructions to compute a group score based on the number of occurrences of an epitope in at least one selected one of the at least one protein group, and a number of proteins in the selected protein group.
- 102. A computer product according to claim 80, further including instructions to associate the maximum epitope with a binding reagent.
- 103. A computer product according to claim 80, further including instructions to associate the maximum epitope with a binding reagent in a chip.
- 104. A computer product according to claim 80, further including instructions to form a representation of the at least two proteins based on the maximum epitope.
- 105. A computer product according to claim 80, further including instructions to form a binary representation of the at least two proteins based on whether the at least two proteins include the maximum epitope.
- 106. A computer product according to claim 80, wherein the maximum epitope is included in at least one of the at least two proteins.
- 107. A computer product according to claim 80, further including instructions to:
repeat the instructions to form a protein group and update until the at least two proteins are unassociated with another of the at least two proteins, and, instructions to form a solution set based on the maximum epitopes.
- 108. A computer product according to claim 107, further including instructions to:
eliminate at least some of the epitopes in the solution set, and, based on the eliminated epitopes and the epitopes in the solution set, repeat the instructions to form at least one protein group and update until the at least two proteins are unassociated with another of the at least two proteins.
CLAIM PRIORITY
[0001] This application claims priority to U.S. Ser. No. 60/285,219, entitled “Protein Assignment by Combinatorial Epitope Recognition (PACER)”, filed on Apr. 20, 2001, naming Jonathan Minden, Ramamoorthi Ravi, Alan Koretsky, and Bjarni Halldorsson as inventors, the contents of which are herein incorporated by reference in their entirety.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60285219 | Apr 2001 | US |