Claims
- 1. A method executed by a computer under the control of a program, said computer including a memory for storing said program, said method comprising the steps of:
a) receiving a scaffold protein structure with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) establishing a group of potential rotamers for each of said variable residue positions, and wherein a first group for a first variable residue position has a first set of rotamers from at least two different amino acid side chains, and wherein a second group for a second variable residue position has a second set of rotamers from at least two different amino acid side chains; and, d) analyzing the interaction of each of said rotamers in each group with all or part of the remainder of said protein to generate a set of optimized protein sequences.
- 2. A method according to claim 1 wherein said first and second sets of rotamers are different.
- 3. A method according to claim 1 wherein said first and second sets of rotamers are the same.
- 4. A method executed by a computer under the control of a program, said computer including a memory for storing said program, said method comprising the steps of:
a) receiving a scaffold protein with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) establishing a group of potential amino acids for each of said variable residue positions, wherein a first group for a first variable residue position has a first set of at least two amino acid side chains, and wherein a second group for a second variable residue position has a second set of at least two different amino acid side chains; and, d) analyzing the interaction of each of said amino acids with all or part of the remainder of said protein to generate a set of optimized protein sequences.
- 5. A method according to claims 1-4 wherein after step d) a library of said optimized protein sequences is generated.
- 6. A method according to claim 5 further comprising physically generating at least one member of said set of optimized protein sequences and experimentally testing said sequences for a desired function.
- 7. A method for generating a secondary library of scaffold protein variants comprising:
a) providing a primary library comprising a filtered set of scaffold protein primary variant sequences; b) generating a list of primary variant positions in said primary library; c) combining a plurality of said primary variant positions to generate a secondary library of secondary sequences.
- 8. A method for generating a secondary library of scaffold protein variants comprising:
a) providing a primary library comprising a filtered set of scaffold protein primary variant sequences; b) generating a probability distribution of amino acid residues in a plurality of variant positions; c) combining a plurality of said amino acid residues to generate a secondary library of secondary sequences.
- 9. A method according to claim 7 further comprising synthesizing a plurality of said secondary sequences.
- 10. A method according to claim 8 wherein said synthesizing is done by multiple PCR with pooled oligonucleotides.
- 11. A method according to claim 10 wherein said pooled oligonucleotides are added in equimolar amounts.
- 12. A method according to claim 10 wherein said pooled oligonucleotides are added in amounts that correspond to the frequency of the mutation.
- 13. A composition comprising a plurality of secondary variant proteins comprising a subset of said secondary library.
- 14. A composition comprising a plurality of nucleic acids encoding a plurality of secondary variant proteins comprising a subset of said secondary library.
- 15. A method for generating a secondary library of scaffold protein variants comprising:
a) providing a first library rank-ordered list of scaffold protein primary variants; b) generating a probability distribution of amino acid residues in a plurality of variant positions; c) synthesizing a plurality of scaffold protein secondary variants comprising a plurality of said amino acid residues to form a secondary library; wherein at least one of said secondary variants is different from said primary variants.
- 16. A computational method comprising:
a) receiving a scaffold protein with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) providing a sequence alignment of a plurality of related proteins; d) generating frequencies of occurrence for individual amino acids in at least a plurality of positions with said alignments; e) creating a pseudo-energy scoring function using said frequencies; f) using said pseudo-energy scoring function and at least one additional scoring function to generate a set of optimized protein sequences.
- 17. A method according to claim 16 wherein said frequencies are weighted.
- 18. A method according to claim 17 wherein said frequencies are weighted using a diversity weighting function.
- 19. A method according to claim 17 wherein said frequencies are weighted using a sequence homology weighting function.
- 20. A method according to claim 17 wherein said frequencies are weighted using a structural homology weighting function.
- 21. A method according to claim 17 wherein said frequencies are weighted using a weighting function based on physical properties.
- 22. A method according to claim 17 wherein said frequencies are weighted using a functional-based weighting function.
- 23. A method according to claims 19 or 20 wherein if said homology is high, said weighting is high.
- 24. A method according to claim 19 or 20 wherein if said homology is high, said weight is low.
- 25. A method according to claim 17 wherein said multiple sequence alignment comprises proteins with related three-dimensional structures.
- 26. A method according to claim 16 wherein pseudo-energy is based on logarithms of said frequencies.
- 27. A method according to claim 26 wherein said pseudo energy scoring function is based on log-odds ratios.
- 28. A method according to claims 16-27 wherein after step f) a library of said optimized protein sequences is generated.
- 29. A method according to claim 28 further comprising physically generating at least one member of said set of optimized protein sequences and experimentally testing said sequences for a desired function.
- 30. A computational method comprising:
a) receiving a scaffold protein with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) providing a sequence alignment of a plurality of related proteins; d) generating a frequency of occurrence for individual amino acids in at least a plurality of positions with said proteins; e) selecting a group of potential amino acids for each of said variable residue positions, wherein a first group for a first variable residue position has a first set of at least two amino acid side chains, and wherein a second group for a second variable residue position has a second set of at least two different amino acid side chains according to their frequency of occurrence; and, f) analyzing the interaction of each of said amino acids at each variable residue position with all or part of the remainder of said protein using at least one scoring function to generate a set of optimized protein sequences.
- 31. A computational method according to claim 28, wherein amino acids with a frequency of occurrence of at least 1% are selected.
- 32. A computational method according to claim 28, wherein amino acids with a frequency of occurrence of at least 5% are selected.
- 33. A computational method according to claim 28, wherein amino acids with a frequency of occurrence of at least 10% are selected.
- 34. A computational method according to claim 28, wherein amino acids with a frequency of occurrence of at least 20% are selected.
- 35. A method according to claim 28 wherein said frequency is weighted.
- 36. A method according to claim 28 wherein said frequencies are weighted using a diversity weighting function.
- 37. A method according to claim 28 wherein said frequencies are weighted using a sequence homology weighting function.
- 38. A method according to claim 28 wherein said frequencies are weighted using a structural homology weighting function.
- 39. A method according to claim 28 wherein said frequencies are weighted using a weighting function based on physical properties.
- 40. A method according to claim 28 wherein said frequencies are weighted using a functional-based weighting function.
- 41. A method according to claims 37 or 38 wherein if said homology is high, said weighting is high.
- 42. A method according to claims 37 or 38 wherein if said homology is high, said weight is low.
- 43. A method according to claim 28 wherein said multiple sequence alignment comprises proteins with related three-dimensional structures.
- 44. A computational method according to claims 16-43 wherein said analyzing step further comprises at least two scoring functions.
- 45. A method according to claim 28 wherein said scoring function is selected from the group consisting of van der Waals potential scoring function, a hydrogen bond potential scoring function, an atomic solvation scoring function, an electrostatic scoring function, a secondary structure propensity scoring function and a pseudo-energy scoring function.
- 46. A method according to claims 28-45 wherein after step f) a library of said optimized protein sequences is generated.
- 47. A method according to claim 28 further comprising physically generating at least one member of said set of optimized protein sequences and experimentally testing said sequences for a desired function.
- 48. A computational method comprising:
a) receiving a scaffold protein with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) providing an amino acid substitution matrix; d) creating a pseudo-energy scoring function using said matrix; e) using said pseudo-energy scoring function and at least one additional scoring function to generate a set of optimized protein sequences.
- 49. A computational method according to claim 48 wherein said substitution matrix is selected from the group consisting of PAM, BLOSUM, and DAYHOFF.
- 50. A method according to claims 46-49 wherein after step e) a library of said optimized protein sequences is generated.
- 51. A method according to claim 50 further comprising physically generating at least one member of said set of optimized protein sequences and experimentally testing said sequences for a desired function.
- 52. A method executed by a computer under the control of a program, said computer including a memory for storing said program, said method comprising the steps of:
a) receiving a scaffold protein with residue positions; b) selecting a collection of at least one variable residue position from said residue positions; c) importing a set of coordinates for a scaffold protein, said scaffold protein comprising amino acid positions; d) analyzing the interaction of each of said amino acids with all or part of the remainder of said protein; e) utilizing a plurality of scoring functions, at least a first scoring function having a first weight and a second scoring function having a second weight, to generate at least one variable decoy sequence; and, f) comparing the scores from said scoring functions of said variable decoy sequence to the scores of a reference state to generate modified weights, wherein each weight is increased if the corresponding score of the decoy is higher than the corresponding score of the reference state and each weight is decreased if the corresponding score of the decoy is lower than the corresponding score of the reference state, and wherein the extent of increase or decrease is based on the relative individual and total scores of the decoy and reference states.
- 53. A method according to claim 52 comprising repeating steps a) and e) at least one or more times to generate a final modified weight for each scoring function.
- 54. A method according to claim 52 wherein the collection of variable residue positions is modified by repeating steps a) through f).
- 55. A method according to claim 52 wherein said final modified weight for each scoring function is used to generate a set of optimized protein sequences.
- 56. A method according to claim 52, wherein the reference state is based on the native sequence and structure.
- 57. A method according to claim 52, wherein the reference state is a prototypical protein.
- 58. A method according to claim 52, wherein said prototypical protein is derived from a set of proteins with similar physical or functional properties.
- 59. A method according to claim 52, wherein said weights are optimized on a set of said scaffold proteins.
- 60. A method according to claim 52, wherein the extent of increase or decrease of said weights is based on the total Boltzmann probabilities of said reference and decoy states.
- 61. A method according to claim 52, wherein the extent of increase or decrease of said weights is based on the difference between individual scores of said decoy and reference states.
- 62. A method according to claim 52 comprising replacing at least a single amino acid in said scaffold protein to create a variable sequence and analyzing said variable sequence using said scoring functions.
- 63. A method according to claim 52 further comprising replacing a subset of amino acids, said subset selected from the group comprising core, boundary, and surface amino acids.
- 64. A method according to claim 52 further comprising protein design automation.
- 65. A method according to claim 52 further comprising a sequence prediction algorithm.
- 66. A method according to claims 52-65 wherein after step d) a library of said optimized protein sequences is generated.
- 67. A method according to claim 66 further comprising physically generating at least one member of said set of optimized protein sequences and experimentally testing said sequences for a desired function.
- 68. A method executed by a computer under the control of a program, said computer including a memory for storing said program, said method comprising the steps of:
a) receiving a scaffold protein with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) importing a set of coordinates for a scaffold protein, said scaffold protein comprising amino acid positions; d) generating a variable protein sequence comprising a defined energy state for each amino acid position; e) applying an energy increase to at least one of said defined energy states for at least one of said amino acid positions; and, f) generating at least one alternate variable protein sequence.
- 69. A method executed by a computer under the control of a program, said computer including a memory for storing said program, said method comprising the steps of:
a) receiving a scaffold protein with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) importing a set of coordinates for a scaffold protein, said scaffold protein comprising amino acid positions; d) generating a variable protein sequence comprising a defined energy state for each amino acid position; e) applying a probability parameter to at least one of said amino acid positions; and f) generating at least one alternate variable protein sequence.
- 70. A method according to claim 68 or 69 wherein said energy increase is applied to a plurality of amino acid positions.
- 71. A method according to claim 68 or 69 further comprising generating a plurality of alternate optimized variable protein sequences.
- 72. A method according to claim 68 further comprising applying a recency parameter with said energy increase.
- 73. A method according to claims 68-72 further comprising comparing said alternate optimized variable protein sequences.
- 74. A method according to claim 68 further comprising applying a frequency parameter with said energy increase.
- 75. A method according to claim 74 wherein said method comprises biasing said frequency parameter against the most frequent amino acid residue at a particular position.
- 76. A method according to claim 68 wherein said energy increase includes the energy increase of a set of rotamers for at least one amino acid position.
- 77. A method according to claim 68 wherein said energy increase includes the energy increase of a set of rotamers for a plurality of amino acid positions.
- 78. A method according to claim 68 wherein said protein design cycle comprises applying protein design automation technology.
- 79. A method according to claim 68 wherein said protein design cycle comprises applying the sequence prediction algorithm.
- 80. A method according to claim 68 wherein said protein design cycle comprises applying a force field calculation.
- 81. A method according to claims 68-80 wherein after step f) a library of said optimized protein sequences is generated.
- 82. A method according to claim 81 further comprising physically generating at least one member of said set of optimized protein sequences and experimentally testing said sequences for a desired function.
- 83. A method executed by a computer under the control of a program, said computer including a memory for storing said program, said method comprising the steps of:
a) receiving a scaffold protein with residue positions; b) selecting a collection of variable residue positions from said residue positions; c) importing a set of coordinates for a scaffold protein, said scaffold protein comprising amino acid positions; d) generating a set of optimized variant protein sequences comprising one or more variant amino acids; and, e) applying a clustering algorithm to cluster said set into a plurality of subsets.
- 84. A method according to claim 83 further comprising applying a taboo search.
- 85. A method according to claim 83 wherein said clustering algorithm comprises a single-linkage clustering algorithm.
- 86. A method according to claim 83 wherein said clustering algorithm comprises a complete linkage clustering algorithm.
- 87. A method according to claim 83 wherein said clustering algorithm comprises an average linkage clustering algorithm.
- 88. A method according to claim 83 wherein said subsets are clustered according to sequence similarity.
- 89. A method according to claim 83 wherein said subsets are clustered according to energetic similarity.
- 90. A method according to claim 83 wherein DNA shuffling is applied with said subsets to generate a library of optimized protein sequences.
- 91. A method according to claim 83 wherein said protein design cycle comprises protein design automation technology.
- 92. A method according to claim 83 wherein said protein design cycle comprises the sequence prediction algorithm.
- 93. A method according to claim 83 wherein said protein design cycle comprises a force field calculation.
- 94. A method according to claims 83 or 90 wherein said subsets are used to generate secondary libraries comprising related sequences.
- 95. A method according to claims 83-90 wherein after step e) a library of said optimized protein sequences is generated.
- 96. A method according to claims 94 or 95 further comprising physically generating at least one member of said set of optimized protein sequences and experimentally testing said sequences for a desired function.
- 97. A method for identifying proteins that have a similar conformation to a target protein, said method comprising:
a) receiving at least one scaffold protein structure with variable residue positions of a target protein; b) computationally generating a set of primary variant amino acid sequences that adopt a conformation similar to the conformation of said target protein; and, c) identifying at least one protein sequence that is similar to at least one member of said set of primary variants, but is dissimilar to said target protein amino acid sequence.
- 98. A method according to claim 97, further comprising the step of confirming that said protein will adopt said conformation of said target protein.
- 99. A method according to claim 97 wherein an amino acid sequence with less than 30% sequence identity is dissimilar.
- 100. A method according to claim 97 wherein an amino acid sequence with less than 20% sequence identity is dissimilar.
- 101. A method according to claim 97 wherein a similar conformation is a protein comprising a position for a given fold.
- 102. A method according to claim 97 wherein said computationally generating is applying a protein design algorithm.
- 103. A method according to claim 102 wherein said computationally generating is applying protein design automation.
- 104. A method according to claims 97 or 102 wherein said computationally generating step comprises a taboo search.
- 105. A method according to claim 102 wherein said computationally generating step comprises applying a sequence prediction algorithm.
- 106. A method according to claims 97 or 102 wherein said computationally generated sequences are used to create a Position Specific Scoring Matrix.
- 107. A method according to claim 102 wherein said computationally generating includes the use of at least two scoring functions.
- 108. A method according to claim 107 wherein said scoring functions are selected from the group consisting of a van der Waals potential scoring function, a hydrogen bond potential scoring function, an atomic solvation scoring function, an electrostatic scoring function and a secondary structure propensity scoring function.
- 109. A method according to claim 97 wherein the method for identifying said protein comprises searching public databases.
- 110. A method according to claim 97 wherein the method for identifying said protein comprises using a dynamic programming algorithm.
- 111. A method according to claim 97 wherein said confirming is selected from the group consisting of x-ray crystallography, NMR spectroscopy, and combinations thereof.
- 112. A method for generating variant protein sequence libraries comprising:
a) providing populations of at least two double stranded donor fragments corresponding to a nucleic acid template; b) adding polymerase primers capable of hybridizing to end regions of each of said population of donor fragments; c) generating a population of hybrid double stranded molecules wherein one strand comprises a 5′-purification tag and the other strand comprises a 5′-phosphorylated overhang; d) enriching for variant strands by removing strands comprising a 5′-biotin moiety; e) annealing said variant strands to form at least two double stranded ligation substrates; and, f) ligating said ligation substrates to form a double stranded ligation product wherein said ligation product encodes a variant protein.
- 113. A method according to claim 112 wherein one of said polymerase primers generates a variant nucleic acid strand.
- 114. A method according to claim 112 wherein said template generates a variant nucleic acid strand.
- 115. A method according to claim 112 wherein step e) precedes step d).
- 116. A method according to claim 112 wherein steps a) through f) are repeated to generate a variant protein.
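By way of illustration only, the pseudo-energy scoring function of claims 16, 26 and 27 can be sketched as a sum of negative log-odds over alignment positions. The sketch below assumes a gap-free toy alignment of equal-length sequences, a uniform background frequency of 1/20, and a pseudocount of 1; none of these choices is specified by the claims, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(alignment, pseudocount=1.0):
    """Per-position amino acid frequencies from equal-length aligned sequences.

    The pseudocount (an assumption of this sketch) keeps unseen residues
    from producing log(0) in the scoring step.
    """
    length = len(alignment[0])
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        freqs.append({aa: (counts.get(aa, 0) + pseudocount) / total
                      for aa in AMINO_ACIDS})
    return freqs

def pseudo_energy(sequence, freqs, background=1.0 / len(AMINO_ACIDS)):
    """Sum of negative log-odds ratios; lower values mean the sequence
    agrees better with the alignment (assumed sign convention)."""
    return sum(-math.log(freqs[i][aa] / background)
               for i, aa in enumerate(sequence))

alignment = ["ACD", "ACE", "ACD", "GCD"]  # toy data, not from the source
freqs = column_frequencies(alignment)
consensus_score = pseudo_energy("ACD", freqs)
outlier_score = pseudo_energy("WWW", freqs)
```

Under these assumptions a sequence matching the alignment consensus receives a lower pseudo-energy than one built from residues absent from the alignment, so the term can be added to other scoring functions as an additional energy-like contribution.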
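The weight adjustment recited in claims 52, 60 and 61 can likewise be sketched in a few lines. This is one possible reading, not the claimed implementation: it assumes scores are energy-like (lower is more favorable), uses a two-state Boltzmann probability of the decoy relative to the reference, and introduces a hypothetical learning rate `rate` and temperature factor `kT`.

```python
import math

def update_weights(weights, decoy_scores, reference_scores, rate=0.1, kT=1.0):
    """Return modified weights, one per scoring function.

    A weight rises when the decoy's score for that term exceeds the
    reference's and falls when it is lower. The step size scales with the
    individual score difference and with the decoy's two-state Boltzmann
    probability computed from the weighted total scores.
    """
    total_decoy = sum(w * s for w, s in zip(weights, decoy_scores))
    total_ref = sum(w * s for w, s in zip(weights, reference_scores))
    x = (total_decoy - total_ref) / kT
    # Numerically stable form of 1 / (1 + exp(x)).
    if x >= 0:
        e = math.exp(-x)
        p_decoy = e / (1.0 + e)
    else:
        p_decoy = 1.0 / (1.0 + math.exp(x))
    return [w + rate * p_decoy * (d - r)
            for w, d, r in zip(weights, decoy_scores, reference_scores)]

# Toy values (hypothetical): two scoring terms, one decoy, one reference.
new_weights = update_weights([1.0, 1.0], [2.0, 0.5], [1.0, 1.0])
```

In this sketch a decoy that already scores far worse than the reference has a small Boltzmann probability and therefore changes the weights very little, while a decoy that competes with the reference drives larger corrections. Repeating the update over many decoys, as in claim 53, would yield the final modified weights.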
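The clustering step of claims 83, 85 and 88 can be illustrated with single-linkage clustering on sequence identity. The sketch assumes equal-length variant sequences and a fixed identity cutoff (the `threshold` parameter is hypothetical); at a fixed cutoff, single-linkage clusters are the connected components of the graph that links every pair of sequences at or above that cutoff.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def single_linkage_clusters(seqs, threshold=0.75):
    """Group sequences so that any two members are joined by a chain of
    pairs with identity >= threshold (single-linkage at a fixed cutoff)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())

# Toy variant sequences (hypothetical, not from the source).
variants = ["AAAA", "AAAC", "AACC", "WWWW"]
subsets = single_linkage_clusters(variants)
```

Here "AAAA" and "AACC" share only 50% identity, yet they fall in the same subset because each is 75% identical to "AAAC". That chaining behavior is what distinguishes single linkage from the complete and average linkage variants of claims 86 and 87.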
Parent Case Info
[0001] This application is a continuing application of Ser. No. 09/927,790, filed on Aug. 10, 2001, and claims the benefit of the filing dates of Ser. Nos. 60/311,545, filed on Aug. 10, 2001; 60/324,899, filed on Sep. 25, 2001; 60/351,937, filed on Jan. 25, 2002; and 60/352,103, filed on Jan. 25, 2002.
Provisional Applications (4)
| Number | Date | Country |
| --- | --- | --- |
| 60311545 | Aug 2001 | US |
| 60324899 | Sep 2001 | US |
| 60351937 | Jan 2002 | US |
| 60352103 | Jan 2002 | US |
Continuations (1)
|  | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09927790 | Aug 2001 | US |
| Child | 10218102 | Aug 2002 | US |