Claims
- 1. A method for constructing a library of antibodies based on a structure of a lead antibody, the method comprising:
providing an amino acid sequence of the variable region of the heavy chain (VH) or light chain (VL) of a lead antibody, the lead antibody having a known three dimensional structure which is defined as a lead structural template; identifying the amino acid sequences in the CDRs of the lead antibody; selecting one of the CDRs in the VH or VL region of the lead antibody; providing an amino acid sequence that comprises at least 3 consecutive amino acid residues in the selected CDR, the selected amino acid sequence being a lead sequence; comparing the lead sequence with a plurality of tester protein sequences; selecting from the plurality of tester protein sequences at least two peptide segments that have at least 10% sequence identity with lead sequence, the selected peptide segments forming a hit library; determining if a member of the hit library is structurally compatible with the lead structural template using a scoring function; and selecting the members of the hit library that score equal to or better than the lead sequence.
- 2. The method of claim 1, wherein the length of the lead sequence is between 5-100 aa.
- 3. The method of claim 1, wherein the length of the lead sequence is between 6-80 aa.
- 4. The method of claim 1, wherein the length of the lead sequence is between 8-50 aa.
- 5. The method of claim 1, wherein the step of identifying the amino acid sequences in the CDRs is carried out by using Kabat criteria or Chothia criteria.
- 6. The method of claim 1, wherein the lead sequence comprises an amino acid sequence from a region within the VH or VL of the lead antibody selected from the group consisting of CDR1, CDR2, CDR3, FR1-CDR1, CDR1-FR2, FR2-CDR2, CDR2-FR3, FR3-CDR3, CDR3-FR4, FR1-CDR1-FR2, FR2-CDR2-FR3, and FR3-CDR3-FR4.
- 7. The method of claim 1, wherein the lead sequence comprises at least 6 consecutive amino acid residues in the selected CDR.
- 8. The method of claim 1, wherein the lead sequence comprises at least 7 consecutive amino acid residues in the selected CDR.
- 9. The method of claim 1, wherein the lead sequence comprises all of the amino acid residues in the selected CDR.
- 10. The method of claim 1, wherein the lead sequence further comprises at least one of the amino acid residues immediately adjacent to the selected CDR.
- 11. The method of claim 1, wherein the lead sequence further comprises at least one of the amino acid residues in the FRs flanking the selected CDR.
- 12. The method of claim 1, wherein the lead sequence further comprises one or more CDRs or FRs adjacent the C-terminus or N-terminus of the selected CDR.
- 13. The method of claim 1, wherein the plurality of tester protein sequences comprises antibody sequences.
- 14. The method of claim 1, wherein the plurality of tester protein sequences comprises human antibody sequences.
- 15. The method of claim 1, wherein the plurality of tester protein sequences comprises humanized antibody sequences having at least 70% human sequences in VH or VL.
- 16. The method of claim 1, wherein the plurality of tester protein sequences comprises human germline antibody sequences.
- 17. The method of claim 1, wherein the plurality of tester protein sequences is retrieved from a database selected from the group consisting of GenBank of the NIH, the Swiss-Prot database, and the Kabat database for CDRs of antibodies.
- 18. The method of claim 1, wherein the step of comparing the lead sequence with the plurality of tester protein sequences is implemented by an algorithm selected from the group consisting of BLAST, PSI-BLAST, profile HMM, and COBLATH.
- 19. The method of claim 1, wherein the sequence identity of the selected peptide segments in the hit library with the lead sequence is at least 25%.
- 20. The method of claim 1, wherein the sequence identity of the selected peptide segments in the hit library with the lead sequence is at least 35%.
- 21. The method of claim 1, wherein the sequence identity of the selected peptide segments in the hit library with the lead sequence is at least 45%.
- 22. The method of claim 1, wherein the scoring function is an energy scoring function selected from the group consisting of electrostatic interactions, van der Waals interactions, electrostatic solvation energy, solvent-accessible surface solvation energy, and conformational entropy.
- 23. The method of claim 1, wherein the scoring function is a scoring function that incorporates a forcefield selected from the group consisting of the Amber forcefield, Charmm forcefield, the Discover cvff forcefields, the ECEPP forcefields, the GROMOS forcefields, the OPLS forcefields, the MMFF94 forcefield, the Tripos forcefield, the MM3 forcefield, the Dreiding forcefield, and UNRES forcefield, and other knowledge-based statistical forcefield (mean field) and structure-based thermodynamic potential functions.
- 24. The method of claim 1, wherein the step of selecting the members of the hit library includes selecting the members of the hit library that have a lower or equal total energy than that of the lead sequence calculated based on a formula of
- 25. The method of claim 1, wherein the step of selecting the members of the hit library includes selecting the members of the hit library that have a lower binding free energy than that of the lead sequence calculated as the difference between the bound and unbound states using a refined scoring function
- 26. The method of claim 1, wherein the lead structural template is a 3D structure of a fully assembled lead antibody.
- 27. The method of claim 1, wherein the lead structural template is a 3D structure of VH or VL of the lead antibody.
- 28. The method of claim 1, wherein the lead structural template is a 3D structure of a CDR or FR of the lead antibody, or combination thereof.
- 29. The method of claim 1, wherein the lead structural template is a structure derived from X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy or theoretical structural modeling.
- 30. The method of claim 1, further comprising the step of:
constructing a nucleic acid library comprising DNA segments encoding the amino acid sequences of the hit library.
- 31. The method of claim 1, further comprising the steps of:
building an amino acid positional variant profile of the hit library; converting the amino acid positional variant profile of the hit library into a nucleic acid positional variant profile by back-translating the amino acid positional variants into their corresponding genetic codons; and constructing a degenerate nucleic acid library of DNA segments by combinatorially combining the nucleic acid positional variants.
- 32. The method of claim 31, wherein the genetic codons are the ones that are preferred for expression in bacteria.
- 33. The method of claim 31, wherein the genetic codons are chosen such that the diversity of the degenerate nucleic acid library of DNA segments is below 1×10⁷.
- 34. The method of claim 31, wherein the genetic codons are chosen such that the diversity of the degenerate nucleic acid library of DNA segments is below 1×10⁶.
- 35. The method of claim 31, further comprising the steps of:
introducing the DNA segments in the degenerate nucleic acid library into cells of a host organism; expressing the DNA segments in the host cells such that recombinant antibodies containing the amino acid sequences of the hit library are produced in the cells of the host organism; and selecting the recombinant antibody that binds to a target antigen with affinity higher than 10⁶ M⁻¹.
- 36. The method of claim 35, wherein the affinity of the selected recombinant antibody is higher than 10⁸ M⁻¹.
- 37. The method of claim 35, wherein the affinity of the selected recombinant antibody is higher than 10⁹ M⁻¹.
- 38. The method of claim 35, wherein the host organism is selected from the group consisting of bacteria, yeast, plants, insects, and mammals.
- 39. The method of claim 35, wherein the recombinant antibodies are selected from the group consisting of fully assembled antibodies, Fab fragments, Fv fragments, and single chain antibodies.
- 40. The method of claim 35, wherein the recombinant antibodies are displayed on the surface of phage particles.
- 41. The method of claim 40, wherein the recombinant antibodies displayed on the surface of phage particles are double-chain heterodimers formed between VH and VL.
- 42. The method of claim 41, wherein heterodimerization of VH and VL chains is facilitated by a heterodimer formed between two non-antibody polypeptide chains fused to the VH and VL chains, respectively.
- 43. The method of claim 42, wherein the non-antibody polypeptide chains are derived from heterodimeric receptors GABAB R1 (GR1) and R2 (GR2), respectively.
- 44. The method of claim 40, wherein the recombinant antibodies displayed on the surface of phage particles are single-chain antibodies containing VH and VL linked by a peptide linker.
- 45. The method of claim 44, wherein display of the single chain antibody on the surface of phage particles is facilitated by a heterodimer formed between a fusion of the single chain antibody with GR1 and a fusion of phage pIII capsid protein with GR2.
- 46. The method of claim 35, wherein the target antigen is selected from the group consisting of small organic molecules, proteins, peptides, nucleic acids and polycarbohydrates.
- 47. A method for constructing a library of antibody sequences, the method comprising the steps of:
providing an amino acid sequence of the variable region of the heavy chain (VH) or light chain (VL) of a lead antibody, the lead antibody having a known three dimensional structure which is defined as a lead structural template; identifying the amino acid sequences in the CDRs of the lead antibody; selecting one of the CDRs in the VH or VL region of the lead antibody; providing an amino acid sequence that comprises at least 3 consecutive amino acid residues in the selected CDR, the selected amino acid sequence being a lead sequence; comparing the lead sequence with a plurality of tester protein sequences; selecting from the plurality of tester protein sequences at least two peptide segments that have at least 10% sequence identity with lead sequence, the selected peptide segments forming a hit library; building an amino acid positional variant profile of the hit library based on frequency of amino acid variant appearing at each position of the lead sequence; combining the amino acid variants in the hit library to produce a combination of hit variants which form a hit variant library; determining if a member of the hit variant library is structurally compatible with the lead structural template using a scoring function; and selecting the members of the hit variant library that score equal to or better than the lead sequence.
- 48. The method of claim 47, wherein the step of combining the amino acid variants in the hit library comprises the step of:
selecting the amino acid variants with frequency of appearance higher than 4 times.
- 49. The method of claim 47, wherein the step of combining the amino acid variants in the hit library comprises the step of:
selecting the amino acid variants with frequency of appearance higher than 6 times.
- 50. The method of claim 47, wherein the step of combining the amino acid variants in the hit library comprises the step of:
selecting the amino acid variants with frequency of appearance higher than 5% out of the total variants at each position.
- 51. The method of claim 47, wherein the step of combining the amino acid variants in the hit library comprises the steps of:
selecting the amino acid variants with frequency of appearance higher than 10% out of the total variants at each position; and combining the selected amino acid variants in the hit library to produce a combination of hit variants which form a hit variant library.
- 52. The method of claim 47, wherein the step of combining the amino acid variants in the hit library comprises the step of:
selecting the amino acid variants with frequency of appearance higher than 5% out of the total variants at each position; selecting the amino acid of the lead sequence if its frequency of appearance is equal to or lower than 5% out of the total variants at each position; and combining the selected amino acid variants in the hit library to produce a combination of hit variants which form the hit variant library.
- 53. The method of claim 47, wherein the scoring function is an energy scoring function selected from the group consisting of electrostatic interactions, van der Waals interactions, electrostatic solvation energy, solvent-accessible surface solvation energy, and conformational entropy.
- 54. The method of claim 47, wherein the scoring function is a scoring function that incorporates a forcefield selected from the group consisting of the Amber forcefield, Charmm forcefield, the Discover cvff forcefields, the ECEPP forcefields, the GROMOS forcefields, the OPLS forcefields, the MMFF94 forcefield, the Tripos forcefield, the MM3 forcefield, the Dreiding forcefield, and UNRES forcefield, and other knowledge-based statistical forcefield (mean field) and structure-based thermodynamic potential functions.
- 55. The method of claim 47, further comprising the step of:
constructing a nucleic acid library comprising DNA segments encoding the amino acid sequences of the selected members of the hit variant library.
- 56. The method of claim 47, further comprising the steps of:
parsing the selected members of the hit variant library into at least two sub-hit variant libraries; selecting a sub-hit variant library; building an amino acid positional variant profile of the selected sub-hit variant library; converting the amino acid positional variant profile of the selected sub-hit variant library into a nucleic acid positional variant profile by back-translating the amino acid positional variants into their corresponding genetic codons; and constructing a degenerate nucleic acid library of DNA segments by combinatorially combining the nucleic acid positional variants.
- 57. The method of claim 56, wherein the step of parsing the hit variant library comprises the step of:
randomly selecting 10-30 members of the hit variant library that score equal to or better than the lead sequence, the selected members forming a sub-variant library.
- 58. The method of claim 56, wherein the step of parsing the hit variant library comprises the step of:
building an amino acid positional variant profile of the hit variant library, resulting in a hit variant profile; and parsing the hit variant profile into segments of sub-variant profile based on the contact maps of the Cα, Cβ or heavy atoms of the structural model or lead structural template within a distance of 4.5 Å.
- 59. The method of claim 56, wherein the step of parsing the hit variant library comprises the step of:
building an amino acid positional variant profile of the hit variant library, resulting in a hit variant profile; and parsing the hit variant profile into segments of sub-variant profile based on the contact maps of the Cα, Cβ or heavy atoms of the structural model or lead structural template within a distance of 6-8 Å.
- 60. A method for constructing a library of antibodies based on a structural ensemble of multiple antibodies, the method comprising the steps of:
providing an amino acid sequence of the variable region of the heavy chain (VH) or light chain (VL) of a lead antibody, the lead antibody having a known three dimensional structure; providing 3D structures of one or more antibodies with different sequences in VH or VL region than that of the lead antibody; forming a structure ensemble by combining the structures of the lead antibody and the one or more antibodies; the structure ensemble being defined as a lead structural template; identifying the amino acid sequences in the CDRs of the lead antibody; selecting one of the CDRs in the VH or VL region of the lead antibody; providing an amino acid sequence that comprises at least 3 consecutive amino acid residues in the selected CDR, the selected amino acid sequence being a lead sequence; comparing the lead sequence with a plurality of tester protein sequences; selecting from the plurality of tester protein sequences at least two peptide segments that have at least 10% sequence identity with lead sequence, the selected peptide segments forming a hit library; building an amino acid positional variant profile of the hit library based on frequency of amino acid variant appearing at each position of the lead sequence; combining the amino acid variants in the hit library to produce a combination of hit variants which form a hit variant library; determining if a member of the hit variant library is structurally compatible with the lead structural template using a scoring function; and selecting the members of the hit variant library that score equal to or better than the lead sequence.
- 61. A method for constructing a library of antibodies based on a structure of a lead antibody, the method comprising the steps of:
a) providing an amino acid sequence of the variable region of the heavy chain (VH) or light chain (VL) of a lead antibody, the lead antibody having a known three dimensional structure; b) identifying the amino acid sequences in the CDRs of the lead antibody; c) selecting one of the CDRs in the VH or VL region of the lead antibody; d) providing an amino acid sequence that comprises at least 3 consecutive amino acid residues in the selected CDR, the selected amino acid sequence being defined as a lead sequence; e) comparing the lead sequence with a plurality of tester protein sequences; f) selecting from the plurality of tester protein sequences at least two peptide segments that have at least 10% sequence identity with lead sequence, the selected peptide segments forming a hit library; g) building an amino acid positional variant profile of the hit library based on frequency of amino acid variant appearing at each position of the lead sequence; h) combining the amino acid variants in the hit library to produce a combination of hit variants which form a hit variant library; i) determining if a member of the hit variant library is structurally compatible with the lead structural template using a scoring function; j) selecting the members of the hit variant library that score equal to or better than the lead sequence; k) constructing a degenerate nucleic acid library comprising DNA segments encoding the amino acid sequences of the selected members of the hit variant library; l) determining the diversity of the nucleic acid library, and if the diversity is higher than 1×10⁶, repeating steps j) through l) until the diversity of the nucleic acid library is equal to or lower than 1×10⁶; m) introducing the DNA segments in the degenerate nucleic acid library into cells of a host organism; n) expressing the DNA segments in the host cells such that recombinant antibodies containing the amino acid sequences of the hit library are produced in the cells of the host organism; o) selecting the recombinant antibody that binds to a target antigen with affinity higher than 10⁶ M⁻¹; and p) repeating steps e) through o) if no recombinant antibody is found to bind to the target antigen with affinity higher than 10⁶ M⁻¹.
- 62. A method for constructing a library of antibodies based on a structure of a lead antibody, the method comprising the steps of:
a) providing an amino acid sequence of the variable region of the heavy chain (VH) or light chain (VL) of a lead antibody, the lead antibody having a known three dimensional structure which is defined as a lead structural template; b) identifying the amino acid sequences in the CDRs of the lead antibody; c) selecting one of the CDRs in the VH or VL region of the lead antibody; d) providing an amino acid sequence that comprises at least 3 consecutive amino acid residues in the selected CDR, the selected amino acid sequence being defined as a lead sequence; e) mutating the lead sequence by substituting one or more of the amino acid residues of the lead sequence with one or more different amino acid residues, resulting in a lead sequence mutant library; f) determining if a member of the lead sequence mutant library is structurally compatible with the lead structural template using a first scoring function; g) selecting the lead sequence mutants that score equal to or better than the lead sequence; h) comparing the lead sequence with a plurality of tester protein sequences; i) selecting from the plurality of tester protein sequences at least two peptide segments that have at least 10% sequence identity with lead sequence, the selected peptide segments forming a hit library; j) building an amino acid positional variant profile of the hit library based on frequency of amino acid variant appearing at each position of the lead sequence; k) combining the amino acid variants in the hit library to produce a combination of hit variants; l) combining the selected lead sequence mutants with the combination of hit variants to produce a hit variant library; m) determining if a member of the hit variant library is structurally compatible with the lead structural template using a second scoring function; n) selecting the members of the hit variant library that score equal to or better than the lead sequence; o) constructing a degenerate nucleic acid library comprising DNA segments encoding the amino acid sequences of the selected members of the hit variant library; p) determining the diversity of the nucleic acid library, and if the diversity is higher than 1×10⁶, repeating steps n) through p) until the diversity of the nucleic acid library is equal to or lower than 1×10⁶; q) introducing the DNA segments in the degenerate nucleic acid library into cells of a host organism; r) expressing the DNA segments in the host cells such that recombinant antibodies containing the amino acid sequences of the hit library are produced in the cells of the host organism; s) selecting the recombinant antibody that binds to a target antigen with affinity higher than 10⁶ M⁻¹; and t) repeating steps e) through s) if no recombinant antibody is found to bind to the target antigen with affinity higher than 10⁶ M⁻¹.
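The sequence-based steps recited in the claims above (selecting tester segments by percent identity with the lead sequence, building an amino acid positional variant profile, keeping variants above a frequency cutoff while retaining the lead residue, and combining the kept variants) can be illustrated with a minimal sketch. It is not the applicants' implementation. It assumes that tester segments are already aligned to the lead sequence and have the same length, it omits the structure-based scoring step entirely, and all function names and example sequences are hypothetical. The size it computes is the number of amino acid combinations, which is only a lower bound on the diversity of a degenerate nucleic acid library, since degenerate codons may encode additional residues.

```python
from collections import Counter
from math import prod


def percent_identity(lead: str, segment: str) -> float:
    """Percentage of positions at which two pre-aligned, equal-length sequences match."""
    if len(lead) != len(segment):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(a == b for a, b in zip(lead, segment)) / len(lead)


def build_hit_library(lead, testers, min_identity=10.0):
    """Keep tester segments whose identity with the lead meets the cutoff."""
    return [s for s in testers if percent_identity(lead, s) >= min_identity]


def positional_variant_profile(hits):
    """One Counter per position, mapping residue -> number of appearances."""
    return [Counter(column) for column in zip(*hits)]


def filter_profile(profile, lead, min_fraction=0.05):
    """Keep variants above the frequency cutoff; the lead residue is always kept."""
    kept = []
    for counts, lead_residue in zip(profile, lead):
        total = sum(counts.values())
        variants = {aa for aa, n in counts.items() if n / total > min_fraction}
        variants.add(lead_residue)
        kept.append(variants)
    return kept


def combinatorial_size(filtered_profile):
    """Number of amino acid sequences obtained by combining all kept variants."""
    return prod(len(variants) for variants in filtered_profile)


# Hypothetical data for illustration only (not taken from the application).
lead = "GFTFSSYA"
testers = ["GFTFSSYA", "GYTFTSYA", "GFTFSDYA", "GGSISSYY", "KLMNPQRW"]

hits = build_hit_library(lead, testers, min_identity=10.0)
profile = positional_variant_profile(hits)
size = combinatorial_size(filter_profile(profile, lead, min_fraction=0.05))
print(len(hits), size)
```

In a workflow of the kind claimed, a size above the chosen limit (for example 1×10⁶) would prompt a stricter selection before library construction; raising `min_fraction` is one hypothetical way to shrink the combination count in this sketch.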
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part of U.S. patent application Ser. No. 10/125,687 entitled “Structure-based construction of human antibody library” filed Apr. 17, 2002, which claims the benefit of U.S. Provisional Application Serial No. 60/284,407 entitled “Structure-based construction of human antibody library” filed Apr. 17, 2001. These applications are incorporated herein by reference.
Provisional Applications (1)

| Number | Date | Country |
|---|---|---|
| 60284407 | Apr 2001 | US |

Continuation in Parts (1)

| | Number | Date | Country |
|---|---|---|---|
| Parent | 10125687 | Apr 2002 | US |
| Child | 10153159 | May 2002 | US |