RATIONALLY DESIGNED, SYNTHETIC ANTIBODY LIBRARIES AND USES THEREFOR

SEQUENCE LISTING

In accordance with 37 CFR § 1.52(e)(5), the present specification makes reference to a Sequence Listing (submitted electronically as a .txt file named “2009186_0231SL.txt”). The .txt file was generated on Dec. 27, 2018 and is 379,884 bytes in size. The entire contents of the Sequence Listing are herein incorporated by reference.

BACKGROUND OF THE INVENTION

Antibodies have profound relevance as research tools and in diagnostic and therapeutic applications. However, the identification of useful antibodies is difficult and once identified, antibodies often require considerable redesign or ‘humanization’ before they are suitable for therapeutic applications.

Previous methods for identifying desirable antibodies have typically involved phage display of representative antibodies, for example human libraries derived by amplification of nucleic acids from B cells or tissues, or, alternatively, synthetic libraries. However, these approaches have limitations. For example, most human libraries known in the art contain only the antibody sequence diversity that can be experimentally captured or cloned from the source (e.g., B cells). Accordingly, the human library may completely lack or under-represent certain useful antibody sequences. Synthetic or consensus libraries known in the art have other limitations, such as the potential to encode non-naturally occurring (e.g., non-human) sequences that have the potential to be immunogenic. Moreover, certain synthetic libraries of the art suffer from at least one of two limitations: (1) the number of members that the library can theoretically contain (i.e., theoretical diversity) may be greater than the number of members that can actually be synthesized, and (2) the number of members actually synthesized may be so great as to preclude screening of each member in a physical realization of the library, thereby decreasing the probability that a library member with a particular property may be isolated.

For example, a physical realization of a library (e.g., yeast display, phage display, ribosomal display, etc.) capable of screening 10¹² library members will only sample about 10% of the sequences contained in a library with 10¹³ members. Given a median CDRH3 length of about 12.7 amino acids (Rock et al., J. Exp. Med., 1994, 179: 323-328), the number of theoretical sequence variants in CDRH3 alone is about 20¹²·⁷, or about 3.3×10¹⁶ variants. This number does not account for known variation that occurs in CDRH1 and CDRH2, heavy chain framework regions, and pairing with different light chains, each of which also exhibits variation in its respective CDRL1, CDRL2, and CDRL3. Finally, the antibodies isolated from these libraries are often not amenable to rational affinity maturation techniques to improve the binding of the candidate molecule.
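The arithmetic in the preceding paragraph can be checked with a short calculation. The following is a minimal sketch only; the variable names are illustrative and are not part of the specification.

```python
# Illustrative check of the diversity arithmetic stated above.
AMINO_ACIDS = 20              # naturally occurring amino acids
MEDIAN_CDRH3_LENGTH = 12.7    # median CDRH3 length cited in the text

# Theoretical number of CDRH3 sequence variants: 20 raised to the 12.7 power.
theoretical_variants = AMINO_ACIDS ** MEDIAN_CDRH3_LENGTH

# Fraction of a 10^13-member library sampled by a physical
# realization that can screen only 10^12 members.
screening_capacity = 1e12
library_size = 1e13
fraction_sampled = screening_capacity / library_size

print(f"theoretical CDRH3 variants: {theoretical_variants:.2e}")
print(f"fraction of library sampled: {fraction_sampled:.0%}")
```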

Accordingly, a need exists for smaller (i.e., able to be synthesized and physically realizable) antibody libraries with directed diversity that systematically represent candidate antibodies that are non-immunogenic (i.e., more human) and have desired properties (e.g., the ability to recognize a broad variety of antigens). However, obtaining such libraries requires balancing the competing objectives of restricting the sequence diversity represented in the library (to enable synthesis and physical realization, potentially with oversampling, while limiting the introduction of non-human sequences) while maintaining a level of diversity sufficient to recognize a broad variety of antigens. Prior to the instant invention, it was known in the art that “[al]though libraries containing heavy chain CDR3 length diversity have been reported, it is impossible to synthesize DNA encoding both the sequence and the length diversity found in natural heavy chain CDR3 repertoires” (Hoet et al., Nat. Biotechnol., 2005, 23: 344, incorporated by reference in its entirety).

Therefore, it would be desirable to have antibody libraries which (a) can be readily synthesized, (b) can be physically realized and, in certain cases, oversampled, (c) contain sufficient diversity to recognize all antigens recognized by the preimmune human repertoire (i.e., before negative selection), (d) are non-immunogenic in humans (i.e., comprise sequences of human origin), and (e) contain CDR length and sequence diversity, and framework diversity, representative of naturally-occurring human antibodies. Embodiments of the instant invention at least provide, for the first time, antibody libraries that have these desirable features.

SUMMARY OF THE INVENTION

The present invention relates to, at least, synthetic polynucleotide libraries, methods of producing and using the libraries of the invention, kits and computer readable forms including the libraries of the invention. In some embodiments, the libraries of the invention are designed to reflect the preimmune repertoire naturally created by the human immune system and are based on rational design informed by examination of publicly available databases of human antibody sequences. It will be appreciated that certain non-limiting embodiments of the invention are described below. As described throughout the specification, the invention encompasses many other embodiments as well.

In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode at least 10⁶ unique antibody CDRH3 amino acid sequences comprising:

- (i) an N1 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N1 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N1 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells;
- (ii) a human CDRH3 DH amino acid sequence, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
- (iii) an N2 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N2 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N2 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells; and
- (iv) a human CDRH3 H3-JH amino acid sequence, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them.

In other embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode at least about 10⁶ unique antibody CDRH3 amino acid sequences comprising:

- (i) an N1 amino acid sequence of 0 to about 3 amino acids, wherein:
  - (a) the most N-terminal N1 amino acid, if present, is selected from a group consisting of R, G, P, L, S, A, V, K, I, Q, T and D;
  - (b) the second most N-terminal N1 amino acid, if present, is selected from a group consisting of G, P, R, S, L, V, E, A, D, I, T and K; and
  - (c) the third most N-terminal N1 amino acid, if present, is selected from the group consisting of G, R, P, S, L, A, V, T, E, D, K and F;
- (ii) a human CDRH3 DH amino acid sequence, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
- (iii) an N2 amino acid sequence of 0 to about 3 amino acids, wherein:
  - (a) the most N-terminal N2 amino acid, if present, is selected from a group consisting of G, P, R, L, S, A, T, V, E, D, F and H;
  - (b) the second most N-terminal N2 amino acid, if present, is selected from a group consisting of G, P, R, S, T, L, A, V, E, Y, D and K; and
  - (c) the third most N-terminal N2 amino acid, if present, is selected from the group consisting of G, P, S, R, L, A, T, V, D, E, W and Q; and
- (iv) a human CDRH3 H3-JH amino acid sequence, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them.
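The position-specific residue groups recited above for N1 and N2 lend themselves to a simple membership check. The sketch below is illustrative only: the function name `is_allowed` is hypothetical, and the upper bound of "about 3" residues is treated as exactly 3 for simplicity.

```python
# Allowed residues at the first, second, and third most N-terminal
# positions, copied from the groups recited in the text.
N1_ALLOWED = [set("RGPLSAVKIQTD"), set("GPRSLVEADITK"), set("GRPSLAVTEDKF")]
N2_ALLOWED = [set("GPRLSATVEDFH"), set("GPRSTLAVEYDK"), set("GPSRLATVDEWQ")]


def is_allowed(segment, allowed_by_position):
    """Return True if every residue of `segment` is in the group recited
    for its position. An empty segment (length 0) is permitted.
    Assumption: the maximum length is taken to be exactly 3."""
    if len(segment) > len(allowed_by_position):
        return False
    return all(res in allowed
               for res, allowed in zip(segment, allowed_by_position))


print(is_allowed("RGG", N1_ALLOWED))  # every position is in its N1 group
print(is_allowed("GPW", N1_ALLOWED))  # W is not recited for the third N1 position
print(is_allowed("GPW", N2_ALLOWED))  # W is recited for the third N2 position
```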

In still other embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode at least about 10⁶ unique antibody CDRH3 amino acid sequences that are at least about 80% identical to an amino acid sequence represented by the following formula:

[X]-[N1]-[DH]-[N2]-[H3-JH], wherein:

- (i) X is any amino acid residue or no amino acid residue;
- (ii) N1 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof;
- (iii) DH is an amino acid sequence selected from the group consisting of all possible reading frames that do not include a stop codon encoded by IGHD1-1 (SEQ ID NO: 501), IGHD1-20 (SEQ ID NO: 503), IGHD1-26 (polynucleotides encoding SEQ ID NOs: 13 and 14), IGHD1-7 (SEQ ID NO: 504), IGHD2-15 (polynucleotides encoding SEQ ID NO: 16), IGHD2-2 (polynucleotides encoding SEQ ID NOs: 10 and 11), IGHD2-21 (SEQ ID NOs: 505 and 506), IGHD2-8 (SEQ ID NO: 507), IGHD3-10 (polynucleotides encoding SEQ ID NOs: 1-3), IGHD3-16 (SEQ ID NO: 508), IGHD3-22 (polynucleotides encoding SEQ ID NO: 4), IGHD3-3 (polynucleotides encoding SEQ ID NO: 9), IGHD3-9 (SEQ ID NO: 509), IGHD4-17 (polynucleotides encoding SEQ ID NO: 12), IGHD4-23 (SEQ ID NO: 510), IGHD4-4 (SEQ ID NO: 511), IGHD-4-11 (SEQ ID NO: 511), IGHD5-12 (SEQ ID NO: 512), IGHD5-24 (SEQ ID NO: 513), IGHD5-5 (polynucleotides encoding SEQ ID NO: 15), IGHD-5-18 (polynucleotides encoding SEQ ID NO: 15), IGHD6-13 (polynucleotides encoding SEQ ID NOs: 7 and 8), IGHD6-19 (polynucleotides encoding SEQ ID NOs: 5 and 6), IGHD6-25 (SEQ ID NO: 514), IGHD6-6 (SEQ ID NO: 515), and IGHD7-27 (SEQ ID NO: 516), and N- and C-terminal truncations thereof;
- (iv) N2 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof; and
- (v) H3-JH is an amino acid sequence selected from the group consisting of AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), and MDV, or a sequence of at least 80% identity to any of them.
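By way of illustration only, the formula [X]-[N1]-[DH]-[N2]-[H3-JH] may be viewed as a combinatorial product over the options for each segment. The sketch below uses a small subset of the N1, N2, and H3-JH options recited above; the DH segment `YSGSYY` and the X options are hypothetical placeholders chosen for the example, not sequences asserted by the specification. It also shows that distinct segment choices can yield the same amino acid sequence, so the count of unique sequences may be lower than the raw product.

```python
from itertools import product

# Small illustrative option sets. "" denotes an absent segment.
X_OPTIONS = ["", "G"]            # hypothetical choice of tail residues
N1_OPTIONS = ["", "G", "PG"]     # subset of the recited N1 group
DH_OPTIONS = ["YSGSYY"]          # hypothetical placeholder DH segment
N2_OPTIONS = ["", "S"]           # subset of the recited N2 group
H3JH_OPTIONS = ["YFDY", "FDY"]   # subset of the recited H3-JH group

# Enumerate every combination of segment choices and join the segments.
all_combinations = [
    "".join(parts)
    for parts in product(X_OPTIONS, N1_OPTIONS, DH_OPTIONS,
                         N2_OPTIONS, H3JH_OPTIONS)
]

# Different segment choices (e.g., X="G", N1="" versus X="", N1="G")
# can produce identical sequences, so de-duplicate.
unique_cdrh3 = sorted(set(all_combinations))

print(len(all_combinations), "combinations")
print(len(unique_cdrh3), "unique CDRH3 sequences")
```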

In still another embodiment, the invention comprises a library of synthetic polynucleotides, wherein said library consists essentially of a plurality of polynucleotides encoding CDRH3 amino acid sequences that are at least about 80% identical to an amino acid sequence represented by the following formula:

[X]-[N1]-[DH]-[N2]-[H3-JH], wherein:

- (i) X is any amino acid residue or no amino acid residue;
- (ii) N1 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof;
- (iii) DH is an amino acid sequence selected from the group consisting of all possible reading frames that do not include a stop codon encoded by IGHD1-1 (SEQ ID NO: 501), IGHD1-20 (SEQ ID NO: 503), IGHD1-26 (polynucleotides encoding SEQ ID NOs: 13 and 14), IGHD1-7 (SEQ ID NO: 504), IGHD2-15 (polynucleotides encoding SEQ ID NO: 16), IGHD2-2 (polynucleotides encoding SEQ ID NOs: 10 and 11), IGHD2-21 (SEQ ID NOs: 505 and 506), IGHD2-8 (SEQ ID NO: 507), IGHD3-10 (polynucleotides encoding SEQ ID NOs: 1-3), IGHD3-16 (SEQ ID NO: 508), IGHD3-22 (polynucleotides encoding SEQ ID NO: 4), IGHD3-3 (polynucleotides encoding SEQ ID NO: 9), IGHD3-9 (SEQ ID NO: 509), IGHD4-17 (polynucleotides encoding SEQ ID NO: 12), IGHD4-23 (SEQ ID NO: 510), IGHD4-4 (SEQ ID NO: 511), IGHD-4-11 (SEQ ID NO: 511), IGHD5-12 (SEQ ID NO: 512), IGHD5-24 (SEQ ID NO: 513), IGHD5-5 (polynucleotides encoding SEQ ID NO: 15), IGHD-5-18 (polynucleotides encoding SEQ ID NO: 15), IGHD6-13 (polynucleotides encoding SEQ ID NOs: 7 and 8), IGHD6-19 (polynucleotides encoding SEQ ID NOs: 5 and 6), IGHD6-25 (SEQ ID NO: 514), IGHD6-6 (SEQ ID NO: 515), and IGHD7-27 (SEQ ID NO: 516), and N- and C-terminal truncations thereof;
- (iv) N2 is an amino acid sequence selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof; and
- (v) H3-JH is an amino acid sequence selected from the group consisting of AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), and MDV, or a sequence of at least 80% identity to any of them.
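The selection of DH reading frames "that do not include a stop codon" can be sketched as follows. This is a minimal illustration under stated assumptions: only the three forward reading frames are considered, the standard genetic code is used, and the nucleotide string `example_dh` is an illustrative input that is not asserted to correspond to any SEQ ID NO of the specification.

```python
# Standard genetic code, built compactly from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons only; trailing 1-2 bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def stop_free_frames(dna):
    """Map forward frame offset (0, 1, 2) to its translation, keeping
    only frames whose translation contains no stop codon ('*')."""
    frames = {}
    for offset in range(3):
        peptide = translate(dna[offset:])
        if peptide and "*" not in peptide:
            frames[offset] = peptide
    return frames


example_dh = "GGTATAGTGGGAGCTACTAC"  # illustrative input only
print(stop_free_frames(example_dh))
```

In this example the frame at offset 1 is discarded because it contains a stop codon, while the other two forward frames are retained.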

In another embodiment, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode one or more full length antibody heavy chain sequences, and wherein the CDRH3 amino acid sequences of the heavy chain comprise:

- (i) an N1 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N1 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N1 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells;
- (ii) a human CDRH3 DH amino acid sequence, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
- (iii) an N2 amino acid sequence of 0 to about 3 amino acids, wherein each amino acid of the N2 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N2 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells; and
- (iv) a human CDRH3 H3-JH amino acid sequence, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them.

The following embodiments may be applied throughout the embodiments of the instant invention. In one aspect, one or more CDRH3 amino acid sequences further comprise an N-terminal tail residue. In still another aspect, the N-terminal tail residue is selected from the group consisting of G, D, and E.

In yet another aspect, the N1 amino acid sequence is selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof. In certain other aspects, the N1 amino acid sequence may be of about 0 to about 5 amino acids.

In yet another aspect, the N2 amino acid sequence is selected from the group consisting of G, P, R, A, S, L, T, V, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP, AP, SP, LP, TP, VP, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, PTQ, REL, RPL, SAA, SAL, SGL, SSE, TGL, WGT, and combinations thereof. In certain other aspects, the N2 sequence may be of about 0 to about 5 amino acids.

In yet another aspect, the H3-JH amino acid sequence is selected from the group consisting of AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), and MDV.
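The H3-JH group recited above is consistent with progressive N-terminal truncation of six parent segments. The sketch below regenerates the group on that basis. The minimum retained lengths are an observation drawn from the recited list (the longest parent is truncated only down to MDV), not a rule stated in the text, and the function name is hypothetical.

```python
# Parent H3-JH segments mapped to the shortest truncation retained
# in the recited list (observed from the list, not stated as a rule).
PARENTS = {
    "AEYFQH": 1,
    "YWYFDL": 1,
    "AFDV": 1,
    "YFDY": 1,
    "NWFDS": 1,
    "YYYYYGMDV": 3,
}


def n_terminal_truncations(seq, min_length=1):
    """Return seq and each successive N-terminal truncation of it,
    down to a remaining length of min_length residues."""
    return [seq[i:] for i in range(len(seq) - min_length + 1)]


generated = [
    truncated
    for parent, min_len in PARENTS.items()
    for truncated in n_terminal_truncations(parent, min_len)
]

# The H3-JH group as recited in the text, for comparison.
RECITED = (
    "AEYFQH EYFQH YFQH FQH QH H YWYFDL WYFDL YFDL FDL DL L "
    "AFDV FDV DV V YFDY FDY DY Y NWFDS WFDS FDS DS S "
    "YYYYYGMDV YYYYGMDV YYYGMDV YYGMDV YGMDV GMDV MDV"
).split()

print(len(generated), generated == RECITED)
```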

In other embodiments, the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i-i+1 pairs in the library is within the ranges specified below:

- Tyr-Tyr in an amount from about 2.5% to about 6.5%;
- Ser-Gly in an amount from about 2.5% to about 4.5%;
- Ser-Ser in an amount from about 2% to about 4%;
- Gly-Ser in an amount from about 1.5% to about 4%;
- Tyr-Ser in an amount from about 0.75% to about 2%;
- Tyr-Gly in an amount from about 0.75% to about 2%; and
- Ser-Tyr in an amount from about 0.75% to about 2%.

In still other embodiments, the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i-i+2 pairs in the library is within the ranges specified below:

- Tyr-Tyr in an amount from about 2.5% to about 4.5%;
- Gly-Tyr in an amount from about 2.5% to about 5.5%;
- Ser-Tyr in an amount from about 2% to about 4%;
- Tyr-Ser in an amount from about 1.75% to about 3.75%;
- Ser-Gly in an amount from about 2% to about 3.5%;
- Ser-Ser in an amount from about 1.5% to about 3%;
- Gly-Ser in an amount from about 1.5% to about 3%; and
- Tyr-Gly in an amount from about 1% to about 2%.

In another embodiment, the invention comprises a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the percent occurrence within the central loop of the CDRH3 amino acid sequences of at least one of the following i-i+3 pairs in the library is within the ranges specified below:

- Gly-Tyr in an amount from about 2.5% to about 6.5%;
- Ser-Tyr in an amount from about 1% to about 5%;
- Tyr-Ser in an amount from about 2% to about 4%;
- Ser-Ser in an amount from about 1% to about 3%;
- Gly-Ser in an amount from about 2% to about 5%; and
- Tyr-Tyr in an amount from about 0.75% to about 2%.

In one aspect of the invention, at least 2, 3, 4, 5, 6, or 7 of the specified i-i+1 pairs in the library are within the specified ranges. In another aspect, the CDRH3 amino acid sequences are human. In yet another aspect, the polynucleotides encode at least about 10⁶ unique CDRH3 amino acid sequences.
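One way the percent occurrence of i-i+1, i-i+2, and i-i+3 pairs might be tabulated is sketched below. This is a hedged illustration, not the method of the specification: the input strings are assumed to be the central-loop portions already extracted from each CDRH3, and the percentage is assumed to be taken over all pairs at the given offset across all sequences. Both assumptions, and all function names, are hypothetical.

```python
from collections import Counter


def pair_percentages(sequences, offset=1):
    """Percent occurrence of each (residue i, residue i+offset) pair,
    normalized over all such pairs in all sequences (an assumption)."""
    counts = Counter()
    total = 0
    for seq in sequences:
        for i in range(len(seq) - offset):
            counts[(seq[i], seq[i + offset])] += 1
            total += 1
    if total == 0:
        return {}
    return {pair: 100.0 * n / total for pair, n in counts.items()}


# Ranges recited above for the i-i+1 pairs, in percent.
I_PLUS_1_RANGES = {
    ("Y", "Y"): (2.5, 6.5),
    ("S", "G"): (2.5, 4.5),
    ("S", "S"): (2.0, 4.0),
    ("G", "S"): (1.5, 4.0),
    ("Y", "S"): (0.75, 2.0),
    ("Y", "G"): (0.75, 2.0),
    ("S", "Y"): (0.75, 2.0),
}


def pairs_in_range(percentages, ranges):
    """List the pairs whose percent occurrence lies within its range."""
    return [
        pair
        for pair, (low, high) in ranges.items()
        if low <= percentages.get(pair, 0.0) <= high
    ]


# Toy input of one short sequence, far too small to be representative.
toy = pair_percentages(["YYSG"], offset=1)
print(toy)
print(pairs_in_range(toy, I_PLUS_1_RANGES))
```

The length of the list returned by `pairs_in_range` corresponds to the number of specified pairs falling within the specified ranges.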

In other aspects of the invention, the polynucleotides further encode one or more heavy chain chassis amino acid sequences that are N-terminal to the CDRH3 amino acid sequences, and the one or more heavy chain chassis sequences are selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 94 encoded by IGHV1-2 (SEQ ID NO: 24), IGHV1-3 (SEQ ID NO: 423), IGHV1-8 (SEQ ID NOs: 424, 425), IGHV1-18 (SEQ ID NO: 25), IGHV1-24 (SEQ ID NO: 426), IGHV1-45 (SEQ ID NO: 427), IGHV1-46 (SEQ ID NO: 26), IGHV1-58 (SEQ ID NO: 428), IGHV1-69 (SEQ ID NO: 27), IGHV2-5 (SEQ ID NO: 429), IGHV2-26 (SEQ ID NO: 430), IGHV2-70 (SEQ ID NO: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV3-9 (SEQ ID NO: 433), IGHV3-11 (SEQ ID NO: 434), IGHV3-13 (SEQ ID NO: 435), IGHV3-15 (SEQ ID NO: 29), IGHV3-20 (SEQ ID NO: 436), IGHV3-21 (SEQ ID NO: 437), IGHV3-23 (SEQ ID NO: 30), IGHV3-30 (SEQ ID NO: 31), IGHV3-33 (SEQ ID NO: 32), IGHV3-43 (SEQ ID NO: 438), IGHV3-48 (SEQ ID NO: 33), IGHV3-49 (SEQ ID NO: 439), IGHV3-53 (SEQ ID NO: 440), IGHV3-64 (SEQ ID NO: 441), IGHV3-66 (SEQ ID NO: 442), IGHV3-72 (SEQ ID NO: 443), IGHV3-73 (SEQ ID NO: 444), IGHV3-74 (SEQ ID NO: 445), IGHV4-4 (SEQ ID NO: 446, 447), IGHV4-28 (SEQ ID NO: 448), IGHV4-31 (SEQ ID NO: 34), IGHV4-34 (SEQ ID NO: 35), IGHV4-39 (SEQ ID NO: 36), IGHV4-59 (SEQ ID NO: 37), IGHV4-61 (SEQ ID NO: 38), IGHV4-B (SEQ ID NO: 39), IGHV5-51 (SEQ ID NO: 40), IGHV6-1 (SEQ ID NO: 449), and IGHV7-4-1 (SEQ ID NO: 450), or a sequence of at least about 80% identity to any of them.

In another aspect, the polynucleotides further encode one or more FRM4 amino acid sequences that are C-terminal to the CDRH3 amino acid sequences, wherein the one or more FRM4 amino acid sequences are selected from the group consisting of a FRM4 amino acid sequence encoded by IGHJ1 (SEQ ID NO: 253), IGHJ2 (SEQ ID NO: 254), IGHJ3 (SEQ ID NO: 255), IGHJ4 (SEQ ID NO: 256), IGHJ5 (SEQ ID NO: 257), and IGHJ6 (SEQ ID NO: 257), or a sequence of at least about 80% identity to any of them. In still another aspect, the polynucleotides further encode one or more immunoglobulin heavy chain constant region amino acid sequences that are C-terminal to the FRM4 sequence.

In yet another aspect, the CDRH3 amino acid sequences are expressed as part of full-length heavy chains. In other aspects, the full-length heavy chains are selected from the group consisting of an IgG1, IgG2, IgG3, and IgG4, or combinations thereof. In one embodiment, the CDRH3 amino acid sequences are from about 2 to about 30, from about 8 to about 19, or from about 10 to about 18 amino acid residues in length. In other aspects, the synthetic polynucleotides of the library encode from about 10⁶ to about 10¹⁴, from about 10⁷ to about 10¹³, from about 10⁸ to about 10¹², from about 10⁹ to about 10¹², or from about 10¹⁰ to about 10¹² unique CDRH3 amino acid sequences.

In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody VKCDR3 amino acid sequences comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 96, and 97, in selected VKCDR3 amino acid sequences derived from a particular IGKV or IGKJ germline sequence.

In one aspect, the synthetic polynucleotides encode one or more of the amino acid sequences listed in Table 33 or a sequence at least about 80% identical to any of them.

In some embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of unique antibody VKCDR3 amino acid sequences that are of at least about 80% identity to an amino acid sequence represented by the following formula:

[VK_Chassis]-[L3-VK]-[X]-[JK*], wherein:

- (i) VK_Chassis is an amino acid sequence selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NO: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NOs: 477), and IGKV6D-41, or a sequence of at least about 80% identity to any of them;
- (ii) L3-VK is the portion of the VKCDR3 encoded by the IGKV gene segment; and
- (iii) X is any amino acid residue; and
- (iv) JK* is an amino acid sequence selected from the group consisting of sequences encoded by IGJK1, IGJK2, IGJK3, IGJK4, and IGJK5, wherein the first residue of each IGJK sequence is not present.

In still other aspects, X may be selected from the group consisting of F, L, I, R, W, Y, and P.
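The junction described by the formula [VK_Chassis]-[L3-VK]-[X]-[JK*] can be illustrated as follows. This is a sketch under stated assumptions: the strings passed for the L3-VK portion and the J segment are hypothetical placeholders supplied for the example and are not taken from the Sequence Listing, the chassis is left empty for brevity, and the function name is illustrative. JK* is formed by dropping the first residue of the J segment, and X is drawn from the group F, L, I, R, W, Y, and P recited above.

```python
# Residues recited for position X in the aspect described above.
X_CHOICES = "FLIRWYP"


def vk_junction_variants(chassis, l3_vk, jk, x_choices=X_CHOICES):
    """Assemble [VK_Chassis]-[L3-VK]-[X]-[JK*] for each choice of X.
    JK* is the J segment with its first residue removed."""
    jk_star = jk[1:]
    return [chassis + l3_vk + x + jk_star for x in x_choices]


# Hypothetical placeholder inputs, for illustration only.
chassis = ""            # chassis omitted to keep the output short
l3_vk = "QQSYSTP"       # placeholder for the IGKV-encoded CDR3 portion
jk = "WTFGQGTKVEIK"     # placeholder for a J segment

variants = vk_junction_variants(chassis, l3_vk, jk)
for v in variants:
    print(v)
```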

In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of VλCDR3 amino acid sequences that are of at least about 80% identity to an amino acid sequence represented by the following formula:

[Vλ_Chassis]-[L3-Vλ]-[Jλ], wherein:

- (i) Vλ_Chassis is an amino acid sequence selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGλV1-36 (SEQ ID NO: 480), IGλV1-40 (SEQ ID NO: 531), IGλV1-44 (SEQ ID NO: 532), IGλV1-47 (SEQ ID NO: 481), IGλV1-51 (SEQ ID NO: 533), IGλV10-54 (SEQ ID NO: 482), IGλV2-11 (SEQ ID NOS: 483, 484), IGλV2-14 (SEQ ID NO: 534), IGλV2-18 (SEQ ID NO: 485), IGλV2-23 (SEQ ID NOS: 486, 487), IGλV2-8 (SEQ ID NO: 488), IGλV3-1 (SEQ ID NO: 535), IGλV3-10 (SEQ ID NO: 489), IGλV3-12 (SEQ ID NO: 490), IGλV3-16 (SEQ ID NO: 491), IGλV3-19 (SEQ ID NO: 536), IGλV3-21 (SEQ ID NO: 537), IGλV3-25 (SEQ ID NO: 492), IGλV3-27 (SEQ ID NO: 493), IGλV3-9 (SEQ ID NO: 494), IGλV4-3 (SEQ ID NO: 495), IGλV4-60 (SEQ ID NO: 496), IGλV4-69 (SEQ ID NO: 538), IGλV5-39 (SEQ ID NO: 497), IGλV5-45 (SEQ ID NO: 540), IGλV6-57 (SEQ ID NO: 539), IGλV7-43 (SEQ ID NO: 541), IGλV7-46 (SEQ ID NO: 498), IGλV8-61 (SEQ ID NO: 499), IGλV9-49 (SEQ ID NO: 500), and IGλV10-54 (SEQ ID NO: 482), or a sequence of at least about 80% identity to any of them;
- (ii) L3-Vλ is the portion of the VλCDR3 encoded by the IGλV segment; and
- (iii) Jλ is an amino acid sequence selected from the group consisting of sequences encoded by IGλJ1-01, IGλJ2-01, IGλJ3-01, IGλJ3-02, IGλJ6-01, IGλJ7-01, and IGλJ7-02, and wherein the first residue of each IGλJλ sequence may or may not be deleted.

In further aspects, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody proteins comprising:

- (i) a CDRH3 amino acid sequence as specifically described herein; and
- (ii) a VKCDR3 amino acid sequence comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 96, and 97, in selected VKCDR3 sequences derived from a particular IGKV or IGKJ germline sequence.

In still further aspects, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody proteins comprising:

- (i) a CDRH3 amino acid sequence as specifically described herein; and
- (ii) a VKCDR3 amino acid sequence of at least about 80% identity to an amino acid sequence represented by the following formula:

[VK_Chassis]-[L3-VK]-[X]-[JK*], wherein:

  - (a) VK_Chassis is an amino acid sequence selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NO: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NOs: 477), and IGKV6D-41, or a sequence of at least about 80% identity to any of them;
  - (b) L3-VK is the portion of the VKCDR3 encoded by the IGKV gene segment; and
  - (c) X is any amino acid residue; and
  - (d) JK* is an amino acid sequence selected from the group consisting of sequences encoded by IGJK1, IGJK2, IGJK3, IGJK4, and IGJK5, wherein the first residue of each IGJK sequence is not present.

In some aspects, the VKCDR3 amino acid sequence comprises one or more of the sequences listed in Table 33 or a sequence at least about 80% identical to any of them. In other aspects, the antibody proteins are expressed in a heterodimeric form. In yet another aspect, the human antibody proteins are expressed as antibody fragments. In still other aspects of the invention, the antibody fragments are selected from the group consisting of Fab, Fab′, F(ab′)₂, Fv fragments, diabodies, linear antibodies, and single-chain antibodies.

In certain embodiments, the invention comprises an antibody isolated from the polypeptide expression products of any library described herein.

In still other aspects, the polynucleotides further comprise a 5′ polynucleotide sequence and a 3′ polynucleotide sequence that facilitate homologous recombination.

In one embodiment, the polynucleotides further encode an alternative scaffold.

In another embodiment, the invention comprises a library of polypeptides encoded by any of the synthetic polynucleotide libraries described herein.

In yet another embodiment, the invention comprises a library of vectors comprising any of the polynucleotide libraries described herein. In certain other aspects, the invention comprises a population of cells comprising the vectors of the instant invention.

In one aspect, the doubling time of the population of cells is from about 1 to about 3 hours, from about 3 to about 8 hours, from about 8 to about 16 hours, from about 16 to about 20 hours, or from 20 to about 30 hours. In yet another aspect, the cells are yeast cells. In still another aspect, the yeast is Saccharomyces cerevisiae.

In other embodiments, the invention comprises a library that has a theoretical total diversity of N unique CDRH3 sequences, wherein N is about 10⁶ to about 10¹⁵; and wherein the physical realization of the theoretical total CDRH3 diversity has a size of at least about 3N, thereby providing a probability of at least about 95% that any individual CDRH3 sequence contained within the theoretical total diversity of the library is present in the actual library.
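The relationship between a physical realization of size 3N and a probability of about 95% can be illustrated with a simple sampling model. The sketch below assumes, for illustration only, that each member of the physical library is drawn uniformly and independently from the N theoretical sequences; under that assumption the probability that a given sequence is present among M members is 1 − (1 − 1/N)^M, which approaches 1 − e⁻³ ≈ 0.95 when M = 3N. The function name is hypothetical.

```python
import math


def presence_probability(n_theoretical, n_sampled):
    """Probability that one particular sequence appears at least once
    among n_sampled uniform, independent draws from n_theoretical
    sequences: 1 - (1 - 1/N)**M, evaluated in a numerically stable way."""
    return -math.expm1(n_sampled * math.log1p(-1.0 / n_theoretical))


# Threefold oversampling across the range of N recited above.
for exponent in (6, 9, 12, 15):
    n = 10.0 ** exponent
    print(f"N = 1e{exponent}: P(present) = {presence_probability(n, 3 * n):.4f}")
```

Under the same assumption, a physical realization equal in size to N would give a probability of only about 63%, which is why oversampling is relevant.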

In certain embodiments, the invention comprises a library of synthetic polynucleotides, wherein said polynucleotides encode a plurality of antibody VKCDR3 amino acid sequences comprising about 1 to about 10 of the amino acids found at Kabat positions 89, 90, 91, 92, 93, 94, 95, 95A, 95B, 95C, 96, and 97 in selected VKCDR3 sequences encoded by a single germline sequence.

In some embodiments, the invention relates to a library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, wherein the library has a theoretical total diversity of about 10⁶ to about 10¹⁵ unique CDRH3 sequences.

In still other embodiments, the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody VK amino acid sequences, the method comprising:

- (i) providing polynucleotide sequences encoding:
  - (a) one or more VK_Chassis amino acid sequences selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded by IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NOs: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NO: 477), and IGKV6D-41, or a sequence at least about 80% identical to any of them;
  - (b) one or more L3-VK amino acid sequences, wherein L3-VK is the portion of the VKCDR3 amino acid sequence encoded by the IGKV gene segment;
  - (c) one or more X residues, wherein X is any amino acid residue; and
  - (d) one or more JK* amino acid sequences, wherein JK* is an amino acid sequence selected from the group consisting of amino acid sequences encoded by IGKJ1 (SEQ ID NO: 552), IGKJ2 (SEQ ID NO: 553), IGKJ3 (SEQ ID NO: 554), IGKJ4 (SEQ ID NO: 555), and IGKJ5 (SEQ ID NO: 556), wherein the first amino acid residue of each sequence is not present; and
- (ii) assembling the polynucleotide sequences to produce a library of synthetic polynucleotides encoding a plurality of human VK sequences represented by the following formula:

[VK_Chassis]-[L3-VK]-[X]-[JK*].
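The formula above can be read as a combinatorial concatenation. The following is a minimal sketch of that reading, not the assembly procedure of the invention: all input strings are hypothetical placeholders (they are not taken from the Sequence Listing), `x_length` is an illustrative parameter for the number of X residues, and JK* is modeled simply as a JK sequence with its first residue removed, as stated in step (d).

```python
from itertools import product

# The 20 standard amino acids, used for the "any amino acid" X position(s).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def jk_star(jk: str) -> str:
    """JK*: the JK-encoded sequence without its first amino acid residue."""
    return jk[1:]


def assemble_vk(chassis, l3_vk, jk, x_length=1):
    """Yield every [VK_Chassis]-[L3-VK]-[X]-[JK*] concatenation.

    chassis, l3_vk, jk: iterables of amino acid strings (placeholders here).
    x_length: number of X residues (hypothetical parameter).
    """
    for c, l3, j in product(chassis, l3_vk, jk):
        for x in product(AMINO_ACIDS, repeat=x_length):
            yield c + l3 + "".join(x) + jk_star(j)


if __name__ == "__main__":
    # Placeholder strings for illustration only.
    chassis = ["<chassisA>", "<chassisB>"]
    l3_vk = ["QQYNS"]
    jk = ["JAAAA", "JBBBB"]  # the leading residue is dropped to form JK*
    library = list(assemble_vk(chassis, l3_vk, jk))
    print(len(library))  # 2 chassis x 1 L3-VK x 20 X x 2 JK* = 80
    print(library[0])
```

Because the number of members grows as 20 raised to the number of X residues, the choice of `x_length` dominates the theoretical diversity in this toy model.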

In some embodiments, the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody light chain CDR3 sequences, the method comprising:

- (i) determining the percent occurrence of each amino acid residue at each position in selected light chain CDR3 amino acid sequences derived from a single germline polynucleotide sequence;
- (ii) designing synthetic polynucleotides encoding a plurality of human antibody light chain CDR3 amino acid sequences, wherein the percent occurrence of any amino acid at any position within the designed light chain CDR3 amino acid sequences is within at least about 30% of the percent occurrence in the selected light chain CDR3 amino acid sequences derived from a single germline polynucleotide sequence, as determined in (i); and
- (iii) synthesizing one or more polynucleotides that were designed in (ii).
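Steps (i) and (ii) above amount to building a per-position amino acid frequency table and comparing a designed set against it. The following is a minimal sketch under two stated assumptions: the sequences being compared have equal length, and "within about 30%" is interpreted as an absolute difference of at most 30 percentage points (the text does not say whether the tolerance is absolute or relative). The sequences and function names are hypothetical.

```python
from collections import Counter


def positional_percentages(seqs):
    """Percent occurrence of each residue at each position.

    Assumes all sequences share one length (e.g., one CDR3 length class).
    Returns a list with one {residue: percent} dict per position.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must share one length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        table.append({aa: 100.0 * n / len(seqs) for aa, n in counts.items()})
    return table


def within_tolerance(reference, designed, tolerance=30.0):
    """True if, at every position, each residue's percent occurrence in
    `designed` is within `tolerance` percentage points of `reference`.
    (Absolute-difference interpretation; an assumption.)
    """
    for ref_pos, des_pos in zip(reference, designed):
        for aa in set(ref_pos) | set(des_pos):
            if abs(ref_pos.get(aa, 0.0) - des_pos.get(aa, 0.0)) > tolerance:
                return False
    return True


if __name__ == "__main__":
    # Toy sequences, not real germline-derived data.
    selected = ["QQYNS", "QQYDS", "QQYNT", "QHYNS"]
    designed = ["QQYNS", "QQYDS", "QQYNS", "QQYNT"]
    ref = positional_percentages(selected)
    des = positional_percentages(designed)
    print(within_tolerance(ref, des))
```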

In other embodiments, the invention relates to a method of preparing a library of synthetic polynucleotides encoding a plurality of antibody Vλ amino acid sequences, the method comprising:

- (i) providing polynucleotide sequences encoding:
  - (a) one or more Vλ_Chassis amino acid sequences selected from the group consisting of about Kabat residue 1 to about Kabat residue 88 encoded by IGλV1-36 (SEQ ID NO: 480), IGλV1-40 (SEQ ID NO: 531), IGλV1-44 (SEQ ID NO: 532), IGλV1-47 (SEQ ID NO: 481), IGλV1-51 (SEQ ID NO: 533), IGλV10-54 (SEQ ID NO: 482), IGλV2-11 (SEQ ID NOs: 483, 484), IGλV2-14 (SEQ ID NO: 534), IGλV2-18 (SEQ ID NO: 485), IGλV2-23 (SEQ ID NOs: 486, 487), IGλV2-8 (SEQ ID NO: 488), IGλV3-1 (SEQ ID NO: 535), IGλV3-10 (SEQ ID NO: 489), IGλV3-12 (SEQ ID NO: 490), IGλV3-16 (SEQ ID NO: 491), IGλV3-19 (SEQ ID NO: 536), IGλV3-21 (SEQ ID NO: 537), IGλV3-25 (SEQ ID NO: 492), IGλV3-27 (SEQ ID NO: 493), IGλV3-9 (SEQ ID NO: 494), IGλV4-3 (SEQ ID NO: 495), IGλV4-60 (SEQ ID NO: 496), IGλV4-69 (SEQ ID NO: 538), IGλV5-39 (SEQ ID NO: 497), IGλV5-45 (SEQ ID NO: 540), IGλV6-57 (SEQ ID NO: 539), IGλV7-43 (SEQ ID NO: 541), IGλV7-46 (SEQ ID NO: 498), IGλV8-61 (SEQ ID NO: 499), IGλV9-49 (SEQ ID NO: 500), and IGλV10-54 (SEQ ID NO: 482), or a sequence at least about 80% identical to any of them;
  - (b) one or more L3-Vλ sequences, wherein L3-Vλ is the portion of the VλCDR3 amino acid sequence encoded by the IGλV gene segment;
  - (c) one or more Jλ sequences, wherein Jλ is an amino acid sequence selected from the group consisting of amino acid sequences encoded by IGλJ1-01 (SEQ ID NO: 557), IGλJ2-01 (SEQ ID NO: 558), IGλJ3-01 (SEQ ID NO: 559), IGλJ3-02 (SEQ ID NO: 560), IGλJ6-01 (SEQ ID NO: 561), IGλJ7-01 (SEQ ID NO: 562), and IGλJ7-02 (SEQ ID NO: 563), wherein the first amino acid residue of each sequence may or may not be present; and
- (ii) assembling the polynucleotide sequences to produce a library of synthetic polynucleotides encoding a plurality of human Vλ amino acid sequences represented by the following formula:

[Vλ_Chassis]-[L3-Vλ]-[Jλ].

In certain embodiments, the amino acid sequences encoded by the polynucleotides of the libraries of the invention are human.

The present invention is also directed to methods of preparing a synthetic polynucleotide library comprising providing and assembling the polynucleotide sequences of the instant invention.

In another aspect, the invention comprises a method of preparing the library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, the method comprising:

- (i) providing polynucleotide sequences encoding:
  - (a) one or more N1 amino acid sequences of about 0 to about 3 amino acids, wherein each amino acid of the N1 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N1 sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells;
  - (b) one or more human CDRH3 DH amino acid sequences, N- and C-terminal truncations thereof, or a sequence of at least about 80% identity to any of them;
  - (c) one or more N2 amino acid sequences of about 0 to about 3 amino acids, wherein each amino acid of the N2 amino acid sequence is among the 12 most frequently occurring amino acids at the corresponding position in N2 amino acid sequences of CDRH3 amino acid sequences that are functionally expressed by human B cells; and
  - (d) one or more human CDRH3 H3-JH amino acid sequences, N-terminal truncations thereof, or a sequence of at least about 80% identity to any of them; and
- (ii) assembling the polynucleotide sequences to produce a library of synthetic polynucleotides encoding a plurality of human antibody CDRH3 amino acid sequences represented by the following formula:

[N1]-[DH]-[N2]-[H3-JH].
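The formula above implies that the theoretical CDRH3 diversity is bounded by the product of the sizes of the four segment sets, with DH contributing N- and C-terminal truncations and H3-JH contributing N-terminal truncations. The following is a minimal sketch of that enumeration; the segment strings, the `min_len` parameter, and the function names are hypothetical and are not taken from the Sequence Listing.

```python
from itertools import product


def truncations(segment, min_len=1):
    """The segment plus all of its N- and/or C-terminal truncations
    (i.e., contiguous substrings) of at least `min_len` residues."""
    n = len(segment)
    return {segment[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


def n_terminal_truncations(segment, min_len=1):
    """The segment plus its N-terminal truncations (C-terminus is kept)."""
    return {segment[i:] for i in range(len(segment) - min_len + 1)}


def assemble_cdrh3(n1_set, dh_set, n2_set, h3jh_set):
    """All unique [N1]-[DH]-[N2]-[H3-JH] concatenations."""
    return {"".join(p) for p in product(n1_set, dh_set, n2_set, h3jh_set)}


if __name__ == "__main__":
    # Toy inputs for illustration only.
    n1 = {"", "G", "GS"}                           # 0 to 2 residues
    dh = truncations("YSSGW", min_len=3)           # one DH and its truncations
    n2 = {"", "P"}
    h3jh = n_terminal_truncations("YAFDI", min_len=2)
    library = assemble_cdrh3(n1, dh, n2, h3jh)
    upper_bound = len(n1) * len(dh) * len(n2) * len(h3jh)
    # Distinct segment combinations can yield the same amino acid sequence,
    # so the unique count may be smaller than the product of set sizes.
    print(len(library), upper_bound)
```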

In one aspect, one or more of the polynucleotide sequences are synthesized via split-pool synthesis.

In another aspect, the method of the invention further comprises the step of recombining the assembled synthetic polynucleotides with a vector comprising a heavy chain chassis and a heavy chain constant region, to form a full-length heavy chain.

In another aspect, the method of the invention further comprises the step of providing a 5′ polynucleotide sequence and a 3′ polynucleotide sequence that facilitate homologous recombination. In still another aspect, the method of the invention further comprises the step of recombining the assembled synthetic polynucleotides with a vector comprising a heavy chain chassis and a heavy chain constant region, to form a full-length heavy chain.

In some embodiments, the step of recombining is performed in yeast. In certain embodiments, the yeast is S. cerevisiae.

In certain other embodiments, the invention comprises a method of isolating one or more host cells expressing one or more antibodies, the method comprising:

- (i) expressing the human antibodies as described herein in one or more host cells;
- (ii) contacting the host cells with one or more antigens; and
- (iii) isolating one or more host cells having antibodies that bind to the one or more antigens.

In another aspect, the method of the invention further comprises the step of isolating one or more antibodies from the one or more host cells that present the antibodies which recognize the one or more antigens. In yet another aspect, the method of the invention further comprises the step of isolating one or more polynucleotide sequences encoding one or more antibodies from the one or more host cells that present the antibodies which recognize the one or more antigens.

In certain other embodiments, the invention comprises a kit comprising the library of synthetic polynucleotides encoding a plurality of antibody CDRH3 amino acid sequences, or any of the other sequences disclosed herein.

In still other aspects, the CDRH3 amino acid sequences encoded by the libraries of synthetic polynucleotides described herein, or any of the other sequences disclosed herein, are in computer readable form.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a schematic of recombination between a fragment (e.g., CDR3) and a vector (e.g., comprising a chassis and constant region) for the construction of a library.

FIG. 2 depicts the length distribution of the N1 and N2 regions of rearranged human antibody sequences compiled from Jackson et al. (J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety).

FIG. 3 depicts the length distribution of the CDRL3 regions of rearranged human kappa light chain sequences compiled from the NCBI database (Appendix A).

FIG. 4 depicts the length distribution of the CDRL3 regions of rearranged human lambda light chain sequences compiled from the NCBI database (Appendix B).

FIG. 5 depicts a schematic representation of the 424 cloning vectors used in the synthesis of the CDRH3 regions before and after ligation of the [DH]-[N2]-[JH] segment (DTAVYYCAR: SEQ ID NO: 579; DTAVYYCAK: SEQ ID NO: 578; SSASTK: SEQ ID NO: 580).

FIG. 6 depicts a schematic structure of a heavy chain vector, prior to recombination with a CDRH3 (DTAVYYCAK: SEQ ID NO: 578; VTVSS: SEQ ID NO: 1524).

FIG. 7 depicts a schematic diagram of a CDRH3 integrated into a heavy chain vector and the polynucleotide and polypeptide sequences of CDRH3 (amino acid: SEQ ID NO: 1387; coding strand: SEQ ID NO: 581; complementary strand: SEQ ID NO: 1388).

FIG. 8 depicts a schematic structure of a kappa light chain vector, prior to recombination with a CDRL3.

FIG. 9 depicts a schematic diagram of a CDRL3 integrated into a light chain vector and the polynucleotide and polypeptide sequences of CDRL3 (amino acid: SEQ ID NO: 1389; coding strand: SEQ ID NO: 582; complementary strand: SEQ ID NO: 1390).

FIG. 10 depicts the length distribution of the CDRH3 domain (Kabat positions 95-102) from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.

FIG. 11 depicts the length distribution of the DH segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.

FIG. 12 depicts the length distribution of the N2 segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.

FIG. 13 depicts the length distribution of the H3-JH segment from 96 colonies obtained by transformation with 10 of the 424 vectors synthesized as described in Example 10 (observed), as compared to the expected (i.e., designed) distribution.

FIG. 14 depicts the length distribution of the CDRH3 domains from 291 sequences prepared from yeast cells transformed according to the method outlined in Example 10.4, namely the co-transformation of vectors containing heavy chain chassis and constant regions with a CDRH3 insert (observed), as compared to the expected (i.e., designed) distribution.

FIG. 15 depicts the length distribution of the [Tail]-[N1] region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the expected (i.e., designed) distribution.

FIG. 16 depicts the length distribution of the DH region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.

FIG. 17 depicts the length distribution of the N2 region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.

FIG. 18 depicts the length distribution of the H3-JH region from the 291 sequences prepared from yeast cells transformed according to the protocol outlined in Example 10.4 (observed), as compared to the theoretical (i.e., designed) distribution.

FIG. 19 depicts the familial origin of the JH segments identified in the 291 sequences (observed), as compared to the theoretical (i.e., designed) familial origin.

FIG. 20 depicts the representation of each of the 16 chassis of the library (observed), as compared to the theoretical (i.e., designed) chassis representation. VH3-23 is represented twice; once ending in CAR and once ending in CAK. These representations were combined, as were the ten variants of VH3-33 with one variant of VH3-30.

FIG. 21 depicts a comparison of the CDRL3 length from 86 sequences selected from the VKCDR3 library of Example 6.2 (observed) to human sequences (human) and the designed sequences (designed).

FIG. 22 depicts the representation of the light chain chassis amongst the 86 sequences selected from the library (observed), as compared to the theoretical (i.e., designed) chassis representation.

FIG. 23 depicts the frequency of occurrence of different CDRH3 lengths in an exemplary library of the invention, versus the preimmune repertoire of Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety).

FIG. 24 depicts binding curves for 6 antibodies selected from a library of the invention.

FIG. 25 depicts binding data for 10 antibodies selected from a library of the invention binding to hen egg white lysozyme.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is directed to, at least, synthetic polynucleotide libraries, methods of producing and using the libraries of the invention, kits and computer readable forms including the libraries of the invention. The libraries taught in this application are described, at least in part, in terms of the components from which they are assembled.

In certain embodiments, the instant invention provides antibody libraries specifically designed based on the composition and CDR length distribution in the naturally occurring human antibody repertoire. It is estimated that, even in the absence of antigenic stimulation, a human makes at least about 10⁷ different antibody molecules. The antigen-binding sites of many antibodies can cross-react with a variety of related but different epitopes. In addition, the human antibody repertoire is large enough to ensure that there is an antigen-binding site to fit almost any potential epitope, albeit with low affinity.

The mammalian immune system has evolved unique genetic mechanisms that enable it to generate an almost unlimited number of different light and heavy chains in a remarkably economical way, by combinatorially joining chromosomally separated gene segments prior to transcription. Each type of immunoglobulin (Ig) chain (i.e., κ light, λ light, and heavy) is synthesized by combinatorial assembly of DNA sequences selected from two or more families of gene segments, to produce a single polypeptide chain. Specifically, the heavy chains and light chains each consist of a variable region and a constant (C) region. The variable regions of the heavy chains are encoded by DNA sequences assembled from three families of gene segments: variable (IGHV), joining (IGHJ) and diversity (IGHD). The variable regions of light chains are encoded by DNA sequences assembled from two families of gene segments for each of the kappa and lambda light chains: variable (IGLV) and joining (IGLJ). Each variable region (heavy and light) is also recombined with a constant region, to produce a full-length immunoglobulin chain.

While combinatorial assembly of the V, D and J gene segments makes a substantial contribution to antibody variable region diversity, further diversity is introduced in vivo, at the pre-B cell stage, via imprecise joining of these gene segments and the introduction of non-templated nucleotides at the junctions between the gene segments.

After a B cell recognizes an antigen, it is induced to proliferate. During proliferation, the B cell receptor locus undergoes an extremely high rate of somatic mutation that is far greater than the normal rate of genomic mutation. The mutations that occur are primarily localized to the Ig variable regions and comprise substitutions, insertions and deletions. This somatic hypermutation enables the production of B cells that express antibodies possessing enhanced affinity toward an antigen. Such antigen-driven somatic hypermutation fine-tunes antibody responses to a given antigen.

Significant efforts have been made to create antibody libraries with extensive diversity, and to mimic the natural process of affinity maturation of antibodies against various antigens, especially antigens associated with diseases such as autoimmune diseases, cancer, and infectious disease. Antibody libraries comprising candidate binding molecules that can be readily screened against targets are desirable. However, the full promise of an antibody library, which is representative of the preimmune human antibody repertoire, has remained elusive. In addition to the shortcomings enumerated above, and throughout the application, synthetic libraries that are known in the art often suffer from noise (i.e., very large libraries increase the presence of many sequences which do not express well, and/or which misfold), while entirely human libraries that are known in the art may be biased against certain antigen classes (e.g., self-antigens). Moreover, the limitations of synthesis and physical realization techniques restrict the functional diversity of antibody libraries of the art. The present invention provides, for the first time, a fully synthetic antibody library that is representative of the human preimmune antibody repertoire (e.g., in composition and length), and that can be readily screened (i.e., it is physically realizable and, in some cases, can be oversampled) using, for example, high throughput methods, to obtain, for example, new therapeutics and/or diagnostics.

In particular, the synthetic antibody libraries of the instant invention have the potential to recognize any antigen, including self-antigens of human origin. The ability to recognize self-antigens is usually lost in an expressed human library, because self-reactive antibodies are removed by the donor's immune system via negative selection. Another feature of the invention is that screening the antibody library using positive clone selection, for example, by FACS (fluorescence-activated cell sorter), bypasses the standard and tedious methodology of generating a hybridoma library and supernatant screening. Still further, the libraries, or sub-libraries thereof, can be screened multiple times, to discover additional antibodies against other desired targets.

Before further description of the invention, certain terms are defined.

1. Definitions

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art relevant to the invention. The definitions below supplement those in the art and are directed to the embodiments described in the current application.

The term “antibody” is used herein in the broadest sense and specifically encompasses at least monoclonal antibodies, polyclonal antibodies, multi-specific antibodies (e.g., bispecific antibodies), chimeric antibodies, humanized antibodies, human antibodies, and antibody fragments. An antibody is a protein comprising one or more polypeptides substantially or partially encoded by immunoglobulin genes or fragments of immunoglobulin genes. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as myriad immunoglobulin variable region genes.

“Antibody fragments” comprise a portion of an intact antibody, for example, one or more portions of the antigen-binding region thereof. Examples of antibody fragments include Fab, Fab′, F(ab′)₂, and Fv fragments, diabodies, linear antibodies, single-chain antibodies, and multi-specific antibodies formed from intact antibodies and antibody fragments.

An “intact antibody” is one comprising full-length heavy- and light-chains and an Fc region. An intact antibody is also referred to as a “full-length, heterodimeric” antibody or immunoglobulin.

The term “variable” refers to the portions of the immunoglobulin domains that exhibit variability in their sequence and that are involved in determining the specificity and binding affinity of a particular antibody (i.e., the “variable domain(s)”). Variability is not evenly distributed throughout the variable domains of antibodies; it is concentrated in sub-domains of each of the heavy and light chain variable regions. These sub-domains are called “hypervariable” regions or “complementarity determining regions” (CDRs). The more conserved (i.e., non-hypervariable) portions of the variable domains are called the “framework” regions (FRM). The variable domains of naturally occurring heavy and light chains each comprise four FRM regions, largely adopting a β-sheet configuration, connected by three hypervariable regions, which form loops connecting, and in some cases forming part of, the β-sheet structure. The hypervariable regions in each chain are held together in close proximity by the FRM and, with the hypervariable regions from the other chain, contribute to the formation of the antigen-binding site (see Kabat et al., Sequences of Proteins of Immunological Interest, 5th Ed., Public Health Service, National Institutes of Health, Bethesda, Md., 1991, incorporated by reference in its entirety). The constant domains are not directly involved in antigen binding, but exhibit various effector functions, such as, for example, antibody-dependent, cell-mediated cytotoxicity and complement activation.

The “chassis” of the invention represent a portion of the antibody heavy chain variable (IGHV) or light chain variable (IGLV) domains that are not part of CDRH3 or CDRL3, respectively. The chassis of the invention is defined as the portion of the variable region of an antibody beginning with the first amino acid of FRM1 and ending with the last amino acid of FRM3. In the case of the heavy chain, the chassis includes the amino acids including from about Kabat position 1 to about Kabat position 94. In the case of the light chains (kappa and lambda), the chassis are defined as including from about Kabat position 1 to about Kabat position 88. The chassis of the invention may contain certain modifications relative to the corresponding germline variable domain sequences presented herein or available in public databases. These modifications may be engineered (e.g., to remove N-linked glycosylation sites) or naturally occurring (e.g., to account for allelic variation). For example, it is known in the art that the immunoglobulin gene repertoire is polymorphic (Wang et al., Immunol. Cell. Biol., 2008, 86: 111; Collins et al., Immunogenetics, 2008, DOI 10.1007/s00251-008-0325-z, published online, each incorporated by reference in its entirety); chassis, CDRs (e.g., CDRH3) and constant regions representative of these allelic variants are also encompassed by the invention. In some embodiments, the allelic variant(s) used in a particular embodiment of the invention may be selected based on the allelic variation present in different patient populations, for example, to identify antibodies that are non-immunogenic in these patient populations. In certain embodiments, the immunogenicity of an antibody of the invention may depend on allelic variation in the major histocompatibility complex (MHC) genes of a patient population. Such allelic variation may also be considered in the design of libraries of the invention. 
In certain embodiments of the invention, the chassis and constant regions are contained on a vector, and a CDR3 region is introduced between them via homologous recombination.

In some embodiments, one, two or three nucleotides may follow the heavy chain chassis, forming either a partial (if one or two) or a complete (if three) codon. When a full codon is present, these nucleotides encode an amino acid residue that is referred to as the “tail,” and occupies position 95.

The “CDRH3 numbering system” used herein defines the first amino acid of CDRH3 as being at Kabat position 95 (the “tail,” when present) and the last amino acid of CDRH3 as position 102. The amino acids following the “tail” are called “N1” and, when present, are assigned numbers 96, 96A, 96B, etc. The N1 segment is followed by the “DH” segment, which is assigned numbers 97, 97A, 97B, 97C, etc. The DH segment is followed by the “N2” segment, which, when present, is numbered 98, 98A, 98B, etc. Finally, the most C-terminal amino acid residue of the set of the “H3-JH” segment is designated as number 102. The residue directly before (N-terminal) it, when present, is 101, and the one before (if present) is 100. For reasons of convenience, and which will become apparent elsewhere, the rest of the H3-JH amino acids are numbered in reverse order, beginning with 99 for the amino acid just N-terminal to 100, 99A for the residue N-terminal to 99, and so forth for 99B, 99C, etc. Examples of certain CDRH3 sequence residue numbers may therefore include the following:

13 Amino Acid CDR-H3 with N1 and N2

[Embedded image: residue numbering example, not reproduced]

Amino Acid CDR-H3 without N1 and N2

[Embedded image: residue numbering example, not reproduced]
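The CDRH3 numbering system described above can be sketched in code. This is a minimal illustration of the stated rules only (tail at 95; N1 at 96, 96A, ...; DH at 97, 97A, ...; N2 at 98, 98A, ...; H3-JH numbered backward from 102 through 101, 100, 99, 99A, ...). The segment strings and the function name `number_cdrh3` are hypothetical, and the output is not a reproduction of the examples in the original figures.

```python
from string import ascii_uppercase


def _labels(base, count):
    """Return [base, baseA, baseB, ...] for `count` residues.

    Supports up to 27 residues per segment (no suffix, then A to Z).
    """
    suffixes = [""] + list(ascii_uppercase)
    return [f"{base}{suffixes[i]}" for i in range(count)]


def number_cdrh3(tail, n1, dh, n2, h3jh):
    """Pair each CDRH3 residue with its position label.

    tail: "" or a single residue (position 95 when present).
    n1, dh, n2, h3jh: amino acid strings; n1 and n2 may be empty.
    """
    labels = []
    if tail:
        labels.append("95")
    labels += _labels(96, len(n1))
    labels += _labels(97, len(dh))
    labels += _labels(98, len(n2))
    # H3-JH is numbered from its C-terminus: 102, 101, 100, 99, 99A, 99B, ...
    from_c_terminus = ["102", "101", "100"] + _labels(99, max(len(h3jh) - 3, 0))
    labels += reversed(from_c_terminus[: len(h3jh)])
    residues = tail + n1 + dh + n2 + h3jh
    return list(zip(labels, residues))


if __name__ == "__main__":
    # Toy 13-residue CDRH3 with N1 and N2 (placeholder segments).
    for label, residue in number_cdrh3("E", "G", "YSSGW", "P", "YAFDI"):
        print(label, residue)
```

Numbering H3-JH from the C-terminus keeps positions 100 to 102 fixed on the conserved end of the segment regardless of how far the segment is truncated at its N-terminus.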

As used herein, the term “diversity” refers to a variety or a noticeable heterogeneity. The term “sequence diversity” refers to a variety of sequences which are collectively representative of several possibilities of sequences, for example, those found in natural human antibodies. For example, heavy chain CDR3 (CDRH3) sequence diversity may refer to a variety of possibilities of combining the known human DH and H3-JH segments, including the N1 and N2 regions, to form heavy chain CDR3 sequences. The light chain CDR3 (CDRL3) sequence diversity may refer to a variety of possibilities of combining the naturally occurring light chain variable region contributing to CDRL3 (i.e., L3-VL) and joining (i.e., L3-JL) segments, to form light chain CDR3 sequences. As used herein, H3-JH refers to the portion of the IGHJ gene contributing to CDRH3. As used herein, L3-VL and L3-JL refer to the portions of the IGLV and IGLJ genes (kappa or lambda) contributing to CDRL3, respectively.

As used herein, the term “expression” includes any step involved in the production of a polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.

As used herein, the term “host cell” is intended to refer to a cell into which a polynucleotide of the invention has been introduced. It should be understood that such terms refer not only to the particular subject cell but to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.

The term “length diversity” refers to a variety in the length of a particular nucleotide or amino acid sequence. For example, in naturally occurring human antibodies, the heavy chain CDR3 sequence varies in length, for example, from about 3 amino acids to over about 35 amino acids, and the light chain CDR3 sequence varies in length, for example, from about 5 to about 16 amino acids. Prior to the instant invention, it was known in the art that it is possible to design antibody libraries containing sequence diversity or length diversity (see, e.g., Hoet et al., Nat. Biotechnol., 2005, 23: 344; Kretzschmar and von Ruden, Curr. Opin. Biotechnol., 2002, 13: 598; and Rauchenberger et al., J. Biol. Chem., 2003, 278: 38194, each of which is incorporated by reference in its entirety); however, the instant invention is directed to, at least, the design of synthetic antibody libraries containing the sequence diversity and length diversity of naturally occurring human sequences. In some cases, synthetic libraries containing sequence and length diversity have been synthesized; however, these libraries contain too much theoretical diversity to synthesize the entire designed repertoire and/or too many theoretical members to physically realize or oversample the entire library.

As used herein, a sequence designed with “directed diversity” has been specifically designed to contain both sequence diversity and length diversity. Directed diversity is not stochastic.

As used herein, “stochastic” describes a process of generating a randomly determined sequence of amino acids, which is considered as a sample of one element from a probability distribution.

The term “library of polynucleotides” refers to two or more polynucleotides having a diversity as described herein, specifically designed according to the methods of the invention. The term “library of polypeptides” refers to two or more polypeptides having a diversity as described herein, specifically designed according to the methods of the invention. The term “library of synthetic polynucleotides” refers to a polynucleotide library that includes synthetic polynucleotides. The term “library of vectors” refers herein to a library of at least two different vectors. As used herein, the term “human antibody libraries,” at least includes, a polynucleotide or polypeptide library which has been designed to represent the sequence diversity and length diversity of naturally occurring human antibodies.

As described throughout the specification, the term “library” is used herein in its broadest sense, and also may include the sub-libraries that may or may not be combined to produce libraries of the invention.

As used herein, the term “synthetic polynucleotide” refers to a molecule formed through a chemical process, as opposed to molecules of natural origin, or molecules derived via template-based amplification of molecules of natural origin (e.g., immunoglobulin chains cloned from populations of B cells via PCR amplification are not “synthetic” as used herein). In some instances, for example, when referring to libraries of the invention that comprise multiple components (e.g., N1, DH, N2, and/or H3-JH), the invention encompasses libraries in which at least one of the aforementioned components is synthetic. By way of illustration, a library in which certain components are synthetic, while other components are of natural origin or derived via template-based amplification of molecules of natural origin, would be encompassed by the invention.

The term “split-pool synthesis” refers to a procedure in which the products of a plurality of first reactions are combined (pooled) and then separated (split) before participating in a plurality of second reactions. Example 9 describes the synthesis of 278 DH segments (products), each in a separate reaction. After synthesis, these 278 segments are combined (pooled) and then distributed (split) amongst 141 columns for the synthesis of the N2 segments. This enables the pairing of each of the 278 DH segments with each of the 141 N2 segments. As described elsewhere in the specification, these numbers are non-limiting.
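For the purposes of illustration only, the pairing that results from pooling and splitting can be sketched in Python as follows. The segment labels are placeholders rather than actual sequences, and the function name `split_pool_pairs` is an assumption introduced for this sketch; it models only the combinatorial outcome, not the chemistry.

```python
from itertools import product


def split_pool_pairs(first_products, second_products):
    """Pair every pooled first-round product with every second-round product.

    Pooling mixes all first-round products; splitting the pool across the
    second-round reactions exposes each of them to each second-round step.
    """
    return list(product(first_products, second_products))


# Placeholder labels only (not real DH or N2 sequences).
dh_segments = [f"DH_{i}" for i in range(278)]
n2_segments = [f"N2_{j}" for j in range(141)]

pairs = split_pool_pairs(dh_segments, n2_segments)
print(len(pairs))  # 39198, i.e. 278 x 141 DH-N2 pairings
```

Under these assumptions, 278 first-round reactions and 141 second-round reactions yield 39,198 pairings from only 419 reactions.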

“Preimmune” antibody libraries have similar sequence diversities and length diversities to naturally occurring human antibody sequences before these sequences have undergone negative selection or somatic hypermutation. For example, the set of sequences described in Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety) is believed to represent sequences from the preimmune repertoire. In certain embodiments of the invention, the sequences of the invention will be similar to these sequences (e.g., in terms of composition and length). In certain embodiments of the invention, such antibody libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode antibodies with the potential to recognize any antigen. In one embodiment of the invention, an antibody library comprises about 10⁷ to about 10²⁰ different antibodies and/or polynucleotide sequences encoding the antibodies of the library. In some embodiments, the libraries of the instant invention are designed to include 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰ different antibodies and/or polynucleotide sequences encoding the antibodies. In certain embodiments, the libraries of the invention may comprise or encode about 10³ to about 10⁵, about 10⁵ to about 10⁷, about 10⁷ to about 10⁹, about 10⁹ to about 10¹¹, about 10¹¹ to about 10¹³, about 10¹³ to about 10¹⁵, about 10¹⁵ to about 10¹⁷, or about 10¹⁷ to about 10²⁰ different antibodies. In certain embodiments of the invention, the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰ or less than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰.
In certain other embodiments of the invention, the probability of an antibody of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library). The antibody libraries of the invention may also include antibodies directed to, for example, self (i.e., human) antigens. The antibodies of the present invention may not be present in expressed human libraries for reasons including that self-reactive antibodies are removed by the donor's immune system via negative selection. However, novel heavy/light chain pairings may in some cases create self-reactive antibody specificity (Griffiths et al. U.S. Pat. No. 5,885,793, incorporated by reference in its entirety). In certain embodiments of the invention, the number of unique heavy chains in a library may be about 10, 50, 10², 150, 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, 10²⁰, or more. In certain embodiments of the invention, the number of unique light chains in a library may be about 5, 10, 25, 50, 10², 150, 500, 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, 10²⁰, or more.

As used herein, the term “human antibody CDRH3 libraries” at least includes a polynucleotide or polypeptide library which has been designed to represent the sequence diversity and length diversity of naturally occurring human antibodies. “Preimmune” CDRH3 libraries have similar sequence diversities and length diversities to naturally occurring human antibody CDRH3 sequences before these sequences undergo negative selection and somatic hypermutation. Known human CDRH3 sequences are represented in various data sets, including Jackson et al., J. Immunol Methods, 2007, 324: 26; Martin, Proteins, 1996, 25: 130; and Lee et al., Immunogenetics, 2006, 57: 917, each of which is incorporated by reference in its entirety. In certain embodiments of the invention, such CDRH3 libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode CDRH3s with the potential to recognize any antigen. In one embodiment of the invention, an antibody library includes about 10⁶ to about 10¹⁵ different CDRH3 sequences and/or polynucleotide sequences encoding said CDRH3 sequences. In some embodiments, the libraries of the instant invention are designed to include about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or 10¹⁶ different CDRH3 sequences and/or polynucleotide sequences encoding said CDRH3 sequences. In some embodiments, the libraries of the invention may include or encode about 10³ to about 10⁶, about 10⁶ to about 10⁸, about 10⁸ to about 10¹⁰, about 10¹⁰ to about 10¹², about 10¹² to about 10¹⁴, or about 10¹⁴ to about 10¹⁶ different CDRH3 sequences.
In certain embodiments of the invention, the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or 10¹⁶ or less than about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, or 10¹⁶. In certain embodiments of the invention, the probability of a CDRH3 of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library). The preimmune CDRH3 libraries of the invention may also include CDRH3s directed to, for example, self (i.e., human) antigens. Such CDRH3s may not be present in expressed human libraries, because self-reactive CDRH3s are removed by the donor's immune system via negative selection.

Libraries of the invention containing “VKCDR3” sequences and “VλCDR3” sequences refer to the kappa and lambda sub-sets of the CDRL3 sequences, respectively. These libraries may be designed with directed diversity, to collectively represent the length and sequence diversity of the human antibody CDRL3 repertoire. “Preimmune” versions of these libraries have similar sequence diversities and length diversities to naturally occurring human antibody CDRL3 sequences before these sequences undergo negative selection. Known human CDRL3 sequences are represented in various data sets, including the NCBI database (see Appendix A and Appendix B for light chain sequence data sets) and Martin, Proteins, 1996, 25: 130, incorporated by reference in its entirety. In certain embodiments of the invention, such CDRL3 libraries are designed to be small enough to chemically synthesize and physically realize, but large enough to encode CDRL3s with the potential to recognize any antigen.

In one embodiment of the invention, an antibody library comprises about 10⁵ different CDRL3 sequences and/or polynucleotide sequences encoding said CDRL3 sequences. In some embodiments, the libraries of the instant invention are designed to comprise about 10¹, 10², 10³, 10⁴, 10⁶, 10⁷, or 10⁸ different CDRL3 sequences and/or polynucleotide sequences encoding said CDRL3 sequences. In some embodiments, the libraries of the invention may comprise or encode about 10¹ to about 10³, about 10³ to about 10⁵, or about 10⁵ to about 10⁸ different CDRL3 sequences. In certain embodiments of the invention, the diversity of the libraries may be characterized as being greater than or less than one or more of the diversities enumerated above, for example greater than about 10¹, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, or 10⁸ or less than about 10¹, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, or 10⁸. In certain embodiments of the invention, the probability of a CDRL3 of interest being present in a physical realization of a library with a size as enumerated above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% (see Library Sampling, in the Detailed Description, for more information on the probability of a particular sequence being present in a physical realization of a library). The preimmune CDRL3 libraries of the invention may also include CDRL3s directed to, for example, self (i.e., human) antigens. Such CDRL3s may not be present in expressed human libraries, because self-reactive CDRL3s are removed by the donor's immune system via negative selection.

As used herein, the term “known heavy chain CDR3 sequences” refers to heavy chain CDR3 sequences in the public domain that have been cloned from populations of human B cells. Examples of such sequences are those published or derived from public data sets, including, for example, Zemlin et al., JMB, 2003, 334: 733; Lee et al., Immunogenetics, 2006, 57: 917; and Jackson et al., J. Immunol Methods, 2007, 324: 26, each of which is incorporated by reference in its entirety.

As used herein, the term “known light chain CDR3 sequences” refers to light chain CDR3 sequences (e.g., kappa or lambda) in the public domain that have been cloned from populations of human B cells. Examples of such sequences are those published or derived from public data sets, including, for example, the NCBI database (see Appendices A and B filed herewith).

As used herein the term “antibody binding regions” refers to one or more portions of an immunoglobulin or antibody variable region capable of binding an antigen(s). Typically, the antibody binding region is, for example, an antibody light chain (or variable region or one or more CDRs thereof), an antibody heavy chain (or variable region or one or more CDRs thereof), a heavy chain Fd region, a combined antibody light and heavy chain (or variable regions thereof) such as a Fab, F(ab′)₂, single domain, or single chain antibodies (scFv), or any region of a full length antibody that recognizes an antigen, for example, an IgG (e.g., an IgG1, IgG2, IgG3, or IgG4 subtype), IgA1, IgA2, IgD, IgE, or IgM antibody.

The term “framework region” refers to the art-recognized portions of an antibody variable region that exist between the more divergent (i.e., hypervariable) CDRs. Such framework regions are typically referred to as frameworks 1 through 4 (FRM1, FRM2, FRM3, and FRM4) and provide a scaffold for the presentation of the six CDRs (three from the heavy chain and three from the light chain) in three dimensional space, to form an antigen-binding surface.

The term “canonical structure” refers to the main chain conformation that is adopted by the antigen binding (CDR) loops. From comparative structural studies, it has been found that five of the six antigen binding loops have only a limited repertoire of available conformations. Each canonical structure can be characterized by the torsion angles of the polypeptide backbone. Correspondent loops between antibodies may, therefore, have very similar three dimensional structures, despite high amino acid sequence variability in most parts of the loops (Chothia and Lesk, J. Mol. Biol., 1987, 196: 901; Chothia et al., Nature, 1989, 342: 877; Martin and Thornton, J. Mol. Biol., 1996, 263: 800, each of which is incorporated by reference in its entirety). Furthermore, there is a relationship between the adopted loop structure and the amino acid sequences surrounding it. The conformation of a particular canonical class is determined by the length of the loop and the amino acid residues residing at key positions within the loop, as well as within the conserved framework (i.e., outside of the loop). Assignment to a particular canonical class can therefore be made based on the presence of these key amino acid residues. The term “canonical structure” may also include considerations as to the linear sequence of the antibody, for example, as catalogued by Kabat (Kabat et al., in “Sequences of Proteins of Immunological Interest,” 5th Edition, U.S. Department of Health and Human Services, 1992). The Kabat numbering scheme is a widely adopted standard for numbering the amino acid residues of an antibody variable domain in a consistent manner. Additional structural considerations can also be used to determine the canonical structure of an antibody. For example, those differences not fully reflected by Kabat numbering can be described by the numbering system of Chothia et al. 
and/or revealed by other techniques, for example, crystallography and two or three-dimensional computational modeling. Accordingly, a given antibody sequence may be placed into a canonical class which allows for, among other things, identifying appropriate chassis sequences (e.g. based on a desire to include a variety of canonical structures in a library). Kabat numbering of antibody amino acid sequences and structural considerations as described by Chothia et al., and their implications for construing canonical aspects of antibody structure, are described in the literature.

The terms “CDR”, and its plural “CDRs”, refer to a complementarity determining region (CDR) of which three make up the binding character of a light chain variable region (CDRL1, CDRL2 and CDRL3) and three make up the binding character of a heavy chain variable region (CDRH1, CDRH2 and CDRH3). CDRs contribute to the functional activity of an antibody molecule and are separated by amino acid sequences that comprise scaffolding or framework regions. The exact definitional CDR boundaries and lengths are subject to different classification and numbering systems. CDRs may therefore be referred to by Kabat, Chothia, contact or any other boundary definitions, including the numbering system described herein. Despite differing boundaries, each of these systems has some degree of overlap in what constitutes the so called “hypervariable regions” within the variable sequences. CDR definitions according to these systems may therefore differ in length and boundary areas with respect to the adjacent framework region. See for example Kabat, Chothia, and/or MacCallum et al., (Kabat et al., in “Sequences of Proteins of Immunological Interest,” 5th Edition, U.S. Department of Health and Human Services, 1992; Chothia et al., J. Mol. Biol., 1987, 196: 901; and MacCallum et al., J. Mol. Biol., 1996, 262: 732, each of which is incorporated by reference in its entirety).

The term “amino acid” or “amino acid residue” typically refers to an amino acid having its art recognized definition such as an amino acid selected from the group consisting of: alanine (Ala or A); arginine (Arg or R); asparagine (Asn or N); aspartic acid (Asp or D); cysteine (Cys or C); glutamine (Gln or Q); glutamic acid (Glu or E); glycine (Gly or G); histidine (His or H); isoleucine (Ile or I); leucine (Leu or L); lysine (Lys or K); methionine (Met or M); phenylalanine (Phe or F); proline (Pro or P); serine (Ser or S); threonine (Thr or T); tryptophan (Trp or W); tyrosine (Tyr or Y); and valine (Val or V), although modified, synthetic, or rare amino acids may be used as desired. Generally, amino acids can be grouped as having a nonpolar side chain (e.g., Ala, Cys, Ile, Leu, Met, Phe, Pro, Val); a negatively charged side chain (e.g., Asp, Glu); a positively charged side chain (e.g., Arg, His, Lys); or an uncharged polar side chain (e.g., Asn, Cys, Gln, Gly, His, Met, Phe, Ser, Thr, Trp, and Tyr).

The term “polynucleotide(s)” refers to nucleic acids such as DNA molecules and RNA molecules and analogs thereof (e.g., DNA or RNA generated using nucleotide analogs or using nucleic acid chemistry). As desired, the polynucleotides may be made synthetically, e.g., using art-recognized nucleic acid chemistry or enzymatically using, e.g., a polymerase, and, if desired, be modified. Typical modifications include methylation, biotinylation, and other art-known modifications. In addition, the nucleic acid molecule can be single-stranded or double-stranded and, where desired, linked to a detectable moiety.

The terms “theoretical diversity”, “theoretical total diversity”, or “theoretical repertoire” refer to the maximum number of variants in a library design. For example, given an amino acid sequence of three residues, where residues one and three may each be any one of five amino acid types and residue two may be any one of 20 amino acid types, the theoretical diversity is 5×20×5=500 possible sequences. Similarly, if sequence X is constructed by combination of four amino acid segments, where segment 1 has 100 possible sequences, segment 2 has 75 possible sequences, segment 3 has 250 possible sequences, and segment 4 has 30 possible sequences, the theoretical total diversity of sequence X would be 100×75×250×30, or about 5.6×10⁷ possible sequences.
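By way of a non-limiting worked example, the theoretical diversity is simply the product of the number of options at each independently varied position or segment. The helper name `theoretical_diversity` below is an assumption made for this sketch.

```python
from math import prod


def theoretical_diversity(options_per_unit):
    """Multiply the number of allowed choices at each position or segment."""
    return prod(options_per_unit)


# Three residues: five, twenty, and five allowed amino acid types.
print(theoretical_diversity([5, 20, 5]))  # 500

# Four segments with 100, 75, 250, and 30 possible sequences.
print(theoretical_diversity([100, 75, 250, 30]))  # 56250000
```

With segment counts of 100, 75, 250, and 30, the product is 56,250,000, i.e., about 5.6×10⁷ possible sequences.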

The term “physical realization” refers to a portion of the theoretical diversity that can actually be physically sampled, for example, by any display methodology. Exemplary display methodologies include phage display, ribosomal display, and yeast display. For synthetic sequences, the size of the physical realization of a library depends on (1) the fraction of the theoretical diversity that can actually be synthesized, and (2) the limitations of the particular screening method. Exemplary limitations of screening methods include the number of variants that can be screened in a particular assay (e.g., ribosome display, phage display, yeast display) and the transformation efficiency of a host cell (e.g., yeast, mammalian cells, bacteria) which is used in a screening assay. For the purposes of illustration, given a library with a theoretical diversity of 10¹² members, an exemplary physical realization of the library (e.g., in yeast, bacterial cells, ribosome display, etc.; details provided below) that can maximally include 10¹¹ members will, therefore, sample about 10% of the theoretical diversity of the library. However, if fewer than 10¹¹ members of the library with a theoretical diversity of 10¹² are synthesized, and the physical realization of the library can maximally include 10¹¹ members, less than 10% of the theoretical diversity of the library is sampled in the physical realization of the library. Similarly, a physical realization of the library that can maximally include more than 10¹² members would “oversample” the theoretical diversity, meaning that each member may be present more than once (assuming that the entire 10¹² theoretical diversity is synthesized).
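The arithmetic in the preceding illustration can be sketched as follows. This is a simplified upper-bound model and an assumption of this sketch: it treats every member of the physical realization as distinct until the theoretical diversity is exhausted, and it ignores the sampling statistics discussed under Library Sampling. The function names are hypothetical.

```python
def sampled_fraction(theoretical, synthesized, capacity):
    """Upper bound on the fraction of the theoretical diversity sampled.

    The number of distinct members present cannot exceed the theoretical
    diversity, the number actually synthesized, or the display capacity.
    """
    distinct = min(theoretical, synthesized, capacity)
    return distinct / theoretical


def mean_copies_per_member(theoretical, capacity):
    """Average copies per member if the whole design is synthesized."""
    return capacity / theoretical


print(sampled_fraction(10**12, 10**12, 10**11))  # 0.1
print(sampled_fraction(10**12, 10**10, 10**11))  # 0.01
print(mean_copies_per_member(10**12, 10**13))    # 10.0
```

The third call corresponds to the “oversampling” case, in which a realization ten times larger than the theoretical diversity holds, on average, ten copies of each member.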

The term “all possible reading frames” encompasses at least the three forward reading frames and, in some embodiments, the three reverse reading frames.

The term “antibody of interest” refers to any antibody that has a property of interest that is isolated from a library of the invention. The property of interest may include, but is not limited to, binding to a particular antigen or epitope, blocking a binding interaction between two molecules, or eliciting a certain biological effect.

The term “functionally expressed” refers to those immunoglobulin genes that are expressed by human B cells and that do not contain premature stop codons.

The term “full-length heavy chain” refers to an immunoglobulin heavy chain that contains each of the canonical structural domains of an immunoglobulin heavy chain, including the four framework regions, the three CDRs, and the constant region. The term “full-length light chain” refers to an immunoglobulin light chain that contains each of the canonical structural domains of an immunoglobulin light chain, including the four framework regions, the three CDRs, and the constant region.

The term “unique,” as used herein, refers to a sequence that is different (e.g., has a different chemical structure) from every other sequence within the designed theoretical diversity. It should be understood that there is likely to be more than one copy of many unique sequences from the theoretical diversity in a particular physical realization. For example, a library comprising three unique sequences may comprise nine total members if each sequence occurs three times in the library. However, in certain embodiments, each unique sequence may occur only once.

The term “heterologous moiety” is used herein to indicate the addition of a composition to an antibody wherein the composition is not normally part of the antibody. Exemplary heterologous moieties include drugs, toxins, imaging agents, and any other compositions which might provide an activity that is not inherent in the antibody itself.

As used herein, the term “percent occurrence of each amino acid residue at each position” refers to the percentage of instances in a sample in which an amino acid is found at a defined position within a particular sequence. For example, given the following three sequences:

K V R

K Y P

K R P,

K occurs in position one in 100% of the instances and P occurs in position three in about 67% of the instances. In certain embodiments of the invention, the sequences selected for comparison are human immunoglobulin sequences.
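For the purposes of illustration, the percent occurrence at each position of the three example sequences can be computed with the short sketch below. The function name `percent_occurrence` is an assumption of this sketch, and the sequences are assumed to be of equal length and already aligned.

```python
from collections import Counter


def percent_occurrence(sequences):
    """Return, for each position, a dict mapping residue to percent occurrence."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must share one length")
    table = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        table.append({aa: 100.0 * n / len(sequences) for aa, n in counts.items()})
    return table


table = percent_occurrence(["KVR", "KYP", "KRP"])
print(table[0]["K"])            # 100.0
print(round(table[2]["P"], 1))  # 66.7
```

Positions are zero-indexed in the code, so `table[0]` corresponds to position one and `table[2]` to position three.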

As used herein, the term “most frequently occurring amino acids” at a specified position of a sequence in a population of polypeptides refers to the amino acid residues that have the highest percent occurrence at the indicated position in the indicated polypeptide population. For example, the most frequently occurring amino acids in each of the three most N-terminal positions in N1 sequences of CDRH3 sequences that are functionally expressed by human B cells are listed in Table 21, and the most frequently occurring amino acids in each of the three most N-terminal positions in N2 sequences of CDRH3 sequences that are functionally expressed by human B cells are listed in Table 22.

For the purposes of analyzing the occurrence of certain duplets (Example 13) and the information content (Example 14) of the libraries of the invention, and other libraries, a “central loop” of CDRH3 is defined. If the C-terminal 5 amino acids from Kabat CDRH3 (95-102) are removed, then the remaining sequence is termed the “central loop”. Thus, considering the duplet occurrence calculations of Example 13, a CDRH3 of size 6 or less would not contribute to the analysis of the occurrence of duplets. A CDRH3 of size 7 would contribute only to the i-i+1 data set, a CDRH3 of size 8 would also contribute to the i-i+2 data set, and a CDRH3 of size 9 and larger would also contribute to the i-i+3 data set. For example, a CDRH3 of size 9 may have amino acids at positions 95-96-97-98-99-100-100A-101-102, but only the first four residues (positions 95-96-97-98) would be part of the central loop and contribute to the pair-wise occurrence (duplet) statistics. As a further example, a CDRH3 of size 14 may have the sequence: 95-96-97-98-99-100-100A-100B-100C-100D-100E-100F-101-102. Here, only the first nine residues (positions 95 through 100C) contribute to the central loop.
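The central loop definition and its consequence for the duplet data sets can be sketched as follows. The function names are assumptions of this sketch, and the nine-letter input is an arbitrary placeholder string, not a sequence from the specification.

```python
def central_loop(cdrh3):
    """Drop the five C-terminal residues; what remains is the central loop."""
    return cdrh3[:-5] if len(cdrh3) > 5 else ""


def duplets(loop, offset):
    """Residue pairs (i, i+offset) found within the central loop."""
    return [(loop[i], loop[i + offset]) for i in range(len(loop) - offset)]


def contributing_offsets(cdrh3, max_offset=3):
    """Offsets (1 for i-i+1, 2 for i-i+2, ...) to which this CDRH3 contributes."""
    loop = central_loop(cdrh3)
    return [k for k in range(1, max_offset + 1) if len(loop) > k]


placeholder = "ABCDEFGHI"                  # arbitrary CDRH3 of size 9
print(central_loop(placeholder))           # ABCD
print(duplets(central_loop(placeholder), 3))  # [('A', 'D')]
for size in (6, 7, 8, 9):
    print(size, contributing_offsets("A" * size))
# 6 []
# 7 [1]
# 8 [1, 2]
# 9 [1, 2, 3]
```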

Library screening requires a genotype-phenotype linkage. The term “genotype-phenotype linkage” is used in a manner consistent with its art-recognized meaning and refers to the fact that the nucleic acid (genotype) encoding a protein with a particular phenotype (e.g., binding an antigen) can be isolated from a library. For the purposes of illustration, an antibody fragment expressed on the surface of a phage can be isolated based on its binding to an antigen (e.g., Ladner et al.). The binding of the antibody to the antigen simultaneously enables the isolation of the phage containing the nucleic acid encoding the antibody fragment. Thus, the phenotype (antigen-binding characteristics of the antibody fragment) has been “linked” to the genotype (nucleic acid encoding the antibody fragment). Other methods of maintaining a genotype-phenotype linkage include those of Wittrup et al. (U.S. Pat. Nos. 6,300,065, 6,331,391, 6,423,538, 6,696,251, 6,699,658, and US Pub. No. 20040146976, each of which is incorporated by reference in its entirety), Miltenyi (U.S. Pat. No. 7,166,423, incorporated by reference in its entirety), Fandl (U.S. Pat. No. 6,919,183, US Pub No. 20060234311, each incorporated by reference in its entirety), Clausell-Tormos et al. (Chem. Biol., 2008, 15: 427, incorporated by reference in its entirety), Love et al. (Nat. Biotechnol., 2006, 24: 703, incorporated by reference in its entirety), and Kelly et al. (Chem. Commun., 2007, 14: 1773, incorporated by reference in its entirety). Any method which localizes the antibody protein with the gene encoding the antibody, in a way in which they can both be recovered while the linkage between them is maintained, is suitable.

2. Design of the Libraries

The antibody libraries of the invention are designed to reflect certain aspects of the preimmune repertoire as naturally created by the human immune system. Certain libraries of the invention are based on rational design informed by the collection of human V, D, and J genes, and other large databases of human heavy and light chain sequences (e.g., publicly known germline sequences; sequences from Jackson et al., J. Immunol Methods, 2007, 324: 26, incorporated by reference in its entirety; sequences from Lee et al., Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety; and sequences compiled for rearranged VK and Vλ, see Appendices A and B filed herewith). Additional information may be found, for example, in Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 234; Tomlinson et al., J. Mol. Biol., 1992, 227: 799; and Matsuda et al., J. Exp. Med., 1998, 188: 2151, each incorporated by reference in its entirety. In certain embodiments of the invention, cassettes representing the possible V, D, and J diversity found in the human repertoire, as well as junctional diversity (i.e., N1 and N2), are synthesized de novo as single or double-stranded DNA oligonucleotides. In certain embodiments of the invention, oligonucleotide cassettes encoding CDR sequences are introduced into yeast along with one or more acceptor vectors containing heavy or light chain chassis sequences. No primer-based PCR amplification or template-directed cloning steps from mammalian cDNA or mRNA are employed. Through standard homologous recombination, the recipient yeast recombines the cassettes (e.g., CDR3s) with the acceptor vector(s) containing the chassis sequence(s) and constant regions, to create a properly ordered synthetic, full-length human heavy chain and/or light chain immunoglobulin library that can be genetically propagated, expressed, displayed, and screened. 
One of ordinary skill in the art will readily recognize that the chassis contained in the acceptor vector can be designed so as to produce constructs other than full-length human heavy chains and/or light chains. For example, in certain embodiments of the invention, the chassis may be designed to encode portions of a polypeptide encoding an antibody fragment or subunit of an antibody fragment, so that a sequence encoding an antibody fragment, or subunit thereof, is produced when the oligonucleotide cassette containing the CDR is recombined with the acceptor vector.

In certain embodiments, the invention provides a synthetic, preimmune human antibody repertoire comprising about 10⁷to about 10²⁰antibody members, wherein the repertoire comprises:

- (a) selected human antibody heavy chain chassis (i.e., amino acids 1 to 94 of the heavy chain variable region, using Kabat's definition);
- (b) a CDRH3 repertoire, designed based on the human IGHD and IGHJ germline sequences, the CDRH3 repertoire comprising the following:
  - (i) optionally, one or more tail regions;
  - (ii) one or more N1 regions, comprising about 0 to about 10 amino acids selected from the group consisting of fewer than 20 of the amino acid types preferentially encoded by the action of terminal deoxynucleotidyl transferase (TdT) and functionally expressed by human B cells;
  - (iii) one or more DH segments, based on one or more selected IGHD segments, and one or more N- or C-terminal truncations thereof;
  - (iv) one or more N2 regions, comprising about 0 to about 10 amino acids selected from the group consisting of fewer than 20 of the amino acids preferentially encoded by the activity of TdT and functionally expressed by human B cells; and
  - (v) one or more H3-JH segments, based on one or more IGHJ segments, and one or more N-terminal truncations thereof (e.g., down to XXWG);
- (c) one or more selected human antibody kappa and/or lambda light chain chassis; and
- (d) a CDRL3 repertoire designed based on the human IGLV and IGLJ germline sequences, wherein “L” may be a kappa or lambda light chain.
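
For the purposes of illustration only, the combinatorial assembly of a CDRH3 repertoire from the components listed in (b) can be sketched as a concatenation of one choice from each component set, in the order tail, N1, DH, N2, H3-JH. All segment strings below are short hypothetical placeholders chosen for this sketch, not segments of the invention, and the function name is an assumption.

```python
from itertools import product


def assemble_cdrh3(tails, n1s, dhs, n2s, h3jhs):
    """Concatenate one choice per component: tail + N1 + DH + N2 + H3-JH."""
    return ["".join(parts) for parts in product(tails, n1s, dhs, n2s, h3jhs)]


# Hypothetical placeholder segments; "" models a zero-length component.
tails = ["", "G"]
n1s = ["", "R", "GS"]
dhs = ["YSS", "GYSS"]
n2s = ["", "P"]
h3jhs = ["FDY", "YFDY"]

combinations = assemble_cdrh3(tails, n1s, dhs, n2s, h3jhs)
unique_sequences = set(combinations)

print(len(combinations))                          # 48 = 2 x 3 x 2 x 2 x 2
print(len(unique_sequences) < len(combinations))  # True
```

In this toy case, distinct component choices can yield the same amino acid sequence (for example, tail "G" with DH "YSS" and no tail with DH "GYSS"), so the number of unique sequences is smaller than the number of combinations.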

The heavy chain chassis may be any sequence with homology to Kabat residues 1 to 94 of an immunoglobulin heavy chain variable domain. Non-limiting examples of heavy chain chassis are included in the Examples, and one of ordinary skill in the art will readily recognize that the principles presented therein, and throughout the specification, may be used to derive additional heavy chain chassis.

As described above, the heavy chain chassis region is followed, optionally, by a “tail” region. The tail region comprises zero, one, or more amino acids that may or may not be selected on the basis of comparing naturally occurring heavy chain sequences. For example, in certain embodiments of the invention, heavy chain sequences available in the art may be compared, and the residues occurring most frequently in the tail position in the naturally occurring sequences included in the library (e.g., to produce sequences that most closely resemble human sequences). In other embodiments, amino acids that are used less frequently may be used. In still other embodiments, amino acids selected from any group of amino acids may be used. In certain embodiments of the invention, the length of the tail is zero (no residue) or one (e.g., G/D/E) amino acid. For the purposes of clarity, and without being bound by theory, in the naturally occurring human repertoire, the first 2/3 of the codon encoding the tail residue is provided by the FRM3 region of the VH gene. The amino acid at this position in naturally occurring heavy chain sequences may thus be considered to be partially encoded by the IGHV gene (2/3) and partially encoded by the CDRH3 (1/3). However, for the purposes of clearly illustrating certain aspects of the invention, the entire codon encoding the tail residue (and, therefore, the amino acid derived from it) is described herein as being part of the CDRH3 sequence.

As described above, there are two peptide segments derived from nucleotides which are added by TdT in the naturally occurring human antibody repertoire. These segments are designated N1 and N2 (referred to herein as N1 and N2 segments, domains, regions or sequences). In certain embodiments of the invention, N1 and N2 are about 0, 1, 2, or 3 amino acids in length. Without being bound by theory, it is thought that these lengths most closely mimic the N1 and N2 lengths found in the human repertoire (see FIG. 2). In other embodiments of the invention, N1 and N2 may be about 4, 5, 6, 7, 8, 9, or 10 amino acids in length. Similarly, the composition of the amino acid residues utilized to produce the N1 and N2 segments may also vary. In certain embodiments of the invention, the amino acids used to produce N1 and N2 segments may be selected from amongst the eight most frequently occurring amino acids in the N1 and N2 domains of the human repertoire (e.g., G, R, S, P, L, A, V, and T). In other embodiments of the invention, the amino acids used to produce the N1 and N2 segments may be selected from the group consisting of fewer than about 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, or 3 of the amino acids preferentially encoded by the activity of TdT and functionally expressed by human B cells. Alternatively, N1 and N2 may comprise amino acids selected from any group of amino acids. It is not required that N1 and N2 be of a similar length or composition, and independent variation of the length and composition of N1 and N2 is one method by which additional diversity may be introduced into the library.
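As a hedged illustration of how length and composition choices bound the number of candidate N1 or N2 segments, the sketch below enumerates every sequence of length 0 to 3 over the eight amino acids named above. Exhaustive enumeration is an assumption of this sketch; the libraries of the invention may use only a selected subset of such segments.

```python
from itertools import product

# The eight amino acids named in the text as most frequent in N1 and N2.
TDT_FAVORED = "GRSPLAVT"


def enumerate_n_segments(alphabet, max_len):
    """All sequences over `alphabet` with length 0 through `max_len`."""
    segments = []
    for length in range(max_len + 1):
        segments.extend("".join(p) for p in product(alphabet, repeat=length))
    return segments


n_segments = enumerate_n_segments(TDT_FAVORED, 3)
print(len(n_segments))  # 585 = 1 + 8 + 64 + 512
```

Because N1 and N2 may be varied independently, different alphabets or maximum lengths could be passed for each.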

The DH segments of the libraries are based on the peptides encoded by the naturally occurring IGHD gene repertoire, with progressive deletion of residues at the N- and C-termini. IGHD genes may be read in multiple reading frames, and peptides representing these reading frames, and their N- and C-terminal deletions are also included in the libraries of the invention. In certain embodiments of the invention, DH segments as short as three amino acid residues may be included in the libraries. In other embodiments of the invention, DH segments as short as about 1, 2, 4, 5, 6, 7, or 8 amino acids may be included in the libraries.

The H3-JH segments of the libraries are based on the peptides encoded by the naturally occurring IGHJ gene repertoire, with progressive deletion of residues at the N-terminus. The N-terminal portion of the IGHJ segment that makes up part of the CDRH3 is referred to herein as H3-JH. In certain embodiments of the invention, the H3-JH segment may be represented by progressive N-terminal deletions of one or more H3-JH residues, down to two H3-JH residues. In other embodiments of the invention, the H3-JH segments of the library may contain N-terminal deletions (or no deletions) down to about 6, 5, 4, 3, 2, 1, or 0 H3-JH residues.
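As a non-limiting sketch, progressive N-terminal deletion of an H3-JH segment can be expressed as follows. The function name n_terminal_deletions and the input string "YFDY" are illustrative placeholders and are not taken from this section; the default minimum length of two residues follows the embodiment described above.

```python
def n_terminal_deletions(h3_jh, min_len=2):
    """Return the parent segment and its N-terminal truncations, longest first."""
    if min_len < 0:
        raise ValueError("min_len must be non-negative")
    # Slicing from index i removes the first i residues (an N-terminal deletion).
    return [h3_jh[i:] for i in range(len(h3_jh) - min_len + 1)]

variants = n_terminal_deletions("YFDY")  # placeholder input
print(variants)  # ['YFDY', 'FDY', 'DY']
```

Setting min_len to 0 would also yield the fully deleted (empty) segment, corresponding to the embodiments in which deletions proceed down to zero H3-JH residues.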

The light chain chassis of the libraries may be any sequence with homology to Kabat residues 1 to 88 of naturally occurring light chain (κ or λ) sequences. In certain embodiments of the invention, the light chain chassis of the invention are synthesized in combinatorial fashion, utilizing VL and JL segments, to produce one or more libraries of light chain sequences with diversity in the chassis and CDR3 sequences. In other embodiments of the invention, the light chain CDR3 sequences are synthesized using degenerate oligonucleotides or trinucleotides and recombined with the light chain chassis and light chain constant region, to form full-length light chains.

The instant invention also provides methods for producing and using such libraries, as well as libraries comprising one or more immunoglobulin domains or antibody fragments. Design and synthesis of each component of the claimed antibody libraries is provided in more detail below.

2.1. Design of the Antibody Library Chassis Sequences

One step in building certain libraries of the invention is the selection of chassis sequences, which are based on naturally occurring variable domain sequences (e.g., IGHV and IGLV). This selection can be done arbitrarily, or by the selection of chassis that meet certain criteria. For example, the Kabat database, an electronic database containing non-redundant rearranged antibody sequences, can be queried for those heavy and light chain germline sequences that are most frequently represented. The BLAST search algorithm, or more specialized tools such as SoDA (Volpe et al., Bioinformatics, 2006, 22: 438-44, incorporated by reference in its entirety), can be used to compare rearranged antibody sequences with germline sequences, using the VBASE2 database (Retter et al., Nucleic Acids Res., 2005, 33: D671-D674), or similar collections of human V, D, and J genes, to identify the germline families that are most frequently used to generate functional antibodies.

Several criteria can be utilized for the selection of chassis for inclusion in the libraries of the invention. For example, sequences that are known (or have been determined) to express poorly in yeast, or other organisms used in the invention (e.g., bacteria, mammalian cells, fungi, or plants) can be excluded from the libraries. Chassis may also be chosen based on their representation in the peripheral blood of humans. In certain embodiments of the invention, it may be desirable to select chassis that correspond to germline sequences that are highly represented in the peripheral blood of humans. In other embodiments, it may be desirable to select chassis that correspond to germline sequences that are less frequently represented, for example, to increase the canonical diversity of the library. Therefore, chassis may be selected to produce libraries that represent the largest and most structurally diverse group of functional human antibodies. In other embodiments of the invention, less diverse chassis may be utilized, for example, if it is desirable to produce a smaller, more focused library with less chassis variability and greater CDR variability. In some embodiments of the invention, chassis may be selected based on both their expression in a cell of the invention (e.g., a yeast cell) and the diversity of canonical structures represented by the selected sequences. One may therefore produce a library with a diversity of canonical structures that express well in a cell of the invention.

2.1.1. Design of the Heavy Chain Chassis Sequences

In certain embodiments of the invention, the antibody library comprises variable heavy domains and variable light domains, or portions thereof. Each of these domains is built from certain components, which will be more fully described in the examples provided herein. In certain embodiments, the libraries described herein may be used to isolate fully human antibodies that can be used as diagnostics and/or therapeutics. Without being bound by theory, antibodies with sequences most similar or identical to those most frequently found in peripheral blood (for example, in humans) may be less likely to be immunogenic when administered as therapeutic agents.

Without being bound by theory, and for the purposes of illustrating certain embodiments of the invention, the VH domains of the library may be considered to comprise three primary components: (1) a VH “chassis”, which includes amino acids 1 to 94 (using Kabat numbering), (2) the CDRH3, which is defined herein to include the Kabat CDRH3 proper (positions 95-102), and (3) the FRM4 region, including amino acids 103 to 113 (Kabat numbering). The overall VH structure may therefore be depicted schematically (not to scale) as:

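[embedded image: schematic of the VH domain, consisting of the VH chassis (Kabat 1-94), the CDRH3 (Kabat 95-102), and the FRM4 region (Kabat 103-113)]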
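For the purposes of illustration, the three-component description of the VH domain given above can be sketched as a lookup on Kabat position labels. The names VH_REGIONS and vh_region are hypothetical, and the handling of insertion letters (e.g., "100B") as a single trailing capital letter is an assumption of this sketch.

```python
import re

# Region boundaries as defined in the text (Kabat numbering, inclusive).
VH_REGIONS = [("VH chassis", 1, 94), ("CDRH3", 95, 102), ("FRM4", 103, 113)]

def vh_region(kabat_label):
    """Map a Kabat position label such as '52A' or '100B' to its VH component."""
    match = re.fullmatch(r"(\d+)([A-Z]?)", kabat_label)
    if match is None:
        raise ValueError(f"not a Kabat position label: {kabat_label!r}")
    position = int(match.group(1))  # the insertion letter does not change the region
    for name, first, last in VH_REGIONS:
        if first <= position <= last:
            return name
    raise ValueError(f"position {position} lies outside Kabat 1-113")

print(vh_region("94"), "|", vh_region("100B"), "|", vh_region("103"))
```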

The selection and design of VH chassis sequences based on the human IGHV germline repertoire will become more apparent upon review of the examples provided herein. In certain embodiments of the invention, the VH chassis sequences selected for use in the library may correspond to all functionally expressed human IGHV germline sequences. Alternatively, IGHV germline sequences may be selected for representation in a library according to one or more criteria. For example, in certain embodiments of the invention, the selected IGHV germline sequences may be among those that are most highly represented among antibody molecules isolated from the peripheral blood of healthy adults, children, or fetuses.

In certain embodiments, it may be desirable to base the design of the VH chassis on the utilization of IGHV germline sequences in adults, children, or fetuses with a disease, for example, an autoimmune disease. Without being bound by theory, it is possible that analysis of germline sequence usage in the antibody molecules isolated from the peripheral blood of individuals with autoimmune disease may provide information useful for the design of antibodies recognizing human antigens.

In some embodiments, the selection of IGHV germline sequences for representation in a library of the invention may be based on their frequency of occurrence in the peripheral blood. For the purposes of illustration, four IGHV1 germline sequences (IGHV1-2 (SEQ ID NO: 24), IGHV1-18 (SEQ ID NO: 25), IGHV1-46 (SEQ ID NO: 26), and IGHV1-69 (SEQ ID NO: 27)) comprise about 80% of the IGHV1 family repertoire in peripheral blood. Thus, the specific IGHV1 germline sequences selected for representation in the library may include those that are most frequently occurring and that cumulatively comprise at least about 80% of the IGHV1 family repertoire found in peripheral blood. An analogous approach can be used to select specific IGHV germline sequences from any other IGHV family (i.e., IGHV1, IGHV2, IGHV3, IGHV4, IGHV5, IGHV6, and IGHV7). The specific germline sequences chosen for representation of a particular IGHV family in a library of the invention may therefore comprise at least about 100%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, or 0% of the particular IGHV family member repertoire found in peripheral blood.
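By way of a non-limiting sketch, the cumulative-coverage criterion described above can be expressed as a greedy selection over usage counts. The function name select_by_cumulative_coverage is hypothetical, and the counts in toy_usage are invented for an imaginary gene family; they are not measured IGHV usage data.

```python
def select_by_cumulative_coverage(usage, threshold):
    """Pick the most frequently used germlines until their share reaches threshold."""
    total = sum(usage.values())
    selected, covered = [], 0.0
    # Visit germlines from most to least frequently used.
    for gene, count in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= threshold:
            break
        selected.append(gene)
        covered += count
    return selected, covered / total

# Hypothetical counts for an imaginary family; NOT real repertoire data.
toy_usage = {"geneA": 40, "geneB": 25, "geneC": 15, "geneD": 10, "geneE": 6, "geneF": 4}
chosen, share = select_by_cumulative_coverage(toy_usage, 0.80)
print(chosen, share)  # ['geneA', 'geneB', 'geneC'] 0.8
```

Lowering or raising the threshold argument corresponds to the alternative coverage percentages recited above.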

In some embodiments, the selected IGHV germline sequences may be chosen to maximize the structural diversity of the VH chassis library. Structural diversity may be evaluated by, for example, comparing the lengths, compositions, and canonical structures of CDRH1 and CDRH2 in the IGHV germline sequences. In human IGHV sequences, the CDRH1 (Kabat definition) may have a length of 5, 6 or 7 amino acids, while CDRH2 (Kabat definition) may have a length of 16, 17, 18 or 19 amino acids. The amino acid compositions of the IGHV germline sequences and, in particular, the CDR domains, may be evaluated by sequence alignments, as presented in the Examples. Canonical structure may be assigned, for example, according to the methods described by Chothia et al., J. Mol. Biol., 1992, 227: 799, incorporated by reference in its entirety.

In certain embodiments of the invention, it may be advantageous to design VH chassis based on IGHV germline sequences that may maximize the probability of isolating an antibody with particular characteristics. For example, without being bound by theory, in some embodiments it may be advantageous to restrict the IGHV germline sequences to include only those germline sequences that are utilized in antibodies undergoing clinical development, or antibodies that have been approved as therapeutics. On the other hand, in some embodiments, it may be advantageous to produce libraries containing VH chassis that are not represented amongst clinically utilized antibodies. Such libraries may be capable of yielding antibodies with novel properties that are advantageous over those obtained with the use of “typical” IGHV germline sequences, or enabling studies of the structures and properties of “atypical” IGHV germline sequences or canonical structures.

One of ordinary skill in the art will readily recognize that a variety of other criteria can be used to select IGHV germline sequences for representation in a library of the invention. Any of the criteria described herein may also be combined with any other criteria. Further exemplary criteria include the ability to be expressed at sufficient levels in certain cell culture systems, solubility in particular antibody formats (e.g., whole immunoglobulins and antibody fragments), and the thermodynamic stability of the individual domains, whole immunoglobulins, or antibody fragments. The methods of the invention may be applied to select any IGHV germline sequence that has utility in an antibody library of the instant invention.

In certain embodiments of the invention, the VH chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 94 of one or more of the following IGHV germline sequences: IGHV1-2 (SEQ ID NO: 24), IGHV1-3 (SEQ ID NO: 423), IGHV1-8 (SEQ ID NO: 424, 425), IGHV1-18 (SEQ ID NO: 25), IGHV1-24 (SEQ ID NO: 426), IGHV1-45 (SEQ ID NO: 427), IGHV1-46 (SEQ ID NO: 26), IGHV1-58 (SEQ ID NO: 428), IGHV1-69 (SEQ ID NO: 27), IGHV2-5 (SEQ ID NO: 429), IGHV2-26 (SEQ ID NO: 430), IGHV2-70 (SEQ ID NO: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV3-9 (SEQ ID NO: 433), IGHV3-11 (SEQ ID NO: 434), IGHV3-13 (SEQ ID NO: 435), IGHV3-15 (SEQ ID NO: 29), IGHV3-20 (SEQ ID NO: 436), IGHV3-21 (SEQ ID NO: 437), IGHV3-23 (SEQ ID NO: 30), IGHV3-30 (SEQ ID NO: 31), IGHV3-33 (SEQ ID NO: 32), IGHV3-43 (SEQ ID NO: 438), IGHV3-48 (SEQ ID NO: 33), IGHV3-49 (SEQ ID NO: 439), IGHV3-53 (SEQ ID NO: 440), IGHV3-64 (SEQ ID NO: 441), IGHV3-66 (SEQ ID NO: 442), IGHV3-72 (SEQ ID NO: 443), IGHV3-73 (SEQ ID NO: 444), IGHV3-74 (SEQ ID NO: 445), IGHV4-4 (SEQ ID NO: 446, 447), IGHV4-28 (SEQ ID NO: 448), IGHV4-31 (SEQ ID NO: 34), IGHV4-34 (SEQ ID NO: 35), IGHV4-39 (SEQ ID NO: 36), IGHV4-59 (SEQ ID NO: 37), IGHV4-61 (SEQ ID NO: 38), IGHV4-B (SEQ ID NO: 39), IGHV5-51 (SEQ ID NO: 40), IGHV6-1 (SEQ ID NO: 449), and IGHV7-4-1 (SEQ ID NO: 450). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.

In other embodiments, the VH chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 94 of the following IGHV germline sequences: IGHV1-2 (SEQ ID NO: 24), IGHV1-18 (SEQ ID NO: 25), IGHV1-46 (SEQ ID NO: 26), IGHV1-69 (SEQ ID NO: 27), IGHV3-7 (SEQ ID NO: 28), IGHV3-15 (SEQ ID NO: 29), IGHV3-23 (SEQ ID NO: 30), IGHV3-30 (SEQ ID NO: 31), IGHV3-33 (SEQ ID NO: 32), IGHV3-48 (SEQ ID NO: 33), IGHV4-31 (SEQ ID NO: 34), IGHV4-34 (SEQ ID NO: 35), IGHV4-39 (SEQ ID NO: 36), IGHV4-59 (SEQ ID NO: 37), IGHV4-61 (SEQ ID NO: 38), IGHV4-B (SEQ ID NO: 39), and IGHV5-51 (SEQ ID NO: 40). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences. The amino acid sequences of these chassis are presented in Table 5.

2.1.1.1. Heavy Chain Chassis Variants

While the selection of the VH chassis with sequences based on the IGHV germline sequences is expected to support a large diversity of CDRH3 sequences, further diversity in the VH chassis may be generated by altering the amino acid residues comprising the CDRH1 and/or CDRH2 regions of each chassis selected for inclusion in the library (see Example 2).

In certain embodiments of the invention, the alterations or mutations in the amino acid residues comprising the CDRH1 and CDRH2 regions, or other regions, of the IGHV germline sequences are made after analyzing the sequence identity within data sets of rearranged human heavy chain sequences that have been classified according to the identity of the original IGHV germline sequence from which the rearranged sequences are derived. For example, from a set of rearranged antibody sequences, the IGHV germline sequence of each antibody is determined, and the rearranged sequences are classified according to the IGHV germline sequence. This determination is made on the basis of sequence identity.

Next, the occurrence of any of the 20 amino acid residues at each position in these sequences is determined. In certain embodiments of the invention, one may be particularly interested in the occurrence of different amino acid residues at the positions within CDRH1 and CDRH2, for example if increasing the diversity of the antigen-binding portion of the VH chassis is desired. In other embodiments of the invention, it may be desirable to evaluate the occurrence of different amino acid residues in the framework regions. Without being bound by theory, alterations in the framework regions may impact antigen binding by altering the spatial orientation of the CDRs.

After the occurrence of amino acids at each position of interest has been identified, alterations may be made in the VH chassis sequence, according to certain criteria. In some embodiments, the objective may be to produce additional VH chassis with sequence variability that mimics the variability observed in the heavy chain domains of rearranged human antibody sequences (derived from respective IGHV germline sequences) as closely as possible, thereby potentially obtaining sequences that are most human in nature (i.e., sequences that most closely mimic the composition and length of human sequences). In this case, one may synthesize additional VH chassis sequences that include mutations naturally found at a particular position and include one or more of these VH chassis sequences in a library of the invention, for example, at a frequency that mimics the frequency found in nature. In another embodiment of the invention, one may wish to include VH chassis that represent only mutations that most frequently occur at a given position in rearranged human antibody sequences. For example, rather than mimicking the human variability precisely, as described above, and with reference to exemplary Tables 6 and 7, one may choose to include only the top 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acid residues that most frequently occur at each position. For the purposes of illustration, and with reference to Table 6, if one wished to include the top four most frequently occurring amino acid residues at position 31 of the VH1-69 sequence, then position 31 in the VH1-69 sequence would be varied to include S, N, T, and R. Without being bound by theory, it is thought that the introduction of diversity by mimicking the naturally occurring composition of the rearranged heavy chain sequences is likely to produce antibodies that are most human in composition.
However, the libraries of the invention are not limited to heavy chain sequences that are diversified by this method, and any criteria can be used to introduce diversity into the heavy chain chassis, including random or rational mutagenesis. For example, in certain embodiments of the invention, it may be preferable to substitute neutral and/or smaller amino acid residues for those residues that occur in the IGHV germline sequence. Without being bound by theory, neutral and/or smaller amino acid residues may provide a more flexible and less sterically hindered context for the display of a diversity of CDR sequences.
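For the purposes of illustration only, the frequency analysis described above (counting the residues observed at a position among rearranged sequences derived from one germline, then retaining the most frequent) can be sketched as follows. The sketch assumes that the sequences have already been classified by germline and aligned to equal length, and that the position is given as a 0-based alignment column, not as a Kabat position. The function name top_residues and the toy fragments are hypothetical and are not data from the Tables.

```python
from collections import Counter

def top_residues(aligned_seqs, position, n):
    """Return the n most frequent residues at a 0-based alignment column."""
    counts = Counter(seq[position] for seq in aligned_seqs)
    # Sort by descending count, then alphabetically so that ties are reproducible.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [residue for residue, _ in ranked[:n]]

# Hypothetical pre-aligned fragments sharing one germline of origin.
toy_fragments = ["SYA", "SYG", "NYA", "SHA", "TYA", "NYG"]
print(top_residues(toy_fragments, 0, 2))  # ['S', 'N']
```

Chassis variants could then be synthesized with only the returned residues at the position in question, or the full counts could be used to mimic the natural frequencies.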

Example 2 illustrates the application of this method to heavy chains derived from a particular IGHV germline. One of ordinary skill in the art will readily recognize that this method can be applied to any germline sequence, and can be used to generate at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 1000, 10⁴, 10⁵, 10⁶, or more variants of each heavy chain chassis.

2.1.2. Design of the Light Chain Chassis Sequences

The light chain chassis of the invention may be based on kappa and/or lambda light chain sequences. The principles underlying the selection of light chain variable (IGLV) germline sequences for representation in the library are analogous to those employed for the selection of the heavy chain sequences (described above and in Examples 1 and 2). Similarly, the methods used to introduce variability into the selected heavy chain chassis may also be used to introduce variability into the light chain chassis.

Without being bound by theory, and for the purposes of illustrating certain embodiments of the invention, the VL domains of the library may be considered to comprise three primary components: (1) a VL “chassis”, which includes amino acids 1 to 88 (using Kabat numbering), (2) the VLCDR3, which is defined herein to include the Kabat CDRL3 proper (positions 89-97), and (3) the FRM4 region, including amino acids 98 to 107 (Kabat numbering). The overall VL structure may therefore be depicted schematically (not to scale) as:

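[embedded image: schematic of the VL domain, consisting of the VL chassis (Kabat 1-88), the VLCDR3 (Kabat 89-97), and the FRM4 region (Kabat 98-107)]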

In certain embodiments of the invention, the VL chassis of the libraries include one or more chassis based on IGKV germline sequences. In certain embodiments of the invention, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of one or more of the following IGKV germline sequences: IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 (SEQ ID NO: 452, 453), IGKV1-09 (SEQ ID NO: 454), IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: 455), IGKV1-16 (SEQ ID NO: 456), IGKV1-17 (SEQ ID NO: 457), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), IGKV1-39 (SEQ ID NO: 233), IGKV1D-16 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ ID NO: 234), IGKV2-29 (SEQ ID NO: 466), IGKV2-30 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), IGKV2D-29 (SEQ ID NO: 470), IGKV2D-30 (SEQ ID NO: 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), IGKV3D-07 (SEQ ID NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO: 238), IGKV5-2 (SEQ ID NOs: 475, 476), IGKV6-21 (SEQ ID NO: 477), and IGKV6D-41. In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.

In other embodiments, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of the following IGKV germline sequences: IGKV1-05 (SEQ ID NO: 229), IGKV1-12 (SEQ ID NO: 230), IGKV1-27 (SEQ ID NO: 231), IGKV1-33 (SEQ ID NO: 232), IGKV1-39 (SEQ ID NO: 233), IGKV2-28 (SEQ ID NO: 234), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), IGKV3-20 (SEQ ID NO: 237), and IGKV4-1 (SEQ ID NO: 238). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences. The amino acid sequences of these chassis are presented in Table 11.

In certain embodiments of the invention, the VL chassis of the libraries include one or more chassis based on IGλV germline sequences. In certain embodiments of the invention, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of one or more of the following IGλV germline sequences: IGλV3-1 (SEQ ID NO: 535), IGλV3-21 (SEQ ID NO: 537), IGλV2-14 (SEQ ID NO: 534), IGλV1-40 (SEQ ID NO: 531), IGλV3-19 (SEQ ID NO: 536), IGλV1-51 (SEQ ID NO: 533), IGλV1-44 (SEQ ID NO: 532), IGλV6-57 (SEQ ID NO: 539), IGλV2-8, IGλV3-25, IGλV2-23, IGλV3-10, IGλV4-69 (SEQ ID NO: 538), IGλV1-47, IGλV2-11, IGλV7-43 (SEQ ID NO: 541), IGλV7-46, IGλV5-45 (SEQ ID NO: 540), IGλV4-60, IGλV10-54 (SEQ ID NO: 482), IGλV8-61 (SEQ ID NO: 499), IGλV3-9 (SEQ ID NO: 494), IGλV1-36 (SEQ ID NO: 480), IGλV2-18 (SEQ ID NO: 485), IGλV3-16 (SEQ ID NO: 491), IGλV3-27 (SEQ ID NO: 493), IGλV4-3 (SEQ ID NO: 495), IGλV5-39 (SEQ ID NO: 497), IGλV9-49 (SEQ ID NO: 500), and IGλV3-12 (SEQ ID NO: 490). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.

In other embodiments, the VL chassis of the libraries may comprise from about Kabat residue 1 to about Kabat residue 88 of the following IGλV germline sequences: IGλV3-1 (SEQ ID NO: 535), IGλV3-21 (SEQ ID NO: 537), IGλV2-14 (SEQ ID NO: 534), IGλV1-40 (SEQ ID NO: 531), IGλV3-19 (SEQ ID NO: 536), IGλV1-51 (SEQ ID NO: 533), IGλV1-44 (SEQ ID NO: 532), IGλV6-57 (SEQ ID NO: 539), IGλV4-69 (SEQ ID NO: 538), IGλV7-43 (SEQ ID NO: 541), and IGλV5-45 (SEQ ID NO: 540). In some embodiments of the invention, a library may contain one or more of these sequences, one or more allelic variants of these sequences, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences. The amino acid sequences of these chassis are presented in Table 14.

2.2. Design of the Antibody Library CDRH3 Components

It is known in the art that diversity in the CDR3 region of the heavy chain is sufficient for most antibody specificities (Xu and Davis, Immunity, 2000, 13: 27-45, incorporated by reference in its entirety) and that existing successful libraries have been created using CDRH3 as the major source of diversification (Hoogenboom et al., J. Mol. Biol., 1992, 227: 381; Lee et al., J. Mol. Biol., 2004, 340: 1073, each of which is incorporated by reference in its entirety). It is also known that both the DH region and the N1/N2 regions contribute to the CDRH3 functional diversity (Schroeder et al., J. Immunol., 2005, 174: 7773 and Mathis et al., Eur J Immunol., 1995, 25: 3115, each of which is incorporated by reference in its entirety). For the purposes of the present invention, the CDRH3 region of naturally occurring human antibodies can be divided into five segments: (1) the tail segment, (2) the N1 segment, (3) the DH segment, (4) the N2 segment, and (5) the JH segment. As exemplified below, the tail, N1 and N2 segments may or may not be present.

In certain embodiments of the invention, the method for selecting amino acid sequences for the synthetic CDRH3 libraries includes a frequency analysis and the generation of the corresponding variability profiles of existing rearranged antibody sequences. In this process, which is described in more detail in the Examples section, the frequency of occurrence of a particular amino acid residue at a particular position within rearranged CDRH3s (or any other heavy or light chain region) is determined. Amino acids that are used more frequently in nature may then be chosen for inclusion in a library of the invention.

2.2.1. Design and Selection of the DH Segment Repertoire

In certain embodiments of the invention, the libraries contain CDRH3 regions comprising one or more segments designed based on the IGHD gene germline repertoire. In some embodiments of the invention, DH segments selected for inclusion in the library are selected and designed based on the most frequent usage of human IGHD genes, and progressive N-terminal and C-terminal deletions thereof, to mimic the in vivo processing of the IGHD gene segments. In some embodiments of the invention, the DH segments of the library are about 3 to about 10 amino acids in length. In some embodiments of the invention, the DH segments of the library are about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids in length, or a combination thereof. In certain embodiments, the libraries of the invention may contain DH segments with a wide distribution of lengths (e.g., about 0 to about 10 amino acids). In other embodiments, the length distribution of the DH may be restricted (e.g., about 1 to about 5 amino acids, about 3 amino acids, about 3 and about 5 amino acids, and so on). In certain embodiments of the library, the shortest DH segments may be about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids.

In certain embodiments of the invention, libraries may contain DH segments representative of any reading frame of any IGHD germline sequence. In certain embodiments of the invention, the DH segments selected for inclusion in a library include one or more of the following IGHD sequences, or their derivatives (i.e., any reading frame and any degree of N-terminal and C-terminal truncation): IGHD3-10 (SEQ ID NOs: 1-3), IGHD3-22 (SEQ ID NOs: 239, 4, 240), IGHD6-19 (SEQ ID NOs: 5, 6, 241), IGHD6-13 (SEQ ID NOs: 7, 8, 242), IGHD3-3 (SEQ ID NOs: 243, 244, 9), IGHD2-2 (SEQ ID NOs: 245, 10, 11), IGHD4-17 (SEQ ID NOs: 246, 12, 247), IGHD1-26 (SEQ ID NOs: 13, 248 and 14), IGHD5-5/5-18 (SEQ ID NOs: 249, 250, 15), IGHD2-15 (SEQ ID NOs: 251, 16, 252), IGHD6-6 (encoded by SEQ ID NO: 515), IGHD3-9 (encoded by SEQ ID NO: 509), IGHD5-12 (encoded by SEQ ID NO: 512), IGHD5-24 (encoded by SEQ ID NO: 513), IGHD2-21 (encoded by SEQ ID NOs: 505 and 506), IGHD3-16 (encoded by SEQ ID NO: 508), IGHD4-23 (encoded by SEQ ID NO: 510), IGHD1-1 (encoded by SEQ ID NO: 501), IGHD1-7 (encoded by SEQ ID NO: 504), IGHD4-4/4-11 (encoded by SEQ ID NO: 511), IGHD1-20 (encoded by SEQ ID NO: 503), IGHD7-27, IGHD2-8, and IGHD6-25. In some embodiments of the invention, a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.

For the purposes of illustration, progressive N-terminal and C-terminal deletions of IGHD3-10, reading frame 1, are enumerated in Table 1. N-terminal and C-terminal deletions of other IGHD sequences and reading frames are also encompassed by the invention, and one of ordinary skill in the art can readily determine these sequences using, for example, the non-limiting exemplary data presented in Table 16, and/or the methods outlined above. Table 18 (Example 5) enumerates certain DH segments used in certain embodiments of the invention.

TABLE 1

Example of Progressive N- and C-terminal Deletions of Reading Frame 1 for Gene IGHD3-10, Yielding DH Segments

DH           SEQ ID NO:
VLLWFGELL    1
VLLWFGEL     593
VLLWFGE      594
VLLWFG       595
VLLWF        596
VLLW         597
VLL
LLWFGELL     598
LLWFGEL      599
LLWFGE       600
LLWFG        601
LLWF         602
LLW
LWFGELL      603
LWFGEL       604
LWFGE        605
LWFG         606
LWF
WFGELL       607
WFGEL        608
WFGE         609
WFG
FGELL        610
FGEL         611
FGE
GELL         612
GEL
ELL
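As a non-limiting sketch, the progressive N- and C-terminal deletions of a parent DH sequence correspond to enumerating its contiguous sub-sequences down to a minimum length. The function name progressive_deletions is hypothetical; the minimum length of three residues follows the embodiment described above, and the removal of duplicate segments is an assumption of this sketch.

```python
def progressive_deletions(parent, min_len=3):
    """Return the unique N- and C-terminally truncated segments of parent."""
    seen, segments = set(), []
    for start in range(len(parent) - min_len + 1):  # N-terminal deletions
        # Walk the C-terminal end inward, longest segment first.
        for end in range(len(parent), start + min_len - 1, -1):
            segment = parent[start:end]
            if segment not in seen:
                seen.add(segment)
                segments.append(segment)
    return segments

dh_segments = progressive_deletions("VLLWFGELL")  # IGHD3-10, reading frame 1
print(len(dh_segments))  # 28
print(dh_segments[:7])
```

For the nine-residue parent VLLWFGELL this yields 28 segments, the same number of entries as Table 1. Applying the same function to other parent sequences and pooling the results through a set would give a non-redundant collection across genes and reading frames.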

In certain embodiments of the invention, the DH segments selected for inclusion in a library include one or more of the following IGHD sequences, or their derivatives (i.e., any reading frame and any degree of N-terminal and C-terminal truncation): IGHD3-10 (SEQ ID NOs: 1-3), IGHD3-22 (SEQ ID NOs: 239, 4, 240), IGHD6-19 (SEQ ID NOs: 5, 6, 241), IGHD6-13 (SEQ ID NOs: 7, 8, 242), IGHD3-03 (SEQ ID NOs: 243, 244, 9), IGHD2-02 (SEQ ID NOs: 245, 10, 11), IGHD4-17 (SEQ ID NOs: 246, 12, 247), IGHD1-26 (SEQ ID NOs: 13, 248 and 14), IGHD5-5/5-18 (SEQ ID NOs: 249, 250, 15), and IGHD2-15 (SEQ ID NOs: 251, 16, 252). In some embodiments of the invention, a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.

In certain embodiments of the invention, the DH segments selected for inclusion in a library include one or more of the following IGHD sequences, wherein the notation “_x” denotes the reading frame of the gene, or their derivatives (i.e., any degree of N-terminal or C-terminal truncation): IGHD1-26_1 (SEQ ID NO: 13), IGHD1-26_3 (SEQ ID NO: 14), IGHD2-2_2 (SEQ ID NO: 10), IGHD2-2_3 (SEQ ID NO: 11), IGHD2-15_2 (SEQ ID NO: 16), IGHD3-3_3 (SEQ ID NO: 9), IGHD3-10_1 (SEQ ID NO: 1), IGHD3-10_2 (SEQ ID NO: 2), IGHD3-10_3 (SEQ ID NO: 3), IGHD3-22_2 (SEQ ID NO: 4), IGHD4-17_2 (SEQ ID NO: 12), IGHD5-5_3 (SEQ ID NO: 15), IGHD6-13_1 (SEQ ID NO: 7), IGHD6-13_2 (SEQ ID NO: 8), IGHD6-19_1 (SEQ ID NO: 5), and IGHD6-19_2 (SEQ ID NO: 6). In some embodiments of the invention, a library may contain one or more of these sequences, allelic variants thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.

In certain embodiments of the invention, the libraries are designed to reflect a pre-determined length distribution of N- and C-terminal deleted IGHD segments. For example, in certain embodiments of the library, the DH segments of the library may be designed to mimic the natural length distribution of DH segments found in the human repertoire. For example, the relative occurrence of different IGHD segments in rearranged human antibody heavy chain domains may be taken from Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety). Table 2 shows the relative occurrence of the top 68% of IGHD segments from Lee et al.

TABLE 2

Relative Occurrence of Top 68% of IGHD Gene Usage from Lee et al.

IGHD Reading Frame    Sequence (Parent)    SEQ ID NO:    Relative Occurrence
IGHD3-10_1            VLLWFGELL            1             4.3%
IGHD3-10_2            YYYGSGSYYN           2             8.4%
IGHD3-10_3            ITMVRGVII            3             4.0%
IGHD3-22_2            YYYDSSGYYY           4             15.6%
IGHD6-19_1            GYSSGWY              5             7.4%
IGHD6-19_2            GIAVAG               6             6.0%
IGHD6-13_1            GYSSSWY              7             8.4%
IGHD6-13_2            GIAAAG               8             5.3%
IGHD3-3_3             ITIFGVVII            9             7.4%
IGHD2-2_2             GYCSSTSCYT           10            5.2%
IGHD2-2_3             DIVVVPAAM            11            4.1%
IGHD4-17_2            DYGDY                12            6.8%
IGHD1-26_1            GIVGATT              13            2.9%
IGHD1-26_3            YSGSYY               14            4.3%
IGHD5-5_3             GYSYGY               15            4.3%
IGHD2-15_2            GYCSGGSCYS           16            5.6%

In certain embodiments, these relative occurrences may be used to design a library with DH prevalence that is similar to the IGHD usage found in peripheral blood. In other embodiments of the invention, it may be preferable to bias the library toward longer or shorter DH segments, or DH segments of a particular composition. In other embodiments, it may be desirable to use all DH segments selected for the library in equal proportion.
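As a purely illustrative sketch, the choice between usage-weighted and equal representation of DH segments could be expressed as follows. The sequences and percentages are those of Table 2; the function names, the `mode` switch, and the random seed are assumptions introduced here for illustration and are not part of the described methods.

```python
import random

# Reading frame -> (parent sequence, relative occurrence in percent), from Table 2.
DH_USAGE = {
    "IGHD3-10_1": ("VLLWFGELL", 4.3), "IGHD3-10_2": ("YYYGSGSYYN", 8.4),
    "IGHD3-10_3": ("ITMVRGVII", 4.0), "IGHD3-22_2": ("YYYDSSGYYY", 15.6),
    "IGHD6-19_1": ("GYSSGWY", 7.4), "IGHD6-19_2": ("GIAVAG", 6.0),
    "IGHD6-13_1": ("GYSSSWY", 8.4), "IGHD6-13_2": ("GIAAAG", 5.3),
    "IGHD3-3_3": ("ITIFGVVII", 7.4), "IGHD2-2_2": ("GYCSSTSCYT", 5.2),
    "IGHD2-2_3": ("DIVVVPAAM", 4.1), "IGHD4-17_2": ("DYGDY", 6.8),
    "IGHD1-26_1": ("GIVGATT", 2.9), "IGHD1-26_3": ("YSGSYY", 4.3),
    "IGHD5-5_3": ("GYSYGY", 4.3), "IGHD2-15_2": ("GYCSGGSCYS", 5.6),
}


def dh_sampling_weights(usage, mode="natural"):
    """Return {reading frame: probability}.

    mode="natural" follows the Table 2 occurrences; mode="equal" is uniform.
    """
    if mode == "equal":
        return {name: 1.0 / len(usage) for name in usage}
    total = sum(pct for _, pct in usage.values())
    return {name: pct / total for name, (_, pct) in usage.items()}


def sample_dh(usage, n, mode="natural", seed=0):
    """Draw n parent DH sequences according to the chosen weighting (hypothetical helper)."""
    rng = random.Random(seed)
    weights = dh_sampling_weights(usage, mode)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[k] for k in names], k=n)
    return [usage[name][0] for name in picks]
```

A library biased toward longer or shorter DH segments could be sketched in the same way by replacing the weights with any other user-chosen values.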

In certain embodiments of the invention, the most commonly used reading-frames of the ten most frequently occurring IGHD sequences are utilized, and progressive N-terminal and C-terminal deletions of these sequences are made, thus providing a total of 278 non-redundant DH segments that are used to create a CDRH3 repertoire of the instant invention (Table 18). In some embodiments of the invention, the methods described above can be applied to produce libraries comprising the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 expressed IGHD sequences, and progressive N-terminal and C-terminal deletions thereof. As with all other components of the library, while the DH segments may be selected from among those that are commonly expressed, it is also within the scope of the invention to select these gene segments based on the fact that they are less commonly expressed. This may be advantageous, for example, in obtaining antibodies toward self-antigens or in further expanding the diversity of the library. Alternatively, DH segments can be used to add compositional diversity in a manner that is strictly relative to their occurrence in actual human heavy chain sequences.

In certain embodiments of the invention, the progressive deletion of IGHD genes containing disulfide loop encoding segments may be limited, so as to leave the loop intact and to avoid the presence of unpaired cysteine residues. In other embodiments of the invention, the presence of the loop can be ignored and the progressive deletion of the IGHD gene segments can occur as for any other segments, regardless of the presence of unpaired cysteine residues. In still other embodiments of the invention, the cysteine residues can be mutated to any other amino acid.
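The progressive N- and C-terminal deletion described above can be sketched as an enumeration of contiguous sub-sequences of a parent IGHD sequence. This is a minimal sketch under stated assumptions: the `min_length` cutoff and the `keep_disulfide_loop` flag are hypothetical parameters, and "leaving the loop intact" is interpreted here as retaining every cysteine of the parent. The sketch is not intended to reproduce the 278 segments of Table 18.

```python
def progressive_deletions(parent, min_length=3, keep_disulfide_loop=False):
    """Return the non-redundant N- and/or C-terminally truncated variants of parent.

    The parent itself (zero deletions) is included. When keep_disulfide_loop is
    True, variants that lose any cysteine of the parent are discarded, so that
    no unpaired cysteine is produced (an assumption made for this sketch).
    """
    n_cys = parent.count("C")
    variants = set()
    for start in range(len(parent)):                      # residues removed from the N-terminus
        for end in range(start + min_length, len(parent) + 1):  # residues kept up to this C-terminal boundary
            segment = parent[start:end]
            if keep_disulfide_loop and segment.count("C") != n_cys:
                continue
            variants.add(segment)
    # Longest variants first, then alphabetical, for a stable listing.
    return sorted(variants, key=lambda s: (-len(s), s))


# Example: IGHD4-17_2 (no cysteines) and IGHD2-2_2 (two cysteines).
short_variants = progressive_deletions("DYGDY", min_length=3)
loop_variants = progressive_deletions("GYCSSTSCYT", min_length=3, keep_disulfide_loop=True)
```

Collecting the variants in a set mirrors the "non-redundant" requirement, since different deletions of the same or different parents can yield identical sequences.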

2.2.1.2 Design and Selection of DH Segments from Non-Human Vertebrates

In certain embodiments of the invention, DH segments from non-human vertebrates may be used in conjunction with human VH, N1, N2, and H3-JH segments to produce CDRH3s and/or antibodies in which all segments except the DH segment are synthesized with reference to human sequences. Without being bound by theory, it is anticipated that the extensive variability in the DH segment of antibodies, for example as the result of somatic hypermutation, may make this region more permissive to the inclusion of sequences that have non-human characteristics, without sacrificing the ability to recognize a broad variety of antigens or introducing immunogenic sequences.

The general methods taught herein are readily applicable to information derived from species other than humans. Example 16 presents exemplary DH segments from a variety of species and outlines methods for their inclusion in the libraries of the invention. These methods may be readily applied to information derived from other species and/or sources of information other than those presented in Example 16. For example, as IGHD sequence data becomes available for additional species (e.g., as a result of focused sequencing efforts), one of ordinary skill in the art could use the teachings of this application to construct libraries with DH segments derived from these species.

In certain embodiments of the invention, a library may contain one or more DH segments derived from the IGHD genes presented in Table 55. As further enumerated in Example 16, these sequences can be selected according to one or more non-limiting criteria, including diversity in length and sequence, maximal (or minimal) human “string content,” and/or the absence or minimization of T cell epitopes. Like the human IGHD sequences discussed elsewhere in the application, the non-human IGHD segments of the invention may be deleted at their N- and/or C-termini to provide DH segments with a minimal length of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. The length distribution, reading frame, and frequency of inclusion of the non-human DH segments selected for inclusion in the library may be varied as presented for the human DH segments. Non-human DH segments include those derived from non-human IGHD genes according to the methods presented herein, allelic variants thereof, and amino acid and nucleotide sequences at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.

IGHD segments may be obtained from multiple species, including camel, shark, mouse, rat, llama, fish, rabbit, and so on. Non-limiting exemplary species from which IGHD segments may be obtained include Mus musculus, Camelus sp., Llama sp., Camelidae sp., Raja sp., Ginglymostoma sp., Carcharhinus sp., Heterodontus sp., Hydrolagus sp., Ictalurus sp., Gallus sp., Bos sp., Marmaronetta sp., Aythya sp., Netta sp., Equus sp., Pentalagus sp., Bunolagus sp., Nesolagus sp., Romerolagus sp., Brachylagus sp., Sylvilagus sp., Oryctolagus sp., Poelagus sp., Ovis sp., Sus sp., Gadus sp., Salmo sp., Oncorhynchus sp., Macaca sp., Rattus sp., Pan sp., Hexanchus sp., Heptranchias sp., Notorynchus sp., Chlamydoselachus sp., Heterodontus sp., Pristiophorus sp., Pliotrema sp., Squatina sp., Carcharias sp., Mitsukurina sp., Lamna sp., Isurus sp., Carcharodon sp., Cetorhinus sp., Alopias sp., Nebrius sp., Stegostoma sp., Orectolobus sp., Eucrossorhinus sp., Sutorectus sp., Chiloscyllium sp., Hemiscyllium sp., Brachaelurus sp., Heteroscyllium sp., Cirrhoscyllium sp., Parascyllium sp., Rhincodon sp., Apristurus sp., Atelomycterus sp., Cephaloscyllium sp., Cephalurus sp., Dichichthys sp., Galeus sp., Halaelurus sp., Haploblepharus sp., Parmaturus sp., Pentanchus sp., Poroderma sp., Schroederichthys sp., Scyliorhinus sp., Pseudotriakis sp., Scylliogaleus sp., Furgaleus sp., Hemitriakis sp., Mustelus sp., Triakis sp., Iago sp., Galeorhinus sp., Hypogaleus sp., Chaenogaleus sp., Hemigaleus sp., Paragaleus sp., Galeocerdo sp., Prionace sp., Scoliodon sp., Loxodon sp., Rhizoprionodon sp., Aprionodon sp., Negaprion sp., Hypoprion sp., Carcharhinus sp., Isogomphodon sp., Triaenodon sp., Sphyrna sp., Echinorhinus sp., Oxynotus sp., Squalus sp., Centroscyllium sp., Etmopterus sp., Centrophorus sp., Cirrhigaleus sp., Deania sp., Centroscymnus sp., Scymnodon sp., Dalatias sp., Euprotomicrus sp., Isistius sp., Squaliolus sp., Heteroscymnoides sp., Somniosus sp., and Megachasma sp.

Publications discussing IGHD segments from additional species and/or methods of obtaining such segments include, for example, Ye, Immunogenetics, 2004, 56: 399; De Genst et al., Dev. Comp. Immunol., 2006, 30: 187; Dooley and Flajnik, Dev. Comp. Immunol., 2006, 30: 43; Bengtén et al., Dev. Comp. Immunol., 2006, 30: 77; Ratcliffe, Dev. Comp. Immunol., 2006, 30: 101; Zhao et al., Dev. Comp. Immunol., 2006, 30: 175; Lundqvist et al., Dev. Comp. Immunol., 2006, 30: 93; Wagner, Dev. Comp. Immunol., 2006, 30: 155; Mage et al., Dev. Comp. Immunol., 2006, 30: 137; Malecek et al., J. Immunol., 2005, 175: 8105; Jenne et al., Dev. Comp. Immunol., 2006, 30: 165; Butler et al., Dev. Comp. Immunol., 2006, 30: 199; Solem et al., Dev. Comp. Immunol., 2006, 30: 57; Das et al., Immunogenetics, 2008, 60: 47; and Kiss et al., Nucleic Acids Res., 2006, 34: e132, each of which is incorporated by reference in its entirety.

Given the degree of variability in N1 and N2, these segments might also be considered possible regions for substitution with non-human sequences, that is, sequences with composition biases not arising from those of human terminal deoxynucleotidyl transferase. The methods taught herein for the identification and analysis of the N1 and N2 regions of human antibodies are also readily applicable to non-human antibodies.

2.2.2. Design and Selection of the H3-JH Segment Repertoire

There are six IGHJ (joining) segments: IGHJ1 (SEQ ID NO: 253), IGHJ2 (SEQ ID NO: 254), IGHJ3 (SEQ ID NO: 255), IGHJ4 (SEQ ID NO: 256), IGHJ5 (SEQ ID NO: 257), and IGHJ6 (SEQ ID NO: 258). The amino acid sequences of the parent segments and the progressive N-terminal deletions are presented in Table 20 (Example 5). Similar to the N- and C-terminal deletions that the IGHD genes undergo, natural variation is introduced into the IGHJ genes by N-terminal “nibbling”, or progressive deletion, of one or more codons by exonuclease activity.

The H3-JH segment refers to the portion of the IGHJ segment that is part of CDRH3. In certain embodiments of the invention, the H3-JH segment of a library comprises one or more of the following sequences: AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), FDY, DY, Y, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), MDV, and DV. In some embodiments of the invention, a library may contain one or more of these sequences, allelic variations thereof, or encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences.
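The N-terminal "nibbling" that produces the H3-JH sequences listed above can be illustrated with a short sketch. The six parent sequences are the full-length H3-JH sequences given in the text (SEQ ID NOs: 17-22); the function names and the `min_length` parameter are assumptions made for illustration. The sketch enumerates every N-terminal deletion of each parent, so its output need not coincide with the particular selection presented in Table 20.

```python
# Full-length H3-JH parent sequences as listed in the text (SEQ ID NOs: 17-22).
H3_JH_PARENTS = ["AEYFQH", "YWYFDL", "AFDV", "YFDY", "NWFDS", "YYYYYGMDV"]


def n_terminal_deletions(parent, min_length=1):
    """Return parent followed by its progressively N-terminally deleted forms."""
    return [parent[i:] for i in range(len(parent) - min_length + 1)]


def h3_jh_pool(parents, min_length=1):
    """Pool the deletions of all parents, dropping repeats while keeping order."""
    pool = []
    for parent in parents:
        for variant in n_terminal_deletions(parent, min_length):
            if variant not in pool:
                pool.append(variant)
    return pool


jh1_variants = n_terminal_deletions("AEYFQH")
pool = h3_jh_pool(H3_JH_PARENTS)
```

Raising `min_length` restricts the pool to longer H3-JH segments, which is one simple way to bias the CDRH3 length distribution.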

In other embodiments of the invention, the H3-JH segment may comprise about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or more amino acids. For example, the H3-JH segment of JH1_4 (Table 20) has a length of three residues, while non-deleted JH6 has an H3-JH segment length of nine residues. The FRM4-JH region of the IGHJ segment begins with the sequence WG(Q/R)G (SEQ ID NO: 23) and corresponds to the portion of the IGHJ segment that makes up part of framework 4. In certain embodiments of the invention, as enumerated in Table 20, there are 28 H3-JH segments that are included in a library. In certain other embodiments, libraries may be produced by utilizing about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 of the IGHJ segments enumerated above or in Table 20.

2.2.3. Design and Selection of the N1 and N2 Segment Repertoires

Terminal deoxynucleotidyl transferase (TdT) is a highly conserved enzyme from vertebrates that catalyzes the attachment of 5′ triphosphates to the 3′ hydroxyl group of single- or double-stranded DNA. Hence, the enzyme acts as a template-independent polymerase (Koiwai et al., Nucleic Acids Res., 1986, 14: 5777; Basu et al., Biochem. Biophys. Res. Comm., 1983, 111: 1105, each incorporated by reference in its entirety). In vivo, TdT is responsible for the addition of nucleotides to the V-D and D-J junctions of antibody heavy chains (Alt and Baltimore, PNAS, 1982, 79: 4118; Collins et al., J. Immunol., 2004, 172: 340, each incorporated by reference in its entirety). Specifically, TdT is responsible for creating the N1 and N2 (non-templated) segments that flank the D (diversity) region.

In certain embodiments of the invention, the length and composition of the N1 and N2 segments are designed rationally, according to statistical biases in amino acid usage found in naturally occurring N1 and N2 segments in human antibodies. One embodiment of a library produced via this method is described in Example 5. According to data compiled from human databases (Jackson et al., J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety), there is an average of 3.02 amino acid insertions for N1 and 2.4 amino acid insertions for N2, not taking into account insertions of two nucleotides or fewer (FIG. 2). In certain embodiments of the invention, N1 and N2 segments are restricted to lengths of zero to three amino acids. In other embodiments of the invention, N1 and N2 may be restricted to lengths of less than about 4, 5, 6, 7, 8, 9, or 10 amino acids.

In some embodiments of the invention, the composition of these sequences may be chosen according to the frequency of occurrence of particular amino acids in the N1 and N2 sequences of natural human antibodies (for examples of this analysis, see Tables 21 to 23, in Example 5). In certain embodiments of the invention, the eight most commonly occurring amino acids in these regions (i.e., G, R, S, P, L, A, T, and V) are used to design the synthetic N1 and N2 segments. In other embodiments of the invention, about the 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 most commonly occurring amino acids may be used in the design of the synthetic N1 and N2 segments. In still other embodiments, all 20 amino acids may be used in these segments. Finally, while it is possible to base the designed composition of the N1 and N2 segments of the invention on the composition of naturally occurring N1 and N2 segments, this is not a requirement. The N1 and N2 segments may comprise amino acids selected from any group of amino acids, or designed according to other criteria considered for the design of a library of the invention. A person of ordinary skill in the art would readily recognize that the criteria used to design any portion of a library of the invention may vary depending on the application of the particular library. It is an object of the invention that it may be possible to produce a functional library through the use of N1 and N2 segments selected from any group of amino acids, no N1 or N2 segments, or the use of N1 and N2 segments with compositions other than those described herein.

One important difference between the libraries of the current invention and other libraries known in the art is the consideration of the composition of naturally occurring duplet and triplet amino acid sequences during the design of the library. Table 23 shows the top twenty-five naturally occurring duplets in the N1 and N2 regions. Many of these can be represented by the general formula (G/P)(G/R/S/P/L/A/V/T) or (R/S/L/A/V/T)(G/P). In certain embodiments of the invention, the synthetic N1 and N2 regions may comprise all of these duplets. In other embodiments, the library may comprise the top 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 most common naturally occurring N1 and/or N2 duplets. In other embodiments of the invention, the libraries may include duplets that are less frequently occurring (i.e., outside of the top 25). The composition of these additional duplets or triplets could readily be determined, given the methods taught herein.

Finally, the data from the naturally occurring triplet N1 and N2 regions demonstrate that the naturally occurring N1 and N2 triplet sequences can often be represented by the formulas (G)(G)(G/R/S/P/L/A/V/T), (G)(R/S/P/L/A/V/T)(G), or (R/S/P/L/A/V/T)(G)(G). In certain embodiments of the invention, the library may comprise the top 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 most commonly occurring N1 and/or N2 triplets. In other embodiments of the invention, the libraries may include triplets that are less frequently occurring (i.e., outside of the top 25). The composition of these additional duplets or triplets could readily be determined, given the methods taught herein.
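The duplet and triplet formulas above are compact descriptions of sets of sequences, and a brief sketch can expand them explicitly. The helper names below are hypothetical. Because the formulas are stated to represent many, but not necessarily all, of the most common naturally occurring duplets and triplets, the expanded sets should not be read as identical to the rankings in Table 23.

```python
from itertools import product

DUPLET_FORMULAS = ["(G/P)(G/R/S/P/L/A/V/T)", "(R/S/L/A/V/T)(G/P)"]
TRIPLET_FORMULAS = [
    "(G)(G)(G/R/S/P/L/A/V/T)",
    "(G)(R/S/P/L/A/V/T)(G)",
    "(R/S/P/L/A/V/T)(G)(G)",
]


def expand_formula(formula):
    """Expand a formula such as '(G/P)(G/R)' into ['GG', 'GR', 'PG', 'PR']."""
    # Strip the outer parentheses, then split on ')(' to get one group per position.
    positions = [group.split("/") for group in formula.strip("()").split(")(")]
    return ["".join(combo) for combo in product(*positions)]


def expand_formulas(formulas):
    """Expand several formulas and merge the results without repeats."""
    merged = []
    for formula in formulas:
        for seq in expand_formula(formula):
            if seq not in merged:
                merged.append(seq)
    return merged


duplets = expand_formulas(DUPLET_FORMULAS)
triplets = expand_formulas(TRIPLET_FORMULAS)
```

Under these formulas the sketch yields 28 distinct duplets and 22 distinct triplets, which gives a sense of the pool sizes such formulas imply before any ranking or further selection is applied.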

In certain embodiments of the invention, there are about 59 total N1 segments and about 59 total N2 segments used to create a library of CDRH3s. In other embodiments of the invention, the number of N1 segments, N2 segments, or both is increased to about 141 (see, for example, Example 5). In other embodiments of the invention, one may select a total of about 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 1000, 10⁴, or more N1 and/or N2 segments for inclusion in a library of the invention.

One of ordinary skill in the art will readily recognize that, given the teachings of the instant specification, it is well within the realm of normal experimentation to extend the analysis detailed herein, for example, to generate additional rankings of naturally occurring duplet and triplet (or higher order) N regions that extend beyond those presented herein (e.g., using sequence alignment, the SoDA algorithm, and any database of human sequences (Volpe et al., Bioinformatics, 2006, 22: 438-44, incorporated by reference in its entirety)). An ordinarily skilled artisan would also recognize that, based on the information taught herein, it is now possible to produce libraries that are more diverse or less diverse (i.e., more focused) by varying the number of distinct amino acid sequences used in the N1 pool and/or N2 pool.

As described above, many alternative embodiments are envisioned, in which the compositions and lengths of the N1 and N2 segments vary from those presented in the Examples herein. In some embodiments, sub-stoichiometric synthesis of trinucleotides may be used for the synthesis of N1 and N2 segments. Sub-stoichiometric synthesis with trinucleotides is described in Knappik et al. (U.S. Pat. No. 6,300,064, incorporated by reference in its entirety). The use of sub-stoichiometric synthesis would enable synthesis with consideration of the length variation in the N1 and N2 sequences.

In addition to the embodiments described above, a model of the activity of TdT may also be used to determine the composition of the N1 and N2 sequences in a library of the invention. For example, it has been proposed that the probability of incorporating a particular nucleotide base (A, C, G, T) on a polynucleotide, by the activity of TdT, is dependent on the type of base and the base that occurs on the strand directly preceding the base to be added. Jackson et al., (J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety) have constructed a Markov model describing this process. In certain embodiments of the invention, this model may be used to determine the composition of the N1 and/or N2 segments used in libraries of the invention. Alternatively, the parameters presented in Jackson et al. could be further refined to produce sequences that more closely mimic human sequences.
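A first-order Markov model of the kind described above, in which the probability of adding a base depends on the base that precedes it, can be sketched as follows. The transition and start probabilities below are arbitrary placeholders chosen only so that each row sums to one; they are not the parameters reported by Jackson et al., and the function name is hypothetical.

```python
import random

BASES = "ACGT"

# PLACEHOLDER values, not taken from Jackson et al.:
# TRANSITIONS[prev][i] is P(next base = BASES[i] | previous base = prev).
TRANSITIONS = {
    "A": [0.20, 0.25, 0.35, 0.20],
    "C": [0.15, 0.35, 0.30, 0.20],
    "G": [0.15, 0.25, 0.45, 0.15],
    "T": [0.20, 0.30, 0.30, 0.20],
}
# PLACEHOLDER distribution for the first added base.
START = [0.20, 0.30, 0.35, 0.15]


def simulate_n_region(n_nucleotides, rng):
    """Generate one non-templated nucleotide stretch from the Markov chain."""
    added = []
    weights = START
    for _ in range(n_nucleotides):
        base = rng.choices(BASES, weights=weights, k=1)[0]
        added.append(base)
        weights = TRANSITIONS[base]  # the next draw is conditioned on this base
    return "".join(added)


rng = random.Random(1)
simulated = [simulate_n_region(9, rng) for _ in range(1000)]
```

Simulated stretches of this kind could then be translated and tallied to estimate the amino acid, duplet, and triplet frequencies that a given parameter set implies, and the parameters adjusted until those frequencies resemble the natural ones.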

2.2.4. Design of a CDRH3 Library Using the N1, DH, N2, and H3-JH Segments

The CDRH3 libraries of the invention comprise an initial amino acid (in certain exemplary embodiments, G, D, or E) or lack thereof (designated herein as position 95), followed by the N1, DH, N2, and H3-JH segments. Thus, in certain embodiments of the invention, the overall design of the CDRH3 libraries can be represented by the following formula:

[G/D/E/-]-[N1]-[DH]-[N2]-[H3-JH].

While the compositions of each portion of a CDRH3 of a library of the invention are more fully described above, the composition of the tail presented above (G/D/E/-) is non-limiting, and any amino acid (or no amino acid) can be used in this position. Thus, certain embodiments of the invention may be represented by the following formula:

[X]-[N1]-[DH]-[N2]-[H3-JH],

wherein [X] is any amino acid residue or no residue.
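The combinatorial character of the formula [X]-[N1]-[DH]-[N2]-[H3-JH] can be illustrated with a short sketch that concatenates one member from each pool. The small pools used here are illustrative subsets chosen for brevity (an empty string stands for an absent segment); they are assumptions for the example and not the pools of any particular library of the invention.

```python
from itertools import product


def assemble_cdrh3(tails, n1_pool, dh_pool, n2_pool, h3jh_pool):
    """Yield every [X]-[N1]-[DH]-[N2]-[H3-JH] concatenation.

    An empty string in a pool encodes the absence of that segment.
    """
    for parts in product(tails, n1_pool, dh_pool, n2_pool, h3jh_pool):
        yield "".join(parts)


def theoretical_diversity(*pools):
    """Number of segment combinations, i.e. the product of the pool sizes."""
    size = 1
    for pool in pools:
        size *= len(pool)
    return size


# Illustrative pools only.
tails = ["G", "D", "E", ""]
n1_pool = ["", "G", "GR"]
dh_pool = ["DYGDY", "YGD"]
n2_pool = ["", "P"]
h3jh_pool = ["YFDY", "FDY"]

library = list(assemble_cdrh3(tails, n1_pool, dh_pool, n2_pool, h3jh_pool))
size = theoretical_diversity(tails, n1_pool, dh_pool, n2_pool, h3jh_pool)
```

The number of combinations is an upper bound on the number of distinct amino acid sequences, because different combinations of segments can concatenate to the same CDRH3.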

In certain embodiments of the invention, a synthetic CDRH3 repertoire is combined with selected VH chassis sequences and heavy chain constant regions, via homologous recombination. Therefore, in certain embodiments of the invention, it may be necessary to include DNA sequences flanking the 5′ and 3′ ends of the synthetic CDRH3 libraries, to facilitate homologous recombination between the synthetic CDRH3 libraries and vectors containing the selected chassis and constant regions. In certain embodiments, the vectors also contain a sequence encoding at least a portion of the non-nibbled region of the IGHJ gene (i.e., FRM4-JH). Thus, a polynucleotide encoding an N-terminal sequence (e.g., CA(K/R/T)) may be added to the synthetic CDRH3 sequences, wherein the N-terminal polynucleotide is homologous with FRM3 of the chassis, while a polynucleotide encoding a C-terminal sequence (e.g., WG(Q/R)G; SEQ ID NO: 23) may be added to the synthetic CDRH3, wherein the C-terminal polynucleotide is homologous with FRM4-JH. Although the sequence WG(Q/R)G (SEQ ID NO: 23) is presented in this exemplary embodiment, additional amino acids, C-terminal to this sequence in FRM4-JH may also be included in the polynucleotide encoding the C-terminal sequence. The purpose of the polynucleotides encoding the N-terminal and C-terminal sequences, in this case, is to facilitate homologous recombination, and one of ordinary skill in the art would recognize that these sequences may be longer or shorter than depicted below. Accordingly, in certain embodiments of the invention, the overall design of the CDRH3 repertoire, including the sequences required to facilitate homologous recombination with the selected chassis, can be represented by the following formula (regions homologous with vector underlined).

CA[R/K/T]-[X]-[N1]-[DH]-[N2]-[H3-JH]-[WG(Q/R)G].

In other embodiments of the invention, the CDRH3 repertoire can be represented by the following formula, which excludes the T residue presented in the schematic above:

CA[R/K]-[X]-[N1]-[DH]-[N2]-[H3-JH]-[WG(Q/R)G].

References describing collections of V, D, and J genes include Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 243 and Ruiz et al., Exp. Clin. Immunogenet., 1999, 16: 173, each incorporated by reference in its entirety.

2.2.5. CDRH3 Length Distributions

As described throughout this application, in addition to accounting for the composition of naturally occurring CDRH3 segments, the instant invention also takes into account the length distribution of naturally occurring CDRH3 segments. Surveys by Zemlin et al. (JMB, 2003, 334: 733, incorporated by reference in its entirety) and Lee et al. (Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety) provide analyses of the naturally occurring CDRH3 lengths. These data show that about 95% of naturally occurring CDRH3 sequences have a length from about 7 to about 23 amino acids. In certain embodiments, the instant invention provides rationally designed antibody libraries with CDRH3 segments which directly mimic the size distribution of naturally occurring CDRH3 sequences. In certain embodiments of the invention, the length of the CDRH3s may be about 2 to about 30, about 3 to about 35, about 7 to about 23, about 3 to about 28, about 5 to about 28, about 5 to about 26, about 5 to about 24, about 7 to about 24, about 7 to about 22, about 8 to about 19, about 9 to about 22, about 9 to about 20, about 10 to about 18, about 11 to about 20, about 11 to about 18, about 13 to about 18, or about 13 to about 16 residues in length.

In certain embodiments of the invention, the length distribution of a CDRH3 library of the invention may be defined based on the percentage of sequences within a certain length range. For example, in certain embodiments of the invention, CDRH3s with a length of about 10 to about 18 amino acid residues comprise about 84% to about 94% of the sequences of a library. In some embodiments, sequences within this length range comprise about 89% of the sequences of a library.

In other embodiments of the invention, CDRH3s with a length of about 11 to about 17 amino acid residues comprise about 74% to about 84% of the sequences of a library. In some embodiments, sequences within this length range comprise about 79% of the sequences of a library.

In still other embodiments of the invention, CDRH3s with a length of about 12 to about 16 residues comprise about 57% to about 67% of the sequences of a library. In some embodiments, sequences within this length range comprise about 62% of the sequences of a library.

In certain embodiments of the invention, CDRH3s with a length of about 13 to about 15 residues comprise about 35% to about 45% of the sequences of a library. In some embodiments, sequences within this length range comprise about 40% of the sequences of a library.
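The length-range criteria above lend themselves to a simple check on a designed set of CDRH3 sequences. In the sketch below, the length windows and percentage bounds are those stated in the preceding paragraphs; the function names and the report layout are hypothetical.

```python
# (min length, max length) -> (lower bound, upper bound) on the fraction of sequences.
LENGTH_TARGETS = {
    (10, 18): (0.84, 0.94),
    (11, 17): (0.74, 0.84),
    (12, 16): (0.57, 0.67),
    (13, 15): (0.35, 0.45),
}


def fraction_in_length_range(cdrh3s, low, high):
    """Fraction of sequences whose length lies in [low, high]."""
    if not cdrh3s:
        return 0.0
    hits = sum(1 for seq in cdrh3s if low <= len(seq) <= high)
    return hits / len(cdrh3s)


def check_length_targets(cdrh3s, targets=LENGTH_TARGETS):
    """Return {(low, high): (observed fraction, within bounds?)}."""
    report = {}
    for (low, high), (fmin, fmax) in targets.items():
        frac = fraction_in_length_range(cdrh3s, low, high)
        report[(low, high)] = (frac, fmin <= frac <= fmax)
    return report
```

For instance, a set in which 89% of sequences are 10 to 18 residues long, 79% are 11 to 17, 62% are 12 to 16, and 40% are 13 to 15 would satisfy all four windows at once.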

2.3. Design of the Antibody Library CDRL3 Component

The CDRL3 libraries of the invention can be generated by one of several approaches. The actual version of the CDRL3 library made and used in a particular embodiment of the invention will depend on objectives for the use of the library. More than one CDRL3 library may be used in a particular embodiment; for example, a library containing CDRH3 diversity, with kappa and lambda light chains is within the scope of the invention.

In certain embodiments of the invention, a CDRL3 library is a VKCDR3 (kappa) library and/or a VλCDR3 (lambda) library. The CDRL3 libraries described herein differ significantly from CDRL3 libraries in the art. First, they consider length variation that is consistent with what is observed in actual human sequences. Second, they take into consideration the fact that a significant portion of the CDRL3 is encoded by the IGLV gene. Third, the patterns of amino acid variation within the IGLV gene-encoded CDRL3 portions are not stochastic and depend on the identity of the IGLV gene. Taken together, the second and third distinctions mean that CDRL3 libraries that faithfully mimic observed patterns in human sequences cannot use a generic design that is independent of the chassis sequences in FRM1 to FRM3. Fourth, the contribution of JL to CDRL3 is also considered explicitly, and enumeration of each amino acid residue at the relevant positions is based on the compositions and natural variations of the JL genes themselves.

As indicated above, and throughout the application, a unique aspect of the design of the libraries of the invention is the germline or “chassis-based” aspect, which is meant to preserve more of the integrity and variability of actual human sequences. This is in contrast to other codon-based synthesis or degenerate oligonucleotide synthesis approaches that have been described in the literature and that aim to produce “one-size-fits-all” (e.g., consensus) libraries (e.g., Knappik, et al., J Mol Biol, 2000, 296: 57; Akamatsu et al., J Immunol, 1993, 151: 4651, each incorporated by reference in its entirety).

In certain embodiments of the invention, patterns of occurrence of particular amino acids at defined positions within VL sequences are determined by analyzing data available in public or other databases, for example, the NCBI database (see, for example, GI numbers in Appendices A and B filed herewith). In certain embodiments of the invention, these sequences are compared on the basis of identity and assigned to families on the basis of the germline genes from which they are derived. The amino acid composition at each position of the sequence, in each germline family, may then be determined. This process is illustrated in the Examples provided herein.
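The per-family, per-position analysis described above can be sketched as a simple tally. The sketch assumes that sequences have already been assigned to a germline family and aligned to equal length; the family label and the four sequences in the example are invented placeholders, not data from the Appendices.

```python
from collections import Counter


def positional_composition(sequences_by_germline):
    """Return {germline: [ {amino acid: fraction} for each aligned position ]}."""
    result = {}
    for germline, seqs in sequences_by_germline.items():
        length = len(seqs[0])
        if any(len(seq) != length for seq in seqs):
            raise ValueError("sequences for %s are not aligned to equal length" % germline)
        columns = []
        for pos in range(length):
            counts = Counter(seq[pos] for seq in seqs)
            columns.append({aa: n / len(seqs) for aa, n in counts.items()})
        result[germline] = columns
    return result


# Invented placeholder data: one family, four aligned nine-residue sequences.
example = {"germline_A": ["QQSYSTPLT", "QQSYSTPYT", "QQSYNTPLT", "QQSFSTPLT"]}
composition = positional_composition(example)
```

The resulting per-position fractions are the kind of table from which one could choose the most common residues at each position for a given germline family.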

2.3.1. Minimalist VKCDR3 Libraries

In certain embodiments of the invention, the light chain CDR3 library is a VKCDR3 library. Certain embodiments of the invention may use only the most common VKCDR3 length, nine residues; this length occurs in a dominant proportion (greater than about 70%) of human VKCDR3 sequences. In human VKCDR3 sequences of length nine, positions 89-95 are encoded by the IGKV gene and positions 96-97 are encoded by the IGKJ gene. Analysis of human kappa light chain sequences indicates that there are not strong biases in the usage of the IGKJ genes. Therefore, in certain embodiments of the invention, each of the five IGKJ genes can be represented in equal proportions to create a combinatorial library of (M VK chassis)×(5 JK genes), or a library of size M×5. However, in other embodiments of the invention, it may be desirable to bias IGKJ gene representation, for example to restrict the size of the library or to weight the library toward IGKJ genes known to have particular properties.

As described in Example 6.1, examination of the first amino acid encoded by the IGKJ gene (position 96) indicated that the seven most common residues found at this position are L, Y, R, W, F, P, and I. These residues cumulatively account for about 85% of the residues found in position 96 in naturally occurring kappa light chain sequences. In certain embodiments of the invention, the amino acid residue at position 96 may be one of these seven residues. In other embodiments of the invention, the amino acid at this position may be chosen from amongst any of the other 13 amino acid residues. In still other embodiments of the invention, the amino acid residue at position 96 may be chosen from amongst the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids that occur at position 96, or even residues that never occur at position 96. Similarly, the occurrence of the amino acids selected to occupy position 96 may be equivalent or weighted. In certain embodiments of the invention, it may be desirable to include each of the amino acids selected for inclusion in position 96 at equivalent amounts. In other embodiments of the invention, it may be desirable to bias the composition of position 96 to include particular residues more or less frequently than others. For example, as presented in Example 6.1, arginine occurs at position 96 most frequently when the IGKJ1 (SEQ ID NO: 552) germline sequence is used. Therefore, in certain embodiments of the invention, it may be desirable to bias amino acid usage at position 96 according to the origin of the IGKJ germline sequence(s) and/or the IGKV germline sequence(s) selected for representation in a library.

Therefore, in certain embodiments of the invention, a minimalist VKCDR3 library may be represented by one or more of the following amino acid sequences:

[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-[JK*]

[VK_Chassis]-[L3-VK]-[X]-[JK*]

In these schematic exemplary sequences, VK_Chassis represents any VK chassis selected for inclusion in a library of the invention (e.g., see Table 11). Specifically, VK_Chassis comprises about Kabat residues 1 to 88 of a selected IGKV sequence. L3-VK represents the portion of the VKCDR3 encoded by the chosen IGKV gene (in this embodiment, Kabat residues 89-95). F, L, I, R, W, Y, and P are the seven most commonly occurring amino acids at position 96 of VKCDR3s with length nine, X is any amino acid, and JK* is an IGKJ amino acid sequence without the N-terminal residue (i.e., the N-terminal residue is substituted with F, L, I, R, W, Y, P, or X). Thus, in one possible embodiment of the minimalist VKCDR3 library, 70 members could be produced by utilizing 10 VK chassis, each paired with its respective L3-VK, 7 amino acids at position 96 (i.e., X), and one JK* sequence. Another embodiment of the library may have 350 members, produced by combining 10 VK chassis, each paired with its respective L3-VK, with 7 amino acids at position 96, and all 5 JK* genes. Still another embodiment of the library may have 1,125 members, produced by combining 15 VK chassis, each paired with its respective L3-VK, with 15 amino acids at position 96 and all 5 JK* genes, and so on. A person of ordinary skill in the art will readily recognize that many other combinations are possible. Moreover, while it is believed that maintaining the pairing between the VK chassis and the L3-VK results in libraries that are more similar to human kappa light chain sequences in composition, the L3-VK regions may also be combinatorially varied with different VK chassis regions, to create additional diversity.
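The library sizes quoted above follow from multiplying the number of chassis (each carrying its paired L3-VK), the number of residues allowed at position 96, and the number of JK* sequences. A minimal sketch of this arithmetic is given below; the function names and the chassis and JK* labels are hypothetical placeholders.

```python
from itertools import product


def minimalist_vkcdr3_size(n_chassis, n_pos96_residues, n_jk):
    """Size of a library with one paired L3-VK per chassis."""
    return n_chassis * n_pos96_residues * n_jk


def enumerate_members(chassis_labels, pos96_residues, jk_labels):
    """List (chassis, position-96 residue, JK*) combinations as tuples."""
    return list(product(chassis_labels, pos96_residues, jk_labels))


# The three embodiments described in the text.
sizes = [
    minimalist_vkcdr3_size(10, 7, 1),   # 10 chassis, 7 residues, 1 JK*
    minimalist_vkcdr3_size(10, 7, 5),   # 10 chassis, 7 residues, 5 JK*
    minimalist_vkcdr3_size(15, 15, 5),  # 15 chassis, 15 residues, 5 JK*
]

# Placeholder labels for the 350-member embodiment.
chassis = ["VK_chassis_%d" % i for i in range(1, 11)]
jk_star = ["JK%d*" % i for i in range(1, 6)]
members = enumerate_members(chassis, "FLIRWYP", jk_star)
```

If the L3-VK regions were instead varied combinatorially across chassis, as the text also contemplates, the count would gain an additional factor equal to the number of L3-VK sequences.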

2.3.2. VKCDR3 Libraries of about 10⁵ Complexity

While the dominant length of VKCDR3 sequences in humans is about nine amino acids, other lengths appear at measurable frequencies that cumulatively approach about 30% of VKCDR3 sequences. In particular, VKCDR3 of lengths 8 and 10 represent about 8.5% and about 16%, respectively, of VKCDR3 lengths in representative samples (Example 6.2; FIG. 3). Thus, more complex VKCDR3 libraries may include CDR lengths of 8, 10, and 11 amino acids. Such libraries could account for a greater percentage of the length distribution observed in collections of human VKCDR3 sequences, or even introduce VKCDR3 lengths that do not occur frequently in human VKCDR3 sequences (e.g., less than eight residues or greater than 11 residues).

The inclusion of a diversity of kappa light chain length variations in a library of the invention also enables one to include sequence variability that occurs outside of the amino acid at the VK-JK junction (i.e., position 96, described above). In certain embodiments of the invention, the patterns of sequence variation within the VK and/or JK segments can be determined by aligning collections of sequences derived from particular germline sequences. In certain embodiments of the invention, the frequency of occurrence of amino acid residues within VKCDR3 can be determined by sequence alignments (e.g., see Example 6.2 and Table 30). In some embodiments of the invention, this frequency of occurrence may be used to introduce variability into the VK_Chassis, L3-VK and/or JK segments that are used to synthesize the VKCDR3 libraries. In certain embodiments of the invention, the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids that occur at any particular position in a naturally occurring repertoire may be included at that position in a VKCDR3 library of the invention. In certain embodiments of the invention, the percent occurrence of any amino acid at any particular position within the VKCDR3 or a VK light chain may be about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%. In certain embodiments of the invention, the percent occurrence of any amino acid at any position within a VKCDR3 or kappa light chain library of the invention may be within at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, or 200% of the percent occurrence of any amino acid at any position within a naturally occurring VKCDR3 or kappa light chain domain.

In some embodiments of the invention, a VKCDR3 library may be synthesized using degenerate oligonucleotides (see Table 31 for IUPAC base symbol definitions). In some embodiments of the invention, the limits of oligonucleotide synthesis and the genetic code may require the inclusion of more or fewer amino acids at a particular position in the VKCDR3 sequences. An illustrative embodiment of this approach is provided in Example 6.2.
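The constraint noted above can be illustrated by expanding a degenerate codon into the amino acids it encodes. The following Python sketch uses the standard IUPAC base symbols and the standard genetic code; the function name and the example codons are illustrative assumptions and are not taken from Example 6.2.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide symbols (standard definitions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def expand_degenerate_codon(codon):
    """Count the amino acids ('*' = stop) encoded by one degenerate codon."""
    options = [IUPAC[base] for base in codon.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*options))


print(expand_degenerate_codon("HTT"))  # F, L and I, one codon each
print(expand_degenerate_codon("YMT"))  # P and Y, but H and S come along as well
print(expand_degenerate_codon("HGG"))  # R from two codons, W from one
```

In this sketch, a designer wanting only P and Y at a position would also obtain H and S from the codon YMT, and a designer wanting R and W in equal amounts would obtain them at 2:1 from HGG; these are the kinds of compromises that degenerate synthesis can impose.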

2.3.3. More Complex VKCDR3 Libraries

The limitations inherent in using the genetic code and degenerate oligonucleotide synthesis may, in some cases, require the inclusion of more or fewer amino acids at a particular position within VKCDR3 (e.g., Example 6.2, Table 32), in comparison to those amino acids found at that position in nature. This limitation can be overcome through the use of a codon-based synthesis approach (Virnekas et al., Nucleic Acids Res., 1994, 22: 5600, incorporated by reference in its entirety), which enables precise synthesis of oligonucleotides encoding particular amino acids and a finer degree of control over the proportion of any particular amino acid incorporated at any position. Example 6.3 describes this approach in greater detail.

In some embodiments of the invention, a codon-based synthesis approach may be used to vary the percent occurrence of any amino acid at any particular position within the VKCDR3 or kappa light chain. In certain embodiments, the percent occurrence of any amino acid at any position in a VKCDR3 or kappa light chain sequence of the library may be about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%. In some embodiments of the invention, the percent occurrence of any amino acid at any position may be about 1%, 2%, 3%, or 4%. In certain embodiments of the invention, the percent occurrence of any amino acid at any position within a VKCDR3 or kappa light chain library of the invention may be within at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 120%, 140%, 160%, 180%, or 200% of the percent occurrence of any amino acid at any position within a naturally occurring VKCDR3 or kappa light chain domain.

In certain embodiments of the invention, the VKCDR3 (and any other sequence used in the library, regardless of whether or not it is part of VKCDR3) may be altered to remove undesirable amino acid motifs. For example, peptide sequences with the pattern N-X-(S or T)-Z, where X and Z are different from P, will undergo post-translational modification (N-linked glycosylation) in a number of expression systems, including yeast and mammalian cells. In certain embodiments of the invention, the introduction of N residues at certain positions may be avoided, so as to avoid the introduction of N-linked glycosylation sites. In some embodiments of the invention, these modifications may not be necessary, depending on the organism used to express the library and the culture conditions. However, even in the event that the organism used to express libraries with potential N-linked glycosylation sites is incapable of N-linked glycosylation (e.g., bacteria), it may still be desirable to avoid N-X-(S/T) sequences, as the antibodies isolated from such libraries may be expressed in different systems (e.g., yeast, mammalian cells) later (e.g., toward clinical development), and the presence of carbohydrate moieties in the variable domains, and the CDRs in particular, may lead to unwanted modifications of activity.
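A simple screen for the motif described above can be sketched in Python as follows. The function name and the test peptides are hypothetical; the pattern implements N-X-(S or T)-Z with X and Z other than P exactly as stated, so a motif that ends at the C-terminus of the input (with no Z residue present) is not reported.

```python
import re

# N-X-(S or T)-Z, where X and Z are any residue except proline.
# The lookahead makes the match zero-width, so overlapping motifs are all found.
_N_GLYC_MOTIF = re.compile(r"(?=N[^P][ST][^P])")


def find_n_glycosylation_motifs(sequence):
    """Return the 0-based index of each N that starts a potential motif."""
    return [m.start() for m in _N_GLYC_MOTIF.finditer(sequence.upper())]


# Hypothetical peptides, for illustration only.
for peptide in ("QQYNSTLLT", "QQYNSTPLT", "QQYNPTLLT", "NNSSA"):
    print(peptide, find_n_glycosylation_motifs(peptide))
```

Library sequences that return a non-empty list could then be excluded from the design or altered at the N position.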

In certain embodiments of the invention, it may be preferable to create the individual sub-libraries of different lengths (e.g., one or more of lengths 5, 6, 7, 8, 9, 10, 11, or more) separately, and then mix the sub-libraries in proportions that reflect the length distribution of VKCDR3 in human sequences: for example, in ratios approximating the 1:9:2 distribution that occurs in natural VKCDR3 sequences of lengths 8, 9, and 10 (see FIG. 3). In other embodiments, it may be desirable to mix these sub-libraries at ratios that are different from the distribution of lengths in natural VKCDR3 sequences, for example, to produce more focused libraries or libraries with particular properties.
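The mixing proportions implied by a given length ratio can be computed directly. The short Python sketch below converts the 1:9:2 ratio mentioned above into fractional shares; the function name is hypothetical, and the ratio is simply the approximate one given in the text.

```python
from fractions import Fraction


def mixing_fractions(ratio_by_length):
    """Convert a parts ratio, keyed by CDR3 length, into fractional shares."""
    total = sum(ratio_by_length.values())
    return {
        length: Fraction(parts, total)
        for length, parts in ratio_by_length.items()
    }


# Approximate natural distribution for VKCDR3 lengths 8, 9, and 10.
natural_like = mixing_fractions({8: 1, 9: 9, 10: 2})
for length, share in natural_like.items():
    print(f"length {length}: {share} of the pooled library")
```

For the 1:9:2 ratio, the shares are 1/12, 3/4, and 1/6 for lengths 8, 9, and 10, respectively. A more focused library could be specified by passing a different ratio.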

2.3.4. VλCDR3 Libraries

The principles used to design the minimalist VλCDR3 libraries of the invention are similar to those enumerated above, for the VKCDR3 libraries, and are explained in more detail in the Examples. One difference between the VλCDR3 libraries of the invention and the VKCDR3 libraries of the invention is that, unlike the IGKV genes, the contribution of the IGVλ genes to CDRL3 (i.e., L3-Vλ) is not constrained to a fixed number of amino acid residues. Therefore, while the combination of the VK (including L3-VK) and JK segments, with inclusion of position 96, yields CDRL3 with a length of only 9 residues, length variation may be obtained within a VλCDR3 library even when only the Vλ (including L3-Vλ) and Jλ segments are considered.

As for the VKCDR3 sequences, additional variability may be introduced into the VλCDR3 sequences via the same methods outlined above, namely determining the frequency of occurrence of particular residues within VλCDR3 sequences and synthesizing the oligonucleotides encoding the desired compositions via degenerate oligonucleotide synthesis or trinucleotide-based synthesis.

2.4. Synthetic Antibody Libraries

In certain embodiments of the invention, both the heavy and light chain chassis sequences and the heavy and light chain CDR3 sequences are synthetic. The polynucleotide sequences of the instant invention can be synthesized by various methods. For example, sequences can be synthesized by split pool DNA synthesis as described in Feldhaus et al., Nucleic Acids Research, 2000, 28: 534; Ornstein et al., Biopolymers, 1978, 17: 2341; and Brenner and Lerner, PNAS, 1992, 87: 6378 (each of which is incorporated by reference in its entirety).

In some embodiments of the invention, cassettes representing the possible V, D, and J diversity found in the human repertoire, as well as junctional diversity, are synthesized de novo either as double-stranded DNA oligonucleotides, single-stranded DNA oligonucleotides representative of the coding strand, or single-stranded DNA oligonucleotides representative of the non-coding strand. These sequences can then be introduced into a host cell along with an acceptor vector containing a chassis sequence and, in some cases, a portion of FRM4 and a constant region. No primer-based PCR amplification from mammalian cDNA or mRNA, and no template-directed cloning steps from mammalian cDNA or mRNA, need be employed.

2.5. Construction of Libraries by Yeast Homologous Recombination

In certain embodiments, the present invention exploits the inherent ability of yeast cells to facilitate homologous recombination at high efficiency. The mechanism of homologous recombination in yeast and its applications are briefly described below.

As an illustrative embodiment, homologous recombination can be carried out in, for example, Saccharomyces cerevisiae, which has genetic machinery designed to carry out homologous recombination with high efficiency. Exemplary S. cerevisiae strains include EM93, CEN.PK2, RM11-1a, YJM789, and BJ5465. This mechanism is believed to have evolved for the purpose of chromosomal repair, and is also called “gap repair” or “gap filling”. By exploiting this mechanism, mutations can be introduced into specific loci of the yeast genome. For example, a vector carrying a mutant gene can contain two sequence segments that are homologous to the 5′ and 3′ open reading frame (ORF) sequences of a gene that is intended to be interrupted or mutated. The vector may also encode a positive selection marker, such as a nutritional enzyme allele (e.g., URA3) and/or an antibiotic resistance marker (e.g., Geneticin/G418), flanked by the two homologous DNA segments. Other selection markers and antibiotic resistance markers are known to one of ordinary skill in the art. In some embodiments of the invention, this vector (e.g., a plasmid) is linearized and transformed into the yeast cells. Through homologous recombination between the plasmid and the yeast genome, at the two homologous recombination sites, a reciprocal exchange of the DNA content occurs between the wild type gene in the yeast genome and the mutant gene (including the selection marker gene(s)) that is flanked by the two homologous sequence segments. By selecting for the one or more selection markers, the surviving yeast cells will be those cells in which the wild-type gene has been replaced by the mutant gene (Pearson et al., Yeast, 1998, 14: 391, incorporated by reference in its entirety). This mechanism has been used to make systematic mutations in all 6,000 yeast genes, or open reading frames (ORFs), for functional genomics studies. Because the exchange is reciprocal, a similar approach has also been used successfully to clone yeast genomic DNA fragments into a plasmid vector (Iwasaki et al., Gene, 1991, 109: 81, incorporated by reference in its entirety).

By utilizing the endogenous homologous recombination machinery present in yeast, gene fragments or synthetic oligonucleotides can also be cloned into a plasmid vector without a ligation step. In this application of homologous recombination, a target gene fragment (i.e., the fragment to be inserted into a plasmid vector, e.g., a CDR3) is obtained (e.g., by oligonucleotide synthesis, PCR amplification, restriction digestion out of another vector, etc.). DNA sequences that are homologous to selected regions of the plasmid vector are added to the 5′ and 3′ ends of the target gene fragment. These homologous regions may be fully synthetic, or added via PCR amplification of a target gene fragment with primers that incorporate the homologous sequences. The plasmid vector may include a positive selection marker, such as a nutritional enzyme allele (e.g., URA3), or an antibiotic resistance marker (e.g., Geneticin/G418). The plasmid vector is then linearized by a unique restriction cut located in-between the regions of sequence homology shared with the target gene fragment, thereby creating an artificial gap at the cleavage site. The linearized plasmid vector and the target gene fragment flanked by sequences homologous to the plasmid vector are co-transformed into a yeast host strain. The yeast is then able to recognize the two stretches of sequence homology between the vector and target gene fragment and facilitate a reciprocal exchange of DNA content through homologous recombination at the gap. As a consequence, the target gene fragment is inserted into the vector without ligation.

The method described above has also been demonstrated to work when the target gene fragments are in the form of single stranded DNA, for example, as a circular M13 phage derived form, or as single stranded oligonucleotides (Simon and Moore, Mol. Cell Biol., 1987, 7: 2329; Ivanov et al., Genetics, 1996, 142: 693; and DeMarini et al., 2001, 30: 520, each incorporated by reference in its entirety). Thus, the form of the target that can be recombined into the gapped vector can be double stranded or single stranded, and derived from chemical synthesis, PCR, restriction digestion, or other methods.

Several factors may influence the efficiency of homologous recombination in yeast. For example, the efficiency of the gap repair is correlated with the length of the homologous sequences flanking both the linearized vector and the target gene. In certain embodiments, about 20 or more base pairs may be used for the length of the homologous sequence, and about 80 base pairs may give a near-optimized result (Hua et al., Plasmid, 1997, 38: 91; Raymond et al., Genome Res., 2002, 12: 190, each incorporated by reference in its entirety). In certain embodiments of the invention, at least about 5, 10, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 187, 190, or 200 homologous base pairs may be used to facilitate recombination. In other embodiments, between about 20 and about 40 base pairs are utilized. In addition, the reciprocal exchange between the vector and gene fragment is strictly sequence-dependent, i.e., it does not cause a frame shift. Therefore, gap-repair cloning assures the insertion of gene fragments with both high efficiency and precision. The high efficiency makes it possible to clone two, three, or more targeted gene fragments simultaneously into the same vector in one transformation attempt (Raymond et al., Biotechniques, 1999, 26: 134, incorporated by reference in its entirety). Moreover, the nature of precision sequence conservation through homologous recombination makes it possible to clone selected genes or gene fragments into expression or fusion vectors for direct functional examination (El-Deiry et al., Nature Genetics, 1992, 1: 45-49; Ishioka et al., PNAS, 1997, 94: 2449, each incorporated by reference in its entirety).

Libraries of gene fragments have also been constructed in yeast using homologous recombination. For example, a human brain cDNA library was constructed as a two-hybrid fusion library in vector pJG4-5 (Guidotti and Zervos, Yeast, 1999, 15: 715, incorporated by reference in its entirety). It has also been reported that a total of 6,000 pairs of PCR primers were used for amplification of 6,000 known yeast ORFs for a study of yeast genomic protein interactions (Hudson et al., Genome Res., 1997, 7: 1169, incorporated by reference in its entirety). In 2000, Uetz et al. conducted a comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae (Uetz et al., Nature, 2000, 403: 623, incorporated by reference in its entirety). The protein-protein interaction map of the budding yeast was studied by using a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins (Ito et al., PNAS, 2000, 97: 1143, incorporated by reference in its entirety), and the genomic protein linkage map of Vaccinia virus was studied using this system (McCraith et al., PNAS, 2000, 97: 4879, incorporated by reference in its entirety).

In certain embodiments of the invention, a synthetic CDR3 (heavy or light chain) may be joined by homologous recombination with a vector encoding a heavy or light chain chassis, a portion of FRM4, and a constant region, to form a full-length heavy or light chain. In certain embodiments of the invention, the homologous recombination is performed directly in yeast cells. In some embodiments, the method comprises:

- (a) transforming into yeast cells:
  - (i) a linearized vector encoding a heavy or light chain chassis, a portion of FRM4, and a constant region, wherein the site of linearization is between the end of FRM3 of the chassis and the beginning of the constant region; and
  - (ii) a library of CDR3 insert nucleotide sequences that are linear and double stranded, wherein each of the CDR3 insert sequences comprises a nucleotide sequence encoding CDR3 and 5′- and 3′-flanking sequences that are sufficiently homologous to the termini of the vector of (i) at the site of linearization to enable homologous recombination to occur between the vector and the library of CDR3 insert sequences; and
- (b) allowing homologous recombination to occur between the vector and the CDR3 insert sequences in the transformed yeast cells, such that the CDR3 insert sequences are incorporated into the vector, to produce a vector encoding full-length heavy chain or light chain.

As specified above, the CDR3 inserts may have a 5′ flanking sequence and a 3′ flanking sequence that are homologous to the termini of the linearized vector. When the CDR3 inserts and the linearized vectors are introduced into a host cell, for example, a yeast cell, the “gap” (the linearization site) created by linearization of the vector is filled by the CDR3 fragment insert through recombination of the homologous sequences at the 5′ and 3′ termini of these two linear double-stranded DNAs (i.e., the vector and the insert). Through this event of homologous recombination, libraries of circular vectors encoding full-length heavy or light chains comprising variable CDR3 inserts are generated. Particular instances of these methods are presented in the Examples.
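The sequence outcome of the gap repair described above can be modeled, at the level of plain strings, as joining the vector and insert through their shared terminal homology. The Python sketch below is a toy in silico model only and does not simulate the biology; all function names and nucleotide sequences are hypothetical, and the 20-nucleotide minimum reflects the "about 20 or more base pairs" mentioned earlier.

```python
def terminal_overlap(upstream, downstream, min_len):
    """Length of the longest suffix of `upstream` equal to a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return k
    raise ValueError(f"no terminal homology of at least {min_len} nt")


def gap_repair(vector_left, vector_right, insert, min_homology=20):
    """Join a linearized vector and an insert through their homologous termini.

    vector_left / vector_right are the vector sequences on either side of the
    linearization site; the insert carries matching 5' and 3' flanks.
    """
    k5 = terminal_overlap(vector_left, insert, min_homology)
    k3 = terminal_overlap(insert, vector_right, min_homology)
    # Keep one copy of each homologous region in the repaired product.
    return vector_left + insert[k5:len(insert) - k3] + vector_right


# Hypothetical toy sequences (not from the Sequence Listing).
UPSTREAM = "ATGGAAGTTCAA"             # stands in for the chassis
LEFT_ARM = "GACACTGCCGTGTATTACTG"     # 20-nt homology at the end of FRM3
CDR3 = "GCGAGAGACGGCAGC"              # stands in for one CDR3 library member
RIGHT_ARM = "TGGGGCCAAGGAACCCTGGT"    # 20-nt homology at the start of FRM4
DOWNSTREAM = "CACCGTCTCCTCA"          # stands in for FRM4 / constant region

recombinant = gap_repair(
    UPSTREAM + LEFT_ARM, RIGHT_ARM + DOWNSTREAM, LEFT_ARM + CDR3 + RIGHT_ARM
)
print(recombinant)
```

Because the junctions are defined entirely by the homologous flanks, the reading frame of the product is set by the design of those flanks; an insert whose flanks are shorter than the chosen minimum is rejected by this model.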

Subsequent analysis may be carried out to determine the efficiency of homologous recombination that results in correct insertion of the CDR3 sequences into the vectors. For example, PCR amplification of the CDR3 inserts directly from selected yeast clones may reveal how many clones are recombinant. In certain embodiments, libraries with a minimum of about 90% recombinant clones are utilized. In certain other embodiments, libraries with a minimum of about 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% recombinant clones are utilized. The same PCR amplification of selected clones may also reveal the insert size.

To verify the sequence diversity of the inserts in the selected clones, a PCR amplification product with the correct size of insert may be “fingerprinted” with restriction enzymes known to cut or not cut within the amplified region. From a gel electrophoresis pattern, it may be determined whether the clones analyzed are of the same identity or of distinct or diversified identity. The PCR products may also be sequenced directly to reveal the identity of inserts and the fidelity of the cloning procedure, and to prove the independence and diversity of the clones. FIG. 1 depicts a schematic of recombination between a fragment (e.g., CDR3) and a vector (e.g., comprising a chassis, portion of FRM4, and constant region) for the construction of a library.

2.6. Expression and Screening Systems

Libraries of polynucleotides generated by any of the techniques described herein, or other suitable techniques, can be expressed and screened to identify antibodies having desired structure and/or activity. Expression of the antibodies can be carried out, for example, using cell-free extracts (e.g., with ribosome display), phage display, prokaryotic cells (e.g., bacterial display), or eukaryotic cells (e.g., yeast display). In certain embodiments of the invention, the antibody libraries are expressed in yeast.

In other embodiments, the polynucleotides are engineered to serve as templates that can be expressed in a cell-free extract. Vectors and extracts as described, for example, in U.S. Pat. Nos. 5,324,637; 5,492,817; 5,665,563 (each incorporated by reference in its entirety) can be used and many are commercially available. Ribosome display and other cell-free techniques for linking a polynucleotide (i.e., a genotype) to a polypeptide (i.e., a phenotype) can be used, e.g., Profusion™ (see, e.g., U.S. Pat. Nos. 6,348,315; 6,261,804; 6,258,558; and 6,214,553, each incorporated by reference in its entirety).

Alternatively, the polynucleotides of the invention can be expressed in an E. coli expression system, such as that described by Pluckthun and Skerra (Meth. Enzymol., 1989, 178: 476; Biotechnology, 1991, 9: 273, each incorporated by reference in its entirety). The mutant proteins can be expressed for secretion in the medium and/or in the cytoplasm of the bacteria, as described by Better and Horwitz, Meth. Enzymol., 1989, 178: 476, incorporated by reference in its entirety. In some embodiments, the single domains encoding VH and VL are each attached to the 3′ end of a sequence encoding a signal sequence, such as the ompA, phoA or pelB signal sequence (Lei et al., J. Bacteriol., 1987, 169: 4379, incorporated by reference in its entirety). These gene fusions are assembled in a dicistronic construct, so that they can be expressed from a single vector, and secreted into the periplasmic space of E. coli where they will refold and can be recovered in active form (Skerra et al., Biotechnology, 1991, 9: 273, incorporated by reference in its entirety). For example, antibody heavy chain genes can be concurrently expressed with antibody light chain genes to produce antibodies or antibody fragments.

In other embodiments of the invention, the antibody sequences are expressed on the membrane surface of a prokaryote, e.g., E. coli, using a secretion signal and lipidation moiety as described, e.g., in US20040072740; US20030100023; and US20030036092 (each incorporated by reference in its entirety).

Higher eukaryotic cells, such as mammalian cells, for example myeloma cells (e.g., NS/0 cells), hybridoma cells, Chinese hamster ovary (CHO), and human embryonic kidney (HEK) cells, can also be used for expression of the antibodies of the invention. Typically, antibodies expressed in mammalian cells are designed to be secreted into the culture medium, or expressed on the surface of the cell. The antibody or antibody fragments can be produced, for example, as intact antibody molecules or as individual VH and VL fragments, Fab fragments, single domains, or as single chains (scFv) (Huston et al., PNAS, 1988, 85: 5879, incorporated by reference in its entirety).

Alternatively, antibodies can be expressed and screened by anchored periplasmic expression (APEx 2-hybrid surface display), as described, for example, in Jeong et al., PNAS, 2007, 104: 8247 (incorporated by reference in its entirety) or by other anchoring methods as described, for example, in Mazor et al., Nature Biotechnology, 2007, 25: 563 (incorporated by reference in its entirety).

In other embodiments of the invention, antibodies can be selected using mammalian cell display (Ho et al., PNAS, 2006, 103: 9637, incorporated by reference in its entirety).

The screening of the antibodies derived from the libraries of the invention can be carried out by any appropriate means. For example, binding activity can be evaluated by standard immunoassay and/or affinity chromatography. Screening of the antibodies of the invention for catalytic function, e.g., proteolytic function, can be accomplished using standard assays, e.g., the hemoglobin plaque assay as described in U.S. Pat. No. 5,798,208 (incorporated by reference in its entirety). The ability of candidate antibodies to bind therapeutic targets can be assayed in vitro using, e.g., a BIACORE™ instrument, which measures binding rates of an antibody to a given target or antigen based on surface plasmon resonance. Candidate antibodies can also be assayed in vivo using any of a number of animal models and then subsequently tested, as appropriate, in humans. Cell-based biological assays are also contemplated.

One aspect of the instant invention is the speed at which the antibodies of the library can be expressed and screened. In certain embodiments of the invention, the antibody library can be expressed in yeast, which have a doubling time of less than about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 hours. In some embodiments, the doubling times are about 1 to about 3 hours, about 2 to about 4, about 3 to about 8 hours, about 3 to about 24, about 5 to about 24, about 4 to about 6, about 5 to about 22, about 6 to about 8, about 7 to about 22, about 8 to about 10 hours, about 7 to about 20, about 9 to about 20, about 9 to about 18, about 11 to about 18, about 11 to about 16, about 13 to about 16, about 16 to about 20, or about 20 to about 30 hours. In certain embodiments of the invention, the antibody library is expressed in yeast with a doubling time of about 16 to about 20 hours, about 8 to about 16 hours, or about 4 to about 8 hours. Thus, the antibody library of the instant invention can be expressed and screened in a matter of hours, as compared to previously known techniques which take several days to express and screen antibody libraries. A limiting step in the throughput of such screening processes in mammalian cells is simply the time required to iteratively regrow populations of isolated cells, which, in some cases, have doubling times greater than the doubling times of the yeast used in the current invention.
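The effect of doubling time on regrowth can be made concrete with a simple calculation. The Python sketch below assumes ideal exponential growth with no lag phase, which is a simplification; the function name and the chosen fold-expansion and doubling times are illustrative values only.

```python
import math


def hours_to_expand(fold_expansion, doubling_time_h):
    """Hours needed to expand a population by `fold_expansion`.

    Assumes ideal exponential growth: N(t) = N0 * 2 ** (t / doubling_time_h).
    """
    return doubling_time_h * math.log2(fold_expansion)


# Time to regrow an isolated population 1,024-fold (ten doublings).
for doubling_time_h in (2, 4, 8, 24):
    hours = hours_to_expand(1024, doubling_time_h)
    print(f"doubling time {doubling_time_h} h: {hours:.0f} h")
```

Under this assumption the regrowth time scales linearly with the doubling time, so each round of isolation and regrowth is proportionally shorter in a faster-growing host.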

In certain embodiments of the invention, the composition of a library may be defined after one or more enrichment steps (for example by screening for antigen binding, or other properties). For example, a library with a composition comprising about x % sequences or libraries of the invention may be enriched to contain about 2x %, 3x %, 4x %, 5x %, 6x %, 7x %, 8x %, 9x %, 10x %, 20x %, 25x %, 40x %, 50x %, 60x %, 75x %, 80x %, 90x %, 95x %, or 99x % sequences or libraries of the invention, after one or more screening steps. In other embodiments of the invention, the sequences or libraries of the invention may be enriched about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 100-fold, 1,000-fold, or more, relative to their occurrence prior to the one or more enrichment steps. In certain embodiments of the invention, a library may contain at least a certain number of a particular type of sequence(s), such as CDRH3s, CDRL3s, heavy chains, light chains, or whole antibodies (e.g., at least about 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, 10¹⁹, or 10²⁰). In certain embodiments, these sequences may be enriched during one or more enrichment steps, to provide libraries comprising at least about 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, 10¹², 10¹³, 10¹⁴, 10¹⁵, 10¹⁶, 10¹⁷, 10¹⁸, or 10¹⁹ of the respective sequence(s).

2.7. Mutagenesis Approaches for Affinity Maturation

As described above, antibody leads can be identified through a selection process that involves screening the antibodies of a library of the invention for binding to one or more antigens, or for a biological activity. The coding sequences of these antibody leads may be further mutagenized in vitro or in vivo to generate secondary libraries with diversity introduced in the context of the initial antibody leads. The mutagenized antibody leads can then be further screened for binding to target antigens or biological activity, in vitro or in vivo, following procedures similar to those used for the selection of the initial antibody lead from the primary library. Such mutagenesis and selection of primary antibody leads effectively mimics the affinity maturation process naturally occurring in a mammal that produces antibodies with progressive increases in the affinity to an antigen. In one embodiment of the invention, only the CDRH3 region is mutagenized. In another embodiment of the invention, the whole variable region is mutagenized. In other embodiments of the invention, one or more of CDRH1, CDRH2, CDRH3, CDRL1, CDRL2, and/or CDRL3 may be mutagenized. In some embodiments of the invention, “light chain shuffling” may be used as part of the affinity maturation protocol. In certain embodiments, this may involve pairing one or more heavy chains with a number of light chains, to select light chains that enhance the affinity and/or biological activity of an antibody. In certain embodiments of the invention, the number of light chains to which the one or more heavy chains can be paired is at least about 2, 5, 10, 100, 1000, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, or 10¹⁰. In certain embodiments of the invention, these light chains are encoded by plasmids. In other embodiments of the invention, the light chains may be integrated into the genome of the host cell.

The coding sequences of the antibody leads may be mutagenized by a wide variety of methods. Examples of methods of mutagenesis include, but are not limited to site-directed mutagenesis, error-prone PCR mutagenesis, cassette mutagenesis, and random PCR mutagenesis. Alternatively, oligonucleotides encoding regions with the desired mutations can be synthesized and introduced into the sequence to be mutagenized, for example, via recombination or ligation.

Site-directed mutagenesis or point mutagenesis may be used to gradually change the CDR sequences in specific regions. This may be accomplished by using oligonucleotide-directed mutagenesis or PCR. For example, a short sequence of an antibody lead may be replaced with a synthetically mutagenized oligonucleotide in either the heavy chain or light chain region, or both. The method may not be efficient for mutagenizing large numbers of CDR sequences, but may be used for fine tuning of a particular lead to achieve higher affinity toward a specific target protein.

Cassette mutagenesis may also be used to mutagenize the CDR sequences in specific regions. In a typical cassette mutagenesis, a sequence block, or a region, of a single template is replaced by a completely or partially randomized sequence. However, the maximum information content that can be obtained may be statistically limited by the number of random sequences of the oligonucleotides. Similar to point mutagenesis, this method may also be used for fine tuning of a particular lead to achieve higher affinity towards a specific target protein.

Error-prone PCR, or “poison” PCR, may be used to mutagenize the CDR sequences by following protocols described in Caldwell and Joyce, PCR Methods and Applications, 1992, 2: 28; Leung et al., Technique, 1989, 1: 11; Shafikhani et al., Biotechniques, 1997, 23: 304; and Stemmer et al., PNAS, 1994, 91: 10747 (each of which is incorporated by reference in its entirety).

Conditions for error-prone PCR may include (a) high concentrations of Mn²⁺ (e.g., about 0.4 to about 0.6 mM) that efficiently induce malfunction of Taq DNA polymerase; and (b) a disproportionately high concentration of one nucleotide substrate (e.g., dGTP) in the PCR reaction that causes incorrect incorporation of this high concentration substrate into the template and produces mutations. Additionally, other factors, such as the number of PCR cycles, the species of DNA polymerase used, and the length of the template, may affect the rate of misincorporation of “wrong” nucleotides into the PCR product. Commercially available kits may be utilized for the mutagenesis of the selected antibody library, such as the “Diversity PCR random mutagenesis kit” (CLONTECH™).

The primer pairs used in PCR-based mutagenesis may, in certain embodiments, include regions matched with the homologous recombination sites in the expression vectors. This design allows facile re-introduction of the PCR products back into the heavy or light chain chassis vectors, after mutagenesis, via homologous recombination.

Other PCR-based mutagenesis methods can also be used, alone or in conjunction with the error-prone PCR described above. For example, the PCR amplified CDR segments may be digested with DNase to create nicks in the double stranded DNA. These nicks can be expanded into gaps by other exonucleases such as Bal 31. The gaps may then be filled by random sequences by using DNA Klenow polymerase at a low concentration of regular substrates dGTP, dATP, dTTP, and dCTP with one substrate (e.g., dGTP) at a disproportionately high concentration. This fill-in reaction should produce high frequency mutations in the filled gap regions. This method of DNase digestion may be used in conjunction with error-prone PCR to create a high frequency of mutations in the desired CDR segments.

The CDR or antibody segments amplified from the primary antibody leads may also be mutagenized in vivo by exploiting the inherent ability of mutation in pre-B cells. The Ig genes in pre-B cells are specifically susceptible to a high rate of mutation. The Ig promoter and enhancer facilitate such high rate mutations in a pre-B cell environment while the pre-B cells proliferate. Accordingly, CDR gene segments may be cloned into a mammalian expression vector that contains a human Ig enhancer and promoter. This construct may be introduced into a pre-B cell line, such as 38B9, which allows the mutation of the VH and VL gene segments naturally in the pre-B cells (Liu and Van Ness, Mol. Immunol., 1999, 36: 461, incorporated by reference in its entirety). The mutagenized CDR segments can be amplified from the cultured pre-B cell line and re-introduced back into the chassis-containing vector(s) via, for example, homologous recombination.

In some embodiments, a CDR “hit” isolated from screening the library can be re-synthesized, using degenerate codons or trinucleotides, and re-cloned into the heavy or light chain vector using gap repair.

3. Library Sampling

In certain embodiments of the invention, a library of the invention comprises a designed, non-random repertoire wherein the theoretical diversity of particular components of the library (for example, CDRH3), but not necessarily all components or the entire library, can be over-sampled in a physical realization of the library, at a level where there is a certain degree of statistical confidence (e.g., 95%) that any given member of the theoretical library is present in the physical realization of the library at least at a certain frequency (e.g., at least once, twice, three times, four times, five times, or more) in the library.

In a library, it is generally assumed that the number of copies of a given clone obeys a Poisson probability distribution (see Feller, W, An Introduction to Probability Theory and Its Applications, 1968, Wiley New York, incorporated by reference in its entirety). The probability of a Poisson random number being zero, corresponding to the probability of missing a given component member in an instance of a library (see below), is e^−N, where N is the average of the random number. For example, if there are 10⁶ possible theoretical members of a library and a physical realization of the library has 10⁷ members, with an equal probability of each member of the theoretical library being sampled, then the average number of times that each member occurs in the physical realization of the library is 10⁷/10⁶=10, and the probability that the number of copies of a given member is zero is e^−N=e⁻¹⁰=0.000045; or a 99.9955% chance that there is at least one copy of any of the 10⁶ theoretical members in this 10× oversampled library. For a 2.3× oversampled library one is 90% confident that a given component is present. For a 3× oversampled library one is 95% confident that a given component is present. For a 4.6× oversampled library one is 99% confident a given clone is present, and so on.
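The confidence levels quoted above follow from the Poisson zero-class probability. A minimal Python sketch, assuming the Poisson model described in this paragraph (the function name `prob_present` is illustrative and is not taken from the cited references), reproduces them:

```python
import math

def prob_present(oversampling: float) -> float:
    """Probability that a given theoretical member occurs at least once,
    assuming Poisson-distributed copy numbers with mean `oversampling`."""
    return 1.0 - math.exp(-oversampling)

# Oversampling folds discussed in the text: 2.3x, 3x, 4.6x and 10x.
for fold in (2.3, 3.0, 4.6, 10.0):
    print(f"{fold:>4}x oversampling: {prob_present(fold):.4%}")
```

The 2.3×, 3× and 4.6× values come out near 90%, 95% and 99%, respectively, and the 10× value near 99.9955%.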

Therefore, if M is the maximum number of theoretical library members that can be feasibly physically realized, then M/3 is the maximum theoretical repertoire size for which one can be 95% confident that any given member of the theoretical library will be sampled. It is important to note that there is a difference between a 95% chance that a given member is represented and a 95% chance that every possible member is represented. In certain embodiments, the instant invention provides a rationally designed library with diversity so that any given member is 95% likely to be represented in a physical realization of the library. In other embodiments of the invention, the library is designed so that any given member is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, 99.5%, or 99.9% likely to be represented in a physical realization of the library. For a review, see, e.g., Firth and Patrick, Biomol. Eng., 2005, 22: 105, and Patrick et al., Protein Engineering, 2003, 16: 451, each of which is incorporated by reference in its entirety.
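The M/3 rule, and the difference between a given member being represented and every member being represented, may be illustrated numerically. The following Python sketch assumes the same Poisson model with equal sampling probability; the helper names are hypothetical, and the "every member" figure additionally assumes that members are sampled independently, which is only an approximation:

```python
import math

def max_theoretical_size(max_realizable: float, confidence: float = 0.95) -> float:
    """Largest theoretical repertoire for which any given member is present
    with probability `confidence`, given `max_realizable` physical clones."""
    required_fold = -math.log(1.0 - confidence)  # about 3.0 for 95%
    return max_realizable / required_fold

def prob_all_present(theoretical_size: float, oversampling: float) -> float:
    """Approximate probability that EVERY member is present, treating the
    members as independent Poisson counts (an assumption of this sketch)."""
    p_missing = math.exp(-oversampling)
    return math.exp(theoretical_size * math.log1p(-p_missing))

print(max_theoretical_size(1e9))    # roughly one third of 1e9
print(prob_all_present(1e6, 3.0))   # effectively zero
print(prob_all_present(1e6, 20.0))  # close to one
```

Under these assumptions, a 3× oversampled library of 10⁶ theoretical members gives each individual member a 95% chance of being present, yet the chance that no member at all is missing is negligible.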

In certain embodiments of the invention, a library may have a theoretical total diversity of X unique members and the physical realization of the theoretical total diversity may contain at least about 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, or more members. In some embodiments, the physical realization of the theoretical total diversity may contain about 1× to about 2×, about 2× to about 3×, about 3× to about 4×, about 4× to about 5×, or about 5× to about 6× members. In other embodiments, the physical realization of the theoretical total diversity may contain about 1× to about 3×, or about 3× to about 5× total members.

An assumption underlying all directed evolution experiments is that the amount of molecular diversity theoretically possible is enormous compared with the ability to synthesize it, physically realize it, and screen it. The likelihood of finding a variant with improved properties in a given library is maximized when that library is maximally diverse. Patrick et al. used simple statistics to derive a series of equations and computer algorithms for estimating the number of unique sequence variants in libraries constructed by randomized oligonucleotide mutagenesis, error-prone PCR and in vitro recombination. They have written a suite of programs for calculating library statistics, such as GLUE, GLUE-IT, PEDEL, PEDEL-AA, and DRIVeR. These programs are described, with instructions on how to access them, in Patrick et al., Protein Engineering, 2003, 16: 451 and Firth et al., Nucleic Acids Res., 2008, 36: W281 (each of which is incorporated by reference in its entirety).

It is possible to construct a physical realization of a library in which some components of the theoretical diversity (such as CDRH3) are oversampled, while other aspects (VH/VL pairings) are not. For example, consider a library in which 10⁸ CDRH3 segments are designed to be present in a single VH chassis, and then paired with 10⁵ VL genes to produce 10¹³ (=10⁸×10⁵) possible full heterodimeric antibodies. If a physical realization of this library is constructed with a diversity of 10⁹ transformant clones, then the CDRH3 diversity is oversampled ten-fold (=10⁹/10⁸); however, the possible VH/VL pairings are undersampled by 10⁻⁴ (=10⁹/10¹³). In this example, on average, each CDRH3 is paired only with 10 samples of the VL from the possible 10⁵ partners. In certain embodiments of the invention, it is the CDRH3 diversity that is preferably oversampled.
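The arithmetic of this example can be restated as a short Python sketch. The variable and function names are illustrative only; the numerical values are those given in the paragraph above:

```python
def sampling_fold(realized_clones: float, theoretical_members: float) -> float:
    """Average number of physical clones per theoretical member."""
    return realized_clones / theoretical_members

cdrh3_designs = 1e8   # CDRH3 segments designed into a single VH chassis
vl_genes = 1e5        # VL partners
transformants = 1e9   # clones in the physical realization

pairings = cdrh3_designs * vl_genes                     # possible VH/VL pairs
cdrh3_fold = sampling_fold(transformants, cdrh3_designs)
pairing_fold = sampling_fold(transformants, pairings)

print(f"CDRH3 sampling fold:   {cdrh3_fold:g}")    # oversampled
print(f"Pairing sampling fold: {pairing_fold:g}")  # undersampled
```

A fold above 1 indicates oversampling of that component, and a fold below 1 indicates undersampling.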

3.1. Other Variants of the Polynucleotide Sequences of the Invention

In certain embodiments, the invention relates to a polynucleotide that hybridizes with a polynucleotide taught herein, or that hybridizes with the complement of a polynucleotide taught herein. For example, an isolated polynucleotide that remains hybridized after hybridization and washing under low, medium, or high stringency conditions to a polynucleotide taught herein or the complement of a polynucleotide taught herein is encompassed by the present invention.

Exemplary low stringency conditions include hybridization with a buffer solution of about 30% to about 35% formamide, about 1 M NaCl, about 1% SDS (sodium dodecyl sulphate) at about 37° C., and a wash in about 1× to about 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at about 50° C. to about 55° C.

Exemplary moderate stringency conditions include hybridization in about 40% to about 45% formamide, about 1 M NaCl, about 1% SDS at about 37° C., and a wash in about 0.5× to about 1×SSC at about 55° C. to about 60° C.

Exemplary high stringency conditions include hybridization in about 50% formamide, about 1 M NaCl, about 1% SDS at about 37° C., and a wash in about 0.1×SSC at about 60° C. to about 65° C.

Optionally, wash buffers may comprise about 0.1% to about 1% SDS.

The duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours.
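Because the wash stringency is expressed as a dilution of 20×SSC (3.0 M NaCl/0.3 M trisodium citrate, as defined above), the salt concentration of each exemplary wash can be derived by simple scaling. A minimal Python sketch follows; the function name is hypothetical, and the folds shown are single points chosen from the stated ranges for illustration:

```python
# Composition of the 20x SSC stock, as defined in the text (mol/L).
STOCK_FOLD = 20.0
STOCK_NACL_M = 3.0
STOCK_CITRATE_M = 0.3

def ssc_composition(fold: float) -> tuple[float, float]:
    """Return (NaCl, trisodium citrate) molarity for a given SSC strength."""
    scale = fold / STOCK_FOLD
    return STOCK_NACL_M * scale, STOCK_CITRATE_M * scale

# Upper end of the low and moderate wash ranges, and the high-stringency wash.
for label, fold in (("low", 2.0), ("moderate", 1.0), ("high", 0.1)):
    nacl, citrate = ssc_composition(fold)
    print(f"{label}: {fold}x SSC = {nacl * 1000:.0f} mM NaCl, "
          f"{citrate * 1000:.1f} mM citrate")
```

Lower salt concentration in the wash, together with higher wash temperature, corresponds to higher stringency.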

3.2. Sub-Libraries and Larger Libraries Comprising the Libraries or Sub-Libraries of the Invention

As described throughout the application, the libraries of the current invention are distinguished, in certain embodiments, by their human-like sequence composition and length, and the ability to generate a physical realization of the library which contains all members of (or, in some cases, even oversamples) a particular component of the library. Libraries comprising combinations of the libraries described herein (e.g., CDRH3 and CDRL3 libraries) are encompassed by the invention. Sub-libraries comprising portions of the libraries described herein are also encompassed by the invention (e.g., a CDRH3 library in a particular heavy chain chassis or a sub-set of the CDRH3 libraries). One of ordinary skill in the art will readily recognize that each of the libraries described herein has several components (e.g., CDRH3, VH, CDRL3, VL, etc.), and that the diversity of these components can be varied to produce sub-libraries that fall within the scope of the invention.

Moreover, libraries containing one of the libraries or sub-libraries of the invention also fall within the scope of the invention. For example, in certain embodiments of the invention, one or more libraries or sub-libraries of the invention may be contained within a larger library, which may include sequences derived by other means, for example, non-human or human sequence derived by stochastic or semi-stochastic synthesis. In certain embodiments of the invention, at least about 1% of the sequences in a polynucleotide library may be those of the invention (e.g., CDRH3 sequences, CDRL3 sequences, VH sequences, VL sequences), regardless of the composition of the other 99% of sequences. In other embodiments of the invention, at least about 0.001%, 0.01%, 0.1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% of the sequences in any polynucleotide library may be those of the invention, regardless of the composition of the other sequences. In some embodiments, the sequences of the invention may comprise about 0.001% to about 1%, about 1% to about 2%, about 2% to about 5%, about 5% to about 10%, about 10% to about 15%, about 15% to about 20%, about 20% to about 25%, about 25% to about 30%, about 30% to about 35%, about 35% to about 40%, about 40% to about 45%, about 45% to about 50%, about 50% to about 55%, about 55% to about 60%, about 60% to about 65%, about 65% to about 70%, about 70% to about 75%, about 75% to about 80%, about 80% to about 85%, about 85% to about 90%, about 90% to about 95%, or about 95% to about 99% of the sequences in any polynucleotide library, regardless of the composition of the other sequences. Thus, libraries more diverse than one or more libraries or sub-libraries of the invention, but yet still comprising one or more libraries or sub-libraries of the invention, in an amount in which the one or more libraries or sub-libraries of the invention can be effectively screened and from which sequences encoded by the one or more libraries or sub-libraries of the invention can be isolated, also fall within the scope of the invention.

3.3. Alternative Scaffolds

In certain embodiments of the invention, the amino acid products of a library of the invention (e.g., a CDRH3 or CDRL3) may be displayed on an alternative scaffold. Several of these scaffolds have been shown to yield molecules with specificities and affinities that rival those of antibodies. Exemplary alternative scaffolds include those derived from fibronectin (e.g., AdNectin), the β-sandwich (e.g., iMab), lipocalin (e.g., Anticalin), EETI-II/AGRP, BPTI/LACI-DI/ITI-D2 (e.g., Kunitz domain), thioredoxin (e.g., peptide aptamer), protein A (e.g., Affibody), ankyrin repeats (e.g., DARPin), γB-crystallin/ubiquitin (e.g., Affilin), CTLD₃ (e.g., Tetranectin), and (LDLR-A module)₃ (e.g., Avimers). Additional information on alternative scaffolds is provided in Binz et al., Nat. Biotechnol., 2005, 23: 1257 and Skerra, Current Opin. in Biotech., 2007, 18: 295-304, each of which is incorporated by reference in its entirety.

4. Other Embodiments of the Invention

In certain embodiments, the invention comprises a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10⁸ polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences.

In other embodiments, the invention comprises a synthetic preimmune human antibody CDRH3 library comprising polynucleotide sequences encoding CDRH3 represented by the following formula:

[G/D/E/-][N1][DH][N2][H3-JH],

wherein [G/D/E/-] is zero to one amino acids in length, [N1] is zero to three amino acids, [DH] is three to ten amino acids in length, [N2] is zero to three amino acids in length, and [H3-JH] is two to nine amino acids in length.

In certain embodiments of the invention, [G/D/E/-] is represented by an amino acid sequence selected from the group consisting of: G, D, E, and nothing.

In some embodiments of the invention, [N1] is represented by an amino acid sequence selected from the group consisting of: G, R, S, P, L, A, V, T, (G/P)(G/R/S/P/L/A/V/T), (R/S/L/A/V/T)(G/P), GG(G/R/S/P/L/A/V/T), G(R/S/P/L/A/V/T)G, (R/S/P/L/A/V/T)GG, and nothing.
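The [N1] alternatives listed above can be expanded mechanically into individual sequences. The following Python sketch is offered for illustration only; the representation of each pattern as a list of per-position alternatives is an assumption of the sketch and is not a method taken from the specification:

```python
from itertools import product

X = "GRSPLAVT"       # (G/R/S/P/L/A/V/T)
X_NO_G = "RSPLAVT"   # (R/S/P/L/A/V/T)

# Each pattern is a list of per-position alternatives; [] encodes "nothing".
N1_PATTERNS = [
    [],                   # nothing
    [X],                  # G, R, S, P, L, A, V, T
    ["GP", X],            # (G/P)(G/R/S/P/L/A/V/T)
    ["RSLAVT", "GP"],     # (R/S/L/A/V/T)(G/P)
    ["G", "G", X],        # GG(G/R/S/P/L/A/V/T)
    ["G", X_NO_G, "G"],   # G(R/S/P/L/A/V/T)G
    [X_NO_G, "G", "G"],   # (R/S/P/L/A/V/T)GG
]

def expand(patterns):
    """Expand position-wise alternatives into the set of distinct sequences."""
    sequences = set()
    for pattern in patterns:
        for combo in product(*pattern):
            sequences.add("".join(combo))
    return sequences

n1 = expand(N1_PATTERNS)
print(len(n1))  # 59
```

Expanded in this way, the listed alternatives yield 59 distinct [N1] sequences of zero to three amino acids, including the empty sequence.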

In certain embodiments of the invention, [N2] is represented by an amino acid sequence selected from the group consisting of: G, R, S, P, L, A, V, T, (G/P)(G/R/S/P/L/A/V/T), (R/S/L/A/V/T)(G/P), GG(G/R/S/P/L/A/V/T), G(R/S/P/L/A/V/T)G, (R/S/P/L/A/V/T)GG, and nothing.

In some embodiments of the invention, [DH] comprises a sequence selected from the group consisting of: IGHD3-10 reading frame 1 (SEQ ID NO: 1), IGHD3-10 reading frame 2 (SEQ ID NO: 2), IGHD3-10 reading frame 3 (SEQ ID NO: 3), IGHD3-22 reading frame 2 (SEQ ID NO: 4), IGHD6-19 reading frame 1 (SEQ ID NO: 5), IGHD6-19 reading frame 2 (SEQ ID NO: 6), IGHD6-13 reading frame 1 (SEQ ID NO: 7), IGHD6-13 reading frame 2 (SEQ ID NO: 8), IGHD3-03 reading frame 3 (SEQ ID NO: 9), IGHD2-02 reading frame 2 (SEQ ID NO: 10), IGHD2-02 reading frame 3 (SEQ ID NO: 11), IGHD4-17 reading frame 2 (SEQ ID NO: 12), IGHD1-26 reading frame 1 (SEQ ID NO: 13), IGHD1-26 reading frame 3 (SEQ ID NO: 14), IGHD5-5/5-18 reading frame 3 (SEQ ID NO: 15), IGHD2-15 reading frame 2 (SEQ ID NO: 16), and all possible N-terminal and C-terminal truncations of the above-identified IGHDs down to three amino acids.
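The phrase "all possible N-terminal and C-terminal truncations ... down to three amino acids" corresponds to enumerating every contiguous sub-sequence of at least three residues. A minimal Python sketch is given below; the input sequence is an arbitrary placeholder chosen for illustration and is NOT one of SEQ ID NOs: 1-16, and the function name is hypothetical:

```python
def truncations(segment: str, min_len: int = 3) -> set[str]:
    """All sub-sequences obtained by trimming residues from the N-terminus,
    the C-terminus, or both, keeping at least `min_len` residues.
    The untruncated segment itself is included."""
    n = len(segment)
    return {
        segment[start:end]
        for start in range(n)
        for end in range(start + min_len, n + 1)
    }

# Placeholder 10-residue sequence, for illustration only.
example_dh = "ACDEFGHIKL"
variants = truncations(example_dh)
print(len(variants))  # 36
```

For a segment of length L with no repeated sub-sequences, this yields (L−2)(L−1)/2 variants; segments containing repeated residues may yield fewer distinct variants, because different truncations can coincide.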

In certain embodiments of the invention, [H3-JH] comprises a sequence selected from the group consisting of: AEYFQH (SEQ ID NO: 17), EYFQH (SEQ ID NO: 583), YFQH (SEQ ID NO: 584), FQH, QH, YWYFDL (SEQ ID NO: 18), WYFDL (SEQ ID NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, AFDV (SEQ ID NO: 19), FDV, DV, YFDY (SEQ ID NO: 20), FDY, DY, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID NO: 587), FDS, DS, YYYYYGMDV (SEQ ID NO: 22), YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: 590), YGMDV (SEQ ID NO: 591), GMDV (SEQ ID NO: 592), MDV, and DV.
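The [H3-JH] sequences listed above are consistent with progressive N-terminal truncation of six parent sequences down to two amino acids. The Python sketch below regenerates the list on that reading; treating the six longest sequences as parents is an inference drawn from the list itself, and the function name is hypothetical:

```python
# The six longest sequences in the list, taken here as truncation parents.
H3_JH_PARENTS = ["AEYFQH", "YWYFDL", "AFDV", "YFDY", "NWFDS", "YYYYYGMDV"]

def n_terminal_truncations(parent: str, min_len: int = 2) -> list[str]:
    """Progressively remove N-terminal residues, stopping at `min_len`."""
    return [parent[i:] for i in range(len(parent) - min_len + 1)]

h3_jh = []
for parent in H3_JH_PARENTS:
    for seq in n_terminal_truncations(parent):
        if seq not in h3_jh:  # "DV" arises from two different parents
            h3_jh.append(seq)

print(len(h3_jh))  # 27
```

The six parents give 28 truncation products, of which 27 are distinct, because DV is produced from both AFDV and YYYYYGMDV.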

In some embodiments of the invention, the sequences represented by [G/D/E/-][N1][ext-DH][N2][H3-JH] comprise a sequence of about 3 to about 26 amino acids in length.

In certain embodiments of the invention, the sequences represented by [G/D/E/-][N1][ext-DH][N2][H3-JH] comprise a sequence of about 7 to about 23 amino acids in length.

In some embodiments of the invention, the library comprises about 10⁷ to about 10¹⁰ sequences.

In certain embodiments of the invention, the library comprises about 10⁷ sequences.

In some embodiments of the invention, the polynucleotide sequences of the libraries further comprise a 5′ polynucleotide sequence encoding a framework 3 (FRM3) region on the corresponding N-terminal end of the library sequence, wherein the FRM3 region comprises a sequence of about 1 to about 9 amino acid residues.

In certain embodiments of the invention, the FRM3 region comprises a sequence selected from the group consisting of CAR, CAK, and CAT.

In some embodiments of the invention, the polynucleotide sequences further comprise a 3′ polynucleotide sequence encoding a framework 4 (FRM4) region on the corresponding C-terminal end of the library sequence, wherein the FRM4 region comprises a sequence of about 1 to about 9 amino acid residues.

In certain embodiments of the invention, the library comprises a FRM4 region comprising a sequence selected from WGRG (SEQ ID NO: 23) and WGQG (SEQ ID NO: 23).

In some embodiments of the invention, the polynucleotide sequences further comprise an FRM3 region coding for a corresponding polypeptide sequence comprising a sequence selected from the group consisting of CAR, CAK, and CAT; and an FRM4 region coding for a corresponding polypeptide sequence comprising a sequence selected from WGRG (SEQ ID NO: 23) and WGQG (SEQ ID NO: 23).

In certain embodiments of the invention, the polynucleotide sequences further comprise 5′ and 3′ sequences which facilitate homologous recombination with a heavy chain chassis.

In some embodiments, the invention comprises a synthetic preimmune human antibody light chain library comprising polynucleotide sequences encoding human antibody kappa light chains represented by the formula:

[IGKV(1-95)][F/L/I/R/W/Y][JK].

In certain embodiments of the invention, [IGKV(1-95)] is selected from the group consisting of IGKV3-20 (SEQ ID NO: 237) (1-95), IGKV1-39 (SEQ ID NO: 233) (1-95), IGKV3-11 (SEQ ID NO: 235) (1-95), IGKV3-15 (SEQ ID NO: 236) (1-95), IGKV1-05 (SEQ ID NO: 229) (1-95), IGKV4-01 (1-95), IGKV2-28 (SEQ ID NO: 234) (1-95), IGKV1-33 (1-95), IGKV1-09 (SEQ ID NO: 454) (1-95), IGKV1-12 (SEQ ID NO: 230) (1-95), IGKV2-30 (SEQ ID NO: 467) (1-95), IGKV1-27 (SEQ ID NO: 231) (1-95), IGKV1-16 (SEQ ID NO: 456) (1-95), and truncations of said group up to and including position 95 according to Kabat.

In some embodiments of the invention, [F/L/I/R/W/Y] is an amino acid selected from the group consisting of F, L, I, R, W, and Y.

In certain embodiments of the invention, [JK] comprises a sequence selected from the group consisting of TFGQGTKVEIK (SEQ ID NO: 528) and TFGGGT (SEQ ID NO: 529).

In some embodiments of the invention, the light chain library comprises a kappa light chain library.

In certain embodiments of the invention, the polynucleotide sequences further comprise 5′ and 3′ sequences which facilitate homologous recombination with a light chain chassis.

In some embodiments, the invention comprises a method for producing a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10⁸ polynucleotide sequences, said method comprising:

- a) selecting the CDRH3 polynucleotide sequences encoded by the CDRH3 sequences, as follows:
  - {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by terminal deoxynucleotidyl transferase (TdT) and preferentially functionally expressed by human B cells}, followed by
  - {all possible N or C-terminal truncations of IGHD alone and all possible combinations of N and C-terminal truncations}, followed by
  - {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by TdT and preferentially functionally expressed by human B cells}, followed by
  - {all possible N-terminal truncations of IGHJ, down to DXWG, wherein X is S, V, L, or Y}; and
- b) synthesizing the CDRH3 library described in a) by chemical synthesis, wherein a synthetic preimmune human antibody CDRH3 library is produced.
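The size of the design space described in step a) can be estimated by counting the options for each segment. The following Python sketch is illustrative only: the alphabet sizes of 8 and 9 are assumptions chosen to satisfy "fewer than ten" amino acids, the segment counts passed to `theoretical_diversity` are hypothetical placeholders, and neither function is taken from the specification:

```python
def count_n_segments(alphabet_size: int, max_len: int = 5) -> int:
    """Number of distinct sequences of length 0..max_len over an alphabet
    of `alphabet_size` amino acids (the empty sequence counts as one)."""
    return sum(alphabet_size ** length for length in range(max_len + 1))

def theoretical_diversity(n1: int, dh: int, n2: int, jh: int) -> int:
    """Upper bound on distinct CDRH3 designs: product of segment counts.
    Different segment combinations may encode the same sequence, so the
    number of unique sequences can be lower."""
    return n1 * dh * n2 * jh

print(count_n_segments(8))  # 37449
print(count_n_segments(9))  # 66430

# Hypothetical segment counts, for illustration only.
print(theoretical_diversity(n1=10, dh=100, n2=10, jh=20))  # 200000
```

Because the product grows quickly with each segment count, the choice of which segment variants to include largely determines whether the theoretical diversity remains within a size that can be synthesized and sampled.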

In certain embodiments, the invention comprises a synthetic preimmune human antibody CDRH3 library comprising 10⁷ to 10¹⁰ polynucleotide sequences representative of known human IGHD and IGHJ germline sequences encoding CDRH3, represented by the following formula:

- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by terminal deoxynucleotidyl transferase (TdT) and preferentially functionally expressed by human B cells}, followed by
- {all possible N or C-terminal truncations of IGHD alone and all possible combinations of N and C-terminal truncations}, followed by
- {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by TdT and preferentially functionally expressed by human B cells}, followed by
- {all possible N-terminal truncations of IGHJ, down to DXWG (SEQ ID NO: 530), wherein X is S, V, L, or Y}.

In certain embodiments, the invention comprises a synthetic preimmune human antibody heavy chain variable domain library comprising 10⁷ to 10¹⁰ polynucleotide sequences encoding human antibody heavy chain variable domains, said library comprising:

- a) an antibody heavy chain chassis, and
- b) a CDRH3 repertoire designed based on the human IGHD and IGHJ germline sequences, as follows:
  - {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by terminal deoxynucleotidyl transferase (TdT) and preferentially functionally expressed by human B cells}, followed by
  - {all possible N or C-terminal truncations of IGHD alone and all possible combinations of N and C-terminal truncations}, followed by
  - {0 to 5 amino acids selected from the group consisting of fewer than ten of the amino acids preferentially encoded by TdT and preferentially functionally expressed by human B cells}, followed by
  - {all possible N-terminal truncations of IGHJ, down to DXWG (SEQ ID NO: 530), wherein X is S, V, L, or Y}.

In some embodiments of the invention, the synthetic preimmune human antibody heavy chain variable domain library is expressed as a full length chain selected from the group consisting of an IgG1 full length chain, an IgG2 full length chain, an IgG3 full length chain, and an IgG4 full length chain.

In certain embodiments of the invention, the human antibody heavy chain chassis is selected from the group consisting of IGHV4-34 (SEQ ID NO: 35), IGHV3-23 (SEQ ID NO: 30), IGHV5-51 (SEQ ID NO: 40), IGHV1-69 (SEQ ID NO: 27), IGHV3-30 (SEQ ID NO: 31), IGHV4-39 (SEQ ID NO: 36), IGHV1-2 (SEQ ID NO: 24), IGHV1-18 (SEQ ID NO: 25), IGHV2-5 (SEQ ID NO: 429), IGHV2-70 (SEQ ID NO: 431, 432), IGHV3-7 (SEQ ID NO: 28), IGHV6-1 (SEQ ID NO: 449), IGHV1-46 (SEQ ID NO: 26), IGHV3-33 (SEQ ID NO: 32), IGHV4-31 (SEQ ID NO: 34), IGHV4-4 (SEQ ID NO: 446, 447), IGHV4-61 (SEQ ID NO: 38), and IGHV3-15 (SEQ ID NO: 29).

In some embodiments of the invention, the synthetic preimmune human antibody heavy chain variable domain library comprises 10⁷ to 10¹⁰ polynucleotide sequences encoding human antibody heavy chain variable domains, said library comprising:

- a) an antibody heavy chain chassis, and
- b) a synthetic preimmune human antibody CDRH3 library.

In some embodiments of the invention, the polynucleotide sequences are single-stranded coding polynucleotide sequences.

In certain embodiments of the invention, the polynucleotide sequences are single-stranded non-coding polynucleotide sequences.

In some embodiments of the invention, the polynucleotide sequences are double-stranded polynucleotide sequences.

In certain embodiments, the invention comprises a population of replicable cells with a doubling time of four hours or less, in which a synthetic preimmune human antibody repertoire is expressed.

In some embodiments of the invention, the population of replicable cells are yeast cells.

In certain embodiments, the invention comprises a method of generating a full-length antibody library comprising transforming a cell with a preimmune human antibody heavy chain variable domain library and a synthetic preimmune human antibody light chain library.

In some embodiments, the invention comprises a method of generating a full-length antibody library comprising transforming a cell with a preimmune human antibody heavy chain variable domain library and a synthetic preimmune human antibody light chain library.

In certain embodiments, the invention comprises a method of generating an antibody library comprising synthesizing polynucleotide sequences by split-pool DNA synthesis.

In some embodiments of the invention, the polynucleotide sequences are selected from the group consisting of single-stranded coding polynucleotide sequences, single-stranded non-coding polynucleotide sequences, and double-stranded polynucleotide sequences.

In certain embodiments, the invention comprises a synthetic full-length preimmune human antibody library comprising about 10⁷ to about 10¹⁰ polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences.

In certain embodiments, the invention comprises a method of selecting an antibody of interest from a human antibody library, comprising providing a synthetic preimmune human antibody CDRH3 library comprising a theoretical diversity of (N) polynucleotide sequences representative of the sequence diversity and length diversity found in known heavy chain CDR3 sequences, wherein the physical realization of that diversity is an actual library of a size at least 3(N), thereby providing a 95% probability that a single antibody of interest is present in the library, and selecting an antibody of interest.

In some embodiments of the invention, the theoretical diversity is about 10⁷ to about 10⁸ polynucleotide sequences.

Examples

This invention is further illustrated by the following examples which should not be construed as limiting. The contents of all references, patents and published patent applications cited throughout this application are hereby incorporated by reference.

In general, the practice of the present invention employs, unless otherwise indicated, conventional techniques of chemistry, molecular biology, recombinant DNA technology, PCR technology, immunology (especially, e.g., antibody technology), expression systems (e.g., yeast expression, cell-free expression, phage display, ribosome display, and PROFUSION™), and any necessary cell culture that are within the skill of the art and are explained in the literature. See, e.g., Sambrook, Fritsch and Maniatis, Molecular Cloning: Cold Spring Harbor Laboratory Press (1989); DNA Cloning, Vols. 1 and 2, (D. N. Glover, Ed. 1985); Oligonucleotide Synthesis (M. J. Gait, Ed. 1984); PCR Handbook Current Protocols in Nucleic Acid Chemistry, Beaucage, Ed. John Wiley & Sons (1999) (Editor); Oxford Handbook of Nucleic Acid Structure, Neidle, Ed., Oxford Univ Press (1999); PCR Protocols: A Guide to Methods and Applications, Innis et al., Academic Press (1990); PCR Essential Techniques: Essential Techniques, Burke, Ed., John Wiley & Son Ltd (1996); The PCR Technique: RT-PCR, Siebert, Ed., Eaton Pub. Co. (1998); Antibody Engineering Protocols (Methods in Molecular Biology), 510, Paul, S., Humana Pr (1996); Antibody Engineering: A Practical Approach (Practical Approach Series, 169), McCafferty, Ed., Irl Pr (1996); Antibodies: A Laboratory Manual, Harlow et al., C. S. H. L. Press, Pub. (1999); Current Protocols in Molecular Biology, eds. Ausubel et al., John Wiley & Sons (1992); Large-Scale Mammalian Cell Culture Technology, Lubiniecki, A., Ed., Marcel Dekker, Pub., (1990); Phage Display: A Laboratory Manual, C. Barbas (Ed.), CSHL Press, (2001); Antibody Phage Display, P. O'Brien (Ed.), Humana Press (2001); Boder et al., Nature Biotechnology, 1997, 15: 553; Boder et al., Methods Enzymol., 2000, 328: 430; ribosome display as described by Pluckthun et al. in U.S. Pat. No. 6,348,315, and PROFUSION™ as described by Szostak et al. in U.S. Pat. Nos. 6,258,558; 6,261,804; and 6,214,553; and bacterial periplasmic expression as described in US20040058403A1. Each of the references cited in this paragraph is incorporated by reference in its entirety.

Further details regarding antibody sequence analysis using Kabat conventions and programs to screen aligned nucleotide and amino acid sequences may be found, e.g., in Johnson et al., Methods Mol. Biol., 2004, 248: 11; Johnson et al., Int. Immunol., 1998, 10: 1801; Johnson et al., Methods Mol. Biol., 1995, 51: 1; Wu et al., Proteins, 1993, 16: 1; and Martin, Proteins, 1996, 25: 130. Each of the references cited in this paragraph is incorporated by reference in its entirety.

Further details regarding antibody sequence analysis using Chothia conventions may be found, e.g., in Chothia et al., J. Mol. Biol., 1998, 278: 457; Morea et al., Biophys. Chem., 1997, 68: 9; Morea et al., J. Mol. Biol., 1998, 275: 269; Al-Lazikani et al., J. Mol. Biol., 1997, 273: 927; Barre et al., Nat. Struct. Biol., 1994, 1: 915; Chothia et al., J. Mol. Biol., 1992, 227: 799; Chothia et al., Nature, 1989, 342: 877; and Chothia et al., J. Mol. Biol., 1987, 196: 901. Further analysis of CDRH3 conformation may be found in Shirai et al., FEBS Lett., 1999, 455: 188 and Shirai et al., FEBS Lett., 1996, 399: 1. Further details regarding Chothia analysis are described, for example, in Chothia et al., Cold Spring Harb. Symp. Quant Biol., 1987, 52: 399. Each of the references cited in this paragraph is incorporated by reference in its entirety.

Further details regarding CDR contact considerations are described, for example, in MacCallum et al., J. Mol. Biol., 1996, 262: 732, incorporated by reference in its entirety.

Further details regarding the antibody sequences and databases referred to herein are found, e.g., in Tomlinson et al., J. Mol. Biol., 1992, 227: 776; VBASE2 (Retter et al., Nucleic Acids Res., 2005, 33: D671); BLAST (www.ncbi.nlm.nih.gov/BLAST/); CDHIT (bioinformatics.ljcrf.edu/cd-hi/); EMBOSS (www.hgmp.mrc.ac.uk/Software/EMBOSS/); PHYLIP (evolution.genetics.washington.edu/phylip.html); and FASTA (fasta.bioch.virginia.edu). Each of the references cited in this paragraph is incorporated by reference in its entirety.

Example 1: Design of an Exemplary VH Chassis Library

This example demonstrates the selection and design of exemplary, non-limiting VH chassis sequences of the invention. VH chassis sequences were selected by examining collections of human IGHV germline sequences (Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 234; Tomlinson et al., J. Mol. Biol., 1992, 227: 799; Matsuda et al., J. Exp. Med., 1998, 188: 2151, each incorporated by reference in its entirety). As discussed in the Detailed Description, as well as below, a variety of criteria can be used to select VH chassis sequences, from these data sources or others, for inclusion in the library.

Table 3 (adapted from information provided in Scaviner et al., Exp. Clin. Immunogenet., 1999, 16: 234; Matsuda et al., J. Exp. Med., 1998, 188: 2151; and Wang et al. Immunol. Cell. Biol., 2008, 86: 111, each incorporated by reference in its entirety) lists the CDRH1 and CDRH2 length, the canonical structure and the estimated relative occurrence in peripheral blood, for the proteins encoded by each of the human IGHV germline sequences.

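TABLE 3

IGHV Characteristics and Occurrence in Antibodies from Peripheral Blood

| IGHV Germline | Length of CDRH1 | Length of CDRH2 | Canonical Structures¹ | Estimated Relative Occurrence in Peripheral Blood² |
|---|---|---|---|---|
| IGHV1-2 | 5 | 17 | 1-3 | 37 |
| IGHV1-3 | 5 | 17 | 1-3 | 15 |
| IGHV1-8 | 5 | 17 | 1-3 | 13 |
| IGHV1-18 | 5 | 17 | 1-2 | 25 |
| IGHV1-24 | 5 | 17 | 1-U | 5 |
| IGHV1-45 | 5 | 17 | 1-3 | 0 |
| IGHV1-46 | 5 | 17 | 1-3 | 25 |
| IGHV1-58 | 5 | 17 | 1-3 | 2 |
| IGHV1-69 | 5 | 17 | 1-2 | 58 |
| IGHV2-5 | 7 | 16 | 3-1 | 10 |
| IGHV2-26 | 7 | 16 | 3-1 | 9 |
| IGHV2-70 | 7 | 16 | 3-1 | 13 |
| IGHV3-7 | 5 | 17 | 1-3 | 26 |
| IGHV3-9 | 5 | 17 | 1-3 | 15 |
| IGHV3-11 | 5 | 17 | 1-3 | 13 |
| IGHV3-13 | 5 | 16 | 1-1 | 3 |
| IGHV3-15 | 5 | 19 | 1-4 | 14 |
| IGHV3-20 | 5 | 17 | 1-3 | 3 |
| IGHV3-21 | 5 | 17 | 1-3 | 19 |
| IGHV3-23 | 5 | 17 | 1-3 | 80 |
| IGHV3-30 | 5 | 17 | 1-3 | 67 |
| IGHV3-33 | 5 | 17 | 1-3 | 28 |
| IGHV3-43 | 5 | 17 | 1-3 | 2 |
| IGHV3-48 | 5 | 17 | 1-3 | 21 |
| IGHV3-49 | 5 | 19 | 1-U | 8 |
| IGHV3-53 | 5 | 16 | 1-1 | 7 |
| IGHV3-64 | 5 | 17 | 1-3 | 2 |
| IGHV3-66 | 5 | 17 | 1-3 | 3 |
| IGHV3-72 | 5 | 19 | 1-4 | 2 |
| IGHV3-73 | 5 | 19 | 1-4 | 3 |
| IGHV3-74 | 5 | 17 | 1-3 | 14 |
| IGHV4-4 | 5 | 16 | 1-1 | 33 |
| IGHV4-28 | 6 | 16 | 2-1 | 1 |
| IGHV4-31 | 7 | 16 | 3-1 | 25 |
| IGHV4-34 | 5 | 16 | 1-1 | 125 |
| IGHV4-39 | 7 | 16 | 3-1 | 63 |
| IGHV4-59 | 5 | 16 | 1-1 | 51 |
| IGHV4-61 | 7 | 16 | 3-1 | 23 |
| IGHV4-B | 6 | 16 | 2-1 | 7 |
| IGHV5-51 | 5 | 17 | 1-2 | 52 |
| IGHV6-1 | 7 | 18 | 3-5 | 26 |
| IGHV7-4-1 | 5 | 17 | 1-2 | 8 |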

¹Adapted from Chothia et al., J. Mol. Biol., 1992, 227: 799

²Adapted from Table S1 of Wang et al., Immunol. Cell. Biol., 2008, 86: 111

In the currently exemplified library, 17 germline sequences were chosen for representation in the VH chassis of the library (Table 4). As described in more detail below, these sequences were selected based on their relatively high representation in the peripheral blood of adults, with consideration given to the structural diversity of the chassis and the representation of particular germline sequences in antibodies used in the clinic. These 17 sequences account for about 76% of the total sample of heavy chain sequences used to derive the results of Table 4. As outlined in the Detailed Description, these criteria are non-limiting, and one of ordinary skill in the art will readily recognize that a variety of other criteria can be used to select the VH chassis sequences, and that the invention is not limited to a library comprising the 17 VH chassis genes presented in Table 4.

TABLE 4

VH Chassis Selected for Use in the Exemplary Library

| VH Chassis | Relative Occurrence | Length of CDRH1 | Length of CDRH2 | Comment |
|---|---|---|---|---|
| VH1-2 | 37 | 5 | 17 | Among highest usage for VH1 family |
| VH1-18 | 25 | 5 | 17 | Among highest usage for VH1 family |
| VH1-46 | 25 | 5 | 17 | Among highest usage for VH1 family |
| VH1-69 | 58 | 5 | 17 | Highest usage for VH1 family. The four chosen VH1 chassis represent about 80% of the VH1 repertoire. |
| VH3-7 | 26 | 5 | 17 | Among highest usage in VH3 family |
| VH3-15 | 14 | 5 | 19 | Not among highest usage, but it has unique structure (H2 of length 19). Highest occurrence among those with such structure. |
| VH3-23 | 80 | 5 | 17 | Highest usage in VH3 family. |
| VH3-30 | 67 | 5 | 17 | Among highest usage in VH3 family |
| VH3-33 | 28 | 5 | 17 | Among highest usage in VH3 family |
| VH3-48 | 21 | 5 | 17 | Among highest usage in VH3 family. The six chosen VH3 chassis account for about 70% of the VH3 repertoire. |
| VH4-31 | 25 | 7 | 16 | Among highest usage in VH4 family |
| VH4-34 | 125 | 5 | 16 | Highest usage in VH4 family |
| VH4-39 | 63 | 7 | 16 | Among highest usage in VH4 family |
| VH4-59 | 51 | 5 | 16 | Among highest usage in VH4 family |
| VH4-61 | 23 | 7 | 16 | Among highest usage in VH4 family |
| VH4-B | 7 | 6 | 16 | Not among highest usage in VH4 family, but has unique structure (H1 of length 6). The 6 chosen VH4 chassis account for close to 90% of the VH4 family repertoire |
| VH5-51 | 52 | 5 | 17 | High usage |
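The coverage figures quoted for the selected chassis can be checked against the relative-occurrence column of Table 3. The following Python sketch is illustrative only: the dictionary, list, and function names are hypothetical, and the counts are simply transcribed from Table 3.

```python
# Relative occurrence in peripheral blood, transcribed from Table 3.
IGHV_OCCURRENCE = {
    "IGHV1-2": 37, "IGHV1-3": 15, "IGHV1-8": 13, "IGHV1-18": 25, "IGHV1-24": 5,
    "IGHV1-45": 0, "IGHV1-46": 25, "IGHV1-58": 2, "IGHV1-69": 58,
    "IGHV2-5": 10, "IGHV2-26": 9, "IGHV2-70": 13,
    "IGHV3-7": 26, "IGHV3-9": 15, "IGHV3-11": 13, "IGHV3-13": 3, "IGHV3-15": 14,
    "IGHV3-20": 3, "IGHV3-21": 19, "IGHV3-23": 80, "IGHV3-30": 67, "IGHV3-33": 28,
    "IGHV3-43": 2, "IGHV3-48": 21, "IGHV3-49": 8, "IGHV3-53": 7, "IGHV3-64": 2,
    "IGHV3-66": 3, "IGHV3-72": 2, "IGHV3-73": 3, "IGHV3-74": 14,
    "IGHV4-4": 33, "IGHV4-28": 1, "IGHV4-31": 25, "IGHV4-34": 125, "IGHV4-39": 63,
    "IGHV4-59": 51, "IGHV4-61": 23, "IGHV4-B": 7,
    "IGHV5-51": 52, "IGHV6-1": 26, "IGHV7-4-1": 8,
}

# Germline genes underlying the 17 chassis of Table 4.
SELECTED_CHASSIS = [
    "IGHV1-2", "IGHV1-18", "IGHV1-46", "IGHV1-69",
    "IGHV3-7", "IGHV3-15", "IGHV3-23", "IGHV3-30", "IGHV3-33", "IGHV3-48",
    "IGHV4-31", "IGHV4-34", "IGHV4-39", "IGHV4-59", "IGHV4-61", "IGHV4-B",
    "IGHV5-51",
]


def coverage(selected, occurrence, family=None):
    """Fraction of total occurrence captured by `selected`.

    If `family` is given (e.g. "IGHV3"), both the numerator and the
    denominator are restricted to genes of that family.
    """
    def in_family(gene):
        return family is None or gene.startswith(family + "-")

    total = sum(count for gene, count in occurrence.items() if in_family(gene))
    chosen = sum(occurrence[gene] for gene in selected if in_family(gene))
    return chosen / total


if __name__ == "__main__":
    print(f"all families: {coverage(SELECTED_CHASSIS, IGHV_OCCURRENCE):.1%}")
    for family in ("IGHV1", "IGHV3", "IGHV4"):
        print(f"{family}: {coverage(SELECTED_CHASSIS, IGHV_OCCURRENCE, family):.1%}")
```

With the Table 3 counts, the overall fraction is 727/956, or about 76%, and the per-family fractions for VH1, VH3, and VH4 are about 81%, 72%, and 90%, in line with the approximate figures given in the text and in the comments of Table 4.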

In this particular embodiment of the library, VH chassis derived from sequences in the IGHV2, IGHV6 and IGHV7 germline families were not included. As described in the Detailed Description, this exemplification is not meant to be limiting, as, in some embodiments, it may be desirable to include one or more of these families, particularly as clinical information on antibodies with similar sequences becomes available, to produce libraries with additional diversity that is potentially unexplored, or to study the properties and potential of these IGHV families in greater detail. The modular design of the library of the present invention readily permits the introduction of these, and other, VH chassis sequences. The amino acid sequences of the VH chassis utilized in this particular embodiment of the library, which are derived from the IGHV germline sequences, are presented in Table 5. The details of the derivation procedures are presented below.

TABLE 5

Amino Acid Sequences for VH Chassis Selected for Inclusion in the Exemplary Library

| Chassis | SEQ ID NO: | FRM1 | CDRH1 | FRM2 | CDRH2 | FRM3 |
|---|---|---|---|---|---|---|
| VH1-2 | 24 | QVQLVQSGAEVKKPGASVKVSCKASGYTFT | GYYMH | WVRQAPGQGLEWMG | WINPNSGGTNYAQKFQG | RVTMTRDTSISTAYMELSRLRSDDTAVYYCAR |
| VH1-18 | 25 | QVQLVQSGAEVKKPGASVKVSCKASGYTFT | SYGIS | WVRQAPGQGLEWMG | WISAYNGNTNYAQKLQG | RVTMTTDTSTSTAYMELRSLRSDDTAVYYCAR |
| VH1-46 | 26 | QVQLVQSGAEVKKPGASVKVSCKASGYTFT | SYYMH | WVRQAPGQGLEWMG | IINPSGGSTSYAQKFQG | RVTMTRDTSTSTVYMELSSLRSEDTAVYYCAR |
| VH1-69 | 27 | QVQLVQSGAEVKKPGSSVKVSCKASGGTFS | SYAIS | WVRQAPGQGLEWMG | GIIPIFGTANYAQKFQG | RVTITADKSTSTAYMELSSLRSEDTAVYYCAR |
| VH3-7 | 28 | EVQLVESGGGLVQPGGSLRLSCAASGFTFS | SYWMS | WVRQAPGKGLEWVA | NIKQDGSEKYYVDSVKG | RFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR |
| VH3-15¹ | 29 | EVQLVESGGGLVKPGGSLRLSCAASGFTFS | NAWMS | WVRQAPGKGLEWVG | RIKSKTDGGTTDYAAPVKG | RFTISRDDSKNTLYLQMNSL**RA**EDTAVYYC**AR** |
| VH3-23 | 30 | EVQLLESGGGLVQPGGSLRLSCAASGFTFS | SYAMS | WVRQAPGKGLEWVS | AISGSGGSTYYADSVKG | RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAK |
| VH3-30 | 31 | QVQLVESGGGVVQPGRSLRLSCAASGFTFS | SYGMH | WVRQAPGKGLEWVA | VISYDGSNKYYADSVKG | RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR |
| VH3-33 | 32 | QVQLVESGGGVVQPGRSLRLSCAASGFTFS | SYGMH | WVRQAPGKGLEWVA | VIWYDGSNKYYADSVKG | RFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR |
| VH3-48 | 33 | EVQLVESGGGLVQPGGSLRLSCAASGFTFS | SYSMN | WVRQAPGKGLEWVS | YISSSSSTIYYADSVKG | RFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR |
| VH4-31 | 34 | QVQLQESGPGLVKPSQTLSLTCTVSGGSIS | SGGYYWS | WIRQHPGKGLEWIG | YIYYSGSTYYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-34² | 35 | QVQLQQWGAGLLKPSETLSLTCAVYGGSFS | GYYWS | WIRQPPGKGLEWIG | EIDHSGSTNYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-39 | 36 | QLQLQESGPGLVKPSETLSLTCTVSGGSIS | SSSYYWG | WIRQPPGKGLEWIG | SIYYSGSTYYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-59 | 37 | QVQLQESGPGLVKPSETLSLTCTVSGGSIS | SYYWS | WIRQPPGKGLEWIG | YIYYSGSTNYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-61 | 38 | QVQLQESGPGLVKPSETLSLTCTVSGGSVS | SGSYYWS | WIRQPPGKGLEWIG | YIYYSGSTNYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH4-B | 39 | QVQLQESGPGLVKPSETLSLTCAVSGYSIS | SGYYWG | WIRQPPGKGLEWIG | SIYHSGSTYYNPSLKS | RVTISVDTSKNQFSLKLSSVTAADTAVYYCAR |
| VH5-51 | 40 | EVQLVQSGAEVKKPGESLKISCKGSGYSFT | SYWIG | WVRQMPGKGLEWMG | IIYPGDSDTRYSPSFQG | QVTISADKSISTAYLQWSSLKASDTAVYYCAR |

¹The original KT sequence in VH3-15 was mutated to RA (bold/underlined) and TT to AR (bold/underlined), in order to match other VH3 family members selected for inclusion in the library. The modification to RA was made so that no unique sequence stretches of up to about 20 amino acids are created. Without being bound by theory, this modification is expected to reduce the odds of introducing novel T-cell epitopes in the VH3-15-derived chassis sequence. The avoidance of T cell epitopes is an additional criterion that can be considered in the design of certain libraries of the invention.

²The original NHS motif in VH4-34 was mutated to DHS, in order to remove a possible N-linked glycosylation site in CDR-H2. In certain embodiments of the invention, for example, if the library is transformed into yeast, this may prevent unwanted N-linked glycosylation.
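The N-linked glycosylation motif referred to in footnote 2 is commonly described as the sequon N-X-S/T, where X is any residue other than proline. A minimal Python sketch of a scan for that motif is given below. The function name is hypothetical, and the unmutated CDRH2 string is reconstructed here simply by reverting the D of the VH4-34 chassis CDRH2 to N, as the footnote describes.

```python
import re

# N-X-S/T with X != P; the lookahead allows overlapping motifs to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return the 0-based start index of every N-X-S/T motif (X != P)."""
    return [match.start() for match in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    germline_cdrh2 = "EINHSGSTNYNPSLKS"  # NHS motif before mutation (reconstructed)
    chassis_cdrh2 = "EIDHSGSTNYNPSLKS"   # VH4-34 chassis CDRH2 from Table 5
    print(find_sequons(germline_cdrh2))  # the NHS motif is reported
    print(find_sequons(chassis_cdrh2))   # no motif remains after N -> D
```

The same scan can be applied to any CDR variant to flag candidate sites before a sequence is admitted to a library; whether a flagged site is actually glycosylated depends on the expression host and structural context.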

Table 5 provides the amino acid sequences of the seventeen chassis. In nucleotide space, most of the corresponding germline nucleotide sequences include two additional nucleotides on the 3′ end (i.e., two-thirds of a codon). In most cases, those two nucleotides are GA. In many cases, nucleotides are added to the 3′ end of the IGHV-derived gene segment in vivo, prior to recombination with the IGHD gene segment. Any additional nucleotide would make the resulting codon encode one of the following two amino acids: Asp (if the codon is GAC or GAT) or Glu (if the codon is GAA or GAG). One, or both, of the two 3′-terminal nucleotides may also be deleted in the final rearranged heavy chain sequence. If only the A is deleted, the resulting amino acid is very frequently a G. If both nucleotides are deleted, this position is “empty,” but followed by a general V-D addition or an amino acid encoded by the IGHD gene. Further details are presented in Example 5. This first position, after the CAR or CAK motif at the C-terminus of FRM3 (Table 5), is designated the “tail.” In the currently exemplified embodiment of the library, this residue may be G, D, E, or nothing. Thus, adding the tail to any chassis enumerated above (Table 5) can produce one of the following four schematic sequences, wherein the residue following the VH chassis is the tail:

- (1) [VH_Chassis]-[G]
- (2) [VH_Chassis]-[D]
- (3) [VH_Chassis]-[E]
- (4) [VH_Chassis]
  
  These structures can also be represented in the format:
- [VH_Chassis]-[G/D/E/-],
  
  wherein the hyphen symbol (-) indicates an empty or null position.

Using the CDRH3 numbering system defined in the Definitions section, the above sequences could be denoted to have amino acid 95 as G, D, or E, for instances (1), (2), and (3), respectively, while the sequence of instance (4) would have no position 95, and CDRH3 proper would begin at position 96 or 97.
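The tail logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the names are hypothetical, the codon map covers only the four GAN codons discussed in the text rather than the full genetic code, and the VH1-69 chassis is assembled by concatenating its Table 5 segments.

```python
# Only the codons named in the text (GA + one added nucleotide).
CODON_TO_RESIDUE = {"GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E"}

# The four tail choices of the exemplified library; "" is the empty position.
TAIL_OPTIONS = ("G", "D", "E", "")

# VH1-69 chassis: FRM1 + CDRH1 + FRM2 + CDRH2 + FRM3 (Table 5).
VH1_69_CHASSIS = "".join([
    "QVQLVQSGAEVKKPGSSVKVSCKASGGTFS",    # FRM1
    "SYAIS",                             # CDRH1
    "WVRQAPGQGLEWMG",                    # FRM2
    "GIIPIFGTANYAQKFQG",                 # CDRH2
    "RVTITADKSTSTAYMELSSLRSEDTAVYYCAR",  # FRM3
])


def residues_after_ga():
    """Residues encoded when the 3' germline 'GA' is completed by one nucleotide."""
    return {CODON_TO_RESIDUE["GA" + nucleotide] for nucleotide in "ACGT"}


def chassis_with_tails(chassis):
    """Return the schematic sequences [VH_Chassis]-[G/D/E/-]."""
    return [chassis + tail for tail in TAIL_OPTIONS]


if __name__ == "__main__":
    print(sorted(residues_after_ga()))
    for sequence in chassis_with_tails(VH1_69_CHASSIS):
        print(sequence[-6:])  # show only the C-terminal end, including the tail
```

The G tail is listed as an option rather than derived from a codon table because the text characterizes it as a frequent outcome of deleting the terminal A, not as a deterministic one.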

In some embodiments of the invention, VH3-66, with canonical structure 1-1 (five residues in CDRH1 and 16 for CDRH2) may be included in the library. The inclusion of VH3-66 may compensate for the removal of other chassis from the library, which may not express well in yeast under some conditions (e.g., VH4-34 and VH4-59).

Example 2: Design of VH Chassis Variants with Variation within CDRH1 and CDRH2

This example demonstrates the introduction of further diversity into the VH chassis by creating mutations in the CDRH1 and CDRH2 regions of each chassis shown in Example 1. The following approach was used to select the positions and nature of the amino acid variation for each chassis. First, the sequence identity between rearranged human heavy chain antibody sequences was analyzed (Lee et al., Immunogenetics, 2006, 57: 917; Jackson et al., J. Immunol. Methods, 2007, 324: 26) and they were classified by the origin of their respective IGHV germline sequence. As an illustrative example, about 200 sequences in the data set exhibited greatest identity to the IGHV1-69 germline, indicating that they were likely to have been derived from IGHV1-69. Next, the occurrence of amino acid residues at each position within the CDRH1 and CDRH2 segments, in each germline family selected in Example 1, was determined. For VH1-69, these occurrences are illustrated in Tables 6 and 7. Finally, neutral and/or smaller amino acid residues were favored, where possible, as replacements. Without being bound by theory, the rationale for the choice of these amino acid residues is the desire to provide a more flexible and less sterically hindered context for the display of a diversity of CDR sequences.

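TABLE 6

Occurrence of Amino Acid Residues at Each Position Within IGHV1-69-derived CDRH1 Sequences

| Residue | 31 | 32 | 33 | 34 | 35 |
|---|---|---|---|---|---|
| Germline (SEQ ID NO: 1391) | **S** | **Y** | **A** | **I** | **S** |
| A | 1 | 0 | **129** | 0 | 0 |
| C | 0 | 1 | 0 | 0 | 2 |
| D | 0 | 5 | 1 | 0 | 0 |
| E | 0 | 0 | 0 | 0 | 0 |
| F | 0 | 9 | 1 | 8 | 0 |
| G | 0 | 0 | **24** | 0 | 3 |
| H | 2 | 11 | 0 | 0 | 4 |
| I | 2 | 0 | 0 | **159** | 1 |
| K | 3 | 0 | 0 | 0 | 0 |
| L | 0 | **10** | 2 | 5 | 0 |
| M | 1 | 0 | 0 | 0 | 0 |
| N | **21** | 2 | 2 | 0 | **27** |
| P | 0 | 0 | 1 | 0 | 0 |
| Q | 1 | 1 | 0 | 0 | 5 |
| R | 9 | 0 | 0 | 0 | 1 |
| S | **133** | 3 | 7 | 0 | **129** |
| T | 12 | 1 | **10** | 0 | 12 |
| V | 0 | 0 | 7 | 13 | 0 |
| W | 0 | 0 | 0 | 0 | 0 |
| Y | 0 | **142** | 1 | 0 | 1 |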

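TABLE 7

Occurrence of Amino Acid Residues at Each Position Within IGHV1-69-derived CDRH2 Sequences

| Residue | 50 | 51 | 52 | 52A | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Germline (SEQ ID NO: 1392) | **G** | **I** | **I** | **P** | **I** | **F** | **G** | **T** | **A** | **N** | **Y** | **A** | **Q** | **K** | **F** | **Q** | **G** |
| A | 0 | 0 | **7** | 0 | 2 | 0 | 4 | 3 | **132** | 0 | 0 | **178** | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D | 1 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 1 | 21 | 0 | 0 | 0 | 2 | 0 | 0 | 12 |
| E | 2 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 2 | 0 | 1 | 1 | 4 | 0 | 2 | 0 |
| F | 0 | 1 | 0 | 1 | 7 | **119** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | **180** | 0 | 0 |
| G | **135** | 0 | 1 | 0 | 0 | 0 | **155** | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | **173** |
| H | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4 | 4 | 0 | 3 | 0 | 0 | 4 | 0 |
| I | 0 | **166** | **159** | 0 | **132** | 2 | 0 | 34 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| K | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 5 | 0 | 0 | 2 | **156** | 0 | 3 | 0 |
| L | 0 | 1 | 2 | 0 | **16** | 37 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 0 |
| M | 0 | 6 | 2 | 0 | 9 | 1 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| N | 0 | 0 | 1 | 0 | 2 | 0 | 5 | 0 | 0 | **132** | 1 | 0 | 0 | 8 | 0 | 0 | 0 |
| P | 0 | 2 | 0 | **181** | 1 | 3 | 0 | 0 | 15 | 0 | 0 | 3 | 6 | 0 | 0 | 0 | 0 |
| Q | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | **173** | 2 | 0 | **164** | 0 |
| R | **44** | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 0 | 3 | 0 | 0 | 0 | 13 | 0 | 9 | 0 |
| S | 1 | 0 | 1 | 1 | 2 | 6 | 3 | 5 | 8 | **7** | 0 | 2 | 0 | 0 | 1 | 0 | 0 |
| T | 1 | 1 | 7 | 2 | 2 | 1 | 0 | **127** | 15 | 8 | 3 | 1 | 0 | 0 | 0 | 0 | 0 |
| V | 0 | 8 | 5 | 0 | 11 | 4 | 0 | 4 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| W | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Y | 0 | 0 | 0 | 0 | 0 | 11 | 1 | 0 | 0 | 0 | **176** | 0 | 0 | 0 | 1 | 1 | 0 |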

The original germline sequence is provided in the second row of the tables, in bold font, beneath the residue number (Kabat system). The entries in the table indicate the number of times a given amino acid residue (first column) is observed at the indicated CDRH1 (Table 6) or CDRH2 (Table 7) position. For example, at position 33 the amino acid type G (glycine) is observed 24 times in the set of IGHV1-69-based sequences that were examined. Thus, applying the criteria above, variants were constructed with N at position 31, L at position 32 (H can be charged, under some conditions), G and T at position 33, no variants at position 34 and N at position 35, resulting in the following VH1-69 chassis CDRH1 single-amino acid variant sequences:

- NYAIS (SEQ ID NO: 41)
- SLAIS (SEQ ID NO: 42)
- SYGIS (SEQ ID NO: 43)
- SYTIS (SEQ ID NO: 44)
- SYAIN (SEQ ID NO: 45)
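The construction of single-amino acid variants from a germline CDR can be sketched as follows. This is a hedged illustration, not the procedure actually used: the function names are hypothetical, the position-33 counts are transcribed from Table 6, and positions are handled as plain integers, so insertion positions such as 52A in CDRH2 would need a different indexing scheme. Frequency ranking is shown only as one input to the choice; the substitutions themselves are passed in explicitly, because the text also applies judgment about residue properties (for example, preferring L over H at position 32).

```python
# Occurrence counts at Kabat position 33 of IGHV1-69-derived CDRH1 (Table 6);
# residues with a count of zero are omitted.
POSITION_33_COUNTS = {
    "A": 129, "D": 1, "F": 1, "G": 24, "L": 2, "N": 2,
    "P": 1, "S": 7, "T": 10, "V": 7, "Y": 1,
}


def rank_non_germline(counts, germline_residue):
    """Non-germline residues ordered by decreasing count (ties broken alphabetically)."""
    candidates = [
        (residue, count)
        for residue, count in counts.items()
        if residue != germline_residue and count > 0
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [residue for residue, _ in candidates]


def single_site_variants(germline, first_position, substitutions):
    """Build one variant per (position, residue) pair, changing a single site each time."""
    variants = []
    for position in sorted(substitutions):
        index = position - first_position
        for residue in substitutions[position]:
            variants.append(germline[:index] + residue + germline[index + 1:])
    return variants


if __name__ == "__main__":
    print(rank_non_germline(POSITION_33_COUNTS, "A")[:2])
    # Substitutions stated in the text for the VH1-69 CDRH1 (SYAIS, positions 31-35).
    chosen = {31: "N", 32: "L", 33: "GT", 35: "N"}
    print(single_site_variants("SYAIS", 31, chosen))
```

Run on the VH1-69 CDRH1, the second call reproduces the five variant sequences listed above (SEQ ID NOs: 41-45).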

Similarly, the analysis that produced Table 7 provided a basis for choosing the following single-amino acid variant sequences for VH1-69 chassis CDRH2s:

- SIIPIFGTANYAQKFQG (SEQ ID NO: 46)
- GIAPIFGTANYAQKFQG (SEQ ID NO: 47)
- GIIPILGTANYAQKFQG (SEQ ID NO: 48)
- GIIPIFGTASYAQKFQG (SEQ ID NO: 49)

A similar approach was used to design and construct variants of the other selected chassis; the resulting CDRH1 and CDRH2 variants for each of the exemplary chassis are provided in Table 8. One of ordinary skill in the art will readily recognize that the methods described herein can be applied to create variants of other VH chassis and VL chassis.

TABLE 8

VH Chassis Variants

| Chassis | CDRH1 | SEQ ID NO: | CDRH2 | SEQ ID NO: |
|---|---|---|---|---|
| 1-18.0 | SYGIS | 50 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.1 | NYGIS | 51 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.2 | SNGIS | 52 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.3 | SYAIS | 53 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.4 | SYGIT | 54 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.5 | SYGIH | 55 | WISAYNGNTNYAQKLQG | 56 |
| 1-18.6 | SYGIS | 50 | SISAYNGNTNYAQKLQG | 57 |
| 1-18.7 | SYGIS | 50 | WISTYNGNTNYAQKLQG | 58 |
| 1-18.8 | SYGIS | 50 | WISPYNGNTNYAQKLQG | 59 |
| 1-18.9 | SYGIS | 50 | WISAYNGNTYYAQKLQG | 60 |
| 1-2.0 | GYYMH | 61 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.1 | DYYMH | 62 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.2 | RYYMH | 63 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.3 | GSYMH | 64 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.4 | GYSMH | 65 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.5 | GYYMQ | 66 | WINPNSGGTNYAQKFQG | 67 |
| 1-2.6 | GYYMH | 61 | SINPNSGGTNYAQKFQG | 68 |
| 1-2.7 | GYYMH | 61 | WINPSSGGTNYAQKFQG | 69 |
| 1-2.8 | GYYMH | 61 | WINPNSGGTKYAQKFQG | 70 |
| 1-2.9 | GYYMH | 61 | WINPNSGGTSYAQKFQG | 71 |
| 1-46.0 | SYYMH | 72 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.1 | NYYMH | 73 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.2 | SSYMH | 74 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.3 | SYSMH | 75 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.4 | SYYIH | 76 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.5 | SYYMV | 77 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.6 | SYYMS | 78 | IINPSGGSTSYAQKFQG | 79 |
| 1-46.7 | SYYMH | 72 | VINPSGGSTSYAQKFQG | 80 |
| 1-46.8 | SYYMH | 72 | IINPGGGSTSYAQKFQG | 81 |
| 1-46.9 | SYYMH | 72 | IINPSGGSTTYAQKFQG | 82 |
| 1-69.0 | SYAIS | 83 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.1 | NYAIS | 41 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.2 | SLAIS | 42 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.3 | SYGIS | 43 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.4 | SYTIS | 44 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.5 | SYAIN | 45 | GIIPIFGTANYAQKFQG | 84 |
| 1-69.6 | SYAIS | 83 | SIIPIFGTANYAQKFQG | 46 |
| 1-69.7 | SYAIS | 83 | GIAPIFGTANYAQKFQG | 47 |
| 1-69.8 | SYAIS | 83 | GIIPILGTANYAQKFQG | 48 |
| 1-69.9 | SYAIS | 83 | GIIPIFGTASYAQKFQG | 49 |
| 3-15.0 | NAWMS | 85 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.1 | KAWMS | 86 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.2 | DAWMS | 87 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.3 | NALMS | 88 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.4 | NAAMS | 89 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.5 | NAWMN | 90 | RIKSKTDGGTTDYAAPVKG | 91 |
| 3-15.6 | NAWMS | 85 | SIKSKTDGGTTDYAAPVKG | 92 |
| 3-15.7 | NAWMS | 85 | RIKSTTDGGTTDYAAPVKG | 93 |
| 3-15.8 | NAWMS | 85 | RIKSKADGGTTDYAAPVKG | 94 |
| 3-15.9 | NAWMS | 85 | RIKSKTDGGTTGYAAPVKG | 95 |
| 3-23.0 | SYAMS | 96 | AISGSGGSTYYADSVKG | 100 |
| 3-23.1 | NYAMS | 97 | AISGSGGSTYYADSVKG | 100 |
| 3-23.2 | TYAMS | 98 | AISGSGGSTYYADSVKG | 100 |
| 3-23.3 | SSAMS | 99 | AISGSGGSTYYADSVKG | 100 |
| 3-23.4 | SYAMS | 96 | GISGSGGSTYYADSVKG | 101 |
| 3-23.5 | SYAMS | 96 | SISGSGGSTYYADSVKG | 102 |
| 3-23.6 | SYAMS | 96 | TISGSGGSTYYADSVKG | 103 |
| 3-23.7 | SYAMS | 96 | VISGSGGSTYYADSVKG | 104 |
| 3-23.8 | SYAMS | 96 | AISASGGSTYYADSVKG | 105 |
| 3-23.9 | SYAMS | 96 | AISGSGGSTSYADSVKG | 106 |
| 3-30.0 | SYGMH | 107 | VISYDGSNKYYADSVKG | 111 |
| 3-30.1 | NYGMH | 108 | VISYDGSNKYYADSVKG | 111 |
| 3-30.2 | SYAMH | 109 | VISYDGSNKYYADSVKG | 111 |
| 3-30.3 | SYGFH | 110 | VISYDGSNKYYADSVKG | 111 |
| 3-30.4 | SYGMH | 107 | FISYDGSNKYYADSVKG | 112 |
| 3-30.5 | SYGMH | 107 | LISYDGSNKYYADSVKG | 113 |
| 3-30.6 | SYGMH | 107 | VISSDGSNKYYADSVKG | 114 |
| 3-30.7 | SYGMH | 107 | VISYDGNNKYYADSVKG | 115 |
| 3-30.8 | SYGMH | 107 | VISYDGSIKYYADSVKG | 116 |
| 3-30.9 | SYGMH | 107 | VISYDGSNQYYADSVKG | 117 |
| 3-33.0 | SYGMH | 118 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.1 | TYGMH | 119 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.2 | NYGMH | 120 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.3 | SSGMH | 121 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.4 | SYAMH | 122 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.5 | SYGMN | 123 | VIWYDGSNKYYADSVKG | 124 |
| 3-33.6 | SYGMH | 118 | LIWYDGSNKYYADSVKG | 125 |
| 3-33.7 | SYGMH | 118 | FIWYDGSNKYYADSVKG | 126 |
| 3-33.8 | SYGMH | 118 | VIWYDGSNKSYADSVKG | 127 |
| 3-33.9 | SYGMH | 118 | VIWYDGSNKGYADSVKG | 128 |
| 3-48.0 | SYSMN | 129 | YISSSSSTIYYADSVKG | 136 |
| 3-48.1¹ | NYSMN | 130 | YISSSSSTIYYADSVKG | 136 |
| 3-48.2 | IYSMN | 131 | YISSSSSTIYYADSVKG | 136 |
| 3-48.3 | SNSMN | 132 | YISSSSSTIYYADSVKG | 136 |
| 3-48.4 | SYEMN | 133 | YISSSSSTIYYADSVKG | 136 |
| 3-48.5 | SYNMN | 134 | YISSSSSTIYYADSVKG | 136 |
| 3-48.6 | SYSMT | 135 | YISSSSSTIYYADSVKG | 136 |
| 3-48.7 | SYSMN | 129 | TISSSSSTIYYADSVKG | 137 |
| 3-48.8 | SYSMN | 129 | YISGSSSTIYYADSVKG | 138 |
| 3-48.9 | SYSMN | 129 | YISSSSSTILYADSVKG | 139 |
| 3-7.0 | SYWMS | 140 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.1 | TYWMS | 141 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.2 | NYWMS | 142 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.3 | SSWMS | 143 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.4 | SYGMS | 144 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.5 | SYWMT | 145 | NIKQDGSEKYYVDSVKG | 152 |
| 3-7.6 | SYWMS | 140 | SIKQDGSEKYYVDSVKG | 153 |
| 3-7.7 | SYWMS | 140 | NINQDGSEKYYVDSVKG | 154 |
| 3-7.8 | SYWMS | 140 | NIKSDGSEKYYVDSVKG | 155 |
| 3-7.9 | SYWMS | 140 | NIKQDGSEKQYVDSVKG | 156 |
| 4-31.0 | SGGYYWS | 147 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.1 | SGSYYWS | 148 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.2 | SGTYYWS | 149 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.3 | SGGTYWS | 150 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.4 | SGGYSWS | 151 | YIYYSGSTYYNPSLKS | 157 |
| 4-31.5 | SGGYYWS | 147 | SIYYSGSTYYNPSLKS | 158 |
| 4-31.6 | SGGYYWS | 147 | NIYYSGSTYYNPSLKS | 159 |
| 4-31.7 | SGGYYWS | 147 | YIYYSGNTYYNPSLKS | 160 |
| 4-31.8 | SGGYYWS | 147 | YIYYSGSTSYNPSLKS | 161 |
| 4-31.9 | SGGYYWS | 147 | YIYYSGSTVYNPSLKS | 162 |
| 4-34.0 | GYYWS | 163 | EIDHSGSTNYNPSLKS | 166 |
| 4-34.1 | DYYWS | 164 | EIDHSGSTNYNPSLKS | 166 |
| 4-34.2 | GYYWT | 165 | EIDHSGSTNYNPSLKS | 166 |
| 4-34.3 | GYYWS | 163 | DIDHSGSTNYNPSLKS | 167 |
| 4-34.4 | GYYWS | 163 | EISHSGSTNYNPSLKS | 168 |
| 4-34.5 | GYYWS | 163 | EIDQSGSTNYNPSLKS | 169 |
| 4-34.6 | GYYWS | 163 | EIDHGGSTNYNPSLKS | 170 |
| 4-34.7 | GYYWS | 163 | EIDHSGNTNYNPSLKS | 171 |
| 4-34.8 | GYYWS | 163 | EIDHSGSTSYNPSLKS | 172 |
| 4-34.9 | GYYWS | 163 | EIDHSGSTDYNPSLKS | 173 |
| 4-39.0 | SSSYYWG | 174 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.1 | TSSYYWG | 175 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.2 | SNSYYWG | 176 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.3 | SSDYYWG | 177 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.4 | SSNYYWG | 178 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.5 | SSRYYWG | 179 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.6 | SSSYAWG | 180 | SIYYSGSTYYNPSLKS | 181 |
| 4-39.7 | SSSYYWG | 174 | NIYYSGSTYYNPSLKS | 182 |
| 4-39.8 | SSSYYWG | 174 | SISYSGSTYYNPSLKS | 183 |
| 4-39.9 | SSSYYWG | 174 | SIYYSGSTSYNPSLKS | 184 |
| 4-59.0 | SYYWS | 185 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.1 | TYYWS | 186 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.2 | SSYWS | 187 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.3 | SYSWS | 188 | YIYYSGSTNYNPSLKS | 189 |
| 4-59.4 | SYYWS | 185 | FIYYSGSTNYNPSLKS | 190 |
| 4-59.5 | SYYWS | 185 | HIYYSGSTNYNPSLKS | 191 |
| 4-59.6 | SYYWS | 185 | SIYYSGSTNYNPSLKS | 192 |
| 4-59.7 | SYYWS | 185 | YIYSSGSTNYNPSLKS | 193 |
| 4-59.8 | SYYWS | 185 | YIYYSGSTDYNPSLKS | 194 |
| 4-59.9 | SYYWS | 185 | YIYYSGSTTYNPSLKS | 195 |
| 4-61.0 | SGSYYWS | 196 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.1 | SGGYYWS | 197 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.2 | SGNYYWS | 198 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.3 | SGSSYWS | 199 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.4 | SGSYSWS | 200 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.5 | SGSYYWT | 201 | YIYYSGSTNYNPSLKS | 202 |
| 4-61.6 | SGSYYWS | 196 | RIYYSGSTNYNPSLKS | 203 |
| 4-61.7 | SGSYYWS | 196 | SIYYSGSTNYNPSLKS | 204 |
| 4-61.8 | SGSYYWS | 196 | YIYTSGSTNYNPSLKS | 205 |
| 4-61.9 | SGSYYWS | 196 | YIYYSGSTSYNPSLKS | 206 |
| 4-B.0 | SGYYWG | 207 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.1 | SAYYWG | 208 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.2 | SGSYWG | 209 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.3 | SGYNWG | 210 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.4 | SGYYWA | 211 | SIYHSGSTYYNPSLKS | 212 |
| 4-B.5 | SGYYWG | 207 | TIYHSGSTYYNPSLKS | 213 |
| 4-B.6 | SGYYWG | 207 | SSYHSGSTYYNPSLKS | 214 |
| 4-B.7 | SGYYWG | 207 | SIYHSGNTYYNPSLKS | 215 |
| 4-B.8 | SGYYWG | 207 | SIYHSGSTNYNPSLKS | 216 |
| 4-B.9 | SGYYWG | 207 | SIYHSGSTGYNPSLKS | 217 |
| 5-51.0 | SYWIG | 218 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.1 | TYWIG | 219 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.2 | NYWIG | 220 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.3 | SNWIG | 221 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.4 | SYYIG | 222 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.5 | SYWIS | 223 | IIYPGDSDTRYSPSFQG | 224 |
| 5-51.6 | SYWIG | 218 | SIYPGDSDTRYSPSFQG | 225 |
| 5-51.7 | SYWIG | 218 | IIYPADSDTRYSPSFQG | 226 |
| 5-51.8 | SYWIG | 218 | IIYPGDSSTRYSPSFQG | 227 |
| 5-51.9 | SYWIG | 218 | IIYPGDSDTTYSPSFQG | 228 |

¹Contains an N-linked glycosylation site which can be removed, if desired, as described herein.

As specified in the Detailed Description, other criteria can be used to select which amino acids are to be altered and the identity of the resulting altered sequence. This is true for any heavy chain chassis sequence, or any other sequence of the invention. The approach outlined above is meant for illustrative purposes and is non-limiting.

Example 3: Design of an Exemplary VK Chassis Library

This example describes the design of an exemplary VK chassis library. One of ordinary skill in the art will recognize that similar principles may be used to design a Vλ library, or a library containing both VK and Vλ chassis. Design of a Vλ chassis library is presented in Example 4.

As was previously demonstrated in Example 1, for IGHV germline sequences, the sequence characteristics and occurrence of human IGKV germline sequences in antibodies from peripheral blood were analyzed. The data are presented in Table 9.

TABLE 9

IGKV Gene Characteristics and Occurrence in Antibodies from Peripheral Blood

| IGKV Gene | Alternative Names | CDRL1 Length | CDRL2 Length | Canonical Structures¹ | Estimated Relative Occurrence in Peripheral Blood² |
|---|---|---|---|---|---|
| IGKV1-05 | L12 | 11 | 7 | 2-1-(U) | **69** |
| IGKV1-06 | L11 | 11 | 7 | 2-1-(1) | 14 |
| IGKV1-08 | L9 | 11 | 7 | 2-1-(1) | 9 |
| IGKV1-09 | L8 | 11 | 7 | 2-1-(1) | **24** |
| IGKV1-12 | L5, L19 | 11 | 7 | 2-1-(1) | **32** |
| IGKV1-13 | L4, L18 | 11 | 7 | 2-1-(1) | 13 |
| IGKV1-16 | L1 | 11 | 7 | 2-1-(1) | 15 |
| IGKV1-17 | A30 | 11 | 7 | 2-1-(1) | **34** |
| IGKV1-27 | A20 | 11 | 7 | 2-1-(1) | **27** |
| IGKV1-33 | O8, O18 | 11 | 7 | 2-1-(1) | **43** |
| IGKV1-37 | O14, O4 | 11 | 7 | 2-1-(1) | 3 |
| IGKV1-39 | O2, O12 | 11 | 7 | 2-1-(1) | **147** |
| IGKV1D-16 | L15 | 11 | 7 | 2-1-(1) | 6 |
| IGKV1D-17 | L14 | 11 | 7 | 2-1-(1) | 1 |
| IGKV1D-43 | L23 | 11 | 7 | 2-1-(1) | 1 |
| IGKV1D-8 | L24 | 11 | 7 | 2-1-(1) | 1 |
| IGKV2-24 | A23 | 16 | 7 | 4-1-(1) | 8 |
| IGKV2-28 | A19, A3 | 16 | 7 | 4-1-(1) | **62** |
| IGKV2-29 | A18 | 16 | 7 | 4-1-(1) | 6 |
| IGKV2-30 | A17 | 16 | 7 | 4-1-(1) | **30** |
| IGKV2-40 | O1, O11 | 17 | 7 | 3-1-(1) | 3 |
| IGKV2D-26 | A8 | 16 | 7 | 4-1-(1) | 0 |
| IGKV2D-29 | A2 | 16 | 7 | 4-1-(1) | **20** |
| IGKV2D-30 | A1 | 16 | 7 | 4-1-(1) | 4 |
| IGKV3-11 | L6 | 11 | 7 | 2-1-(1) | **87** |
| IGKV3-15 | L2 | 11 | 7 | 2-1-(1) | **53** |
| IGKV3-20 | A27 | 12 | 7 | 6-1-(1) | **195** |
| IGKV3D-07 | L25 | 12 | 7 | 6-1-(1) | 0 |
| IGKV3D-11 | L20 | 11 | 7 | 2-1-(U) | 0 |
| IGKV3D-20 | A11 | 12 | 7 | 6-1-(1) | 2 |
| IGKV4-1 | B3 | 17 | 7 | 3-1-(1) | **83** |
| IGKV5-2 | B2 | 11 | 7 | 2-1-(1) | 1 |
| IGKV6-21 | A10, A26 | 11 | 7 | 2-1-(1) | 6 |
| IGKV6D-41 | A14 | 11 | 7 | 2-1-(1) | 0 |

¹Adapted from Tomlinson et al., EMBO J., 1995, 14: 4628, incorporated by reference in its entirety. The number in parentheses refers to the canonical structure of CDRL3, assuming the most common length (see Example 5 for further detail about CDRL3).

²Estimated from sets of human VK sequences compiled from the NCBI database; full set of GI numbers provided in Appendix A.

The 14 most commonly occurring IGKV germline genes (bolded in column 6 of Table 9) account for just over 90% of the usage of the entire repertoire in peripheral blood. From the analysis of Table 9, ten IGKV germline genes were selected for representation as chassis in the currently exemplified library (Table 10). All but V1-12 and V1-27 are among the top 10 most commonly occurring. The IGKV germline gene V2-30, which was tenth in terms of occurrence in peripheral blood, was not included in the currently exemplified embodiment of the library, in order to maintain the proportion of chassis with short (i.e., 11 or 12 residues in length) CDRL1 sequences at about 80% in the final set of 10 chassis. V1-12 was included in its place. V1-17 was more similar to other members of the V1 family that were already selected; therefore, V1-27 was included instead of V1-17. In other embodiments, the library could include 12 chassis (e.g., the ten of Table 10 plus V1-17 and V2-30), or a different set of any “N” chassis, chosen strictly by occurrence (Table 9) or any other criteria. The ten chosen VK chassis account for about 80% of the usage in the data set believed to be representative of the entire kappa light chain repertoire.

TABLE 10

VK Chassis Selected for Use in the Exemplary Library

| Chassis | CDR-L1 Length | CDR-L2 Length | Canonical Structures | Estimated Relative Occurrence in Peripheral Blood |
|---|---|---|---|---|
| VK1-5 | 11 | 7 | 2-1-(U) | 69 |
| VK1-12 | 11 | 7 | 2-1-(1) | 32 |
| VK1-27 | 11 | 7 | 2-1-(1) | 27 |
| VK1-33 | 11 | 7 | 2-1-(1) | 43 |
| VK1-39 | 11 | 7 | 2-1-(1) | 147 |
| VK2-28 | 16 | 7 | 4-1-(1) | 62 |
| VK3-11 | 11 | 7 | 2-1-(1) | 87 |
| VK3-15 | 11 | 7 | 2-1-(1) | 53 |
| VK3-20 | 12 | 7 | 6-1-(1) | 195 |
| VK4-1 | 17 | 7 | 3-1-(1) | 83 |
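The CDRL1-length consideration behind the choice of V1-12 over V2-30 can be illustrated with a short calculation. The Python sketch below is illustrative only; the names are hypothetical, and the CDRL1 lengths are transcribed from Tables 9 and 10.

```python
# CDRL1 lengths for the ten selected chassis (Table 10) plus V2-30 (Table 9).
CDRL1_LENGTH = {
    "VK1-5": 11, "VK1-12": 11, "VK1-27": 11, "VK1-33": 11, "VK1-39": 11,
    "VK2-28": 16, "VK3-11": 11, "VK3-15": 11, "VK3-20": 12, "VK4-1": 17,
    "VK2-30": 16,
}

SELECTED_VK = [
    "VK1-5", "VK1-12", "VK1-27", "VK1-33", "VK1-39",
    "VK2-28", "VK3-11", "VK3-15", "VK3-20", "VK4-1",
]


def short_cdrl1_fraction(chassis, lengths=CDRL1_LENGTH, max_short_length=12):
    """Fraction of chassis whose CDRL1 is 'short' (11 or 12 residues in the text)."""
    short = sum(1 for name in chassis if lengths[name] <= max_short_length)
    return short / len(chassis)


if __name__ == "__main__":
    # Hypothetical alternative set in which V2-30 replaces V1-12.
    with_v2_30 = ["VK2-30" if name == "VK1-12" else name for name in SELECTED_VK]
    print(f"selected set:      {short_cdrl1_fraction(SELECTED_VK):.0%}")
    print(f"V2-30 for V1-12:   {short_cdrl1_fraction(with_v2_30):.0%}")
```

For the selected set, eight of the ten chassis have a short CDRL1 (80%); substituting V2-30 for V1-12 would lower that to seven of ten (70%).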

The amino acid sequences of the selected VK chassis enumerated in Table 10 are provided in Table 11.

TABLE 11

Amino Acid Sequences for VK Chassis Selected for Inclusion in the Exemplary Library

| Chassis | FRM1 | CDRL1 | FRM2 | CDRL2 | FRM3 | CDRL3¹ | SEQ ID NO: |
|---|---|---|---|---|---|---|---|
| VK1-5 | DIQMTQSPSTLSASVGDRVTITC | RASQSISSWLA | WYQQKPGKAPKLLIY | DASSLES | GVPSRFSGSGSGTEFTLTISSLQPDDFATYYC | QYNSYS | 229 |
| VK1-12 | DIQMTQSPSSVSASVGDRVTITC | RASQGISSWLA | WYQQKPGKAPKLLIY | AASSLQS | GVPSRFSGSGSGTDFTLTISSLQPEDFATYYC | QANSFP | 230 |
| VK1-27 | DIQMTQSPSSLSASVGDRVTITC | RASQGISNYLA | WYQQKPGKVPKLLIY | AASTLQS | GVPSRFSGSGSGTDFTLTISSLQPEDVATYYC | KYNSAP | 231 |
| VK1-33 | DIQMTQSPSSLSASVGDRVTITC | QASQDISNYLN | WYQQKPGKAPKLLIY | DASNLET | GVPSRFSGSGSGTDFTFTISSLQPEDIATYYC | QYDNLP | 232 |
| VK1-39 | DIQMTQSPSSLSASVGDRVTITC | RASQSISSYLN | WYQQKPGKAPKLLIY | AASSLQS | GVPSRFSGSGSGTDFTLTISSLQPEDFATYYC | QSYSTP | 233 |
| VK2-28 | DIVMTQSPLSLPVTPGEPASISC | RSSQSLLASNGYNYLD | WYLQKPGQSPQLLIY | LGSNRAS | GVPDRFSGSGSGTDFTLKISRVEAEDVGVYYC | QALQTP | 234 |
| VK3-11 | EIVLTQSPATLSLSPGERATLSC | RASQSVSSYLA | WYQQKPGQAPRLLIY | DASNRAT | GIPARFSGSGSGTDFTLTISSLEPEDFAVYYC | QRSNWP | 235 |
| VK3-15 | EIVMTQSPATLSVSPGERATLSC | RASQSVSSNLA | WYQQKPGQAPRLLIY | GASTRAT | GIPARFSGSGSGTEFTLTISSLQSEDFAVYYC | QYNNWP | 236 |
| VK3-20 | EIVLTQSPGTLSLSPGERATLSC | RASQSVSSSYLA | WYQQKPGQAPRLLIY | GASSRAT | GIPDRFSGSGSGTDFTLTISRLEPEDFAVYYC | QYGSSP | 237 |
| VK4-1 | DIVMTQSPDSLAVSLGERATINC | KSSQSVLYSSNNKNYLA | WYQQKPGQPPKLLIY | WASTRES | GVPDRFSGSGSGTDFTLTISSLQAEDVAVYYC | QYYSTP | 238 |

¹Note that the portion of the IGKV gene contributing to VKCDR3 is not considered part of the chassis as described herein. The VK chassis is defined as Kabat residues 1 to 88 of the IGKV-encoded sequence, or from the start of FRM1 to the end of FRM3. The portion of the VKCDR3 sequence contributed by the IGKV gene is referred to herein as the L3-VK region.

Example 4: Design of an Exemplary Vλ Chassis Library

This example describes the design of an exemplary Vλ chassis library. As was previously demonstrated in Examples 1-3, for the VH and VK chassis sequences, the sequence characteristics and occurrence of human IGλV germline-derived sequences in peripheral blood were analyzed. As with the assignment of other sequences set forth herein to germline families, assignment of Vλ sequences to a germline family was performed via SoDA and VBASE2 (Volpe and Kepler, Bioinformatics, 2006, 22: 438; Mollova et al., BMC Systems Biology, 2007, 1S: P30, each incorporated by reference in its entirety). The data are presented in Table 12.

TABLE 12

IGλV Gene Characteristics and Occurrence in Peripheral Blood

| IGλV Gene | Alternative Name | Canonical Structures¹ | Contribution of IGλV Gene to CDRL3 | Estimated Relative Occurrence in Peripheral Blood² |
|---|---|---|---|---|
| IGλV3-1 | 3R | 11-7(*) | 8 | 11.5 |
| IGλV3-21 | 3H | 11-7(*) | 9 | 10.5 |
| IGλV2-14 | 2A2 | 14-7(A) | 9 | 10.1 |
| IGλV1-40 | 1E | 14-7(A) | 9 | 7.7 |
| IGλV3-19 | 3L | 11-7(*) | 9 | 7.6 |
| IGλV1-51 | 1B | 13-7(A) | 9 | 7.4 |
| IGλV1-44 | 1C | 13-7(A) | 9 | 7.0 |
| IGλV6-57 | 6A | 13-7(B) | 7 | 6.1 |
| IGλV2-8 | 2C | 14-7(A) | 9 | 4.7 |
| IGλV3-25 | 3M | 11-7(*) | 9 | 4.6 |
| IGλV2-23 | 2B2 | 14-7(A) | 9 | 4.3 |
| IGλV3-10 | 3P | 11-7(*) | 9 | 3.4 |
| IGλV4-69 | 4B | 12-11(*) | 7 | 3.0 |
| IGλV1-47 | 1G | 13-7(A) | 9 | 2.9 |
| IGλV2-11 | 2E | 14-7(A) | 9 | 1.3 |
| IGλV7-43 | 7A | 14-7(B) | 8 | 1.3 |
| IGλV7-46 | 7B | 14-7(B) | 8 | 1.1 |
| IGλV5-45 | 5C | 14-11(*) | 8 | 1.0 |
| IGλV4-60 | 4A | 12-11(*) | 7 | 0.7 |
| IGλV10-54 | 8A | 14-7(B) | 8 | 0.7 |
| IGλV8-61 | 10A | 13-7(C) | 9 | 0.7 |
| IGλV3-9 | 3J | 11-7(*) | 8 | 0.6 |
| IGλV1-36 | 1A | 13-7(A) | 9 | 0.4 |
| IGλV2-18 | 2D | 14-7(A) | 9 | 0.3 |
| IGλV3-16 | 3A | 11-7(*) | 9 | 0.2 |
| IGλV3-27 | | 11-7(*) | 7 | 0.2 |
| IGλV4-3 | 5A | 14-11(*) | 8 | 0.2 |
| IGλV5-39 | 4C | 12-11(*) | 12 | 0.2 |
| IGλV9-49 | 9A | 12-12(*) | 12 | 0.2 |
| IGλV3-12 | 3I | 11-7(*) | 9 | 0.1 |

¹Adapted from Williams et al., J. Mol. Biol., 1996, 264: 220. The (*) indicates that the canonical structure is entirely defined by the lengths of CDRs L1 and L2. When distinct structures are possible for identical L1 and L2 length combinations, the structure present in a given gene is set forth as A, B, or C.

²Estimated from a set of human Vλ sequences compiled from the NCBI database; full set of GI codes set forth in Appendix B.

To choose a subset of the sequences from Table 12 to serve as chassis, those represented at less than 1% in peripheral blood (as extrapolated from analysis of published sequences corresponding to the GI codes provided in Appendix B) were first discarded. From the remaining 18 germline sequences, the top occurring genes for each unique canonical structure and contribution to CDRL3, as well as any germline gene represented at more than the 5% level, were chosen to constitute the exemplary Vλ chassis. The list of 11 such sequences is given in Table 13, below. These 11 sequences represent approximately 73% of the repertoire in the examined data set (Appendix B).

TABLE 13

Vλ Chassis Selected for Use in the Exemplary Library

| Chassis | CDRL1 Length | CDRL2 Length | Canonical Structure | Relative Occurrence |
|---|---|---|---|---|
| Vλ3-1 | 11 | 7 | 11-7(*) | 11.5 |
| Vλ3-21 | 11 | 7 | 11-7(*) | 10.5 |
| Vλ2-14 | 14 | 7 | 14-7(A) | 10.1 |
| Vλ1-40 | 14 | 7 | 14-7(A) | 7.7 |
| Vλ3-19 | 11 | 7 | 11-7(*) | 7.6 |
| Vλ1-51 | 13 | 7 | 13-7(A) | 7.4 |
| Vλ1-44 | 13 | 7 | 13-7(A) | 7.0 |
| Vλ6-57 | 13 | 7 | 13-7(B) | 6.1 |
| Vλ4-69 | 12 | 11 | 12-11(*) | 3.0 |
| Vλ7-43 | 14 | 7 | 14-7(B) | 1.3 |
| Vλ5-45 | 14 | 11 | 14-11(*) | 1.0 |
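The selection rule described above (discard genes below 1%, keep the top-occurring gene for each unique combination of canonical structure and CDRL3 contribution, and keep any gene above 5%) can be written out as a short Python sketch. The function and variable names are hypothetical, the data are transcribed from Table 12, and the thresholds reflect one reading of the text (genes at exactly 1.0% are retained).

```python
# (gene, canonical structure, IGλV contribution to CDRL3, relative occurrence in %)
IGLV_GENES = [
    ("3-1", "11-7(*)", 8, 11.5), ("3-21", "11-7(*)", 9, 10.5),
    ("2-14", "14-7(A)", 9, 10.1), ("1-40", "14-7(A)", 9, 7.7),
    ("3-19", "11-7(*)", 9, 7.6), ("1-51", "13-7(A)", 9, 7.4),
    ("1-44", "13-7(A)", 9, 7.0), ("6-57", "13-7(B)", 7, 6.1),
    ("2-8", "14-7(A)", 9, 4.7), ("3-25", "11-7(*)", 9, 4.6),
    ("2-23", "14-7(A)", 9, 4.3), ("3-10", "11-7(*)", 9, 3.4),
    ("4-69", "12-11(*)", 7, 3.0), ("1-47", "13-7(A)", 9, 2.9),
    ("2-11", "14-7(A)", 9, 1.3), ("7-43", "14-7(B)", 8, 1.3),
    ("7-46", "14-7(B)", 8, 1.1), ("5-45", "14-11(*)", 8, 1.0),
    ("4-60", "12-11(*)", 7, 0.7), ("10-54", "14-7(B)", 8, 0.7),
    ("8-61", "13-7(C)", 9, 0.7), ("3-9", "11-7(*)", 8, 0.6),
    ("1-36", "13-7(A)", 9, 0.4), ("2-18", "14-7(A)", 9, 0.3),
    ("3-16", "11-7(*)", 9, 0.2), ("3-27", "11-7(*)", 7, 0.2),
    ("4-3", "14-11(*)", 8, 0.2), ("5-39", "12-11(*)", 12, 0.2),
    ("9-49", "12-12(*)", 12, 0.2), ("3-12", "11-7(*)", 9, 0.1),
]


def select_chassis(genes, min_occurrence=1.0, always_keep_above=5.0):
    """Apply the three-step rule and return the selected genes in order of occurrence."""
    # Step 1: discard genes represented at less than `min_occurrence` percent.
    kept = sorted(
        (gene for gene in genes if gene[3] >= min_occurrence),
        key=lambda gene: gene[3],
        reverse=True,
    )
    selected, seen = [], set()
    for name, structure, contribution, occurrence in kept:
        key = (structure, contribution)
        # Step 2: first (most frequent) gene for each structure/contribution pair.
        # Step 3: any gene above the `always_keep_above` threshold.
        if key not in seen or occurrence > always_keep_above:
            selected.append(name)
        seen.add(key)
    return selected


if __name__ == "__main__":
    chosen = select_chassis(IGLV_GENES)
    occurrence = {gene[0]: gene[3] for gene in IGLV_GENES}
    print(chosen)
    print(f"{sum(occurrence[name] for name in chosen):.1f}% of the data set")
```

Applied to the Table 12 data, this rule yields the same 11 genes listed in Table 13, whose occurrences sum to about 73%.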

The amino acid sequences of the selected Vλ chassis enumerated in Table 13 are provided in Table 14, below.

TABLE 14

Amino Acid Sequences for Vλ Chassis Selected for Inclusion in the Exemplary Library

| Chassis | SEQ ID NO: | FRM1 | CDRL1 | FRM2 | CDRL2 | FRM3 | CDRL3² |
|---|---|---|---|---|---|---|---|
| Vλ1-40 | 531 | QSVLTQPPSVSGAPGQRVTISC | TGSSSNIGAGYD---VH | WYQQLPGTAPKLLIY | GN----SNRPS | GVPDRFSGSKSG--TSASLAITGLQAEDEADYYC | QSYDSSLSG |
| Vλ1-44 | 532 | QSVLTQPPSASGTPGQRVTISC | SGSSSNIGSNT----VN | WYQQLPGTAPKLLIY | SN----NQRPS | GVPDRFSGSKSG--TSASLAISGLQSEDEADYYC | AAWDDSLNG |
| Vλ1-51 | 533 | QSVLTQPPSVSAAPGQKVTISC | SGSSSNIGNNY-----VS | WYQQLPGTAPKLLIY | DN----NKRPS | GIPDRFSGSKSG--TSATLGITGLQTGDEADYYC | GTWDSSLSA |
| Vλ2-14 | 534 | QSALTQPASVSGSPGQSITISC | TGTSSDVGGYNY----VS | WYQQHPGKAPKLMIY | EV----SNRPS | GVSNRFSGSKSG--NTASLTISGLQAEDEADYYC | SSYTSSSTL |
| Vλ3-1¹ | 535 | SYELTQPPSVSVSPGQTASITC | SGDKLGDKY------AS | WYQQKPGQSPVLVIY | QD----SKRPS | GIPERFSGSNSG--NTATLTISGTQAMDEADYYC | QAWDSSTA- |
| Vλ3-19 | 536 | SSELTQDPAVSVALGQTVRITC | QGDSLRSYY------AS | WYQQKPGQAPVLVIY | GK----NNRPS | GIPDRFSGSSSG--NTASLTITGAQAEDEADYYC | NSRDSSGNH |
| Vλ3-21 | 537 | SYVLTQPPSVSVAPGKTARITC | GGNNIGSKS------VH | WYQQKPGQAPVLVIY | YD----SDRPS | GIPERFSGSNSG--NTATLTISRVEAGDEADYYC | QVWDSSSDH |
| Vλ4-69 | 538 | QLVLTQSPSASASLGASVKLTC | TLSSGHSSYA-----IA | WHQQQPEKGPRYLMK | LNSDGSHSKGD | GIPDRFSGSSSG--AERYLTISSLQSEDEADYYC | QTWGTGI-- |
| Vλ6-57 | 539 | NEMLTQPHSVSESPGKTVTISC | TRSSGSIASNY----VQ | WYQQRPGSSPTTVIY | ED----NQRPS | GVPDRFSGSIDSSSNSASLTISGLKTEDEADYYC | QSYDSSN-- |
| Vλ5-45 | 540 | QAVLTQPASLSASPGASASLTC | TLRSGINVGTYR---TY | WYQQKPGSPPQYLLR | YKSDSDKQQGS | GVPSRFSGSKDASANAGILLISGLQSEDEADYYC | MIWHSSAS- |
| Vλ7-43 | 541 | QTVVTQEPSLTVSPGGTVTLTC | ASSTGAVTSGYY---PN | WFQQKPGQAPRALIY | ST----SNKHS | WTPARFSGSLLG--GKAALTLSGVQPEDEAEYYC | LLYYGGAQ- |

¹The last amino acid in CDRL1 of the Vλ3-1 chassis, S, differs from the corresponding one in the IGλV3-1 germline gene, C. This was done to avoid having a potentially unpaired CYS (C) amino acid in the resulting synthetic light chain.

²Note that, as for the VK chassis, the portion of the IGλV gene contributing to VλCDR3 is not considered part of the chassis as described herein. The Vλ chassis is defined as Kabat residues 1 to 88 of the IGλV-encoded sequence, or from the start of FRM1 to the end of FRM3. The portion of the VλCDR3 sequence contributed by the IGλV gene is referred to herein as the L3-Vλ region.

Example 5: Design of a CDRH3 Library

This example describes the design of a CDRH3 library from its individual components. In nature, the CDRH3 sequence is derived from a complex process involving recombination of three different genes, termed IGHV, IGHD and IGHJ. In addition to recombination, these genes may also undergo progressive nucleotide deletions: from the 3′ end of the IGHV gene, either end of the IGHD gene, and/or the 5′ end of the IGHJ gene. Non-templated nucleotide additions may also occur at the junctions between the V, D and J sequences. Non-templated additions at the V-D junction are referred to as “N1”, and those at the D-J junction are referred to as “N2”. The D gene segments may be read in three forward and, in some cases, three reverse reading frames.

In the design of the present exemplary library, the codon (nucleotide triplet) or single amino acid was designated as a fundamental unit, to maintain all sequences in the desired reading frame. Thus, all deletions or additions to the gene segments are carried out via the addition or deletion of amino acids or codons, and not single nucleotides. According to the CDRH3 numbering system of this application, CDRH3 extends from amino acid number 95 (when present; see Example 1) to amino acid 102.

Example 5.1: Selection of the DH Segments

In this illustrative example, selection of DH gene segments for use in the library was performed according to principles similar to those used for the selection of the chassis sequences. First, an analysis of IGHD gene usage was performed, using data from Lee et al., Immunogenetics, 2006, 57: 917; Corbett et al., PNAS, 1982, 79: 4118; and Souto-Carneiro et al., J. Immunol., 2004, 172: 6790 (each incorporated by reference in its entirety), with preference for representation in the library given to those IGHD genes most frequently observed in human sequences. Second, the degree of deletion on either end of the IGHD gene segments was estimated by comparison with known heavy chain sequences, using the SoDA algorithm (Volpe et al., Bioinformatics, 2006, 22: 438, incorporated by reference in its entirety) and sequence alignments. For the presently exemplified library, progressively deleted DH segments, as short as three amino acids, were included. As enumerated in the Detailed Description, other embodiments of the invention comprise DH segments with deletions to a different length, for example, about 1, 2, 4, 5, 6, 7, 8, 9, or 10 amino acids. Table 15 shows the relative occurrence of IGHD gene usage in human antibody heavy chain sequences isolated mainly from peripheral blood B cells (list adapted from Lee et al., Immunogenetics, 2006, 57: 917, incorporated by reference in its entirety).

TABLE 15

Usage of IGHD Genes Based on Relative

Occurrence in Peripheral Blood*

Estimated Relative Occurrence

IGHD Gene
in Peripheral Blood³

IGHD3-10
117

IGHD3-22
111

IGHD6-19
95

IGHD6-13
93

IGHD3-3
82

IGHD2-2
63

IGHD4-17
61

IGHD1-26
51

IGHD5-5/5-18¹
49

IGHD2-15
47

IGHD6-6
38

IGHD3-9
32

IGHD5-12
29

IGHD5-24
29

IGHD2-21
28

IGHD3-16
18

IGHD4-23
13

IGHD1-1
9

IGHD1-7
9

IGHD4-4/4-11²
7

IGHD1-20
6

IGHD7-27
6

IGHD2-8
4

IGHD6-25
3

¹Although distinct genes in the genome, the nucleotide sequences of IGHD5-5 and IGHD5-18 are 100% identical and thus indistinguishable in rearranged VH sequences.

²IGHD4-4 and IGHD4-11 are also 100% identical.

³Adapted from Lee et al. Immunogenetics, 2006, 57: 917, by merging the information for distinct alleles of the same IGHD gene.

*IGHD1-14 may also be included in the libraries of the invention.

The translations of the ten most commonly expressed IGHD gene sequences found in naturally occurring human antibodies, in three reading frames, are shown in Table 16. Those reading frames which occur most commonly in peripheral blood have been highlighted in gray. As in Table 15, data regarding IGHD sequence usage and reading frame statistics were derived from Lee et al., 2006, and data regarding IGHD sequence reading frame usage were further complemented by data derived from Corbett et al., PNAS, 1982, 79: 4118 and Souto-Carneiro et al., J. Immunol., 2004, 172: 6790, each of which is incorporated by reference in its entirety.

TABLE 16

Translations of the Ten Most Common

Naturally Occurring IGHD Sequences,

in Three Reading Frames (RF)

IGHD | RF 1 | SEQ ID NO | RF 2 | SEQ ID NO | RF 3 | SEQ ID NO
IGHD3-10 | VLLWFGELL* | 1 | YYYGSGSYYN* | 2 | ITMVRGVII* | 3
IGHD3-22 | VLLL###WILL | 239 | YYYDSSGYYY* | 4 | ITMIVVVIT | 240
IGHD6-19 | GYSSGWY* | 5 | GIAVAG* | 6 | V#QWLV | 241
IGHD6-13 | GYSSSWY* | 7 | GIAAAG* | 8 | V#QQLV | 242
IGHD3-03 | VLRFLEWLLY | 243 | YYDFWSGYYT | 244 | ITIFGVVII* | 9
IGHD2-02 | WIL##YQLLC | 245 | GYCSSTSCYT* | 10 | DIVVVPAAM* | 11
IGHD4-17 | #LR#L | 246 | DYGDY* | 12 | TTVT | 247
IGHD1-26 | GIVGATT* | 13 | V#WELL | 248 | YSGSYY* | 14
IGHD5-5/5-18 | VDTAMVT | 249 | WIQLWL | 250 | GYSYGY* | 15
IGHD2-15 | RIL#WW#LLL | 251 | GYCSGGSCYS* | 16 | DIVVVVAAT | 252

* Highlighted (most commonly used) reading frame.

# represents a stop codon.

Reading frames in bold type correspond to the most commonly used reading frames.

In the presently exemplified library, the top 10 IGHD genes most frequently used in heavy chain sequences occurring in peripheral blood were chosen for representation in the library. Other embodiments of the library could readily utilize more or fewer D genes. The amino acid sequences of the selected IGHD genes, including the most commonly used reading frames and the total number of variants after progressive N- and C-terminal deletion to a minimum of three residues, are listed in Table 17. As depicted in Table 17, only the most commonly occurring alleles of certain IGHD genes were included in the illustrative library. This is, however, not required, and other embodiments of the invention may utilize IGHD reading frames that occur less frequently in the peripheral blood.

TABLE 17

D Genes Selected for use in the

Exemplary Library

SEQ
Total

IGHD
Amino Acid
ID
Number of

Gene¹
Sequence
NO:
Variants²

IGHD1-26_1
GIVGATT
13
15

IGHD1-26_3
YSGSYY
14
10

IGHD2-2_2
GYCSSTSCYT
10
9³

IGHD2-2_3
DIVVVPAAM
11
28

IGHD2-15_2
GYCSGGSCYS
16
9

IGHD3-3_3
ITIFGVVII
9
28

IGHD3-10_1
VLLWFGELL
1
28

IGHD3-10_2
YYYGSGSYYN
2
36

IGHD3-10_3
ITMVRGVII
3
28

IGHD3-22_2
YYYDSSGYYY
4
36

IGHD4-17_2
DYGDY
12
6

IGHD5-5_3
GYSYGY
15
10

IGHD6-13_1
GYSSSWY
7
15

IGHD6-13_2
GIAAAG
8
10

IGHD6-19_1
GYSSGWY
5
15

IGHD6-19_2
GIAVAG
6
10

¹The reading frame (RF) is specified as _RF after the name of the gene (e.g., IGHD1-26_1 is reading frame 1 of IGHD1-26).

²In most cases the total number of variants is given by (N-1) times (N-2) divided by two, where N is the total length in amino acids of the intact D segment.

³As detailed herein, the number of variants for segments containing a putative disulfide bond (two C or Cys residues) is limited in this illustrative embodiment.

For each of the selected sequences of Table 17, variants were generated by systematic deletion from the N- and/or C-termini, until there were three amino acids remaining. For example, for IGHD4-17_2 above, the full sequence DYGDY (SEQ ID NO: 12) may be used to generate the progressive deletion variants: DYGD (SEQ ID NO: 613), YGDY (SEQ ID NO: 614), DYG, GDY and YGD. In general, for any full-length sequence of size N, there will be a total of (N−1)*(N−2)/2 variants, including the original full sequence. For the disulfide-loop-encoding segments, as exemplified by reading frame 2 of both IGHD2-2 and IGHD2-15 (i.e., IGHD2-2_2 and IGHD2-15_2), the progressive deletions were limited so as to leave the loop intact; i.e., only amino acids N-terminal to the first Cys, or C-terminal to the second Cys, were deleted in the respective DH segment variants. The foregoing strategy was used to avoid the presence of unpaired cysteine residues in the exemplified version of the library. However, as discussed in the Detailed Description, other embodiments of the library may include unpaired cysteine residues, or the substitution of these cysteine residues with other amino acids. In the cases where the truncation of the IGHD gene is limited by the presence of the Cys residues, only 9 variants (including the original full sequence) were generated; e.g., for IGHD2-2_2, the variants would be: GYCSSTSCYT (SEQ ID NO: 10), GYCSSTSCY (SEQ ID NO: 615), YCSSTSCYT (SEQ ID NO: 616), CSSTSCYT (SEQ ID NO: 617), GYCSSTSC (SEQ ID NO: 618), YCSSTSCY (SEQ ID NO: 619), CSSTSCY (SEQ ID NO: 620), YCSSTSC (SEQ ID NO: 621) and CSSTSC (SEQ ID NO: 622).
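The progressive-deletion scheme just described can be illustrated with a minimal Python sketch. The function name `deletion_variants` and its parameters are hypothetical and introduced only for this illustration; the sketch assumes that a variant is any contiguous stretch of the parent segment with at least three residues and that, for disulfide-loop segments, both Cys residues must be retained.

```python
def deletion_variants(segment, min_len=3, keep_cys_loop=False):
    """Return all N- and/or C-terminally truncated variants of a DH segment.

    The full-length segment itself is included in the result.
    """
    n = len(segment)
    if keep_cys_loop:
        # Only residues before the first Cys or after the last Cys may go.
        first_cys = segment.index("C")
        last_cys = segment.rindex("C")
        return [segment[i:j]
                for i in range(first_cys + 1)
                for j in range(last_cys + 1, n + 1)]
    # Otherwise keep every contiguous stretch of at least min_len residues.
    return [segment[i:j]
            for i in range(n)
            for j in range(i + min_len, n + 1)]


dygdy = deletion_variants("DYGDY")                           # IGHD4-17_2
loop = deletion_variants("GYCSSTSCYT", keep_cys_loop=True)   # IGHD2-2_2

print(sorted(dygdy, key=len))
print(len(dygdy), (5 - 1) * (5 - 2) // 2)   # both counts follow (N-1)(N-2)/2
print(len(loop))                            # limited by the two Cys residues
```

For an unconstrained segment of length N the list has (N−1)(N−2)/2 entries, which is the count given in the text; the Cys-constrained case yields 3 × 3 = 9 variants for both IGHD2-2_2 and IGHD2-15_2.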

According to the criteria outlined above, 293 DH sequences were obtained from the selected IGHD gene segments, including the original IGHD gene segments. Certain sequences are redundant. For example, it is possible to obtain the YYY variant from either IGHD3-10_2 (full sequence YYYGSGSYYN (SEQ ID NO: 2)), or in two different ways from IGHD3-22_2 (SEQ ID NO: 4) (YYYDSSGYYY). When redundant sequences are removed, the number of unique DH segment sequences in this illustrative embodiment of the library is 278. These sequences are enumerated in Table 18.
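The two counts stated above can be reproduced from the sequences in Table 17 with the following sketch. It is an illustration under the same assumptions as the text (minimum length of three residues, both Cys residues retained in the loop-encoding segments); the helper name `truncations` is hypothetical.

```python
# Selected DH segments, copied from Table 17 (designation -> peptide).
SELECTED_DH = {
    "IGHD1-26_1": "GIVGATT",    "IGHD1-26_3": "YSGSYY",
    "IGHD2-2_2": "GYCSSTSCYT",  "IGHD2-2_3": "DIVVVPAAM",
    "IGHD2-15_2": "GYCSGGSCYS", "IGHD3-3_3": "ITIFGVVII",
    "IGHD3-10_1": "VLLWFGELL",  "IGHD3-10_2": "YYYGSGSYYN",
    "IGHD3-10_3": "ITMVRGVII",  "IGHD3-22_2": "YYYDSSGYYY",
    "IGHD4-17_2": "DYGDY",      "IGHD5-5_3": "GYSYGY",
    "IGHD6-13_1": "GYSSSWY",    "IGHD6-13_2": "GIAAAG",
    "IGHD6-19_1": "GYSSGWY",    "IGHD6-19_2": "GIAVAG",
}


def truncations(seq, min_len=3):
    """Progressive N-/C-terminal deletions; a two-Cys loop is kept intact."""
    n = len(seq)
    if seq.count("C") == 2:
        lo, hi = seq.index("C"), seq.rindex("C")
        return [seq[i:j] for i in range(lo + 1) for j in range(hi + 1, n + 1)]
    return [seq[i:j] for i in range(n) for j in range(i + min_len, n + 1)]


all_variants = [v for seq in SELECTED_DH.values() for v in truncations(seq)]
unique_variants = set(all_variants)

total, unique = len(all_variants), len(unique_variants)
print(total, unique)                 # counts before and after removing redundancy
print(all_variants.count("YYY"))     # YYY arises from more than one parent
```

Using a set to remove redundancy discards the information about which parent a shared variant came from, which mirrors the note under Table 18 that the stated origin of some variants is somewhat arbitrary.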

TABLE 18

DH Gene Segments Used in the Presently

Exemplified Library*

DH

DH

Segment

SEQ
Segment

SEQ

Designa-

ID
Designa-

ID

tion¹
Peptide
NO:
tion
Peptide
NO:

IGHD1-
ATT

IGHD3-10_2-
YYGSG
713

26_1-1

20

IGHD1-
GAT

IGHD3-10_2-
YYYGS
714

26_1-2

21

IGHD1-
GIV

IGHD3-10_2-
GSGSYY
715

26_1-3

22

IGHD1-
IVG

IGHD3-10_2-
SGSYYN
716

26_1-4

23

IGHD1-
VGA

IGHD3-10_2-
YGSGSY
717

26_1-5

24

IGHD1-
GATT
623
IGHD3-10_2-
YYGSGS
718

26_1-6

25

IGHD1-
GIVG
624
IGHD3-10_2-
YYYGSG
719

26_1-7

26

IGHD1-
IVGA
625
IGHD3-10_2-
GSGSYYN
720

26_1-8

27

IGHD1-
VGAT
626
IGHD3-10_2-
YGSGSYY
721

26_1-9

28

IGHD1-
GIVGA
627
IGHD3-10_2-
YYGSGSY
722

26_1-10

29

IGHD1-
IVGAT
628
IGHD3-10_2-
YYYGSGS
723

26_1-11

30

IGHD1-
VGATT
629
IGHD3-10_2-
YGSGSYYN
724

26_1-12

31

IGHD1-
GIVGAT
630
IGHD3-10_2-
YYGSGSYY
725

26_1-13

32

IGHD1-
IVGATT
631
IGHD3-10_2-
YYYGSGSY
726

26_1-14

33

IGHD1-
GIVGATT
13
IGHD3-10_2-
YYGSGSYYN
727

26_1-15

34

IGHD1-
YSG

IGHD3-10_2-
YYYGSGSYY
728

26_3-1

35

IGHD1-
YSGS
632
IGHD3-10_2-
YYYGSGSYYN
2

26_3-2

36

IGHD1-
YSGSY
633
IGHD3-10_3-
GVI

26_3-3

1

IGHD1-
YSGSYY
14
IGHD3-10_3-
ITM

26_3-4

2

IGHD2-
CSSTSC
622
IGHD3-10_3-
MVR

02_2-1

3

IGHD2-
CSSTSCY
620
IGHD3-10_3-
RGV

02_2-2

4

IGHD2-
YCSSTSC
621
IGHD3-10_3-
TMV

02_2-3

5

IGHD2-
CSSTSCYT
617
IGHD3-10_3-
VII

02_2-4

6

IGHD2-
GYCSSTSC
618
IGHD3-10_3-
VRG

02_2-5

7

IGHD2-
YCSSTSCY
619
IGHD3-10_3-
GVII
729

02_2-6

8

IGHD2-
GYCSSTSCY
615
IGHD3-10_3-
ITMV
730

02_2-7

9

IGHD2-
YCSSTSCYT
616
IGHD3-10_3-
MVRG
731

02_2-8

10

IGHD2-
GYCSSTSCYT
10
IGHD3-10_3-
RGVI
732

02_2-9

11

IGHD2-
AAM

IGHD3-10_3-
TMVR
733

02_3-1

12

IGHD2-
DIV

IGHD3-10_3-
VRGV
734

02_3-2

13

IGHD2-
IVV

IGHD3-10_3-
ITMVR
735

02_3-3

14

IGHD2-
PAA

IGHD3-10_3-
MVRGV
736

02_3-4

15

IGHD2-
VPA

IGHD3-10_3-
RGVII
737

02_3-5

16

IGHD2-
VVP

IGHD3-10_3-
TMVRG
738

02_3-6

17

IGHD2-
VVV

IGHD3-10_3-
VRGVI
739

02_3-7

18

IGHD2-
DIVV
634
IGHD3-10_3-
ITMVRG
740

02_3-8

19

IGHD2-
IVVV
635
IGHD3-10_3-
MVRGVI
741

02_3-9

20

IGHD2-
PAAM
636
IGHD3-10_3-
TMVRGV
742

02_3-10

21

IGHD2-
VPAA
637
IGHD3-10_3-
VRGVII
743

02_3-11

22

IGHD2-
VVPA
638
IGHD3-10_3-
ITMVRGV
744

02_3-12

23

IGHD2-
VVVP
639
IGHD3-10_3-
MVRGVII
745

02_3-13

24

IGHD2-
DIVVV
640
IGHD3-10_3-
TMVRGVI
746

02_3-14

25

IGHD2-
IVVVP
641
IGHD3-10_3-
ITMVRGVI
747

02_3-15

26

IGHD2-
VPAAM
642
IGHD3-10_3-
TMVRGVII
748

02_3-16

27

IGHD2-
VVPAA
643
IGHD3-10_3-
ITMVRGVII
3

02_3-17

28

IGHD2-
VVVPA
644
IGHD3-22_2-
DSS

02_3-18

1

IGHD2-
DIVVVP
645
IGHD3-22_2-
GYY

02_3-19

2

IGHD2-
IVVVPA
646
IGHD3-22_2-
SGY

02_3-20

3

IGHD2-
VVPAAM
647
IGHD3-22_2-
SSG

02_3-21

4

IGHD2-
VVVPAA
648
IGHD3-22_2-
YDS

02_3-22

5

IGHD2-
DIVVVPA
649
IGHD3-22_2-
YYD

02_3-23

6

IGHD2-
IVVVPAA
650
IGHD3-22_2-
DSSG
749

02_3-24

7

IGHD2-
VVVPAAM
651
IGHD3-22_2-
GYYY
750

02_3-25

8

IGHD2-
DIVVVPAA
652
IGHD3-22_2-
SGYY
751

02_3-26

9

IGHD2-
IVVVPAAM
653
IGHD3-22_2-
SSGY
752

02_3-27

10

IGHD2-
DIVVVPAAM
11
IGHD3-22_2-
YDSS
753

02_3-28

11

IGHD2-
CSGGSC
654
IGHD3-22_2-
YYDS
754

15_2-1

12

IGHD2-
CSGGSCY
655
IGHD3-22_2-
YYYD
755

15_2-2

13

IGHD2-
YCSGGSC
656
IGHD3-22_2-
DSSGY
756

15_2-3

14

IGHD2-
CSGGSCYS
657
IGHD3-22_2-
SGYYY
757

15_2-4

15

IGHD2-
GYCSGGSC
658
IGHD3-22_2-
SSGYY
758

15_2-5

16

IGHD2-
YCSGGSCY
659
IGHD3-22_2-
YDSSG
759

15_2-6

17

IGHD2-
GYCSGGSCY
660
IGHD3-22_2-
YYDSS
760

15_2-7

18

IGHD2-
YCSGGSCYS
661
IGHD3-22_2-
YYYDS
761

15_2-8

19

IGHD2-
GYCSGGSCYS
16
IGHD3-22_2-
DSSGYY
762

15_2-9

20

IGHD3-
FGV

IGHD3-22_2-
SSGYYY
763

03_3-1

21

IGHD3-
GVV

IGHD3-22_2-
YDSSGY
764

03_3-2

22

IGHD3-
IFG

IGHD3-22_2-
YYDSSG
765

03_3-3

23

IGHD3-
ITI

IGHD3-22_2-
YYYDSS
766

03_3-4

24

IGHD3-
TIF

IGHD3-22_2-
DSSGYYY
767

03_3-5

25

IGHD3-
VVI

IGHD3-22_2-
YDSSGYY
768

03_3-6

26

IGHD3-
FGVV
662
IGHD3-22_2-
YYDSSGY
769

03_3-7

27

IGHD3-
GVVI
663
IGHD3-22_2-
YYYDSSG
770

03_3-8

28

IGHD3-
IFGV
664
IGHD3-22_2-
YDSSGYYY
771

03_3-9

29

IGHD3-
ITIF
665
IGHD3-22_2-
YYDSSGYY
772

03_3-10

30

IGHD3-
TIFG
666
IGHD3-22_2-
YYYDSSGY
773

03_3-11

31

IGHD3-
VVII
667
IGHD3-22_2-
YYDSSGYYY
774

03_3-12

32

IGHD3-
FGVVI
668
IGHD3-22_2-
YYYDSSGYY
775

03_3-13

33

IGHD3-
GVVII
669
IGHD3-22_2-
YYYDSSGYYY
4

03_3-14

34

IGHD3-
IFGVV
670
IGHD4-17_2-
DYG

03_3-15

1

IGHD3-
ITIFG
671
IGHD4-17_2-
GDY

03_3-16

2

IGHD3-
TIFGV
672
IGHD4-17_2-
YGD

03_3-17

3

IGHD3-
FGVVII
673
IGHD4-17_2-
DYGD
613

03_3-18

4

IGHD3-
IFGVVI
674
IGHD4-17_2-
YGDY
614

03_3-19

5

IGHD3-
ITIFGV
675
IGHD4-17_2-
DYGDY
12

03_3-20

6

IGHD3-
TIFGVV
676
IGHD5-5_3-1
SYG

03_3-21

IGHD3-
IFGVVII
677
IGHD5-5_3-2
YGY

03_3-22

IGHD3-
ITIFGVV
678
IGHD5-5_3-3
YSY

03_3-23

IGHD3-
TIFGVVI
679
IGHD5-5_3-4
GYSY
776

03_3-24

IGHD3-
ITIFGVVI
680
IGHD5-5_3-5
SYGY
777

03_3-25

IGHD3-
TIFGVVII
681
IGHD5-5_3-6
YSYG
778

03_3-26

IGHD3-
ITIFGVVII
9
IGHD5-5_3-7
GYSYG
779

03_3-27

IGHD3-
ELL

IGHD5-5_3-8
YSYGY
780

10_1-1

IGHD3-
FGE

IGHD5-5_3-9
GYSYGY
15

10_1-2

IGHD3-
GEL

IGHD6-13_1-
SSS

10_1-3

1

IGHD3-
LLW

IGHD6-13_1-
SSW

10_1-4

2

IGHD3-
LWF

IGHD6-13_1-
SWY

10_1-5

3

IGHD3-
VLL

IGHD6-13_1-
SSSW
781

10_1-6

4

IGHD3-
WFG

IGHD6-13_1-
SSWY
782

10_1-7

5

IGHD3-
FGEL
682
IGHD6-13_1-
YSSS
783

10_1-8

6

IGHD3-
GELL
683
IGHD6-13_1-
GYSSS
784

10_1-9

7

IGHD3-
LLWF
684
IGHD6-13_1-
SSSWY
785

10_1-10

8

IGHD3-
LWFG
685
IGHD6-13_1-
YSSSW
786

10_1-11

9

IGHD3-
VLLW
686
IGHD6-13_1-
GYSSSW
787

10_1-12

10

IGHD3-
WFGE
687
IGHD6-13_1-
YSSSWY
788

10_1-13

11

IGHD3-
FGELL
688
IGHD6-13_1-
GYSSSWY
7

10_1-14

12

IGHD3-
LLWFG
689
IGHD6-19_1-
GWY

10_1-15

1

IGHD3-
LWFGE
690
IGHD6-19_1-
GYS

10_1-16

2

IGHD3-
VLLWF
691
IGHD6-19_1-
SGW

10_1-17

3

IGHD3-
WFGEL
692
IGHD6-19_1-
YSS

10_1-18

4

IGHD3-
LLWFGE
693
IGHD6-19_1-
GYSS
789

10_1-19

5

IGHD3-
LWFGEL
694
IGHD6-19_1-
SGWY
790

10_1-20

6

IGHD3-
VLLWFG
695
IGHD6-19_1-
SSGW
791

10_1-21

7

IGHD3-
WFGELL
696
IGHD6-19_1-
YSSG
792

10_1-22

8

IGHD3-
LLWFGEL
697
IGHD6-19_1-
GYSSG
793

10_1-23

9

IGHD3-
LWFGELL
698
IGHD6-19_1-
SSGWY
794

10_1-24

10

IGHD3-
VLLWFGE
699
IGHD6-19_1-
YSSGW
795

10_1-25

11

IGHD3-
LLWFGELL
700
IGHD6-19_1-
GYSSGW
796

10_1-26

12

IGHD3-
VLLWFGEL
701
IGHD6-19_1-
YSSGWY
797

10_1-27

13

IGHD3-
VLLWFGELL
1
IGHD6-19_1-
GYSSGWY
5

10_1-28

14

IGHD3-
GSG

IGHD6-19_2-
AVA

10_2-1

1

IGHD3-
GSY

IGHD6-19_2-
GIA

10_2-2

2

IGHD3-
SGS

IGHD6-19_2-
IAV

10_2-3

3

IGHD3-
SYY

IGHD6-19_2-
VAG

10_2-4

4

IGHD3-
YGS

IGHD6-19_2-
AVAG
798

10_2-5

5

IGHD3-
YYG

IGHD6-19_2-
GIAV
799

10_2-6

6

IGHD3-
YYN

IGHD6-19_2-
IAVA
800

10_2-7

7

IGHD3-
YYY

IGHD6-19_2-
GIAVA
801

10_2-8

8

IGHD3-
GSGS
702
IGHD6-19_2-
IAVAG
802

10_2-9

9

IGHD3-
GSYY
703
IGHD6-19_2-
GIAVAG
6

10_2-10

10

IGHD3-
SGSY
704
IGHD6-13_2-
AAA

10_2-11

1

IGHD3-
SYYN
705
IGHD6-13_2-
AAG

10_2-12

2

IGHD3-
YGSG
706
IGHD6-13_2-
IAA

10_2-13

3

IGHD3-
YYGS
707
IGHD6-13_2-
AAAG
803

10_2-14

4

IGHD3-
YYYG
708
IGHD6-13_2-
GIAA
804

10_2-15

5

IGHD3-
GSGSY
709
IGHD6-13_2-
IAAA
805

10_2-16

6

IGHD3-
GSYYN
710
IGHD6-13_2-
GIAAA
806

10_2-17

7

IGHD3-
SGSYY
711
IGHD6-13_2-
IAAAG
807

10_2-18

8

IGHD3-
YGSGS
712
IGHD6-13_2-
GIAAAG
8

10_2-19

9

¹The sequence designation is formatted as follows: (IGHD Gene Name)_(Reading Frame)-(Variant Number)

*Note that the origin of certain variants is rendered somewhat arbitrary when redundant segments are deleted from the library (i.e., certain segments may have their origins with more than one parent, including the one specified in the table).

Table 19 shows the length distribution of the 278 DH segments selected according to the methods described above.

TABLE 19

Length Distributions of DH Segments Selected

for Inclusion in the Exemplary Library

Number of

DH Size
Occurrences

3
78

4
64

5
50

6
38

7
27

8
20

9
12

10
4

As specified above, based on the CDRH3 numbering system defined in this application, IGHD-derived amino acids (i.e., DH segments) are numbered beginning with position 97, followed by positions 97A, 97B, etc. In the currently exemplified embodiment of the library, the shortest DH segment has three amino acids: 97, 97A and 97B, while the longest DH segment has 10 amino acids: 97, 97A, 97B, 97C, 97D, 97E, 97F, 97G, 97H and 97I.
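As a small illustration of this lettered numbering, the sketch below generates the position labels for a segment of a given length. The function `position_labels` is a hypothetical helper written for this example; it assumes the labels simply run from the base number through consecutive capital-letter suffixes.

```python
from string import ascii_uppercase


def position_labels(base, length):
    """Labels for `length` residues starting at `base`: base, baseA, baseB, ..."""
    suffixes = [""] + list(ascii_uppercase)
    if length > len(suffixes):
        raise ValueError("segment too long for single-letter suffixes")
    return [f"{base}{s}" for s in suffixes[:length]]


shortest_dh = position_labels(97, 3)    # three-residue DH segment
longest_dh = position_labels(97, 10)    # ten-residue DH segment

print(shortest_dh)
print(longest_dh)
```

A segment of length zero yields an empty list, which corresponds to a position that is not used.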

Example 5.2: Selection of the H3-JH Segments

There are six human germline IGHJ genes. During in vivo assembly of antibody genes, these segments are progressively deleted at their 5′ end. In this exemplary embodiment of the library, IGHJ gene segments with no deletions, or with 1, 2, 3, 4, 5, 6, or 7 deletions (at the amino acid level), yielding JH segments as short as 13 amino acids, were included (Table 20). Other embodiments of the invention, in which the IGHJ gene segments are progressively deleted (at their 5′/N-terminal end) to yield 15, 14, 12, or 11 amino acids are also contemplated.

TABLE 20

IGHJ Gene Segments Selected for use in the Exemplary Library

IGHJ

SEQ ID

Segment
[H3-JH]-[FRM4]¹
NO:
H3-JH
SEQ ID NO:

JH1 parent or
AEYFQHWGQGTLVTVSS
253
AEYFQH
17

JH1_1

JH1_2
EYFQHWGQGTLVTVSS
808
EYFQH
830

JH1_3
YFQHWGQGTLVTVSS
809
YFQH
831

JH1_4
FQHWGQGTLVTVSS
810
FQH

JH1_5
QHWGQGTLVTVSS
811
QH

JH2 parent or
YWYFDLWGRGTLVTVSS
254
YWYFDL
18

JH2_1

JH2_2
WYFDLWGRGTLVTVSS
812
WYFDL
832

JH2_3
YFDLWGRGTLVTVSS
813
YFDL
833

JH2_4
FDLWGRGTLVTVSS
814
FDL

JH2_5
DLWGRGTLVTVSS
815
DL

JH3 parent or
AFDVWGQGTMVTVSS
255
AFDV
19

JH3_1

JH3_2
FDVWGQGTMVTVSS
816
FDV

JH3_3
DVWGQGTMVTVSS
817
DV

JH4 parent or
YFDYWGQGTLVTVSS
256
YFDY
20

JH4_1

JH4_2
FDYWGQGTLVTVSS
818
FDY

JH4_3
DYWGQGTLVTVSS
819
DY

JH5 parent or
NWFDSWGQGTLVTVSS
257
NWFDS
21

JH5_1

JH5_2
WFDSWGQGTLVTVSS
820
WFDS
834

JH5_3
FDSWGQGTLVTVSS
821
FDS

JH5_4
DSWGQGTLVTVSS
822
DS

JH6 parent or
YYYYYGMDVWGQGTTVTVSS
258
YYYYYGMDV
22

JH6_1

JH6_2
YYYYGMDVWGQGTTVTVSS
823
YYYYGMDV
835

JH6_3
YYYGMDVWGQGTTVTVSS
824
YYYGMDV
836

JH6_4
YYGMDVWGQGTTVTVSS
825
YYGMDV
837

JH6_5
YGMDVWGQGTTVTVSS
826
YGMDV
838

JH6_6
GMDVWGQGTTVTVSS
827
GMDV
839

JH6_7
MDVWGQGTTVTVSS
828
MDV

JH6_8
DVWGQGTTVTVSS
829
DV

¹H3-JH is defined as the portion of the IGHJ segment included within the Kabat definition of CDRH3; FRM4 is defined as the portion of the IGHJ segment encoding framework region four.
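The 5′ (N-terminal) truncation used to build Table 20 can be sketched as follows. The parent sequences are the [H3-JH]-[FRM4] entries of Table 20; the names `jh_variants` and `h3_jh`, and the constant `FRM4_LEN` (the 11 residues corresponding to Kabat positions 103 to 113), are assumptions made for this illustration.

```python
# Parent IGHJ segments ([H3-JH]-[FRM4]) from Table 20.
PARENTS = {
    "JH1": "AEYFQHWGQGTLVTVSS",
    "JH2": "YWYFDLWGRGTLVTVSS",
    "JH3": "AFDVWGQGTMVTVSS",
    "JH4": "YFDYWGQGTLVTVSS",
    "JH5": "NWFDSWGQGTLVTVSS",
    "JH6": "YYYYYGMDVWGQGTTVTVSS",
}
MIN_JH_LEN = 13   # shortest JH segment in the exemplified library
FRM4_LEN = 11     # residues contributed to framework region four


def jh_variants(parent, min_len=MIN_JH_LEN):
    """Delete residues one at a time from the N-terminus down to min_len."""
    return [parent[k:] for k in range(len(parent) - min_len + 1)]


def h3_jh(segment):
    """Portion of a JH segment that falls within CDRH3."""
    return segment[:-FRM4_LEN]


library = {name: jh_variants(seq) for name, seq in PARENTS.items()}
n_segments = sum(len(v) for v in library.values())

print(n_segments)
print([h3_jh(s) for s in library["JH4"]])
```

Because every variant keeps the full FRM4 portion, the shortest H3-JH contribution is two residues, as seen for JH3_3, JH4_3 and JH6_8.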

According to the CDRH3 numbering system of this application, the contribution of, for example, JH6_1 to CDRH3, would be designated by positions 99F, 99E, 99D, 99C, 99B, 99A, 100, 101 and 102 (Y, Y, Y, Y, Y, G, M, D and V, respectively). Similarly, the JH4_3 sequence would contribute amino acid positions 101 and 102 (D and Y, respectively) to CDRH3. However, in all cases of the exemplified library, the JH segment will contribute amino acids 103 to 113 to the FRM4 region, in accordance with the standard Kabat numbering system for antibody variable regions (Kabat, op. cit. 1991). This may not be the case in other embodiments of the library.

Example 5.3: Selection of the N1 and N2 Segments

While the consideration of V-D-J recombination enhanced by mimicry of the naturally occurring process of progressive deletion (as exemplified above) can generate enormous diversity, the diversity of the CDRH3 sequences in vivo is further amplified by non-templated addition of a varying number of nucleotides at the V-D junction and the D-J junction.

N1 and N2 segments located at the V-D and D-J junctions, respectively, were identified in a sample containing about 2,700 antibody sequences (Jackson et al., J. Immunol. Methods, 2007, 324: 26) also analyzed by the SoDA method of Volpe et al., Bioinformatics, 2006, 22: 438-44; (both Jackson et al., and Volpe et al., are incorporated by reference in their entireties). Examination of these sequences revealed patterns in the length and composition of N1 and N2. For the construction of the currently exemplified CDRH3 library, specific short amino acid sequences were derived from the above analysis and used to generate a number of N1 and N2 segments that were incorporated into the CDRH3 design, using the synthetic scheme described herein.

As described in the Detailed Description, certain embodiments of the invention include N1 and N2 segments with rationally designed length and composition, informed by statistical biases in these parameters that are found by comparing naturally occurring N1 and N2 segments in human antibodies. According to data compiled from human databases (see, e.g., Jackson et al., J. Immunol. Methods, 2007, 324: 26, incorporated by reference in its entirety), there are an average of about 3.02 amino acid insertions for N1 and about 2.4 amino acid insertions for N2, not taking into account insertions of two nucleotides or less. FIG. 2 shows the length distributions of the N1 and N2 regions in human antibodies. In this exemplary embodiment of the invention, N1 and N2 were fixed to a length of 0, 1, 2, or 3 amino acids. The naturally occurring composition of these sequences in human antibodies was used as a guide for the inclusion of different amino acid residues.

The naturally occurring composition of single amino acid, two amino acids, and three amino acids N1 additions is defined in Table 21, and the naturally occurring composition of the corresponding N2 additions is defined in Table 22. The most frequently occurring duplets in the N1 and N2 set are compiled in Table 23.

TABLE 21

Composition of Naturally Occurring 1,

2, and 3 Amino Acid N1 Additions*

Position
Number of
Position
Number of
Position
Number of

1
Occurrences
2
Occurrences
3
Occurrences

R
251
G
97
G
101

G
249
P
67
R
66

P
173
R
67
P
47

L
130
S
42
S
47

S
117
L
39
L
38

A
84
V
33
A
33

V
62
E
24
V
28

K
61
A
21
T
27

I
55
D
18
E
24

Q
51
I
18
D
22

T
51
T
18
K
18

D
50
K
16
F
14

E
49
Y
16
I
13

F
33
H
13
W
13

H
32
F
12
N
10

N
30
Q
11
Y
10

W
28
N
5
H
8

Y
21
W
5
Q
5

M
16
C
4
C
3

C
3
M
4
M
3

1546

530

530

*Defined as the sequence C-terminal to “CARX” (SEQ ID NO: 840), or equivalent, of VH, wherein “X” is the “tail” (e.g., D, E, G, or no amino acid residue).

TABLE 22

Composition of Naturally Occurring 1,

2, and 3 Amino Acid N2 Additions*

Position
Number of
Position
Number of
Position
Number of

1
Occurrences
2
Occurrences
3
Occurrences

G
242
G
244
G
156

P
219
P
138
P
79

R
180
R
86
S
54

L
132
S
85
R
51

S
123
T
77
L
49

A
97
L
74
A
41

T
78
A
69
T
31

V
75
V
46
V
29

E
57
E
41
D
23

D
56
Y
38
E
23

F
54
D
36
W
23

H
54
K
30
Q
19

Q
53
F
29
F
17

I
49
W
27
Y
17

N
45
H
24
H
16

Y
40
I
23
I
11

K
35
Q
23
K
11

W
29
N
21
N
8

M
20
M
8
C
6

C
6
C
5
M
6

1644

1124

670

*Defined as the sequence C-terminal to the D segment but not encoded by IGHJ genes.

TABLE 23

Top Twenty-Five Naturally Occurring N1 and N2 Duplets

Number of
Cumulative
Individual

Sequence
Occurrences
Frequency
Frequency

GG
17
0.037
0.037

PG
15
0.070
0.033

RG
15
0.103
0.033

PP
13
0.132
0.029

GP
12
0.158
0.026

GL
11
0.182
0.024

PT
10
0.204
0.022

TG
10
0.226
0.022

GV
9
0.246
0.020

RR
9
0.266
0.020

SG
8
0.284
0.018

RP
7
0.299
0.015

IG
6
0.312
0.013

GS
6
0.325
0.013

SR
6
0.338
0.013

PA
6
0.352
0.013

LP
6
0.365
0.013

VG
6
0.378
0.013

KG
5
0.389
0.011

GW
5
0.400
0.011

FP
5
0.411
0.011

LG
5
0.422
0.011

RS
5
0.433
0.011

TP
5
0.444
0.011

EG
5
0.455
0.011

Example 5.3.1 Selection of the N1 Segments

Analysis of the identified N1 segments, located at the junction between V and D, revealed that the eight most frequently occurring amino acid residues were G, R, S, P, L, A, T and V (Table 21). The number of amino acid additions in the N1 segment was frequently none, one, two, or three (FIG. 2). The addition of four or more amino acids was relatively rare. Therefore, in the currently exemplified embodiment of the library, the N1 segments were designed to include zero, one, two or three amino acids. However, in other embodiments, N1 segments of four, five, or more amino acids may also be utilized. G and P were always among the most commonly occurring amino acid residues in the N1 regions. Thus, in the present exemplary embodiment of the library, the N1 segments that are dipeptides are of the form GX, XG, PX, or XP, where X is any of the eight most commonly occurring amino acids listed above. Because G residues were observed more frequently than P residues, the tripeptide members of the exemplary N1 library have the form GXG, GGX, or XGG, where X is, again, one of the eight most frequently occurring amino acid residues listed above. The resulting set of N1 sequences used in the present exemplary embodiment of the library, including the “zero” addition, amounts to 59 sequences, which are listed in Table 24.

TABLE 24

N1 Sequences Selected for Inclusion

in the Exemplary Library

Segment

Type
Sequences
Number

“Zero”
(no addition) V segment
1

joins directly to D

segment

Monomers
G, P, R, A, S, L, T, V
8

Dimers
GG, GP, GR, GA, GS, GL,
28

GT, GV, PG, RG, AG, SG,

LG, TG, VG, PP, PR, PA,

PS, PL, PT, PV, RP, AP,

SP, LP, TP, VP

Trimers
GGG, GPG, GRG, GAG, GSG,
22

GLG, GTG, GVG, PGG, RGG,

AGG, SGG, LGG, TGG, VGG,

GGP, GGR, GGA, GGS, GGL,

GGT, GGV

In accordance with the CDRH3 numbering system of the application, the sequences enumerated in Table 24 contribute the following positions to CDRH3: the monomers contribute position 96, the dimers contribute positions 96 and 96A, and the trimers contribute positions 96, 96A and 96B. In alternative embodiments, where tetramers and longer segments could be included among the N1 sequences, the corresponding numbers would go on to include 96C, and so on.
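The N1 design rule (dimers of the form GX, XG, PX or XP; trimers of the form GXG, GGX or XGG) can be enumerated with a short sketch. The variable names are illustrative; the only inputs are the eight residues named in the text, and sets are used so that patterns producing the same peptide (for example GG from both GX and XG) are counted once.

```python
# The eight most frequently occurring N1 residues named in the text.
TOP8 = "GPRASLTV"

monomers = list(TOP8)

# Dimers: G or P paired with any of the eight residues, in either order.
dimers = {d for x in TOP8 for d in ("G" + x, x + "G", "P" + x, x + "P")}

# Trimers: two G residues plus any of the eight residues.
trimers = {t for x in TOP8 for t in ("G" + x + "G", "GG" + x, x + "GG")}

# The empty string stands for the "zero" addition (V joins D directly).
n1_segments = [""] + monomers + sorted(dimers) + sorted(trimers)

print(len(monomers), len(dimers), len(trimers), len(n1_segments))
```

The counts of each segment type can be compared directly with the "Number" column of Table 24.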

Example 5.3.2 Selection of the N2 Segments

Similarly, analysis of the identified N2 segments, located at the junction between D and J, revealed that the eight most frequently occurring amino acid residues were also G, R, S, P, L, A, T and V (Table 22). The number of amino acid additions in the N2 segment was also frequently none, one, two, or three (FIG. 2). For the design of the N2 segments in the exemplary library, an expanded set of sequences was utilized. Specifically, the sequences in Table 25 were used, in addition to the 59 sequences enumerated in Table 24, for N1.

TABLE 25

Extra Sequences in N2 Additions

Seg-

ment

Number
Number

Type
Sequence
New
Total

Mono-
D, E, F, H, I, K, M, Q, W, Y
10
18

mers

Di-
AR, AS, AT, AY, DL, DT, EA,
54
82

mers
EK, FH, FS, HL, HW, IS, KV,

LD, LE, LR, LS, LT, NR, NT,

QE, QL, QT, RA, RD, RE, RF,

RH, RL, RR, RS, RV, SA, SD,

SE, SF, SI, SK, SL, SQ, SR,

SS, ST, SV, TA, TR, TS, TT,

TW, VD, VS, WS, YS

Tri-
AAE, AYH, DTL, EKR, ISR, NTP,
18
40

mers
PKS, PRP, PTA, PTQ, REL, RPL,

SAA, SAL, SGL, SSE, TGL, WGT

The presently exemplified embodiment of the library, therefore, contains 141 total N2 sequences, including the “zero” state. One of ordinary skill in the art will readily recognize that these 141 sequences may also be used in the N1 region, and that such embodiments are within the scope of the invention. In addition, the length and compositional diversity of the N1 and N2 sequences can be further increased by utilizing amino acids that occur less frequently than G, R, S, P, L, A, T and V, in the N1 and N2 regions of naturally occurring antibodies, and including N1 and N2 segments of four, five, or more amino acids in the library. Tables 21 to 23 and FIG. 2 provide information about the composition and length of the N1 and N2 sequences in naturally occurring antibodies that is useful for the design of additional N1 and N2 regions which mimic the natural composition and length.
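The total of 141 N2 sequences follows from combining the 59 N1 sequences with the extra sequences of Table 25, as the sketch below illustrates. The N1 set is regenerated from the GX/XG/PX/XP and GXG/GGX/XGG patterns; the extra sequences are copied from Table 25; all variable names are hypothetical.

```python
TOP8 = "GPRASLTV"

# N1 set of Table 24, including the empty "zero" addition.
n1 = {""} | set(TOP8)
n1 |= {d for x in TOP8 for d in ("G" + x, x + "G", "P" + x, x + "P")}
n1 |= {t for x in TOP8 for t in ("G" + x + "G", "GG" + x, x + "GG")}

# Extra N2 sequences, copied from Table 25.
EXTRA_MONOMERS = "D E F H I K M Q W Y".split()
EXTRA_DIMERS = (
    "AR AS AT AY DL DT EA EK FH FS HL HW IS KV "
    "LD LE LR LS LT NR NT QE QL QT RA RD RE RF "
    "RH RL RR RS RV SA SD SE SF SI SK SL SQ SR "
    "SS ST SV TA TR TS TT TW VD VS WS YS"
).split()
EXTRA_TRIMERS = (
    "AAE AYH DTL EKR ISR NTP PKS PRP PTA PTQ REL RPL "
    "SAA SAL SGL SSE TGL WGT"
).split()

extras = set(EXTRA_MONOMERS + EXTRA_DIMERS + EXTRA_TRIMERS)
n2 = n1 | extras

print(len(n1), len(extras), len(n2))
print(n1 & extras)   # empty set: no extra sequence duplicates an N1 sequence
```

Since none of the extra sequences already occurs among the N1 sequences, the N2 total is simply the sum of the two sets.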

In accordance with the CDRH3 numbering system of the application, N2 sequences will begin at position 98 (when present) and extend to 98A (dimers) and 98B (trimers). Alternative embodiments may occupy positions 98C, 98D, and so on.

Example 5.4. A CDRH3 Library

When the “tail” (i.e., G/D/E/-) is considered, the CDRH3 in the exemplified library may be represented by the general formula:

[G/D/E/-]-[N1]-[DH]-[N2]-[H3-JH]

In the currently exemplified, non-limiting, embodiment of the library, [G/D/E/-] represents each of the four possible terminal amino acid “tails”; N1 can be any of the 59 sequences in Table 24; DH can be any of the 278 sequences in Table 18; N2 can be any of the 141 sequences in Tables 24 and 25; and H3-JH can be any of the 28 H3-JH sequences in Table 20. The total theoretical diversity or repertoire size of this CDRH3 library is obtained by multiplying the variations at each of the components, i.e., 4×59×278×141×28=2.59×10⁸.

However, as described in the previous examples, redundancies may be eliminated from the library. In the presently exemplified embodiment, the tail and N1 segments were combined, and redundancies were removed from the library. For example, considering the VH chassis, tail, and N1 regions, the sequence [VH_Chassis]-[G] may be obtained in two different ways: [VH_Chassis]+[G]+[nothing] or [VH_Chassis]+[nothing]+[G]. Removal of redundant sequences resulted in a total of 212 unique [G/D/E/-]-[N1] segments out of the 236 possible combinations (i.e., 4 tails×59 N1). Therefore, the actual diversity of the presently exemplified CDRH3 library is 212×278×141×28=2.33×10⁸. FIG. 23 depicts the frequency of occurrence of different CDRH3 lengths in this library, versus the preimmune repertoire of Lee et al.
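The redundancy between the tail and N1 segments, and the resulting library sizes, can be worked through in a brief sketch. The N1 set is regenerated from the design patterns of Example 5.3.1; the DH, N2 and H3-JH counts are taken from the text as constants; the variable names are illustrative.

```python
TOP8 = "GPRASLTV"

# N1 set of Table 24 ("" is the zero addition).
n1 = {""} | set(TOP8)
n1 |= {d for x in TOP8 for d in ("G" + x, x + "G", "P" + x, x + "P")}
n1 |= {t for x in TOP8 for t in ("G" + x + "G", "GG" + x, x + "GG")}

TAILS = ["", "G", "D", "E"]          # "" is the "no tail" state

combinations = [tail + seg for tail in TAILS for seg in n1]
unique_tail_n1 = set(combinations)   # e.g. "G"+"" and ""+"G" collapse to "G"

N_DH, N_N2, N_JH = 278, 141, 28      # segment counts stated in the text

theoretical = len(TAILS) * len(n1) * N_DH * N_N2 * N_JH
actual = len(unique_tail_n1) * N_DH * N_N2 * N_JH

print(len(combinations), len(unique_tail_n1))
print(f"{theoretical:.2e}", f"{actual:.2e}")
```

All collisions involve the G tail: a G tail followed by an N1 segment can equal a longer N1 segment that begins with G and has no tail, whereas the D and E tails never collide because no N1 segment begins with D or E.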

Table 26 further illustrates specific exemplary sequences from the CDRH3 library described above, using the CDRH3 numbering system of the present application. In instances where a position is not used, the hyphen symbol (-) is included in the table instead.

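TABLE 26

Examples of Designed CDRH3 Sequences According to the Library Exemplified in Examples 1 to 5

Part 1: [Tail] (position 95), [N1] (positions 96 to 96B), and [DH] (positions 97 to 97I)

| No. | 95 | 96 | 96A | 96B | 97 | 97A | 97B | 97C | 97D | 97E | 97F | 97G | 97H | 97I |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. 1 | G | — | — | — | Y | Y | Y | — | — | — | — | — | — | — |
| No. 2 | D | G | — | — | G | Y | C | S | G | G | S | C | Y | S |
| No. 3 | E | R | — | — | I | T | I | F | G | V | — | — | — | — |
| No. 4 | — | P | P | — | V | L | L | W | F | G | E | L | L | — |
| No. 5 | G | G | S | G | Y | Y | Y | G | S | G | S | Y | Y | N |
| No. 6 | D | — | — | — | R | G | V | I | I | — | — | — | — | — |
| No. 7 | E | S | G | — | Y | Y | Y | D | S | S | G | Y | Y | Y |
| No. 8 | — | S | — | — | D | Y | G | D | Y | — | — | — | — | — |
| No. 9 | — | P | G | — | W | F | G | — | — | — | — | — | — | — |
| No. 10 | — | — | — | — | C | S | G | G | S | C | — | — | — | — |

Part 2: [N2] (positions 98 to 98B), [H3-JH] (positions 99E to 102), and CDRH3 length

| No. | 98 | 98A | 98B | 99E | 99D | 99C | 99B | 99A | 99 | 100 | 101 | 102 | CDRH3 Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. 1 | — | — | — | — | — | — | — | — | — | — | D | V | 6 |
| No. 2 | Y | — | — | — | — | — | — | — | — | F | Q | H | 16 |
| No. 3 | G | G | — | — | — | — | — | — | Y | F | D | Y | 14 |
| No. 4 | D | — | — | — | — | — | — | — | — | — | D | L | 14 |
| No. 5 | P | — | — | — | — | — | A | E | Y | F | Q | H | 21 |
| No. 6 | M | — | — | Y | Y | Y | Y | Y | G | M | D | V | 16 |
| No. 7 | T | G | L | — | — | — | — | W | Y | F | D | L | 21 |
| No. 8 | S | I | — | — | — | — | — | — | — | F | D | I | 11 |
| No. 9 | P | S | — | — | — | — | Y | Y | G | M | D | V | 13 |
| No. 10 | A | Y | — | — | — | — | — | N | W | F | D | P | 13 |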

Sequence Identifiers: No. 1 (SEQ ID NO: 542); No. 2 (SEQ ID NO: 543); No. 3 (SEQ ID NO: 544); No. 4 (SEQ ID NO: 545); No. 5 (SEQ ID NO: 546); No. 6 (SEQ ID NO: 547); No. 7 (SEQ ID NO: 548); No. 8 (SEQ ID NO: 549); No. 9 (SEQ ID NO: 550); No. 10 (SEQ ID NO: 551).

Example 6: Design of VKCDR3 Libraries

This example describes the design of a number of exemplary VKCDR3 libraries. As specified in the Detailed Description, the actual version(s) of the VKCDR3 library made or used in particular embodiments of the invention will depend on the objectives for the use of the library. In this example the Kabat numbering system for light chain variable regions was used.

In order to facilitate examination of patterns of occurrence, human kappa light chain sequences were obtained from the publicly available NCBI database (Appendix A). As for the heavy chain sequences (Example 2), each of the sequences obtained from the publicly available database was assigned to its closest germline gene, on the basis of sequence identity. The amino acid compositions at each position were then determined within each kappa light chain subset.

Example 6.1: A Minimalist VKCDR3 Library

This example describes the design of a “minimalist” VKCDR3 library, wherein the VKCDR3 repertoire is restricted to a length of nine residues. Examination of the VKCDR3 lengths of human sequences shows that a dominant proportion (over 70%) has nine amino acids within the Kabat definition of CDRL3: positions 89 through 97. Thus, the currently exemplified minimalist design considers only VKCDR3 of length nine. Examination of human kappa light chain sequences shows that there are no strong biases in the usage of IGKJ genes; there are five such IGKJ genes in humans. Table 27 depicts IGKJ gene usage amongst three data sets, namely Juul et al. (Clin. Exp. Immunol., 1997, 109: 194, incorporated by reference in its entirety), Klein and Zachau (Eur. J. Immunol., 1993, 23: 3248, incorporated by reference in its entirety), and the kappa light chain data set provided in Appendix A (labeled LUA).

TABLE 27

IGKJ Gene Usage in Various Data Sets

| Gene | Klein | Juul | LUA |
|---|---|---|---|
| IGKJ1 | 35.0% | 29.0% | 29.3% |
| IGKJ2 | 25.0% | 23.0% | 24.1% |
| IGKJ3 | 7.0% | 8.0% | 12.1% |
| IGKJ4 | 26.0% | 24.0% | 26.5% |
| IGKJ5 | 6.0% | 18.0% | 8.0% |

Thus, a simple combination of “M” VK chassis and the 5 IGKJ genes would generate a library of size M×5. In the Kabat numbering system, for VKCDR3 of length nine, amino acid number 96 is the first encoded by the IGKJ gene. Examination of the amino acid occupying this position in human sequences showed that the seven most common residues are L, Y, R, W, F, I, and P, cumulatively accounting for about 85% of the residues found in position 96. The remaining 13 amino acids account for the other 15%. The occurrence of all 20 amino acids at position 96 is presented in Table 28.

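TABLE 28

Occurrence of 20 Amino Acid Residues at Position 96 in Human VK Data Set

| Type | Number | Percent | Cumulative |
|---|---|---|---|
| L | 333 | 22.3 | 22.3 |
| Y | 235 | 15.8 | 38.1 |
| R | 222 | 14.9 | 52.9 |
| W | 157 | 10.5 | 63.5 |
| F | 148 | 9.9 | 73.4 |
| I | 96 | 6.4 | 79.8 |
| P | 90 | 6.0 | 85.9 |
| Q | 53 | 3.6 | 89.4 |
| N | 39 | 2.6 | 92.0 |
| H | 31 | 2.1 | 94.1 |
| V | 21 | 1.4 | 95.5 |
| G | 20 | 1.3 | 96.8 |
| C | 14 | 0.9 | 97.8 |
| K | 7 | 0.5 | 98.3 |
| S | 6 | 0.4 | 98.7 |
| A | 5 | 0.3 | 99.0 |
| D | 5 | 0.3 | 99.3 |
| E | 5 | 0.3 | 99.7 |
| T | 5 | 0.3 | 100.0 |
| M | 0 | 0.0 | 100.0 |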
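The Percent and Cumulative columns of Table 28 follow directly from the Number column. A minimal sketch of that calculation, using the counts from the table (variable names are illustrative):

```python
# Residue counts at position 96, taken from the Number column of Table 28.
counts = {
    "L": 333, "Y": 235, "R": 222, "W": 157, "F": 148, "I": 96, "P": 90,
    "Q": 53, "N": 39, "H": 31, "V": 21, "G": 20, "C": 14, "K": 7,
    "S": 6, "A": 5, "D": 5, "E": 5, "T": 5, "M": 0,
}

total = sum(counts.values())
running = 0
rows = []  # (residue, count, percent, cumulative percent)
for residue, n in sorted(counts.items(), key=lambda item: -item[1]):
    running += n
    rows.append((residue, n, 100 * n / total, 100 * running / total))

for residue, n, pct, cum in rows:
    print(f"{residue}  {n:4d}  {pct:5.1f}  {cum:5.1f}")
```

The seventh row gives the cumulative share of the seven most common residues, which is the "about 85%" figure cited in the text.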

To determine the origins of the seven residues most commonly found in position 96, known human IGKJ amino acid sequences were examined (Table 29).

TABLE 29

Known Human IGKJ Amino Acid Sequences

| Gene | Sequence | SEQ ID NO: |
|---|---|---|
| IGKJ1 | WTFGQGTKVEIK | 552 |
| IGKJ2 | YTFGQGTKLEIK | 553 |
| IGKJ3 | FTFGPGTKVDIK | 554 |
| IGKJ4 | LTFGGGTKVEIK | 555 |
| IGKJ5 | ITFGQGTRLEIK | 556 |

Without being bound by theory, five of the seven most commonly occurring amino acids found in position 96 of rearranged human sequences appear to originate from the first amino acid encoded by each of the five human IGKJ genes, namely, W, Y, F, L, and I.

Less evident were the origins of the P and R residues. Without being bound by theory, most of the human IGKV gene nucleotide sequences end with the sequence CC, which occurs after (i.e., 3′ to) the end of the last full codon (e.g., that encodes the C-terminal residue shown in Table 11). Therefore, regardless of which nucleotide is placed after this sequence (i.e., CCX, where X may be any nucleotide) the codon will encode a proline (P) residue. Thus, when the IGKJ gene undergoes progressive deletion (just as in the IGHJ of the heavy chain; see Example 5), the first full amino acid is lost and, if no deletions have occurred in the IGKV gene, a P residue will result.
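The statement that any codon of the form CCX encodes proline can be confirmed against the standard genetic code. The sketch below builds the standard codon table from its conventional compact form; the names are illustrative.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Whatever nucleotide follows the terminal CC, the codon translates to Pro.
encoded_by_ccx = {CODON_TABLE["CC" + x] for x in "ACGT"}
print(encoded_by_ccx)
```

All four CCX codons map to the single residue P, consistent with the explanation above.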

To determine the origin of the arginine residue at position 96, the origin of IGKJ genes in rearranged kappa light chain sequences containing R at position 96 was analyzed. The analysis indicated that R occurred most frequently at position 96 when the IGKJ gene was IGKJ1 (SEQ ID NO: 552). The germline W (position 1; Table 29) for IGKJ1 (SEQ ID NO: 552) is encoded by TGG. Without being bound by theory, a single nucleotide change of T to C (yielding CGG) or A (yielding AGG) will, therefore, result in codons encoding Arg (R). A change to G (yielding GGG) results in a codon encoding Gly (G). R occurs about ten times more often at position 96 in human sequences than G (when the IGKJ gene is IGKJ1 (SEQ ID NO: 552)), and it is encoded by CGG more often than AGG. Therefore, without being bound by theory, C may originate from one of the aforementioned two Cs at the end of the IGKV gene. However, regardless of the mechanism(s) of occurrence, R and P are among the most frequently observed amino acid types at position 96, when the length of VKCDR3 is 9. Therefore, a minimalist VKCDR3 library may be represented by the following amino acid sequence:

[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-

[TFGGGTKVEIK (SEQ ID NO: 841)]

In this sequence, VK_Chassis represents any selected VK chassis (for non-limiting examples, see Table 11), specifically Kabat residues 1 to 88 encoded by the IGKV gene. L3-VK represents the portion of the VKCDR3 encoded by the chosen IGKV gene (in this embodiment, residues 89-95). F/L/I/R/W/Y/P represents any one of amino acid residues F, L, I, R, W, Y, or P. In this exemplary representation, IGKJ4 (minus the first residue) has been depicted. Without being bound by theory, apart from IGKJ4 (SEQ ID NO: 555) being among the most commonly used IGKJ genes in humans, the GGG amino acid sequence is expected to lead to larger conformational flexibility than any of the alternative IGKJ genes, which contain a GXG amino acid sequence, where X is an amino acid other than G. In some embodiments, it may be advantageous to produce a minimalist pre-immune repertoire with a higher degree of conformational flexibility. Considering the ten VK chassis depicted in Table 11, one implementation of the minimalist VKCDR3 library would have 70 members resulting from the combination of 10 VK chassis by 7 junction (position 96) options and one IGKJ-derived sequence (e.g., IGKJ4 (SEQ ID NO: 555)). Although this embodiment of the library has been depicted using IGKJ4 (SEQ ID NO: 555), it is possible to design a minimalist VKCDR3 library using one of the other four IGKJ sequences. For example, another embodiment of the library may have 350 members (10 VK chassis by 7 junctions by 5 IGKJ genes).

One of ordinary skill in the art will readily recognize that one or more minimalist VKCDR3 libraries may be constructed using any of the IGKJ genes. Using the notation above, these minimalist VKCDR3 libraries may have sequences represented by, for example:

JK1:

[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-

[TFGQGTKVEIK (SEQ ID NO: 528)];

JK2:

[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-

[TFGQGTKLEIK (SEQ ID NO: 842)];

JK3:

[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-

[TFGPGTKVDIK (SEQ ID NO: 843)];

and

JK5:

[VK_Chassis]-[L3-VK]-[F/L/I/R/W/Y/P]-

[TFGQGTRLEIK (SEQ ID NO: 844)].
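The member counts of 70 and 350 given above for the minimalist libraries can be reproduced by enumerating the position 96 residue together with the remainder of each IGKJ sequence. The sketch below is illustrative only: it uses the IGKJ sequences of Table 29, treats the chassis simply as a count of ten, and its function and variable names are assumptions.

```python
# IGKJ amino acid sequences from Table 29.
IGKJ = {
    "IGKJ1": "WTFGQGTKVEIK",
    "IGKJ2": "YTFGQGTKLEIK",
    "IGKJ3": "FTFGPGTKVDIK",
    "IGKJ4": "LTFGGGTKVEIK",
    "IGKJ5": "ITFGQGTRLEIK",
}
JUNCTION_96 = "FLIRWYP"   # the seven residues allowed at position 96
N_CHASSIS = 10            # the ten exemplary VK chassis of Table 11


def junction_variants(j_gene):
    """Return position 96 plus the IGKJ sequence minus its first residue."""
    j_remainder = IGKJ[j_gene][1:]
    return [residue + j_remainder for residue in JUNCTION_96]


igkj4_only = junction_variants("IGKJ4")
all_igkj = [v for gene in IGKJ for v in junction_variants(gene)]

print(len(igkj4_only) * N_CHASSIS)      # library built on IGKJ4 alone
print(len(set(all_igkj)) * N_CHASSIS)   # library using all five IGKJ genes
```

Each chassis contributes seven junction variants per IGKJ gene, giving 70 members with IGKJ4 alone and 350 members with all five genes.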

Example 6.2: A VKCDR3 Library of about 10⁵ Complexity

In this example, the nine residue VKCDR3 repertoire described in Example 6.1 is expanded to include VKCDR3 lengths of eight and ten residues. Moreover, while the previously enumerated VKCDR3 library included the VK chassis and portions of the IGKJ gene not contributing to VKCDR3, the presently exemplified version focuses only on residues comprising a portion of VKCDR3. This embodiment may be favored, for example, when recombination with a vector which already contains VK chassis sequences and constant region sequences is desired.

While the dominant length of VKCDR3 sequences in humans is nine amino acids, other lengths appear at measurable rates that cumulatively approach almost 30% of kappa light chain sequences. In particular, VKCDR3 of lengths 8 and 10 represent, respectively, about 8.5% and about 16% of sequences in representative samples (FIG. 3). Thus, a more complex VKCDR3 library includes CDR lengths of 8 to 10 amino acids; this library accounts for over 95% of the length distribution observed in typical collections of human VKCDR3 sequences. This library also enables the inclusion of additional variation outside of the junction between the VK and JK genes. The present example describes such a library. The library comprises 10 sub-libraries, each designed around one of the 10 exemplary VK chassis depicted in Table 11. Clearly, the approach exemplified here can be generalized to consider M different chassis, where M may be less than or more than 10.

To characterize the variability within the polypeptide segment occupying Kabat positions 89 to 95, human kappa light chain sequence collections derived from each of the ten germline sequences of Example 3 were aligned and compared separately (i.e., within the germline group). This analysis enabled us to discern the patterns of sequence variation at each individual position in each kappa light chain sequence, grouped by germline. The table below shows the results for sequences derived from IGKV1-39 (SEQ ID NO: 233).

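TABLE 30

Percent Occurrence of Amino Acid Types in IGKV1-39-Derived Sequences

| Amino Acid | P89 | P90 | P91 | P92 | P93 | P94 | P95 |
|---|---|---|---|---|---|---|---|
| A | 0 | 0 | 1 | 0 | 0 | 4 | 1 |
| C | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D | 0 | 0 | 1 | 1 | 3 | 0 | 0 |
| E | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| F | 0 | 0 | 0 | 5 | 0 | 2 | 0 |
| G | 0 | 0 | 2 | 1 | 2 | 0 | 0 |
| H | 1 | 1 | 0 | 4 | 0 | 0 | 0 |
| I | 0 | 0 | 1 | 0 | 4 | 5 | 1 |
| K | 0 | 0 | 0 | 1 | 2 | 0 | 0 |
| L | 3 | 0 | 0 | 1 | 1 | 3 | 7 |
| M | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| N | 0 | 0 | 3 | 2 | 6 | 2 | 0 |
| P | 0 | 0 | 0 | 0 | 0 | 4 | 85 |
| Q | 96 | 97 | 0 | 0 | 0 | 0 | 0 |
| R | 0 | 0 | 0 | 0 | 5 | 0 | 2 |
| S | 0 | 0 | 80 | 4 | 65 | 6 | 3 |
| T | 0 | 0 | 9 | 0 | 10 | 65 | 1 |
| V | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| W | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Y | 0 | 0 | 2 | 80 | 0 | 3 | 0 |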

For example, at position 89, two amino acids, Q and L, account for about 99% of the observed variability, and thus in the currently exemplified library (see below), only Q and L were included in position 89. In larger libraries, of course, additional, less frequently occurring amino acid types (e.g., H), may also be included.

Similarly, at position 93 there is more variation, with amino acid types S, T, N, R and I being among the most frequently occurring. The currently exemplified library thus aimed to include these five amino acids at position 93, although clearly others could be included in more diverse libraries. However, because this library was constructed via standard chemical oligonucleotide synthesis, one is bound by the limits of the genetic code, so that the actual amino acid set represented at position 93 of the exemplified library consists of S, T, N, R, P and H, with P and H replacing I (see exemplary 9 residue VKCDR3 in Table 32, below). This limitation may be overcome by using codon-based synthesis of oligonucleotides, as described in Example 6.3, below. A similar approach was followed at the other positions and for the other sequences: analysis of occurrences of amino acid type per position, choice from among most frequently occurring subset, followed by adjustment as dictated by the genetic code.
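The constraint imposed by the genetic code can be illustrated by translating a degenerate codon into the set of amino acids it encodes. The sketch below uses the degenerate codon MVT, which is the codon at position 93 in the oligonucleotides of Table 32, together with the IUPAC symbols of Table 31 and the standard genetic code; the function name is illustrative.

```python
from itertools import product

# IUPAC degenerate base symbols (see Table 31).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def encoded_amino_acids(degenerate_codon):
    """Expand a degenerate codon and return the sorted set of residues."""
    codons = product(*(IUPAC[symbol] for symbol in degenerate_codon))
    return sorted({CODON_TABLE["".join(c)] for c in codons})


print(encoded_amino_acids("MVT"))
```

The codon MVT yields H, N, P, R, S and T: the desired S, T, N and R are covered, but I is not, and P and H are obtained in its place, as described above.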

As indicated above, the library employs a practical and facile synthesis approach using standard oligonucleotide synthesis instrumentation and degenerate oligonucleotides. To facilitate description of the library, the IUPAC code for degenerate nucleotides, as given in Table 31, will be used.

TABLE 31

Degenerate Base Symbol Definition

| IUPAC Symbol | Base Pair Composition |
|---|---|
| A | A (100%) |
| C | C (100%) |
| G | G (100%) |
| T | T (100%) |
| R | A (50%) G (50%) |
| Y | C (50%) T (50%) |
| W | A (50%) T (50%) |
| S | C (50%) G (50%) |
| M | A (50%) C (50%) |
| K | G (50%) T (50%) |
| B | C (33%) G (33%) T (33%) (*) |
| D | A (33%) G (33%) T (33%) |
| H | A (33%) C (33%) T (33%) |
| V | A (33%) C (33%) G (33%) |
| N | A (25%) C (25%) G (25%) T (25%) |

(*) 33% is shorthand here for ⅓ (i.e., 33.3333 . . . %)

Using the VK1-39 chassis with VKCDR3 of length nine as an example, the VKCDR3 library may be represented by the following four oligonucleotides (left column in Table 32), with the corresponding amino acids encoded at each position of CDRL3 (Kabat numbering) provided in the columns on the right.

TABLE 32

Exemplary Oligonucleotides Encoding a VK1-39 CDR3 Library

| Oligonucleotide Sequence | Oligonucleotide SEQ ID NO: | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 95A | 96 | 97 | Amino Acid SEQ ID NO: |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CWGSAAWCATHCMVTABTCCTTWCACT | 307 | LQ | EQ | ST | FSY | HNPRST | IST | P | — | FY | T | 1393 |
| CWGSAAWCATHCMVTABTCCTMTCACT | 308 | LQ | EQ | ST | FSY | HNPRST | IST | P | — | IL | T | 1394 |
| CWGSAAWCATHCMVTABTCCTWGGACT | 309 | LQ | EQ | ST | FSY | HNPRST | IST | P | — | WR | T | 1395 |
| CWGSAAWCATHCMVTABTCCTCBTACT | 310 | LQ | EQ | ST | FSY | HNPRST | IST | P | PLR | — | T | 1396 |

For example, the first codon (CWG) of the first oligonucleotide of Table 32, corresponding to Kabat position 89, represents 50% CTG and 50% CAG, which encode Leu (L) and Gln (Q), respectively. Thus, the expressed polypeptide would be expected to have L and Q each about 50% of the time. Similarly, for Kabat position 95A of the fourth oligonucleotide, the codon CBT represents ⅓ each of CCT, CGT and CTT, corresponding in turn to ⅓ each of Pro (P), Arg (R) and Leu (L) upon translation. By multiplying the number of options available at each position of the peptide sequence, one can obtain the complexity, in peptide space, contributed by each oligonucleotide. For the VK1-39 example above, the numbers are 864 for each of the first three oligonucleotides and 1,296 for the fourth oligonucleotide. Thus, the oligonucleotides encoding VK1-39 CDR3s of length nine contribute 3,888 members to the library. However, as shown in Table 32, sequences with L or R at position 95A (when position 96 is empty) are identical to those with L or R at position 96 (and 95A empty). Therefore, the 3,888 number overestimates the LR contribution and the actual number of unique members is lower, at 3,024. As depicted in Table 33, for the complete list of oligonucleotides that represent VKCDR3 of sizes 8, 9, and 10, for all 10 VK chassis, the overall complexity is about 1.3×10⁵, or 1.2×10⁵ unique sequences after correcting for over-counting of the LR contribution for the size 9 VKCDR3.
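The complexity figures quoted above for the four VK1-39 oligonucleotides of Table 32 can be reproduced by expanding each degenerate oligonucleotide into the peptides it encodes. The following is a minimal sketch under the standard genetic code; the helper name `encoded_peptides` is illustrative.

```python
from itertools import product

# IUPAC degenerate base symbols (see Table 31).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def encoded_peptides(oligo):
    """Return the set of distinct peptides encoded by a degenerate oligo."""
    per_position = []
    for i in range(0, len(oligo), 3):
        codons = product(*(IUPAC[symbol] for symbol in oligo[i:i + 3]))
        per_position.append(sorted({CODON_TABLE["".join(c)] for c in codons}))
    return {"".join(peptide) for peptide in product(*per_position)}


# The four VK1-39 length-nine oligonucleotides of Table 32.
oligos = [
    "CWGSAAWCATHCMVTABTCCTTWCACT",
    "CWGSAAWCATHCMVTABTCCTMTCACT",
    "CWGSAAWCATHCMVTABTCCTWGGACT",
    "CWGSAAWCATHCMVTABTCCTCBTACT",
]

per_oligo = [len(encoded_peptides(o)) for o in oligos]
unique_members = set().union(*(encoded_peptides(o) for o in oligos))

print(per_oligo, sum(per_oligo), len(unique_members))
```

The per-oligonucleotide counts are 864, 864, 864 and 1,296 (3,888 in total). Because peptides with L or R contributed by the fourth oligonucleotide coincide with peptides from the second and third, the union contains 3,024 unique members.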

TABLE 33

Degenerate Oligonucleotides Encoding an Exemplary VKCDR3 Library

| Chassis | CDRL3 Length | Junction Type (1) | Degenerate Oligonucleotide | Oligonucleotide SEQ ID NO: | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 95A | 96 | 97 | Amino Acid SEQ ID NO: |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VK1-5 | 8 | 1 | CASCASTMCVRTRSTTWCTWCACT | 259 | HQ | HQ | SY | DGHNRS | AGST | FY | — | — | FY | T | 1397 |
| VK1-5 | 8 | 2 | CASCASTMCVRTRSTTWCMTCACT | 260 | HQ | HQ | SY | DGHNRS | AGST | FY | — | — | IL | T | 1398 |
| VK1-5 | 8 | 3 | CASCASTMCVRTRSTTWCWGGACT | 261 | HQ | HQ | SY | DGHNRS | AGST | FY | — | — | WR | T | 1399 |
| VK1-5 | 8 | 4 | CASCASTMCVRTRSTTWCYCTACT | 262 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | — | — | T | 1400 |
| VK1-5 | 9 | 1 | CASCASTMCVRTRSTTWCYCTTWCACT | 263 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | — | FY | T | 1401 |
| VK1-5 | 9 | 2 | CASCASTMCVRTRSTTWCYCTMTCACT | 264 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | — | IL | T | 1402 |
| VK1-5 | 9 | 3 | CASCASTMCVRTRSTTWCYCTWGGACT | 265 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | — | WR | T | 1403 |
| VK1-5 | 9 | 4 | CASCASTMCVRTRSTTWCYCTYCTACT | 266 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | PS | — | T | 1404 |
| VK1-5 | 10 | 1 | CASCASTMCVRTRSTTWCYCTCBTTWCACT | 267 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | PLR | FY | T | 1405 |
| VK1-5 | 10 | 2 | CASCASTMCVRTRSTTWCYCTCBTMTCACT | 268 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | PLR | IL | T | 1406 |
| VK1-5 | 10 | 3 | CASCASTMCVRTRSTTWCYCTCBTWGGACT | 269 | HQ | HQ | SY | DGHNRS | AGST | FY | PS | PLR | WR | T | 1407 |
| VK1-12 | 8 | 1 | CASCASDCTRVCARTTTSTWCACT | 270 | HQ | HQ | AST | ADGNST | NS | FL | — | — | FY | T | 1408 |
| VK1-12 | 8 | 2 | CASCASDCTRVCARTTTSMTCACT | 271 | HQ | HQ | AST | ADGNST | NS | FL | — | — | IL | T | 1409 |
| VK1-12 | 8 | 3 | CASCASDCTRVCARTTTSWGGACT | 272 | HQ | HQ | AST | ADGNST | NS | FL | — | — | WR | T | 1410 |
| VK1-12 | 8 | 4 | CASCASDCTRVCARTTTSCCTACT | 273 | HQ | HQ | AST | ADGNST | NS | FL | P | — | — | T | 1411 |
| VK1-12 | 9 | 1 | CASCASDCTRVCARTTTSCCTTWCACT | 274 | HQ | HQ | AST | ADGNST | NS | FL | P | — | FY | T | 1412 |
| VK1-12 | 9 | 2 | CASCASDCTRVCARTTTSCCTMTCACT | 275 | HQ | HQ | AST | ADGNST | NS | FL | P | — | IL | T | 1413 |
| VK1-12 | 9 | 3 | CASCASDCTRVCARTTTSCCTWGGACT | 276 | HQ | HQ | AST | ADGNST | NS | FL | P | — | WR | T | 1414 |
| VK1-12 | 9 | 4 | CASCASDCTRVCARTTTSCCTCBTACT | 277 | HQ | HQ | AST | ADGNST | NS | FL | P | PLR | — | T | 1415 |
| VK1-12 | 10 | 1 | CASCASDCTRVCARTTTSCCTCBTTWCACT | 278 | HQ | HQ | AST | ADGNST | NS | FL | P | PLR | FY | T | 1416 |
| VK1-12 | 10 | 2 | CASCASDCTRVCARTTTSCCTCBTMTCACT | 279 | HQ | HQ | AST | ADGNST | NS | FL | P | PLR | IL | T | 1417 |
| VK1-12 | 10 | 3 | CASCASDCTRVCARTTTSCCTCBTWGGACT | 280 | HQ | HQ | AST | ADGNST | NS | FL | P | PLR | WR | T | 1525 |
| VK1-27 | 8 | 1 | CASMAGTWCRRTASKGBATWCACT | 281 | HQ | KQ | FY | DGNS | RST | AGV | — | — | FY | T | 1418 |
| VK1-27 | 8 | 2 | CASMAGTWCRRTASKGBAMTCACT | 282 | HQ | KQ | FY | DGNS | RST | AGV | — | — | IL | T | 1419 |
| VK1-27 | 8 | 3 | CASMAGTWCRRTASKGBAWGGACT | 283 | HQ | KQ | FY | DGNS | RST | AGV | — | — | WR | T | 1420 |
| VK1-27 | 8 | 4 | CASMAGTWCRRTASKGBACCTACT | 284 | HQ | KQ | FY | DGNS | RST | AGV | P | — | — | T | 1421 |
| VK1-27 | 9 | 1 | CASMAGTWCRRTASKGBACCTTWCACT | 285 | HQ | KQ | FY | DGNS | RST | AGV | P | — | FY | T | 1422 |
| VK1-27 | 9 | 2 | CASMAGTWCRRTASKGBACCTMTCACT | 286 | HQ | KQ | FY | DGNS | RST | AGV | P | — | IL | T | 1423 |
| VK1-27 | 9 | 3 | CASMAGTWCRRTASKGBACCTWGGACT | 287 | HQ | KQ | FY | DGNS | RST | AGV | P | — | WR | T | 1424 |
| VK1-27 | 9 | 4 | CASMAGTWCRRTASKGBACCTCBTACT | 288 | HQ | KQ | FY | DGNS | RST | AGV | P | PLR | — | T | 1425 |
| VK1-27 | 10 | 1 | CASMAGTWCRRTASKGBACCTCBTTWCACT | 289 | HQ | KQ | FY | DGNS | RST | AGV | P | PLR | FY | T | 1426 |
| VK1-27 | 10 | 2 | CASMAGTWCRRTASKGBACCTCBTMTCACT | 290 | HQ | KQ | FY | DGNS | RST | AGV | P | PLR | IL | T | 1427 |
| VK1-27 | 10 | 3 | CASMAGTWCRRTASKGBACCTCBTWGGACT | 291 | HQ | KQ | FY | DGNS | RST | AGV | P | PLR | WR | T | 1428 |
| VK1-33 | 8 | 1 | CASCWTTMCRATRVCBWTTWCACT | 292 | HQ | HL | SY | DN | ADGNST | DFHLVY | — | — | FY | T | 1429 |
| VK1-33 | 8 | 2 | CASCWTTMCRATRVCBWTMTCACT | 293 | HQ | HL | SY | DN | ADGNST | DFHLVY | — | — | IL | T | 1430 |
| VK1-33 | 8 | 3 | CASCWTTMCRATRVCBWTWGGACT | 294 | HQ | HL | SY | DN | ADGNST | DFHLVY | — | — | WR | T | 1431 |
| VK1-33 | 8 | 4 | CASCWTTMCRATRVCBWTCCTACT | 295 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | — | — | T | 1432 |
| VK1-33 | 9 | 1 | CASCWTTMCRATRVCBWTCCTTWCACT | 296 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | — | FY | T | 1433 |
| VK1-33 | 9 | 2 | CASCWTTMCRATRVCBWTCCTMTCACT | 297 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | — | IL | T | 1434 |
| VK1-33 | 9 | 3 | CASCWTTMCRATRVCBWTCCTWGGACT | 298 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | — | WR | T | 1435 |
| VK1-33 | 9 | 4 | CASCWTTMCRATRVCBWTCCTCBTACT | 299 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | PLR | — | T | 1436 |
| VK1-33 | 10 | 1 | CASCWTTMCRATRVCBWTCCTCBTTWCACT | 300 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | PLR | FY | T | 1437 |
| VK1-33 | 10 | 2 | CASCWTTMCRATRVCBWTCCTCBTMTCACT | 301 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | PLR | IL | T | 1438 |
| VK1-33 | 10 | 3 | CASCWTTMCRATRVCBWTCCTCBTWGGACT | 302 | HQ | HL | SY | DN | ADGNST | DFHLVY | P | PLR | WR | T | 1439 |
| VK1-39 | 8 | 1 | CWGSAAWCATHCMVTABTTWCACT | 303 | LQ | EQ | ST | FSY | HNPRST | IST | — | — | FY | T | 1440 |
| VK1-39 | 8 | 2 | CWGSAAWCATHCMVTABTMTCACT | 304 | LQ | EQ | ST | FSY | HNPRST | IST | — | — | IL | T | 1441 |
| VK1-39 | 8 | 3 | CWGSAAWCATHCMVTABTWGGACT | 305 | LQ | EQ | ST | FSY | HNPRST | IST | — | — | WR | T | 1526 |
| VK1-39 | 8 | 4 | CWGSAAWCATHCMVTABTCCTACT | 306 | LQ | EQ | ST | FSY | HNPRST | IST | P | — | — | T | 1442 |
| VK1-39 | 9 | 1 | CWGSAAWCATHCMVTABTCCTTWCACT | 307 | LQ | EQ | ST | FSY | HNPRST | IST | P | — | FY | T | 1443 |
| VK1-39 | 9 | 2 | CWGSAAWCATHCMVTABTCCTMTCACT | 308 | LQ | EQ | ST | FSY | HNPRST | IST | P | — | IL | T | 1444 |
| VK1-39 | 9 | 3 | CWGSAAWCATHCMVTABTCCTWGGACT | 309 | LQ | EQ | ST | FSY | HNPRST | IST | P | — | WR | T | 1445 |
| VK1-39 | 9 | 4 | CWGSAAWCATHCMVTABTCCTCBTACT | 310 | LQ | EQ | ST | FSY | HNPRST | IST | P | PLR | — | T | 1446 |
| VK1-39 | 10 | 1 | CWGSAAWCATHCMVTABTCCTCBTTWCACT | 311 | LQ | EQ | ST | FSY | HNPRST | IST | P | PLR | FY | T | 1447 |
| VK1-39 | 10 | 2 | CWGSAAWCATHCMVTABTCCTCBTMTCACT | 312 | LQ | EQ | ST | FSY | HNPRST | IST | P | PLR | IL | T | 1448 |
| VK1-39 | 10 | 3 | CWGSAAWCATHCMVTABTCCTCBTWGGACT | 313 | LQ | EQ | ST | FSY | HNPRST | IST | P | PLR | WR | T | 1449 |
| VK3-11 | 8 | 1 | CASCASAGWRGKRVCTSGTWCACT | 314 | HQ | HQ | RS | GRS | ADGNST | SW | — | — | FY | T | 1450 |
| VK3-11 | 8 | 2 | CASCASAGWRGKRVCTSGMTCACT | 315 | HQ | HQ | RS | GRS | ADGNST | SW | — | — | IL | T | 1451 |
| VK3-11 | 8 | 3 | CASCASAGWRGKRVCTSGWGGACT | 316 | HQ | HQ | RS | GRS | ADGNST | SW | — | — | WR | T | 1452 |
| VK3-11 | 8 | 4 | CASCASAGWRGKRVCTSGCCTACT | 317 | HQ | HQ | RS | GRS | ADGNST | SW | P | — | — | T | 1453 |
| VK3-11 | 9 | 1 | CASCASAGWRGKRVCTSGCCTTWCACT | 318 | HQ | HQ | RS | GRS | ADGNST | SW | P | — | FY | T | 1454 |
| VK3-11 | 9 | 2 | CASCASAGWRGKRVCTSGCCTMTCACT | 319 | HQ | HQ | RS | GRS | ADGNST | SW | P | — | IL | T | 1455 |
| VK3-11 | 9 | 3 | CASCASAGWRGKRVCTSGCCTWGGACT | 320 | HQ | HQ | RS | GRS | ADGNST | SW | P | — | WR | T | 1456 |
| VK3-11 | 9 | 4 | CASCASAGWRGKRVCTSGCCTCBTACT | 321 | HQ | HQ | RS | GRS | ADGNST | SW | P | PLR | — | T | 1457 |
| VK3-11 | 10 | 1 | CASCASAGWRGKRVCTSGCCTCBTTWCACT | 322 | HQ | HQ | RS | GRS | ADGNST | SW | P | PLR | FY | T | 1458 |
| VK3-11 | 10 | 2 | CASCASAGWRGKRVCTSGCCTCBTMTCACT | 323 | HQ | HQ | RS | GRS | ADGNST | SW | P | PLR | IL | T | 1459 |
| VK3-11 | 10 | 3 | CASCASAGWRGKRVCTSGCCTCBTWGGACT | 324 | HQ | HQ | RS | GRS | ADGNST | SW | P | PLR | WR | T | 1460 |
| VK3-15 | 8 | 1 | CASCASTMCVRTRRKTGGTWCACT | 325 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | — | — | FY | T | 1461 |
| VK3-15 | 8 | 2 | CASCASTMCVRTRRKTGGMTCACT | 326 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | — | — | IL | T | 1462 |
| VK3-15 | 8 | 3 | CASCASTMCVRTRRKTGGWGGACT | 327 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | — | — | WR | T | 1463 |
| VK3-15 | 8 | 4 | CASCASTMCVRTRRKTGGCCTACT | 328 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | — | — | T | 1464 |
| VK3-15 | 9 | 1 | CASCASTMCVRTRRKTGGCCTTWCACT | 329 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | — | FY | T | 1465 |
| VK3-15 | 9 | 2 | CASCASTMCVRTRRKTGGCCTMTCACT | 330 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | — | IL | T | 1466 |
| VK3-15 | 9 | 3 | CASCASTMCVRTRRKTGGCCTWGGACT | 331 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | — | WR | T | 1467 |
| VK3-15 | 9 | 4 | CASCASTMCVRTRRKTGGCCTCBTACT | 332 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | PLR | — | T | 1468 |
| VK3-15 | 10 | 1 | CASCASTMCVRTRRKTGGCCTCBTTWCACT | 333 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | PLR | FY | T | 1469 |
| VK3-15 | 10 | 2 | CASCASTMCVRTRRKTGGCCTCBTMTCACT | 334 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | PLR | IL | T | 1470 |
| VK3-15 | 10 | 3 | CASCASTMCVRTRRKTGGCCTCBTWGGACT | 335 | HQ | HQ | SY | DGHNRS | DEGKNRS | W | P | PLR | WR | T | 1471 |
| VK3-20 | 8 | 1 | CASCASTWCGRTRVKKCATWCACT | 336 | HQ | HQ | FY | DG | ADEGKNRST | AS | — | — | FY | T | 1472 |
| VK3-20 | 8 | 2 | CASCASTWCGRTRVKKCAMTCACT | 337 | HQ | HQ | FY | DG | ADEGKNRST | AS | — | — | IL | T | 1473 |
| VK3-20 | 8 | 3 | CASCASTWCGRTRVKKCAWGGACT | 338 | HQ | HQ | FY | DG | ADEGKNRST | AS | — | — | WR | T | 1474 |
| VK3-20 | 8 | 4 | CASCASTWCGRTRVKKCACCTACT | 339 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | — | — | T | 1475 |
| VK3-20 | 9 | 1 | CASCASTWCGRTRVKKCACCTTWCACT | 340 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | — | FY | T | 1476 |
| VK3-20 | 9 | 2 | CASCASTWCGRTRVKKCACCTMTCACT | 341 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | — | IL | T | 1477 |
| VK3-20 | 9 | 3 | CASCASTWCGRTRVKKCACCTWGGACT | 342 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | — | WR | T | 1478 |
| VK3-20 | 9 | 4 | CASCASTWCGRTRVKKCACCTCBTACT | 343 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | PLR | — | T | 1479 |
| VK3-20 | 10 | 1 | CASCASTWCGRTRVKKCACCTCBTTWCACT | 344 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | PLR | FY | T | 1480 |
| VK3-20 | 10 | 2 | CASCASTWCGRTRVKKCACCTCBTMTCACT | 345 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | PLR | IL | T | 1481 |
| VK3-20 | 10 | 3 | CASCASTWCGRTRVKKCACCTCBTWGGACT | 346 | HQ | HQ | FY | DG | ADEGKNRST | AS | P | PLR | WR | T | 1482 |
| VK2-28 | 8 | 1 | ATGCASRBTCKTSASABTTWCACT | 347 | M | HQ | AGISTV | LR | DEHQ | IST | — | — | FY | T | 1483 |
| VK2-28 | 8 | 2 | ATGCASRBTCKTSASABTMTCACT | 348 | M | HQ | AGISTV | LR | DEHQ | IST | — | — | IL | T | 1484 |
| VK2-28 | 8 | 3 | ATGCASRBTCKTSASABTWGGACT | 349 | M | HQ | AGISTV | LR | DEHQ | IST | — | — | WR | T | 1485 |
| VK2-28 | 8 | 4 | ATGCASRBTCKTSASABTCCTACT | 350 | M | HQ | AGISTV | LR | DEHQ | IST | P | — | — | T | 1486 |
| VK2-28 | 9 | 1 | ATGCASRBTCKTSASABTCCTTWCACT | 351 | M | HQ | AGISTV | LR | DEHQ | IST | P | — | FY | T | 1487 |
| VK2-28 | 9 | 2 | ATGCASRBTCKTSASABTCCTMTCACT | 352 | M | HQ | AGISTV | LR | DEHQ | IST | P | — | IL | T | 1488 |
| VK2-28 | 9 | 3 | ATGCASRBTCKTSASABTCCTWGGACT | 353 | M | HQ | AGISTV | LR | DEHQ | IST | P | — | WR | T | 1489 |
| VK2-28 | 9 | 4 | ATGCASRBTCKTSASABTCCTCBTACT | 354 | M | HQ | AGISTV | LR | DEHQ | IST | P | PLR | — | T | 1490 |
| VK2-28 | 10 | 1 | ATGCASRBTCKTSASABTCCTCBTTWCACT | 355 | M | HQ | AGISTV | LR | DEHQ | IST | P | PLR | FY | T | 1491 |
| VK2-28 | 10 | 2 | ATGCASRBTCKTSASABTCCTCBTMTCACT | 356 | M | HQ | AGISTV | LR | DEHQ | IST | P | PLR | IL | T | 1492 |
| VK2-28 | 10 | 3 | ATGCASRBTCKTSASABTCCTCBTWGGACT | 357 | M | HQ | AGISTV | LR | DEHQ | IST | P | PLR | WR | T | 1493 |
| VK4-1 | 8 | 1 | CASCASTWCTWCRVCABTTWCACT | 358 | HQ | HQ | FY | FY | ADGNST | IST | — | — | FY | T | 1494 |
| VK4-1 | 8 | 2 | CASCASTWCTWCRVCABTMTCACT | 359 | HQ | HQ | FY | FY | ADGNST | IST | — | — | IL | T | 1495 |
| VK4-1 | 8 | 3 | CASCASTWCTWCRVCABTWGGACT | 360 | HQ | HQ | FY | FY | ADGNST | IST | — | — | WR | T | 1496 |
| VK4-1 | 8 | 4 | CASCASTWCTWCRVCABTCCTACT | 361 | HQ | HQ | FY | FY | ADGNST | IST | P | — | — | T | 1497 |
| VK4-1 | 9 | 1 | CASCASTWCTWCRVCABTCCTTWCACT | 362 | HQ | HQ | FY | FY | ADGNST | IST | P | — | FY | T | 1498 |
| VK4-1 | 9 | 2 | CASCASTWCTWCRVCABTCCTMTCACT | 363 | HQ | HQ | FY | FY | ADGNST | IST | P | — | IL | T | 1499 |
| VK4-1 | 9 | 3 | CASCASTWCTWCRVCABTCCTWGGACT | 364 | HQ | HQ | FY | FY | ADGNST | IST | P | — | WR | T | 1500 |
| VK4-1 | 9 | 4 | CASCASTWCTWCRVCABTCCTCBTACT | 365 | HQ | HQ | FY | FY | ADGNST | IST | P | PLR | — | T | 1501 |
| VK4-1 | 10 | 1 | CASCASTWCTWCRVCABTCCTCBTTWCACT | 366 | HQ | HQ | FY | FY | ADGNST | IST | P | PLR | FY | T | 1502 |
| VK4-1 | 10 | 2 | CASCASTWCTWCRVCABTCCTCBTMTCACT | 367 | HQ | HQ | FY | FY | ADGNST | IST | P | PLR | IL | T | 1503 |
| VK4-1 | 10 | 3 | CASCASTWCTWCRVCABTCCTCBTWGGACT | 368 | HQ | HQ | FY | FY | ADGNST | IST | P | PLR | WR | T | 1504 |

[Alternative for VK1-33] (2)

| Chassis | CDRL3 Length | Junction Type (1) | Degenerate Oligonucleotide | Oligonucleotide SEQ ID NO: | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 95A | 96 | 97 | Amino Acid SEQ ID NO: |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VK1-33 | 8 | 1 | CASCWATMCRATRVCBWTTWCACT | 369 | HQ | QL | SY | DN | ADGNST | DFHLVY | — | — | FY | T | 1505 |
| VK1-33 | 8 | 2 | CASCWATMCRATRVCBWTMTCACT | 370 | HQ | QL | SY | DN | ADGNST | DFHLVY | — | — | IL | T | 1506 |
| VK1-33 | 8 | 3 | CASCWATMCRATRVCBWTWGGACT | 371 | HQ | QL | SY | DN | ADGNST | DFHLVY | — | — | WR | T | 1507 |
| VK1-33 | 8 | 4 | CASCWATMCRATRVCBWTCCTACT | 372 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | — | — | T | 1508 |
| VK1-33 | 9 | 1 | CASCWATMCRATRVCBWTCCTTWCACT | 373 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | — | FY | T | 1509 |
| VK1-33 | 9 | 2 | CASCWATMCRATRVCBWTCCTMTCACT | 374 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | — | IL | T | 1510 |
| VK1-33 | 9 | 3 | CASCWATMCRATRVCBWTCCTWGGACT | 375 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | — | WR | T | 1511 |
| VK1-33 | 9 | 4 | CASCWATMCRATRVCBWTCCTCBTACT | 376 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | PLR | — | T | 1512 |
| VK1-33 | 10 | 1 | CASCWATMCRATRVCBWTCCTCBTTWCACT | 377 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | PLR | FY | T | 1513 |
| VK1-33 | 10 | 2 | CASCWATMCRATRVCBWTCCTCBTMTCACT | 378 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | PLR | IL | T | 1514 |
| VK1-33 | 10 | 3 | CASCWATMCRATRVCBWTCCTCBTWGGACT | 379 | HQ | QL | SY | DN | ADGNST | DFHLVY | P | PLR | WR | T | 1515 |

(1) Junction type 1 has position 96 as FY, type 2 as IL, type 3 as RW, and type 4 has a deletion.

(2) Two embodiments are shown for the VK1-33 library. In one embodiment, the second codon was CWT. In another embodiment, it was CWA or CWG.

Example 6.3: More Complex VKCDR3 Libraries

This example demonstrates how a more faithful representation of amino acid variation at each position may be obtained by using a codon-based synthesis approach (Virnekas et al. Nucleic Acids Res., 1994, 22: 5600). This synthetic scheme also allows for finer control of the proportions of particular amino acids included at a position. For example, as described above for the VK1-39 sequences, position 89 was designed as 50% Q and 50% L; however, as Table 30 shows, Q is used much more frequently than L. The more complex VKCDR3 libraries of the present example account for the different relative occurrence of Q and L, for example, 90% Q and 10% L. Such control is better exercised within codon-based synthetic schemes, especially when multiple amino acid types are considered.
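The difference between a fixed 50:50 degenerate codon and a codon-based mixture with controlled proportions can be illustrated in silico. The sketch below simply draws residues for position 89 from a 90:10 Q:L mixture; it is a simulation of the intended proportions only, not a model of the synthesis chemistry, and the seed and sample size are arbitrary assumptions.

```python
import random

# Target proportions at position 89 for a codon-based mixture.
weights = {"Q": 0.90, "L": 0.10}

rng = random.Random(0)   # fixed seed so the sketch is reproducible
draws = rng.choices(list(weights), weights=list(weights.values()), k=10_000)

fraction_q = draws.count("Q") / len(draws)
fraction_l = draws.count("L") / len(draws)
print(f"Q: {fraction_q:.3f}  L: {fraction_l:.3f}")
```

With a degenerate codon such as CWG the two residues would instead each appear about half of the time, regardless of their natural frequencies.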

This example also describes an implementation of a codon-based synthetic scheme, using the ten VK chassis described in Table 11. Similar approaches, of course, can be implemented with more or fewer such chassis. As indicated in the Detailed Description, a unique aspect of the design of the present libraries, as well as those of the preceding examples, is the germline or chassis-based aspect, which is meant to preserve more of the integrity and variation of actual human kappa light chain sequences. This is in contrast to other codon-based synthesis or degenerate oligonucleotide synthesis approaches that have been described in the literature and that aim to produce “one-size-fits-all” (e.g., consensus) kappa light chain libraries (e.g., Knappik, et al., J Mol Biol. 2000, 296: 57, Akamatsu et al., J Immunol, 1993, 151: 4651).

With reference to Table 30, obtained for VK1-39, one can thus design the length nine VKCDR3 library of Table 34. Here, for practical reasons, the proportions at each position are denoted in multiples of five percentage points. As better synthetic schemes are developed, finer resolution may be obtained—for example to resolutions of one, two, three, or four percent.

TABLE 34

Amino Acid Composition (%) at Each VKCDR3 Position

for VK1-39 Library With CDR Length of Nine Residues

Amino Acid
89
90
91
92
93
94
95
96 (*)
97 (*)

A

5

5

D

5
5

E

5

5

F

5

10

G

5
5
5

5

H
5
5

5

5

I

5

5

K

5

L
10

5
10
20

M

N

0

0

5

0

5

P

5
85
5

Q
85
90

5

R

5

5
10

S

80
5
60
5

5

T

10

10
65

90

V

5

W

15

Y

5
75

5

15

Number

3

3

4

6

8

8

3

11

3

Different

(*) The composition of positions 96 and 97, determined largely by junction and IGKJ diversity, could be the same for length 9 VK CDR3 of all chassis.

The library of Table 34 would have 1.37×10⁶ unique polypeptide sequences, calculated by multiplying together the numbers in the bottom row of the table.

The underlined 0 entries for Asn (N) at certain positions represent regions where the possibility of having N-linked glycosylation sites in the VKCDR3 has been minimized or eliminated. Peptide sequences with the pattern N-X-(S or T)-Z, where X and Z are different from P, may undergo post-translational modification in a number of expression systems, including yeast and mammalian cells. Moreover, the nature of such modification depends on the specific cell type and, even for a given cell type, on culture conditions. N-linked glycosylation may be disadvantageous when it occurs in a region of the antibody molecule likely to be involved in antigen binding (e.g., a CDR), as the function of the antibody may then be influenced by factors that may be difficult to control. For example, considering position 91 above, one can observe that position 92 is never P. Position 94 is not P in 95% of the cases. However, position 93 is S or T in 75% (65+10) of the cases. Thus, allowing N at position 91 would generate the undesirable motif N-X-(T/S)-Z (with both X and Z distinct from P), and a zero occurrence has therefore been implemented, even though N is observed with some frequency in actual human sequences (see Table 30). A similar argument applies for N at positions 92 and 94. It should be appreciated, however, that if the antibody library were to be expressed in a system incapable of N-linked glycosylation, such as bacteria, or under culture conditions in which N-linked glycosylation did not occur, this consideration may not apply. 
However, even in the event that the organism used to express libraries with potential N-linked glycosylation sites is incapable of N-linked glycosylation (e.g., bacteria), it may still be desirable to avoid N-X-(S/T) sequences as the antibodies isolated from such libraries may be expressed in different systems (e.g., yeast, mammalian cells) later (e.g., toward clinical development), and the presence of carbohydrate moieties in the variable domains, and the CDRs in particular, may lead to unwanted modifications of activity. These embodiments are also included within the scope of the invention. To our knowledge, VKCDR3 libraries known in the art have not considered this effect, and thus a proportion of their members may have the undesirable qualities mentioned above.
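The motif discussed above can be screened for computationally. The following Python sketch is illustrative only and is not part of the design procedure described herein; the function names and the example peptide strings are hypothetical. It reports every occurrence of the pattern N-X-(S or T)-Z, where X and Z are any residue other than P.

```python
import re

# N-X-(S/T)-Z with X and Z different from P. The lookahead wrapper lets
# overlapping occurrences be reported individually.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def find_sequons(seq: str) -> list[int]:
    """Return 0-based start indices of N-X-(S/T)-Z motifs (X, Z not P)."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


def has_sequon(seq: str) -> bool:
    """True if the peptide contains at least one potential N-linked site."""
    return bool(find_sequons(seq))


# Hypothetical peptides, chosen only to exercise the pattern.
examples = ["QQNYSTPLT", "QQNPSTPLT", "QQNYSP", "NNSTA"]
for pep in examples:
    print(pep, find_sequons(pep))
```

A candidate CDR3 would normally be scanned together with its flanking framework residues, since the Z position may fall outside the CDR itself.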

We also designed additional sub-libraries, related to the library outlined in Table 34, for VKCDR3 of lengths 8 and 10. In these embodiments, the compositions at positions 89 to 94 and 97 remain the same as those depicted in Table 34. Additional diversity, introduced at positions 95 and 95A (the latter being defined for VKCDR3 of length 10 only), is illustrated in Table 35.

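TABLE 35

Amino Acid Composition (%) for VK1-39 Libraries of Lengths 8 and 10

| Amino Acid | Position 95 - Length 8 (*) | Position 95 - Length 10 (**) | Position 95A - Length 10 |
|---|---|---|---|
| A | | | |
| D | | | |
| E | | | |
| F | 5 | | |
| G | | | 5 |
| H | | | |
| I | 10 | | 5 |
| K | | | |
| L | 20 | 10 | 10 |
| M | | | |
| N | | | |
| P | 25 | 85 | 60 |
| Q | | | |
| R | 10 | 5 | 10 |
| S | 5 | | 5 |
| T | | | 5 |
| V | 5 | | |
| W | 10 | | |
| Y | 10 | | |
| Number Different | 9 | 3 | 8 |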

(*) Position 96 is deleted in VKCDR3 of size 8.

(**) This is the same composition as in VKCDR3 of size 9.

The total number of unique members in the VK1-39 library of length 8, thus, can be obtained as before, and is 3.73×10⁵ (or 3×3×4×6×8×8×9×3). Similarly, the complexity of the VK1-39 library of length 10 would be 10.9×10⁶ (or 8 times that of the library of length 9, as there is additional 8-fold variation at the insertion position 95A). Thus, there would be a total of 12.7×10⁶ unique members in the overall VK1-39 library, as obtained by summing the number of unique members for each of the specified lengths. In certain embodiments of the invention, it may be preferable to create the individual sub-libraries of lengths 8, 9 and 10 separately, and then mix the sub-libraries in proportions that reflect the length distribution of VKCDR3 in human sequences; for example, in ratios approximating the 1:9:2 distribution that occurs in natural VKCDR3 sequences (see FIG. 3). The present invention provides the compositions and methods for one of ordinary skill to synthesize VKCDR3 libraries corresponding to other VK chassis.
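The library sizes quoted above follow from multiplying the number of different amino acids allowed at each position. A minimal Python sketch of that arithmetic is given below; the variable names are illustrative, and the per-position counts are simply those stated in the bottom rows of Tables 34 and 35 and in the text.

```python
from math import prod

# Number of different amino acids per position, length 9 (Table 34,
# positions 89 through 97).
LEN9_COUNTS = [3, 3, 4, 6, 8, 8, 3, 11, 3]

# Length 8: positions 89-94 as above, position 95 from Table 35 (9 options),
# position 96 deleted, position 97 unchanged (3 options).
LEN8_COUNTS = [3, 3, 4, 6, 8, 8, 9, 3]

# Length 10: same as length 9, with the stated 8-fold variation at 95A.
N_OPTIONS_95A = 8

len9 = prod(LEN9_COUNTS)
len8 = prod(LEN8_COUNTS)
len10 = len9 * N_OPTIONS_95A
total = len8 + len9 + len10

print(f"length 8:  {len8:.3g}")
print(f"length 9:  {len9:.3g}")
print(f"length 10: {len10:.3g}")
print(f"total:     {total:.3g}")
```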

Example 7: A Minimalist Vλ CDR3 Library

This example describes the design of a minimalist VλCDR3 library. The principles used in designing this library (or more complex Vλ libraries) are similar to those used to design the VKCDR3 libraries. However, unlike the VK genes, the contribution of the IGλV segment to CDRL3 is not constrained to a fixed number of amino acids. Therefore, length variation may be obtained in a minimalist VλCDR3 library even when only considering combinations between Vλ chassis and Jλ sequences.

Examination of the VλCDR3 lengths of human sequences shows that lengths of 9 to 12 account for about 95% of sequences, and lengths of 8 to 12 account for about 97% of sequences (FIG. 4). Table 36 shows the usage (percent occurrence) of the six known IGλJ genes in the rearranged human lambda light chain sequences compiled from the NCBI database (see Appendix B), and Table 37 shows the sequences encoded by the genes.

TABLE 36

IGλJ Gene Usage in the Lambda Light Chain Sequences Compiled from the NCBI Database (see Appendix B)

| Gene_Allele | LUA |
|---|---|
| Jλ1_01 | 20.2% |
| Jλ2_01 | 42.2% |
| Jλ3_02 | 36.2% |
| Jλ6_01 | 0.6% |
| Jλ7_01 | 0.9% |

TABLE 37

Observed Human IGλJ Amino Acid Sequences

| Gene | Sequence | SEQ ID NO: |
|---|---|---|
| IGλJ1-01 | YVFGTGTKVTVL | 557 |
| IGλJ2-01 | VVFGGGTKLTVL | 558 |
| IGλJ3-01 | WVFGGGTKLTVL | 559 |
| IGλJ3-02 | VVFGGGTKLTVL | 560 |
| IGλJ6-01 | NVFGSGTKVTVL | 561 |
| IGλJ7-01 | AVFGGGTQLTVL | 562 |
| IGλJ7-02 | AVEGGGTQLTAL | 563 |

IGλJ3-01 and IGλJ7-02 are not represented among the sequences that were analyzed; therefore, they were not included in Table 36. As illustrated in Table 36, IGλJ1-01, IGλJ2-01, and IGλJ3-02 are over-represented in their usage, and have thus been bolded in Table 37. In some embodiments of the invention, for example, only these three over-represented sequences may be utilized. In other embodiments of the invention, all six segments, any 1, 2, 3, 4, or 5 of the 6 segments, or any combination thereof may be utilized.

As shown in Table 14, the portion of CDRL3 contributed by the IGλV gene segment is 7, 8, or 9 amino acids. The remainder of CDRL3 and FRM4 are derived from the IGλJ sequences (Table 37). The IGλJ sequences contribute either one or two amino acids to CDRL3. If two amino acids are contributed by IGλJ, the contribution is from the N-terminal two residues of the IGλJ segment: YV (IGλJ1-01), VV (IGλJ2-01), WV (IGλJ3-01), VV (IGλJ3-02), or AV (IGλJ7-01 and IGλJ7-02). If one amino acid is contributed from IGλJ, it is a V residue, which is formed after the deletion of the N-terminal residue of an IGλJ segment.

In this non-limiting exemplary embodiment of the invention, the FRM4 segment was fixed as FGGGTKLTVL, corresponding to IGλJ2-01 and IGλJ3-02 (i.e., portions of SEQ ID NOs: 558 and 560).

Seven of the 11 selected chassis (Vλ1-40 (SEQ ID NO: 531), Vλ3-19 (SEQ ID NO: 536), Vλ3-21 (SEQ ID NO: 537), Vλ6-57 (SEQ ID NO: 539), Vλ1-44 (SEQ ID NO: 532), Vλ1-51 (SEQ ID NO: 533), and Vλ4-69 (SEQ ID NO: 538)) have an additional two nucleotides following the last full codon. In four of those seven cases, analysis of the data set provided in Appendix B showed that the addition of a single nucleotide (i.e., without being limited by theory, via the activity of TdT) led to a further increase in CDRL3 length. This effect can be considered by introducing variants for the L3-Vλ sequences contributed by these four IGλV sequences (Table 38).

TABLE 38

Variants with an additional

residue in CDRL3

SEQ

CDR3/
ID

Name
Locus
FRM1
CDR1
FRM2
CDR2
FRM3
L3-Vλ
NO:

1E+
IGVλ1-
QSVL
TGSS
WYQQ
YGN-
GVPD
QSYD
564

40+
TQPP
SNIG
LPGT
---S
RFSG
SSLS

SVSG
AGYD
APKL
NRPS
SKSG
GS

APGQ
---V
LI

--TS

RVTI
H

ASLA

SC

ITGL

QAED

EADY

YC

3L+
IGVλ3-
SSEL
QGDS
WYQQ
YGK-
GIPD
NSRD
565

19+
TQDP
LRSY
KPGQ
---N
RESG
SSGN

AVSV
Y---
APVL
NRPS
SSSG
HH/Q

ALGQ
---A
VI

----

TVRI
S

NTAS

TC

ITIT

GAQA

EDEA

DYYC

3H+
IGVλ3-
SYVL
GGNN
WYQQ
YYD-
GIPE
QVWD
566

21+
TQPP
IGSK
KPGQ
---S
RFSG
SSSD

SVSV
S---
APVL
DRPS
SNSG
HP

APGK
---V
VI

----

TARI
H

NTAT

TC

LTIS

RVEA

GDEA

DYYC

6A+
IGVλ6-
NEML
TRSS
WYQQ
YED-
GVPD
QSYD
567

57+
TQPH
GSIA
RPGS
---N
RESG
SSN

SVSE
SNY-
SPTT
QRPS
SIDS

H/Q
-

SPGK
---V
VI

SSNS
-

TVTI
Q

ASLT

SC

ISGL

KTED

EADY

YC

(+) sequences are derived from their parents by the addition of an amino acid at the end of the respective CDR3 (bold underlined).

H/Q can be introduced in a single sequence by use of the degenerate codon CAW or similar.
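The effect of a degenerate codon such as CAW can be verified by expanding it into its constituent codons and translating each with the standard genetic code. The following Python sketch is illustrative; the function names are hypothetical, and only the standard IUPAC nucleotide ambiguity codes and the standard codon table are assumed.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered with bases T, C, A, G.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AAS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def expand_codon(degenerate: str) -> list[str]:
    """List every unambiguous codon represented by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate.upper()))]


def encoded_amino_acids(degenerate: str) -> set[str]:
    """Set of amino acids (one-letter codes) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand_codon(degenerate)}


print(expand_codon("CAW"), encoded_amino_acids("CAW"))
```

Under the standard code, CAW expands to CAA (Q) and CAT (H); other degenerate codons spanning one H codon and one Q codon (for example CAS, CAK, or CAM) encode the same pair.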

Thus, the final set of chassis in the currently exemplified embodiment of the invention is 15: eleven contributed by the chassis in Table 14 and an additional four contributed by the chassis of Table 38. The corresponding L3-Vλ domains of the 15 chassis contribute from 7 to 10 amino acids to CDRL3. When considering the amino acids contributed by the IGλJ sequences, the total variation in the length of CDRL3 is 8 to 12 amino acids, approximating the distribution in FIG. 4. Thus, in this exemplary embodiment of the invention, the minimalist Vλ library may be represented by the following: 15 chassis × 5 IGλJ-derived segments = 75 sequences. Here, the 15 chassis are Vλ1-40 (SEQ ID NO: 531), Vλ1-44 (SEQ ID NO: 532), Vλ1-51 (SEQ ID NO: 533), Vλ2-14 (SEQ ID NO: 534), Vλ3-1* (SEQ ID NO: 535), Vλ3-19 (SEQ ID NO: 536), Vλ3-21 (SEQ ID NO: 537), Vλ4-69 (SEQ ID NO: 538), Vλ6-57 (SEQ ID NO: 539), Vλ5-45 (SEQ ID NO: 540), Vλ7-43 (SEQ ID NO: 541), Vλ1-40+ (SEQ ID NO: 564), Vλ3-19+ (SEQ ID NO: 565), Vλ3-21+ (SEQ ID NO: 566), and Vλ6-57+ (SEQ ID NO: 567). The 5 IGλJ-derived segments are YVFGGGTKLTVL (IGλJ1; SEQ ID NO: 568), VVFGGGTKLTVL (IGλJ2; SEQ ID NO: 558), WVFGGGTKLTVL (IGλJ3; SEQ ID NO: 559), AVFGGGTKLTVL (IGλJ7; SEQ ID NO: 569), and -VFGGGTKLTVL (from any of the preceding sequences).
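The enumeration of this minimalist library can be sketched in a few lines of Python. The chassis names and IGλJ-derived segments below are taken from the text; the dictionary keys, the label "J-del" for the segment lacking its N-terminal residue, and the variable names are hypothetical. The sketch counts chassis/segment pairs and derives the range of CDRL3 lengths from the stated contributions (7 to 10 residues from L3-Vλ, with the remainder from IGλJ).

```python
from itertools import product

CHASSIS = [
    "Vλ1-40", "Vλ1-44", "Vλ1-51", "Vλ2-14", "Vλ3-1*", "Vλ3-19", "Vλ3-21",
    "Vλ4-69", "Vλ6-57", "Vλ5-45", "Vλ7-43",
    "Vλ1-40+", "Vλ3-19+", "Vλ3-21+", "Vλ6-57+",
]

# IGλJ-derived segments with FRM4 fixed as FGGGTKLTVL.
FRM4 = "FGGGTKLTVL"
J_SEGMENTS = {
    "IGλJ1": "YVFGGGTKLTVL",
    "IGλJ2": "VVFGGGTKLTVL",
    "IGλJ3": "WVFGGGTKLTVL",
    "IGλJ7": "AVFGGGTKLTVL",
    "J-del": "VFGGGTKLTVL",  # N-terminal residue deleted
}

# Every chassis paired with every IGλJ-derived segment.
library = list(product(CHASSIS, J_SEGMENTS))

# Residues each segment contributes to CDRL3 (whatever precedes FRM4).
j_contribution = {name: len(seq) - len(FRM4) for name, seq in J_SEGMENTS.items()}

# Range of CDRL3 lengths given 7 to 10 residues from L3-Vλ.
cdrl3_lengths = sorted({v + j for v in range(7, 11) for j in j_contribution.values()})

print(len(library), j_contribution, cdrl3_lengths)
```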

Example 8: Matching to “Reference” Antibodies

CDRH3 sequences of human antibodies of interest that are known in the art (e.g., antibodies that have been used in the clinic) have close counterparts in the designed library of the invention. A set of fifteen CDRH3 sequences from clinically relevant antibodies is presented in Table 39.

TABLE 39

CDRH3 Sequences of Reference Antibodies

| Antibody Name | Target | Origin | Status | CDRH3 sequence | SEQ ID NO: |
|---|---|---|---|---|---|
| CAB1 | TNF-α | Phage display-human library | FDA Approved | AKVSYLSTASSLDY | 380 |
| CAB2 | EGFR | Transgenic mouse | FDA Approved | VRDRVTGAFDI | 381 |
| CAB3 | IL-12/IL-23 | Phage display-human library | Phase III | KTHGSHDN | 382 |
| CAB4 | Interleukin-1-B | Transgenic mouse | Phase III | ARDLRTGPFDY | 383 |
| CAB5 | RANKL | Transgenic mouse | Phase III | AKDPGTTVIMSWEDP | 384 |
| CAB6 | IL-12/IL-23 | Transgenic mouse | Phase III | ARRRPGQGYFDF | 385 |
| CAB7 | TNF-α | Transgenic mouse | Phase III | ARDRGASAGGNYYYYGMDV | 386 |
| CAB8 | CTLA4 | Transgenic mouse | Phase III | ARDPRGATLYYYYYGMDV | 387 |
| CAB9 | CD20 | Transgenic mouse | Phase III | AKDIQYGNYYYGMDV | 388 |
| CAB10 | CD4 | Transgenic mouse | Phase III | ARVINWEDP | 389 |
| CAB11 | CTLA4 | Transgenic mouse | Phase III | ARTGWLGPEDY | 390 |
| CAB12 | IGF1-R | Transgenic mouse | Phase II | AKDLGWSDSYYYYYGMDV | 391 |
| CAB13 | EGFR | Transgenic mouse | Phase II | ARDGITMVRGVMKDYFDY | 392 |
| CAB14 | EGFR | Phage display-human library | Phase II | ARVSIFGVGTFDY | 393 |
| CAB15 | BLYS | Phage display-human library | Phase II | ARSRDLLLFPHHALSP | 394 |

Each of the above sequences was compared to each of the members of the library of Example 5, and the member, or members, with the same length and fewest number of amino acid mismatches was, or were, recorded. The results are summarized in Table 40, below. For most of the cases, matches with 80% identity or better were found in the exemplified CDRH3 library. To the extent that the specificity and binding affinity of each of these antibodies is influenced by their CDRH3 sequence, without being bound by theory, one or more of these library members could have measurable affinity to the relevant targets.

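TABLE 40

Match of Reference Antibody CDRH3 to Designed Library

| Antibody Name | Number of Mismatches (*) | Length | % Identity of Best Match |
|---|---|---|---|
| CAB1 | 5 | 14 | 64% |
| CAB2 | 2 | 11 | 82% |
| CAB3 | 4 | 8 | 50% |
| CAB4 | 2 | 11 | 82% |
| CAB5 | 3 | 15 | 80% |
| CAB6 | 3 | 12 | 75% |
| CAB7 | 2 | 20 | 90% |
| CAB8 | 0 | 19 | 100% |
| CAB9 | 3 | 15 | 80% |
| CAB10 | 1 | 9 | 89% |
| CAB11 | 1 | 11 | 91% |
| CAB12 | 2 | 18 | 89% |
| CAB13 | 2 | 18 | 89% |
| CAB14 | 1 | 13 | 92% |
| CAB15 | 7 | 16 | 56% |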

(*) For the best-matching sequence(s) in library
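The comparison summarized in Table 40 (same-length library members with the fewest amino acid mismatches) can be sketched as follows. This is a minimal illustration, not the procedure actually used; the function names are hypothetical, and the four-member `toy_library` is invented solely to exercise the code and does not reproduce members of the library of Example 5.

```python
def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def best_matches(query: str, library: list[str]):
    """Return (fewest mismatches, % identity, best members) among members
    of the same length as the query, or None if no such member exists."""
    same_length = [m for m in library if len(m) == len(query)]
    if not same_length:
        return None
    fewest = min(mismatches(query, m) for m in same_length)
    hits = [m for m in same_length if mismatches(query, m) == fewest]
    identity = 100.0 * (len(query) - fewest) / len(query)
    return fewest, identity, hits


# CDRH3 of CAB14 as given in Table 39; the library is a hypothetical toy set.
query = "ARVSIFGVGTFDY"
toy_library = [
    "ARVSIFGVGTFDV",       # one mismatch (last position)
    "ARVSIYGVGTFDY",       # one mismatch (position 6)
    "ARGGGGGGGGFDY",       # same length, many mismatches
    "AKDLGWSDSYYYYYGMDV",  # different length, ignored
]

fewest, identity, hits = best_matches(query, toy_library)
print(fewest, f"{identity:.0f}%", hits)
```

For a library of about 10⁸ members, the same comparison would in practice be run per length class rather than over one flat list.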

Given that a physical realization of a library with about 10⁸ distinct members could, in practice, contain every single member, such sequences with close percent identity to antibodies of interest would be present in the physical realization of the library. This example also highlights one of many distinctions of the libraries of the current invention over those of the art; namely, that the members of the libraries of the invention may be precisely enumerated. In contrast, CDRH3 libraries known in the art cannot be explicitly enumerated in the manner described herein. For example, many libraries known in the art (e.g., Hoet et al., Nat. Biotechnol., 2005, 23: 344; Griffiths et al., EMBO J., 1994, 13: 3245; Griffiths et al., EMBO J., 1993, 12: 725; Marks et al., J. Mol. Biol., 1991, 222: 581, each incorporated by reference in its entirety) are derived by cloning of natural human CDRH3 sequences and their exact composition is not characterized, which precludes enumeration.

Synthetic libraries produced by other (e.g., random or semi-random/biased) methods (Knappik, et al., J Mol Biol, 2000, 296: 57, incorporated by reference in its entirety) tend to have very large numbers of unique members. Thus, while matches to a given input sequence (for example, at 80% or greater) may exist in a theoretical representation of such libraries, the probability of synthesizing and then producing a physical realization of the theoretical library that contains such a sequence and then selecting an antibody corresponding to such a match, in practice, may be remotely small. For example, a CDRH3 of length 19 in the Knappik library may have over 10¹⁹ distinct sequences. In a practical realization of such a library a tenth or so of the sequences may have length 19 and the largest total library may have in the order of 10¹⁰ to 10¹² transformants; thus, the probability of a given pre-defined member being present, in practice, is effectively zero (less than one in ten million). Other libraries (e.g., Enzelberger et al. WO2008053275 and Ladner US20060257937, each incorporated by reference in its entirety) suffer from at least one of the limitations described throughout this application.

Thus, for example, considering antibody CAB14, there are seven members of the designed library of Example 5 that differ at just one amino acid position from the sequence of the CDRH3 of CAB14 (given in Table 39). Since the total length of this CDRH3 sequence is 13, the percent of identical amino acids is 12/13 or about 92% for each of these 7 sequences of the library of the invention. It can be estimated that the probability of obtaining such a match (or better) in the library of Knappik et al. is about 1.4×10⁻⁹; it would be lower still, about 5.5×10⁻¹⁰, in a library with equal amino acid proportions (i.e., completely random). Therefore, in a physical realization of the library with about 10¹⁰ transformants, of which about a tenth may have length 13, there may be one or two instances of these best matches. However, with longer sequences such as CAB12, the probability of having members in the Knappik library with about 89% or better matching is under about 10⁻¹⁵, so that the expected number of instances in a physical realization of the library is essentially zero. To the extent that sequences of interest resemble actual human CDRH3 sequences, there will be close matches in the library of Example 5, which was designed to mimic human sequences. Thus, one of the many relative advantages of the present library, versus those in the art, becomes more apparent as the length of the CDRH3 increases.
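The estimates in the two preceding paragraphs rest on simple products of library size, length fraction, and per-sequence match probability. The Python sketch below reproduces that arithmetic using only the approximate figures quoted in the text; the function and variable names are hypothetical, and the match probabilities themselves are taken as given rather than derived.

```python
def expected_instances(transformants: float, length_fraction: float,
                       p_match: float) -> float:
    """Expected number of library members satisfying a match criterion."""
    return transformants * length_fraction * p_match


# CAB14 (length 13): about 1e10 transformants, about a tenth of which have
# the relevant length, and a match probability of about 1.4e-9.
cab14 = expected_instances(1e10, 0.1, 1.4e-9)

# CAB12 (length 18): same library size, match probability under about 1e-15.
cab12 = expected_instances(1e10, 0.1, 1e-15)

# Length-19 CDRH3: fraction of a theoretical diversity of about 1e19 that
# even the largest realization (about 1e12 transformants, a tenth of them
# of length 19) could contain.
coverage_len19 = (1e12 * 0.1) / 1e19

print(f"CAB14 expected instances: {cab14:.2g}")
print(f"CAB12 expected instances: {cab12:.2g}")
print(f"length-19 coverage:       {coverage_len19:.2g}")
```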

Example 9: Split Pool Synthesis of Oligonucleotides Encoding the DH, N2, and H3-JH Segments

This example outlines the procedures used to synthesize the oligonucleotides used to construct the exemplary libraries of the invention. Custom Primer Support™ 200 dT40S resin (GE Healthcare) was used to synthesize the oligonucleotides, using a loading of about 39 μmol/g of resin. Columns (diameter=30 μm) and frits were purchased from Biosearch Technologies, Inc. A column bed volume of 30 μL was used in the synthesis, with 120 nmol of resin loaded in each column. A mixture of dichloromethane (DCM) and methanol (MeOH), at a ratio of 400/122 (v/v) was used to load the resin. Oligonucleotides were synthesized using a Dr. Oligo® 192 oligonucleotide synthesizer and standard phosphorothioate chemistry.

The split pool procedure for the synthesis of the [DH]-[N2]-[H3-JH] oligonucleotides was performed as follows: First, oligonucleotide leader sequences, containing a randomly chosen 10 nucleotide sequence (ATGCACAGTT, SEQ ID NO: 395), a BsrDI recognition site (GCAATG), and a two base “overlap sequence” (TG, AC, AG, CT, or GA) were synthesized. The purpose of each of these segments is explained below. After synthesis of this 18 nucleotide sequence, the DH segments were synthesized; approximately 1 g of resin (with the 18 nucleotide segment still conjugated) was suspended in 20 mL of DCM/MeOH. About 60 μL of the resulting slurry (120 nmol) was distributed inside each of 278 oligonucleotide synthesis columns. These 278 columns were used to synthesize the 278 DH segments of Table 18, 3′ to the 18 nucleotide segment described above. After synthesis, the 278 DH segments were pooled as follows: the resin and frits were pushed out of the columns and collected inside a 20 mL syringe barrel (without plunger). Each column was then washed with 0.5 mL MeOH, to remove any residual resin that was adsorbed to the walls of the column. The resin in the syringe barrel was washed three times with MeOH, using a low porosity glass filter to retain the resin. The resin was then dried and weighed.
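As a consistency check on the figures above, the following Python sketch assembles the five possible leader sequences and estimates the synthesis scale per column. The sequences, volumes, and resin loading are those stated in this example; the variable names are hypothetical, and the calculation assumes the resin is evenly suspended in the slurry.

```python
# Components of the leader, 5' to 3', as described in the text.
RANDOM_TAG = "ATGCACAGTT"   # randomly chosen 10 nucleotide sequence
BSRDI_SITE = "GCAATG"       # BsrDI recognition site
OVERLAPS = ["TG", "AC", "AG", "CT", "GA"]  # two base overlap sequences

leaders = [RANDOM_TAG + BSRDI_SITE + overlap for overlap in OVERLAPS]

# Resin delivered per column: about 1 g suspended in 20 mL, 60 uL per column.
resin_mg = 1000.0
slurry_ml = 20.0
aliquot_ml = 0.060
loading_umol_per_g = 39.0

resin_per_column_mg = resin_mg / slurry_ml * aliquot_ml
scale_nmol = resin_per_column_mg / 1000.0 * loading_umol_per_g * 1000.0

for leader in leaders:
    print(leader, len(leader))
print(f"{resin_per_column_mg:.1f} mg resin per column, about {scale_nmol:.0f} nmol")
```

The result of roughly 117 nmol per column agrees with the stated value of about 120 nmol.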

The pooled resin (about 1.36 g) containing the 278 DH segments was subsequently suspended in about 17 mL of DCM/MeOH, and about 60 μL of the resulting slurry was distributed inside each of two sets of 141 columns. The 141 N2 segments enumerated in Tables 24 and 25 were then synthesized, in duplicate (282 total columns), 3′ to the 278 DH segments synthesized in the first step. The resin from the 282 columns was then pooled, washed, and dried, as described above.

The pooled resin obtained from the N2 synthesis (about 1.35 g) was suspended in about 17 mL of DCM/MeOH, and about 60 μL of the resulting slurry was distributed inside each of 280 columns, representing 28 H3-JH segments synthesized ten times each. A portion (described more fully below) of each of the 28 IGHJ segments, including the H3-JH of Table 20, was then synthesized, 3′ to the N2 segments, in ten of the columns. Final oligonucleotides were cleaved and deprotected by exposure to gaseous ammonia (85° C., 2 h, 60 psi).
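The combinatorial logic of the three split pool rounds described above can be illustrated with a small simulation. In the Python sketch below, the function name and the toy segment labels (D1, n1, J1, and so on) are hypothetical placeholders rather than actual DH, N2, or H3-JH sequences; only the segment counts (278, 141, and 28) are taken from the text.

```python
from itertools import product


def split_pool(rounds: list[list[str]]) -> list[str]:
    """Simulate split pool synthesis: in each round the pooled resin is
    split across one column per segment, that segment is appended 3' to
    every growing chain in the column, and the columns are pooled again."""
    pool = [""]
    for segments in rounds:
        columns = [[chain + seg for chain in pool] for seg in segments]  # split
        pool = [chain for column in columns for chain in column]         # pool
    return pool


# Toy run with placeholder labels standing in for real segments.
toy_rounds = [["D1", "D2", "D3"], ["n1", "n2"], ["J1", "J2"]]
toy = split_pool(toy_rounds)
print(len(toy), toy[:4])

# Number of [DH]-[N2]-[H3-JH] combinations for the segment counts in the
# text. Synthesizing a segment in replicate columns (N2 in duplicate,
# H3-JH ten times) adds material but no new combinations.
n_combinations = 278 * 141 * 28
print(n_combinations)
```

Because every pooled chain is exposed to every segment of the next round, the number of combinations grows multiplicatively while the number of columns grows only additively (278 + 282 + 280 columns here).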

Split pool synthesis was used to synthesize the exemplary CDRH3 library. However, it is appreciated that recent advances in oligonucleotide synthesis, which enable the synthesis of longer oligonucleotides at higher fidelity and the production of the oligonucleotides of the library by synthetic procedures that involve splitting, but not pooling, may be used in alternative embodiments of the invention. The split pool synthesis described herein is, therefore, one possible means of obtaining the oligonucleotides of the library, but is not limiting. One other possible means of synthesizing the oligonucleotides described in this application is the use of trinucleotides. This may be expected to increase the fidelity of the synthesis, since frame shift mutants would be reduced or eliminated.

Example 10: Construction of the CDRH3 and Heavy Chain Libraries

This example outlines the procedures used to create exemplary CDRH3 and heavy chain libraries of the invention. A two step process was used to create the CDRH3 library. The first step involved the assembly of a set of vectors encoding the tail and N1 segments, and the second step involved utilizing the split pool nucleic acid synthesis procedures outlined in Example 9 to create oligonucleotides encoding the DH, N2, and H3-JH segments. The chemically synthesized oligonucleotides were then ligated into the vectors, to yield CDRH3 residues 95-102, based on the numbering system described herein. This CDRH3 library was subsequently amplified by PCR and recombined into a plurality of vectors containing the heavy chain chassis variants described in Examples 1 and 2. CDRH1 and CDRH2 variants were produced by QuikChange® Mutagenesis (Stratagene™), using the oligonucleotides encoding the ten heavy chain chassis of Example 1 as a template. In addition to the heavy chain chassis, the plurality of vectors contained the heavy chain constant regions (i.e., CH1, CH2, and CH3) from IgG1, so that a full-length heavy chain was formed upon recombination of the CDRH3 with the vector containing the heavy chain chassis and constant regions. In this exemplary embodiment, the recombination to produce the full-length heavy chains and the expression of the full-length heavy chains were both performed in S. cerevisiae.

To generate full-length, heterodimeric IgGs, comprising a heavy chain and a light chain, a light chain protein was also expressed in the yeast cell. The light chain library used in this embodiment was the kappa light chain library, wherein the VKCDR3s were synthesized using degenerate oligonucleotides (see Example 6.2). Due to the shorter length of the oligonucleotides encoding the light chain library (in comparison to those encoding the heavy chain library), the light chain CDR3 oligonucleotides could be synthesized de novo, using standard procedures for oligonucleotide synthesis, without the need for assembly from sub-components (as in the heavy chain CDR3 synthesis). One or more light chains can be expressed in each yeast cell which expresses a particular heavy chain clone from a library of the invention. One or more light chains have been successfully expressed from both episomal (e.g., plasmid) vectors and from integrated sites in the yeast genome.

Below are provided further details on the assembly of the individual components for the synthesis of a CDRH3 library of the invention, and the subsequent combination of the exemplary CDRH3 library with the vectors containing the chassis and constant regions. In this particular exemplary embodiment of the invention, the steps involved in the process may be generally characterized as (i) synthesis of 424 vectors encoding the tail and N1 regions; (ii) ligation of oligonucleotides encoding the [DH]-[N2]-[H3-JH] segments into these 424 vectors; (iii) PCR amplification of the CDRH3 sequences from the vectors produced in these ligations; and (iv) homologous recombination of these PCR-amplified CDRH3 domains into the yeast expression vectors containing the chassis and constant regions.

Example 10.1: Synthesis of Vectors Encoding the Tail and N1 Regions

This example demonstrates the synthesis of 424 vectors encoding the tail and N1 regions of CDRH3. In this exemplary embodiment of the invention, the tail was restricted to G, D, E, or nothing, and the N1 region was restricted to one of the 59 sequences shown in Table 24. As described throughout the specification, many other embodiments are possible.

In the first step of the process, a single “base vector” (pJM204, a pUC-derived cloning vector) was constructed, which contained (i) a nucleic acid sequence encoding two amino acids that are common to the C-terminal portion of all 28 IGHJ segments (SS), and (ii) a nucleic acid sequence encoding a portion of the CH1 constant region from IgG1. Thus, the base vector contains an insert encoding a sequence that can be depicted as:

[SS]-[CH1˜],

wherein SS is a common portion of the C-terminus of the 28 IGHJ segments and CH1˜ is a portion of the CH1 constant region from IgG1, namely:

(SEQ ID NO: 396)

ASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTV

SWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLG.

Next, 424 different oligonucleotides were cloned into the base vector, upstream (i.e., 5′) from the region encoding the [SS]-[CH1˜]. These 424 oligonucleotides were synthesized by standard methods and each encoded a C-terminal portion of one of the 17 heavy chain chassis enumerated in Table 5, plus one of four exemplary tail segments (G/D/E/-), and one of 59 exemplary N1 segments (Table 24). These 424 oligonucleotides, therefore, encode a plurality of sequences that may be represented by:

[˜FRM3]-[G/D/E/-]-[N1],

wherein ˜FRM3 represents a C-terminal portion of a FRM3 region from one of the 17 heavy chain chassis of Table 5, G/D/E/- represents G, D, E, or nothing, and N1 represents one of the 59 N1 sequences enumerated in Table 24. As described throughout the specification, the invention is not limited to the chassis exemplified in Table 5, their CDRH1 and CDRH2 variants (Table 8), the four exemplary tail options used in this example, or the 59 N1 segments presented in Table 24.

The oligonucleotide sequences represented by the sequences above were synthesized in two groups: one group containing a ˜FRM3 region identical to the corresponding region on 16 of the 17 heavy chain chassis enumerated in Table 5, and another group containing a ˜FRM3 region that is identical to the corresponding region on VH3-15. In the former group, an oligonucleotide encoding DTAVYYCAR (SEQ ID NO: 397) was used for ˜FRM3. During subsequent PCR amplification, the V residue of VH5-51 was altered to an M, to correspond to the VH5-51 germline sequence. In the latter group (that with a sequence common to VH3-15), a larger oligonucleotide, encoding the sequence AISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAK (SEQ ID NO: 398) was used for ˜FRM3. Each of the two oligonucleotides encoding the ˜FRM3 regions was paired with oligonucleotides encoding one of the four tail regions (G/D/E/-) and one of the 59 N1 segments, yielding a total of 236 possible combinations for each ˜FRM3 (i.e., 1×4×59), or a total of 472 possible combinations when both ˜FRM3 sequences are considered. However, 48 of these combinations are redundant and only a single representation of these sequences was used in the currently exemplified CDRH3 library, yielding 424 unique oligonucleotides encoding [˜FRM3]-[G/D/E/-]-[N1] sequences.
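The count of 424 unique oligonucleotides follows directly from the numbers given above, as the short Python sketch below shows. The second half of the sketch is an assumption added for illustration only: it uses an invented three-member N1 set (not the 59 sequences of Table 24) to show one way in which distinct tail/N1 pairs could encode the same sequence; the text does not state how the 48 redundant combinations arise.

```python
from itertools import product

# Counts stated in the text.
N_FRM3 = 2        # two ~FRM3 oligonucleotide groups
N_TAILS = 4       # G, D, E, or nothing
N_N1 = 59         # N1 segments of Table 24
N_REDUNDANT = 48  # redundant combinations, as stated

per_frm3 = 1 * N_TAILS * N_N1
all_combinations = N_FRM3 * per_frm3
unique_oligos = all_combinations - N_REDUNDANT
print(per_frm3, all_combinations, unique_oligos)

# Hypothetical illustration of redundancy: a tail of "G" followed by an
# empty N1 encodes the same residues as no tail followed by an N1 of "G".
tails = ["", "G", "D", "E"]
toy_n1 = ["", "G", "GG"]  # invented N1 set, for illustration only
toy_all = [tail + n1 for tail, n1 in product(tails, toy_n1)]
toy_unique = set(toy_all)
print(len(toy_all), len(toy_unique))
```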

After the oligonucleotides encoding the [˜FRM3]-[G/D/E/-]-[N1] and [SS]-[CH1˜] segments were cloned into the vector, as described above, additional sequences were added to the vector to facilitate the subsequent insertion of the oligonucleotides encoding the [DH]-[N2]-[H3-JH] fragments synthesized during the split pool synthesis. These additional sequences comprise a polynucleotide encoding a selectable marker protein, flanked on each side by a recognition site for a type II restriction enzyme, for example:

[Type II RS 1]-[selectable marker protein]-[Type II RS 2].

In this exemplary embodiment, the selectable marker protein is ccdB and the type II restriction enzyme recognition sites are specific for BsrDI and BbsI. In certain strains of E. coli, the ccdB protein is toxic, thereby preventing the growth of these bacteria when the gene is present.

An example of the 5′ end of one of the 212 vectors with a ˜FRM3 region based on the VH3-23 chassis, D tail residue and an N1 segment of length zero is presented below (amino acid: SEQ ID NO: 1516; coding strand: SEQ ID NO: 570; complementary strand: SEQ ID NO: 1517):

VH3-23

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A I S G S G G S T Y

961 GCTATTAG TGGTAGTGGT GGTAGCACAT

CGATAATC ACCATCACCA CCATCGTGTA

VH3-23

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Y A D S V K G R F T I S R D N S K N T L Y L Q M N S

1041 ACTACGCAGA CTCCGTGAAG GGCCGGTTCA CCATCTCCAG AGACAATTCC AAGAACACGC TGTATCTGCA AATGAACAGC

TGATGCGTCT GAGGCACTTC CCGGCCAAGT GGTAGAGGTC TCTGTTAAGG TTCTTGTGCG ACATAGACGT TTACTTGTCG

VH3-23 ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

BsrDI

~~~~~~

L R A E D T A V Y Y C A K

1121 CTGAGAGCCG AGGACACGGC GGTGTACTAC TGCGCCAAGG ACCATTGCGC TTAGCCTAGG TTATATTCCC CAGAACATCA

GACTCTCGGC TCCTGTGCCG CCACATGATG ACGCGGTTCC TGGTAACGCG AATCGGATCC AATATAAGGG GTCTTGTAGT

An example of one of the 212 vectors with a ˜FRM3 region based on one of the other 16 chassis, with a D residue as the tail and an N1 segment of length zero is presented below (amino acid: SEQ ID NO: 1518; coding strand: SEQ ID NO: 571; complementary strand: SEQ ID NO: 1519):

Framework 3

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

D T A V Y Y C A R

961 GACACGGCG GTGTACTACT GCGCCAGAGA

CTGTGCCGC CACATGATGA CGCGGTCTCT

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

BsrDI

~~~~~~

1041 CCATTGCGCT TAGCCTAGGT TATATTCCCC AGAACATCAG GTTAATGGCG TTTTTGATGT CATTTTCGCG GTGGCTGAGA

GGTAACGCGA ATCGGATCCA ATATAAGGGG TCTTGTAGTC CAATTACCCC AAAAACTACA GTAAAAGCGC CACCGACTCT

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1121 TCAGCCACTT CTTCCCCGAT AACGGAAACC GGCACACTGG CCATATCGGT GGTCATCATG CGCCAGCTTT CATCCCCGAT

AGTCGGTGAA GAAGGGGCTA TTGCCTTTGG CCGTGTGACC GGTATAGCCA CCAGTAGTAC GCGGTCGAAA CTAGGGGCTA

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1201 ATGCACCACC GGGTAAAGTT CACGGGAGAC TTTATCTGAC AGCAGACGTG CACTGGCCAG GGGGATCACC ATCCGTCGCC

TACGTGGTGG CCCATTTCAA GTGCCCTCTG AAATAGACTG TCGTCTGCAC GTGACCGGTC CCCCTAGTGG TAGGCAGCGG

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1281 CGGGCGTGTC AATAATATCA CTCTGTACAT CCACAAACAG ACGATAACGG CTCTCTCTTT TATAGGTGTA AACCTTAAAC

GCCCGCACAG TTATTATAGT GAGACATGTA GGTGTTTGTC TGCTATTGCC GAGAGAGAAA ATATCCACAT TTGGAATTTG

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1361 TGCATTTCAC CAGCCCCTGT TCTCGTCAGC AAAAGAGCCG TTCATTTCAA TAAACCGGGC GACCTCAGCC ATCCCTTCCT

ACGTAAAGTG GTCGGGGACA AGAGCAGTCG TTTTCTCGGC AAGTAAAGTT ATTTGGCCCG CTGGAGTCGG TAGGGAAGGA

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1441 GATTTTCCGC TTTCCAGCGT TCGGCACGCA GACGACGGGC TTCATTCTGC ATGGTTGTGC TTACCAGACC GGAGATATTG

CTAAAAGGCG AAAGGTCGCA AGCCGTGCGT CTGCTGCCCG AAGTAAGACG TACCAACACG AATGGTCTGG CCTCTATAAC

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1521 ACATCATATA TGCCTTGAGC AACTGATAGC TGTCGCTGTC AACTGTCACT GTAATACGCT GCTTCATAGC ATACCTCTTT

TGTAGTATAT ACGGAACTCG TTGACTATCG ACAGCGACAG TTGACAGTGA CATTATGCGA CGAAGTATCG TATGGAGAAA

ccdB

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1601 TTGACATACT TCGGGTATAC ATATCAGTAT ATATTCTTAT ACCGCAAAAA TCAGCGCGCA AATATGCATA CTGTTATCTG

AACTGTATGA AGCCCATATG TATAGTCATA TATAAGAATA TGGCGTTTTT AGTCGCGCGT TTATACGTAT GACAATAGAC

ccdB CH1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

BbsI

~~~~~~~

A S T K G P S V F P L A P S

1681 GCTTTTAGTA AGCCGCCTAG GTCATCAGAA GACAACTCAG CTAGCACCAA GGGCCCATCG GTCTTTCCCC TGGCACCCTC

CGAAAATCAT TCGGCGGATC CAGTAGTCTT CTGTTGAGTC GATCGTGGTT CCCGGGTAGC CAGAAAGGGG ACCGTGGGAG

CH1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

S K S T S G G T A A L G C L V K D Y F P E P V T V S W

1761 CTCCAAGAGC ACCTCTGGGG GCACAGCGGC CCTGGGCTGC CTGGTCAAGG ACTACTTCCC CGAACCGGTG ACGGTGTCGT

GAGGTTCTCG TGGAGACCCC CGTGTCGCCG GGACCCGACG GACCAGTTCC TGATGAAGGG GCTTGGCCAC TGCCACAGCA

CH1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

N S G A L T S G V H T F P A V L Q S S G L

1841 GGAACTCAGG CGCCCTGACC AGCGGCGTGC ACACCTTCCC GGCTGTCCTA CAGTCCTCAG GACTC

CCTTGAGTCC GCGGGACTGG TCGCCGCACG TGTGGAAGGG CCGACAGGAT GTCAGGAGTC CTGAG

All 424 vectors were sequence verified. A schematic diagram of the content of the 424 vectors, before and after cloning of the [DH]-[N2]-[H3-JH] fragment, is presented in FIG. 5. Below is an exemplary sequence from one of the 424 vectors containing a FRM3 region from VH3-23 (amino acid: SEQ ID NO: 1520; coding strand: SEQ ID NO: 572; complementary strand: SEQ ID NO: 1521).

primer EMK135

------------------------

VH3-23

--------------------------------------------------------------

A I S G S G G S T Y Y A D S V K G R F

561
GCTATTA GTGGTAGTGG TGGTAGCACA TACTACGCAG ACTCCGTGAA GGGCCGGTTC

CGATAAT CACCATCACC ACCATCGTGT ATGATGCGTC TGAGGCACTT CCCGGCCAAG

VH3-23

---------------------------------------------------------------------------------------

T I S R D N S K N T L Y L Q M N S L R A E D T A V Y Y

641
ACCATCTCCA GAGACAATTC CAAGAACACG CTGTATCTGC AAATGAACAG CCTGAGAGCC GAGGACACGG CGGTGTACTA

TGGTAGAGGT CTCTGTTAAG GTTCTTGTGC GACATAGACG TTTACTTGTC GGACTCTCGG CTCCTGTGCC GCCACATGAT

VH3-23 D J1

----------- --------------------------------- --------------------

JH6

--------------------

N1_9 N2

------------- ----------

C A K D A G G Y Y Y G S G S Y Y N A A A Y Y Y Y Y G M

721
CTGCGCCAAG GACGCCGGAG GATATTATTA TGGGTCAGGA AGCTATTACA ACGCTGCGGC TTACTACTAC TATTATGGCA

GACGCGGTTC CTGCGGCCTC CTATAATAAT ACCCAGTCCT TCGATAATGT TGCGACGCCG AATGATGATG ATAATACCGT

JH6

---------------------------------------------

J1 CH1

-------- ------------------------------------------

Nhe1

------

D V W G Q G T T V T V S S A S T K G P S V F P L A P

801
TGGACGTGTG GGGACAAGGT ACAACAGTCA CCGTCTCCTC AGCTAGCACC AAGGGCCCAT CGGTCTTTCC CCTGGCACCC

ACCTGCACAC CCCTGTTCCA TGTTGTCAGT GGCAGAGGAG TCGATCGTGG TTCCCGGGTA GCCAGAAAGG GGACCGTGGG

CH1

---------------------------------------------------------------------------------------

S S K S T S G G T A A L G C L V K D Y F P E P V T V S

881
TCCTCCAAGA GCACCTCTGG GGGCACAGCG GCCCTGGGCT GCCTGGTCAA GGACTACTTC CCCGAACCGG TGACGGTGTC

AGGAGGTTCT CGTGGAGACC CCCGTGTCGC CGGGACCCGA CGGACCAGTT CCTGATGAAG GGGCTTGGCC ACTGCCACAG

EK137 CH1 Primer

--------------------

CH1

---------------------------------------------------------------------------------------

W N S G A L T S G V H T F P A V L Q S S G L Y S L S S

961
GTGGAACTCA GGCGCCCTGA CCAGCGGCGT GCACACCTTC CCGGCTGTCC TACAGTCCTC AGGACTCTAC TCCCTCAGCA

CACCTTGAGT CCGCGGGACT GGTCGCCGCA CGTGTGGAAG GGCCGACAGG ATGTCAGGAG TCCTGAGATG AGGGAGTCGT

CH1

-----------------------------------

V V T V P S S S L G

1041
GCGTGGTGAC CGTGCCCTCC AGCAGCTTGG GC

CGCACCACTG GCACGGGAGG TCGTCGAACC CG

Example 10.2: Cloning of the Oligonucleotides Encoding the DH, N2, H3-JH Segments into the Vectors Containing the Tail and N1 Segments

This example describes the cloning of the oligonucleotides encoding the [DH]-[N2]-[H3-JH] segments (made via split pool synthesis; Example 9) into the 424 vectors produced in Example 10.1. To summarize, the [DH]-[N2]-[H3-JH] oligonucleotides produced via split pool synthesis were amplified by PCR, to produce double-stranded oligonucleotides, to introduce restriction sites that would create overhangs complementary to those on the vectors (i.e., BsrDI and BbsI), and to complete the 3′ portion of the IGHJ segments that was not synthesized in the split pool synthesis. The amplified oligonucleotides were then digested with the restriction enzymes BsrDI (cleaves adjacent to the DH segment) and BbsI (cleaves near the end of the JH segment). The cleaved oligonucleotides were then purified and ligated into the 424 vectors, which had previously been digested with BsrDI and BbsI. After ligation, the reactions were purified, ethanol precipitated, and resolubilized.

This process for one of the [DH]-[N2]-[H3-JH] oligonucleotides synthesized in the split pool synthesis is illustrated below. The following oligonucleotide (SEQ ID NO: 399) is one of the oligonucleotides synthesized during the split pool synthesis:

1 ATGCACAGTTGCAATGTGTATTACTATGGATCTGGTTCTTACTATAATGT 50

51 GGGCGGATATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC 99

The first 10 nucleotides (ATGCACAGTT: SEQ ID NO: 395) represent a portion of a random sequence that is increased to 20 base pairs in the PCR amplification step, below. This portion of the sequence increases the efficiency of BsrDI digestion and facilitates the downstream purification of the oligonucleotides.

Nucleotides 11-16 (underlined) represent the BsrDI recognition site. The two base overlap sequence that follows this site (in this example TG, bold) was synthesized to be complementary to the two base overhang created by digesting certain of the 424 vectors with BsrDI (i.e., depending on the composition of the tail/N1 region of the particular vector). Other oligonucleotides contain different two-base overhangs, as described below.

The two base overlap is followed by the DH gene segment (nucleotides 19-48), in this example a 30 bp sequence (TATTACTATGGATCTGGTTCTTACTATAAT, SEQ ID NO: 400) which encodes the ten residue DH segment YYYGSGSYYN (i.e., IGHD3-10_2 of Table 17; SEQ ID NO: 2).

The region of the oligonucleotide encoding the DH segment is followed, in this example, by a nine base region (GTGGGCGGA: bold; nucleotides 49-57), encoding the N2 segment (in this case VGG; Table 24).

The remainder of this exemplary oligonucleotide represents the portion of the JH segment that is synthesized during the split pool synthesis (TATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC; SEQ ID NO: 401; nucleotides 58-99; underlined), encoding the sequence YYYYYGMDVWGQGT (Table 20; residues 1-14 of SEQ ID NO: 258). The balance of the IGHJ segment is added during the subsequent PCR amplification described below.
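The layout of this exemplary oligonucleotide can be checked with a short script. The following is a minimal sketch, assuming the 1-based, inclusive nucleotide numbering used in the text and the standard genetic code; the helper names (`region`, `translate`, `features`) are illustrative and are not part of the specification.

```python
# Standard genetic code, indexed by bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Exemplary split pool oligonucleotide (SEQ ID NO: 399), 99 nucleotides.
OLIGO = ("ATGCACAGTTGCAATGTGTATTACTATGGATCTGGTTCTTACTATAATGT"
         "GGGCGGATATTATTACTACTATGGTATGGACGTATGGGGGCAAGGGACC")


def region(seq, start, end):
    """Return nucleotides start..end (1-based, inclusive), as numbered in the text."""
    return seq[start - 1:end]


def translate(dna):
    """Translate complete codons of a coding-strand DNA string into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Feature boundaries as described in the surrounding paragraphs.
features = {
    "random_5prime": region(OLIGO, 1, 10),
    "bsrdi_site": region(OLIGO, 11, 16),
    "overlap": region(OLIGO, 17, 18),
    "dh": region(OLIGO, 19, 48),
    "n2": region(OLIGO, 49, 57),
    "h3_jh": region(OLIGO, 58, 99),
}

for name in ("dh", "n2", "h3_jh"):
    print(name, features[name], translate(features[name]))
```

Because the DH, N2, and H3-JH regions each begin on a codon boundary in this example, each region can be translated on its own.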

After the split pool-synthesized oligonucleotides were cleaved from the resin and deprotected, they served as a template for a PCR reaction which added an additional randomly chosen 10 nucleotides (e.g., GACGAGCTTC; SEQ ID NO: 402) to the 5′ end and the rest of the IGHJ segment plus the BbsI restriction site to the 3′ end. These additions facilitate the cloning of the [DH]-[N2]-[H3-JH] oligonucleotides into the 424 vectors. As described above (Example 9), the last round of the split pool synthesis involves 280 columns: 10 columns for each of the oligonucleotides encoding one of 28 H3-JH segments. The oligonucleotide products obtained from these 280 columns are pooled according to the identity of their H3-JH segments, for a total of 28 pools. Each of these 28 pools is then amplified in five separate PCR reactions, using five forward primers that each encode a different two base overlap (preceding the DH segment; see above) and one reverse primer that has a sequence corresponding to the familial origin of the H3-JH segment being amplified. The sequences of these 11 primers are provided below:

Forward primers

| Primer | SEQ ID NO | Sequence |
|---|---|---|
| AC | 403 | GACGAGCTTCAATGCACAGTTGCAATGAC |
| AG | 404 | GACGAGCTTCAATGCACAGTTGCAATGAG |
| CT | 405 | GACGAGCTTCAATGCACAGTTGCAATGCT |
| GA | 406 | GACGAGCTTCAATGCACAGTTGCAATGGA |
| TG | 407 | GACGAGCTTCAATGCACAGTTGCAATGTG |

Reverse primers

| Primer | SEQ ID NO | Sequence |
|---|---|---|
| JH1 | 408 | TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTGCCCTGGCCCCA |
| JH2 | 409 | TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACAGTGACCAAGGTGCCACGGCCCCA |
| JH3 | 410 | TGCATCAGTGCGACTAACGGAAGACTCTGAAGAGACGGTGACCATTGTCCCTTGGCCCCA |
| JH4 | 411 | TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTTCCTTGGCCCCA |
| JH5 | 412 | TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCAAGGTTCCCTGGCCCCA |
| JH6 | 413 | TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGTGGTCCCTTGCCCCCA |

Amplifications were performed using Taq polymerase, under standard conditions. The oligonucleotides were amplified for eight cycles, to maintain the representation of sequences of different lengths. Melting of the strands was performed at 95° C. for 30 seconds, with annealing at 58° C. and a 15 second extension time at 72° C.

Using the exemplary split-pool derived oligonucleotide enumerated above as an example, the PCR amplification was performed using the TG primer and the JH6 primer, where the annealing portion of the primers has been underlined:

TG (SEQ ID NO: 407)

GACGAGCTTCAATGCACAGTTGCAATGTG

JH6 (SEQ ID NO: 413)

TGCATCAGTGCGACTAACGGAAGACTCTGAGGAGACGGTGACCGTGGTCCCTTGCCCCCA

The portion of the TG primer that is 5′ to the annealing portion includes the random 10 base pairs described above. The portion of the JH6 primer that is 5′ to the annealing portion includes the balance of the JH6 segment and the BbsI restriction site. The following PCR product (SEQ ID NO: 414) is formed in the reaction (added sequences underlined):

GACGAGCTTCATGCACAGTTGCAATGTGTATTACTATGGATCTGGTTCTT

ACTATAATGTGGGCGGATATTATTACTACTATGGTATGGACGTATGGGGG

CAAGGGACCACGGTCACCGTCTCCTCAGAGTCTTCCGTTAGTCGCACTGA

TGCAG

The PCR products from each reaction were then combined into five pools, based on the forward primer that was used in the reaction, creating sets of sequences yielding the same two-base overhang after BsrDI digestion. The five pools of PCR products were then digested with BsrDI and BbsI (100 μg of PCR product; 1 mL reaction volume; 200 U BbsI; 100 U BsrDI; 2 h; 37° C.; NEB Buffer 2). The digested oligonucleotides were extracted twice with phenol/chloroform, ethanol precipitated, air dried briefly, and resolubilized in 300 μL of TE buffer by sitting overnight at 4° C.

Each of the 424 vectors described in the preceding sections was then digested with BsrDI and BbsI, each vector yielding a two base overhang that was complementary to one of those contained in one of the five pools of PCR products. Thus, one of the five pools of restriction digested PCR products was ligated into each of the 424 vectors, depending on their compatible ends, for a total of 424 ligations.

Example 10.3: PCR Amplification of the CDRH3 from the 424 Vectors

This example describes the PCR amplification of the CDRH3 regions from the 424 vectors described above. As set forth above, the 424 vectors represent two sets: one for the VH3-23 family, with FRM3 ending in CAK (212 vectors), and one for the other 16 chassis, with FRM3 ending in CAR (212 vectors). The CDRH3s in the VH3-23-based vectors were amplified using a reverse primer (EK137, see Table 41) recognizing a portion of the CH1 region of the plasmid and the VH3-23-specific primer EK135 (see Table 41). Amplification of the CDRH3s from the 212 vectors with FRM3 ending in CAR was performed using the same reverse primer (EK137) and each of five FRM3-specific primers shown in Table 41 (EK139, EK140, EK141, EK143, and EK144). Therefore, 212 VH3-23 amplifications and 212×5 FRM3 PCR reactions were performed, for a total of 1,272 reactions. An additional PCR reaction amplified the CDRH3 from the 212 VH3-23-based vectors, using the EK133 forward primer, to allow the amplicons to be cloned into the other 5 VH3 family member chassis while making the last three amino acids of these chassis CAK instead of the original CAR (VH3-23*). The primers used in each reaction are shown in Table 41.

TABLE 41

Primers Used for Amplification of CDRH3 Sequences

| Primer No. | Compatible Chassis | Primer Sequence | SEQ ID NO |
|---|---|---|---|
| EK135 | VH3-23 | CACATACTACGCAGACTCCGTG | 415 |
| EK133 | VH3-48; VH3-7; VH3-15; VH3-30; VH3-33; VH3-23* | CAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTG | 416 |
| EK139 | VH4-B; VH4-31; VH4-34; VH4-39; VH4-59; VH4-61 | AAGCTGAGTTCTGTGACCGCCGCAGACACGGCGGTGTACTACTG | 417 |
| EK140 | VH1-46; VH1-69 | GAGCTGAGCAGCCTGAGATCTGAGGACACGGCGGTGTACTACTG | 418 |
| EK141 | VH1-2 | GAGCTGAGCAGGCTGAGATCTGACGACACGGCGGTGTACTACTG | 419 |
| EK143 | VH5-51 | CAGTGGAGCAGCCTGAAGGCCTCGGACACGGCGATGTACTACTG | 420 |
| EK144 | VH1-18 | GAGCTGAGGAGCCTGAGATCTGACGACACGGCGGTGTACTACTG | 421 |
| EK137 | CH1 Rev. Primer | GTAGGACAGCCGGGAAGG | 422 |

Example 10.4: Homologous Recombination of PCR-Amplified CDRH3 Regions into Heavy Chain Chassis

After amplification, reaction products were pooled according to the respective VH chassis that they would ultimately be cloned into. Table 42 enumerates these pools, with the PCR primers used to obtain the CDRH3 sequences in each pool provided in the last two columns.

TABLE 42

PCR Primers Used to Amplify CDRH3 Regions from 424 Vectors

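| Pool # (Arbitrary) | HC Chassis Target | 5′ Primer | 3′ Primer |
|---|---|---|---|
| 1 | 1-46 | EK140 | EK137 |
| | 1-69 | EK140 | EK137 |
| 2 | 1-2 | EK141 | EK137 |
| 3 | 1-18 | EK144 | EK137 |
| 4 | 4-B | EK139 | EK137 |
| | 4-31 | EK139 | EK137 |
| | 4-34² | EK139 | EK137 |
| | 4-39 | EK139 | EK137 |
| | 4-59 | EK139 | EK137 |
| | 4-61 | EK139 | EK137 |
| 5 | 5-51 | EK143 | EK137 |
| 6 | 3-15¹ | EK133 | EK137 |
| | 3-7 | EK133 | EK137 |
| | 3-33 | EK133 | EK137 |
| | 3-33 | EK133 | EK137 |
| | 3-48 | EK133 | EK137 |
| 7 | 3-23 | EMK135 | EK137 |
| 8 | 3-23* | EK133 | EK137 |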

*Allowed the amplicons to be cloned into the other 5 VH3 family member chassis (i.e., other than VH3-23), while making the last three amino acids of these chassis CAK instead of the original CAR.

¹As described in Table 5, the original KT sequence in VH3-15 was mutated to RA, and the original TT to AR.

²As described in Table 5, the potential site for N-linked glycosylation was removed from CDRH2 of this chassis.

After pooling of the amplified CDRH3 regions, according to the process outlined above, the heavy chain chassis expression vectors were pooled according to their origin and cut, to create a “gap” for homologous recombination with the amplified CDRH3s. FIG. 6 shows a schematic structure of a heavy chain vector, prior to recombination with a CDRH3. In this exemplary embodiment of the invention, there were a total of 152 vectors encoding heavy chain chassis and IgG1 constant regions, but no CDRH3. These 152 vectors represent 17 individual variable heavy chain gene families (Table 5; Examples 1 and 2). Fifteen of the families were represented by the heavy chain chassis sequences described in Table 5 and the CDRH1/H2 variants described in Table 8 (i.e., 150 vectors). VH 3-30 differs from VH3-33 by a single amino acid; thus VH3-30 was included in the VH3-33 pool of variants. The 4-34 VH family member was kept separate from all others and, in this exemplary embodiment, no variants of it were included in the library. Thus, a total of 16 pools, representing 17 heavy chain chassis, were generated from the 152 vectors.

The vector pools were digested with the restriction enzyme SfiI, which cuts at two sites in the vector that are located between the end of the FRM3 of the variable domain and the start of the CH1 (amino acid: SEQ ID NO: 1522; coding strand: SEQ ID NO: 573; complementary strand: SEQ ID NO: 1523; “VTVSS” disclosed as SEQ ID NO: 1524; “DTAVYYCAR” disclosed as SEQ ID NO: 1527).

VH3-48

---------------------------------------------------------------------------------------

S V K G R F T I S R D N A K N S L Y L Q M N S L R A E

2801
CTCTGTGAAG GGCCGATTCA CCATCTCCAG AGACAATGCC AAGAACTCAC TGTATCTGCA AATGAACAGC CTGAGAGCTG

GAGACACTTC CCGGCTAAGT GGTAGAGGTC TCTGTTACGG TTCTTGAGTG ACATAGACGT TTACTTGTCG GACTCTCGAC

Constant DTAVYYCAR

-----------------------------

VH3-48 VTVSS common to all J

-- -----

SfiI SfiI

--------------- --------------

D T A V Y Y C A R

V T

2881
AGGACACGGC GGTGTACTAC TGCGCCAGAG GCCAATAGGG CCAACTATAA CAGGGGTACC CCGGCCAATA AGGCCGTCAC

TCCTGTGCCG CCACATGATG ACGCGGTCTC CGGTTATCCC GGTTGATATT GTCCCCATGG GGCCGGTTAT TCCGGCAGTG

VTVSS common to all J

-----------

hIgG1m17,1

---------------------------------------------------------------------------

NheI

------

V S S A S T K G P S V F P L A P S S K S T S G G T A

2961
CGTCTCCTCA GCTAGCACCA AGGGCCCATC GGTCTTCCCC CTGGCACCCT CCTCCAAGAG CACCTCTGGG GGCACAGCGG

GCAGAGGAGT CGATCGTGGT TCCCGGGTAG CCAGAAGGGG GACCGTGGGA GGAGGTTCTC GTGGAGACCC CCGTGTCGCC

The gapped vector pools were then mixed with the appropriate (i.e., compatible) pool of CDRH3 amplicons, generated as described above, at a 50:1 insert to vector ratio. The mixture was then transformed into electrocompetent yeast (S. cerevisiae), which already contained plasmids or integrated genes comprising a VK light chain library (described below). The degree of library diversity was determined by plating a dilution of the electroporated cells on a selectable agar plate. In this exemplified embodiment of the invention, the agar plate lacked tryptophan and the yeast lacked the ability to endogenously synthesize tryptophan. This deficiency was remedied by the inclusion of the TRP marker on the heavy chain chassis plasmid, so that any yeast receiving the plasmid and recombining it with a CDRH3 insert would grow. The electroporated cells were then outgrown approximately 100-fold, in liquid media lacking tryptophan. Aliquots of the library were frozen in 50% glycerol and stored at −80° C. Each transformant obtained at this stage represents a clone that can express a full IgG molecule. A schematic diagram of a CDRH3 integrated into a heavy chain vector and the accompanying sequence are provided in FIG. 7.

A heavy chain library pool was then produced, based on the approximate representation of the heavy chain family members as depicted in Table 43.

TABLE 43

Occurrence of Heavy Chain Chassis in Data Sets Used to Design Library,

Expected (Designed) Library, and Actual (Observed) Library

| Chassis | Relative Occurrence in Data Sets (1) | Expected (2) | Observed (3) |
|---|---|---|---|
| VH1-2 | 5.1 | 6.0 | 6.4 |
| VH1-18 | 3.4 | 3.7 | 3.8 |
| VH1-46 | 3.4 | 5.2 | 4.7 |
| VH1-69 | 8.0 | 8.0 | 10.7 |
| VH3-7 | 3.6 | 6.1 | 4.5 |
| VH3-15 | 1.9 | 6.9 | 3.6 |
| VH3-23 | 11.0 | 13.2 | 17.1 |
| VH3-33/30 | 13.1 | 12.5 | 6.6 |
| VH3-48 | 2.9 | 6.3 | 7.5 |
| VH4-31 | 3.4 | 2.5 | 4.3 |
| VH4-34 | 17.2 | 7.0 | 4.7 |
| VH4-39 | 8.7 | 3.9 | 3.0 |
| VH4-59 | 7.0 | 7.8 | 9.2 |
| VH4-61 | 3.2 | 1.9 | 2.4 |
| VH4-B | 1.0 | 1.4 | 0.8 |
| VH5-51 | 7.2 | 7.7 | 10.5 |

(1) As detailed in Example 1, these 17 sequences account for about 76% of the entire sample of human VH sequences used to represent the human repertoire.

(2) Based on pooling of sub-libraries of each chassis type.

(3) Usage in 531 sequences from library; cf. FIG. 20.

Example 10.5: K94R Mutation in VH3-23 and R94K Mutation in VH3-33, VH3-30, VH3-7, and VH3-48

This example describes the mutation of position 94 in VH3-23, VH3-33, VH3-30, VH3-7, and VH3-48. In VH3-23, the amino acid at this position was mutated from K to R. In VH3-33, VH3-30, VH3-7, and VH3-48, this amino acid was mutated from R to K. In VH3-32, this position was mutated from K to R. The purpose of making these mutations was to enhance the diversity of CDRH3 presentation in the library. For example, in naturally occurring VH3-23 sequences, about 90% have K at position 94, while about 10% have R at this position. By making these changes, the diversity of the CDRH3 presentation is increased, as is the overall diversity of the library.

Amplification was performed using the 424 vectors as a template. For the K94R mutation, the vectors containing the sequence DTAVYYCAK (VH3-23; SEQ ID NO: 578) were amplified with a PCR primer that changed the K to an R and added a 5′ tail for homologous recombination with the VH3-48, VH3-33, VH3-30, and VH3-7 chassis. The “T” base in 3-48 does not change the amino acid encoded, and thus the same primer with a T::C mismatch still allows homologous recombination into the 3-48 chassis.

Furthermore, the amplification products from the 424 vectors (produced as described above) containing the DTAVYYCAR (SEQ ID NO: 579) sequence can be homologously recombined into the VH3-23 (CAR) vector, changing R to K in this framework and thus further increasing the diversity of CDRH3 presentation in this chassis.

VH3-48 (240) (SEQ ID NO: 574)

240 TCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCGGTGTACTACTGCGCCAGA 294

VH3-33/30 (240) (SEQ ID NO: 575)

TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAGA

VH3-7 (240) (SEQ ID NO: 576)

TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAGA

VH3-23 (240) (SEQ ID NO: 577)

TCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCGGTGTACTACTGCGCCAAG

Example 11: VK Library Construction

This example describes the construction of a VK library of the invention. The exemplary VK library described herein corresponds to the VKCDR3 library of about 10⁵ complexity, described in Example 6.2. As described in Example 6, and throughout the application, other VK libraries are within the scope of the invention, as are Vλ libraries.

Ten VK chassis were synthesized (Table 11), which did not contain VKCDR3, but instead had two SfiI restriction sites in the place of VKCDR3, as for the heavy chain vectors. The kappa constant region followed the SfiI restriction sites. FIG. 8 shows a schematic structure of a light chain vector, prior to recombination with a CDRL3.

Ten VKCDR3 oligonucleotide libraries were then synthesized, as described in Example 6.2, using degenerate oligonucleotides (Table 33). The oligonucleotides were then PCR amplified, as separate pools, to make them double stranded and to add additional nucleotides required for efficient homologous recombination with the gapped (by SfiI) vector containing the VK chassis and constant region sequences. The VKCDR3 pools in this embodiment of the invention represented lengths of 8, 9, and 10 amino acids, which were mixed post-PCR at a ratio of 1:8:1. The pools were then cloned into the respective SfiI gapped VK chassis via homologous recombination, as described for the CDRH3 regions, set forth above. A schematic diagram of a CDRL3 integrated into a light chain vector and the accompanying sequence are provided in FIG. 9.

A kappa light chain library pool was then produced, based on the approximate representation of the VK family members found in the circulating pool of B cells. The kappa variable regions used and the relative frequency in the final library pool are shown in Table 44.

TABLE 44

Occurrence of VK Chassis in Data Sets Used to Design Library,

Expected (Designed) Library, and Actual (Observed) Library

| Chassis | Relative Occurrence in Data Sets (1) | Expected (2) | Observed (3) |
|---|---|---|---|
| VK1-5 | 8.6 | 7.1 | 5.8 |
| VK1-12 | 4.0 | 3.6 | 3.5 |
| VK1-27 | 3.3 | 3.6 | 8.1 |
| VK1-33 | 5.3 | 7.1 | 3.5 |
| VK1-39 | 18.5 | 21.4 | 17.4 |
| VK2-28 | 7.7 | 7.1 | 5.8 |
| VK3-11 | 10.9 | 10.7 | 20.9 |
| VK3-15 | 6.6 | 7.1 | 4.7 |
| VK3-20 | 24.5 | 21.4 | 18.6 |
| VK4-1 | 10.4 | 10.7 | 11.6 |

(1) As indicated in Example 3, these 10 chassis account for about 80% of the occurrences in the entire data set of VK sequences examined.

(2) Rounded off ratios from the data in column 2, then normalized for actual experimental set up. The relative rounded ratios are 6 for VK1-39 and VK3-20, 3 for VK3-11 and VK4-1, 2 for VK1-5, VK1-33, VK2-28 and VK3-15, and 1 for VK1-12 and VK1-27.

(3) Chassis usage in set of 86 sequences obtained from library; see also FIG. 22.
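The normalization described in footnote (2) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the "Expected" column is each rounded ratio divided by the sum of all ten ratios, and that VK1-5 is one of the four chassis with a ratio of 2 (consistent with its Expected value of 7.1 in Table 44). The function name is hypothetical.

```python
# Rounded relative ratios for the ten VK chassis, as listed in footnote (2).
ROUNDED_RATIOS = {
    "VK1-39": 6, "VK3-20": 6,
    "VK3-11": 3, "VK4-1": 3,
    "VK1-5": 2, "VK1-33": 2, "VK2-28": 2, "VK3-15": 2,
    "VK1-12": 1, "VK1-27": 1,
}


def expected_percentages(ratios):
    """Normalize integer pooling ratios to percentages of the final pool."""
    total = sum(ratios.values())
    return {chassis: round(100.0 * r / total, 1) for chassis, r in ratios.items()}


expected = expected_percentages(ROUNDED_RATIOS)
for chassis in sorted(expected):
    print(chassis, expected[chassis])
```

Under these assumptions the ratios sum to 28, and the normalized values agree with the Expected column of Table 44.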

Example 12: Characterization of Exemplary Libraries

This example shows the characteristics of exemplary libraries of the invention, constructed according to the methods described herein.

Example 12.1. Characterization of the Heavy Chains

To characterize the product of the split pool synthesis, ten of the 424 vectors containing the [Tail]-[N1]-[DH]-[N2]-[H3-JH] product were selected at random and transformed into E. coli. The split pool product had a theoretical diversity of about 1.1×10⁶ (i.e., 278×141×28). Ninety-six colonies were selected from the transformation and forward and reverse sequences were generated for each clone. Of the 96 sequencing reactions, 90 yielded sequences from which the CDRH3 region could be identified, and about 70% of these sequences matched a designed sequence in the library. The length distribution of the sequenced CDRH3 segments from the ten vectors, as compared to the theoretical distribution (based on design), is provided in FIG. 10. The length distributions of the individual DH, N2, and H3-JH segments obtained from the ten vectors are shown in FIGS. 11-13.
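As a quick check of the theoretical diversity quoted above, the product of the segment counts can be computed directly. This is a minimal sketch; the constant and function names are illustrative.

```python
# Segment counts quoted in the text for the split pool product.
N_DH = 278      # DH segments
N_N2 = 141      # N2 segments
N_H3_JH = 28    # H3-JH segments


def theoretical_diversity(*segment_counts):
    """Multiply the number of choices at each position of a combinatorial design."""
    total = 1
    for count in segment_counts:
        total *= count
    return total


split_pool_diversity = theoretical_diversity(N_DH, N_N2, N_H3_JH)
print(split_pool_diversity)
```

The exact product is 1,097,544, which rounds to the stated value of about 1.1×10⁶.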

Once the length distribution of the CDRH3 components contained in the vectors was verified to match the design, the CDRH3 domains and heavy chain family representation in yeast that had been transformed according to the process described in Example 10.4 were characterized. Over 500 single-pass sequences were obtained. Of these, 531 yielded enough sequence information to identify the heavy chain chassis and 291 yielded enough sequence information to characterize the CDRH3. These CDRH3 domains have been integrated with the heavy chain chassis and constant region, according to the homologous recombination processes described herein. The length distribution of the CDRH3 domains from 291 sequences, compared to the theoretical length distribution, is shown in FIG. 14. The mean theoretical length was 14.4±4 amino acids, while the average observed length was 14.3±3 amino acids. The observed length of each portion of the CDRH3, as compared to theoretical, is presented in FIGS. 15-18. FIG. 19 depicts the familial origin of the JH segments identified in the 291 sequences, and FIG. 20 shows the representation of 16 of the chassis of the library. The VH3-15 chassis was not represented amongst these sequences. This was corrected later by introducing yeast transformants containing the VH3-15 chassis, with CDRH3 diversity, into the library at the desired composition.

Example 12.2. Characterization of the Light Chains

The length distribution of the CDRL3 components, from the VKCDR3 library described in Example 6.2, was determined after yeast transformation via the methods described in Example 10.4. A comparison of the CDRL3 length from 86 sequences of the library to the human sequences and designed sequences is provided in FIG. 21. FIG. 22 shows the representation of the light chain chassis from amongst the 86 sequences selected from the library. About 91% of the CDRL3 sequences were exact matches to the design, and about 9% differed by a single amino acid.

Example 13: Characterization of the Composition of the Designed CDRH3 Libraries

This example presents data on the composition of the CDRH3 domains of exemplary libraries, and a comparison to other libraries of the art. More specifically, this example presents an analysis of the occurrence of the 400 possible amino acid pairs (20 amino acids×20 amino acids) occurring in the CDRH3 domains of the libraries. The prevalence of these pairs is computed by examination of the nearest neighbor (i-i+1; designated IP1), next nearest neighbor (i-i+2; designated IP2), and next-next nearest neighbor (i-i+3; designated IP3) of the i residue in CDRH3. Libraries previously known in the art (e.g., Knappik et al., J. Mol. Biol., 2000, 296: 57; Sidhu et al., J. Mol. Biol., 2004, 338: 299; and Lee et al., J. Mol. Biol. 2004, 340: 1073, each of which is incorporated by reference in its entirety) have only considered the occurrence of the 20 amino acids at individual positions within CDRH3, while maintaining the same composition across the center of CDRH3, and not the pair-wise occurrences considered herein. In fact, according to Sidhu et al. (J. Mol. Biol., 2004, 338: 299, incorporated by reference in its entirety), “[i]n CDR-H3, there was some bias towards certain residue types, but all 20 natural amino acid residues occurred to a significant extent, and there was very little position-specific bias within the central portion of the loop”. Thus, the present invention represents the first recognition that, surprisingly, a position-specific bias does exist within the central portion of the CDRH3 loop, when the occurrences of amino acid pairs recited above are considered. This example shows that the libraries described herein more faithfully reproduce the occurrence of these pairs as found in human sequences, in comparison to other libraries of the art. The composition of the libraries described herein may thus be considered more “human” than other libraries of the art.

To examine the pair-wise composition of CDRH3 domains, a portion of CDRH3 beginning at position 95 was chosen. For the purposes of comparison with data presented in Knappik et al. and Lee et al., the last five residues in each of the analyzed CDRH3s were ignored. Thus, for the purposes of this analysis, both members of the pair i-i+X (X=1 to 3) must fall within the region starting at position 95 and ending at (and including) the sixth residue from the C-terminus of the CDRH3. The analyzed portion is termed the “central loop” (see Definitions).

To estimate pair distributions in representative libraries of the invention, a sampling approach was used. A number of sequences were generated by choosing randomly and, in turn, one of the 424 tail plus N1 segments, one of the 278 DH segments, one of the 141 N2 segments and one of the 28 JH segments (the latter truncated to include only the 95 to 102 Kabat CDRH3). The process was repeated 10,000 times to generate a sample of 10,000 sequences. By choosing a different seed for the random number generation, an independent sample of another 10,000 sequences was also generated and the results for pair distributions were observed to be nearly the same. For the calculations presented herein, a third and much larger sample of 50,000 sequences was used. A similar approach was used for the alternative library embodiment (N1-141), whereby the first segment was selected from 1068 tail+N1 segments (resulting after eliminating redundant sequences from 2 times 4 times 141 or 1128 possible combinations).
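The sampling and pair-counting procedure can be sketched as follows. This is an illustrative outline, not the code used to generate the reported numbers: the four segment lists are small hypothetical placeholders (the actual library uses 424 tail plus N1, 278 DH, 141 N2, and 28 JH segments), each sampled sequence is assumed to begin at position 95, and the central loop is taken to be the sequence with its last five residues removed.

```python
import random
from collections import Counter

# Hypothetical placeholder segments, for illustration only.
TAIL_N1 = ["", "G", "DR"]
DH = ["YYYGSGSYYN", "GYSSGW"]
N2 = ["", "VGG"]
H3_JH = ["YYYYYGMDV", "FDY"]  # JH truncated to the Kabat CDRH3 portion


def sample_cdrh3(tail_n1, dh, n2, h3_jh, n, seed=0):
    """Build n CDRH3 sequences by choosing one segment of each type at random, in turn."""
    rng = random.Random(seed)
    return [
        rng.choice(tail_n1) + rng.choice(dh) + rng.choice(n2) + rng.choice(h3_jh)
        for _ in range(n)
    ]


def central_loop(cdrh3):
    """Drop the C-terminal five residues; the remainder starts at position 95."""
    return cdrh3[:-5]


def pair_percentages(sequences, offset):
    """Percent occurrence of residue pairs (i, i+offset) within the central loop.

    offset=1 corresponds to IP1, offset=2 to IP2, and offset=3 to IP3.
    """
    counts = Counter()
    for seq in sequences:
        loop = central_loop(seq)
        for i in range(len(loop) - offset):
            counts[loop[i] + loop[i + offset]] += 1
    total = sum(counts.values())
    return {pair: 100.0 * c / total for pair, c in counts.items()} if total else {}


sample = sample_cdrh3(TAIL_N1, DH, N2, H3_JH, n=10000, seed=1)
ip1 = pair_percentages(sample, offset=1)
print(sorted(ip1.items(), key=lambda item: -item[1])[:5])
```

Repeating the call with a different `seed` gives an independent sample, which mirrors the check described in the text that two independent samples yield nearly the same pair distributions.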

The pair-wise composition of Knappik et al. was determined based on the percent occurrences presented in FIG. 7a of Knappik et al. (p.71). The relevant data are reproduced below, in Table 45.

TABLE 45

Composition of CDRH3 positions 95-100s (corresponding to positions

95-99B of the libraries of the current invention) of CDRH3

of Knappik et al. (from FIG. 7a of Knappik et al.)

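| Amino Acid | Planned (%) | Found (%) |
|---|---|---|
| A | 4.1 | 3.0 |
| C | 1.0 | 1.0 |
| D | 4.1 | 4.2 |
| E | 4.1 | 2.3 |
| F | 4.1 | 4.9 |
| G | 15.0 | 10.8 |
| H | 4.1 | 4.6 |
| I | 4.1 | 4.5 |
| K | 4.1 | 2.9 |
| L | 4.1 | 6.6 |
| M | 4.1 | 3.3 |
| N | 4.1 | 4.5 |
| P | 4.1 | 4.8 |
| Q | 4.1 | 2.9 |
| R | 4.1 | 4.1 |
| S | 4.1 | 5.6 |
| T | 4.1 | 4.5 |
| V | 4.1 | 3.7 |
| W | 4.1 | 2.0 |
| Y | 15.0 | 19.8 |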

The pair-wise composition of Lee et al. was determined based on the libraries depicted in Table 5 of Lee et al., where the positions corresponding to those CDRH3 regions analyzed from the current invention and from Knappik et al. are composed of an “XYZ” codon in Lee et al. The XYZ codon of Lee et al. is a degenerate codon with the following base compositions:

- position 1 (X): 19% A, 17% C, 38% G, and 26% T;
- position 2 (Y): 34% A, 18% C, 31% G, and 17% T; and
- position 3 (Z): 24% G and 76% T.

When the approximately 2% of codons that encode stop codons are excluded (these do not occur in functionally expressed human CDRH3 sequences), and the percentages are re-normalized to 100%, the following amino acid representation can be deduced from the composition of the XYZ codon of Lee et al. (Table 46).

TABLE 46

Composition of CDRH3 of Lee et al., Based on

the Composition of the Degenerate XYZ Codon.

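| Type | Percent |
|---|---|
| A | 6.99% |
| C | 6.26% |
| D | 10.03% |
| E | 3.17% |
| F | 3.43% |
| G | 12.04% |
| H | 4.49% |
| I | 2.51% |
| K | 1.58% |
| L | 4.04% |
| M | 0.79% |
| N | 5.02% |
| P | 3.13% |
| Q | 1.42% |
| R | 6.83% |
| S | 9.35% |
| T | 3.49% |
| V | 6.60% |
| W | 1.98% |
| Y | 6.86% |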
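The derivation of Table 46 from the XYZ base compositions can be sketched as below. The sketch assumes independent base choice at each codon position and the standard genetic code; the function and variable names are illustrative and do not come from Lee et al.

```python
# Standard genetic code, indexed by bases in the order T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Base fractions at each position of the degenerate XYZ codon, as given in the text.
X = {"A": 0.19, "C": 0.17, "G": 0.38, "T": 0.26}
Y = {"A": 0.34, "C": 0.18, "G": 0.31, "T": 0.17}
Z = {"G": 0.24, "T": 0.76}


def xyz_composition(pos1, pos2, pos3):
    """Return (amino acid percentages re-normalized without stops, percent stop codons)."""
    raw = {}
    for b1, p1 in pos1.items():
        for b2, p2 in pos2.items():
            for b3, p3 in pos3.items():
                aa = CODON_TABLE[b1 + b2 + b3]
                raw[aa] = raw.get(aa, 0.0) + p1 * p2 * p3
    stop = raw.pop("*", 0.0)
    composition = {aa: 100.0 * p / (1.0 - stop) for aa, p in raw.items()}
    return composition, 100.0 * stop


composition, stop_percent = xyz_composition(X, Y, Z)
for aa in sorted(composition):
    print(aa, round(composition[aa], 2))
```

With these inputs the only stop codon that can occur is TAG, which accounts for roughly 2% of codons, and the re-normalized percentages agree with Table 46 to two decimal places.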

The occurrences of each of the 400 amino acid pairs, in each of the IP1, IP2, and IP3 configurations, can be computed for Knappik et al. and Lee et al. by multiplying together the individual amino acid compositions. For example, for Knappik et al., the occurrence of YS pairs in the library is calculated by multiplying 15% by 4.1%, to yield 0.61%; note that the occurrence of SY pairs would be the same. Similarly, for the XYZ codon-based libraries of Lee et al., the occurrence of YS pairs would be 6.86% (Y) multiplied by 9.35% (S), to give 0.64%; the same, again, for SY.
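The multiplication can be written out explicitly. This minimal sketch assumes that the two positions of a pair are independent, so that the pair occurrence (in percent) is the product of the two single-residue percentages divided by 100; the function name is illustrative.

```python
def pair_percent(first_percent, second_percent):
    """Percent occurrence of an ordered pair, assuming independent positions."""
    return first_percent * second_percent / 100.0


# Knappik et al. (planned composition, Table 45): Y = 15%, S = 4.1%.
hucal_ys = pair_percent(15.0, 4.1)

# Lee et al. XYZ codon (Table 46): Y = 6.86%, S = 9.35%.
xyz_ys = pair_percent(6.86, 9.35)

# Multiplication is symmetric, so SY pairs occur as often as YS pairs.
hucal_sy = pair_percent(4.1, 15.0)

print(round(hucal_ys, 2), round(xyz_ys, 2))
```

The products are about 0.61% and 0.64%, which correspond to the HuCAL and XYZ entries for the YS and SY pairs in Table 47.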

For the human CDRH3 sequences, the calculation is performed by ignoring the last five amino acids in the Kabat definition. By ignoring the C-terminal 5 amino acids of the human CDRH3, these sequences may be compared to those of Lee et al., based on the XYZ codons. While Lee et al. also present libraries with “NNK” and “NNS” codons, the pair-wise compositions of these libraries are even further away from human CDRH3 pair-wise composition. The XYZ codon was designed by Lee et al. to replicate, to some extent, the individual amino acid type biases observed in CDRH3.

An identical approach was used for the libraries of the invention, after using the methods described above to produce sample sequences. While it is possible to perform these calculations with all sequences in the library, independent random samples of 10,000 to 20,000 members gave indistinguishable results. The numbers reported herein were thus generated from samples of 50,000 members.

Three tables were generated for IP1, IP2 and IP3, respectively (Tables 47, 48, and 49). Out of the 400 pairs, a selection from amongst the 20 most frequently occurring is included in the tables. The sample of about 1,000 human sequences (Lee et al., 2006) is denoted as “Preimmune,” a sample of about 2,500 sequences (Jackson et al., 2007) is denoted as “Humabs,” and the more affinity matured subset of the latter, which excludes all of the Preimmune set, is denoted as “Matured.” Synthetic libraries in the art are denoted as HuCAL (Knappik et al., 2000) and XYZ (Lee et al., 2004). Two representative libraries of the invention are included: LUA-59 includes 59 N1 segments, 278 DH segments, 141 N2 segments, and 28 H3-JH segments (see Examples, above). LUA-141 includes 141 N1 segments, 278 DH segments, 141 N2 segments, and 28 H3-JH segments (see Examples, above). Redundancies created by combination of the N1 and tail sequences were removed from the dataset in each respective library. In certain embodiments, the invention may be defined based on the percent occurrence of any of the 400 amino acid pairs, particularly those in Tables 47-49. In certain embodiments, the invention may be defined based on at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more of these pairs. In certain embodiments of the invention, the percent occurrence of certain pairs of amino acids may fall within ranges indicated by “LUA−” (lower boundary) and “LUA+” (higher boundary), in the following tables. In some embodiments of the invention, the lower boundary for the percent occurrence of any amino acid pairs may be about 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, and 5.
In some embodiments of the invention, the higher boundary for the percent occurrence of any amino acid pairs may be about 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6, 6.25, 6.5, 6.75, 7, 7.25, 7.5, 7.75, and 8. According to the present invention, any of the lower boundaries recited may be combined with any of the higher boundaries recited, to establish ranges, and vice-versa.

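TABLE 47

Percent Occurrence of i − i + 1 (IP1) Amino Acid Pairs in Human Sequences, Exemplary Libraries of the Invention, and the Libraries of Knappik et al. and Lee et al.

| Pairs | Preimmune | Humabs | Matured | LUA-59 | LUA-141 | HuCAL | XYZ | LUA− | LUA+ | Range | HuCAL within range | XYZ within range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **YY** | 5.87 | 4.44 | 3.27 | 5.83 | 5.93 | 2.25 | 0.47 | 2.50 | 6.50 | 4.00 | 0 | 0 |
| **SG** | 3.54 | 3.41 | 3.26 | 3.90 | 3.72 | 0.61 | 1.13 | 2.50 | 4.50 | 2.00 | 0 | 0 |
| **SS** | 3.35 | 2.65 | 2.26 | 2.82 | 3.08 | 0.16 | 0.88 | 2.00 | 4.00 | 2.00 | 0 | 0 |
| **GS** | 2.59 | 2.37 | 2.20 | 3.82 | 3.52 | 0.61 | 1.13 | 1.50 | 4.00 | 2.50 | 0 | 0 |
| **GY** | 2.55 | 2.34 | 2.12 | 3.15 | 2.56 | 2.25 | 0.83 | 2.00 | 3.50 | 1.50 | 1 | 0 |
| **GG** | 2.19 | 2.28 | 2.41 | 6.78 | 3.51 | 2.25 | 1.45 | 2.00 | 7.00 | 5.00 | 1 | 0 |
| **YS** | 1.45 | 1.30 | 1.23 | 1.40 | 1.52 | 0.61 | 0.64 | 0.75 | 2.00 | 1.25 | 0 | 0 |
| **YG** | 1.35 | 1.21 | 1.10 | 1.64 | 1.69 | 2.25 | 0.83 | 0.75 | 2.00 | 1.25 | 0 | 1 |
| **SY** | 1.31 | 1.07 | 0.90 | 1.65 | 1.77 | 0.61 | 0.64 | 0.75 | 2.00 | 1.25 | 0 | 0 |
| YD | 1.67 | 1.40 | 1.17 | 0.88 | 0.90 | 0.61 | 0.69 | 0.75 | 2.25 | 1.50 | 0 | 0 |
| DS | 1.53 | 1.31 | 1.16 | 1.20 | 1.46 | 0.16 | 0.94 | 0.75 | 2.00 | 1.25 | 0 | 1 |
| DY | 1.40 | 1.23 | 1.11 | 0.34 | 0.48 | 0.61 | 0.69 | 0.25 | 2.00 | 1.75 | 1 | 1 |
| VV | 1.37 | 0.94 | 0.64 | 2.30 | 2.30 | 0.16 | 0.44 | 0.50 | 2.50 | 2.00 | 0 | 0 |
| GD | 1.20 | 1.21 | 1.25 | 0.49 | 0.44 | 0.61 | 1.21 | 0.25 | 1.75 | 1.50 | 1 | 1 |
| AA | 1.16 | 0.93 | 0.75 | 1.27 | 1.46 | 0.16 | 0.49 | 0.60 | 1.50 | 0.90 | 0 | 0 |
| RG | 1.08 | 1.26 | 1.38 | 1.69 | 1.38 | 0.61 | 0.82 | 1.00 | 2.00 | 1.00 | 0 | 0 |
| VA | 0.91 | 0.66 | 0.46 | 0.36 | 0.35 | 0.16 | 0.46 | 0.25 | 1.00 | 0.75 | 0 | 1 |
| GV | 0.84 | 0.89 | 0.95 | 2.87 | 2.16 | 0.61 | 0.79 | 0.80 | 3.00 | 2.20 | 0 | 0 |
| CS | 0.82 | 0.55 | 0.38 | 0.79 | 0.80 | 0.04 | 0.59 | 0.50 | 1.00 | 0.50 | 0 | 1 |
| GR | 0.74 | 0.90 | 1.00 | 1.01 | 0.79 | 0.61 | 0.82 | 0.70 | 1.25 | 0.55 | 0 | 1 |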

The pairs in bold comprise about 19% to about 24% of occurrences (among the possible 400 pairs) for the Preimmune (Lee, et al., 2006), Humabs (Jackson, et al., 2007) and matured (Jackson minus Lee) sets. They account for about 27% to about 31% of the occurrences in the LUA libraries, but only about 12% in the HuCAL library and about 8% in the “XYZ” library. This is a reflection of the fact that pair-wise biases do exist in the human and LUA libraries, but not in the others. The last 2 columns indicate whether the corresponding pair-wise compositions fall within the LUA− and LUA+ boundaries: 0 if outside, 1 if within.

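TABLE 48

Percent Occurrence of i − i + 2 (IP2) Amino Acid Pairs in Human Sequences, Exemplary Libraries of the Invention, and the Libraries of Knappik et al. and Lee et al.

| Pairs | Preimmune | Humabs | Matured | LUA-59 | LUA-141 | HuCAL | XYZ | LUA− | LUA+ | Range | HuCAL within range | XYZ within range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **YY** | 3.57 | 2.59 | 1.78 | 2.99 | 3.11 | 2.25 | 0.47 | 2.5 | 4.5 | 2 | 0 | 0 |
| **GY** | 3.34 | 2.91 | 2.56 | 4.96 | 3.78 | 2.25 | 0.83 | 2.5 | 5.5 | 3 | 0 | 0 |
| **SY** | 2.94 | 2.41 | 2.01 | 3.03 | 3.42 | 0.61 | 0.64 | 2 | 4 | 2 | 0 | 0 |
| **YS** | 2.88 | 2.34 | 1.95 | 3.24 | 3.32 | 0.61 | 0.64 | 1.75 | 3.75 | 2 | 0 | 0 |
| **SG** | 2.60 | 2.29 | 2.05 | 2.84 | 2.96 | 0.61 | 1.13 | 2 | 3.5 | 1.5 | 0 | 0 |
| **SS** | 2.27 | 2.01 | 1.84 | 2.30 | 2.50 | 0.16 | 0.88 | 1.5 | 3 | 1.5 | 0 | 0 |
| **GS** | 2.16 | 2.12 | 2.10 | 2.96 | 2.32 | 0.61 | 1.13 | 1.5 | 3 | 1.5 | 0 | 0 |
| **GG** | 1.92 | 2.25 | 2.44 | 6.23 | 3.68 | 2.25 | 1.45 | 1.5 | 7 | 5.5 | 1 | 0 |
| **YG** | 1.17 | 1.14 | 1.15 | 1.39 | 1.47 | 2.25 | 0.83 | 1 | 2 | 1 | 0 | 0 |
| DS | 2.03 | 1.67 | 1.40 | 1.21 | 1.48 | 0.16 | 0.94 | 1 | 2.5 | 1.5 | 0 | 0 |
| YD | 1.71 | 1.39 | 1.11 | 0.89 | 0.92 | 0.61 | 0.69 | 0.75 | 1.75 | 1 | 0 | 0 |
| VG | 1.35 | 1.17 | 1.01 | 1.75 | 1.54 | 0.61 | 0.79 | 1 | 2 | 1 | 0 | 0 |
| DY | 1.06 | 1.02 | 0.99 | 0.23 | 0.40 | 0.61 | 0.69 | 0.2 | 1.2 | 1 | 1 | 1 |
| WG | 1.06 | 0.76 | 0.53 | 0.85 | 0.91 | 0.61 | 0.24 | 0.75 | 1.25 | 0.5 | 0 | 0 |
| RY | 0.98 | 1.00 | 0.96 | 0.70 | 0.91 | 0.61 | 0.47 | 0.6 | 1 | 0.4 | 1 | 0 |
| GC | 0.97 | 0.75 | 0.64 | 0.94 | 0.81 | 0.15 | 0.75 | 0.5 | 1 | 0.5 | 0 | 1 |
| DG | 0.95 | 1.05 | 1.08 | 1.78 | 1.05 | 0.61 | 1.21 | 0.75 | 2 | 1.25 | 0 | 1 |
| GD | 0.94 | 0.88 | 0.86 | 0.47 | 0.36 | 0.61 | 1.21 | 0.25 | 1 | 0.75 | 1 | 0 |
| VV | 0.94 | 0.59 | 0.35 | 0.95 | 0.90 | 0.16 | 0.44 | 0.5 | 1 | 0.5 | 0 | 0 |
| AA | 0.90 | 0.73 | 0.59 | 0.72 | 0.74 | 0.16 | 0.49 | 0.5 | 1 | 0.5 | 0 | 0 |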

The pairs in bold comprise about 18% to about 23% of occurrences (among the possible 400 pairs) for the Preimmune (Lee, et al., 2006), Humabs (Jackson, et al., 2007) and matured (Jackson minus Lee) sets. They account for about 27% to about 30% of the occurrences in the LUA libraries, but only about 12% in the HuCAL library and about 8% in the “XYZ” library. Because of the nature of the construction of the central loops in the HuCAL and XYZ libraries, these numbers are the same for the IP1, IP2, and IP3 pairs. The last 2 columns indicate whether the corresponding pair-wise compositions fall within the LUA− and LUA+ boundaries: 0 if outside, 1 if within.

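TABLE 49

Percent Occurrence of i − i + 3 (IP3) Amino Acid Pairs in Human Sequences, Exemplary Libraries of the Invention, and the Libraries of Knappik et al. and Lee et al.

| Pairs | Preimmune | Humabs | Matured | LUA-59 | LUA-141 | HuCAL | XYZ | LUA− | LUA+ | Range | HuCAL within range | XYZ within range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **GY** | 3.55 | 2.85 | 2.32 | 5.80 | 4.42 | 2.25 | 0.83 | 2.5 | 6.5 | 4 | 0 | 0 |
| **SY** | 3.38 | 3.01 | 2.67 | 3.78 | 4.21 | 0.61 | 0.64 | 1 | 5 | 4 | 0 | 0 |
| **YS** | 3.18 | 2.56 | 2.05 | 3.20 | 3.33 | 0.61 | 0.64 | 2 | 4 | 2 | 0 | 0 |
| **SS** | 2.26 | 1.74 | 1.37 | 1.81 | 2.18 | 0.16 | 0.88 | 1 | 3 | 2 | 0 | 0 |
| **GS** | 2.23 | 2.13 | 2.00 | 4.60 | 3.33 | 0.61 | 1.13 | 2 | 5 | 3 | 0 | 0 |
| **YG** | 2.14 | 1.65 | 1.35 | 2.69 | 2.79 | 2.25 | 0.83 | 1.5 | 3 | 1.5 | 1 | 0 |
| **YY** | 1.86 | 1.48 | 1.12 | 1.18 | 1.27 | 2.25 | 0.47 | 0.75 | 2 | 1.25 | 0 | 0 |
| **GG** | 1.60 | 1.87 | 2.11 | 4.73 | 2.84 | 2.25 | 1.45 | 1.5 | 5 | 3.5 | 1 | 0 |
| **SG** | 0.90 | 1.04 | 1.12 | 0.93 | 1.25 | 0.61 | 1.13 | 0.75 | 1.5 | 0.75 | 0 | 1 |
| DG | 2.01 | 1.94 | 1.84 | 2.51 | 2.03 | 0.61 | 1.21 | 1.5 | 3 | 1.5 | 0 | 0 |
| DS | 1.48 | 1.31 | 1.22 | 0.41 | 0.55 | 0.16 | 0.94 | 0.25 | 1.5 | 1.25 | 0 | 1 |
| VA | 1.18 | 0.83 | 0.55 | 1.48 | 1.46 | 0.16 | 0.46 | 0.5 | 2 | 1.5 | 0 | 0 |
| AG | 1.13 | 1.09 | 1.03 | 0.97 | 1.04 | 0.61 | 0.84 | 0.9 | 2 | 1.1 | 0 | 0 |
| TY | 1.05 | 0.90 | 0.76 | 1.01 | 1.16 | 0.61 | 0.24 | 0.75 | 1.75 | 1 | 0 | 0 |
| PY | 1.02 | 0.88 | 0.79 | 1.23 | 0.86 | 0.61 | 0.21 | 0.75 | 1.75 | 1 | 0 | 0 |
| RS | 1.02 | 0.88 | 0.77 | 0.38 | 0.55 | 0.16 | 0.64 | 0.25 | 1.25 | 1 | 0 | 1 |
| RY | 1.02 | 1.12 | 1.14 | 0.68 | 0.88 | 0.61 | 0.47 | 0.65 | 1.25 | 0.6 | 0 | 0 |
| LY | 1.01 | 0.88 | 0.75 | 0.69 | 0.76 | 0.61 | 0.28 | 0.65 | 1.25 | 0.6 | 0 | 0 |
| DY | 0.93 | 0.84 | 0.77 | 0.72 | 0.95 | 0.61 | 0.69 | 0.7 | 1.3 | 0.6 | 0 | 0 |
| GC | 0.90 | 0.62 | 0.48 | 0.86 | 0.68 | 0.15 | 0.75 | 0.5 | 1 | 0.5 | 0 | 1 |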

The pairs in bold make up about 16% to about 21% of the occurrences (among the possible 400 pairs) for the Preimmune (Lee, et al., 2006), Humabs (Jackson, et al., 2007) and matured (Jackson minus Lee) sets. They account for about 26% to about 29% of the occurrences in the LUA libraries, but only about 12% in the HuCAL library and about 8% in the “XYZ” library. Because of the nature of the construction of the central loops in the HuCAL and XYZ libraries, these numbers are the same for the IP1, IP2, and IP3 pairs. The last 2 columns indicate whether the corresponding pair-wise compositions fall within the LUA− and LUA+ boundaries: 0 if outside, 1 if within.

The analysis provided in this example demonstrates that the composition of the libraries of the present invention more closely mimics the composition of human sequences than other libraries known in the art. Synthetic libraries of the art do not intrinsically reproduce the composition of the “central loop” portion of actual human CDRH3 sequences at the level of pair percentages. The libraries of the invention have a more complex pair-wise composition that closely reproduces that observed in actual human CDRH3 sequences. The exact degree of this reproduction versus a target set of actual human CDRH3 sequences may be optimized, for example, by varying the compositions of the segments used to design the CDRH3 libraries. Moreover, it is also possible to utilize these metrics to computationally design libraries that exactly mimic the pair-wise compositional prevalence found in human sequences.

Example 14: Information Content of Exemplary Libraries

One way to quantify the observation that certain libraries, or collections of sequences, may be intrinsically more complex or “less random” than others is to apply information theory (Shannon, Bell Sys. Tech. J., 1948, 27: 379; Martin et al., Bioinformatics, 2005, 21: 4116; Weiss et al., J. Theor. Biol., 2000, 206: 379, each incorporated by reference in its entirety). For example, a metric can be devised to quantify the fact that a position with a fixed amino acid represents less “randomness” than a position where all 20 amino acids may occur with equal probability. Intermediate situations should lead, in turn, to intermediate values of such a metric. According to information theory this metric can be represented by the formula:

$$I=\sum_{i=1}^{N} f_i \log_2 f_i$$

Here, f_i is the normalized frequency of occurrence of i, which may be an amino acid type (in which case N would be equal to 20). When all f_i are zero except for one, the value of I is zero. In any other case the value of I would be smaller, i.e., negative, and the lowest value is achieved when all f_i values are the same and equal to 1/N. For the amino acid case, N is 20, and the resulting value of I would be −4.322. Because I is defined with base 2 logarithms, the units of I are bits.
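As a non-limiting sketch, the metric I may be evaluated numerically as shown below. The function name `information_content` and the renormalization of the supplied frequencies (so that percentages may be passed directly) are assumptions made for illustration; terms with zero frequency are taken to contribute zero to the sum.

```python
import math


def information_content(freqs):
    """I = sum_i f_i * log2(f_i), in bits, with 0 * log2(0) taken as 0."""
    freqs = list(freqs)
    total = sum(freqs)  # renormalize so that percentages may be supplied
    return sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


# All 20 amino acid types equally likely: the lowest (most negative) value.
uniform = information_content([1.0] * 20)
# A single fixed amino acid type: the highest value, zero.
fixed = information_content([1.0] + [0.0] * 19)
```

The two limiting cases bracket the values obtained for any real set of frequencies, which fall between −log2(20) bits and zero.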

The I value for the HuCAL and XYZ libraries at the single position level may be derived from Tables 45 and 46, respectively, and are equal to −4.08 and −4.06. The corresponding single residue frequency occurrences in the non-limiting exemplary libraries of the invention and the sets of human sequences previously introduced, taken within the “central loop” as defined above, are provided in Table 50.

TABLE 50

Amino Acid Type Frequencies in Central Loop

| Type | Preimmune | Humabs | Matured | LUA-59 | LUA-141 |
|---|---|---|---|---|---|
| A | 5.46 | 5.51 | 5.39 | 5.71 | 6.06 |
| C | 1.88 | 1.46 | 1.22 | 1.33 | 1.34 |
| D | 7.70 | 7.51 | 7.38 | 4.76 | 5.23 |
| E | 2.40 | 2.90 | 3.28 | 3.99 | 4.68 |
| F | 2.29 | 2.60 | 2.81 | 1.76 | 2.17 |
| G | 14.86 | 15.42 | 15.82 | 24.90 | 18.85 |
| H | 1.46 | 1.79 | 2.01 | 0.20 | 0.67 |
| I | 3.71 | 3.26 | 2.99 | 3.99 | 4.34 |
| K | 1.06 | 1.27 | 1.44 | 0.21 | 0.67 |
| L | 4.48 | 4.84 | 5.16 | 4.12 | 4.54 |
| M | 1.18 | 1.03 | 0.93 | 0.94 | 1.03 |
| N | 1.81 | 2.43 | 2.84 | 0.41 | 0.65 |
| P | 4.12 | 4.10 | 4.13 | 5.68 | 3.96 |
| Q | 1.60 | 1.77 | 1.95 | 0.21 | 0.68 |
| R | 5.05 | 5.90 | 6.41 | 3.35 | 4.11 |
| S | 12.61 | 11.83 | 11.37 | 11.18 | 12.77 |
| T | 4.59 | 5.11 | 5.47 | 4.36 | 4.95 |
| V | 6.21 | 5.55 | 5.12 | 8.13 | 7.67 |
| W | 2.79 | 2.91 | 3.07 | 1.57 | 1.98 |
| Y | 14.74 | 12.81 | 11.24 | 13.20 | 13.63 |

The information content of these sets, computed by the formula given above, would then be −3.88, −3.93, −3.96, −3.56, and −3.75, for the Preimmune, Humabs, Matured, LUA-59, and LUA-141 sets, respectively. As the frequencies deviate more from completely uniform (5% for each of the 20 amino acid types), the numbers tend to be larger, or less negative.

The identical approach can be used to analyze pair compositions, or frequencies, by calculating the sum in the formula above over the 20×20 or 400 values of the frequencies for each of the pairs. It can be shown that the I value of any set of pair frequencies made up of the simple product of two singleton frequency sets is equal to the sum of the individual singleton I values. If the two singleton frequency sets are the same or approximately so, this means that I (independent pairs)=2*I (singles). It is thus possible to define a special case of the mutual information, MI, for a general set of pair frequencies as MI (pair)=I(pair)−2*I (singles) to measure the amount of information gained by the structure of the pair frequencies themselves (compare to the standard definitions in Martin et al., 2005, for example, after considering that I(X)=−H(X) in their notation). When there is no such structure, the value of MI is simply zero.
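A minimal, non-limiting sketch of the MI calculation follows. As an assumption made for illustration, the singleton term is computed from the two marginal distributions of the supplied pair frequencies, i.e., MI = I(pair) − I(first position) − I(second position), which reduces to I(pair) − 2*I(singles) when the two marginals coincide; the specification may instead take the singleton frequencies from a separate tabulation. The function names and the toy frequencies are likewise hypothetical.

```python
import math


def info(freqs):
    """I = sum f * log2(f) over a renormalized frequency set (bits)."""
    freqs = list(freqs)
    total = sum(freqs)
    return sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


def mutual_information(pair_freqs):
    """MI for a dict mapping two-letter pairs (e.g. 'GY') to frequencies."""
    first, second = {}, {}
    for pair, f in pair_freqs.items():
        first[pair[0]] = first.get(pair[0], 0.0) + f    # marginal, position i
        second[pair[1]] = second.get(pair[1], 0.0) + f  # marginal, position i+k
    return info(pair_freqs.values()) - info(first.values()) - info(second.values())


# Pair frequencies that are a simple product of singleton frequencies.
singles = {"G": 0.5, "Y": 0.3, "S": 0.2}
independent = {a + b: fa * fb for a, fa in singles.items() for b, fb in singles.items()}
mi_independent = mutual_information(independent)

# Fully structured pairs: the second residue always equals the first.
mi_structured = mutual_information({"GG": 0.5, "YY": 0.5})
```

The product-form pair set gives an MI of zero (to within rounding), while any structure in the pair frequencies gives a positive value.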

Values of MI computed from the pair distributions discussed above (over the entire set of 400 values) are given in Table 51.

TABLE 51

Mutual Information Within Central Loop of CDRH3

| Library or Set | i − i + 1 | i − i + 2 | i − i + 3 |
|---|---|---|---|
| Preimmune | 0.226 | 0.192 | 0.163 |
| Humabs | 0.153 | 0.128 | 0.111 |
| Matured | 0.124 | 0.107 | 0.100 |
| LUA-59 | 0.422 | 0.327 | 0.278 |
| LUA-141 | 0.376 | 0.305 | 0.277 |
| HuCAL | 0.000 | 0.000 | 0.000 |
| XYZ | 0.000 | 0.000 | 0.000 |

It is notable that the MI values decrease within sets of human sequences as those sequences undergo further somatic mutation, a process that over many independent sequences is essentially random. It is also worth noting that the MI values decrease as the pairs being considered sit further and further apart, and this is the case for both sets of human sequences, and exemplary libraries of the invention. In both cases, as the two amino acids in a pair become further separated the odds of their straddling an actual segment (V, D, J plus V-D or D-J insertions) increase, and their pair frequencies become closer to a simple product of singleton frequencies.

Table 52 contains sequence information on certain immunoglobulin gene segments cited in the application. These sequences are non-limiting, and it is recognized that allelic variants exist and are encompassed by the present invention. Accordingly, the methods presented herein can be utilized with mutants of these sequences.

TABLE 52

Sequence Information for Certain Immunoglobulin Gene Segments Cited Herein

| SEQ ID NO: | Sequence | Peptide or Nucleotide Sequence | Observations |
|---|---|---|---|
| 423 | IGHV1-3 | QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYAMHWVRQAPGQRLEWMGWINAGNGNTKYSQKFQGRVTITRDTSASTAYMELSSLRSEDTAVYYCAR | |
| 424 | IGHV1-8_v1 | QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYDINWVRQATGQGLEWMGWMNPNSGNTGYAQKFQGRVTMTRNTSISTAYMELSSLRSEDTAVYYCAR | |
| 425 | IGHV1-8_v2 | QVQLVQSGAEVKKPGASVKVSCKASGYTFTSYDINWVRQATGQGLEWMGWMNPNSGNTGYAQKFQGRVTMTRDTSISTAYMELSSLRSEDTAVYYCAR | N to D mutation avoids NTS potential glycosylation site in the original germline sequence (v1 above). XTS, where X is not N, and NTZ, where Z is not S or T are also options. NPS is yet another option that is much less likely to be N-linked glycosylated. |
| 426 | IGHV1-24 | QVQLVQSGAEVKKPGASVKVSCKVSGYTLTELSMHWVRQAPGKGLEWMGGFDPEDGETIYAQKFQGRVTMTEDTSTDTAYMELSSLRSEDTAVYYCAT | |
| 427 | IGHV1-45 | QMQLVQSGAEVKKTGSSVKVSCKASGYTFTYRYLHWVRQAPGQALEWMGWITPFNGNTNYAQKFQDRVTITRDRSMSTAYMELSSLRSEDTAMYYCAR | |
| 428 | IGHV1-58 | QMQLVQSGPEVKKPGTSVKVSCKASGFTFTSSAVQWVRQARGQRLEWIGWIVVGSGNTNYAQKFQERVTITRDMSTSTAYMELSSLRSEDTAVYYCAA | |
| 429 | IGHV2-5 | QITLKESGPTLVKPTQTLTLTCTFSGFSLSTSGVGVGWIRQPPGKALEWLALIYWDDDKRYSPSLKSRLTITKDTSKNQVVLTMTNMDPVDTATYYCAHR | |
| 430 | IGHV2-26 | QVTLKESGPVLVKPTETLTLTCTVSGFSLSNARMGVSWIRQPPGKALEWLAHIFSNDEKSYSTSLKSRLTISKDTSKSQVVLTMTNMDPVDTATYYCARI | |
| 431 | IGHV2-70_v1 | RVTLRESGPALVKPTQTLTLTCTFSGFSLSTSGMCVSWIRQPPGKALEWLARIDWDDDKYYSTSLKTRLTISKDTSKNQVVLTMTNMDPVDTATYYCARI | |
| 432 | IGHV2-70_v2 | RVTLRESGPALVKPTQTLTLTCTFSGFSLSTSGMGVSWIRQPPGKALEWLARIDWDDDKYYSTSLKTRLTISKDTSKNQVVLTMTNMDPVDTATYYCARI | C to G mutation avoids unpaired Cys in v1 above. G was chosen by analogy to other germline sequences, but other amino acid types, R, S, T, as non-limiting examples, are possible. |
| 433 | IGHV3-9 | EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLEWVSGISWNSGSIGYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTALYYCAKD | |
| 434 | IGHV3-11 | QVQLVESGGGLVKPGGSLRLSCAASGFTFSDYYMSWIRQAPGKGLEWVSYISSSGSTIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR | |
| 435 | IGHV3-13 | EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYDMHWVRQATGKGLEWVSAIGTAGDTYYPGSVKGRFTISRENAKNSLYLQMNSLRAGDTAVYYCAR | |
| 436 | IGHV3-20 | EVQLVESGGGVVRPGGSLRLSCAASGFTFDDYGMSWVRQAPGKGLEWVSGINWNGGSTGYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTALYHCAR | |
| 437 | IGHV3-21 | EVQLVESGGGLVKPGGSLRLSCAASGFTFSSYSMNWVRQAPGKGLEWVSSISSSSSYIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR | |
| 438 | IGHV3-43 | EVQLVESGGVVVQPGGSLRLSCAASGFTFDDYTMHWVRQAPGKGLEWVSLISWDGGSTYYADSVKGRFTISRDNSKNSLYLQMNSLRTEDTALYYCAKD | |
| 439 | IGHV3-49 | EVQLVESGGGLVQPGRSLRLSCTASGFTFGDYAMSWVRQAPGKGLEWVGFIRSKAYGGTTEYAASVKGRFTISRDDSKSIAYLQMNSLKTEDTAVYYCTR | |
| 440 | IGHV3-53 | EVQLVESGGGLIQPGGSLRLSCAASGFTVSSNYMSWVRQAPGKGLEWVSVIYSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR | |
| 441 | IGHV3-64 | EVQLVESGGGLVQPGGSLRLSCSASGFTFSSYAMHWVRQAPGKGLEYVSAISSNGGSTYYADSVKGRFTISRDNSKNTLYLQMSSLRAEDTAVYYCVK | |
| 442 | IGHV3-66 | EVQLVESGGGLVQPGGSLRLSCAASGFTVSSNYMSWVRQAPGKGLEWVSVIYSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR | |
| 443 | IGHV3-72 | EVQLVESGGGLVQPGGSLRLSCAASGFTFSDHYMDWVRQAPGKGLEWVGRTRNKANSYTTEYAASVKGRFTISRDDSKNSLYLQMNSLKTEDTAVYYCAR | |
| 444 | IGHV3-73 | EVQLVESGGGLVQPGGSLKLSCAASGFTFSGSAMHWVRQASGKGLEWVGRIRSKANSYATAYAASVKGRFTISRDDSKNTAYLQMNSLKTEDTAVYYCTR | |
| 445 | IGHV3-74 | EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYWMHWVRQAPGKGLVWVSRINSDGSSTSYADSVKGRFTISRDNAKNTLYLQMNSLRAEDTAVYYCAR | |
| 446 | IGHV4-4v1 | QVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVRQPPGKGLEWIGEIYHSGSTNYNPSLKSRVTISVDKSKNQFSLKLSSVTAADTAVYYCAR | Contains CDRH1 with size 6 (Kabat definition); canonical structure H1-2. Sequence corresponds to allele *02 of IGHV4-4. |
| 447 | IGHV4-4v2 | QVQLQESGPGLVKPSETLSLTCTVSGGSISSYYWSWIRQPAGKGLEWIGRIYTSGSTNYNPSLKSRVTMSVDTSKNQFSLKLSSVTAADTAVYYCAR | Contains CDRH1 with size 5 (Kabat definition); canonical structure H1-1. Sequence corresponds to allele *07 of IGHV4-4. |
| 448 | IGHV4-28 | QVQLQESGPGLVKPSDTLSLTCAVSGYSISSSNWWGWIRQPPGKGLEWIGYIYYSGSTYYNPSLKSRVTMSVDTSKNQFSLKLSSVTAVDTAVYYCAR | |
| 449 | IGHV6-1 | QVQLQQSGPGLVKPSQTLSLTCAISGDSVSSNSAAWNWIRQSPSRGLEWLGRTYYRSKWYNDYAVSVKSRITINPDTSKNQFSLQLNSVTPEDTAVYYCAR | |
| 450 | IGHV7-4-1 | QVQLVQSGSELKKPGASVKVSCKASGYTFTSYAMNWVRQAPGQGLEWMGWINTNTGNPTYAQGFTGRFVFSLDTSVSTAYLQISSLKAEDTAVYYCAR | |
| 451 | IGKV1-06 | AIQMTQSPSSLSASVGDRVTITCRASQGIRNDLGWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCLQDYNYP | |
| 452 | IGKV1-08_v1 | AIRMTQSPSSFSASTGDRVTITCRASQGISSYLAWYQQKPGKAPKLLIYAASTLQSGVPSRFSGSGSGTDFTLTISCLQSEDFATYYCQQYYSYP | |
| 453 | IGKV1-08_v2 | AIRMTQSPSSFSASTGDRVTITCRASQGISSYLAWYQQKPGKAPKLLIYAASTLQSGVPSRFSGSGSGTDFTLTISSLQSEDFATYYCQQYYSYP | C to S mutation avoids unpaired Cys in v1 above. S was chosen by analogy to other germline sequences, but amino acid types, N, R, S, as non-limiting examples, are also possible. |
| 454 | IGKV1-09 | DIQLTQSPSFLSASVGDRVTITCRASQGISSYLAWYQQKPGKAPKLLIYAASTLQSGVPSRFSGSGSGTEFTLTISSLQPEDFATYYCQQLNSYP | |
| 455 | IGKV1-13 | AIQLTQSPSSLSASVGDRVTITCRASQGISSALAWYQQKPGKAPKLLIYDASSLESGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQFNSYP | |
| 456 | IGKV1-16 | DIQMTQSPSSLSASVGDRVTITCRASQGISNYLAWFQQKPGKAPKSLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQYNSYP | |
| 457 | IGKV1-17 | DIQMTQSPSSLSASVGDRVTITCRASQGIRNDLGWYQQKPGKAPKRLIYAASSLQSGVPSRFSGSGSGTEFTLTISSLQPEDFATYYCLQHNSYP | |
| 458 | IGKV1-37_v1 | DIQLTQSPSSLSASVGDRVTITCRVSQGISSYLNWYRQKPGKVPKLLIYSASNLQSGVPSRFSGSGSGTDFTLTISSLQPEDVATYYGQRTYNAP | |
| 459 | IGKV1-37_v2 | DIQLTQSPSSLSASVGDRVTITCRVSQGISSYLNWYRQKPGKVPKLLIYSASNLQSGVPSRFSGSGSGTDFTLTISSLQPEDVATYYCQRTYNAP | Restores conserved Cys, missing in v1 above, just prior to CDRL3. |
| 460 | IGKV1D-16 | DIQMTQSPSSLSASVGDRVTITCRASQGISSWLAWYQQKPEKAPKSLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQYNSYP | |
| 461 | IGKV1D-17 | NIQMTQSPSAMSASVGDRVTITCRARQGISNYLAWFQQKPGKVPKHLIYAASSLQSGVPSRFSGSGSGTEFTLTISSLQPEDFATYYCLQHNSYP | |
| 462 | IGKV1D-43 | AIRMTQSPFSLSASVGDRVTITCWASQGISSYLAWYQQKPAKAPKLFIYYASSLQSGVPSRFSGSGSGTDYTLTISSLQPEDFATYYCQQYYSTP | |
| 463 | IGKV1D-8_v1 | VIWMTQSPSLLSASTGDRVTISCRMSQGISSYLAWYQQKPGKAPELLIYAASTLQSGVPSRFSGSGSGTDFTLTISCLQSEDFATYYCQQYYSFP | |
| 464 | IGKV1D-8_v2 | VIWMTQSPSLLSASTGDRVTISCRMSQGISSYLAWYQQKPGKAPELLIYAASTLQSGVPSRFSGSGSGTDFTLTISSLQSEDFATYYCQQYYSFP | C to S mutation avoids unpaired Cys in v1 above. S was chosen by analogy to other germline sequences, but amino acid types, N, R, S, as non-limiting examples, are also possible. |
| 465 | IGKV2-24 | DIVMTQTPLSSPVTLGQPASISCRSSQSLVHSDGNTYLSWLQQRPGQPPRLLIYKISNRFSGVPDRFSGSGAGTDFTLKISRVEAEDVGVYYCMQATQFP | |
| 466 | IGKV2-29 | DIVMTQTFLSLSVTRQQPASISCKSSQSLLHSDGVTYLYWYLQRPQQSPQLLTYEVSSRFSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQGTHLP | |
| 467 | IGKV2-30 | DVVMTQSPLSLPVTLGQPASISCRSSQSLVYSDGNTYLNWFQQRPGQSPRRLIYKVSNRDSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQGTHWP | |
| 468 | IGKV2-40 | DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSDDGNTYLDWYLQKPGQSPQLLIYTLSYRASGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQRIEFP | |
| 469 | IGKV2D-26 | EIVMTQTPLSLSITPGEQASMSCRSSQSLLHSDGYTYLYWFLQKARPVSTLLIYEVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDFGVYYCMQDAQD | |
| 470 | IGKV2D-29 | DIVMTQTPLSLSVTPGQPASISCKSSQSLLHSDGKTYLYWYLQKPGQPPQLLIYEVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQSIQLP | |
| 471 | IGKV2D-30 | DWVMTQSPLSLPVTLGQPASISCRSSQSLVYSDGNTYLNWFQQRPGQSPRRLIYKVSNWDSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQGTHWP | |
| 472 | IGKV3D-07 | EIVMTQSPATLSLSPGERATLSCRASQSVSSSYLSWYQQKPGQAPRLLIYGASTRATGIPARFSGSGSGTDFTLTISSLQPEDFAVYYCQQDYNLP | |
| 473 | IGKV3D-11 | EIVLTQSPATLSLSPGERATLSCRASQGVSSYLAWYQQKPGQAPRLLIYDASNRATGIPARFSGSGPGTDFTLTISSLEPEDFAVYYCQQRSNWH | |
| 474 | IGKV3D-20 | EIVLTQSPATLSLSPGERATLSCGASQSVSSSYLAWYQQKPGLAPRLLIYDASSRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGSSP | |
| 475 | IGKV5-2_v1 | ETTLTQSPAFMSATPGDKVNISCKASQDIDDDMNWYQQKPGEAAIFIIQEATTLVPGIPPRFSGSGYGTDFTLTINNIESEDAAYYFCLQHDNFP | |
| 476 | IGKV5-2_v2 | ETTLTQSPAFMSATPGDKVTISCKASQDIDDDMNWYQQKPGEAAIFIIQEATTLVPGIPPRFSGSGYGTDFTLTINNIESEDAAYYFCLQHDNFP | N to D mutation avoids NIS potential glycosylation site in v1 above. XIS, where X is not N, and NIZ, where Z is not S or T are also options. NPS is yet another option that is much less likely to be N-linked glycosylated. |
| 477 | IGKV6-21 | EIVLTQSPDFQSVTPKEKVTITCRASQSIGSSLHWYQQKPDQSPKLLIKYASQSFSGVPSRFSGSGSGTDFTLTINSLEAEDAATYYCHQSSSLP | |
| 478 | IGKV6D-21 | EIVLTQSPDFQSVTPKEKVTITCRASQSIGSSLHWYQQKPDQSPKLLIKYASQSFSGVPSRFSGSGSGTDFTLTINSLEAEDAATYYCHQSSSLP | |
| 479 | IGKV7-3 | DIVLTQSPASLAVSPGQRATITCRASESVSFLGINLIHWYQQKPGQPPKLLIYQASNKDTGVPARFSGSGSGTDFTLTINPVEANDTANYYCLQSKNFP | |
| 480 | IGλV1-36 | QSVLTQPPSVSEAPRQRVTISCSGSSSNIGNNAVNWYQQLPGKAPKLLIYYDDLLPSGVSDRFSGSKSGTSASLAISGLQSEDEADYYCAAWDDSLNG | |
| 481 | IGλV1-47 | QSVLTQPPSASGTPGQRVTISCSGSSSNIGSNYVYWYQQLPGTAPKLLIYRNNQRPSGVPDRFSGSKSGTSASLAISGLRSEDEADYYCAAWDDSLSG | |
| 482 | IGλV10-54 | QAGLTQPPSVSKGLRQTATLTCTGNSNNVGNQGAAWLQQHQGHPPKLLSYRNNNRPSGISERLSASRSGNTASLTITGLQPEDEADYYCSAWDSSLSA | |
| 483 | IGλV2-11_v1 | QSALTQPRSVSGSPGQSVTISCTGTSSDVGGYNYVSWYQQHPGKAPKLMIYDVSKRPSGVPDRFSGSKSGNTASLTISGLQAEDEADYYCCSYAGSYTF | |
| 484 | IGλV2-11_v2 | QSALTQPRSVSGSPGQSVTISCTGTSSDVGGYNYVSWYQQHPGKAPKLMIYDVSKRPSGVPDRFSGSKSGNTASLTISGLQAEDEADYYCSSYAGSYTF | C to S mutation avoids unpaired Cys in v1 above. S was chosen by analogy to other germline sequences, but other amino acid types, such as Q, G, A, L, as non-limiting examples, are also possible. |
| 485 | IGλV2-18 | QSALTQPPSVSGSPGQSVTISCTGTSSDVGSYNRVSWYQQPPGTAPKLMIYEVSNRPSGVPDRFSGSKSGNTASLTISGLQAEDEADYYCSLYTSSSTF | |
| 486 | IGλV2-23_v1 | QSALTQPASVSGSPGQSITISCTGTSSDVGSYNLVSWYQQHPGKAPKLMIYEGSKRPSGVSNRFSGSKSGNTASLTISGLQAEDEADYYCCSYAGSSTL | |
| 487 | IGλV2-23_v2 | QSALTQPASVSGSPGQSITISCTGTSSDVGSYNLVSWYQQHPGKAPKLMIYEGSKRPSGVSNRFSGSKSGNTASLTISGLQAEDEADYYCSSYAGSSTL | C to S mutation avoids unpaired Cys in v1 above. S was chosen by analogy to other germline sequences, but other amino acid types, such as Q, G, A, L, as non-limiting examples, are also possible. |
| 488 | IGλV2-8 | QSALTQPPSASGSPGQSVTISCTGTSSDVGGYNYVSWYQQHPGKAPKLMIYEVSKRPSGVPDRFSGSKSGNTASLTVSGLQAEDEADYYCSSYAGSNNF | |
| 489 | IGλV3-10 | SYELTQPPSVSVSPGQTARITCSGDALPKKYAYWYQQKSGQAPVLVIYEDSKRPSGIPERFSGSSSGTMATLTISGAQVEDEADYYCYSTDSSGNH | |
| 490 | IGλV3-12 | SYELTQPHSVSVATAQMARITCGGNNIGSKAVHWYQQKPGQDPVLVIYSDSNRPSGIPERFSGSNPGNTTTLTISRIEAGDEADYYCQVWDSSSDH | |
| 491 | IGλV3-16 | SYELTQPPSVSVSLGQMARITCSGEALPKKYAYWYQQKPGQFPVLVIYKDSERPSGIPERFSGSSSGTIVTLTISGVQAEDEADYYCLSADSSGTY | |
| 492 | IGλV3-25 | SYELMQPPSVSVSPGQTARITCSGDALPKQYAYWYQQKPGQAPVLVIYKDSERPSGIPERFSGSSSGTTVTLTISGVQAEDEADYYCQSADSSGTY | |
| 493 | IGλV3-27 | SYELTQPSSVSVSPGQTARITCSGDVLAKKYARWFQQKPGQAPVLVIYKDSERPSGIPERFSGSSSGTTVTLTISGAQVEDEADYYCYSAADNN | |
| 494 | IGλV3-9 | SYELTQPLSVSVALGQTARITCGGNNIGSKNVHWYQQKPGQAPVLVIYRDSNRPSGIPERFSGSNSGNTATLTISRAQAGDEADYYCQVWDSSTA | |
| 495 | IGλV4-3 | LPVLTQPPSASALLGASIKLTCTLSSEHSTYTIEWYQQRPGRSPQYIMKVKSDGSHSKGDGIPDRFMGSSSGADRYLTFSNLQSDDEAEYHCGESHTIDGQVG | |
| 496 | IGλV4-60 | QPVLTQSSSASASLGSSVKLTCTLSSGHSSYIIAWHQQQPGKAPRYLMKLEGSGSYNKGSGVPDRFSGSSSGADRYLTISNLQLEDEADYYCETWDSNT | |
| 497 | IGλV5-39 | QPVLTQPTSLSASPGASARFTCTLRSGINVGTYRIYWYQQKPGSLPRYLLRYKSDSDKQQGSGVPSRFSGSKDASTNAGLLLISGLQSEDEADYYCAIWYSSTS | |
| 498 | IGλV7-46 | QAVVTQEPSLTVSPGGTVTLTCGSSTGAVTSGHYPYWFQQKPGQAPRTLIYDTSNKHSWTPARFSGSLLGGKAALTLSGAQPEDEAEYYCLLSYSGAR | |
| 499 | IGλV8-61 | QTVVTQEPSFSVSPGGTVTLTCGLSSGSVSTSYYPSWYQQTPGQAPRTLIYSTNTRSSGVPDRFSGSILGNKAALTITGAQADDESDYYCVLYMGSGI | |
| 500 | IGλV9-49 | QPVLTQPPSASASLGASVTLTCTLSSGYSNYKVDWYQQRPGKGPRFVMRVGTGGIVGSKGDGIPDRFSVLGSGLNRYLTIKNIQEEDESDYHCGADHGSGSNFV | |
| 501 | IGHD1-1 | GGTACAACTGGAACGAC | See (1) below. |
| 502 | IGHD1-14 | GGTATAACCGGAACCAC | |
| 503 | IGHD1-20 | GGTATAACTGGAACGAC | |
| 504 | IGHD1-7 | GGTATAACTGGAACTAC | |
| 505 | IGHD2-21_v1 | AGCATATTGTGGTGGTGATTGCTATTCC | |
| 506 | IGHD2-21_v2 | AGCATATTGTGGTGGTGACTGCTATTCC | Common allelic variant encoding a different amino acid sequence, compared to v1, in 2 of 3 forward reading frames. |
| 507 | IGHD2-8 | AGGATATTGTACTAATGGTGTATGCTATACC | |
| 508 | IGHD3-16 | GTATTATGATTACGTTTGGGGGAGTTATGCTTATACC | |
| 509 | IGHD3-9 | GTATTACGATATTTTGACTGGTTATTATAAC | |
| 510 | IGHD4-23 | TGACTACGGTGGTAACTCC | |
| 511 | IGHD4-4/4-11 | TGACTACAGTAACTAC | |
| 512 | IGHD5-12 | GTGGATATAGTGGCTACGATTAC | |
| 513 | IGHD5-24 | GTAGAGATGGCTACAATTAC | |
| 514 | IGHD6-25 | GGGTATAGCAGCGGCTAC | |
| 515 | IGHD6-6 | GAGTATAGCAGCTCGTCC | |
| 516 | IGHD7-27 | CTAACTGGGGA | |

(1) Each of the IGHD nucleotide sequences can be read in three (3) forward reading frames, and, possibly, in 3 reverse reading frames. For example, the nucleotide sequence given for IGHD1-1, depending on how it inserts in full V-DJ rearrangement, may encode the full peptide sequences: GTTGT (SEQ ID NO: 517), VQLER (SEQ ID NO: 518) and YNWND (SEQ ID NO: 519) in the forward direction, and VPVV (SEQ ID NO: 520), SFQLY (SEQ ID NO: 521) and RSSCT (SEQ ID NO: 522) in the reverse direction. Each of these sequences, in turn, could generate progressively deleted segments as explained in the Examples to produce suitable components for libraries of the invention.
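The reading-frame translation described in note (1) may be sketched, in a non-limiting manner, as follows. The standard genetic code is assumed; the function names, the numbering of the frames (offset 0, 1, or 2 from the 5' end of each strand), and the dropping of trailing partial codons are illustrative conventions rather than part of the specification.

```python
# Standard genetic code, indexed in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, frame):
    """Translate complete codons starting at offset `frame` (0, 1, or 2)."""
    codons = (dna[p:p + 3] for p in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def six_frames(dna):
    """Translations of the three forward and three reverse-complement frames."""
    reverse = dna.translate(COMPLEMENT)[::-1]
    return {
        "forward": [translate(dna, f) for f in range(3)],
        "reverse": [translate(reverse, f) for f in range(3)],
    }


frames = six_frames("GGTACAACTGGAACGAC")  # IGHD1-1 (SEQ ID NO: 501)
```

Each translated frame may then be subjected to the progressive deletion procedure described in the Examples.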

Example 15: Selection of Antibodies from the Library

In this example, the selection of antibodies from a library of the invention (described in Examples 9-11 and other Examples) is demonstrated. These selections demonstrate that the libraries of the invention encode antibody proteins capable of binding to antigens. In one selection, antibodies specific for “Antigen X”, a protein antigen, were isolated from the library using the methods described herein. FIG. 24 shows binding curves for six clones specifically binding Antigen X, and their Kd values. This selection was performed using yeast with the heavy chain on a plasmid vector and the kappa light chain library integrated into the genome of the yeast.

In a separate selection, antibodies specific for a model antigen, hen egg white lysozyme (HEL) were isolated. FIG. 25 shows the binding curves for 10 clones specifically binding HEL; each gave a Kd >500 nM. This selection was performed using yeast with the heavy chain on a plasmid vector and the kappa light chain library on a plasmid vector. The sequences of the heavy and light chains were determined for clones isolated from the library and it was demonstrated that multiple clones were present. A portion of the FRM3s (underlined) and the entire CDRH3s from four clones are shown below (Table 53 and Table 54, the latter using the numbering system of the invention).

TABLE 53

Sequences of CDRH3, and a Portion of FRM3, from Four HEL Binders

| Seq Name | SEQ ID NO: | FRM3 and CDRH3 | Tail | N1 | DH | N2 | H3-JH |
|---|---|---|---|---|---|---|---|
| CR080362 | 523 | AKGPSVPAARAEYFQH | G | PS | VPA | AR | AEYFQH |
| CR080363 | 524 | AREGGLGYYYREWYFDL | E | GGL | GYYY | RE | WYFDL |
| CR080372 | 525 | AKPDYGAEYFQH | - | P | DYG | - | AEYFQH |
| EK080902 | 526 | AKEIVVPSAEYFQH | E | - | IVV | PS | AEYFQH |

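TABLE 54

Sequences of CDRH3 from Four HEL Binders, According to the Numbering System of the Invention

| Clones | [Tail] 95 | [N1] 96 | [N1] 96A | [N1] 96B | [DH] 97 | [DH] 97A | [DH] 97B | [DH] 97C | [DH] 97D | [N2] 98 | [N2] 98A | [N2] 98B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CR080362 | G | P | S | - | V | P | A | - | - | A | R | - |
| CR080363 | E | G | G | L | G | Y | Y | Y | - | R | E | - |
| CR080372 | - | P | - | - | D | Y | G | - | - | - | - | - |
| EK080902 | E | - | - | - | I | V | V | - | - | P | S | - |

| Clones | [H3-JH] 99E | [H3-JH] 99D | [H3-JH] 99C | [H3-JH] 99B | [H3-JH] 99A | [H3-JH] 99 | [H3-JH] 100 | [H3-JH] 101 | [H3-JH] 102 | CDRH3 Length |
|---|---|---|---|---|---|---|---|---|---|---|
| CR080362 | - | - | - | A | E | Y | F | Q | H | 14 |
| CR080363 | - | - | - | - | W | Y | F | D | L | 15 |
| CR080372 | - | - | - | A | E | Y | F | Q | H | 10 |
| EK080902 | - | - | - | A | E | Y | F | Q | H | 12 |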

Sequence Identifiers: CR080362 (SEQ ID NO: 523); CR080363 (SEQ ID NO: 524); CR080372 (SEQ ID NO: 525); EK080902 (SEQ ID NO: 526)

The heavy chain chassis isolated were VH3-23.0 (for EK080902 and CR080363), VH3-23.6 (for CR080362), and VH3-23.4 (for CR080372). These variants are defined in Table 8 of Example 2. Each of the four heavy chain CDRH3 sequences matched a designed sequence from the exemplified library. The CDRL3 sequence of one of the clones (EK080902) was also determined, and is shown below, with the surrounding FRM regions underlined:

CDRL3: YYCQESFHIPYTFGGG (SEQ ID NO: 527).

In this case, the CDRL3 matched the design of a degenerate VK1-39 oligonucleotide sequence in row 49 of Table 33. The relevant portion of this table is reproduced below, with the amino acids occupying each position of the isolated CDRL3 bolded and underlined:

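| Chassis | CDR Length | Junction type | Degenerate Oligonucleotide | SEQ ID | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VK1-39 | 9 | 1 | CWGSAAWCATHCMVTABTCCTTWCACT | 307 | L**Q** | **E**Q | **S**T | **F**SY | **H**NPRST | **I**ST | **P** | F**Y** | **T** |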
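The correspondence between a degenerate oligonucleotide and the amino acids permitted at each CDRL3 position may be checked with a short, non-limiting sketch such as the following. The standard IUPAC nucleotide ambiguity codes and the standard genetic code are assumed; the function name `encoded_residues` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Standard genetic code, indexed in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded_residues(oligo):
    """For each degenerate codon, the sorted amino acids it can encode."""
    result = []
    for p in range(0, len(oligo) - 2, 3):
        options = product(*(IUPAC[base] for base in oligo[p:p + 3]))
        residues = {CODON_TABLE["".join(codon)] for codon in options}
        result.append("".join(sorted(residues)))
    return result


# Degenerate VK1-39 oligonucleotide reproduced above (positions 89-97).
positions = encoded_residues("CWGSAAWCATHCMVTABTCCTTWCACT")
# Each residue of the isolated CDRL3 should be permitted at its position.
cdrl3_matches = all(aa in allowed for aa, allowed in zip("QESFHIPYT", positions))
```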

Example 16: Libraries Utilizing Non-Human DH Segments

This example illustrates a non-limiting selection of non-human vertebrate DH segments for use in the libraries of the invention. Non-human vertebrate DH segments were generally selected as follows. First, an exemplary survey of published IGHD sequences was performed as summarized below. Second, the degree of deletion on either end of the IGHD gene segments was estimated by analogy with human sequences (see Example 4.1). For the presently exemplified library, progressively deleted DH segments as short as three amino acids were included. As enumerated in the Detailed Description, other embodiments of the invention comprise libraries with DH segments with a minimum length of about 1, 2, 4, 5, 6, 7, 8, 9, or 10 amino acids.

Table 55 lists IGHD segments for a variety of species, namely Mus musculus (mouse; BALB/C and C57BL/6), Macaca mulatta (rhesus monkey), Oryctolagus cuniculus (rabbit), Rattus norvegicus (rat), Ictalurus punctatus (catfish), Gadus morhua L (Atlantic cod), Pan troglodytes (chimpanzee), Camelidae sp. (camel), and Bos sp. (cow). The sequences were obtained from the publications cited in Table 55. The DNA sequences encoding the IGHD genes are presented together with their translations in all three forward reading frames and, in some cases, three reverse reading frames. It will be appreciated that a skilled artisan could readily translate the reverse reading frames in those cases where they are not provided herein. Without being bound by theory, it is generally believed that the forward reading frames tend to be favored for inclusion in actual complete antibody sequences.

For the rat sequences, a procedure was implemented to extract the IGHD information from the most recent genomic assembly. First, the genomic location of a typical IGHV gene, e.g., 138565773 on chromosome 6, was identified from the literature (Das et al., Immunogenetics, 2008, 60: 47, incorporated by reference in its entirety). This location (i.e., 138565773 on chromosome 6) was then used to identify the contig and location within Genbank, and the approximately 150K bp upstream (because the genes of interest are in the minus strand) segment was extracted. Searches for canonical (e.g., mouse and human) recombination signal sequences (RSS) were conducted and candidate coding regions of lengths between about 10 and about 50 nucleotides were considered putative IGHD genes. The results of this IGHD gene identification process were consistent with the data that was available in the literature (e.g., the IGHD sequence designated “D15” in Table 55 is identical to the sequence highlighted in FIG. 3A of Bruggemann et al., Proc. Natl. Acad. Sci. USA, 1986, 83: 6075, incorporated by reference in its entirety). Finally, when the translation led to a stop codon, the longest open reading frame (ORF) was chosen to represent the peptide contribution. For example, translation in the first reverse reading frame (R1) of the rabbit sequence D2a results in the sequence *HKHNQHNHKYSN (SEQ ID NO: 845), where ‘*’ represents a stop codon; in such case the longest ORF would be HKHNQHNHKYSN (SEQ ID NO: 845), as reported in Table 55. Alternatively, in the case of long segments, such as those derived from the cow (see Table 55), appropriate sub-segments not comprising a stop codon would be considered. For example, translation of the cow DH1 gene in the first reading frame, provides MIR[stop]VWL[stop]LL[stop]CCY, which naturally would give rise to the ORFs or sub-segments: MIR, VWL, and CCY, when keeping a minimum length of three amino acids.
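The treatment of stop codons described above may be sketched, in a non-limiting manner, as follows. A stop codon is written as '*' in the translated string; the function names and the default minimum length of three amino acids are illustrative assumptions.

```python
def open_segments(translation, min_length=3):
    """Stop-free sub-segments of a translation, keeping those long enough."""
    return [s for s in translation.split("*") if len(s) >= min_length]


def longest_open_segment(translation):
    """The longest stop-free stretch of a translation."""
    return max(translation.split("*"), key=len)


# Cow DH1, first reading frame, with '*' standing for each stop codon.
cow_segments = open_segments("MIR*VWL*LL*CCY")
# Rabbit D2a, first reverse reading frame (R1).
rabbit_orf = longest_open_segment("*HKHNQHNHKYSN")
```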

The procedure used above for the rat was also used for the chimpanzee (Pan troglodytes), and the three sets of sequences that were determined using the foregoing method are listed in Table 55. Only the forward reading frame translations are presented, but it will be appreciated that one of ordinary skill in the art could readily generate the corresponding reverse translations.

For each of the sequences set forth in the tables described above, variants may be generated by systematic deletion from the N- and/or C-termini, until there are three amino acids remaining. For example, for gene D6S4 from the rhesus macaque, the full sequence GYSGTWN (SEQ ID NO: 846) may be used to generate the progressive deletion variants: GYSGTW (SEQ ID NO: 847), GYSGT (SEQ ID NO: 848), GYSG (SEQ ID NO: 849), GYS, YSGTWN (SEQ ID NO: 850), SGTWN (SEQ ID NO: 851), GTWN (SEQ ID NO: 852), TWN, YSGTW (SEQ ID NO: 853), YSGT (SEQ ID NO: 854), YSG, SGTW (SEQ ID NO: 855), GTW, and so forth. This progressive deletion procedure is taught in detail herein in other parts of the specification. In general, and as shown in Example 4.1, for any full-length sequence of size N, there will be a total of (N−1)*(N−2)/2 variants, including the original full-length sequence, when the termini are progressively deleted to obtain a minimum of three amino acids per segment. The number of variants will increase or decrease accordingly, depending on the minimum length of the progressively deleted DH segment; e.g., (N−2)*(N−3)/2 for a minimum length of four and (N)*(N−1)/2 for a minimum length of two. This relationship can be generalized to (N+1−L)*(N+2−L)/2, where L is the number of amino acid residues in the shortest segment and L is never greater than N. In the extreme case where L equals N, as expected, one obtains (1)*(2)/2, or just one segment, namely the original segment.
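The progressive deletion and the count formula above can be checked with a short sketch. The helper names `deletion_variants` and `expected_count` are illustrative only; the enumeration simply lists every contiguous sub-segment of at least the minimum length, which is what deleting residues from the N- and/or C-terminus produces.

```python
def deletion_variants(segment, min_len=3):
    """All N- and/or C-terminal deletion variants with at least min_len residues.

    The full-length segment itself is included. Variants are enumerated by
    position, so a repetitive segment can yield the same peptide more than once.
    """
    n = len(segment)
    return [segment[i:j] for i in range(n) for j in range(i + min_len, n + 1)]


def expected_count(n, min_len=3):
    """Closed-form count (N+1-L)*(N+2-L)/2 from the text."""
    return (n + 1 - min_len) * (n + 2 - min_len) // 2


variants = deletion_variants("GYSGTWN")
print(len(variants), expected_count(7))
print(variants)
```

The count follows because a segment of length N contains N−k+1 sub-segments of length k, and summing this over k from L to N gives (N+1−L)*(N+2−L)/2.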

For the disulfide-loop-encoding segments, as exemplified by sequence D2S3 of rhesus translated in the second forward reading frame (AHCSDSGCSS) (SEQ ID NO: 856), the progressive deletions were limited, in the present exemplification of the library, so as to leave the loop intact; i.e., only amino acids N-terminal to the first Cys, or C-terminal to the second Cys were deleted in the respective D segment variants; i.e., AHCSDSGCS (SEQ ID NO: 857), AHCSDSGC (SEQ ID NO: 858), HCSDSGCSS (SEQ ID NO: 859), CSDSGCSS (SEQ ID NO: 860), HCSDSGCS (SEQ ID NO: 861), HCSDSGC (SEQ ID NO: 862), CSDSGCS (SEQ ID NO: 863), and CSDSGC (SEQ ID NO: 864). This choice was made to avoid the presence of unpaired cysteine residues in the currently exemplified version of the library. For the same reason, segments with an odd number of Cys residues may be avoided in library construction. For example, the peptide segment resulting from the first reverse translation of the mouse (C57BL/6 strain) DST4 gene is SLSC, with the last Cys being potentially unpaired. This segment may be ignored, or considered only in its C-terminal deleted derivative, SLS. However, as discussed in the Detailed Description, other embodiments of the library may include unpaired cysteine residues, or the substitution of these cysteine residues with other amino acids.
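The loop-preserving restriction described above can be sketched as a variation of the same enumeration, in which deletions stop at the first and last cysteine. The function names below are hypothetical, and the parity check is only a crude proxy for "potentially unpaired" cysteines, assumed here for illustration.

```python
def loop_preserving_variants(segment):
    """Deletion variants that keep the span from the first to the last Cys intact.

    Only residues N-terminal to the first Cys or C-terminal to the last Cys
    are removed; the full-length segment is included in the result.
    """
    if segment.count("C") < 2:
        raise ValueError("segment needs at least two Cys residues to form a loop")
    first, last = segment.index("C"), segment.rindex("C")
    return [segment[i:j]
            for i in range(first + 1)
            for j in range(last + 1, len(segment) + 1)]


def has_odd_cys_count(segment):
    """True when the segment has an odd number of Cys, so one may be unpaired."""
    return segment.count("C") % 2 == 1


print(loop_preserving_variants("AHCSDSGCSS"))
print(has_odd_cys_count("SLSC"), has_odd_cys_count("SLS"))
```

For AHCSDSGCSS this yields the original segment plus the eight variants listed in the text.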

According to the criteria outlined above and throughout the specification, a number of sequences, or subsets thereof, may be chosen for inclusion in a library of the invention. Selection of these segments may be carried out using a variety of criteria, individually or in combination. Exemplary non-limiting criteria include:

- (a) choosing segments that are most diverse in length and sequence;
- (b) choosing segments with maximal “human string content” (see, e.g., US Pub. No. 2006/0008883, incorporated by reference in its entirety); or
- (c) choosing segments with a minimal number of predicted T-cell epitopes (see, e.g., U.S. Pat. No. 5,712,120, WO 9852976, and US Pub. No. 2008/0206239, each of which is incorporated by reference in its entirety).
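As a purely illustrative sketch of criterion (a), one could pick a diverse subset greedily, repeatedly adding the candidate that is farthest from everything already chosen. Nothing here is taken from the specification: the use of edit distance as the diversity measure, the greedy max-min strategy, the seeding with the longest segment, and the names `edit_distance` and `pick_diverse` are all assumptions. Criteria (b) and (c) depend on external reference data and are not sketched.

```python
def edit_distance(a, b):
    """Levenshtein distance between two peptide strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def pick_diverse(segments, k):
    """Greedy max-min selection of up to k mutually distant segments.

    Assumption: seed with the longest segment, then repeatedly add the
    candidate whose nearest already-chosen segment is farthest away.
    """
    pool = sorted(set(segments))
    chosen = [max(pool, key=len)]
    while len(chosen) < min(k, len(pool)):
        remaining = [s for s in pool if s not in chosen]
        chosen.append(max(remaining,
                          key=lambda s: min(edit_distance(s, c) for c in chosen)))
    return chosen


# Example with a few forward-frame peptides from Table 55.
print(pick_diverse(["YYYGSSY", "YYDYD", "YYGYD", "NWD"], 2))
```

Because edit distance penalizes both length differences and residue differences, a single measure of this kind addresses length and sequence diversity together; other measures could equally be substituted.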

TABLE 55

IGHD segments from other vertebrates

Species | Name | DNA | Reference | F1 | F2 | F3 | R1 | R2 | R3
Mouse C57BL/6 | DFL16.1 | TTTATTACTACGGTAGTAGCTACC (SEQ ID NO: 865) | [1] | FITTVVAT (SEQ ID NO: 866) | LLLR (SEQ ID NO: 867) | YYYGSSY (SEQ ID NO: 925) | GSYYRSNK (SEQ ID NO: 868) | LLP | VATTVVI (SEQ ID NO: 869)
Mouse C57BL/6 | DSP2.2 | TCTACTATGATTACGAC (SEQ ID NO: 870) | [1] | STMITT (SEQ ID NO: 871) | LR | YYDYD (SEQ ID NO: 872) | RNHSR (SEQ ID NO: 873) | S | VVIIV (SEQ ID NO: 874)
Mouse C57BL/6 | DSP2.3 | TCTACTATGGTTACGAC (SEQ ID NO: 875) | [1] | STMVTT (SEQ ID NO: 917) | LLWLR (SEQ ID NO: 876) | YYGYD (SEQ ID NO: 922) | RNHSR (SEQ ID NO: 873) | P | VVTIV (SEQ ID NO: 877)
Mouse C57BL/6 | DSP2.5 | TCTACTATGGTAATGAC (SEQ ID NO: 878) | [1] | STMVMT (SEQ ID NO: 879) | LLW | YYGND (SEQ ID NO: 880) | HYHSR (SEQ ID NO: 881) | SLP | VITIV (SEQ ID NO: 882)
Mouse C57BL/6 | DSP2.9 | TCTACTATGGTAATGAC (SEQ ID NO: 883) | [1] | STMVMT (SEQ ID NO: 879) | LLW | YYGND (SEQ ID NO: 880) | HYHSR (SEQ ID NO: 881) | SLP | VITIV (SEQ ID NO: 882)
Mouse C57BL/6 | DSP2.X | CCTACTATAGTAACTAC (SEQ ID NO: 884) | [1] | PTIVTT (SEQ ID NO: 885) | LL | YYSNY (SEQ ID NO: 886) | SYYSR (SEQ ID NO: 887) | LL | VVTIVG (SEQ ID NO: 923)
Mouse C57BL/6 | DST4 | ACAGCTCAGGCTAC (SEQ ID NO: 888) | [1] | TAQAT (SEQ ID NO: 889) | QLRL (SEQ ID NO: 890) | SSGY (SEQ ID NO: 891) | SLSC (SEQ ID NO: 892) | PEL | VA
Mouse C57BL/6 | DST4.2 | CACAGCTCGGGCTAC (SEQ ID NO: 893) | [1] | HSSGY (SEQ ID NO: 894) | TARA (SEQ ID NO: 895) | QLGL (SEQ ID NO: 896) | VARAV (SEQ ID NO: 897) | SPSC (SEQ ID NO: 898) | PEL
Mouse C57BL/6 | DQ52 | CTAACTGGGAC (SEQ ID NO: 899) | [1] | LTGT (SEQ ID NO: 900) | LG | NWD | PS | SQL | VPV
Mouse C57BL/6 | P3 | GAATACCTACC (SEQ ID NO: 901) | [1] | EYLP (SEQ ID NO: 902) | NTY | IPT | VF | VGI | GRYS (SEQ ID NO: 903)
Mouse C57BL/6 | P5 | GACTACCTACC (SEQ ID NO: 904) | [1] | DYLP (SEQ ID NO: 905) | TTY | LPT | VV | VGS | GR
Mouse C57BL/6 | P1 | GAGTACCTACC (SEQ ID NO: 906) | [1] | EYLP (SEQ ID NO: 907) | STY | VPT | VL | VGT | GRYS (SEQ ID NO: 908)
Mouse BALB/C | DSP2.9 | TCTATGATGGTTACTAC (SEQ ID NO: 909) | [2] | SMMVTT (SEQ ID NO: 910) | WLL | YDGYY (SEQ ID NO: 911) | SNHHR (SEQ ID NO: 912) | PS | VVTII (SEQ ID NO: 913)
Mouse BALB/C | DSP2.2 | TCTACTATGATTACGAC (SEQ ID NO: 914) | [3] | STMITT (SEQ ID NO: 871) | LR | YYDYD (SEQ ID NO: 872) | RNHSR (SEQ ID NO: 873) | S | VVIIV (SEQ ID NO: 915)
Mouse BALB/C | DSP2.5 | TCTACTATGGTAACTAC (SEQ ID NO: 916) | [3] | STMVTT (SEQ ID NO: 917) | LLW | YYGNY (SEQ ID NO: 918) | SYHSR (SEQ ID NO: 919) | LP | VVTIV (SEQ ID NO: 877)
Mouse BALB/C | DSP2.6 | CCTACTATGGTTACGAC (SEQ ID NO: 920) | [3] | PTMVTT (SEQ ID NO: 921) | LLWLR (SEQ ID NO: 876) | YYGYD (SEQ ID NO: 922) | RNHSR (SEQ ID NO: 873) | S | VVTIVG (SEQ ID NO: 923)
Mouse BALB/C | DFL16.1 | TTTATTACTACGGTAGTAGCTAC (SEQ ID NO: 924) | [3] | FITTVVAT (SEQ ID NO: 866) | LLLR (SEQ ID NO: 867) | YYYGSSY (SEQ ID NO: 925) | SYYRSNK (SEQ ID NO: 926) | LLP | VATTVVI (SEQ ID NO: 869)
Mouse BALB/C | DSP2.3 | TCTACTATGGTTACGAC (SEQ ID NO: 927) | [3] | STMVTT (SEQ ID NO: 917) | LLWLR (SEQ ID NO: 876) | YYGYD (SEQ ID NO: 922) | RNHSR (SEQ ID NO: 873) | P | VVTIV (SEQ ID NO: 877)
Mouse BALB/C | DFL16.2 | TTCATTACTACGGCTAC (SEQ ID NO: 928) | [3] | FITTAT (SEQ ID NO: 929) | SLLRL (SEQ ID NO: 930) | HYYGY (SEQ ID NO: 931) | SRSNE (SEQ ID NO: 932) | P | VAVVM (SEQ ID NO: 933)
Mouse BALB/C | DSP2.4 | TCTACTATGGTTACGAC (SEQ ID NO: 934) | [3] | STMVTT (SEQ ID NO: 917) | LLWLR (SEQ ID NO: 876) | YYGYD (SEQ ID NO: 922) | RNHSR (SEQ ID NO: 873) | S | VVTIV (SEQ ID NO: 877)
Mouse BALB/C | DSP2.7 | CCTACTATGGTAACTAC (SEQ ID NO: 935) | [3] | PTMVTT (SEQ ID NO: 921) | LLW | YYGNY (SEQ ID NO: 918) | SYHSR (SEQ ID NO: 919) | LP | VVTIVG (SEQ ID NO: 923)
Mouse BALB/C | DSP2.8 | CCTAGTATGGTAACTAC (SEQ ID NO: 936) | [3] | PSMVTT (SEQ ID NO: 937) | LVW | YGNY (SEQ ID NO: 938) | SYHTR (SEQ ID NO: 939) | LPY | VVTILG (SEQ ID NO: 940)
Mouse BALB/C | DQ52 | CTAACTGGGA (SEQ ID NO: 941) | [4] | LTG | LG | NW | PS | SQL | PV
Mouse BALB/C | DST4 | AGACAGCTCGGGCTA (SEQ ID NO: 942) | [5] | RQLGL (SEQ ID NO: 943) | DSSG (SEQ ID NO: 944) | TARA (SEQ ID NO: 895) | PELS (SEQ ID NO: 945) | ARAV (SEQ ID NO: 946) | SPSCL (SEQ ID NO: 947)
Mouse BALB/C | DSP2.1 | TCTACTATGGTAACTAC (SEQ ID NO: 948) | [6] | STMVTT (SEQ ID NO: 917) | LLW | YYGNY (SEQ ID NO: 918) | SYHSR (SEQ ID NO: 919) | LP | VVTIV (SEQ ID NO: 877)
Mouse BALB/C | DSP2.X | CCTACTATAGTAACTAC (SEQ ID NO: 949) | [6] | PTIVTT (SEQ ID NO: 885) | LL | YYSNY (SEQ ID NO: 886) | SYYSR (SEQ ID NO: 887) | LL | VVTIVG (SEQ ID NO: 923)

Rhesus | D6S4 | GGGTATAGCGGCACGTGGAAC (SEQ ID NO: 950) | [7] | GYSGTWN (SEQ ID NO: 846) | GIAARG (SEQ ID NO: 951) | RHVE (SEQ ID NO: 952) | VPRAAIP (SEQ ID NO: 953) | STCRYT (SEQ ID NO: 954) | FHVPLYP (SEQ ID NO: 955)
Rhesus | D6S3 | GGGGTATAGCGGTGGCTGGTCC (SEQ ID NO: 956) | [7] | RWLV (SEQ ID NO: 957) | GYSGGWS (SEQ ID NO: 958) | GIAVAG (SEQ ID NO: 959) | DQPPLYP (SEQ ID NO: 960) | GPATAIP (SEQ ID NO: 961) | TSHRYTP (SEQ ID NO: 962)
Rhesus | D6S2 | GGGTATAGCAGCTGGTCC (SEQ ID NO: 963) | [7] | GYSSWS (SEQ ID NO: 964) | GIAAG (SEQ ID NO: 965) | QLV | GPAAIP (SEQ ID NO: 966) | TSCYT (SEQ ID NO: 967) | DQLLYP (SEQ ID NO: 968)
Rhesus | D6S1 | GGGTATAGCAGCGGCTGGTAC (SEQ ID NO: 969) | [7] | GYSSGWY (SEQ ID NO: 970) | GIAAAG (SEQ ID NO: 971) | QRLV (SEQ ID NO: 972) | VPAAAIP (SEQ ID NO: 973) | TSRCYT (SEQ ID NO: 974) | YQPLLYP (SEQ ID NO: 975)
Rhesus | D5S5 | GGGGATACAGTGGGTACAGTTAC (SEQ ID NO: 976) | [7] | GDTVGTVT (SEQ ID NO: 977) | GIQWVQL (SEQ ID NO: 978) | GYSGYSY (SEQ ID NO: 979) | NCTHCIP (SEQ ID NO: 980) | LYPLYP (SEQ ID NO: 981) | VTVPTVSP (SEQ ID NO: 982)
Rhesus | D5S4 | GTGGTATAGACTACGGTTAC (SEQ ID NO: 983) | [7] | TTVT (SEQ ID NO: 1027) | WYRLRL (SEQ ID NO: 984) | GIDYGY (SEQ ID NO: 985) | NRSLYH (SEQ ID NO: 986) | SIP | VTVVYTT (SEQ ID NO: 987)
Rhesus | D5S3 | GGGGATATAGTGGGTACAGTTAC (SEQ ID NO: 988) | [7] | GDIVGTVT (SEQ ID NO: 989) | WWVQL (SEQ ID NO: 990) | GYSGYSY (SEQ ID NO: 979) | NCTHYIP (SEQ ID NO: 991) | LYPLYP (SEQ ID NO: 981) | VTVPTISP (SEQ ID NO: 992)
Rhesus | D5S2 | GTGGATACAGCTACAGTTAC (SEQ ID NO: 993) | [7] | VDTATVT (SEQ ID NO: 994) | WIQLQL (SEQ ID NO: 995) | GYSYSY (SEQ ID NO: 996) | NCSCIH (SEQ ID NO: 997) | LYP | VTVAVST (SEQ ID NO: 998)
Rhesus | D5S1 | GTGGATACAGTGGGTACAGTTAC (SEQ ID NO: 999) | [7] | VDTVGTVT (SEQ ID NO: 1000) | WIQWVQL (SEQ ID NO: 1001) | GYSGYSY (SEQ ID NO: 979) | NCTHCIH (SEQ ID NO: 1002) | LYPLYP (SEQ ID NO: 981) | VTVPTVST (SEQ ID NO: 1003)
Rhesus | D4S5 | TGACTACGGTAACTAC (SEQ ID NO: 1004) | [7] | LR | DYGNY (SEQ ID NO: 1005) | TTVT (SEQ ID NO: 1027) | LP | VVTVV (SEQ ID NO: 1006) | SYRS (SEQ ID NO: 1007)
Rhesus | D4S4 | TGACTACGGAATCTAG (SEQ ID NO: 1008) | [7] | LRNL (SEQ ID NO: 1009) | DYGI (SEQ ID NO: 1010) | TTES (SEQ ID NO: 1011) | IP | LDSVV (SEQ ID NO: 1012) | RFRS (SEQ ID NO: 1013)
Rhesus | D4S3 | TGACTACGGTAGCAGCTAC (SEQ ID NO: 1014) | [7] | QL | DYGSSY (SEQ ID NO: 1015) | TTVAA (SEQ ID NO: 1016) | LLP | VAATVV (SEQ ID NO: 1017) | SCYRS (SEQ ID NO: 1018)
Rhesus | D4S2 | TGAATACAGTAACTAC (SEQ ID NO: 1019) | [7] | IQ | EYSNY (SEQ ID NO: 1020) | NTVT (SEQ ID NO: 1021) | LLYS (SEQ ID NO: 1022) | VVTVF (SEQ ID NO: 1023) | SYCI (SEQ ID NO: 1024)
Rhesus | D4S1 | TGACTACGGTAACTAC (SEQ ID NO: 1025) | [7] | LR | DYGNY (SEQ ID NO: 1005) | TTVT (SEQ ID NO: 1027) | LP | VVTVV (SEQ ID NO: 1006) | SYRS (SEQ ID NO: 1007)
Rhesus | D3S5 | GTATTACTATAGTGGTAGTTGTTACTAC (SEQ ID NO: 1028) | [7] | VLL | YYYSGSCYY (SEQ ID NO: 1029) | ITIVVVVT (SEQ ID NO: 1030) | QLPL (SEQ ID NO: 1031) | VVTTTTIVI (SEQ ID NO: 1032) | SNNYHYSNT (SEQ ID NO: 1033)
Rhesus | D3S4 | GTATTACGATTACGATATTAGTAGTCGATATTAAACC (SEQ ID NO: 1034) | [7] | VLRLRY (SEQ ID NO: 1035) | YYDYDISSRY (SEQ ID NO: 1036) | ITITILVVDIK (SEQ ID NO: 1037) | YRLLIS (SEQ ID NO: 1038) | GLISTTNIVIVI (SEQ ID NO: 1039) | YRNRNT (SEQ ID NO: 1040)
Rhesus | D3S3 | GTATTACTATAGTGGTAGTTATTACTAC (SEQ ID NO: 1041) | [7] | LLL | YYYSGSYYY (SEQ ID NO: 1042) | ITIVVVIT (SEQ ID NO: 1043) | LPL | VVITTTIVI (SEQ ID NO: 1044) | SNNYHYSNT (SEQ ID NO: 1033)
Rhesus | D3S2 | GTATTACGAGGATGATTACGGTTACTATTACACC (SEQ ID NO: 1045) | [7] | LRLLLH (SEQ ID NO: 1046) | YYEDDYGYYYT (SEQ ID NO: 1047) | ITRMITVTIT (SEQ ID NO: 1048) | SSS | GVIVTVIILVI (SEQ ID NO: 1049) | CNSNRNHPRNT (SEQ ID NO: 1050)
Rhesus | D3S1 | GTATTACAATTTTTGGAGTGGTTATTACACC (SEQ ID NO: 1051) | [7] | VLQFLEWLLH (SEQ ID NO: 1052) | YYNFWSGYYT (SEQ ID NO: 1053) | ITIFGVVIT (SEQ ID NO: 1054) | PLQKL (SEQ ID NO: 1055) | GVITTPKIVI (SEQ ID NO: 1056) | CNNHSKNCNT (SEQ ID NO: 1057)
Rhesus | D2S5 | AGGATATTGTACTGCTACTACTTGTCTAGCC (SEQ ID NO: 1058) | [7] | RILYCYYLSS (SEQ ID NO: 1059) | GYCTATTCLA (SEQ ID NO: 1060) | DIVLLLLV (SEQ ID NO: 1061) | ARQVVAVQYP (SEQ ID NO: 1062) | TSSSSTIS (SEQ ID NO: 1063) | QYNIL (SEQ ID NO: 1064)
Rhesus | D2S4 | AGGATATTGTAGTGGTGGTGTCTGCTCCACC (SEQ ID NO: 1065) | [7] | WWCLLH (SEQ ID NO: 1066) | GYCSGGVCST (SEQ ID NO: 1067) | DIVVVVSAP (SEQ ID NO: 1068) | VEQTPPLQYP (SEQ ID NO: 1069) | GGADTTTTIS (SEQ ID NO: 1070) | WSRHHHYNIL (SEQ ID NO: 1071)
Rhesus | D2S3 | AGCACACTGTAGTGATAGTGGCTGCTCCTCC (SEQ ID NO: 1072) | [7] | WLLL (SEQ ID NO: 1073) | AHCSDSGCSS (SEQ ID NO: 856) | HTVVIVAAP (SEQ ID NO: 1074) | EEQPLSLQCA (SEQ ID NO: 1075) | GGAATITTVC (SEQ ID NO: 1076) | RSSHYHYSVL (SEQ ID NO: 1077)
Rhesus | D2S2 | AGCATATTGTTGTGGTGGTGTCTGCTACACC (SEQ ID NO: 1078) | [7] | SILLWWCLLH (SEQ ID NO: 1079) | AYCCGGVCYT (SEQ ID NO: 1080) | HIVVVVSAT (SEQ ID NO: 1081) | QTPPQQYA (SEQ ID NO: 1082) | GVADTTTTIC (SEQ ID NO: 1083) | CSRHHHNNML (SEQ ID NO: 1084)
Rhesus | D2S1 | AGGATATTGTAGTGGTGGTGTCTGCTACGCC (SEQ ID NO: 1085) | [7] | WWCLLR (SEQ ID NO: 1086) | GYCSGGVCYA (SEQ ID NO: 1087) | DIVVVVSAT (SEQ ID NO: 1088) | QTPPLQYP (SEQ ID NO: 1089) | GVADTTTTIS (SEQ ID NO: 1090) | RSRHHHYNIL (SEQ ID NO: 1091)
Rhesus | D1S6 | GGTATAACTGGAACTAC (SEQ ID NO: 1092) | [7] | GITGTT (SEQ ID NO: 1093) | LEL | YNWNY (SEQ ID NO: 1094) | SSSYT (SEQ ID NO: 1095) | FQLY (SEQ ID NO: 1096) | VVPVIP (SEQ ID NO: 1097)
Rhesus | D1S5 | GGTATAGCTGGAACGAC (SEQ ID NO: 1098) | [7] | GIAGTT (SEQ ID NO: 1099) | LER | YSWND (SEQ ID NO: 1100) | RSSYT (SEQ ID NO: 1101) | SFQLY (SEQ ID NO: 1102) | VVPAIP (SEQ ID NO: 1103)
Rhesus | D1S4 | GGTACAGCTGGAACTAT (SEQ ID NO: 1104) | [7] | GTAGT (SEQ ID NO: 1105) | VQLEL (SEQ ID NO: 1106) | YSWNY (SEQ ID NO: 1107) | SSSCT (SEQ ID NO: 1108) | FQLY (SEQ ID NO: 1109) | IVPAVP (SEQ ID NO: 1110)
Rhesus | D1S3 | GGTATAACTGGAACGAC (SEQ ID NO: 1111) | [7] | GITGTT (SEQ ID NO: 1093) | LER | YNWND (SEQ ID NO: 1112) | RSSYT (SEQ ID NO: 1101) | SFQLY (SEQ ID NO: 1102) | VVPVIP (SEQ ID NO: 1097)
Rhesus | D1S2 | GGAACACCTGGAACGACC (SEQ ID NO: 1113) | [7] | GTPGTT (SEQ ID NO: 1114) | EHLER (SEQ ID NO: 1115) | NTWND (SEQ ID NO: 1116) | GRSRCS (SEQ ID NO: 1117) | SFQVF (SEQ ID NO: 1118) | VVPGVP
Rhesus | D1S1 | GATATAGCTGGAACAAC (SEQ ID NO: 1119) | [7] | DIAGTT (SEQ ID NO: 1120) | LEQ | YSWNN (SEQ ID NO: 1121) | CSSYI (SEQ ID NO: 1122) | LFQLY (SEQ ID NO: 1123) | VVPAIS (SEQ ID NO: 1124)

Rabbit | D1 | TAGCTACGATGACTATGGTGATTAC (SEQ ID NO: 1125) | [8] | LR | SYDDYGDY (SEQ ID NO: 1126) | ATMTMVI (SEQ ID NO: 1127) | SP | VITIVIVA (SEQ ID NO: 1128) | NHHSHRS (SEQ ID NO: 1129)
Rabbit | D2a | GTTACTATACTTATGGTTATGCTGGTTATGCTTATGCTACC (SEQ ID NO: 1130) | [8] | VTILMVMLVMLML (SEQ ID NO: 1131) | LLYLWLCWLCLCY (SEQ ID NO: 1132) | YYTYGYAGYAYAT (SEQ ID NO: 1133) | HKHNQHNHKYSN (SEQ ID NO: 1134) | PA | GSISITSITISIVT (SEQ ID NO: 1135)
Rabbit | D2b | GTTATGCTGGTTATGCTGGTTATGGTTATGCTACC (SEQ ID NO: 1136) | [8] | VMLVMLVMVMLP (SEQ ID NO: 1137) | LCWLCWLWLCY (SEQ ID NO: 1138) | YAGYAGYGYAT (SEQ ID NO: 1139) | HNHNQHNQHN (SEQ ID NO: 1140) | VA | GSITITSITSIT (SEQ ID NO: 1141)
Rabbit | D3 | GCATATGCTAGTAGTAGTGGTTATTATATAC (SEQ ID NO: 1142) | [8] | AYASSSGYYI (SEQ ID NO: 1143) | HMLVVVVIIY (SEQ ID NO: 1144) | WLLY (SEQ ID NO: 1145) | YIITTTTSIC (SEQ ID NO: 1146) | VYNNHYY (SEQ ID NO: 1147) | PLLLAYA (SEQ ID NO: 1148)
Rabbit | D4 | GTTACTATAGTAGTGGCTGGGGTG (SEQ ID NO: 1149) | [8] | VTIVVAGV (SEQ ID NO: 1150) | WLG | YYSSGWG (SEQ ID NO: 1151) | HPSHYYSN (SEQ ID NO: 1152) | PQPLL (SEQ ID NO: 1153) | TPATTIVT (SEQ ID NO: 1154)
Rabbit | D5 | GTTATGCTGGTAGTAGTTATTATACC (SEQ ID NO: 1155) | [8] | VMLVVVII (SEQ ID NO: 1156) | LLY | YAGSSYYT (SEQ ID NO: 1157) | YNNYYQHN (SEQ ID NO: 1158) | LLPA (SEQ ID NO: 1159) | GIITTTSIT (SEQ ID NO: 1160)
Rabbit | D6 | GTTATGCTGGTAGTAGCTGGGATG (SEQ ID NO: 1161) | [8] | VMLVVAGM (SEQ ID NO: 1162) | LCW | YAGSSWD (SEQ ID NO: 1163) | HPSYYQHN (SEQ ID NO: 1164) | SQLLPA (SEQ ID NO: 1165) | IPATTSIT (SEQ ID NO: 1166)
Rabbit | D7 | ACTATGGTGATTAC (SEQ ID NO: 1167) | [8] | TMVI (SEQ ID NO: 1168) | LW | YGDY (SEQ ID NO: 1169) | NHHS (SEQ ID NO: 1170) | SP | VITIV (SEQ ID NO: 1171)

Rat | D1 | TAAACTACAATCTGCCA (SEQ ID NO: 1172) | [9] | TTIC (SEQ ID NO: 1173) | KLQSA (SEQ ID NO: 1174) | NYNLP (SEQ ID NO: 1175) | ADCSL (SEQ ID NO: 1176) | GRL | WQIVV (SEQ ID NO: 1177)
Rat | D2 | GGTATAATTCGGGGTAC (SEQ ID NO: 1178) | [9] | GIIRGT (SEQ ID NO: 1179) | FGV | YNSGY (SEQ ID NO: 1180) | TPNYT (SEQ ID NO: 1181) | YPELY (SEQ ID NO: 1182) | VPRIIP (SEQ ID NO: 1183)
Rat | D3 | GGTATAATTCGGGGTAA (SEQ ID NO: 1184) | [9] | GIIRG (SEQ ID NO: 1185) | FGV | YNSG (SEQ ID NO: 1186) | TPNYT (SEQ ID NO: 1181) | YPELY (SEQ ID NO: 1182) | LPRIP (SEQ ID NO: 1187)
Rat | D4 | TTATAGATTAATCCTAAAG (SEQ ID NO: 1188) | [9] | INPK (SEQ ID NO: 1189) | YRLILK (SEQ ID NO: 1190) | ID | FRINL (SEQ ID NO: 1191) | SI | LGLIY (SEQ ID NO: 1192)
Rat | D5 | TACATACTATGGGTATAACTAC (SEQ ID NO: 1193) | [9] | YILWV (SEQ ID NO: 1194) | TYYGYNY (SEQ ID NO: 1195) | HTMGIT (SEQ ID NO: 1196) | LYP | VVIPIVC (SEQ ID NO: 1197) | SYTHSM (SEQ ID NO: 1198)
Rat | D6 | TTTATAACAACTAC (SEQ ID NO: 1199) | [9] | FITTT (SEQ ID NO: 1200) | QL | YNNY (SEQ ID NO: 1201) | SCYK (SEQ ID NO: 1202) | LL | VVVI (SEQ ID NO: 1203)
Rat | D7 | TCCTCAGGTGAGTCCTGTGTCTGGG (SEQ ID NO: 1204) | [9] | SSGESCVW (SEQ ID NO: 1205) | PQVSPVSG (SEQ ID NO: 1206) | VLCL (SEQ ID NO: 1207) | PDTGLT (SEQ ID NO: 1208) | PRHRTHLR (SEQ ID NO: 1209) | QTQDSPE (SEQ ID NO: 1210)
Rat | D8 | GGATATCTAG (SEQ ID NO: 1211) | [9] | GYL | DI | IS | IS | LDI | RYP
Rat | D9 | TTAACTACGGAGGGTATAGTGAG (SEQ ID NO: 1212) | [9] | LTTEGIV (SEQ ID NO: 1213) | LRRV (SEQ ID NO: 1214) | NYGGYSE (SEQ ID NO: 1215) | HYTLRS (SEQ ID NO: 1216) | SLYPP (SEQ ID NO: 1217) | LTIPSVV (SEQ ID NO: 1218)
Rat | D10 | TTTTTAACTACAGTAGCTAC (SEQ ID NO: 1219) | [9] | FLTTVAT (SEQ ID NO: 1220) | LQ | FNYSSY (SEQ ID NO: 1221) | SYCS (SEQ ID NO: 1222) | LK | VATVVK (SEQ ID NO: 1223)
Rat | D11 | TTTATTACTATGATGGTAGTTATTACTAC (SEQ ID NO: 1224) | [9] | FITMMVVITT (SEQ ID NO: 1225) | LLL | YYYDGSYYY (SEQ ID NO: 1226) | SNNYHHSNK (SEQ ID NO: 1227) | LPS | VVITTIIVI (SEQ ID NO: 1228)
Rat | D12 | GGATACCTAT (SEQ ID NO: 1229) | [9] | GYL | DTY | P | VS | IGI | RYP
Rat | D13 | TTCATACTATGGGTATGACTAC (SEQ ID NO: 1230) | [9] | FILWV (SEQ ID NO: 1231) | SYYGYDY (SEQ ID NO: 1232) | HTMGMT (SEQ ID NO: 1233) | SYP | VVIPIV (SEQ ID NO: 1234) | SHTHSM (SEQ ID NO: 1235)
Rat | D14 | TTTATTACTATGATGGTTATTATCAC (SEQ ID NO: 1236) | [9] | FITMMVIIT (SEQ ID NO: 1237) | WLLS (SEQ ID NO: 1238) | YYYDGYYH (SEQ ID NO: 1239) | DNNHHSNK (SEQ ID NO: 1240) | PS | VIITIIVI (SEQ ID NO: 1241)
Rat | D15 | CTAACTGGGAG (SEQ ID NO: 1242) | [9] | LTG | LG | NWE | PS | SQL | LPV
Rat | D16 | TTTATGTATACTACGGATTATTACTAC (SEQ ID NO: 1243) | [9] | FMYTTDYYY (SEQ ID NO: 1244) | LCILRIIT (SEQ ID NO: 1245) | YVYYGLLL (SEQ ID NO: 1246) | VVIIRSIHK (SEQ ID NO: 1247) | SNNP (SEQ ID NO: 1248) | SVVYI (SEQ ID NO: 1249)

Catfish | DH1 | GTTATAGCAGCTGGGGTAG (SEQ ID NO: 1250) | [10] | VIAAGV (SEQ ID NO: 1251) | QLG | YSSWG (SEQ ID NO: 1252) | YPSCYN (SEQ ID NO: 1253) | LPQLL (SEQ ID NO: 1254) | TPAAIT (SEQ ID NO: 1255)
Catfish | DH2 | CAATATAGCGGGT (SEQ ID NO: 1256) | [10] | QYSG (SEQ ID NO: 1257) | NIAG (SEQ ID NO: 1258) | R | PAIL (SEQ ID NO: 1259) | TRYI (SEQ ID NO: 1260) | PLY
Catfish | DH3 | ATAACTACGGC (SEQ ID NO: 1261) | [10] | ITTA (SEQ ID NO: 1262) | LR | NYG | RSY | P | AVV
Catfish | AF068137 | TCGCGTGGCCAA (SEQ ID NO: 1263) | [11] | SRGQ (SEQ ID NO: 1264) | RVA | AWP | LATR (SEQ ID NO: 1265) | GHA | WPR
Atlantic cod | core1 | ATACAACTGGGCTGGGG (SEQ ID NO: 1266) | [12] | IQLGWG (SEQ ID NO: 1267) | YNWAG (SEQ ID NO: 1268) | TTGLG (SEQ ID NO: 1269) | PAQLY (SEQ ID NO: 1270) | PSPVV (SEQ ID NO: 1271) | PQPSC (SEQ ID NO: 1272)
Atlantic cod | core2a | ATACAGTGGGGGGATC (SEQ ID NO: 1273) | [12] | IQWGD (SEQ ID NO: 1274) | YSGGI (SEQ ID NO: 1275) | TVGG (SEQ ID NO: 1276) | IPPLY (SEQ ID NO: 1277) | DPPTV (SEQ ID NO: 1278) | SPHC (SEQ ID NO: 1279)
Atlantic cod | core2b | ATACAGTGGGGT (SEQ ID NO: 1280) | [12] | IQWG (SEQ ID NO: 1281) | YSG | TVG | TPLY (SEQ ID NO: 1282) | PTV | PHC
Atlantic cod | core4 | ATACAGGGGGG (SEQ ID NO: 1283) | [12] | IQGG (SEQ ID NO: 1284) | YRG | TGG | PLY | PPV | PPC
Atlantic cod | core5a | ATACGGGGGGATC (SEQ ID NO: 1285) | [12] | IRGD (SEQ ID NO: 1286) | YGGI (SEQ ID NO: 1287) | TGG | IPPY (SEQ ID NO: 1288) | DPPV (SEQ ID NO: 1289) | SPR

Chimpanzee | Chimp_6224 | CTAACTGGGGA (SEQ ID NO: 1290) | [13] | LTG | LG | NWG | | |
Chimpanzee | Chimp_10468 | TGACTACAGTAACTAC (SEQ ID NO: 1291) | [13] | LQ | DYSNY (SEQ ID NO: 1292) | TTVT | | |
Chimpanzee | Chimp_10580 | TGACTACGGTGACTAC (SEQ ID NO: 1293) | [13] | LR | DYGDY (SEQ ID NO: 1294) | TTVT (SEQ ID NO: 1295) | | |
Chimpanzee | Chimp_30856 | TGACTACGGTGACTAC (SEQ ID NO: 1296) | [13] | LR | DYGDY (SEQ ID NO: 1294) | TTVT (SEQ ID NO: 1295) | | |
Chimpanzee | Chimp_173 | GGTATAACTGGATCGAT (SEQ ID NO: 1297) | [13] | GITGS (SEQ ID NO: 1298) | LDR | YNWID (SEQ ID NO: 1299) | | |
Chimpanzee | Chimp_474 | GGTATAACTGGAACTAC (SEQ ID NO: 1300) | [13] | GITGTT (SEQ ID NO: 1301) | LEL | YNWNY (SEQ ID NO: 1302) | | |
Chimpanzee | Chimp_1395 | GAATATCTA | [13] | EYL | NI | S | | |
Chimpanzee | Chimp_1484 | GAATACCCC | [13] | EYP | NT | IP | | |
Chimpanzee | Chimp_5696 | GGTATAACTGGAACGAC (SEQ ID NO: 1303) | [13] | GITGTT (SEQ ID NO: 1304) | LER | YNWND (SEQ ID NO: 1386) | | |
Chimpanzee | Chimp_429 | GGGTATAGCAGTGGCTGGTAC (SEQ ID NO: 1305) | [13] | GYSSGWY (SEQ ID NO: 1306) | GIAVAG (SEQ ID NO: 1307) | QWLV (SEQ ID NO: 1308) | | |
Chimpanzee | Chimp_1045 | GGGTATAGCGGCAGCTGGTAC (SEQ ID NO: 1309) | [13] | GYSGSWY (SEQ ID NO: 1310) | GIAAAG (SEQ ID NO: 1385) | RQLV (SEQ ID NO: 1311) | | |
Chimpanzee | Chimp_4178 | CCATGGGTGTAGTGGCTAC (SEQ ID NO: 1312) | [13] | PWV | HGCSGY (SEQ ID NO: 1313) | MGVVA (SEQ ID NO: 1314) | | |
Chimpanzee | Chimp_8658 | TGACTACGGTAACTAC (SEQ ID NO: 1315) | [13] | LR | DYGNY (SEQ ID NO: 1026) | TTVT (SEQ ID NO: 1295) | | |
Chimpanzee | Chimp_11102 | TGACTACGGTAACTAC (SEQ ID NO: 1316) | [13] | LR | DYGNY (SEQ ID NO: 1026) | TTVT (SEQ ID NO: 1295) | | |
Chimpanzee | Chimp_2093 | AGCATATTGTGGTGGTGACTGCTATGCC (SEQ ID NO: 1317) | [13] | SILWW (SEQ ID NO: 1318) | AYCGGDCYA (SEQ ID NO: 1319) | HIVVVTAM (SEQ ID NO: 1320) | | |
Chimpanzee | Chimp_4876 | GGTGTAGTGGCTAC (SEQ ID NO: 1321) | [13] | GVVAT (SEQ ID NO: 1322) | WL | CSGY (SEQ ID NO: 1323) | | |
Chimpanzee | Chimp_10664 | GATATGGTGGCTAC (SEQ ID NO: 1324) | [13] | DMVAT (SEQ ID NO: 1325) | IWWL (SEQ ID NO: 1326) | YGGY (SEQ ID NO: 1327) | | |
Chimpanzee | Chimp_11497 | GCCTGAGATCCCCAGGACGCAGCAC (SEQ ID NO: 1328) | [13] | DPQDAA (SEQ ID NO: 1329) | PEIPRTQH (SEQ ID NO: 1330) | LRSPGRS (SEQ ID NO: 1331) | | |
Chimpanzee | Chimp_25802 | GGCGTGTGAGAG (SEQ ID NO: 1332) | [13] | GV | ACE | RVR | | |
Chimpanzee | Chimp_5740 | GTGGATATAGTGGCTACGATTAC (SEQ ID NO: 1333) | [13] | VDIVATIT (SEQ ID NO: 1334) | WLRL (SEQ ID NO: 1335) | GYSGYDY (SEQ ID NO: 1336) | | |
Chimpanzee | Chimp_7586 | GTGGATATAGTGGCTACGATTAC (SEQ ID NO: 1337) | [13] | VDIVATIT (SEQ ID NO: 1334) | WLRL (SEQ ID NO: 1335) | GYSGYDY (SEQ ID NO: 1336) | | |
Chimpanzee | Chimp_9253 | GTGGATATAGTGGCTACGATTAC (SEQ ID NO: 1338) | [13] | VDIVATIT (SEQ ID NO: 1334) | WLRL (SEQ ID NO: 1335) | GYSGYDY (SEQ ID NO: 1336) | | |
Chimpanzee | Chimp_9731 | GTGGATACAGCTACGATTAC (SEQ ID NO: 1339) | [13] | VDTATIT (SEQ ID NO: 1340) | WIQLRL (SEQ ID NO: 1341) | GYSYDY (SEQ ID NO: 1342) | | |
Chimpanzee | Chimp_14017 | GTGGATATAGTGGCTACGATTAC (SEQ ID NO: 1343) | [13] | VDIVATIT (SEQ ID NO: 1334) | WLRL (SEQ ID NO: 1335) | GYSGYDY (SEQ ID NO: 1336) | | |
Chimpanzee | Chimp_84128 | GTGGAGATGGCTACAATTAC (SEQ ID NO: 1344) | [13] | VEMATIT (SEQ ID NO: 1345) | WRWLQL (SEQ ID NO: 1346) | GDGYNY (SEQ ID NO: 1347) | | |
Chimpanzee | Chimp_23293 | GACCGCCACA (SEQ ID NO: 1348) | [13] | DRH | TAT | PP | | |
Chimpanzee | Chimp_1702 | ATAGTGGTGGTGTC (SEQ ID NO: 1349) | [13] | IVVVS (SEQ ID NO: 1350) | WWC (SEQ ID NO: 1351) | SGGV (SEQ ID NO: 1352) | | |
Chimpanzee | Chimp_638 | AGAATAGCTGGGTCCAAAACTCTCCTGGC (SEQ ID NO: 1353) | [13] | RIAGSKTLLA (SEQ ID NO: 1354) | LGPKLSW (SEQ ID NO: 1355) | NSWVQNSPG (SEQ ID NO: 1356) | | |
Chimpanzee | Chimp_1760 | AGAATAGCTGGGTCCAAAACTCTCCTGGC (SEQ ID NO: 1357) | [13] | RIAGSKTLLA (SEQ ID NO: 1354) | LGPKLSW (SEQ ID NO: 1355) | NSWVQNSPG (SEQ ID NO: 1356) | | |
Chimpanzee | Chimp_6453 | ATCTTTTGAAAGTTTGCCCTGTGCC (SEQ ID NO: 1358) | [13] | KFALC (SEQ ID NO: 1359) | SFESLPCA (SEQ ID NO: 1360) | LLKVCPV (SEQ ID NO: 1361) | | |
Chimpanzee | Chimp_8535 | TTAGGATTTTGATTGAGGCCACAG (SEQ ID NO: 1362) | [13] | LRPQ (SEQ ID NO: 1363) | DFD | RILIEAT (SEQ ID NO: 1364) | | |
Chimpanzee | Chimp_30042 | GCAGGCTGCGGGGAAGGACCAGGGA (SEQ ID NO: 1365) | [13] | AGCGEGPG (SEQ ID NO: 1366) | QAAGKDQG (SEQ ID NO: 1367) | RLRGRTR (SEQ ID NO: 1368) | | |
Chimpanzee | Chimp_44108 | GTGGTGTC | [13] | VVS | WC | GV | | |

Camel | camD4 | ACTATAGCGACTATG (SEQ ID NO: 1369) | [14] | TIATM (SEQ ID NO: 1370) | RL | YSDY (SEQ ID NO: 1371) | | |
Llama | n/a | CTAACTGGAGCCA (SEQ ID NO: 1372) | [15] | LTGA (SEQ ID NO: 1373) | LEP | NWS | | |
Cow | DH1 | ATGATACGATAGGTGTGGTTGTAGTTATTGTAGTGTTGCTAC (SEQ ID NO: 1374) | [16] | MIR[stop]VWL[stop]LL[stop]CCY | YDRCGCSYCSVA (SEQ ID NO: 1375) | DTIGVVVVIVVLL (SEQ ID NO: 1376) | | |
Cow | DH2 | GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATA (SEQ ID NO: 1377) | [16] | VVVLMVIVMVMVVVMVMVVVVMIVMVMVVMVVMVVMVIVVIVIVILTNI (SEQ ID NO: 1378) | LS[stop]WL[stop]LWLWLWLWLWL[stop]WL[stop]LLWLWWLWWLWWLWL[stop][stop]L[stop]L[stop]LYLRI (SEQ ID NO: 1379) | SCPDGYSYGYGCGYGYGCSGYDCYGYGGYGGYGGYGYSSYSYSYTYEY (SEQ ID NO: 1380) | | |
Cow | DH3 | GTAGTTGTTATAGTGGTTATGGTTATGGTTGTGGTTATGGTTATGGTTATGATTATAC (SEQ ID NO: 1381) | [16] | VVVIVVMVMVVVMVMVMII (SEQ ID NO: 1382) | LL[stop]WLWLWLWLWLWL[stop]LY (SEQ ID NO: 1383) | SCYSGYGYGCGYGYGYDY (SEQ ID NO: 1384) | | |

Each of the following references is incorporated by reference in its entirety:

[1] Ye, Immunogenetics, 2004, 56: 399;

[2] Shimizu and Yamagishi, EMBO J, 1992, 11: 4869;

[3] Kurosawa et al., Nature, 1981, 290: 565;

[4] Dirkes et al., Immunogenetics, 1994, 40: 379;

[5] Gerondakis et al., Immunogenetics, 1988, 28: 255;

[6] Gu et al., Cell, 1991, 65: 47;

[7] Link et al., Immunogenetics, 2002, 54: 240;

[8] Friedman et al., J. Immunol., 1994, 152: 632;

[9] GI code: 62651567; reverse strand 33906161-33793435;

[10] Hayman et al., J. Immunol., 2000, 164: 1916;

[11] Ghaffari and Lobb, J. Immunol., 1999, 162: 1519;

[12] Solem and Stenvik, Dev. Comp. Immunol., 2006, 30: 57;

[13] GI code: 114655167; reverse strand 203704-97555;

[14] Nguyen et al., EMBO J, 2000, 19: 921;

[15] GI code: 13345163;

[16] Shojaei et al., Mol. Immunol., 2003, 40: 61.

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments and methods described herein. Such equivalents are intended to be encompassed by the scope of the following claims.

APPENDIX A

GI Numbers of Kappa Light Chains Used to Derive the VK Libraries

23868
2385488
16923194
58222611
70798854
98956311

32779
2385490
16923202
58222613
70798856
98956323

32810
2385492
16923208
58222615
70798858
98956325

33059
2385494
17226623
58222617
70798860
98956327

33144
2385495
17226631
58222619
70798862
98956337

33156
2385497
17226635
58222621
70798866
98956341

33170
2597932
17226639
58222623
70798868
98956343

33173
2597935
17226643
58222625
70798872
98956349

33183
2597937
17226645
58222627
70798874
98956355

33185
2597943
17226655
58222629
70798878
98956357

33189
2597946
17381491
58222631
70798880
98956365

33191
2597948
17385013
58222633
70798882
98956375

33195
2597950
17385015
58222635
70798884
98956379

33200
2597952
17385017
58222637
70798886
98956381

33202
2599531
17385019
58222639
70798888
98956383

33221
2599533
17385021
58222641
70798890
98956400

33227
2599535
17483729
58222643
70798892
98956404

33230
2599545
18025561
58222645
70798894
98956406

33233
2625059
18025563
58222647
70798896
98956414

33237
2632152
18025573
58222649
70798898
98956418

33268
2654047
18025575
58222651
70798900
98956422

33288
2654051
18025577
58222653
70798902
98956426

33290
2654055
18025579
58222655
70798904
98956428

33294
2773084
18025581
58222657
70798906
98956430

33296
2920359
18025583
58222659
70798914
98956432

33298
2995674
18025585
58222661
70798916
98956436

33300
2995676
18025587
58222663
70798918
98956440

33302
2995678
18025589
58222665
70798920
99022977

33304
2995680
18025591
58222667
70798922
99022979

33324
2995682
18025593
58222669
70798926
99022981

33330
2995688
18025595
58222671
70798928
99022983

33415
2995690
18025597
58222673
70798930
99022985

33416
3023134
18025599
58222675
70798934
99022987

33417
3023136
18025603
58222677
70798936
99022989

33418
3023138
18025605
58222679
70798940
99022991

33421
3023140
18025607
58222681
70798942
99022993

33422
3023142
18025611
58222683
70798946
99022995

33423
3023144
18025613
58222685
70798948
99022997

33424
3023146
18025617
58222687
70798950
99022999

33426
3023148
18025621
58222689
70798952
99023002

33647
3251385
18025623
58222691
70798954
99023004

33649
3251387
18025627
58222693
70798956
99023006

33655
3251389
18025629
58222695
71058688
99023008

33657
3251391
18025635
58222697
71058704
99023010

33659
3251744
18025639
58222699
71058712
99023012

33665
3251749
18025641
58222701
71058717
99023474

33669
3251983
18025645
58222703
71058719
99023476

33679
3251985
18025651
58222705
71058721
99023478

33683
3288824
18025653
58222707
71058723
99023480

33685
3378165
18025655
58222709
71058725
99023482

33756
3378177
18025657
58222711
71058727
99023484

34022
3378183
18025659
58222713
71058729
99025082

36657
3451194
18025661
58222715
71058731
99025083

37860
3603382
18025665
58222717
71482591
99025084

37909
3603384
18025667
58222719
71482622
99025903

38361
3603386
18025669
58222721
71482624
99025916

38362
3603388
18025677
58222723
71482634
99026398

38363
3603390
18025679
58222725
71482636
99026399

38367
3603392
18025681
58222727
71482638
99026416

38436
3603394
18025683
58222729
71482640
99026418

38438
3603396
18025685
58222731
71482642
109240611

38439
3641303
18025687
58222733
71482644
109240615

38440
3641307
18025689
58222735
71482646
109240619

38441
3644015
18025693
58222737
71482648
109240627

38442
3644021
18025697
58222739
71482650
109240631

38448
3746530
18025701
58222741
71482652
109240635

38485
3747011
18025705
58222743
71482654
109240637

38487
3747015
18025709
58222745
71792302
109240641

38489
3821085
18025715
58222747
71792306
109240643

38491
3821088
18025717
58222749
71792308
109240647

38493
3901025
18025719
58222751
73532341
109240655

38495
3928173
18092607
58222753
75707120
109240657

38497
3928181
18092609
58222755
75707124
109240661

38499
3928185
18092611
58222757
75707126
109240665

38501
3928189
18092613
58222759
75707128
109240669

38503
3928210
18092615
58222761
75707130
109240671

38505
3928211
18092617
58222763
75707132
109240675

178678
3928212
18092619
58222765
75707134
109240679

182338
3928214
18092621
58222767
75707138
109240687

182340
3928215
18092623
58222769
75707140
109240691

182342
3928219
18307263
58222771
75707148
109240695

182344
3928220
18307265
58222773
75707154
109240701

182346
3928222
18307267
58222775
75707156
109240705

182348
3928223
18307269
58222777
75707158
109240709

183962
3928224
18307271
58222779
75707160
109240713

183968
3928225
18307273
58222781
75707162
109240717

183972
3928227
18307275
58222783
75707168
109240721

185375
3928231
18307277
58222785
75707170
109240723

185377
3928232
18307279
58222787
75707172
109240729

185379
3928233
18307281
58222789
75707174
109240733

185381
3928234
18307283
58222791
75707176
109240737

185383
3928235
18307285
58222793
75707180
109240741

185385
3928236
18307289
58222795
75707188
109240745

185387
3928237
18307291
58222797
75707194
109240760

185389
3928238
18307293
58222799
75707196
109240764

185391
3928239
18626727
58222801
75707198
109240766

185393
3928240
18626728
58222803
75707204
109240770

185395
3928243
18626729
58222805
75707206
109241210

185397
3928244
18626730
58222807
75707208
109241212

185399
3928245
18632678
58222809
75707210
109241214

185401
3928248
18698406
58222811
75707220
109241216

185403
3928250
19170347
58222813
75707222
109241218

185415
3928251
19701578
58222815
75707226
109241220

185417
3928252
19744467
58222817
75707228
109241450

185419
3928253
19744471
58222819
75707230
109241549

185423
3928254
19744475
58222821
75707232
109241551

185427
3928257
19744479
58222823
75707234
109242373

185811
3928258
19744487
58222825
75707236
109242377

185813
3928259
19744491
58222828
75707238
109242379

185815
3928260
19744495
58222830
75707240
109242381

185816
3928261
19744499
58222832
75707242
109242383

185827
3928263
19744503
58222834
75707244
109242385

185829
3928264
19744507
58222836
75707246
109242387

185831
3928265
19744511
58222838
75707248
109242389

185833
3928266
19744515
58222840
75707250
109242395

185835
3928267
19744519
58222843
75707262
109242399

185837
3928276
19744523
58222845
75707264
109242401

185839
3928277
19744527
58222847
75707268
109242403

185841
3928278
19744531
58222849
75707270
109242409

185845
3928279
19744535
58222851
75707272
109242411

185847
3928280
19744539
58222853
75707274
109242417

185849
3928283
19744543
58222855
75707276
109242419

185855
3928287
19744547
58222857
75707278
109242421

185859
3928288
19744551
58222859
75707282
109242423

185862
3928289
19744555
58222861
75707284
109242425

185866
3928290
19744559
58222863
75707292
109242427

185868
3928291
19744563
58222865
75707298
109245190

185870
3928293
19744567
58222867
75707300
109245192

185872
3928294
19744571
58222869
75707302
109245194

185874
3928295
19744575
58222871
75707304
109693080

185880
3928296
19744579
58222873
75707306
109693082

185882
3928297
19744583
58222875
75707316
109693084

185884
3928298
19744587
58222877
75707318
109693094

185886
3928299
20372497
58222879
75707322
109693096

185888
3928301
20372499
58222881
75707324
109693100

185890
3928302
20372501
58222883
75707334
109693102

185892
3928303
20372503
58222885
75707338
109693110

185894
3928304
20372505
58222887
75707340
109693112

185896
3928308
20372507
58222889
75707362
109693114

185898
3928309
20372509
58222891
75707368
109693116

185904
3928310
20372511
58222893
75707370
109693118

185906
3928312
20372513
58222895
75707372
109693120

185908
3928315
20372515
58222897
75707374
109693135

185910
3928316
20372517
58222899
75707378
109693137

185912
3928317
20372519
58222901
75707382
109693139

185920
3928318
20372521
58222903
75707384
109693144

185922
3928319
20372523
58222905
75707386
109693146

185928
3928320
20372525
58222907
75707398
109693148

185934
3928321
20372527
58222909
75707406
109693150

185950
3928323
20372529
58222911
75707408
109693152

185980
3928324
20387057
58222913
75707410
109693154

185984
3928325
20387059
58222915
75707412
109693157

185987
3928326
20387061
58222917
75707416
109693159

185988
3928327
21311286
58222919
75707418
109693165

186008
3928329
21311288
58222923
75707420
109693167

186015
3928330
21311294
58222925
75707422
109693169

186017
3928331
21311296
58222927
75707424
109693171

186019
3928332
21311318
58222929
75707426
109693177

186040
3928333
21311322
58222931
75707428
109693179

186041
3928334
21669062
58222933
75707430
109693181

186042
3928335
21669064
58222935
75707432
109693183

186047
3928336
21669066
58222937
75707434
109693187

186199
3928337
21669068
58222939
75707444
109693189

186266
3928338
21669070
58222941
75707446
109693201

254719
3928339
21669072
58222943
75707448
109693203

257550
3928340
21669074
58222945
75707454
109693206

261239
3928341
21669076
58222947
75707460
109693210

265236
3928342
21669078
58222949
75707462
109693216

265240
3928343
21669080
58222951
75707464
109693218

298552
3928344
21669082
58222953
75707472
109693220

298560
3928345
21669084
58222955
75707476
109693222

298827
3928346
21669086
58222957
75707500
109693228

298829
3928347
21669088
58222959
75707502
109693230

299955
3928348
21669090
58222961
75707504
109693232

306919
3928349
21669092
58222963
75707506
109693235

306957
3928350
21669094
58222965
75707508
109693237

306959
3928351
21669096
58222967
75707510
109693239

306961
3928352
21669098
58222969
75707514
109693241

306963
3928353
21669100
58222971
75707516
109693249

306965
3928354
21669102
58222973
75707518
109693253

306967
3928355
21669104
58222975
75707520
109693255

306971
3928356
21669106
58222977
75707522
109693261

306980
3928357
21669108
58222979
75707524
109693264

306982
3928358
21669110
58222981
75707526
109942421

306984
3928359
21669112
58222983
75707528
109942431

306986
3928360
21669114
58222985
75707530
110290934

306988
3928361
21669116
58222987
75707534
110610132

306990
3928362
21669118
58222989
75707536
110624509

306992
3928363
21669120
58222991
75707540
110657101

306994
3928364
21669122
58222993
75707542
110657103

306996
3928365
21669124
58222995
75707544
110657105

306998
3928366
21669126
58222997
75707546
110657107

307000
3928367
21669128
58222999
75707548
110657109

348203
3928368
21669130
58223001
75707550
110657111

348205
3928369
21669132
58223003
75707552
110657113

348207
3928370
21669134
58223005
75707586
110657115

348211
3928371
21669136
58223007
75707598
110657123

386052
3928372
21669138
58223009
75707600
110657124

396631
3928373
21669140
58223011
75707602
110657125

397787
3928374
21669142
58223013
75707604
110657158

397789
3928375
21669144
58223015
75707618
110657159

397791
3928376
21669146
58223017
76058957
110657160

397793
3928377
21669148
58223019
76252624
110657161

397795
3928378
21727250
58223021
76252626
110657162

398490
3928379
21998806
58223023
76252630
110657163

398491
3928380
21998808
58223025
76252632
110657164

398492
3928381
21998810
58223027
76252634
110657165

404110
3928382
21998812
58223029
76252636
110657166

404112
3928383
21998814
58223031
76252638
110657167

404114
3928384
21998816
58223033
76252640
110657168

408365
3928385
21998818
58223035
76252642
110657169

409042
3928386
21998820
58223037
76252644
110657170

414035
3928387
21998822
58223039
76252646
110657171

415651
3928388
21998824
58223041
76781673
110657172

415710
3928389
21998826
58223043
77378090
110657173

415955
3928390
21998830
58223045
77378092
110657174

415957
3928391
21998832
58223047
77378094
110657175

415959
3928392
22086572
58223049
77378096
110657176

415961
3928393
22086575
58223051
77378098
110657177

415963
3928394
22086581
58223053
77378100
110657178

415965
3928395
22086587
58223055
77378102
110657179

415967
3928396
22086593
58223057
77378105
110657180

415969
3928397
22091617
58223059
77378107
110657181

415971
3928398
22214019
58223061
77378109
110657182

416329
3928399
22214023
58223063
77378111
110657183

416331
3928400
22297542
58223065
77378135
110657184

416333
3928401
22556681
58223067
77378137
110657185

416335
3928402
22556683
58223069
77378139
110657186

416337
3928403
22556684
58223071
77378141
110657187

430845
3928404
22607990
58223073
77378143
110657188

431039
3928405
22620896
58223075
77378145
110657189

431040
3928406
22620899
58223077
77378147
110657230

431041
3928407
22640510
58223079
77378149
110657232

431042
3928408
22640512
58223081
77378151
110657234

431043
3928409
22640513
58223083
77378153
110657236

431044
3928410
22642789
58223085
77378155
110657238

431045
3928411
22642790
58223087
77378157
110657240

431046
3928412
22642791
58223089
77378159
110657242

431047
3928413
22642808
58223091
77378161
110657244

431048
3928414
22642809
58223093
77378163
110657246

431049
3928415
22642810
58223095
77378165
110657248

431051
3928416
22642811
58223097
77378167
110657250

431052
3928417
22643188
58223099
77378169
110657252

431053
3928418
22643190
58223101
77378172
110657254

431067
3928419
22643192
58223103
77378174
110657256

431069
3928420
22643196
58223105
77378176
110657258

431071
3928421
22647625
58223107
77378224
110657615

431073
3928422
22647633
58223109
77378225
110657617

431075
3928423
23194480
58223111
77378228
110657619

431077
3928424
23194500
58223113
77378230
110657621

431079
3928425
23225992
58223115
77378234
110657624

431081
3928426
23225994
58223117
77378236
110657676

431083
3928427
23225996
58223119
77378237
110657678

431085
3928428
23234613
58223121
77378239
110657728

431087
3928430
23320663
58223123
77378241
110657730

431089
3928431
23342423
58223125
77378245
110658341

433889
3928432
23343554
58223127
77378247
110660158

436562
3928433
24412754
58223129
77378249
110660166

440153
3928434
24412756
58223131
77378251
110660174

441312
3928435
24412758
58223133
77378253
112184495

441314
3928436
24474081
58223135
77378255
112184497

441316
3928437
24850297
58223137
77379405
112184499

441318
3928438
26985941
58223139
77379407
112184501

441320
3928439
27368974
58223141
77379409
112184503

441322
3928440
27368976
58223149
77379412
112184505

441324
3928441
27368978
58223151
77379414
112184507

441330
3928442
27368981
58223153
77379416
112184509

441332
3928443
27368983
58223155
77379418
112184511

441334
3928444
27368986
58223157
77379420
112184513

441336
4100379
27368991
58223159
77379422
112189154

441338
4100381
27368993
58223161
77379425
112191695

441342
4100383
27368997
58223163
77379427
112191699

441344
4103644
27368999
58223165
77379429
112703827

441346
4103662
27369001
58223167
77379431
112708249

441348
4103664
27369003
58223169
77379433
112708250

441350
4103666
27369007
58223171
77379435
112711584

441352
4103674
27369009
58223173
77379437
112712351

441354
4128063
27369011
58223175
77379439
112712352

441356
4139195
27818830
58223177
77379441
112712353

441358
4139197
27867541
58223179
77379443
112712354

441360
4139199
27873542
58223181
77379445
112712355

441364
4139201
27875080
58223183
77379447
112712356

441366
4323178
27875088
58223185
77379449
112712357

441368
4323182
27875191
58223187
77379457
112712358

441370
4323186
27875199
58223189
77379459
112712359

441372
4323194
28611056
58223191
77379461
112712360

441374
4323809
28848873
58223193
77379463
112712361

441376
4323811
28883544
58223195
77379477
112712362

441378
4323813
28883548
58223197
77379479
112712363

441380
4323821
28883550
58223199
77379481
112712364

441382
4323823
29650328
58223201
77379483
112712365

441384
4323825
29650334
58223203
77379485
112712366

441386
4323829
29650337
58223205
77379487
112712367

441388
4323831
29650339
58223207
77379489
112712368

441390
4323833
29725711
58223209
77379491
112712369

441392
4323839
29725713
58223211
77379493
112712370

441394
4323841
29725715
58223213
77379495
112712371

441396
4323845
29725717
58223215
77379497
112712372

441398
4323847
29725719
58223217
77379499
112712373

441400
4323849
29725721
58223219
77379501
112712374

441402
4323851
29725723
58223221
77379503
112712375

441408
4323853
29725725
58223223
77379505
112712376

441412
4323855
29725727
58223225
77379507
112712377

441414
4323857
29725729
58223227
77379509
112712378

441416
4323859
29725731
58223229
77379511
112712379

441418
4323861
29725733
58223231
77379513
112712380

441422
4323863
30026987
58223233
77379515
112712381

441424
4323865
30258344
58223235
77379517
112712382

441426
4323869
30258346
58223237
77379519
112712383

441428
4323871
30793253
58223239
77379521
112727205

441430
4323873
30793255
58223241
77379523
112727206

441432
4323875
30793257
58223243
77379525
112727207

441434
4323877
30793259
58223245
77379527
112727208

441436
4323881
30793261
58223247
77379529
112727209

441440
4323883
30793263
58223249
77379545
112727210

441444
4323885
30793265
58223251
77994607
112727211

441446
4323887
30793565
58223253
77994611
112727212

441448
4323889
30793567
58223255
77994615
112727213

452060
4323891
30793569
58223257
77994619
112727214

452061
4323893
30793571
58223259
78629976
112727215

452062
4323895
30793573
58223261
78629977
112727216

452063
4323897
30841928
58223263
78629978
112727217

459655
4323899
30841931
58223265
80750467
112727218

460858
4323901
30841933
58223267
80975580
114155738

472970
4323903
30841935
58223269
80975600
114155883

472971
4323905
30841939
58223271
80975604
114155884

472972
4323907
30841943
58223273
80975616
114156208

472973
4323909
30841945
58223275
80975618
114207907

472974
4323911
30841947
58223277
80975638
114385493

472975
4323913
31879463
58223279
80975642
114385505

472976
4323915
31879464
58223281
80975644
114385507

487826
4323923
31879467
58223283
81020146
114385509

487827
4323927
31879468
58223285
81020229
114385511

493148
4323929
31879471
58223287
81020258
114385513

493149
4323931
31879472
58223289
81239122
114385515

493150
4323933
33021483
58223291
81251581
114385517

496044
4323935
33044572
58223293
81251585
114385521

496046
4323937
33044573
58223295
82794837
114385537

496048
4323939
33044574
58223297
83410334
114385539

496050
4323941
33044582
58223299
83697271
114385541

496053
4323945
33044586
58223301
83959521
114385543

496055
4323947
33051527
58223303
83959523
114385545

496059
4323949
33051528
58223305
83959525
114385547

496061
4323951
33070272
58223307
83959937
114385549

496063
4323953
33070283
58223309
83959939
114385551

496065
4323955
33070284
58223311
83964685
114385553

496071
4323957
33083474
58223313
83964762
114385567

496073
4323959
33083476
58223315
83964764
114385569

506420
4323961
33083477
58223317
83964766
114385571

506424
4323963
33083478
58223319
83964768
114385573

510839
4323965
33083479
58223321
83966574
114385575

510841
4323983
33083480
58223323
83966576
114385579

510843
4323989
33083481
58223325
83966578
114385581

510845
4323993
33083482
58223327
83966655
114385583

514428
4323997
33083483
58760238
83966657
114385585

514429
4323999
33085842
59890568
83966659
114385587

514430
4324005
33235609
59890571
83966661
114385589

514431
4324007
33235611
59894819
83966663
114385591

514432
4324009
33235613
60392126
83966665
114385593

514433
4324011
33235615
60616327
83966667
114385595

514434
4324013
33235617
60616352
83970756
114385597

515780
4324019
33235619
60650119
83970763
114385599

516137
4378181
33235621
60650123
83970769
114385601

516187
4378183
33235623
60734312
83970772
114385603

516198
4378185
33235625
61697118
84659318
114385605

516213
4378187
33235627
61853816
84659320
114385607

516249
4378189
33235629
61970154
84660715
114385609

516265
4378191
33235631
61970158
84660717
114385611

516316
4378193
33235633
61970160
84660719
114385613

545722
4378195
33304656
61970164
84660720
114385615

557650
4378197
33304658
61970168
84660721
114385617

557651
4378199
33304661
61970172
84660722
114385619

560677
4378201
33304663
61970176
84660723
114385621

560678
4378203
33355480
61970180
84660725
114385623

560841
4378207
33868634
61970184
84797793
114385625

560843
4378209
33868636
61970192
84797795
114385627

575228
4378211
33868638
61970194
84797797
114385629

575236
4378213
33868640
61970198
84797799
114385631

575240
4378215
33868642
61970202
84797801
114385633

575257
4378217
33868644
61970206
84797803
114385635

575261
4378221
33868646
61970228
84797805
114385645

587143
4378223
37287525
62001845
84797807
114385647

587245
4378225
37605051
62120916
84797823
114385649

587323
4378227
37694620
62120917
84797825
114385651

587325
4378229
37694622
62120918
84797827
114385653

587327
4378233
37694624
62120919
84797857
114385655

587329
4378237
37694626
62120920
84797861
114385659

587331
4378239
37694628
62120921
84797883
114385661

587333
4378243
37694630
62120922
84797915
114385663

587335
4378245
37694632
62120923
84797929
114385665

587337
4378247
37694634
62120924
84797959
114385669

587341
4378249
37694636
62120925
84797961
114385671

587343
4378251
37694638
62120926
84797963
114385673

587345
4378253
37694640
62120927
84797979
114385675

587347
4378255
37694642
62120929
84797981
114385677

587349
4378259
37694644
62120931
84797985
114385679

587351
4378261
37694646
62120932
84798001
114385681

587353
4378265
37694648
62120933
84798003
114385683

598165
4378267
37694650
62120934
84798005
114385685

598167
4378269
37694654
62120935
84798007
114385687

598170
4378271
37694660
62120938
84798009
114385689

598172
4378273
37694662
62120939
84798011
114385691

601979
4378275
37694664
62120940
84798033
114385693

601982
4378279
37694666
62120941
84798035
114385699

601984
4378281
37694668
62120943
84798055
114385701

609002
4378283
37694670
62120944
84798057
114385703

609004
4378287
37694672
62120945
84798059
114385705

619259
4378291
37694674
62120946
84798061
114385707

623043
4378293
37694676
62120947
84798063
114385709

624874
4378295
37694678
62120948
84798103
114385711

632983
4378297
37694680
62120949
84798107
114385713

632985
4378299
37694682
62120950
84798115
114385715

632987
4378301
37694684
62120951
84798117
114385717

633227
4378303
37694686
62120952
84798147
114385719

642581
4378305
37694688
62120953
84798149
114385721

681896
4378307
37694690
62120954
84798167
114385723

681899
4378309
37694692
62120955
84798169
114385725

685029
4378313
37694694
62120956
84798171
114385727

693862
4378315
37694696
62120957
84798173
114385729

722413
4378317
37694698
62120958
84798175
114385731

722417
4378319
37694700
62120959
84798177
114385744

722419
4378323
37694702
62120960
84798179
114385746

722421
4378325
37694704
62120961
84798181
114385748

722423
4378327
37694706
62120962
84798183
114385750

722425
4378331
37694708
62199500
84798197
114385752

722427
4378333
37694710
62421462
84798199
114385756

722429
4378335
37702652
62421466
84798201
114385774

722431
4378337
37732215
62720427
84798203
114385776

722433
4378339
37780362
62720431
84798213
114385778

722435
4378341
39103877
62720436
84798215
114385780

722437
4378343
39103879
62720442
84798217
114385782

722439
4378345
39103881
62720444
84798219
114385804

722441
4378347
39103883
62720446
84798241
114385806

722443
4378349
39103885
62720452
84798249
114385808

722455
4378351
39103887
62720454
84798255
114385921

722461
4378353
40231616
62720473
84798257
115268711

722463
4378359
40288410
62720475
84798267
115268713

722465
4378361
40288412
62720477
84798269
115268880

722467
4378363
40288414
62720483
84798271
115268892

722469
4378365
40288416
62860940
84798273
115268894

722471
4378367
40288418
62860955
84798275
115268896

722473
4378369
40388582
62860957
84798277
115268898

722475
4378371
40388585
62860959
84798279
115268900

722477
4378373
40388592
62860961
84798295
115268902

722479
4378375
40388599
62860963
84798309
115268904

722483
4378377
40647131
62860965
84798321
115268906

722485
4378379
40784425
62860981
84798323
115270875

722487
4378383
40784429
62860983
84798325
115270877

722489
4378385
40795876
62860987
84798327
116543556

722493
4378387
42541061
62860989
84798343
116543560

722495
4378389
42541069
62860991
84798345
116543564

722497
4378391
42794782
62860994
84798347
116546686

722503
4378393
42794786
62860996
84798349
116546688

722505
4378395
44829186
62861000
84798351
116551153

722511
4378397
45111420
62861002
84798364
116551156

722513
4378399
45386482
62861004
84798366
116551162

722515
4378401
46016047
62861012
84798370
116551171

722521
4558868
46093898
62861015
84798372
116551175

722523
4680172
46093902
62861017
84798374
116551179

722525
4759539
46093906
62861019
84798377
116551183

722529
4759543
46093910
62861022
84798381
116551188

722531
4759547
46575858
62861024
84798383
116551192

722535
4759551
47078185
62861029
84798386
116551201

722537
4759555
47154907
62861031
84798388
116551207

722539
4759563
47154909
62861037
84798390
116551216

722541
4759567
47154911
62861041
84798397
116551226

722543
4759575
47154913
62861045
84798407
116551231

722545
4759579
47154915
62861054
85632219
116551235

722549
4759583
47154917
62868475
85642735
116551239

722553
4759587
47154919
62868477
85644222
116551244

722555
4759591
47154921
62868479
85644224
116551249

722557
4759595
47271269
62999493
85644226
116551258

722559
4759599
47271271
63102866
85644228
116551313

722561
4761194
47271273
63102872
85644230
116551317

722569
4761281
47271275
63102874
85644232
116551321

722571
4761283
47271277
63102876
85644600
116551325

722573
4837686
47271279
63102880
85644602
116551329

722581
4837688
47271281
63102882
85644604
116551333

722585
4837690
47271283
63102888
85650161
116551337

722587
4837692
47271285
63102892
85650163
116551341

722591
4837694
47271287
63102898
85650165
116551347

722593
4837696
47271289
63102900
85650167
116551351

722599
4837698
47271291
63102902
85650169
116551369

722601
5006350
47271295
63102904
85650171
116551373

722603
5006354
47271297
63102906
85650173
116551377

722605
5006356
47271299
63102908
85650175
116551381

722607
5006358
47271307
63102910
85650177
116551404

722609
5006360
47271309
63102912
85650179
116551413

722615
5019510
47271311
63102916
85650276
116551418

732737
5019512
47271313
63102920
85650278
116551422

732739
5019514
47271315
63102922
85650280
116551427

732741
5019522
47271317
63102924
85657010
116551431

732743
5019524
49073024
63102928
85658337
116551436

732745
5019526
49073036
63102938
85658632
116551446

732747
5019538
50199324
63102940
85660488
116551452

758588
5081714
50199334
63102942
85660492
116551772

758598
5081716
50831237
63102954
85660494
116551776

758600
5081718
50844518
63102962
85660497
116551780

762823
5081720
50844522
63102964
85660498
116551785

773589
5081722
50844526
63102966
85660502
116551790

790442
5102680
50844536
63102968
86439043
116553242

790450
5419682
50844540
63102970
86439047
116555276

790794
5419684
50844548
63102972
86439051
116555819

790802
5419700
50844552
63102974
86439053
116555821

790810
5419702
50871685
63102976
86439057
116555823

791015
5419704
50871687
63102980
86439061
116559889

791019
5419706
50898144
63102986
86439063
116560960

791023
5419708
50898148
63102988
86439071
116634471

791027
5419710
50898150
63102992
86439075
116634475

791031
5419712
50898152
63102994
86439081
116795086

791035
5419731
50898154
63102996
86439147
117576090

809552
5419738
50898158
63102998
86439151
118143176

809553
5419740
50898160
63103012
86439153
118143178

809554
5524134
50898162
63103014
87298995
118147088

845515
5524140
50898164
63103030
87298999
118147090

845517
5524142
50898170
63103032
87299001
118147092

845519
5524144
51103388
63103034
87299003
118147094

845521
5524146
51103390
63103040
87299007
118147096

845523
5524148
51103392
63103044
87299009
118147098

845525
5524150
51103394
63103046
87299011
118147100

845527
5566507
51103396
63103048
87299015
118147102

845529
5578779
51103398
63103054
88496317
118147104

845531
5578781
51103400
63103056
88496922
118147106

845533
5578783
51103402
63103070
90092372
118147108

845535
5578785
51103404
63103072
90092373
118147110

854111
5578787
51103406
63103076
90092374
118147112

871275
5578789
51103408
63103078
90092387
118147114

871819
5578791
51103410
63103086
90092910
118147116

871823
5578793
51103412
63103096
90092911
118147118

882261
5578795
51103414
63103098
90092912
118147120

882263
5578797
51103416
63103106
90092913
118147122

882265
5578799
51103418
63103108
90823178
118147125

882267
5578801
51103420
63103110
90823182
118147127

882269
5578803
51103422
63103112
90823186
118425771

882271
5578805
51103424
63103114
90823190
118425773

882273
5578807
51103522
63103116
90823196
118425775

882275
5578809
51103526
63103118
90823198
118490144

882277
5578811
51103528
63103120
90994745
118490148

882279
5578815
51103532
63103140
90994747
118490152

882281
5690395
51103534
63103142
90994751
118490156

882283
5690399
51103536
63103144
92115496
119359417

882285
5690403
51103538
63103146
92115497
119836694

882287
5709454
51103540
63103148
92130102
119836767

882289
5731228
51103542
63103150
92130103
119838997

882291
5731232
51103544
63103154
92131782
119839065

882293
5731236
51103546
63103156
92131783
119839355

882295
5731242
51103548
66096574
92131784
119839523

882297
5731252
51103550
66096603
92131785
119841342

882299
5921608
51103552
66096637
92133663
119841388

882301
5921610
51103554
66711101
92133665
119841425

882303
5921614
51103556
66711102
92137567
119841512

882305
5921618
51103558
66711103
92140334
121309186

882307
5921620
51103560
66711104
92140336
124042790

882309
5921622
51103562
66711105
92141530
124042792

882311
5921624
51103564
66711106
92155949
124042815

882313
5921626
51103566
66711107
92157443
126146964

882315
5921640
51103568
66711108
92157445
126146965

882317
6110569
51103570
66711109
92157453
126146966

882319
6179861
51851021
66711110
92157459
126147776

882321
6179863
51949938
66711111
92157461
126147812

882323
6179865
53988135
66711112
92158828
126147817

882325
6179867
53988137
66711114
92158980
126147952

882327
6179869
54034484
66711116
92161545
126147954

882329
6492198
54145422
66711117
92249233
126147956

882331
6492200
54145426
66711118
92298212
126152193

894090
6492202
54145440
66711119
92298539
126152196

904629
6492204
54781098
66711120
92315622
126633956

913352
6648587
54781100
66711123
92315624
126633957

929640
6649889
54781102
66711124
92315626
126633958

929642
6649895
54781104
66711125
92315628
134125852

944925
6708204
54781106
66711126
92332837
134125853

950049
7012704
54781108
66711128
92332841
134125854

973411
7012706
54781110
66711129
92348102
134128019

973415
7024356
54781112
66711130
92348670
134269772

999107
7160978
54781126
66711131
92349881
134273023

1020008
7673384
54781129
66711132
92360819
145850477

1020012
7673388
54781202
66711133
92370888
145850518

1020016
7673392
54781204
66711134
92381676
145850519

1070309
7745134
54781206
66711135
92496960
145850520

1070313
8250280
54781208
66711136
92520581
145850521

1070315
8777870
54781213
66711137
92520583
145850522

1070317
8777874
54781216
66711138
92520584
145850523

1070321
8777878
54781218
66711139
92520586
145850524

1070325
8777880
54781220
66711140
92575636
145850525

1070327
8777884
54781223
66711141
92589636
145850526

1070347
8777888
54781225
66711142
92589637
145850527

1136554
8777890
54781227
66711143
92589638
145850528

1136556
8777892
54781229
66711144
92589639
145850529

1208913
9295278
54781231
66711145
92589640
145850530

1235764
9295280
55274149
66711146
92589641
145850531

1235766
9295282
55274153
66711147
92589642
145850532

1235768
9295284
55274159
67509857
92589643
145850533

1235770
9295286
55274163
67509861
92589644
145850534

1235772
9295290
55824376
68148126
92589645
145850535

1235774
9295292
56118076
68148140
92589646
145850536

1245380
9295296
56118080
68148142
92589647
145850537

1245382
9295298
56292538
68148144
92589648
145850558

1255605
9295300
56294837
68148150
92589649
145850561

1255607
9437312
56294841
68148152
92589650
145850563

1255608
9927567
56399565
68148154
92589651
145854440

1255609
9928208
56609227
68148158
92589652
145856824

1255612
9968441
56609228
68148160
92589653
145859735

1292860
9968443
56609229
68148164
92589656
148355517

1292862
9968486
56609230
68148166
92600475
148355518

1353813
9968488
56609232
68148174
92600479
148355519

1353815
9968490
56609235
70797818
92600487
148355520

1353817
9968492
56742105
70797820
92607622
148355521

1353819
9968494
56742106
70797822
92667306
148355522

1353821
9968496
58003567
70797824
92667307
148355523

1353825
9968498
58003568
70797826
92667308
148355524

1353827
9968500
58003569
70797828
92667309
148355525

1353831
9997457
58003570
70797830
92667310
148355526

1370131
10636524
58003571
70797832
92667329
148355527

1370135
11229436
58003572
70797834
92667331
148355528

1370137
11343336
58003573
70797836
92798195
148355529

1495627
11343337
58003587
70797838
92798196
148355530

1495628
11876718
58003588
70797842
92798197
148355531

1495629
11876734
58003589
70797844
92798198
148355532

1495630
11876735
58003608
70797846
92798199
148355533

1495631
11876736
58003609
70797850
92798218
148540957

1495632
11876737
58003610
70797852
92798220
148578450

1495633
11876738
58003611
70797854
92824835
148578452

1495634
11876739
58003612
70797856
92834676
148578454

1495635
11876740
58003613
70797858
92835832
148578455

1495637
11876741
58003614
70797860
92835834
148578456

1495638
11878173
58003615
70797866
92835836
148578457

1495639
11878175
58003616
70797870
92839400
148578458

1495640
11878177
58003618
70797872
92839402
148578460

1495641
11992075
58003619
70797874
92839403
149849068

1495642
11992193
58003620
70797876
92839404
149849080

1495643
12003249
58003622
70797878
92839405
149849084

1495644
12003251
58003623
70797884
92839406
149849088

1495645
12003253
58003624
70797886
92839407
150447881

1495646
12003255
58003625
70797888
92839408
150447883

1495647
12003257
58003626
70797890
92839409
150447885

1495648
12655491
58003627
70797894
92845038
150447887

1495649
12655493
58003628
70797898
92845490
150450134

1495650
12655500
58003629
70798601
92845651
150450135

1495651
12655502
58003630
70798603
92855396
150450136

1495652
12655504
58003631
70798605
92855400
150450137

1532001
12655519
58003632
70798607
92855404
150450138

1532002
12655521
58003633
70798609
92855408
150450139

1532027
12655525
58003634
70798611
92855412
150450140

1552277
12655527
58003656
70798613
92855416
150450636

1552283
12655529
58003657
70798615
92855420
150453145

1552285
12655531
58003658
70798617
92855424
150453147

1552287
12655541
58003659
70798619
92855428
150453149

1552291
12655558
58003660
70798621
92855432
150453151

1552295
12655565
58003661
70798623
92855436
150453153

1552299
12655567
58032596
70798627
92855441
150453154

1552319
12655569
58032603
70798629
92855444
150453155

1561601
12655643
58032606
70798631
92856854
150453156

1561605
12655655
58194104
70798633
92856855
150453157

1561607
12655662
58194120
70798635
92856859
150453159

1561609
12655665
58194136
70798637
92857001
150453161

1561611
12655672
58202701
70798639
92857003
150453163

1572702
12655713
58202709
70798641
92857012
150453165

1572704
12655723
58202711
70798643
92857016
150453167

1572706
12655730
58202713
70798645
92857018
150453169

1572708
12655732
58202715
70798649
92858156
150453171

1572710
12655736
58202717
70798653
92861312
150453174

1657324
12655738
58202719
70798655
92861313
150453213

1657326
12655740
58202721
70798657
92861314
150453216

1657328
12655748
58202723
70798659
92862784
153590356

1673592
12655751
58202725
70798661
92875826
153590359

1673602
12710669
58202727
70798667
92878541
153590361

1710418
12710671
58202729
70798669
92878543
153590363

1770403
12734084
58202733
70798671
92878545
153590365

1770415
12734089
58202735
70798673
92903931
153590367

1773056
12750933
58202737
70798675
92905358
153590371

1778125
12836990
58222454
70798677
92905360
156149223

1785869
12957385
58222456
70798679
92905362
156149224

1785873
12957387
58222458
70798681
94034254
156149225

1785877
13170940
58222460
70798683
94034257
156229617

1800286
13170944
58222462
70798685
94034261
156557387

1813653
13170948
58222464
70798687
94034264
156557389

1813655
13171333
58222466
70798690
94034267
156557391

1813657
13171339
58222468
70798692
94034271
156557393

1834498
13171341
58222470
70798694
94034285
156557399

1834563
13171343
58222473
70798696
94034316
156557403

1834564
13447996
58222476
70798698
94034339
156557405

1835872
13448000
58222478
70798700
94034342
156557407

1835873
13448002
58222480
70798702
94034384
156557411

1839291
13448004
58222482
70798706
94034387
156562058

1864110
13448006
58222484
70798708
94034390
157087534

1864112
13448010
58222487
70798710
94034393
157896695

1864114
13448012
58222489
70798712
94035272
157896697

1864116
13448016
58222491
70798716
94035284
157903220

1864118
13448018
58222493
70798718
94035289
158055245

1864136
13448022
58222497
70798720
94035298
158055254

1864138
13549147
58222499
70798722
94035300
158055268

1890131
13785652
58222501
70798724
94035312
158055282

1890133
13939245
58222503
70798732
94469910
158055285

1905798
13939277
58222505
70798734
94469912
158055288

1905937
13939331
58222507
70798736
94469914
158058441

1905941
13991697
58222509
70798738
94469922
158731523

1911732
14150696
58222511
70798742
94469924
158731524

1922370
14150698
58222513
70798744
94469926
158731525

1922438
14290262
58222515
70798750
95007504
158731526

1922466
14573212
58222517
70798752
95007510
158731527

1922501
14573214
58222519
70798758
95007512
158731528

1922528
14573216
58222521
70798760
95007514
158731529

1922535
14573218
58222523
70798764
95007516
158731530

1922602
14573220
58222525
70798766
95007518
158731531

1922618
14573222
58222527
70798768
95007520
158731532

1922645
14573226
58222529
70798770
95007522
158731533

1922679
14573254
58222531
70798772
95007524
158731534

1922796
14573256
58222533
70798774
95007526
158731536

1922805
14573258
58222535
70798776
95007528
158731538

1932772
14573260
58222537
70798778
95007530
158731539

1943727
14573262
58222539
70798780
95007532
158731540

2058533
14573264
58222541
70798782
95007534
158731541

2058535
14573266
58222543
70798784
95007536
158731542

2058678
14573268
58222545
70798786
95007538
158731545

2072271
14573270
58222547
70798788
95007540
158731546

2072273
14573272
58222549
70798792
95007542
158731547

2072279
14573274
58222551
70798794
95007544
158731548

2072981
14573276
58222553
70798796
95101759
158731550

2078359
14573278
58222556
70798798
95101761
158731551

2078371
14588864
58222558
70798800
95101767
158731552

2078373
14588866
58222560
70798802
95101769
158731553

2169989
14588868
58222562
70798804
95101777
158731554

2169990
14588870
58222564
70798806
98956195
158731555

2172285
14588872
58222566
70798808
98956209
158731556

2173403
14597098
58222568
70798810
98956219
158731557

2175768
14597112
58222570
70798812
98956223
158731558

2175852
14597124
58222572
70798814
98956232
158731559

2175867
14597127
58222575
70798816
98956244
158731560

2218123
14625743
58222577
70798818
98956249
158731561

2239113
14625918
58222579
70798820
98956255
158731562

2239115
14626493
58222581
70798824
98956261
158731563

2253439
14716957
58222583
70798826
98956263
158731564

2266632
14716961
58222585
70798828
98956271
158731565

2266634
14716969
58222587
70798830
98956277
158731566

2291087
14716971
58222589
70798832
98956279
158731567

2293965
14716973
58222591
70798834
98956281
158731568

2293967
15011457
58222593
70798836
98956285
158731569

2306827
15099974
58222595
70798838
98956289
158744132

2306829
15277619
58222597
70798840
98956291
158744140

2345025
15419020
58222599
70798842
98956293
158744148

2345029
15859220
58222601
70798844
98956299
158744156

2345031
15986229
58222603
70798846
98956301
158744164

2345033
16508167
58222605
70798848
98956303
158746355

2385484
16554974
58222607
70798850
98956305
158746363

2385486
16923186
58222609
70798852
98956307
158746371

APPENDIX B

GI Numbers of Lambda Light Chains Used to Derive the Vλ Libraries

31454
3142529
4566076
9968397
51103608
77379760

32808
3142531
4566078
9968401
51103612
77379824

32812
3142533
4566082
9968403
51103614
77379826

33335
3142535
4566084
9968405
51103616
77379828

33368
3142537
4566086
9968409
51490956
77379830

33383
3142539
4566088
9968411
54781261
77379832

33387
3142541
4566090
9968413
61815560
77379834

33412
3142543
4566092
9968415
62720404
77379836

33429
3142545
4566094
9968417
62720406
77379838

33431
3142547
4566096
9968419
62720408
77379840

33433
3142549
4566098
9968421
62720412
77379842

33703
3142553
4566101
9968423
62860947
77379846

33711
3142556
4566105
9968425
62860950
77379848

37918
3142558
4732059
9968427
62860967
77379850

37920
3142562
4761253
9968429
62860969
77379855

37922
3142564
4761255
9968433
62860971
77379857

37923
3142566
4761257
9968435
62860973
77379859

38359
3142569
4761259
9968437
62860975
77379861

38360
3142573
4761261
9968439
62860977
77379863

38364
3142577
4761263
10636511
62860979
77379865

38365
3142579
4761265
10636514
62860985
77379867

38366
3142581
4761267
10636518
62861006
77379869

38368
3142583
4761269
10636521
62861008
77379871

186078
3142585
4761271
10636527
62861010
77379875

186080
3142587
4761273
11992185
62861047
77379877

186082
3142589
4761277
11992187
62999489
77379879

186084
3142591
4761279
11992189
62999497
77379882

186086
3142593
4927957
11992191
62999501
77379884

186088
3142595
5019504
11992195
62999509
77379886

186090
3142597
5019506
11992197
70888031
77379888

186092
3142599
5019516
11992199
70888035
77379890

186094
3142601
5019518
11992201
70888037
77379894

186096
3142603
5019520
12666922
70888041
77379896

186097
3142612
5019528
12666924
70888043
77379900

186111
3142614
5019530
12666926
70888045
77379908

186162
3142616
5019532
12666928
70888047
77379910

186164
3142618
5019534
12666930
70888049
77379912

186168
3142620
5019536
12666932
70888051
77379916

186170
3142649
5174362
12666934
70888053
77379918

186172
3142651
5174364
12666936
70888055
80975584

186175
3142653
5174366
12666938
70888057
80975588

298556
3142656
5174378
12666940
70888059
80975598

405223
3142658
5524086
12666942
70888061
80975622

405227
3142660
5524106
12666944
70888063
80975628

409040
3142662
5524108
12666946
70888065
80975632

409041
3142668
5524118
12666948
70888067
80975636

409043
3142670
5524122
12666952
70888069
81020028

433485
3142672
5524132
12666954
70888071
81020064

434041
3142674
5578817
12666956
70888073
86438995

434045
3142676
5578819
12666958
70888075
86439001

439514
3142678
5578823
12666960
70888077
86439005

439516
3142680
5578825
12666962
70888079
86439015

441251
3142684
5578827
12830380
70888081
86439017

460854
3153359
5578829
12830382
70888083
86439087

460856
3153361
5578831
12830384
70888085
86439089

460860
3153365
5578833
13276707
70888087
86439091

465157
3153366
5911837
13877276
70888089
86439093

465167
3153368
6492194
14279402
70888091
86439095

465171
3153374
6492196
14279404
70888093
86439097

465175
3153376
6492206
14279406
70888095
86439099

469249
3335577
6492208
17226627
70888097
86439101

483911
3335579
6492210
17226649
70888099
86439105

487824
3335585
6492212
18307305
70888103
86439127

487825
3335587
6643078
18307307
70888105
86439133

487828
3335591
6643082
18307309
70888109
86439137

493153
3388046
6643086
18307311
70888111
86439139

506426
3388048
6643088
18307313
70888113
86439141

506428
3388050
6643090
18307315
70888115
90994749

515765
3388054
6643098
18307317
70888117
95007506

532599
3388056
6643104
18307319
70888121
95007546

532600
3388058
6643106
18307321
70888123
95007548

532603
3388060
6643114
18307329
70888125
95007550

560845
3388062
6643118
21311290
70888127
95007552

575230
3388064
6643120
21311292
70888129
95007554

575238
3388066
6643124
21669150
70888133
95007556

575242
3388070
6643126
21669152
70888137
95007558

685021
3388072
6643128
21669154
70888139
95007560

773591
3388074
6643136
21669156
70888141
95007562

871362
3388080
6643138
21669158
70888143
95007564

987068
3747019
6643154
21669160
70888147
95007566

987076
3821077
6643156
21669162
70888149
95007570

998390
3821078
6643158
21669164
70888151
95007572

998394
3821079
6643162
21669166
70888155
95007576

1055278
3821080
6643168
21669172
70888157
95007578

1070329
3821081
6643170
21669174
70888159
109240683

1070341
3821082
6643172
21669176
70888161
109240697

1070349
3821083
6643176
21669178
70888163
109240743

1143195
3821084
6643178
21669180
70888165
109240749

1200068
3821086
6643180
21669182
70888167
109240754

1235776
3821087
6643182
21669184
70888169
109240756

1235778
3821089
6643184
21669186
70888171
109240758

1235780
3821090
6643186
21669188
70888173
116795127

1235782
3821091
6643188
21669190
70888179
116795192

1255606
3821092
6643192
21669192
70888181
146336934

1255610
3821093
6643196
21669194
70888183
156632919

1255611
3821094
6643198
21669196
70888185
156632943

1255613
3821095
6643200
21669198
70888187
156632945

1552313
3821096
6643202
21669200
70888193
156632975

1561599
3821097
6643204
21669204
70888195
156633095

1770407
4103646
6643210
21669206
70888197
156633103

1864134
4103648
6643214
21669210
70888199
156633141

1864140
4103650
6643218
21669212
70888201
156633153

1864142
4103652
6643220
21669214
70888204
156633155

1864144
4103654
6643224
21669218
70888206
156633159

2078365
4103656
6643226
21669220
70888208
156633171

2654039
4103658
6643230
21669222
70888210
156633179

2654043
4103660
6643232
21669224
70888212
156633199

2865485
4103672
6643238
21669226
70888216
156633203

3023094
4324023
6643240
21669228
70888218
156633209

3023096
4324025
6643242
21669230
70888220
156633211

3023098
4324029
6643244
21669232
70888222
156633225

3023100
4324031
6643248
21669234
70888224
156633229

3023102
4324037
6643250
21669236
70888228
156633237

3023104
4324039
6643254
21669238
70888230
156633241

3023106
4324043
6643256
21669240
70888232
156633245

3023108
4324047
6643258
21669242
70888234
156633253

3023110
4324055
6643268
21669244
70888236
156633255

3023112
4324057
6643272
21669248
70888238
156633267

3023114
4324061
6643274
21669252
70888240
156633283

3023116
4324063
6643276
21669254
70888242
157093725

3023118
4324067
6643278
21669256
70888244
170684323

3023120
4324069
6643280
21669260
70888246
170684325

3023122
4324073
6643282
21669262
70888248
170684329

3023126
4324075
6643286
21669264
70888250
170684331

3023130
4324077
6643290
21669266
70888252
170684333

3023132
4324085
6643292
21669268
70888254
170684335

3091153
4324087
6643294
21669270
70888258
170684339

3091155
4324089
6643296
21669272
70888260
170684341

3091157
4324091
6643302
21669274
70888262
170684345

3091159
4324093
6643304
21669276
70888264
170684349

3091161
4324097
6643308
21669278
70888266
170684351

3091163
4324103
6643314
21669280
70888268
170684355

3091165
4324107
6643318
21669288
70888270
170684363

3091167
4324111
6643328
21998780
70888272
170684365

3091169
4324113
6643344
21998782
70888274
170684369

3091171
4324115
6643352
21998784
70888276
170684371

3091173
4324117
6643354
21998786
70888278
170684373

3091175
4324123
6643358
21998792
70888280
170684375

3091177
4324125
6643360
21998794
70888282
170684379

3091179
4324127
6643362
21998800
70888284
170684381

3091181
4324139
6643366
21998802
70888286
170684385

3091183
4324145
6643368
21998804
70888288
170684387

3091185
4324151
6643374
23194484
70888290
170684389

3091187
4324155
6643376
23194488
70888292
170684397

3091191
4324157
6643378
23194492
70888294
170684405

3091193
4324159
6643382
23194496
70888296
170684407

3091195
4324163
6643386
23343556
70888304
170684409

3091197
4324169
6643390
24474079
70888306
170684411

3091201
4324175
6643392
27369031
71482628
170684417

3091203
4324177
6643402
27369033
71482632
170684419

3091205
4324181
6643416
27369035
77378177
170684423

3091207
4324187
6643418
27369037
77378188
170684425

3091209
4324189
6643424
27369045
77378257
170684427

3091213
4324193
6643428
27369047
77378266
170684429

3093861
4324197
6643436
27369051
77378268
170684431

3093863
4324199
6643446
27369053
77378270
170684433

3093865
4324205
6643448
27369058
77378273
170684439

3093867
4324207
6643450
27369060
77378277
170684443

3093869
4324209
6643452
27369064
77378280
170684449

3093871
4324211
6643456
27369068
77378282
170684451

3093873
4324213
6643470
27369075
77378284
170684453

3093875
4324215
6643474
27369082
77378286
170684461

3093877
4324221
6643478
27369084
77378288
170684469

3093879
4324223
6643484
27369088
77378291
170684473

3093881
4324229
6643488
27818828
77378293
170684489

3093883
4324231
6643492
28394695
77378298
170684495

3093885
4324245
6643500
28394699
77378300
170684497

3093887
4324247
6643512
28394703
77378303
170684499

3093889
4324249
6643514
28394707
77378305
170684501

3093891
4324251
6643528
28394711
77378307
170684507

3093895
4324255
6643534
28394715
77378309
170684513

3093903
4324257
6643558
28848877
77378312
170684515

3142451
4324261
6643560
28848881
77378316
170684517

3142453
4324263
6643562
28848885
77378318
170684527

3142455
4324265
6643564
29342115
77378320
170684531

3142457
4324271
6643572
33304654
77378322
170684535

3142459
4324273
6643574
40647151
77378377
170684537

3142461
4324275
6643580
47271301
77378379
170684539

3142465
4324283
6643582
47271303
77378381
170684541

3142467
4324285
6643584
47271319
77378383
170684545

3142471
4468355
6643586
47271321
77378385
170684549

3142475
4468367
6643588
47271323
77378387
170684553

3142477
4468369
6643592
47271325
77378389
170684555

3142479
4468371
6643596
50199320
77378392
170684557

3142481
4565964
6643598
50199322
77378394
170684561

3142483
4565966
6643600
50199328
77378396
170684565

3142485
4565996
6643602
50199330
77378398
170684567

3142487
4566007
6643604
50199338
77378400
170684569

3142489
4566009
6643606
50199340
77378402
170684571

3142491
4566016
6643614
50871689
77379590
170684583

3142493
4566021
6643628
51103426
77379620
170684589

3142495
4566023
6643630
51103428
77379622
170684591

3142497
4566025
6649891
51103430
77379624
170684593

3142499
4566029
6649893
51103434
77379632
170684597

3142503
4566045
8920222
51103436
77379642
170684599

3142505
4566049
8920226
51103572
77379644
170684601

3142507
4566051
9864840
51103574
77379646
170684603

3142509
4566053
9968383
51103576
77379675
170684607

3142511
4566055
9968385
51103588
77379677
170684609

3142515
4566057
9968387
51103590
77379726
170684613

3142517
4566059
9968389
51103592
77379728
170684617

3142519
4566061
9968391
51103600
77379730
170684619

3142521
4566065
9968393
51103602
77379738

3142527
4566074
9968395
51103606
77379740

Relation	Number	Date	Country
Parent	16236259	Dec 2018	US
Child	17231414		US
Parent	14256126	Apr 2014	US
Child	16236259		US

Relation	Number	Date	Country
Parent	17534006	Nov 2021	US
Child	17856679		US
Parent	17231414	Apr 2021	US
Child	17534006		US
Parent	12404059	Mar 2009	US
Child	14256126		US

Relation	Number	Date	Country
Parent	12210072	Sep 2008	US
Child	12404059		US

CROSS-REFERENCE TO RELATED APPLICATIONS

Provisional Applications (1)

Divisions (2)

Continuations (3)

Continuation in Parts (1)