GPCR BINDING PROTEINS AND SYNTHESIS THEREOF

Abstract
Provided herein are methods and compositions relating to G protein-coupled receptor (GPCR) libraries having nucleic acids encoding for a scaffold comprising a GPCR binding domain. Libraries described herein include variegated libraries comprising nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence. Further described herein are protein libraries generated when the nucleic acid libraries are translated. Further described herein are cell libraries expressing variegated nucleic acid libraries described herein.
Description
BACKGROUND

G protein-coupled receptors (GPCRs) are implicated in a wide variety of diseases. Raising antibodies to GPCRs has been difficult due to problems in obtaining suitable antigen because GPCRs are often expressed at low levels in cells and are very unstable when purified. Thus, there is a need for improved agents for therapeutic intervention which target GPCRs.


INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.


BRIEF SUMMARY

Provided herein are antibodies comprising a CDR-H3 comprising a sequence of any one of SEQ ID NOS: 2420 to 2436. Provided herein are antibodies comprising a CDR-H3 comprising a sequence of any one of SEQ ID NOS: 2420 to 2436; and wherein the antibody is a monoclonal antibody, a polyclonal antibody, a bi-specific antibody, a multispecific antibody, a grafted antibody, a human antibody, a humanized antibody, a synthetic antibody, a chimeric antibody, a camelized antibody, a single-chain Fvs (scFv), a single chain antibody, a Fab fragment, a F(ab′)2 fragment, a Fd fragment, a Fv fragment, a single-domain antibody, an isolated complementarity determining region (CDR), a diabody, a fragment comprised of only a single monomeric variable domain, disulfide-linked Fvs (sdFv), an intrabody, an anti-idiotypic (anti-Id) antibody, or antigen-binding fragments thereof. Provided herein are antibodies wherein the VH domain is IGHV1-18, IGHV1-69, IGHV1-8, IGHV3-21, IGHV3-23, IGHV3-30/33rn, IGHV3-28, IGHV3-74, IGHV4-39, or IGHV4-59/61. Provided herein are antibodies, wherein the VL domain is IGKV1-39, IGKV1-9, IGKV2-28, IGKV3-11, IGKV3-15, IGKV3-20, IGKV4-1, IGLV1-51, or IGLV2-14. Provided herein are methods of inhibiting GLP1R activity, comprising administering the antibodies as described herein. Provided herein are methods for treatment of a metabolic disorder, comprising administering to a subject in need thereof the antibodies as described herein. In some instances, the antibody comprises a CDR-H3 comprising a sequence of any one of SEQ ID NOS: 2420 to 2436. Provided herein are methods for treatment of a metabolic disorder, wherein the metabolic disorder is Type II diabetes or obesity. Provided herein are nucleic acids encoding for a protein comprising a sequence of any one of SEQ ID NOS: 2420 to 2436.


Provided herein are nucleic acid libraries comprising a plurality of nucleic acids, wherein each nucleic acid encodes for a sequence that when translated encodes for an immunoglobulin scaffold, wherein the immunoglobulin scaffold comprises a CDR-H3 loop that comprises a GPCR binding domain, and wherein each nucleic acid comprises a sequence encoding for a sequence variant of the GPCR binding domain. Provided herein are nucleic acid libraries, wherein a length of the CDR-H3 loop is about 20 to about 80 amino acids. Provided herein are nucleic acid libraries, wherein a length of the CDR-H3 loop is about 80 to about 230 base pairs. Provided herein are nucleic acid libraries, wherein the immunoglobulin scaffold further comprises one or more domains selected from variable domain, light chain (VL), variable domain, heavy chain (VH), constant domain, light chain (CL), and constant domain, heavy chain (CH). Provided herein are nucleic acid libraries, wherein the VH domain is IGHV1-18, IGHV1-69, IGHV1-8, IGHV3-21, IGHV3-23, IGHV3-30/33rn, IGHV3-28, IGHV3-74, IGHV4-39, or IGHV4-59/61. Provided herein are nucleic acid libraries, wherein the VL domain is IGKV1-39, IGKV1-9, IGKV2-28, IGKV3-11, IGKV3-15, IGKV3-20, IGKV4-1, IGLV1-51, or IGLV2-14. Provided herein are nucleic acid libraries, wherein a length of the VH domain is about 90 to about 100 amino acids. Provided herein are nucleic acid libraries, wherein a length of the VL domain is about 90 to about 120 amino acids. Provided herein are nucleic acid libraries, wherein a length of the VH domain is about 280 to about 300 base pairs. Provided herein are nucleic acid libraries, wherein a length of the VL domain is about 300 to about 350 base pairs. Provided herein are nucleic acid libraries, wherein the library comprises at least 10^5 non-identical nucleic acids. Provided herein are nucleic acid libraries, wherein the immunoglobulin scaffold comprises a single immunoglobulin domain. 
Provided herein are nucleic acid libraries, wherein the immunoglobulin scaffold comprises a peptide of at most 100 amino acids. Provided herein are vector libraries comprising nucleic acid libraries as described herein. Provided herein are cell libraries comprising nucleic acid libraries as described herein.


Provided herein are nucleic acid libraries comprising a plurality of nucleic acids, wherein each nucleic acid encodes for a sequence that when translated encodes a GPCR binding domain, and wherein each nucleic acid comprises a sequence encoding for a different GPCR binding domain of about 20 to about 80 amino acids. Provided herein are nucleic acid libraries, wherein a length of the GPCR binding domain is about 80 to about 230 base pairs. Provided herein are nucleic acid libraries, wherein the GPCR binding domain is designed based on conformational ligand interactions, peptide ligand interactions, small molecule ligand interactions, extracellular domains of GPCRs, or antibodies that target GPCRs. Provided herein are vector libraries comprising nucleic acid libraries as described herein. Provided herein are cell libraries comprising nucleic acid libraries as described herein.


Provided herein are protein libraries comprising a plurality of proteins, wherein each of the proteins of the plurality of proteins comprises an immunoglobulin scaffold, wherein the immunoglobulin scaffold comprises a CDR-H3 loop that comprises a sequence variant of a GPCR binding domain. Provided herein are protein libraries, wherein a length of the CDR-H3 loop is about 20 to about 80 amino acids. Provided herein are protein libraries, wherein the immunoglobulin scaffold further comprises one or more domains selected from variable domain, light chain (VL), variable domain, heavy chain (VH), constant domain, light chain (CL), and constant domain, heavy chain (CH). Provided herein are protein libraries, wherein the VH domain is IGHV1-18, IGHV1-69, IGHV1-8, IGHV3-21, IGHV3-23, IGHV3-30/33rn, IGHV3-28, IGHV3-74, IGHV4-39, or IGHV4-59/61. Provided herein are protein libraries, wherein the VL domain is IGKV1-39, IGKV1-9, IGKV2-28, IGKV3-11, IGKV3-15, IGKV3-20, IGKV4-1, IGLV1-51, or IGLV2-14. Provided herein are protein libraries, wherein a length of the VH domain is about 90 to about 100 amino acids. Provided herein are protein libraries, wherein a length of the VL domain is about 90 to about 120 amino acids. Provided herein are protein libraries, wherein the plurality of proteins is used to generate a peptidomimetic library. Provided herein are protein libraries, wherein the protein library comprises peptides. Provided herein are protein libraries, wherein the protein library comprises immunoglobulins. Provided herein are protein libraries, wherein the protein library comprises antibodies. Provided herein are cell libraries comprising protein libraries as described herein.


Provided herein are protein libraries comprising a plurality of proteins, wherein the plurality of proteins comprises sequences for different GPCR binding domains, and wherein the length of each GPCR binding domain is about 20 to about 80 amino acids. Provided herein are protein libraries, wherein the protein library comprises peptides. Provided herein are protein libraries, wherein the protein library comprises immunoglobulins. Provided herein are protein libraries, wherein the protein library comprises antibodies. Provided herein are protein libraries, wherein the plurality of proteins is used to generate a peptidomimetic library. Provided herein are cell libraries comprising protein libraries as described herein.


Provided herein are vector libraries comprising a nucleic acid library described herein. Provided herein are cell libraries comprising a nucleic acid library described herein. Provided herein are cell libraries comprising a protein library described herein.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts a schematic of G protein-coupled receptor (GPCR) ligand interaction surfaces.



FIG. 2A depicts a first schematic of an immunoglobulin scaffold.



FIG. 2B depicts a second schematic of an immunoglobulin scaffold.



FIG. 3 depicts a schematic of a motif for placement in a scaffold.



FIG. 4 depicts a schematic of a GPCR.



FIG. 5 depicts schematics of segments for assembly of clonal fragments and non-clonal fragments.



FIG. 6 depicts schematics of segments for assembly of clonal fragments and non-clonal fragments.



FIG. 7 presents a diagram of steps demonstrating an exemplary process workflow for gene synthesis as disclosed herein.



FIG. 8 illustrates an example of a computer system.



FIG. 9 is a block diagram illustrating an architecture of a computer system.



FIG. 10 is a diagram demonstrating a network configured to incorporate a plurality of computer systems, a plurality of cell phones and personal data assistants, and Network Attached Storage (NAS).



FIG. 11 is a block diagram of a multiprocessor computer system using a shared virtual address memory space.



FIGS. 12A-12C depict sequences of immunoglobulin scaffolds. FIG. 12A discloses SEQ ID NOS 2437-2453, 2448, 2454-2464, 2448, 2465-2471, 2466, 2472-2483, 2462, 2484-2485, 2448, 2486-2496, 2491, 2497-2507, 2491, 2508-2518, 2513, 2519-2520, 2462, 2521-2522, 2491, 2523-2539, 2534, 2540-2544, 2534, 2545-2549, 2534, 2550-2554, 2491, 2555-2559, 2513, 2560, 2556, 2561-2562, 2496, 2563-2564, 2556, 2565-2567, 2491, 2568-2575, 2552, 2576-2577, 2563, 2578-2581, 2522, 2491, 2582-2585, 2567 and 2563, respectively, in order of appearance. FIG. 12B discloses SEQ ID NOS 2586-2620, 2615, 2621-2631, 2459, 2632-2654, 2448, 2655-2658, 2464, 2448, 2659-2661, 2658, 2662-2672, 2658, 2673-2718, 2657, 2719, 2715, 2720-2728, 2720, 2729-2745, 2734, 2746-2753, 2748-2749 and 2754-2755, respectively, in order of appearance. FIG. 12C discloses SEQ ID NOS 2756-2766, 2761, 2767-2785, 2461, 2786-2788, 2448, 2789-2793, 2459, 2794-2798, 2448, 2799-2803, 2448, 2804-2808, 2734, 2809-2831, 2814, 2832-2836, 2820, 2837, 2551, 2838-2843, 2817, 2844, 2522, 2491, 2845-2848, 2522, 2849-2854, 2491, 2855-2859, 2513, 2860-2864, 2448, 2865-2874, 2464, 2448, 2875-2879, 2459, 2880-2881, 2873-2874, 2464, 2448, 2882-2903, 2608 and 2904, respectively, in order of appearance.



FIG. 13 depicts sequences of G protein-coupled receptor scaffolds. FIG. 13 discloses SEQ ID NOS 2905-2911, 2911, 2911-2912, 2912, 2911, 2911, 2911, 2911, 2911-2912, 2912, 2912, 2911, 2911, 2911-2912, 2911-2912, 2912, 2911, 2911, 2913-2915, 2915-2918, 2918, 2918, 2918, 2918, 2918 and 2918, respectively, in order of appearance.



FIG. 14 is a graph of normalized reads for a library for variable domain, heavy chains.



FIG. 15 is a graph of normalized reads for a library for variable domain, light chains.



FIG. 16 is a graph of normalized reads for a library for heavy chain complementarity determining region 3.



FIG. 17A is a plot of light chain frameworks assayed for folding. FIG. 17A discloses SEQ ID NOS 2919-2924, 2712, 2925-2926, 2725, 2927-2932, 2461, 2933, 2455, 2934-2939, 2730, 2742, 2940-2941, 2556, 2942, 2851, 2498, 2943, 2843, 2944-2945, 2757, 2946-2947, 2774 and 2948, respectively, in order of appearance.



FIG. 17B is a plot of light chain frameworks assayed for thermostability. FIG. 17B discloses SEQ ID NOS 2919-2924, 2712, 2925-2926, 2725, 2927-2932, 2461, 2933, 2455, 2934-2939, 2730, 2742, 2940-2941, 2556, 2942, 2851, 2498, 2943, 2843, 2944-2945, 2757, 2946-2947, 2774 and 2948, respectively, in order of appearance.



FIG. 17C is a plot of light chain frameworks assayed for motif display using FLAG tag. FIG. 17C discloses SEQ ID NOS 2919-2924, 2712, 2925-2926, 2725, 2927-2932, 2461, 2933, 2455, 2934-2939, 2730, 2742, 2940-2941, 2556, 2942, 2851, 2498, 2943, 2843, 2944-2945, 2757, 2946-2947, 2774 and 2948, respectively, in order of appearance.



FIG. 17D is a plot of light chain frameworks assayed for motif display using His tag. FIG. 17D discloses SEQ ID NOS 2919-2924, 2712, 2925-2926, 2725, 2927-2932, 2461, 2933, 2455, 2934-2939, 2730, 2742, 2940-2941, 2556, 2942, 2851, 2498, 2943, 2843, 2944-2945, 2757, 2946-2947, 2774 and 2948, respectively, in order of appearance.



FIG. 18A is a plot of heavy chain frameworks assayed for folding.



FIG. 18B is a plot of heavy chain frameworks assayed for stability.



FIG. 18C is a plot of heavy chain frameworks assayed for motif display using FLAG tag.



FIG. 18D is a plot of heavy chain frameworks assayed for motif display using His tag.



FIG. 18E is a plot of heavy chain frameworks assayed for expression.



FIG. 18F is a plot of heavy chain frameworks assayed for selection specificity.



FIGS. 19A-19C depict images of G protein-coupled receptors visualized by fluorescent antibodies.



FIGS. 20A-20C depict images of G protein-coupled receptors visualized by auto-fluorescent proteins.



FIG. 21A depicts a schematic of an immunoglobulin scaffold comprising a VH domain attached to a VL domain using a linker.



FIG. 21B depicts a schematic of a full-domain architecture of an immunoglobulin scaffold comprising a VH domain attached to a VL domain using a linker, a leader sequence, and pIII sequence.



FIG. 21C depicts a schematic of four framework elements (FW1, FW2, FW3, FW4) and the three variable CDR (L1, L2, L3) elements for a VL or VH domain.





DETAILED DESCRIPTION

The present disclosure employs, unless otherwise indicated, conventional molecular biology techniques, which are within the skill of the art. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art.


Definitions

Throughout this disclosure, various embodiments are presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of any embodiments. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range to the tenth of the unit of the lower limit unless the context clearly dictates otherwise. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual values within that range, for example, 1.1, 2, 2.3, 5, and 5.9. This applies regardless of the breadth of the range. The upper and lower limits of these intervening ranges may independently be included in the smaller ranges, and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure, unless the context clearly dictates otherwise.


The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of any embodiment. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.


Unless specifically stated or obvious from context, as used herein, the term “about” in reference to a number or range of numbers is understood to mean the stated number and numbers +/−10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
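The definition of "about" given above can be illustrated with a short calculation. The following is a minimal sketch only; the function name `about` and the use of exact fractions are assumptions made for illustration and are not part of the disclosure.

```python
from fractions import Fraction

TEN_PERCENT = Fraction(1, 10)

def about(value, upper=None):
    """Return (low, high) bounds implied by the term "about".

    For a single number, the bounds are the number -10% and +10%.
    For a range, the bounds are 10% below the lower listed limit
    and 10% above the higher listed limit.
    """
    lower_limit = Fraction(value)
    upper_limit = lower_limit if upper is None else Fraction(upper)
    low = lower_limit * (1 - TEN_PERCENT)
    high = upper_limit * (1 + TEN_PERCENT)
    return low, high

# "about 20 to about 80" (e.g., a loop length in amino acids)
low, high = about(20, 80)
print(float(low), float(high))  # 18.0 88.0
```

Exact fractions are used here only to avoid floating-point rounding at the boundaries; any numeric type would serve.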


Unless specifically stated, as used herein, the term “nucleic acid” encompasses double- or triple-stranded nucleic acids, as well as single-stranded molecules. In double- or triple-stranded nucleic acids, the nucleic acid strands need not be coextensive (i.e., a double-stranded nucleic acid need not be double-stranded along the entire length of both strands). Nucleic acid sequences, when provided, are listed in the 5′ to 3′ direction, unless stated otherwise. Methods described herein provide for the generation of isolated nucleic acids. Methods described herein additionally provide for the generation of isolated and purified nucleic acids. A “nucleic acid” as referred to herein can comprise at least 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, or more bases in length. Moreover, provided herein are methods for the synthesis of any number of nucleotide sequences encoding polypeptide segments, including sequences encoding non-ribosomal peptides (NRPs), sequences encoding non-ribosomal peptide-synthetase (NRPS) modules and synthetic variants, polypeptide segments of other modular proteins, such as antibodies, polypeptide segments from other protein families, including non-coding DNA or RNA, such as regulatory sequences, e.g., promoters, transcription factors, enhancers, siRNA, shRNA, RNAi, miRNA, small nucleolar RNA derived from microRNA, or any functional or structural DNA or RNA unit of interest. 
The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, complementary DNA (cDNA), which is a DNA representation of mRNA, usually obtained by reverse transcription of mRNA or by amplification; DNA molecules produced synthetically or by amplification, genomic DNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. cDNA encoding for a gene or gene fragment referred to herein may comprise at least one region encoding for exon sequences without an intervening intron sequence in the genomic equivalent sequence.


GPCR Libraries


Provided herein are methods and compositions relating to G protein-coupled receptor (GPCR) binding libraries comprising nucleic acids encoding for a scaffold comprising a GPCR binding domain. Scaffolds as described herein can stably support a GPCR binding domain. The GPCR binding domain may be designed based on surface interactions of a GPCR ligand and the GPCR. Libraries as described herein may be further variegated to provide for variant libraries comprising nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence. Further described herein are protein libraries that may be generated when the nucleic acid libraries are translated. In some instances, nucleic acid libraries as described herein are transferred into cells to generate a cell library. Also provided herein are downstream applications for the libraries synthesized using methods described herein. Downstream applications include identification of variant nucleic acids or protein sequences with enhanced biologically relevant functions, e.g., improved stability, affinity, binding, functional activity, and for the treatment or prevention of a disease state associated with GPCR signaling.


Scaffold Libraries


Provided herein are libraries comprising nucleic acids encoding for a scaffold, wherein sequences for GPCR binding domains are placed in the scaffold. Scaffolds described herein allow for improved stability for a range of GPCR binding domain encoding sequences when inserted into the scaffold, as compared to an unmodified scaffold. Exemplary scaffolds include, but are not limited to, a protein, a peptide, an immunoglobulin, derivatives thereof, or combinations thereof. In some instances, the scaffold is an immunoglobulin. Scaffolds as described herein comprise improved functional activity, structural stability, expression, specificity, or a combination thereof. In some instances, scaffolds comprise long regions for supporting a GPCR binding domain.


Provided herein are libraries comprising nucleic acids encoding for a scaffold, wherein the scaffold is an immunoglobulin. In some instances, the immunoglobulin is an antibody. As used herein, the term antibody will be understood to include proteins having the characteristic two-armed, Y-shape of a typical antibody molecule as well as one or more fragments of an antibody that retain the ability to specifically bind to an antigen. Exemplary antibodies include, but are not limited to, a monoclonal antibody, a polyclonal antibody, a bi-specific antibody, a multispecific antibody, a grafted antibody, a human antibody, a humanized antibody, a synthetic antibody, a chimeric antibody, a camelized antibody, a single-chain Fvs (scFv) (including fragments in which the VL and VH are joined using recombinant methods by a synthetic or natural linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules, including single chain Fab and scFab), a single chain antibody, a Fab fragment (including monovalent fragments comprising the VL, VH, CL, and CH1 domains), a F(ab′)2 fragment (including bivalent fragments comprising two Fab fragments linked by a disulfide bridge at the hinge region), a Fd fragment (including fragments comprising the VH and CH1 fragment), a Fv fragment (including fragments comprising the VL and VH domains of a single arm of an antibody), a single-domain antibody (dAb or sdAb) (including fragments comprising a VH domain), an isolated complementarity determining region (CDR), a diabody (including fragments comprising bivalent dimers such as two VL and VH domains bound to each other and recognizing two different antigens), a fragment comprised of only a single monomeric variable domain, disulfide-linked Fvs (sdFv), an intrabody, an anti-idiotypic (anti-Id) antibody, or antigen-binding fragments thereof. 
In some instances, the libraries disclosed herein comprise nucleic acids encoding for a scaffold, wherein the scaffold is a Fv antibody, including Fv antibodies comprised of the minimum antibody fragment which contains a complete antigen-recognition and antigen-binding site. In some embodiments, the Fv antibody consists of a dimer of one heavy chain and one light chain variable domain in tight, non-covalent association, and the three hypervariable regions of each variable domain interact to define an antigen-binding site on the surface of the VH-VL dimer. In some embodiments, the six hypervariable regions confer antigen-binding specificity to the antibody. In some embodiments, a single variable domain (or half of an Fv comprising only three hypervariable regions specific for an antigen, including single domain antibodies isolated from camelid animals comprising one heavy chain variable domain such as VHH antibodies or nanobodies) has the ability to recognize and bind antigen. In some instances, the libraries disclosed herein comprise nucleic acids encoding for a scaffold, wherein the scaffold is a single-chain Fv or scFv, including antibody fragments comprising a VH, a VL, or both a VH and VL domain, wherein both domains are present in a single polypeptide chain. In some embodiments, the Fv polypeptide further comprises a polypeptide linker between the VH and VL domains allowing the scFv to form the desired structure for antigen binding. In some instances, a scFv is linked to the Fc fragment or a VHH is linked to the Fc fragment (including minibodies). In some instances, the antibody comprises immunoglobulin molecules and immunologically active fragments of immunoglobulin molecules, e.g., molecules that contain an antigen binding site. Immunoglobulin molecules are of any type (e.g., IgG, IgE, IgM, IgD, IgA and IgY), class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2) or subclass.


Libraries described herein comprising nucleic acids encoding for a scaffold, wherein the scaffold is an immunoglobulin, comprise variations in at least one region of the immunoglobulin. Exemplary regions of the antibody for variation include, but are not limited to, a complementarity-determining region (CDR), a variable domain, or a constant domain. In some instances, the CDR is CDR1, CDR2, or CDR3. In some instances, the CDR is a heavy domain including, but not limited to, CDR-H1, CDR-H2, and CDR-H3. In some instances, the CDR is a light domain including, but not limited to, CDR-L1, CDR-L2, and CDR-L3. In some instances, the variable domain is variable domain, light chain (VL) or variable domain, heavy chain (VH). In some instances, the VL domain comprises kappa or lambda chains. In some instances, the constant domain is constant domain, light chain (CL) or constant domain, heavy chain (CH).


Methods described herein provide for synthesis of libraries comprising nucleic acids encoding for a scaffold, wherein each nucleic acid encodes for a predetermined variant of at least one predetermined reference nucleic acid sequence. In some cases, the predetermined reference sequence is a nucleic acid sequence encoding for a protein, and the variant library comprises sequences encoding for variation of at least a single codon such that a plurality of different variants of a single residue in the subsequent protein encoded by the synthesized nucleic acid are generated by standard translation processes. In some instances, the scaffold library comprises varied nucleic acids collectively encoding variations at multiple positions. In some instances, the variant library comprises sequences encoding for variation of at least a single codon of a CDR-H1, CDR-H2, CDR-H3, CDR-L1, CDR-L2, CDR-L3, VL, or VH domain. In some instances, the variant library comprises sequences encoding for variation of multiple codons of a CDR-H1, CDR-H2, CDR-H3, CDR-L1, CDR-L2, CDR-L3, VL, or VH domain. In some instances, the variant library comprises sequences encoding for variation of multiple codons of framework element 1 (FW1), framework element 2 (FW2), framework element 3 (FW3), or framework element 4 (FW4). Exemplary numbers of codons for variation include, but are not limited to, at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons.
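By way of illustration only, variation of a single codon of a predetermined reference sequence can be enumerated in silico as sketched below. This is not the synthesis method of the disclosure; the reference sequence `ATGGCTAAA`, the function names, and the choice of one codon per amino acid (the first encountered in the standard genetic code table) are all assumptions made for the example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding DNA sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def single_codon_variants(reference, codon_index):
    """Return predetermined variants of `reference` that differ at one codon.

    One variant is produced for each amino acid other than the reference
    residue; stop codons are excluded.
    """
    start = 3 * codon_index
    reference_residue = CODON_TABLE[reference[start:start + 3]]
    codon_for_residue = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*" and aa != reference_residue:
            codon_for_residue.setdefault(aa, codon)  # keep first codon seen
    return [
        reference[:start] + codon + reference[start + 3:]
        for codon in codon_for_residue.values()
    ]

reference = "ATGGCTAAA"                       # hypothetical reference, encodes M-A-K
variants = single_codon_variants(reference, 1)
print(len(variants))                          # 19
```

Each returned sequence differs from the reference only at the chosen codon, so translating the set yields a plurality of variants of a single residue. A practical design would typically also weigh codon usage for the expression host, which this sketch does not attempt.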


In some instances, the at least one region of the immunoglobulin for variation is from heavy chain V-gene family, heavy chain D-gene family, heavy chain J-gene family, light chain V-gene family, or light chain J-gene family. In some instances, the light chain V-gene family comprises immunoglobulin kappa (IGK) gene or immunoglobulin lambda (IGL) gene. Exemplary genes include, but are not limited to, IGHV1-18, IGHV1-69, IGHV1-8, IGHV3-21, IGHV3-23, IGHV3-30/33rn, IGHV3-28, IGHV3-74, IGHV4-39, IGHV4-59/61, IGKV1-39, IGKV1-9, IGKV2-28, IGKV3-11, IGKV3-15, IGKV3-20, IGKV4-1, IGLV1-51, and IGLV2-14.


Provided herein are libraries comprising nucleic acids encoding for immunoglobulin scaffolds, wherein the libraries are synthesized with various numbers of fragments. In some instances, the fragments comprise the CDR-H1, CDR-H2, CDR-H3, CDR-L1, CDR-L2, CDR-L3, VL, or VH domain. In some instances, the fragments comprise framework element 1 (FW1), framework element 2 (FW2), framework element 3 (FW3), or framework element 4 (FW4). In some instances, the scaffold libraries are synthesized with at least or about 2 fragments, 3 fragments, 4 fragments, 5 fragments, or more than 5 fragments. The length of each of the nucleic acid fragments or average length of the nucleic acids synthesized may be at least or about 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, or more than 600 base pairs. In some instances, the length is about 50 to 600, 75 to 575, 100 to 550, 125 to 525, 150 to 500, 175 to 475, 200 to 450, 225 to 425, 250 to 400, 275 to 375, or 300 to 350 base pairs.


Libraries comprising nucleic acids encoding for immunoglobulin scaffolds as described herein comprise various lengths of amino acids when translated. In some instances, the length of each of the amino acid fragments or average length of the amino acid synthesized may be at least or about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, or more than 150 amino acids. In some instances, the length of the amino acid is about 15 to 150, 20 to 145, 25 to 140, 30 to 135, 35 to 130, 40 to 125, 45 to 120, 50 to 115, 55 to 110, 60 to 110, 65 to 105, 70 to 100, or 75 to 95 amino acids. In some instances, the length of the amino acid is about 22 amino acids to about 75 amino acids. In some instances, the immunoglobulin scaffolds comprise at least or about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, or more than 5000 amino acids.
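The relationship between an amino acid length and the length of its coding sequence follows from the three-nucleotide codon. The sketch below is a simple illustration under the assumption that the fragment is entirely coding, with no flanking, linker, or primer sequence; the function names are hypothetical.

```python
CODON_LENGTH = 3  # nucleotides per codon

def coding_length_bp(n_amino_acids, include_stop=False):
    """Base pairs needed to encode n_amino_acids, optionally with a stop codon."""
    codons = n_amino_acids + (1 if include_stop else 0)
    return codons * CODON_LENGTH

def max_encoded_residues(n_base_pairs):
    """Largest number of complete codons that fit in n_base_pairs."""
    return n_base_pairs // CODON_LENGTH

# A 22 to 75 amino acid region corresponds to 66 to 225 coding base pairs.
print(coding_length_bp(22), coding_length_bp(75))  # 66 225
```

Lengths recited in base pairs for a synthesized fragment may exceed these values where the fragment also carries non-coding sequence.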


A number of variant sequences for the at least one region of the immunoglobulin for variation are de novo synthesized using methods as described herein. In some instances, a number of variant sequences is de novo synthesized for CDR-H1, CDR-H2, CDR-H3, CDR-L1, CDR-L2, CDR-L3, VL, VH, or combinations thereof. In some instances, a number of variant sequences is de novo synthesized for framework element 1 (FW1), framework element 2 (FW2), framework element 3 (FW3), or framework element 4 (FW4). The number of variant sequences may be at least or about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, or more than 500 sequences. In some instances, the number of variant sequences is at least or about 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, or more than 8000 sequences. In some instances, the number of variant sequences is about 10 to 500, 25 to 475, 50 to 450, 75 to 425, 100 to 400, 125 to 375, 150 to 350, 175 to 325, 200 to 300, 225 to 375, 250 to 350, or 275 to 325 sequences.


Variant sequences for the at least one region of the immunoglobulin, in some instances, vary in length or sequence. In some instances, the at least one region that is de novo synthesized is for CDR-H1, CDR-H2, CDR-H3, CDR-L1, CDR-L2, CDR-L3, VL, VH, or combinations thereof. In some instances, the at least one region that is de novo synthesized is for framework element 1 (FW1), framework element 2 (FW2), framework element 3 (FW3), or framework element 4 (FW4). In some instances, the variant sequence comprises at least or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more than 50 variant nucleotides or amino acids as compared to wild-type. In some instances, the variant sequence comprises at least or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 additional nucleotides or amino acids as compared to wild-type. In some instances, the variant sequence comprises at least or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, or 50 fewer nucleotides or amino acids as compared to wild-type. In some instances, the libraries comprise at least or about 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10, or more than 10^10 variants.
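Library size grows multiplicatively with the number of independently varied positions. The following sketch, offered under the assumption that every combination of per-position choices is included in the library, computes the theoretical number of variants; the function names and the example of 20 residues per position are hypothetical.

```python
from math import prod

def theoretical_diversity(options_per_position):
    """Number of distinct variants when each position varies independently."""
    return prod(options_per_position)

def positions_needed(target_variants, options):
    """Smallest number of positions, each with `options` choices,
    whose combinations reach at least `target_variants`."""
    positions, variants = 0, 1
    while variants < target_variants:
        variants *= options
        positions += 1
    return positions

# Four positions, each allowing any of 20 amino acids.
print(theoretical_diversity([20, 20, 20, 20]))  # 160000
print(positions_needed(10**5, 20))              # 4
```

The figure is an upper bound on designed diversity; the number of variants actually recovered after synthesis and cloning is an experimental quantity that this calculation does not estimate.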


Following synthesis of scaffold libraries, scaffold libraries may be used for screening and analysis. For example, scaffold libraries are assayed for library displayability and panning. In some instances, displayability is assayed using a selectable tag. Exemplary tags include, but are not limited to, a radioactive label, a fluorescent label, an enzyme, a chemiluminescent tag, a colorimetric tag, an affinity tag or other labels or tags that are known in the art. In some instances, the tag is histidine, polyhistidine, myc, hemagglutinin (HA), or FLAG. In some instances, scaffold libraries are assayed by sequencing using various methods including, but not limited to, single-molecule real-time (SMRT) sequencing, Polony sequencing, sequencing by ligation, reversible terminator sequencing, proton detection sequencing, ion semiconductor sequencing, nanopore sequencing, electronic sequencing, pyrosequencing, Maxam-Gilbert sequencing, chain termination (e.g., Sanger) sequencing, +S sequencing, or sequencing by synthesis.


In some instances, the scaffold libraries are assayed for functional activity, structural stability (e.g., thermally stable or pH stable), expression, specificity, or a combination thereof. In some instances, the scaffold libraries are assayed for scaffolds capable of folding. In some instances, a region of the antibody is assayed for functional activity, structural stability, expression, specificity, folding, or a combination thereof. For example, a VH region or VL region is assayed for functional activity, structural stability, expression, specificity, folding, or a combination thereof.


GPCR Libraries


Provided herein are G protein-coupled receptor (GPCR) binding libraries comprising nucleic acids encoding for scaffolds comprising sequences for GPCR binding domains. In some instances, the scaffolds are immunoglobulins. In some instances, the scaffolds comprising sequences for GPCR binding domains are determined by interactions between the GPCR binding domains and the GPCRs.


Provided herein are libraries comprising nucleic acids encoding scaffolds comprising GPCR binding domains, wherein the GPCR binding domains are designed based on surface interactions on the GPCRs. Exemplary GPCRs are seen in Table 1. In some instances, the GPCR binding domains interact with the amino- or carboxy-terminus of the GPCR. In some instances, the GPCR binding domains interact with at least one transmembrane domain including, but not limited to, transmembrane domain 1 (TM1), transmembrane domain 2 (TM2), transmembrane domain 3 (TM3), transmembrane domain 4 (TM4), transmembrane domain 5 (TM5), transmembrane domain 6 (TM6), and transmembrane domain 7 (TM7). In some instances, the GPCR binding domains interact with an intracellular surface of the GPCR. For example, the GPCR binding domains interact with at least one intracellular loop including, but not limited to, intracellular loop 1 (ICL1), intracellular loop 2 (ICL2), and intracellular loop 3 (ICL3). In some instances, the GPCR binding domains interact with an extracellular surface of the GPCR. See FIG. 1. For example, the GPCR binding domains interact with at least one extracellular domain (ECD) or extracellular loop (ECL) of the GPCR. The extracellular loops include, but are not limited to, extracellular loop 1 (ECL1), extracellular loop 2 (ECL2), and extracellular loop 3 (ECL3).









TABLE 1. List of GPCRs

GPCR | Gene Name | Accession Number
5-hydroxytryptamine receptor 1A | HTR1A | P08908
5-hydroxytryptamine receptor 1B | HTR1B | P28222
5-hydroxytryptamine receptor 1D | HTR1D | P28221
5-hydroxytryptamine receptor 1E | HTR1E | P28566
5-hydroxytryptamine receptor 1F | HTR1F | P30939
5-hydroxytryptamine receptor 2A | HTR2A | P28223
5-hydroxytryptamine receptor 2B | HTR2B | P41595
5-hydroxytryptamine receptor 2C | HTR2C | P28335
5-hydroxytryptamine receptor 4 | HTR4 | Q13639
5-hydroxytryptamine receptor 5A | HTR5A | P47898
5-hydroxytryptamine receptor 6 | HTR6 | P50406
5-hydroxytryptamine receptor 7 | HTR7 | P34969
Adenosine receptor A1 | ADORA1 | P30542
Adenosine receptor A2a | ADORA2A | P29274
Adenosine receptor A2b | ADORA2B | P29275
Adenosine A3 receptor | ADORA3 | P33765
Muscarinic acetylcholine receptor M1 | CHRM1 | P11229
Muscarinic acetylcholine receptor M2 | CHRM2 | P08172
Muscarinic acetylcholine receptor M3 | CHRM3 | P20309
Muscarinic acetylcholine receptor M4 | CHRM4 | P08173
Muscarinic acetylcholine receptor M5 | CHRM5 | P08912
Adrenocorticotropic hormone receptor | MC2R | Q01718
α-1A adrenergic receptor | ADRA1A | P35348
α-1B adrenergic receptor | ADRA1B | P35368
α-1D adrenergic receptor | ADRA1D | P25100
α-2A adrenergic receptor | ADRA2A | P08913
α-2B adrenergic receptor | ADRA2B | P18089
α-2C adrenergic receptor | ADRA2C | P18825
β-1 adrenergic receptor | ADRB1 | P08588
β-2 adrenergic receptor | ADRB2 | P07550
β-3 adrenergic receptor | ADRB3 | P13945
Type-1 angiotensin II receptor | AGTR1 | P30556
Duffy antigen/chemokine receptor | DARC | Q16570
Endothelin-1 receptor | EDNRA | P25101
Endothelin B receptor | EDNRB | P24530
N-formyl peptide receptor 2 | FPR2 | P25090
Follicle-stimulating hormone receptor | FSHR | P23945
Galanin receptor type 1 | GALR1 | P47211
Galanin receptor type 2 | GALR2 | O43603
Galanin receptor type 3 | GALR3 | O60755
Gastrin/cholecystokinin type B receptor | CCKBR | P32239
Gonadotropin-releasing hormone receptor | GNRHR | P30968
Putative gonadotropin-releasing hormone II receptor | GNRHR2 | Q96P88
G-protein coupled oestrogen receptor 1 | GPER | Q99527
Uracil nucleotide/cysteinyl leukotriene receptor | GPR17 | Q13304
Putative G-protein coupled receptor 44 | GPR44 | Q9Y5Y4
G-protein coupled receptor 55 | GPR55 | Q9Y2T6
Gastrin-releasing peptide receptor | GRPR | P30550
Histamine H1 receptor | HRH1 | P35367
Histamine H2 receptor | HRH2 | P25021
Histamine H3 receptor | HRH3 | Q9Y5N1
Histamine H4 receptor | HRH4 | Q9H3N8
KiSS-1 receptor | KISS1R | Q969F8
Lysophosphatidic acid receptor 1 | LPAR1 | Q92633
Lysophosphatidic acid receptor 2 | LPAR2 | Q9HBW0
Lysophosphatidic acid receptor 3 | LPAR3 | Q9UBY5
Lysophosphatidic acid receptor 4 | LPAR4 | Q99677
Lysophosphatidic acid receptor 6 | LPAR6 | P43657
Lutropin-choriogonadotropic hormone receptor | LHCGR | P22888
Leukotriene B4 receptor 1 | LTB4R | Q15722
Leukotriene B4 receptor 2 | LTB4R2 | Q9NPC1
Melanocortin receptor 3 | MC3R | P41968
Melanocortin receptor 4 | MC4R | P32245
Melanocortin receptor 5 | MC5R | P33032
Olfactory receptor 10G9 | OR10G9 | Q8NGN4
Olfactory receptor 10H1 | OR10H1 | Q9Y4A9
Olfactory receptor 10H2 | OR10H2 | O60403
Olfactory receptor 10H3 | OR10H3 | O60404
Olfactory receptor 10H4 | OR10H4 | Q8NGA5
Olfactory receptor 10H5 | OR10H5 | Q8NGA6
Olfactory receptor 10J1 | OR10J1 | P30954
Olfactory receptor 10J3 | OR10J3 | Q5JRS4
Olfactory receptor 10J5 | OR10J5 | Q8NHC4
Olfactory receptor 10K1 | OR10K1 | Q8NGX5
Olfactory receptor 10K2 | OR10K2 | Q6IF99
Olfactory receptor 10P1 | OR10P1 | Q8NGE3
Olfactory receptor 10Q1 | OR10Q1 | Q8NGQ4
Olfactory receptor 10R2 | OR10R2 | Q8NGX6
Olfactory receptor 10S1 | OR10S1 | Q8NGN2
Olfactory receptor 10T2 | OR10T2 | Q8NGX3
Olfactory receptor 10V1 | OR10V1 | Q8NGI7
Olfactory receptor 10W1 | OR10W1 | Q8NGF6
Olfactory receptor 14A2 | OR14A2 | Q96R54
Olfactory receptor 14C36 | OR14C36 | Q8NHC7
Olfactory receptor 14I1 | OR14I1 | A6ND48
Olfactory receptor 14J1 | OR14J1 | Q9UGF5
Olfactory receptor 14K1 | OR14K1 | Q8NGZ2
Olfactory receptor 2A12 | OR2A12 | Q8NGT7
Olfactory receptor 2A14 | OR2A14 | Q96R47
Olfactory receptor 2A25 | OR2A25 | A4D2G3
Olfactory receptor 2AG1 | OR2AG1 | Q9H205
Olfactory receptor 2AG2 | OR2AG2 | A6NM03
Olfactory receptor 2AJ1 | OR2AJ1 | Q8NGZ0
Olfactory receptor 2AK2 | OR2AK2 | Q8NG84
Olfactory receptor 2AP1 | OR2AP1 | Q8NGE2
Olfactory receptor 2AT4 | OR2AT4 | A6NND4
Olfactory receptor 51I2 | OR51I2 | Q9H344
Olfactory receptor 51J1 | OR51J1 | Q9H342
Olfactory receptor 51L1 | OR51L1 | Q8NGJ5
Olfactory receptor 51M1 | OR51M1 | Q9H341
Olfactory receptor 51Q1 | OR51Q1 | Q8NH59
Olfactory receptor 51S1 | OR51S1 | Q8NGJ8
Olfactory receptor 51T1 | OR51T1 | Q8NGJ9
Olfactory receptor 51V1 | OR51V1 | Q9H2C8
Olfactory receptor 52A1 | OR52A1 | Q9UKL2
Olfactory receptor 52A5 | OR52A5 | Q9H2C5
Olfactory receptor 52B2 | OR52B2 | Q96RD2
Olfactory receptor 52B4 | OR52B4 | Q8NGK2
Olfactory receptor 52B6 | OR52B6 | Q8NGF0
Olfactory receptor 52D1 | OR52D1 | Q9H346
Olfactory receptor 52E2 | OR52E2 | Q8NGJ4
Olfactory receptor 52E4 | OR52E4 | Q8NGH9
Olfactory receptor 52E5 | OR52E5 | Q8NH55
Olfactory receptor 52E6 | OR52E6 | Q96RD3
Olfactory receptor 52E8 | OR52E8 | Q6IFG1
Olfactory receptor 52H1 | OR52H1 | Q8NGJ2
Olfactory receptor 52I1 | OR52I1 | Q8NGK6
Olfactory receptor 52I2 | OR52I2 | Q8NH67
Olfactory receptor 52K1 | OR52K1 | Q8NGK4
Olfactory receptor 52K2 | OR52K2 | Q8NGK3
Olfactory receptor 52L1 | OR52L1 | Q8NGH7
Olfactory receptor 52M1 | OR52M1 | Q8NGK5
Olfactory receptor 52N1 | OR52N1 | Q8NH53
Olfactory receptor 52N2 | OR52N2 | Q8NGI0
Olfactory receptor 52N4 | OR52N4 | Q8NGI2
Olfactory receptor 52N5 | OR52N5 | Q8NH56
Olfactory receptor 52R1 | OR52R1 | Q8NGF1
Olfactory receptor 52W1 | OR52W1 | Q6IF63
Red-sensitive opsin | OPN1LW | P04000
Visual pigment-like receptor peropsin | RRH | O14718
Olfactory receptor 1A1 | OR1A1 | Q9P1Q5
Olfactory receptor 1A2 | OR1A2 | Q9Y585
Olfactory receptor 1B1 | OR1B1 | Q8NGR6
Olfactory receptor 1C1 | OR1C1 | Q15619
Olfactory receptor 1D2 | OR1D2 | P34982
Olfactory receptor 1F1 | OR1F1 | O43749
Olfactory receptor 1F12 | OR1F12 | Q8NHA8
Olfactory receptor 1G1 | OR1G1 | P47890
Olfactory receptor 1I1 | OR1I1 | O60431
Olfactory receptor 1J1 | OR1J1 | Q8NGS3
Olfactory receptor 1J2 | OR1J2 | Q8NGS2
Olfactory receptor 1J4 | OR1J4 | Q8NGS1
Olfactory receptor 1K1 | OR1K1 | Q8NGR3
Olfactory receptor 1L1 | OR1L1 | Q8NH94
Olfactory receptor 1L3 | OR1L3 | Q8NH93
Olfactory receptor 1L4 | OR1L4 | Q8NGR5
Olfactory receptor 1L6 | OR1L6 | Q8NGR2
Olfactory receptor 1L8 | OR1L8 | Q8NGR8
Olfactory receptor 1M1 | OR1M1 | Q8NGA1
Olfactory receptor 1N1 | OR1N1 | Q8NGS0
Olfactory receptor 1N2 | OR1N2 | Q8NGR9
Olfactory receptor 1Q1 | OR1Q1 | Q15612
Olfactory receptor 1S1 | OR1S1 | Q8NH92
Olfactory receptor 1S2 | OR1S2 | Q8NGQ3
Olfactory receptor 2A2 | OR2A2 | Q6IF42
Olfactory receptor 2A4 | OR2A4 | O95047
Olfactory receptor 2B2 | OR2B2 | Q9GZK3
Putative olfactory receptor 2B3 | OR2B3 | O76000
Olfactory receptor 2B6 | OR2B6 | P58173
Putative olfactory receptor 2B8 | OR2B8P | P59922
Olfactory receptor 2T5 | OR2T5 | Q6IEZ7
Olfactory receptor 2T6 | OR2T6 | Q8NHC8
Olfactory receptor 2T8 | OR2T8 | A6NH00
Olfactory receptor 2V1 | OR2V1 | Q8NHB1
Olfactory receptor 2V2 | OR2V2 | Q96R30
Olfactory receptor 2W1 | OR2W1 | Q9Y3N9
Olfactory receptor 2W3 | OR2W3 | Q7Z3T1
Olfactory receptor 2Y1 | OR2Y1 | Q8NGV0
Olfactory receptor 2Z1 | OR2Z1 | Q8NG97
Olfactory receptor 3A1 | OR3A1 | P47881
Olfactory receptor 3A2 | OR3A2 | P47893
Olfactory receptor 3A3 | OR3A3 | P47888
Olfactory receptor 3A4 | OR3A4 | P47883
Putative olfactory receptor 4A4 | OR4A4P | Q8NGN8
Olfactory receptor 4A5 | OR4A5 | Q8NH83
Olfactory receptor 4A8 | OR4A8P | P0C604
Olfactory receptor 4B1 | OR4B1 | Q8NGF8
Olfactory receptor 4C3 | OR4C3 | Q8NH37
Olfactory receptor 4C5 | OR4C5 | Q8NGB2
Olfactory receptor 4C6 | OR4C6 | Q8NH72
Olfactory receptor 4C11 | OR4C11 | Q6IEV9
Olfactory receptor 4C12 | OR4C12 | Q96R67
Olfactory receptor 4C13 | OR4C13 | Q8NGP0
Olfactory receptor 4C15 | OR4C15 | Q8NGM1
Olfactory receptor 4C16 | OR4C16 | Q8NGL9
Olfactory receptor 4D1 | OR4D1 | Q15615
Olfactory receptor 4D2 | OR4D2 | P58180
Olfactory receptor 4D5 | OR4D5 | Q8NGN0
Olfactory receptor 4D6 | OR4D6 | Q8NGJ1
Olfactory receptor 4D9 | OR4D9 | Q8NGE8
Olfactory receptor 4D10 | OR4D10 | Q8NGI6
Olfactory receptor 4D11 | OR4D11 | Q8NGI4
Olfactory receptor 5B17 | OR5B17 | Q8NGF7
Olfactory receptor 5B21 | OR5B21 | A6NL26
Olfactory receptor 5C1 | OR5C1 | Q8NGR4
Olfactory receptor 5D13 | OR5D13 | Q8NGL4
Olfactory receptor 5D14 | OR5D14 | Q8NGL3
Olfactory receptor 5D16 | OR5D16 | Q8NGK9
Olfactory receptor 5D18 | OR5D18 | Q8NGL1
Olfactory receptor 5F1 | OR5F1 | O95221
Olfactory receptor 5H1 | OR5H1 | A6NKK0
Olfactory receptor 5H2 | OR5H2 | Q8NGV7
Olfactory receptor 5H6 | OR5H6 | Q8NGV6
Olfactory receptor 5I1 | OR5I1 | Q13606
Olfactory receptor 5J2 | OR5J2 | Q8NH18
Olfactory receptor 5K1 | OR5K1 | Q8NHB7
Olfactory receptor 5K2 | OR5K2 | Q8NHB8
Olfactory receptor 5K3 | OR5K3 | A6NET4
Olfactory receptor 5K4 | OR5K4 | A6NMS3
Olfactory receptor 5L1 | OR5L1 | Q8NGL2
Olfactory receptor 5L2 | OR5L2 | Q8NGL0
Olfactory receptor 5M1 | OR5M1 | Q8NGP8
Olfactory receptor 5M3 | OR5M3 | Q8NGP4
Olfactory receptor 5M8 | OR5M8 | Q8NGP6
Olfactory receptor 5M9 | OR5M9 | Q8NGP3
Olfactory receptor 5M10 | OR5M10 | Q6IEU7
Olfactory receptor 5M11 | OR5M11 | Q96RB7
Olfactory receptor 5P2 | OR5P2 | Q8WZ92
Olfactory receptor 5P3 | OR5P3 | Q8WZ94
Olfactory receptor 5R1 | OR5R1 | Q8NH85
Olfactory receptor 5T1 | OR5T1 | Q8NG75
Olfactory receptor 5T2 | OR5T2 | Q8NGG2
Olfactory receptor 5T3 | OR5T3 | Q8NGG3
Olfactory receptor 5W2 | OR5W2 | Q8NH69
Olfactory receptor 7G3 | OR7G3 | Q8NG95
Olfactory receptor 8A1 | OR8A1 | Q8NGG7
Olfactory receptor 8B3 | OR8B3 | Q8NGG8
Olfactory receptor 8B4 | OR8B4 | Q96RC9
Olfactory receptor 8B8 | OR8B8 | Q15620
Olfactory receptor 8B12 | OR8B12 | Q8NGG6
Olfactory receptor 8D1 | OR8D1 | Q8WZ84
Olfactory receptor 8D2 | OR8D2 | Q9GZM6
Olfactory receptor 8D4 | OR8D4 | Q8NGM9
Orexin receptor type 1 | HCRTR1 | O43613
Orexin receptor type 2 | HCRTR2 | O43614
Oxoeicosanoid receptor 1 | OXER1 | Q8TDS5
Oxytocin receptor | OXTR | P30559
P2Y purinoceptor 1 | P2RY1 | P47900
P2Y purinoceptor 2 | P2RY2 | P41231
P2Y purinoceptor 4 | P2RY4 | P51582
P2Y purinoceptor 6 | P2RY6 | Q15077
P2Y purinoceptor 8 | P2RY8 | Q86VZ1
Putative P2Y purinoceptor 10 | P2RY10 | O00398
P2Y purinoceptor 11 | P2RY11 | Q96G91
P2Y purinoceptor 12 | P2RY12 | Q9H244
P2Y purinoceptor 13 | P2RY13 | Q9BPV8
P2Y purinoceptor 14 | P2RY14 | Q15391
Proteinase-activated receptor 1 | F2R | P25116
Proteinase-activated receptor 2 | F2RL1 | P55085
Proteinase-activated receptor 3 | F2RL2 | O00254
Proteinase-activated receptor 4 | F2RL3 | Q96RI0
Prostaglandin D2 receptor | PTGDR | Q13258
Prostaglandin E2 receptor EP1 subtype | PTGER1 | P34995
Prostaglandin E2 receptor EP2 subtype | PTGER2 | P43116
Prostaglandin E2 receptor EP3 subtype | PTGER3 | P43115
Prostaglandin E2 receptor EP4 subtype | PTGER4 | P35408
Type-2 angiotensin II receptor (AT2) | AGTR2 | P50052
Apelin receptor | APLNR | P35414
B1 bradykinin receptor | BDKRB1 | P46663
B2 bradykinin receptor | BDKRB2 | P30411
C5a anaphylatoxin chemotactic receptor | C5AR1 | P21730
Cholecystokinin receptor type A | CCKAR | P32238
C-C chemokine receptor type 10 | CCR10 | P46092
C-C chemokine receptor type 1 | CCR1 | P32246
C-C chemokine receptor type 2 | CCR2 | P41597
C-C chemokine receptor type 3 | CCR3 | P51677
C-C chemokine receptor type 4 | CCR4 | P51679
C-C chemokine receptor type 5 | CCR5 | P51681
C-C chemokine receptor type 6 | CCR6 | P51684
C-C chemokine receptor type 7 | CCR7 | P32248
C-C chemokine receptor type 8 | CCR8 | P51685
C-C chemokine receptor type 9 | CCR9 | P51686
Cysteinyl leukotriene receptor 1 | CYSLTR1 | Q9Y271
Cysteinyl leukotriene receptor 2 | CYSLTR2 | Q9NS75
Cannabinoid receptor 1 | CNR1 | P21554
Cannabinoid receptor 2 | CNR2 | P34972
CX3C chemokine receptor 1 | CX3CR1 | P49238
High affinity interleukin-8 receptor A | IL8RA | P25024
High affinity interleukin-8 receptor B | IL8RB | P25025
C-X-C chemokine receptor type 3 | CXCR3 | P49682
C-X-C chemokine receptor type 4 | CXCR4 | P61073
C-X-C chemokine receptor type 6 | CXCR6 | O00574
C-X-C chemokine receptor type 7 | CXCR7 | P25106
D(1A) dopamine receptor | DRD1 | P21728
D(2) dopamine receptor | DRD2 | P14416
D(3) dopamine receptor | DRD3 | P35462
D(4) dopamine receptor | DRD4 | P21917
D(1B) dopamine receptor | DRD5 | P21918
Melanocyte-stimulating hormone receptor | MC1R | Q01726
Melatonin receptor type 1A | MTNR1A | P48039
Melatonin receptor type 1B | MTNR1B | P49286
Substance-P receptor | TACR1 | P25103
Substance-K receptor | TACR2 | P21452
Neuromedin-K receptor | TACR3 | P29371
Neuromedin-B receptor | NMBR | P28336
Neuropeptides B/W receptor type 1 | NPBWR1 | P48145
Neuropeptides B/W receptor type 2 | NPBWR2 | P48146
Neuropeptide FF receptor 1 | NPFFR1 | Q9GZQ6
Neuropeptide FF receptor 2 | NPFFR2 | Q9Y5X5
Neuropeptide Y receptor type 1 | NPY1R | P25929
Neuropeptide Y receptor type 2 | NPY2R | P49146
Neuropeptide Y receptor type 4 | PPYR1 | P50391
Neuropeptide Y receptor type 5 | NPY5R | Q15761
Neurotensin receptor type 1 | NTSR1 | P30989
Neurotensin receptor type 2 | NTSR2 | O95665
Olfactory receptor 10A2 | OR10A2 | Q9H208
Olfactory receptor 10A3 | OR10A3 | P58181
Olfactory receptor 10A4 | OR10A4 | Q9H209
Olfactory receptor 10A5 | OR10A5 | Q9H207
Olfactory receptor 10A6 | OR10A6 | Q8NH74
Olfactory receptor 10A7 | OR10A7 | Q8NGE5
Olfactory receptor 10AD1 | OR10AD1 | Q8NGE0
Olfactory receptor 10AG1 | OR10AG1 | Q8NH19
Olfactory receptor 10C1 | OR10C1 | Q96KK4
Olfactory receptor 10G2 | OR10G2 | Q8NGC3
Olfactory receptor 10G3 | OR10G3 | Q8NGC4
Olfactory receptor 10G4 | OR10G4 | Q8NGN3
Olfactory receptor 10G6 | OR10G6 | Q8NH81
Olfactory receptor 10G7 | OR10G7 | Q8NGN6
Olfactory receptor 10G8 | OR10G8 | Q8NGN5
Olfactory receptor 2T10 | OR2T10 | Q8NGZ9
Olfactory receptor 2T11 | OR2T11 | Q8NH01
Olfactory receptor 2T12 | OR2T12 | Q8NG77
Olfactory receptor 2T27 | OR2T27 | Q8NH04
Olfactory receptor 2T29 | OR2T29 | Q8NH02
Olfactory receptor 2T33 | OR2T33 | Q8NG76
Olfactory receptor 2T34 | OR2T34 | Q8NGX1
Olfactory receptor 2T35 | OR2T35 | Q8NGX2
Olfactory receptor 4A15 | OR4A15 | Q8NGL6
Olfactory receptor 4A16 | OR4A16 | Q8NH70
Olfactory receptor 4A47 | OR4A47 | Q6IF82
Olfactory receptor 4C45 | OR4C45 | A6NMZ5
Olfactory receptor 4C46 | OR4C46 | A6NHA9
Olfactory receptor 4F15 | OR4F15 | Q8NGB8
Olfactory receptor 4F17 | OR4F17 | Q8NGA8
Olfactory receptor 4F21 | OR4F21 | O95013
Olfactory receptor 51A2 | OR51A2 | Q8NGJ7
Olfactory receptor 51A4 | OR51A4 | Q8NGJ6
Olfactory receptor 51A7 | OR51A7 | Q8NH64
Olfactory receptor 51B2 | OR51B2 | Q9Y5P1
Olfactory receptor 51B4 | OR51B4 | Q9Y5P0
Olfactory receptor 51B5 | OR51B5 | Q9H339
Olfactory receptor 51B6 | OR51B6 | Q9H340
Olfactory receptor 51D1 | OR51D1 | Q8NGF3
Olfactory receptor 51E1 | OR51E1 | Q8TCB6
Olfactory receptor 51E2 | OR51E2 | Q9H255
Olfactory receptor 51F1 | OR51F1 | A6NGY5
Olfactory receptor 51F2 | OR51F2 | Q8NH61
Olfactory receptor 51G1 | OR51G1 | Q8NGK1
Olfactory receptor 51G2 | OR51G2 | Q8NGK0
Putative olfactory receptor 51H1 | OR51H1P | Q8NH63
Olfactory receptor 51I1 | OR51I1 | Q9H343
Olfactory receptor 56A1 | OR56A1 | Q8NGH5
Olfactory receptor 56A3 | OR56A3 | Q8NH54
Olfactory receptor 56A4 | OR56A4 | Q8NGH8
Olfactory receptor 56A5 | OR56A5 | P0C7T3
Olfactory receptor 56B1 | OR56B1 | Q8NGI3
Olfactory receptor 56B4 | OR56B4 | Q8NH76
Olfactory receptor 5AC2 | OR5AC2 | Q9NZP5
Olfactory receptor 5AK2 | OR5AK2 | Q8NH90
Olfactory receptor 5AN1 | OR5AN1 | Q8NGI8
Olfactory receptor 5AP2 | OR5AP2 | Q8NGF4
Olfactory receptor 5AR1 | OR5AR1 | Q8NGP9
Olfactory receptor 5AS1 | OR5AS1 | Q8N127
Olfactory receptor 5AU1 | OR5AU1 | Q8NGC0
Olfactory receptor 5H14 | OR5H14 | A6NHG9
Olfactory receptor 5H15 | OR5H15 | A6NDH6
Olfactory receptor 6C65 | OR6C65 | A6NJZ3
Olfactory receptor 6C68 | OR6C68 | A6NDL8
Olfactory receptor 6C70 | OR6C70 | A6NIJ9
Olfactory receptor 6C74 | OR6C74 | A6NCV1
Olfactory receptor 6C75 | OR6C75 | A6NL08
Olfactory receptor 6C76 | OR6C76 | A6NM76
Olfactory receptor 7E24 | OR7E24 | Q6IFN5
Opsin-3 | OPN3 | Q9H1Y3
Melanopsin | OPN4 | Q9UHM6
Opsin-5 | OPN5 | Q6U736
δ-type opioid receptor | OPRD1 | P41143
κ-type opioid receptor | OPRK1 | P41145
μ-type opioid receptor | OPRM1 | P35372
Nociceptin receptor | OPRL1 | P41146
Blue-sensitive opsin | OPN1SW | P03999
Rhodopsin | RHO | P08100
Green-sensitive opsin | OPN1MW | P04001
Olfactory receptor 2B11 | OR2B11 | Q5JQS5
Olfactory receptor 2C1 | OR2C1 | O95371
Olfactory receptor 2C3 | OR2C3 | Q8N628
Olfactory receptor 2D2 | OR2D2 | Q9H210
Olfactory receptor 2D3 | OR2D3 | Q8NGH3
Olfactory receptor 2F1 | OR2F1 | Q13607
Olfactory receptor 2F2 | OR2F2 | O95006
Olfactory receptor 2G2 | OR2G2 | Q8NGZ5
Olfactory receptor 2G3 | OR2G3 | Q8NGZ4
Olfactory receptor 2G6 | OR2G6 | Q5TZ20
Olfactory receptor 2H1 | OR2H1 | Q9GZK4
Olfactory receptor 2H2 | OR2H2 | O95918
Putative olfactory receptor 2I1 | OR2I1P | Q8NGU4
Olfactory receptor 2J1 | OR2J1 | Q9GZK6
Olfactory receptor 2J2 | OR2J2 | O76002
Olfactory receptor 2J3 | OR2J3 | O76001
Olfactory receptor 2K2 | OR2K2 | Q8NGT1
Olfactory receptor 2L2 | OR2L2 | Q8NH16
Olfactory receptor 2L3 | OR2L3 | Q8NG85
Olfactory receptor 2L5 | OR2L5 | Q8NG80
Olfactory receptor 2L8 | OR2L8 | Q8NGY9
Olfactory receptor 2L13 | OR2L13 | Q8N349
Olfactory receptor 2M2 | OR2M2 | Q96R28
Olfactory receptor 2M3 | OR2M3 | Q8NG83
Olfactory receptor 2M4 | OR2M4 | Q96R27
Olfactory receptor 2M5 | OR2M5 | A3KFT3
Olfactory receptor 2M7 | OR2M7 | Q8NG81
Olfactory receptor 2S2 | OR2S2 | Q9NQN1
Olfactory receptor 2T1 | OR2T1 | O43869
Olfactory receptor 2T2 | OR2T2 | Q6IF00
Olfactory receptor 2T3 | OR2T3 | Q8NH03
Olfactory receptor 2T4 | OR2T4 | Q8NH00
Olfactory receptor 4E1 | OR4E1 | P0C645
Olfactory receptor 4E2 | OR4E2 | Q8NGC2
Olfactory receptor 4F3/4F16/4F29 | OR4F3 | Q6IEY1
Olfactory receptor 4F4 | OR4F4 | Q96R69
Olfactory receptor 4F5 | OR4F5 | Q8NH21
Olfactory receptor 4F6 | OR4F6 | Q8NGB9
Olfactory receptor 4K1 | OR4K1 | Q8NGD4
Olfactory receptor 4K2 | OR4K2 | Q8NGD2
Olfactory receptor 4K3 | OR4K3 | Q96R72
Olfactory receptor 4K5 | OR4K5 | Q8NGD3
Olfactory receptor 4K13 | OR4K13 | Q8NH42
Olfactory receptor 4K14 | OR4K14 | Q8NGD5
Olfactory receptor 4K15 | OR4K15 | Q8NH41
Olfactory receptor 4K17 | OR4K17 | Q8NGC6
Olfactory receptor 4L1 | OR4L1 | Q8NH43
Olfactory receptor 4M1 | OR4M1 | Q8NGD0
Olfactory receptor 4M2 | OR4M2 | Q8NGB6
Olfactory receptor 4N2 | OR4N2 | Q8NGD1
Olfactory receptor 4N4 | OR4N4 | Q8N0Y3
Olfactory receptor 4N5 | OR4N5 | Q8IXE1
Olfactory receptor 4P4 | OR4P4 | Q8NGL7
Olfactory receptor 4Q2 | OR4Q2 | P0C623
Olfactory receptor 4Q3 | OR4Q3 | Q8NH05
Olfactory receptor 4S1 | OR4S1 | Q8NGB4
Olfactory receptor 4S2 | OR4S2 | Q8NH73
Olfactory receptor 4X1 | OR4X1 | Q8NH49
Olfactory receptor 4X2 | OR4X2 | Q8NGF9
Olfactory receptor 5A1 | OR5A1 | Q8NGJ0
Olfactory receptor 5A2 | OR5A2 | Q8NGI9
Olfactory receptor 5B2 | OR5B2 | Q96R09
Olfactory receptor 5B3 | OR5B3 | Q8NH48
Olfactory receptor 5B12 | OR5B12 | Q96R08
Olfactory receptor 6A2 | OR6A2 | O95222
Olfactory receptor 6B1 | OR6B1 | O95007
Olfactory receptor 6B2 | OR6B2 | Q6IFH4
Olfactory receptor 6B3 | OR6B3 | Q8NGW1
Olfactory receptor 6C1 | OR6C1 | Q96RD1
Olfactory receptor 6C2 | OR6C2 | Q9NZP2
Olfactory receptor 6C3 | OR6C3 | Q9NZP0
Olfactory receptor 6C4 | OR6C4 | Q8NGE1
Olfactory receptor 6C6 | OR6C6 | A6NF89
Olfactory receptor 6F1 | OR6F1 | Q8NGZ6
Olfactory receptor 6J1 | OR6J1 | Q8NGC5
Olfactory receptor 6K2 | OR6K2 | Q8NGY2
Olfactory receptor 6K3 | OR6K3 | Q8NGY3
Olfactory receptor 6K6 | OR6K6 | Q8NGW6
Olfactory receptor 6M1 | OR6M1 | Q8NGM8
Olfactory receptor 6N1 | OR6N1 | Q8NGY5
Olfactory receptor 6N2 | OR6N2 | Q8NGY6
Olfactory receptor 6P1 | OR6P1 | Q8NGX9
Olfactory receptor 6Q1 | OR6Q1 | Q8NGQ2
Olfactory receptor 6S1 | OR6S1 | Q8NH40
Olfactory receptor 6T1 | OR6T1 | Q8NGN1
Olfactory receptor 6V1 | OR6V1 | Q8N148
Olfactory receptor 6X1 | OR6X1 | Q8NH79
Olfactory receptor 6Y1 | OR6Y1 | Q8NGX8
Olfactory receptor 7A5 | OR7A5 | Q15622
Olfactory receptor 7A10 | OR7A10 | O76100
Olfactory receptor 7A17 | OR7A17 | O14581
Olfactory receptor 7C1 | OR7C1 | O76099
Olfactory receptor 7C2 | OR7C2 | O60412
Olfactory receptor 7D4 | OR7D4 | Q8NG98
Olfactory receptor 7G1 | OR7G1 | Q8NGA0
Olfactory receptor 7G2 | OR7G2 | Q8NG99
Prostaglandin F2-α receptor | PTGFR | P43088
Prostacyclin receptor | PTGIR | P43119
Prolactin-releasing peptide receptor | PRLHR | P49683
Platelet-activating factor receptor | PTAFR | P25105
Pyroglutamylated RFamide peptide receptor | QRFPR | Q96P65
RPE-retinal G protein-coupled receptor | RGR | P47804
Sphingosine 1-phosphate receptor 1 | S1PR1 | P21453
Sphingosine 1-phosphate receptor 2 | S1PR2 | O95136
Sphingosine 1-phosphate receptor 3 | S1PR3 | Q99500
Sphingosine 1-phosphate receptor 4 | S1PR4 | O95977
Sphingosine 1-phosphate receptor 5 | S1PR5 | Q9H228
Somatostatin receptor type 1 | SSTR1 | P30872
Somatostatin receptor type 2 | SSTR2 | P30874
Somatostatin receptor type 3 | SSTR3 | P32745
Somatostatin receptor type 4 | SSTR4 | P31391
Somatostatin receptor type 5 | SSTR5 | P35346
Thromboxane A2 receptor | TBXA2R | P21731
Trace amine-associated receptor 1 | TAAR1 | Q96RJ0
Trace amine-associated receptor 2 | TAAR2 | Q9P1P5
Putative trace amine-associated receptor 3 | TAAR3 | Q9P1P4
Trace amine-associated receptor 5 | TAAR5 | O14804
Trace amine-associated receptor 6 | TAAR6 | Q96RI8
Trace amine-associated receptor 8 | TAAR8 | Q969N4
Trace amine-associated receptor 9 | TAAR9 | Q96RI9
Thyrotropin receptor | TSHR | P16473
Vasopressin V1a receptor | AVPR1A | P37288
Vasopressin V1b receptor | AVPR1B | P47901
Vasopressin V2 receptor | AVPR2 | P30518
Chemokine XC receptor 1 | XCR1 | P46094
Brain-specific angiogenesis inhibitor 1 | BAI1 | O14514
Brain-specific angiogenesis inhibitor 2 | BAI2 | O60241
Brain-specific angiogenesis inhibitor 3 | BAI3 | O60242
Calcitonin receptor | CALCR | P30988
Calcitonin gene-related peptide type 1 receptor | CALCRL | Q16602
Corticotropin-releasing factor receptor 1 | CRHR1 | P34998
Corticotropin-releasing factor receptor 2 | CRHR2 | Q13324
Growth hormone-releasing hormone receptor | GHRHR | Q02643
Gastric inhibitory polypeptide receptor | GIPR | P48546
Glucagon-like peptide 1 receptor | GLP1R | P43220
Glucagon-like peptide 2 receptor | GLP2R | O95838
Glucagon receptor | GCGR | P47871
Pituitary adenylate cyclase-activating polypeptide type I receptor | ADCYAP1R1 | P41586
Taste receptor 1 member 2 | TAS1R2 | Q8TE23
Parathyroid hormone receptor 1 | PTH1R | Q03431
Parathyroid hormone 2 receptor | PTH2R | P49190
Secretin receptor | SCTR | P47872
Vasoactive intestinal polypeptide receptor 1 | VIPR1 | P32241
Vasoactive intestinal polypeptide receptor 2 | VIPR2 | P41587
Frizzled-10 | FZD10 | Q9ULW2
Frizzled-1 | FZD1 | Q9UP38
Frizzled-2 | FZD2 | Q14332
Frizzled-3 | FZD3 | Q9NPG1
Frizzled-4 | FZD4 | Q9ULV1
Frizzled-5 | FZD5 | Q13467
Frizzled-6 | FZD6 | O60353
Frizzled-7 | FZD7 | O75084
Frizzled-8 | FZD8 | Q9H461
Frizzled-9 | FZD9 (FZD3) | O00144
Smoothened homologue | SMO (SMOH) | Q99835
Extracellular calcium-sensing receptor | CASR | P41180
GABA type B receptor subunit 1 | GABBR1 | Q9UBS5
GABA type B receptor subunit 2 | GABBR2 | O75899
GPCR family C group 6 member A | GPRC6A | Q5T6X5
Metabotropic glutamate receptor 1 | GRM1 | Q13255
Metabotropic glutamate receptor 2 | GRM2 | Q14416
Metabotropic glutamate receptor 3 | GRM3 | Q14832
Metabotropic glutamate receptor 4 | GRM4 | Q14833
Metabotropic glutamate receptor 5 | GRM5 | P41594
Metabotropic glutamate receptor 6 | GRM6 | O15303
Metabotropic glutamate receptor 7 | GRM7 | Q14831
Metabotropic glutamate receptor 8 | GRM8 | O00222
Taste receptor 1 member 1 | TAS1R1 | Q7RTX1
Taste receptor 1 member 2 | TAS1R2 | Q8TE23
Taste receptor 1 member 3 | TAS1R3 | Q7RTX0


Described herein are GPCR binding domains, wherein the GPCR binding domains are designed based on surface interactions between a GPCR ligand and the GPCR. In some instances, the ligand is a subatomic particle (e.g., a photon), an ion, an organic molecule, a peptide, or a protein. Non-limiting examples of ligands which can be bound by a GPCR include (−)-adrenaline, (−)-noradrenaline, (lyso)phospholipid mediators, [des-Arg10]kallidin, [des-Arg9]bradykinin, [des-Gln14]ghrelin, [Hyp3]bradykinin, [Leu]enkephalin, [Met]enkephalin, 12-hydroxyheptadecatrienoic acid, 12R-HETE, 12S-HETE, 12S-HPETE, 15S-HETE, 17β-estradiol, 20-hydroxy-LTB4, 2-arachidonoylglycerol, 2-oleoyl-LPA, 3-hydroxyoctanoic acid, 5-hydroxytryptamine, 5-oxo-15-HETE, 5-oxo-ETE, 5-oxo-ETrE, 5-oxo-ODE, 5S-HETE, 5S-HPETE, 7α,25-dihydroxycholesterol, acetylcholine, ACTH, adenosine diphosphate, adenosine, adrenomedullin 2/intermedin, adrenomedullin, amylin, anandamide, angiotensin II, angiotensin III, annexin I, apelin receptor early endogenous ligand, apelin-13, apelin-17, apelin-36, aspirin-triggered lipoxin A4, aspirin-triggered resolvin D1, ATP, beta-defensin 4A, big dynorphin, bovine adrenal medulla peptide 8-22, bradykinin, C3a, C5a, Ca2+, calcitonin gene related peptide, calcitonin, cathepsin G, CCK-33, CCK-4, CCK-8, CCL1, CCL11, CCL13, CCL14, CCL15, CCL16, CCL17, CCL19, CCL2, CCL20, CCL21, CCL22, CCL23, CCL24, CCL25, CCL26, CCL27, CCL28, CCL3, CCL4, CCL5, CCL7, CCL8, chemerin, chenodeoxycholic acid, cholic acid, corticotrophin-releasing hormone, CST-17, CX3CL1, CXCL1, CXCL10, CXCL11, CXCL12α, CXCL12β, CXCL13, CXCL16, CXCL2, CXCL3, CXCL4, CXCL6, CXCL7, CXCL8, CXCL9, cysteinyl-leukotrienes (CysLTs), uracil nucleotides, deoxycholic acid, dihydrosphingosine-1-phosphate, dioleoylphosphatidic acid, dopamine, dynorphin A, dynorphin A-(1-13), dynorphin A-(1-8), dynorphin B, endomorphin-1, endothelin-1, endothelin-2, endothelin-3, F2L, free fatty acids, FSH, GABA, galanin, galanin-like peptide, gastric inhibitory polypeptide, gastrin-17, gastrin-releasing peptide, ghrelin, GHRH, glucagon, glucagon-like peptide 1-(7-36) amide, glucagon-like peptide 1-(7-37), glucagon-like peptide 2, glucagon-like peptide 2-(3-33), GnRH I, GnRH II, GRP-(18-27), hCG, histamine, humanin, INSL3, INSL5, kallidin, kisspeptin-10, kisspeptin-13, kisspeptin-14, kisspeptin-54, kynurenic acid, large neuromedin N, large neurotensin, L-glutamic acid, LH, lithocholic acid, L-lactic acid, long chain carboxylic acids, LPA, LTB4, LTC4, LTD4, LTE4, LXA4, Lys-[Hyp3]-bradykinin, lysophosphatidylinositol, lysophosphatidylserine, medium-chain-length fatty acids, melanin-concentrating hormone, melatonin, methylcarbamyl PAF, Mg2+, motilin, N-arachidonoylglycine, neurokinin A, neurokinin B, neuromedin B, neuromedin N, neuromedin S-33, neuromedin U-25, neuronostatin, neuropeptide AF, neuropeptide B-23, neuropeptide B-29, neuropeptide FF, neuropeptide S, neuropeptide SF, neuropeptide W-23, neuropeptide W-30, neuropeptide Y, neuropeptide Y-(3-36), neurotensin, nociceptin/orphanin FQ, N-oleoylethanolamide, obestatin, octopamine, orexin-A, orexin-B, oxysterols, oxytocin, PACAP-27, PACAP-38, PAF, pancreatic polypeptide, peptide YY, PGD2, PGE2, PGF2α, PGI2, PGJ2, PHM, phosphatidylserine, PHV, prokineticin-1, prokineticin-2, prokineticin-2β, prosaposin, PrRP-20, PrRP-31, PTH, PTHrP, PTHrP-(1-36), QRFP43, relaxin, relaxin-1, relaxin-3, resolvin D1, resolvin E1, RFRP-1, RFRP-3, R-spondins, secretin, serine proteases, sphingosine 1-phosphate, sphingosylphosphorylcholine, SRIF-14, SRIF-28, substance P, succinic acid, thrombin, thromboxane A2, TIP39, T-kinin, TRH, TSH, tyramine, UDP-glucose, uridine diphosphate, urocortin 1, urocortin 2, urocortin 3, urotensin II-related peptide, urotensin-II, vasopressin, VIP, Wnt, Wnt-1, Wnt-10a, Wnt-10b, Wnt-11, Wnt-16, Wnt-2, Wnt-2b, Wnt-3, Wnt-3a, Wnt-4, Wnt-5a, Wnt-5b, Wnt-6, Wnt-7a, Wnt-7b, Wnt-8a, Wnt-8b, Wnt-9a, Wnt-9b, XCL1, XCL2, Zn2+, α-CGRP, α-ketoglutaric acid, α-MSH, α-neoendorphin, β-alanine, β-CGRP, β-D-hydroxybutyric acid, β-endorphin, β-MSH, β-neoendorphin, β-phenylethylamine, and γ-MSH.


Sequences of GPCR binding domains based on surface interactions between a GPCR ligand and the GPCR are analyzed using various methods. For example, multispecies computational analysis is performed. In some instances, a structure analysis is performed. In some instances, a sequence analysis is performed. Sequence analysis can be performed using a database known in the art. Non-limiting examples of databases include NCBI BLAST (blast.ncbi.nlm.nih.gov/Blast.cgi), UCSC Genome Browser (genome.ucsc.edu/), UniProt (www.uniprot.org/), and IUPHAR/BPS Guide to PHARMACOLOGY (guidetopharmacology.org/).


Described herein are GPCR binding domains designed based on sequence analysis among various organisms. For example, sequence analysis is performed to identify homologous sequences in different organisms. Exemplary organisms include, but are not limited to, mouse, rat, equine, sheep, cow, primate (e.g., chimpanzee, baboon, gorilla, orangutan, monkey), dog, cat, pig, donkey, rabbit, fish, fly, and human.
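By way of illustration only, the cross-species comparison described above can be sketched as a percent-identity calculation over sequences that have already been aligned. The function names `percent_identity` and `rank_homologs`, the gap character, and the toy sequences in the example are assumptions introduced here; they are not sequences or tools from the present disclosure, and in practice an alignment tool such as those listed above would supply the aligned input.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def rank_homologs(reference, candidates):
    """Return (name, identity) pairs sorted from most to least similar."""
    scored = [(name, percent_identity(reference, seq))
              for name, seq in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up aligned fragments; placeholders only.
    reference = "ACDEFGHIK"
    candidates = {
        "toy_species_1": "ACDEFGHIK",
        "toy_species_2": "ACDQFG-IK",
    }
    for name, identity in rank_homologs(reference, candidates):
        print(f"{name}: {identity:.1f}% identity")
```

Gap columns are excluded from the denominator in this sketch; other conventions (for example, counting gaps as mismatches) are equally possible and would give different values.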


Following identification of GPCR binding domains, libraries comprising nucleic acids encoding for the GPCR binding domains may be generated. In some instances, libraries of GPCR binding domains comprise sequences of GPCR binding domains designed based on conformational ligand interactions, peptide ligand interactions, small molecule ligand interactions, extracellular domains of GPCRs, or antibodies that target GPCRs. Libraries of GPCR binding domains may be translated to generate protein libraries. In some instances, libraries of GPCR binding domains are translated to generate peptide libraries, immunoglobulin libraries, derivatives thereof, or combinations thereof. In some instances, libraries of GPCR binding domains are translated to generate protein libraries that are further modified to generate peptidomimetic libraries. In some instances, libraries of GPCR binding domains are translated to generate protein libraries that are used to generate small molecules.


Methods described herein provide for synthesis of libraries of GPCR binding domains comprising nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence. In some cases, the predetermined reference sequence is a nucleic acid sequence encoding for a protein, and the variant library comprises sequences encoding for variation of at least a single codon such that a plurality of different variants of a single residue in the subsequent protein encoded by the synthesized nucleic acid are generated by standard translation processes. In some instances, the libraries of GPCR binding domains comprise varied nucleic acids collectively encoding variations at multiple positions. In some instances, the variant library comprises sequences encoding for variation of at least a single codon in a GPCR binding domain. In some instances, the variant library comprises sequences encoding for variation of multiple codons in a GPCR binding domain. Exemplary numbers of codons for variation include, but are not limited to, at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons.
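The single-codon variation described above can be illustrated with a minimal Python sketch. The reference sequence, the function names, and the choice to exclude stop codons by default are assumptions made for illustration only; they are not taken from the methods described herein.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def split_codons(seq):
    """Split an in-frame nucleotide sequence into a list of codons."""
    if len(seq) % 3 != 0:
        raise ValueError("reference length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def single_codon_variants(reference, position, allow_stop=False):
    """Return every sequence that differs from the reference at one codon.

    `position` is a zero-based codon index. All other codons are kept
    exactly as in the reference (hypothetical helper, for illustration).
    """
    codons = split_codons(reference.upper())
    original = codons[position]
    variants = []
    for triplet in ("".join(p) for p in product(BASES, repeat=3)):
        if triplet == original:
            continue  # skip the reference codon itself
        if not allow_stop and triplet in STOP_CODONS:
            continue  # assumption: premature stops are not wanted
        variants.append(
            "".join(codons[:position] + [triplet] + codons[position + 1:])
        )
    return variants


# Toy three-codon reference (Met-Glu-Gly), not a sequence from Table 2.
toy_reference = "ATGGAAGGT"
toy_variants = single_codon_variants(toy_reference, position=1)
print(len(toy_variants))
```

Each returned sequence is a predetermined variant in the sense that it is enumerated up front rather than produced by random mutagenesis.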


Methods described herein provide for synthesis of libraries comprising nucleic acids encoding for the GPCR binding domains, wherein the libraries comprise sequences encoding for variation of length of the GPCR binding domains. In some instances, the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 fewer codons as compared to a predetermined reference sequence. In some instances, the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, or more than 300 additional codons as compared to a predetermined reference sequence.
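As a rough illustration of length variation relative to a reference, the following Python sketch produces shorter and longer variants. Where codons are removed or added is not specified above, so this sketch assumes removal from the 3' end and extension with a hypothetical filler codon; both choices, and the function name, are illustrative assumptions.

```python
def length_variants(reference, deltas, filler="GGT"):
    """Map each codon-count change to a variant of the reference.

    A negative delta removes that many codons from the 3' end; a positive
    delta appends that many copies of `filler` (an assumed placeholder
    codon). A delta of 0 returns the reference unchanged.
    """
    if len(reference) % 3 != 0:
        raise ValueError("reference length must be a multiple of 3")
    codons = [reference[i:i + 3] for i in range(0, len(reference), 3)]
    variants = {}
    for delta in deltas:
        if delta < 0:
            if -delta >= len(codons):
                raise ValueError("cannot remove every codon of the reference")
            variants[delta] = "".join(codons[:delta])
        else:
            variants[delta] = "".join(codons + [filler] * delta)
    return variants


# Toy four-codon reference, not a sequence from Table 2.
toy = length_variants("ATGGAAGGTCTG", deltas=[-1, 0, 2])
for delta, seq in sorted(toy.items()):
    print(delta, len(seq) // 3, seq)
```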


Following identification of GPCR binding domains, the GPCR binding domains may be placed in scaffolds as described herein. In some instances, the scaffolds are immunoglobulins. In some instances, the GPCR binding domains are placed in the CDR-H3 region. GPCR binding domains that may be placed in scaffolds can also be referred to as motifs. Scaffolds comprising GPCR binding domains may be designed based on binding, specificity, stability, expression, folding, or downstream activity. In some instances, the scaffolds comprising GPCR binding domains enable contact with the GPCRs. In some instances, the scaffolds comprising GPCR binding domains enable high affinity binding with the GPCRs. Exemplary amino acid sequences of GPCR binding domains are described in Table 2.









TABLE 2







GPCR amino acid sequences









SEQ ID NO
GPCR
Amino Acid Sequence












1
CXCR4
MEGISIYTSDNYTEEMGSGDYDSMKEPCFREENAN




FNKIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLR




SMTDKYRLHLSVADLLFVITLPFWAVDAVANWYFG




NFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVH




ATNSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFA




NVSEADDRYICDRFYPNDLWWVFQFQHIMVGLILP




GIVILSCYCIIISKLSHSKGHQKRKALKTTVILIL




AFFACWLPYYIGISIDSFILLEIIKQGCEFENTVH




KWISITEALAFFHCCLNPILYAFLGAKFKTSAQHA




LTSVSRGSSLKILSKGKRGGHSSVSTESESSSFHS




S





2
CCR4
MNPTDIADTTLDESIYSNYYLYESIPKPCTKEGIK




AFGELFLPPLYSLVFVFGLLGNSVVVLVLFKYKRL




RSMTDVYLLNLAISDLLFVFSLPFWGYYAADQWVF




GLGLCKMISWMYLVGFYSGIFFVMLMSIDRYLAIV




HAVFSLRARTLTYGVITSLATWSVAVFASLPGFLF




STCYTERNHTYCRTKYSLNSTTWKVLSSLEINILG




LVIPLGIMLFCYSMIIRTLQHCKNEKKNKAVKMIF




AVVVLFLGFWTPYNIVLFLETLVELEVLQDCTFER




YLDYAIQATETLAFVHCCLNPIIYFFLGEKFRKYI




LQLFKTCRGLFVLCQYCGLLQIYSADTPSSSYTQS




TMDHDLHDAL





3
GCGR
MPPCQPQRPLLLLLLLLACQPQVPSAQVMDFLFEK




WKLYGDQCHHNLSLLPPPTELVCNRTFDKYSCWPD




TPANTTANISCPWYLPWHHKVQHRFVFKRCGPDGQ




WVRGPRGQPWRDASQCQMDGEEIEVQKEVAKMYSS




FQVMYTVGYSLSLGALLLALAILGGLSKLHCTRNA




IHANLFASFVLKASSVLVIDGLLRTRYSQKIGDDL




SVSTWLSDGAVAGCRVAAVFMQYGIVANYCWLLVE




GLYLHNLLGLATLPERSFFSLYLGIGWGAPMLFVV




PWAVVKCLFENVQCWTSNDNMGFWWILRFPVFLAI




LINFFIFVRIVQLLVAKLRARQMHHTDYKFRLAKS




TLTLIPLLGVHEVVFAFVTDEHAQGTLRSAKLFFD




LFLSSFQGLLVAVLYCFLNKEVQSELRRRWHRWRL




GKVLWEERNTSNHRASSSPGHGPPSKELQFGRGGG




SQDSSAETPLAGGLPRLAESPF





4
mGluR5
MVLLLILSVLLLKEDVRGSAQSSERRVVAHMPGDI




IIGALFSVHHQPTVDKVHERKCGAVREQYGIQRVE




AMLHTLERINSDPTLLPNITLGCEIRDSCWHSAVA




LEQSIEFIRDSLISSEEEEGLVRCVDGSSSSFRSK




KPIVGVIGPGSSSVAIQVQNLLQLFNIPQIAYSAT




SMDLSDKTLFKYFMRVVPSDAQQARAMVDIVKRYN




WTYVSAVHTEGNYGESGMEAFKDMSAKEGI




CIAHSYKIYSNAGEQSFDKLLKKLTSHLPKARVVA




CFCEGMTVRGLLMAMRRLGLAGEFLLLGSDGWADR




YDVTDGYQREAVGGITIKLQSPDVKWFDDYYLKLR




PETNHRNPWFQEFWQHRFQCRLEGFPQENSKYNKT




CNSSLTLKTHHVQDSKMGFVINAIYSMAYGLHNMQ




MSLCPGYAGLCDAMKPIDGRKLLESLMKTNFTGVS




GDTILFDENGDSPGRYEIMNFKEMGKDYFDYINVG




SWDNGELKMDDDEVWSKKSNIIRSVCSEPCEKGQI




KVIRKGEVSCCWTCTPCKENEYVFDEYTCKACQLG




SWPTDDLTGCDLIPVQYLRWGDPEPIAAVVFACLG




LLATLFVTVVFIIYRDTPVVKSSSRELCYIILAGI




CLGYLCTFCLIAKPKQIYCYLQRIGIGLSPAMSYS




ALVTKTNRIARILAGSKKKICTKKPRFMSACAQLV




IAFILICIQLGIIVALFIMEPPDIMHDYPSIREVY




LICNTTNLGVVTPLGYNGLLILSCTFYAFKTRNVP




ANFNEAKYIAFTMYTTCIIWLAFVPIYFGSNYKII




TMCFSVSLSATVALGCMFVPKVYIILAKPERNVRS




AFTTSTVVRMHVGDGKSSSAASRSSSLVNLWKRRG




SSGETLRYKDRRLAQHKSEIECFTPKGSMGNGGRA




TMSSSNGKSVTWAQNEKSSRGQHLWQRLSIHINKK




ENPNQTAVIKPFPKSTESRGLGAGAGAGGSAGGVG




ATGGAGCAGAGPGGPESPDAGPKALYDVAEAEEHF




PAPARPRSPSPISTLSHRAGSASRTDDDVPSLHSE




PVARSSSSQGSLMEQISSVVTRFTANISELNSMML




STAAPSPGVGAPLCSSYLIPKEIQLPTTMTTFAEI




QPLPAIEVTGGAQPAAGAQAAGDAARESPAAGPEA




AAAKPDLEELVALTPPSPFRDSVDSGSTTPNSPVS




ESALCIPSSPKYDTLIIRDYTQSSSSL





5
GLP-1R
RPQGATVSLWETVQKWREYRRQCQRSLTEDPPPAT




DLFCNRTFDEYACWPDGEPGSFVNVSCPWYLPWAS




SVPQGHVYRFCTAEGLWLQKDNSSLPWRDLSECEE




SKRGERSSPEEQLLFLYIIYTVGYALSFSALVIAS




AILLGFRHLHCTRNYIHLNLFASFILRALSVFIKD




AALKWMYSTAAQQHQWDGLLSYQDSLSCRLVFLLM




QYCVAANYYWLLVEGVYLYTLLAFSVLSEQWIFRL




YVSIGWGVPLLFVVPWGIVKYLYEDEGCWTRNSNM




NYWLIIRLPILFAIGVNFLIFVRVICIVVSKLKAN




LMCKTDIKCRLAKSTLTLIPLLGTHEVIFAFVMDE




HARGTLRFIKLFTELSFTSFQGLMVAILYCFVNNE




VQLEFRKSWERWRLEHLHIQRDSSMKPLKCPTSSL




SSGATAGSSMYTATCQASCS





6
GABAB
MLLLLLLAPLFLRPPGAGGAQTPNATSEGCQIIHP




PWEGGIRYRGLTRDQVKAINFLPVDYEIEYVCRGE




REVVGPKVRKCLANGSWTDMDTPSRCVRICSKSYL




TLENGKVFLTGGDLPALDGARVDFRCDPDFHLVGS




SRSICSQGQWSTPKPHCQVNRTPHSERRAVYIGAL




FPMSGGWPGGQACQPAVEMALEDVNSRRDILPDYE




LKLIHHDSKCDPGQATKYLYELLYNDPIKIILMPG




CSSVSTLVAEAARMWNLIVLSYGSSSPALSNRQRF




PTFFRTHPSATLHNPTRVKLFEKWGWKKIATIQQT




TEVFTSTLDDLEERVKEAGIEITFRQSFFSDPAVP




VKNLKRQDARIIVGLFYETEARKVFCEVYKERLFG




KKYVWFLIGWYADNWFKIYDPSINCTVDEMTEAVE




GHITTEIVMLNPANTRSISNMTSQEFVEKLTKRLK




RHPEETGGFQEAPLAYDAIWALALALNKTSGGGGR




SGVRLEDFNYNNQTITDQIYRAMNSSSFEGVSGHV




VFDASGSRMAWTLIEQLQGGSYKKIGYYDSTKDDL




SWSKTDKWIGGSPPADQTLVIKTFRFLSQKLFISV




SVLSSLGIVLAVVCLSFNIYNSHVRYIQNSQPNLN




NLTAVGCSLALAAVFPLGLDGYHIGRNQFPFVCQA




RLWLLGLGFSLGYGSMFTKIWWVHTVFTKKEEKKE




WRKTLEPWKLYATVGLLVGMDVLTLAIWQIVDPLH




RTIETFAKEEPKEDIDVSILPQLEHCSSRKMNTWL




GIFYGYKGLLLLLGIFLAYETKSVSTEKINDHRAV




GMAIYNVAVLCLITAPVTMILSSQQDAAFAFASLA




IVFSSYITLVVLFVPKMRRLITRGEWQSEAQDTMK




TGSSTNNNEEEKSRLLEKENRELEKIIAEKEERVS




ELRHQLQSRQQLRSRRHPPTPPEPSGGLPRGPPEP




PDRLSCDGSRVHLLYK





7
OPRM1
MDSSAAPTNASNCTDALAYSSCSPAPSPGSWVNLS




HLDGNLSDPCGPNRTDLGGRDSLCPPTGSPSMITA




ITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTAT




NIYIFNLALADALATSTLPFQSVNYLMGTWPFGTI




LCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPV




KALDFRTPRNAKIINVCNWILSSAIGLPVMFMATT




KYRQGSIDCTLTFSHPTWYWENLLKICVFIFAFIM




PVLIITVCYGLMILRLKSVRMLSGSKEKDRNLRRI




TRMVLVVVAVFIVCWTPIHIYVIIKALVTIPETTF




QTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFR




EFCIPTSSNIEQQNSTRIRQNTRDHPSTANTVDRT




NHQLENLEAETAPLP





8
OPRK1
MDSPIQIFRGEPGPTCAPSACLPPNSSAWFPGWAE




PDSNGSAGSEDAQLEPAHISPAIPVIITAVYSVVF




VVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALAD




ALVTTTMPFQSTVYLMNSWPFGDVLCKIVISIDYY




NMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPLKA




KIINICIWLLSSSVGISAIVLGGTKVREDVDVIEC




SLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVC




YTLMILRLKSVRLLSGSREKDRNLRRITRLVLVVV




AVFVVCWTPIHIFILVEALGSTSHSTAALSSYYFC




IALGYTNSSLNPILYAFLDENFKRCFRDFCFPLKM




RMERQSTSRVRNTVQDPAYLRDIDGMNKPV





9
C5aR
MDSFNYTTPDYGHYDDKDTLDLNTPVDKTSNTLRV




PDILALVIFAVVFLVGVLGNALVVWVTAFEAKRTI




NAIWFLNLAVADFLSCLALPILFTSIVQHHHWPFG




GAACSILPSLILLNMYASILLLATISADRFLLVFK




PIWCQNFRGAGLAWIACAVAWGLALLLTIPSFLYR




VVREEYFPPKVLCGVDYSHDKRRERAVAIVRLVLG




FLWPLLTLTICYTFILLRTWSRRATRSTKTLKVVV




AVVASFFIFWLPYQVTGIMMSFLEPSSPTFLLLKK




LDSLCVSFAYINCCINPIIYVVAGQGFQGRLRKSL




PSLLRNVLTEESVVRESKSFTRSTVDTMAQKTQAV


10
CGRP
ELEESPEDSIQLGVTRNKIMTAQYECYQKIMQDPI




QQAEGVYCNRTWDGWLCWNDVAAGTESMQLCPDYF




QDFDPSEKVTKICDQDGNWFRHPASNRTWTNYTQC




NVNTHEKVKTALNLFYLTIIGHGLSIASLLISLGI




FFYFKSLSCQRITLHKNLFFSFVCNSVVTIIHLTA




VANNQALVATNPVSCKVSQFIHLYLMGCNYFWMLC




EGIYLHTLIVVAVFAEKQHLMWYYFLGWGFPLIPA




CIHAIARSLYYNDNCWISSDTHLLYIIHGPICAAL




LVNLFFLLNIVRVLITKLKVTHQAESNLYMKAVRA




TLILVPLLGIEFVLIPWRPEGKIAEEVYDYIMHIL




MHFQGLLVSTIFCFFNGEVQAILRRNWNQYKIQFG




NSFSNSEALRSASYTVSTISDGPGYSHDCPSEHLN




GKSIHDIENVLLKPENLYN





11
M1 muscarinic
MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLL




SLATVTGNLLVLISFKVNTELKTVNNYFLLSLACA




DLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALD




YVASNASVMNLLLISFDRYFSVTRPLSYRAKRTPR




RAALMIGLAWLVSFVLWAPAILFWQYLVGERTVLA




GQCYIQFLSQPIITFGTAMAAFYLPVTVMCTLYWR




IYRETENRARELAALQGSETPGKGGGSSSSSERSQ




PGAEGSPETPPGRCCRCCRAPRLLQAYSWKEEEEE




DEGSMESLTSSEGEEPGSEVVIKMPMVDPEAQAPT




KQPPRSSPNTVKRPTKKGRDRAGKGQKPRGKEQLA




KRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMV




LVSTFCKDCVPETLWELGYWLCYVNSTINPMCYAL




CNKAFRDTFRLLLLCRWDKRRWRKIPKRPGSVHRT




PSRQC





12
M4 muscarinic
MANFTPVNGSSGNQSVRLVTSSSHNRYETVEMVFI




ATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYF




LFSLACADLIIGAFSMNLYTVYIIKGYWPLGAVVC




DLWLALDYVVSNASVMNLLIISFDRYFCVTKPLTY




PARRTTKMAGLMIAAAWVLSFVLWAPAILFWQFVV




GKRTVPDNQCFIQFLSNPAVTFGTAIAAFYLPVVI




MTVLYIHISLASRSRVHKHRPEGPKEKKAKTLAFL




KSPLMKQSVKKPPPGEAAREELRNGKLEEAPPPAL




PPPPRPVADKDTSNESSSGSATQNTKERPATELST




TEATTPAMPAPPLQPRALNPASRWSKIQIVTKQTG




NECVTAIEIVPATPAGMRPAANVARKFASIARNQV




RKKRQMAARERKVTRTIFAILLAFILTWTPYNVMV




LVNTFCQSCIPDTVWSIGYWLCYVNSTINPACYAL




CNATFKKTFRHLLLCQYRNIGTAR





13
CCR2
MLSTSRSRFIRNTNESGEEVTTFFDYDYGAPCHKF




DVKQIGAQLLPPLYSLVFIFGFVGNMLVVLILINC




KKLKCLTDIYLLNLAISDLLFLITLPLWAHSAANE




WVFGNAMCKLFTGLYHIGYFGGIFFIILLTIDRYL




AIVHAVFALKARTVTFGVVTSVITWLVAVFASVPG




IIFTKCQKEDSVYVCGPYFPRGWNNFHTIMRNILG




LVLPLLIMVICYSGILKTLLRCRNEKKRHRAVRVI




FTIMIVYFLFWTPYNIVILLNTFQEFFGLSNCEST




SQLDQATQVTETLGMTHCCINPIIYAFVGEKFRSL




FHIALGCRIAPLQKPVCGGPGVRPGKNVKVTTQGL




LDGRGKGKSIGRAPEASLQDKEGA





14
CCR9
MTPTDFTSPIPNMADDYGSESTSSMEDYVNFNFTD




FYCEKNNVRQFASHFLPPLYWLVFIVGALGNSLVI




LVYWYCTRVKTMTDMFLLNLAIADLLFLVTLPFWA




IAAADQWKFQTFMCKVVNSMYKMNFYSCVLLIMCI




SVDRYIAIAQAMRAHTWREKRLLYSKMVCFTIWVL




AAALCIPEILYSQIKEESGIAICTMVYPSDESTKL




KSAVLTLKVILGFFLPFVVMACCYTIIIHTLIQAK




KSSKHKALKVTITVLTVFVLSQFPYNCILLVQTID




AYAMFISNCAVSTNIDICFQVTQTIAFFHSCLNPV




LYVFVGERFRRDLVKTLKNLGCISQAQWVSFTRRE




GSLKLSSMLLETTSGALSL





15
GPR174
MPANYTCTRPDGDNTDFRYFIYAVTYTVILVPGLI




GNILALWVFYGYMKETKRAVIFMINLAIADLLQVL




SLPLRIFYYLNHDWPFGPGLCMFCFYLKYVNMYAS




IYFLVCISVRRFWFLMYPFRFHDCKQKYDLYISIA




GWLIICLACVLFPLLRTSDDTSGNRTKCFVDLPTR




NVNLAQSVVMMTIGELIGFVTPLLIVLYCTWKTVL




SLQDKYPMAQDLGEKQKALKMILTCAGVFLICFAP




YHFSFPLDFLVKSNEIKSCLARRVILIFHSVALCL




ASLNSCLDPVIYYFSTNEFRRRLSRQDLHDSIQLH




AKSFVSNHTASTMTPELC





16
MASP-2
TPLGPKWPEPVFGRLASPGFPGEYANDQERRWTLT




APPGYRLRLYFTHFDLELSHLCEYDFVKLSSGAKV




LATLCGQESTDTERAPGKDTFYSLGSSLDITFRSD




YSNEKPFTGFEAFYAAEDIDECQVAPGEAPTCDHH




CHNHLGGFYCSCRAGYVLHRNKRTCSALCSGQVFT




QRSGELSSPEYPRPYPKLSSCTYSISLEEGFSVIL




DFVESFDVETHPETLCPYDFLKIQTDREEHGPFCG




KTLPHRIETKSNTVTITFVTDESGDHTGWKIHYTS




TAQPCPYPMAPPNGHVSPVQAKYILKDSFSIFCET




GYELLQGHLPLKSFTAVCQKDGSWDRPMPACSIVD




CGPPDDLPSGRVEYITGPGVTTYKAVIQYSCEETF




YTMKVNDGKYVCEADGFWTSSKGEKSLPVCEPVCG




LSARTTGGRIYGGQKAKPGDFPWQVLILGGTTAAG




ALLYDNWVLTAAHAVYEQKHDASALDIRMGTLKRL




SPHYTQAWSEAVFIHEGYTHDAGFDNDIALIKLNN




KVVINSNITPICLPRKEAESFMRTDDIGTASGWGL




TQRGFLARNLMYVDIPIVDHQKCTAAYEKPPYPRG




SVTANMLCAGLESGGKDSCRGDSGGALVFLDSETE




RWFVGGIVSWGSMNCGEAGQYGVYTKVINYIPWIE




NIISDF





17
CCR5
MDYQVSSPIYDINYYTSEPCQKINVKQIAARLLPP




LYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLL




NLAISDLFFLLTVPFWAHYAAAQWDFGNTMCQLLT




GLYFIGFFSGIFFIILLTIDRYLAVVHAVFALKAR




TVTFGVVTSVITWVVAVFASLPGIIFTRSQKEGLH




YTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVM




VICYSGILKTLLRCRNEKKRHRAVRLIFTIMIVYF




LFWAPYNIVLLLNTFQEFFGLNNCSSSNRLDQAMQ




VTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKH




IAKRFCKCCSIFQQEAPERASSVYTRSTGEQEISV




GL





18
FSHR
CHHRICHCSNRVFLCQESKVTEIPSDLPRNAIELR




FVLTKLRVIQKGAFSGFGDLEKIEISQNDVLEVIE




ADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPNL




QYLLISNTGIKHLPDVHKIHSLQKVLLDIQDNINI




HTIERNSFVGLSFESVILWLNKNGIQEIHNCAFNG




TQLDELNLSDNNNLEELPNDVFHGASGPVILDISR




TRIHSLPSYGLENLKKLRARSTYNLKKLPTLEKLV




ALMEASLTYPSHCCAFANWRRQISELHPICNKSIL




RQEVDYMTQARGQRSSLAEDNESSYSRGFDMTYTE




FDYDLCNEVVDVTCSPKPDAFNPCEDIMGYNILRV




LIWFISILAITGNIIVLVILTTSQYKLTVPRFLMC




NLAFADLCIGIYLLLIASVDIHTKSQYHNYAIDWQ




TGAGCDAAGFFTVFASELSVYTLTAITLERWHTIT




HAMQLDCKVQLRHAASVMVMGWIFAFAAALFPIFG




ISSYMKVSICLPMDIDSPLSQLYVMSLLVLNVLAF




VVICGCYIHIYLTVRNPNIVSSSSDTRIAKRMAML




IFTDFLCMAPISFFAISASLKVPLITVSKAKILLV




LFHPINSCANPFLYAIFTKNFRRDFFILLSKCGCY




EMQAQIYRTETSSTVHNTHPRNGHCSSAPRVTNGS




TYILVPLSHLAQN





19
mGluR2 PAM
EGPAKKVLTLEGDLVLGGLFPVHQKGGPAEDCGPV




NEHRGIQRLEAMLFALDRINRDPHLLPGVRLGAHI




LDSCSKDTHALEQALDFVRASLSRGADGSRHICPD




GSYATHGDAPTAITGVIGGSYSDVSIQVANLLRLF




QIPQISYASTSAKLSDKSRYDYFARTVPPDFFQAK




AMAEILRFFNWTYVSTVASEGDYGETGIEAFELEA




RARNICVATSEKVGRAMSRAAFEGVVRALLQKPSA




RVAVLFTRSEDARELLAASQRLNASFTWVASDGWG




ALESVVAGSEGAAEGAITIELASYPISDFASYFQS




LDPWNNSRNPWFREFWEQRFRCSFRQRDCAAHSLR




AVPFEQESKIMFVVNAVYAMAHALHNMHRALCPNT




TRLCDAMRPVNGRRLYKDFVLNVKFDAPFRPADTH




NEVRFDRFGDGIGRYNIFTYLRAGSGRYRYQKVGY




WAEGLTLDTSLIPWASPSAGPLPASRCSEPCLQNE




VKSVQPGEVCCWLCIPCQPYEYRLDEFTCADCGLG




YWPNASLTGCFELPQEYIRWGDAWAVGPVTIACLG




ALATLFVLGVFVRHNATPVVKASGRELCYILLGGV




FLCYCMTFIFIAKPSTAVCTLRRLGLGTAFSVCYS




ALLTKTNRIARIFGGAREGAQRPRFISPASQVAIC




LALISGQLLIVVAWLVVEAPGTGKETAPERREVVT




LRCNHRDASMLGSLAYNVLLIALCTLYAFKTRKCP




ENFNEAKFIGFTMYTTCIIWLAFLPIFYVTSSDYR




VQTTTMCVSVSLSGSVVLGCLFAPKLHIILFQPQK




NVVSHRAPTSRFGSAAARASSSLGQGSGSQFVPTV




CNGREVVDSTTSSL





20
mGluR3
LGDHNFLRREIKIEGDLVLGGLFPINEKGTGTEEC




GRINEDRGIQRLEAMLFAIDEINKDDYLLPGVKLG




VHILDTCSRDTYALEQSLEFVRASLTKVDEAEYMC




PDGSYAIQENIPLLIAGVIGGSYSSVSIQVANLLR




LFQIPQISYASTSAKLSDKSRYDYFARTVPPDFYQ




AKAMAEILRFFNWTYVSTVASEGDYGETGIEAFEQ




EARLRNICIATAEKVGRSNIRKSYDSVIRELLQKP




NARVVVLFMRSDDSRELIAAASRANASFTWVASDG




WGAQESIIKGSEHVAYGAITLELASQPVRQFDRYF




QSLNPYNNHRNPWFRDFWEQKFQCSLQNKRNHRRV




CDKHLAIDSSNYEQESKIMFVVNAVYAMAHALHKM




QRTLCPNTTKLCDAMKILDGKKLYKDYLLKINFTA




PFNPNKDADSIVKFDTFGDGMGRYNVFNFQNVGGK




YSYLKVGHWAETLSLDVNSIHWSRNSVPTSQCSDP




CAPNEMKNMQPGDVCCWICIPCEPYEYLADEFTCM




DCGSGQWPTADLTGCYDLPEDYIRWEDAWAIGPVT




IACLGFMCTCMVVTVFIKHNNTPLVKASGRELCYI




LLFGVGLSYCMTFFFIAKPSPVICALRRLGLGSSF




AICYSALLTKTNCIARIFDGVKNGAQRPKFISPSS




QVFICLGLILVQIVMVSVWLILEAPGTRRYTLAEK




RETVILKCNVKDSSMLISLTYDVILVILCTVYAFK




TRKCPENFNEAKFIGFTMYTTCIIWLAFLPIFYVT




SSDYRVQTTTMCISVSLSGFVVLGCLFAPKVHIIL




FQPQKNVVTHRLHLNRFSVSGTGTTYSQSSASTYV




PTVCNGREVLDSTTSSL





21
mGluR4
KPKGHPHMNSIRIDGDITLGGLFPVHGRGSEGKPC




GELKKEKGIHRLEAMLFALDRINNDPDLLPNITLG




ARILDTCSRDTHALEQSLTFVQALIEKDGTEVRCG




SGGPPIITKPERVVGVIGASGSSVSIMVANILRLF




KIPQISYASTAPDLSDNSRYDFFSRVVPSDTYQAQ




AMVDIVRALKWNYVSTVASEGSYGESGVEAFIQKS




REDGGVCIAQSVKIPREPKAGEFDKIIRRLLETSN




ARAVIIFANEDDIRRVLEAARRANQTGHFFWMGSD




SWGSKIAPVLHLEEVAEGAVTILPKRMSVRGFDRY




FSSRTLDNNRRNIWFAEFWEDNFHCKLSRHALKKG




SHVKKCTNRERIGQDSAYEQEGKVQFVIDAVYAMG




HALHAMHRDLCPGRVGLCPRMDPVDGTQLLKYIRN




VNFSGIAGNPVTFNENGDAPGRYDIYQYQLRNDSA




EYKVIGSWTDHLHLRIERMHWPGSGQQLPRSICSL




PCQPGERKKTVKGMPCCWHCEPCTGYQYQVDRYTC




KTCPYDMRPTENRTGCRPIPIIKLEWGSPWAVLPL




FLAVVGIAATLFVVITFVRYNDTPIVKASGRELSY




VLLAGIFLCYATTFLMIAEPDLGTCSLRRIFLGLG




MSISYAALLTKTNRIYRIFEQGKRSVSAPRFISPA




SQLAITFSLISLQLLGICVWFVVDPSHSVVDFQDQ




RTLDPRFARGVLKCDISDLSLICLLGYSMLLMVTC




TVYAIKTRGVPETFNEAKPIGFTMYTTCIVWLAFI




PIFFGTSQSADKLYIQTTTLTVSVSLSASVSLGML




YMPKVYIILFHPEQNVPKRKRSLKAVVTAATMSNK




FTQKGNFRPNGEAKSELCENLEAPALATKQTYVTY




TNHAI





22
mGluR7
QEMYAPHSIRIEGDVTLGGLFPVHAKGPSGVPCGD




IKRENGIHRLEAMLYALDQINSDPNLLPNVTLGAR




ILDTCSRDTYALEQSLTFVQALIQKDTSDVRCTNG




EPPVFVKPEKVVGVIGASGSSVSIMVANILRLFQI




PQISYASTAPELSDDRRYDFFSRVVPPDSFQAQAM




VDIVKALGWNYVSTLASEGSYGEKGVESFTQISKE




AGGLCIAQSVRIPQERKDRTIDFDRIIKQLLDTPN




SRAVVIFANDEDIKQILAAAKRADQVGHFLWVGSD




SWGSKINPLHQHEDIAEGAITIQPKRATVEGFDAY




FTSRTLENNRRNVWFAEYWEENFNCKLTISGSKKE




DTDRKCTGQERIGKDSNYEQEGKVQFVIDAVYAMA




HALHHMNKDLCADYRGVCPEMEQAGGKKLLKYIRN




VNFNGSAGTPVMFNKNGDAPGRYDIFQYQTTNTSN




PGYRLIGQWTDELQLNIEDMQWGKGVREIPASVCT




LPCKPGQRKKTQKGTPCCWTCEPCDGYQYQFDEMT




CQHCPYDQRPNENRTGCQDIPIIKLEWHSPWAVIP




VFLAMLGIIATIFVMATFIRYNDTPIVRASGRELS




YVLLTGIFLCYIITFLMIAKPDVAVCSFRRVFLGL




GMCISYAALLTKTNRIYRIFEQGKKSVTAPRLISP




TSQLAITSSLISVQLLGVFIWFGVDPPNIIIDYDE




HKTMNPEQARGVLKCDITDLQIICSLGYSILLMVT




CTVYAIKTRGVPENFNEAKPIGFTMYTTCIVWLAF




IPIFFGTAQSAEKLYIQTTTLTISMNLSASVALGM




LYMPKVYIIIFHPELNVQKRKRSFKAVVTAATMSS




RLSHKPSDRPNGEAKTELCENVDPNSPAAKKKYVS




YNNLVI





23
CXCR3
MVLEVSDHQVLNDAEVAALLENFSSSYDYGENESD




SCCTSPPCPQDFSLNFDRAFLPALYSLLFLLGLLG




NGAVAAVLLSRRTALSSTDTFLLHLAVADTLLVLT




LPLWAVDAAVQWVFGSGLCKVAGALFNINFYAGAL




LLACISFDRYLNIVHATQLYRRGPPARVTLTCLAV




WGLCLLFALPDFIFLSAHHDERLNATHCQYNFPQV




GRTALRVLQLVAGFLLPLLVMAYCYAHILAVLLVS




RGQRRLRAMRLVVWVVAFALCWTPYHLVVLVDILM




DLGALARNCGRESRVDVAKSVTSGLGYMHCCLNPL




LYAFVGVKFRERMWMLLLRLGCPNQRGLQRQPSSS




RRDSSWSETSEASYSGL





24
CCR8
MDYTLDLSVTTVTDYYYPDIFSSPCDAELIQTNGK




LLLAVFYCLLFVFSLLGNSLVILVLVVCKKLRSIT




DVYLLNLALSDLLFVFSFPFQTYYLLDQWVFGTVM




CKVVSGFYYIGFYSSMFFITLMSVDRYLAVVHAVY




ALKVRTIRMGTTLCLAVWLTAIMATIPLLVFYQVA




SEDGVLQCYSFYNQQTLKWKIFTNFKMNILGLLIP




FTIFMFCYIKILHQLKRCQNHNKTKAIRLVLIVVI




ASLLFWVPFNVVLFLTSLHSMHILDGCSISQQLTY




ATHVTEIISFTHCCVNPVIYAFVGEKFKKHLSEIF




QKSCSQIFNYLGRQMPRESCEKSSSCQQHSSRSSS




VDYIL





25
Adenosine A2a
MPIMGSSVYITVELAIAVLAILGNVLVCWAVWLNS




NLQNVTNYFVVSLAAADIAVGVLAIPFAITISTGF




CAACHGCLFIACFVLVLTQSSIFSLLAIAIDRYIA




IRIPLRYNGLVTGTRAKGIIAICWVLSFAIGLTPM




LGWNNCGQPKEGKNHSQGCGEGQVACLFEDVVPMN




YMVYFNFFACVLVPLLLMLGVYLRIFLAARRQLKQ




MESQPLPGERARSTLQKEVHAAKSLAIIVGLFALC




WLPLHIINCFTFFCPDCSHAPLWLMYLAIVLSHTN




SVVNPFIYAYRIREFRQTFRKIIRSHVLRQQEPFK




AAGTSARVLAAHGSDGEQVSLRLNGHPPGVWANGS




APHPERRPNGYALGLVSGGSAQESQGNTGLPDVEL




LSHELKGVCPEPPGLDDPLAQDGAGVS





26
Orexin OX1
MEPSATPGAQMGVPPGSREPSPVPPDYEDEFLRYL




WRDYLYPKQYEWVLIAAYVAVFVVALVGNTLVCLA




VWRNHHMRTVTNYFIVNLSLADVLVTAICLPASLL




VDITESWLFGHALCKVIPYLQAVSVSVAVLTLSFI




ALDRWYAICHPLLFKSTARRARGSILGIWAVSLAI




MVPQAAVMECSSVLPELANRTRLFSVCDERWADDL




YPKIYHSCFFIVTYLAPLGLMAMAYFQIFRKLWGR




QIPGTTSALVRNWKRPSDQLGDLEQGLSGEPQPRA




RAFLAEVKQMRARRKTAKMLMVVLLVFALCYLPIS




VLNVLKRVFGMFRQASDREAVYACFTFSHWLVYAN




SAANPIIYNFLSGKFREQFKAAFSCCLPGLGPCGS




LKAPSPRSSASHKSLSLQSRCSISKISEHVVLTSV




TTVLP





27
Orexin OX2
MSGTKILEDSPPCRNWSSASELNETQEPFLNPTDY




DDEEFLRYLWREYLHPKEYEWVLIAGYIIVFVVAL




IGNYLVCVAVWKNHHMRTVTNYFIVNLSLADVLVT




ITCLPATLVVDITETWFFGQSLCKVIPYLQTVSVS




VSVLTLSCIALDRWYAICHPLMFKSTAKRARNSIV




IIWIVSCIIMIPQAIVMECSTVFPGLANKTTLFTV




CDERWGGEIYPKMYHICFFLVTYMAPLCLMVLAYL




QIFRKLWCRQIPGTSSVVQRKWKPLQPVSQPRGPG




QPTKSRMSAVAAEIKQIRARRKTARMLMIVLLVFA




ICYLPISILNVLKRVFGMFAHTEDRETVYAWFTFS




HWLVYANSAANPIIYNFLSGKFREEFKAAFSCCCL




GVHHRQEDRLTRGRTSTESRKSLTTQISNFDNISK




LSEQVVLTSISTLPAANGAGPLQNW





28
PAR-2
IQGTNRSSKGRSLIGKVDGTSHVTGKGVTVETVFS




VDEFSASVLTGKLTTVFLPIVYTIVFVVGLPSNGM




ALWVFLFRTKKKHPAVIYMANLALADLLSVIWFPL




KIAYHIHGNNWIYGEALCNVLIGFFYGNMYCSILF




MTCLSVQRYWVIVNPMGHSRKKANIAIGISLAIWL




LILLVTIPLYVVKQTIFIPALNITTCHDVLPEQLL




VGDMFNYFLSLAIGVFLFPAFLTASAYVLMIRMLR




SSAMDENSEKKRKRAIKLIVTVLAMYLICFTPSNL




LLVVHYFLIKSQGQSHVYALYIVALCLSTLNSCID




PFVYYFVSHDFRDHAKNALLCRSVRTVKQMQVSLT




SKKHSRKSSSYSSSSTTVKTSY





29
C3aR
MASFSAETNSTDLLSQPWNEPPVILSMVILSLTFL




LGLPGNGLVLWVAGLKMQRTVNTIWFLHLTLADLL




CCLSLPFSLAHLALQGQWPYGRFLCKLIPSIIVLN




MFASVFLLTAISLDRCLVVFKPIWCQNHRNVGMAC




SICGCIWVVAFVMCIPVFVYREIFTTDNHNRCGYK




FGLSSSLDYPDFYGDPLENRSLENIVQPPGEMNDR




LDPSSFQTNDHPWTVPTVFQPQTFQRPSADSLPRG




SARLTSQNLYSNVFKPADVVSPKIPSGFPIEDHET




SPLDNSDAFLSTHLKLFPSASSNSFYESELPQGFQ




DYYNLGQFTDDDQVPTPLVAITITRLVVGFLLPSV




IMIACYSFIVFRMQRGRFAKSQSKTFRVAWVVAVF




LVCWTPYHIFGVLSLLTDPETPLGKTLMSWDHVCI




ALASANSCFNPFLYALLGKDFRKKARQSIQGILEA




AFSEELTRSTHCPSNNVISERNSTTV





30
LGR5
GSSPRSGVLLRGCPTHCHCEPDGRMLLRVDCSDLG




LSELPSNLSVFTSYLDLSMNNISQLLPNPLPSLRF




LEELRLAGNALTYIPKGAFTGLYSLKV




LMLQNNQLRHVPTEALQNLRSLQSLRLDANHISYV




PPSCFSGLHSLRHLWLDDNALTEIPVQAFRSLSAL




QAMTLALNKIHHIPDYAFGNLSSLVVLHLHNNRIH




SLGKKCFDGLHSLETLDLNYNNLDEFPTAIRTLSN




LKELGFHSNNIRSIPEKAFVGNPSLITIHFYDNPI




QFVGRSAFQHLPELRTLTLNGASQITEFPDLTGTA




NLESLTLTGAQISSLPQTVCNQLPNLQVLDLSYNL




LEDLPSFSVCQKLQKIDLRHNEIYEIKVDTFQQLL




SLRSLNLAWNKIAIIHPNAFSTLPSLIKLDLSSNL




LSSFPITGLHGLTHLKLTGNHALQSLISSENFPEL




KVIEMPYAYQCCAFGVCENAYKISNQWNKGDNSSM




DDLHKKDAGMFQAQDERDLEDFLLDFEEDLKAL




HSVQCSPSPGPFKPCEHLLDGWLIRIGVWTIAVLA




LTCNALVTSTVFRSPLYISPIKLLIGVIAAVNMLT




GVSSAVLAGVDAFTFGSFARHGAWWENGVGCHVIG




FLSIFASESSVFLLTLAALERGFSVKYSAKFETKA




PFSSLKVIILLCALLALTMAAVPLLGGSKYGASPL




CLPLPFGEPSTMGYMVALILLNSLCFLMMTIAYTK




LYCNLDKGDLENIWDCSMVKHIALLLFTNCILNCP




VAFLSFSSLINLTFISPEVIKFILLVVVPLPACLN




PLLYILFNPHFKEDLVSLRKQTYVWTRSKHPSLMS




INSDDVEKQSCDSTQALVTFTSSSITYDLPPSSVP




SPAYPVTESCHLSSVAFVPCL





31
GPR101
MTSTCTNSTRESNSSHTCMPLSKMPISLAHGIIRS




TVLVIFLAASFVGNIVLALVLQRKPQLLQVTNRFI




FNLLVTDLLQISLVAPWVVATSVPLFWPLNSHFCT




ALVSLTHLFAFASVNTIVVVSVDRYLSIIHPLSYP




SKMTQRRGYLLLYGTWIVAILQSTPPLYGWGQAAF




DERNALCSMIWGASPSYTILSVVSFIVIPLIVMIA




CYSVVFCAARRQHALLYNVKRHSLEVRVKDCVENE




DEEGAEKKEEFQDESEFRRQHEGEVKAKEGRMEAK




DGSLKAKEGSTGTSESSVEARGSEEVRESSTVASD




GSMEGKEGSTKVEENSMKADKGRTEVNQCSIDLGE




DDMEFGEDDINFSEDDVEAVNIPESLPPSRRNSNS




NPPLPRCYQCKAAKVIFIIIFSYVLSLGPYCFLAV




LAVWVDVETQVPQWVITIIIWLFFLQCCIHPYVYG




YMHKTIKKEIQDMLKKFFCKEKPPKEDSHPDLPGT




EGGTEGKIVPSYDSATFP





32
GPR151
MLAAAFADSNSSSMNVSFAHLHFAGGYLPSDSQDW




RTIIPALLVAVCLVGFVGNLCVIGILLHNAWKGKP




SMIHSLILNLSLADLSLLLFSAPIRATAYSKSVW




DLGWFVCKSSDWFIHTCMAAKSLTIVVVAKVCFMY




ASDPAKQVSIHNYTIWSVLVAIWTVASLLPLPEW




FFSTIRHHEGVEMCLVDVPAVAEEFM




SMFGKLYPLLAFGLPLFFASFYFWRAYDQCKKRGT




KTQNLRNQIRSKQVTVMLLSIAIISALLWLPEWVA




WLWVWHLKAAGPAPPQGFIALSQVLMFSISSANPL




IFLVMSEEFREGLKGVWKWMITKKPPTVSESQETP




AGNSEGLPDKVPSPESPASIPEKEKPSSPSSGKGK




TEKAEIPILPDVEQFWHERDTVPSVQDNDPIPWEF




IEDQETGEGVK





33
GPR161
MSLNSSLSCRKELSNLTEEEGGEGGVIITQFIAII




VITIFVCLGNLVIVVTLYKKSYLLTLSNKFVFSLT




LSNFLLSVLVLPFVVTSSIRREWIFGVVWCNFSAL




LYLLISSASMLTLGVIAIDRYYAVLYPMVYPMKIT




GNRAVMALVYIWLHSLIGCLPPLFGWSSVEFDEFK




WMCVAAWHREPGYTAFWQIWCALFPFLVMLVCYGF




IFRVARVKARKVHCGTVVIVEEDAQRTGRKNSSTS




TSSSGSRRNAFQGVVYSANQCKALITILVVLGAFM




VTWGPYMVVIASEALWGKSSVSPSLETWATWLSFA




SAVCHPLIYGLWNKTVRKELLGMCFGDRYYREPFV




QRQRTSRLFSISNRITDLGLSPHLTALMAGGQPLG




HSSSTGDTGFSCSQDSGTDMMLLEDYTSDDNPPSH




CTCPPKRRSSVTFEDEVEQIKEAAKNSILHVKAEV




HKSLDSYAASLAKAIEAEAKINLFGEEALPGVLVT




ARTVPGGGFGGRRGSRTLVSQRLQLQSIEEGDVLA




AEQR





34
GPR17
MSKRSWWAGSRKPPREMLKLSGSDSSQSMNGLEVA




PPGLITNFSLATAEQCGQETPLENMLFASFYLLDF




ILALVGNTLALWLFIRDHKSGTPANLMLQNNQLRH




VPTEALQNLRSLQSLRLDANHISYVPPSCFSGLHS




LRHLWLDDNALTEIPVQAFRSLSALQAMTLALNKI




HHIPDYAFGNLSSLVVLHLHNNRIHSLGKKCFDGL




HSLETLDLNYNNLDEFPTAIRTLSNLKELGFHSNN




IRSIPEKAFVGNPSLITIHFYDNPIQFVGRSAFQH




LPELRTLTLNGASQITEFPDLTGTANLESLTLTGA




QISSLPQTVCNQLPNLQVLDLSYNLLEDLPSFSVC




QKLQKIDLRHNEIYEIKVDTFQQLLSLRSLNLAWN




KIAIIHPNAFSTLPSLIKLDLSSNLLSSFPITGLH




GLTHLKLTGNHALQSLISSENFPELKVIEMPYAYQ




CCAFGVCENAYKISNQWNKGDNSSMDDLHKKDAGM




FQAQDERDLEDFLLDFEEDLKAL




HSVQCSPSPGPFKPCEHLLDGWLIRIGVWTIAVLA




LTCNALVTSTVFRSPLYISPIKLLIGVIAAVNMLT




GVSSAVLAGVDAFTFGSFARHGAWWENGVGCHVIG




FLSIFASESSVFLLTLAALERGFSVKYSAKFETKA




PFSSLKVIILLCALLALTMAAVPLLGGSKYGASPL




CLPLPFGEPSTMGYMVALILLNSLCFLMMTIAYTK




LYCNLDKGDLENIWDCSMVKHIALLLFTNCILNCP




VAFLSFSSLINLTFISPEVIKFILLVVVPLPACLN




PLLYILFNPHFKEDLVSLRKQTYVWTRSKHPSLMS




INSDDVEKQSCDSTQALVTFTSSSITYDLPPSSVP




SPAYPVTESCHLSSVAFVPCL





35
GPR183
MDIQMANNFTPPSATPQGNDCDLYAHHSTARIVMP




LHYSLVFIIGLVGNLLALVVIVQNRKKINSTTLYS




TNLVISDILFTTALPTRIAYYAMGFDWRIGDALCR




ITALVFYINTYAGVNFMTCLSIDRFIAVVHPLRYN




KIKRIEHAKGVCIFVWILVFAQTLPLLINPMSKQE




AERITCMEYPNFEETKSLPWILLGACFIGYVLPLI




IILICYSQICCKLFRTAKQNPLTEKSGVNKKALNT




IILIIVVFVLCFTPYHVAIIQHMIKKLRFSNFLEC




SQRHSFQISLHFTVCLMNFNCCMDPFIYFFACKGY




KRKVMRMLKRQVSVSISSAVKSAPEENSREMTETQ




MMIHSKSSNGK





36
CRTH2
MSANATLKPLCPILEQMSRLQSHSNTSIRYIDHAA




VLLHGLASLLGLVENGVILFVVGCRMRQTVVTTWV




LHLALSDLLASASLPFFTYFLAVGHSWELGTTFCK




LHSSIFFLNMFASGFLLSAISLDRCLQVVRPVWAQ




NHRTVAAAHKVCLVLWALAVLNTVPYFVFRDTISR




LDGRIMCYYNVLLLNPGPDRDATCNSRQVALAVSK




FLLAFLVPLAIIASSHAAVSLRLQHRGRRRPGRFV




RLVAAVVAAFALCWGPYHVFSLLEARAHANPGLRP




LVWRGLPFVTSLAFFNSVANPVLYVLTCPDMLRKL




RRSLRTVLESVLVDDSELGGAGSSRRRRTSSTARS




ASPLALCSRPEEPRGPARLLGWLLGSCAASPQTGP




LNRALSSTSS





37
5-HT4
MDKLDANVSSEEGFGSVEKVVLLTFLSTVILMAIL




GNLLVMVAVCWDRQLRKIKTNYFIVSLAFADLLVS




VLVMPFGAIELVQDIWIYGEVFCLVRTSLDVLLTT




ASIFHLCCISLDRYYAICCQPLVYRNKMTPLRIAL




MLGGCWVIPTFISFLPIMQGWNNIGIIDLIEKRKF




NQNSNSTYCVFMVNKPYAITCSVVAFYIPFLLMVL




AYYRIYVTAKEHAHQIQMLQRAGASSESRPQSADQ




HSTHRMRTETKAAKTLCIIMGCFCLCWAPFFVTNI




VDPFIDYTVPGQVWTAFLWLGYINSGLNPFLYAFL




NKSFRRAFLIILCCDDERYRRPSILGQTVPCSTTT




INGSTHVLRDAVECGGQWESQCHPPATSPLVAAQP




SDT





38
5-HT6
MVPEPGPTANSTPAWGAGPPSAPGGSGWVAAALCV




VIALTAAANSLLIALICTQPALRNTSNFFLVSLFT




SDLMVGLVVMPPAMLNALYGRWVLARGLCLLWTAF




DVMCCSASILNLCLISLDRYLLILSPLRYKLRMTP




LRALALVLGAWSLAALASFLPLLLGWHELGHARPP




VPGQCRLLASLPFVLVASGLTFFLPSGAICFTYCR




ILLAARKQAVQVASLTTGMASQASETLQVPRTPRP




GVESADSRRLATKHSRKALKASLTLGILLGMFFVT




WLPFFVANIVQAVCDCISPGLFDVLTWLGYCNSTM




NPIIYPLFMRDFKRALGRFLPCPRCPRERQASLAS




PSLRTSHSGPRPGLSLQQVLPLPLPPDSDSDSDAG




SGGSSGLRLTAQLLLPGEATQDPPLPTRAAAAVNF




FNIDPAEPELRPHPLGIPTN





39
CB2
MEECWVTEIANGSKDGLDSNPMKDYMILSGPQKTA




VAVLCTLLGLLSALENVAVLYLILSSHQLRRKPSY




LFIGSLAGADFLASVVFACSFVNFHVFHGVDSKAV




FLLKIGSVTMTFTASVGSLLLTAIDRYLCLRYPPS




YKALLTRGRALVTLGIMWVLSALVSYLPLMGWTCC




PRPCSELFPLIPNDYLLSWLLFIAFLFSGIIYTYG




HVLWKAHQHVASLSGHQDRQVPGMARMRLDVRLAK




TLGLVLAVLLICWFPVLALMAHSLATTLSDQVKKA




FAFCSMLCLINSMVNPVIYALRSGEIRSSAHHCLA




HWKKCVRGLGSEAKEEAPRSSVTETEADGKITPWP




DSRDLDLSDC





40
Histamine-3
MERAPPDGPLNASGALAGEAAAAGGARGFSAAWTA




VLAALMALLIVATVLGNALVMLAFVADSSLRTQNN




FFLLNLAISDFLVGAFCIPLYVPYVLTGRWTFGRG




LCKLWLVVDYLLCTSSAFNIVLISYDRFLSVTRAV




SYRAQQGDTRRAVRKMLLVWVLAFLLYGPAILSWE




YLSGGSSIPEGHCYAEFFYNWYFLITASTLEFFTP




FLSVTFFNLSIYLNIQRRTRLRLDGAREAAGPEPP




PEAQPSPPPPPGCWGCWQKGHGEAMPLHRYGVGE




AAVGAEAGEATLGGGGGGGSVASPTSSSGSSSRG




TERP




RSLKRGSKPSASSASLEKRMKMVSQSFTQRFRLSR




DRKVAKSLAVIVSIFGLCWAPYTLLMIIRAACHGH




CVPDYWYETSFWLLWANSAVNPVLYPLCHHSFRRA




FTKLLCPQKLKIQPHSSLEHCWK





41
VPAC-1 or VIPR1
ARLQEECDYVQMIEVQHKQCLEEAQLENETIGCSK




MWDNLTCWPATPRGQVVVLACPLIFKLFSSIQGRN




VSRSCTDEGWTHLEPGPYPIACGLDDKAASLDEQQ




TMFYGSVKTGYTIGYGLSLATLLVATAILSLFRKL




HCTRNYIHMHLFISFILRAAAVFIKDLALFDSGES




DQCSEGSVGCKAAMVFFQYCVMANFFWLLVEGLYL




YTLLAVSFFSERKYFWGYILIGWGVPSTFTMVWTI




ARIHFEDYGCWDT




INSSLWWIIKGPILTSILVNFILFICIIRILLQKL




RPPDIRKSDSSPYSRLARSTLLLIPLFGVHYIMFA




FFPDNFKPEVKMVFELVVGSFQGFVVAILYCFLNG




EVQAELRRKWRRWHLQGVLGWNPKYRHPSGGSNGA




TCSTQVSMLTRVSPGARRSSSFQAEVSLV





42
GIPR
RAETGSKGQTAGELYQRWERYRRECQETLAAAEPP




SGLACNGSFDMYVCWDYAAPNATARASCPWYLPWH




HHVAAGFVLRQCGSDGQWGLWRDHTQCENPEKNEA




FLDQRLILERLQVMYTVGYSLSLATLLLALLILSL




FRRLHCTRNYIHINLFTSFMLRAAAILSRDRLLPR




PGPYLGDQALALWNQALAACRTAQIVTQYCVGANY




TWLLVEGVYLHSLLVLVGGSEEGHFRYYLLLGWGA




PALFVIPWVIVRYLYEN




TQCWERNEVKAIWWIIRTPILMTILINFLIFIRIL




GILLSKLRTRQMRCRDYRLRLARSTLTLVPLLGVH




EVVFAPVTEEQARGALRFAKLGFEIFLSSFQGFLV




SVLYCFINKEVQSEIRRGWHHCRLRRSLGEEQRQL




PERAFRALPSGSGPGEVPTSRGLSSGTLPGPGNEA




SRELESYC





43
5-HT1B GPCR
MEEPGAQCAPPPPAGSETWVPQANLSSAPSQNCSA




KDYIYQDSISLPWKVLLVMLLALITLATTLSNAFV




IATVYRTRKLHTPANYLIASLAVTDLLVSILVMPI




STMYTVTGRWTLGQVVCDFWLSSDITCCTASILHL




CVIALDRYWAITDAVEYSAKRTPKRAAVMIALVWV




FSISISLPPFFWRQAKAEEEVSECVVNTDHILYTV




YSTVGAFYFPTLLLIALYGRIYVEARSRILKQTPN




RTGKRLTRAQLITDSPGSTSSVTSINSRVPDVPSE




SGSPVYVNQVKVRVSDALLEKKKLMAARERKATKT




LGIILGAFIVCWLPFFIISLVMPICKDACWFHLAI




FDFFTWLGYLNSLINPIIYTMSNEDFKQAFHKLIR




FKCTS





44
CCR7
QDEVTDDYIGDNTTVDYTLFESLCSKKDVRNFKAW




FLPIMYSIICFVGLLGNGLVVLTYIYFKRLKTMTD




TYLLNLAVADILFLLTLPFWAYSAAKSWVFGVHFC




KLIFAIYKMSFFSGMLLLLCISIDRYVAIVQAVSA




HRHRARVLLISKLSCVGIWILATVLSIPELLYSDL




QRSSSEQAMRCSLITEHVEAFITIQVAQMVIGFLV




PLLAMSFCYLVIIRTLLQARNFERNKAIKVIIAVV




VVFIVFQLPYNGVVLAQTVANFNITSSTCELSKQL




NIAYDVTYSLACVRCCVNPFLYAFIGVKFRNDLFK




LFKDLGCLSQEQLRQWSSCRHIRRSSMSVEAETTT




TFSP





45
CXCR5
MNYPLTLEMDLENLEDLFWELDRLDNYNDTSLVEN




HLCPATEGPLMASFKAVFVPVAYSLIFLLGVIGNV




LVLVILERHRQTRSSTETFLFHLAVADLLLVFILP




FAVAEGSVGWVLGTFLCKTVIALHKVNFYCSSLLL




ACIAVDRYLAIVHAVHAYRHRRLLSIHITCGTIWL




VGFLLALPEILFAKVSQGHHNNSLPRCTFSQENQA




ETHAWFTSRFLYHVAGFLLPMLVMGWCYVGVVHRL




RQAQRRPQRQKAVRVAILVTSIFFLCWSPYHIVIF




LDTLARLKAVDNTCKLNGSLPVAITMCEFLGLAHC




CLNPMLYTFAGVKFRSDLSRLLTKLGCTGPASLCQ




LFPSWRRSSLSESENATSLTTF





46
GPR119
MESSFSFGVILAVLASLIIATNTLVAVAVLLLIHK




NDGVSLCFTLNLAVADTLIGVAISGLLTDQLSSPS




RPTQKTLCSLRMAFVTSSAAASVLTVMLITFDRYL




AIKQPFRYLKIMSGFVAGACIAGLWLVSYLIGFLP




LGIPMFQQTAYKGQCSFFAVFHPHFVLTLSCVGFF




PAMLLFVFFYCDMLKIASMHSQQIRKMEHAGAMAG




GYRSPRTPSDFKALRTVSVLIGSFALSWTPFLITG




IVQVACQECHLYLVLERYLWLLGVGNSLLNPLIYA




YWQKEVRLQLYHMALGVKKVLTSFLLFLSARNCGP




ERPRESSCHIVTISSSEFDG





47
GPR55
MSQQNTSGDCLFDGVNELMKTLQFAVHIPTFVLGL




LLNLLAIHGFSTFLKNRWPDYAATSIYMINLAVFD




LLLVLSLPFKMVLSQVQSPFPSLCTLVECLYFVSM




YGSVFTICFISMDRFLAIRYPLLVSHLRSPRKIFG




ICCTIWVLVWTGSIPIYSFHGKVEKYMCFHNMSDD




TWSAKVFFPLEVFGFLLPMGIMGFCCSRSIHILLG




RRDHTQDWVQQKACIYSIAASLAVFVVSFLPVHLG




FFLQFLVRNSFIVECRAKQSISFFLQLSMCFSNVN




CCLDVFCYYFVIKEFRMNIRAHRPSRVQLVLQDTT




ISRG









Provided herein are scaffolds comprising GPCR binding domains, wherein the sequences of the GPCR binding domains support interaction with at least one GPCR. The sequence may be homologous or identical to a sequence of a GPCR ligand. In some instances, the GPCR binding domain sequence comprises at least or about 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, or 47. In some instances, the GPCR binding domain sequence comprises at least or about 95% homology to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, or 47. In some instances, the GPCR binding domain sequence comprises at least or about 97% homology to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, or 47. In some instances, the GPCR binding domain sequence comprises at least or about 99% homology to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, or 47. In some instances, the GPCR binding domain sequence comprises at least or about 100% homology to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, or 47. 
In some instances, the GPCR binding domain sequence comprises at least a portion having at least or about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, or more than 400 amino acids of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, or 47.
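
By way of illustration only, a percent sequence identity of the kind recited above may be computed for two pre-aligned sequences as sketched below. The function names, the gap character, and the choice of denominator (all aligned columns other than gap-gap columns) are assumptions made for this sketch; other identity definitions and alignment tools may give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption for this sketch: '-' marks a gap, and columns where both
    sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only when both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 95.0) -> bool:
    """True when the identity is at least the given threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Illustrative 10-residue fragment and a hypothetical single-substitution variant.
reference = "TWSAKVFFPL"
variant = "TWSAKVFYPL"
print(percent_identity(reference, variant))          # 90.0
print(meets_identity_threshold(reference, variant))  # False at the 95% default
```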


Libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains may bind to one or more GPCRs. In some instances, the scaffolds comprising GPCR binding domains bind to a single GPCR. In some instances, the scaffolds comprising GPCR binding domains bind to GPCRs in a same family or class. In some instances, the scaffolds comprising GPCR binding domains bind to multiple GPCRs. For example, the scaffolds are multimeric and comprise at least 2 scaffolds. In some instances, the multimeric scaffolds comprise at least or about 3, 4, 5, 6, 7, 8, or more than 8 scaffolds. In some instances, the multimeric scaffolds comprise at least 2 scaffolds linked by, for example, a dimerization domain, an amino acid linker, a disulfide bond, a chemical crosslink, or any other linker known in the art. In some instances, the multimeric scaffolds bind to the same GPCRs or different GPCRs.


Provided herein are GPCR binding libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains, wherein the libraries comprise variation in domain type, domain length, or residue variation. In some instances, the domain is a region in the scaffold comprising the GPCR binding domains. For example, the region is the VH, CDR-H3, or VL domain. In some instances, the domain is the GPCR binding domain.


Methods described herein provide for synthesis of a GPCR binding library of nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence. In some cases, the predetermined reference sequence is a nucleic acid sequence encoding for a protein, and the variant library comprises sequences encoding for variation of at least a single codon such that a plurality of different variants of a single residue in the subsequent protein encoded by the synthesized nucleic acid are generated by standard translation processes. In some instances, the GPCR binding library comprises varied nucleic acids collectively encoding variations at multiple positions. In some instances, the variant library comprises sequences encoding for variation of at least a single codon of a VH, CDR-H3, or VL domain. In some instances, the variant library comprises sequences encoding for variation of at least a single codon in a GPCR binding domain. For example, at least one single codon of a GPCR binding domain as listed in Table 2 is varied. In some instances, the variant library comprises sequences encoding for variation of multiple codons of a VH, CDR-H3, or VL domain. In some instances, the variant library comprises sequences encoding for variation of multiple codons in a GPCR binding domain. Exemplary numbers of codons for variation include, but are not limited to, at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons.


Methods described herein provide for synthesis of a GPCR binding library of nucleic acids each encoding for a predetermined variant of at least one predetermined reference nucleic acid sequence, wherein the GPCR binding library comprises sequences encoding for variation of length of a domain. In some instances, the domain is VH, CDR-H3, or VL domain. In some instances, the domain is the GPCR binding domain. In some instances, the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 225, 250, 275, 300, or more than 300 codons less as compared to a predetermined reference sequence. In some instances, the library comprises sequences encoding for variation of length of at least or about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, or more than 300 codons more as compared to a predetermined reference sequence.


Provided herein are GPCR binding libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains, wherein the GPCR binding libraries are synthesized with various numbers of fragments. In some instances, the fragments comprise the VH, CDR-H3, or VL domain. In some instances, the GPCR binding libraries are synthesized with at least or about 2 fragments, 3 fragments, 4 fragments, 5 fragments, or more than 5 fragments. The length of each of the nucleic acid fragments or average length of the nucleic acids synthesized may be at least or about 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, or more than 600 base pairs. In some instances, the length is about 50 to 600, 75 to 575, 100 to 550, 125 to 525, 150 to 500, 175 to 475, 200 to 450, 225 to 425, 250 to 400, 275 to 375, or 300 to 350 base pairs.


GPCR binding libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains as described herein comprise various lengths of amino acids when translated. In some instances, the length of each of the amino acid fragments or average length of the amino acid synthesized may be at least or about 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, or more than 150 amino acids. In some instances, the length of the amino acid is about 15 to 150, 20 to 145, 25 to 140, 30 to 135, 35 to 130, 40 to 125, 45 to 120, 50 to 115, 55 to 110, 60 to 110, 65 to 105, 70 to 100, or 75 to 95 amino acids. In some instances, the length of the amino acid is about 22 to about 75 amino acids.


GPCR binding libraries comprising de novo synthesized variant sequences encoding for scaffolds comprising GPCR binding domains comprise a number of variant sequences. In some instances, a number of variant sequences is de novo synthesized for a CDR-H1, CDR-H2, CDR-H3, CDR-L1, CDR-L2, CDR-L3, VL, VH, or a combination thereof. In some instances, a number of variant sequences is de novo synthesized for framework element 1 (FW1), framework element 2 (FW2), framework element 3 (FW3), or framework element 4 (FW4). In some instances, a number of variant sequences is de novo synthesized for a GPCR binding domain. For example, the number of variant sequences is about 1 to about 10 sequences for the VH domain, about 108 sequences for the GPCR binding domain, and about 1 to about 44 sequences for the VK domain. See FIG. 2. The number of variant sequences may be at least or about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, or more than 500 sequences. In some instances, the number of variant sequences is about 10 to 300, 25 to 275, 50 to 250, 75 to 225, 100 to 200, or 125 to 150 sequences.


GPCR binding libraries comprising de novo synthesized variant sequences encoding for scaffolds comprising GPCR binding domains comprise improved diversity. For example, variants are generated by placing GPCR binding domain variants in immunoglobulin scaffold variants comprising N-terminal CDR-H3 variations and C-terminal CDR-H3 variations. In some instances, variants include affinity maturation variants. Alternatively or in combination, variants include variants in other regions of the immunoglobulin including, but not limited to, CDR-H1, CDR-H2, CDR-L1, CDR-L2, and CDR-L3. In some instances, the number of variants of the GPCR binding libraries is at least or about 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more than 10¹⁰ non-identical sequences. For example, a library comprising about 10 variant sequences for a VH region, about 237 variant sequences for a CDR-H3 region, and about 43 variant sequences for a VL and CDR-L3 region comprises about 10⁵ non-identical sequences (10×237×43). See FIGS. 4A-4B.
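
The arithmetic behind the combinatorial example above (10×237×43) may be checked with a short sketch. The region labels and function name are illustrative assumptions, and the sketch assumes every variant of one region is combined with every variant of the others.

```python
from math import log10, prod


def combinatorial_diversity(variants_per_region: dict) -> int:
    """Number of non-identical sequences when every variant of each region
    is combined with every variant of every other region."""
    return prod(variants_per_region.values())


# Variant counts taken from the example in the text; keys are labels only.
regions = {"VH": 10, "CDR-H3": 237, "VL and CDR-L3": 43}
total = combinatorial_diversity(regions)
print(total)                   # 101910
print(round(log10(total), 2))  # 5.01, i.e. on the order of 10^5
```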


Following synthesis of GPCR binding libraries comprising nucleic acids encoding scaffolds comprising GPCR binding domains, libraries may be used for screening and analysis. For example, libraries are assayed for library displayability and panning. In some instances, displayability is assayed using a selectable tag. Exemplary tags include, but are not limited to, a radioactive label, a fluorescent label, an enzyme, a chemiluminescent tag, a colorimetric tag, an affinity tag, or other labels or tags that are known in the art. In some instances, the tag is histidine, polyhistidine, myc, hemagglutinin (HA), or FLAG. For example, as seen in FIG. 3, the GPCR binding libraries comprise nucleic acids encoding scaffolds comprising GPCR binding domains with multiple tags such as GFP, FLAG, and Lucy as well as a DNA barcode. In some instances, libraries are assayed by sequencing using various methods including, but not limited to, single-molecule real-time (SMRT) sequencing, Polony sequencing, sequencing by ligation, reversible terminator sequencing, proton detection sequencing, ion semiconductor sequencing, nanopore sequencing, electronic sequencing, pyrosequencing, Maxam-Gilbert sequencing, chain termination (e.g., Sanger) sequencing, +S sequencing, or sequencing by synthesis.


Expression Systems


Provided herein are libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains, wherein the libraries have improved specificity, stability, expression, folding, or downstream activity. In some instances, libraries described herein are used for screening and analysis.


Provided herein are libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains, wherein the nucleic acid libraries are used for screening and analysis. In some instances, screening and analysis comprises in vitro, in vivo, or ex vivo assays. Cells for screening include primary cells taken from living subjects or cell lines. Cells may be from prokaryotes (e.g., bacteria) or eukaryotes (e.g., fungi, animals, and plants). Exemplary animal cells include, without limitation, those from a mouse, rabbit, primate, and insect. In some instances, cells for screening include a cell line including, but not limited to, Chinese Hamster Ovary (CHO) cell line, human embryonic kidney (HEK) cell line, or baby hamster kidney (BHK) cell line. In some instances, nucleic acid libraries described herein may also be delivered to a multicellular organism. Exemplary multicellular organisms include, without limitation, a plant, a mouse, rabbit, primate, and insect.


Nucleic acid libraries described herein may be screened for various pharmacological or pharmacokinetic properties. In some instances, the libraries are screened using in vitro assays, in vivo assays, or ex vivo assays. For example, in vitro pharmacological or pharmacokinetic properties that are screened include, but are not limited to, binding affinity, binding specificity, and binding avidity. Exemplary in vivo pharmacological or pharmacokinetic properties of libraries described herein that are screened include, but are not limited to, therapeutic efficacy, activity, preclinical toxicity properties, clinical efficacy properties, clinical toxicity properties, immunogenicity, potency, and clinical safety properties.


Provided herein are nucleic acid libraries, wherein the nucleic acid libraries may be expressed in a vector. Expression vectors for inserting nucleic acid libraries disclosed herein may comprise eukaryotic or prokaryotic expression vectors. Exemplary expression vectors include, without limitation, mammalian expression vectors: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 Vector, pEF1a-tdTomato Vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMCP-tag(m), and pSF-CMV-PURO-NH2-CMYC; bacterial expression vectors: pSF-OXB20-BetaGal, pSF-OXB20-Fluc, pSF-OXB20, and pSF-Tac; plant expression vectors: pRI 101-AN DNA and pCambia2301; yeast expression vectors: pTYB21 and pKLAC2; and insect vectors: pAc5.1/V5-His A and pDEST8. In some instances, the vector is pcDNA3 or pcDNA3.1.


Described herein are nucleic acid libraries that are expressed in a vector to generate a construct comprising a scaffold comprising sequences of GPCR binding domains. In some instances, a size of the construct varies. In some instances, the construct comprises at least or about 500, 600, 700, 800, 900, 1000, 1100, 1300, 1400, 1500, 1600, 1700, 1800, 2000, 2400, 2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 6000, 7000, 8000, 9000, 10000, or more than 10000 bases. In some instances, the construct comprises a range of about 300 to 1,000, 300 to 2,000, 300 to 3,000, 300 to 4,000, 300 to 5,000, 300 to 6,000, 300 to 7,000, 300 to 8,000, 300 to 9,000, 300 to 10,000, 1,000 to 2,000, 1,000 to 3,000, 1,000 to 4,000, 1,000 to 5,000, 1,000 to 6,000, 1,000 to 7,000, 1,000 to 8,000, 1,000 to 9,000, 1,000 to 10,000, 2,000 to 3,000, 2,000 to 4,000, 2,000 to 5,000, 2,000 to 6,000, 2,000 to 7,000, 2,000 to 8,000, 2,000 to 9,000, 2,000 to 10,000, 3,000 to 4,000, 3,000 to 5,000, 3,000 to 6,000, 3,000 to 7,000, 3,000 to 8,000, 3,000 to 9,000, 3,000 to 10,000, 4,000 to 5,000, 4,000 to 6,000, 4,000 to 7,000, 4,000 to 8,000, 4,000 to 9,000, 4,000 to 10,000, 5,000 to 6,000, 5,000 to 7,000, 5,000 to 8,000, 5,000 to 9,000, 5,000 to 10,000, 6,000 to 7,000, 6,000 to 8,000, 6,000 to 9,000, 6,000 to 10,000, 7,000 to 8,000, 7,000 to 9,000, 7,000 to 10,000, 8,000 to 9,000, 8,000 to 10,000, or 9,000 to 10,000 bases.


Provided herein are libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains, wherein the nucleic acid libraries are expressed in a cell. In some instances, the libraries are synthesized to express a reporter gene. Exemplary reporter genes include, but are not limited to, acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), green fluorescent protein (GFP), red fluorescent protein (RFP), yellow fluorescent protein (YFP), cyan fluorescent protein (CFP), cerulean fluorescent protein, citrine fluorescent protein, orange fluorescent protein, cherry fluorescent protein, turquoise fluorescent protein, blue fluorescent protein, horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), and derivatives thereof. Methods to determine modulation of a reporter gene are well known in the art, and include, but are not limited to, fluorometric methods (e.g., fluorescence spectroscopy, Fluorescence Activated Cell Sorting (FACS), fluorescence microscopy), and antibiotic resistance determination.


Diseases and Disorders


Provided herein are GPCR binding libraries comprising nucleic acids encoding for scaffolds comprising GPCR binding domains, wherein the libraries may have therapeutic effects. In some instances, the GPCR binding libraries result in protein when translated that is used to treat a disease or disorder. In some instances, the protein is an immunoglobulin. In some instances, the protein is a peptidomimetic. Exemplary diseases include, but are not limited to, cancer, inflammatory diseases or disorders, a metabolic disease or disorder, a cardiovascular disease or disorder, a respiratory disease or disorder, pain, a digestive disease or disorder, a reproductive disease or disorder, an endocrine disease or disorder, or a neurological disease or disorder. In some instances, the cancer is a solid cancer or a hematologic cancer. In some instances, an inhibitor of GPCR glucagon like peptide 1 receptor (GLP1R) as described herein is used for treatment of a metabolic disorder. In some instances, an inhibitor of GPCR GLP1R as described herein is used for treatment of weight gain (or for inducing weight loss), treatment of obesity, or treatment of Type II diabetes. In some instances, the subject is a mammal. In some instances, the subject is a mouse, rabbit, dog, or human. Subjects treated by methods described herein may be infants, adults, or children. Pharmaceutical compositions comprising antibodies or antibody fragments as described herein may be administered intravenously or subcutaneously. In some instances, a pharmaceutical composition comprises an antibody or antibody fragment described herein comprising a CDR-H3 comprising a sequence of any one of SEQ ID NOS: 2420 to 2436. In further instances, the pharmaceutical composition is used for treatment of a metabolic disorder.


Variant Libraries


Codon variation.


Variant nucleic acid libraries described herein may comprise a plurality of nucleic acids, wherein each nucleic acid encodes for a variant codon sequence compared to a reference nucleic acid sequence. In some instances, each nucleic acid of a first nucleic acid population contains a variant at a single variant site. In some instances, the first nucleic acid population contains a plurality of variants at a single variant site such that the first nucleic acid population contains more than one variant at the same variant site. The first nucleic acid population may comprise nucleic acids collectively encoding multiple codon variants at the same variant site. The first nucleic acid population may comprise nucleic acids collectively encoding up to 19 or more codons at the same position. The first nucleic acid population may comprise nucleic acids collectively encoding up to 60 variant triplets at the same position, or the first nucleic acid population may comprise nucleic acids collectively encoding up to 61 different triplets of codons at the same position. Each variant may encode for a codon that results in a different amino acid during translation. Table 3 provides a listing of each codon possible (and the representative amino acid) for a variant site.









TABLE 3
List of codons and amino acids

Amino Acids      One letter code   Three letter code   Codons
Alanine          A                 Ala                 GCA GCC GCG GCT
Cysteine         C                 Cys                 TGC TGT
Aspartic acid    D                 Asp                 GAC GAT
Glutamic acid    E                 Glu                 GAA GAG
Phenylalanine    F                 Phe                 TTC TTT
Glycine          G                 Gly                 GGA GGC GGG GGT
Histidine        H                 His                 CAC CAT
Isoleucine       I                 Ile                 ATA ATC ATT
Lysine           K                 Lys                 AAA AAG
Leucine          L                 Leu                 TTA TTG CTA CTC CTG CTT
Methionine       M                 Met                 ATG
Asparagine       N                 Asn                 AAC AAT
Proline          P                 Pro                 CCA CCC CCG CCT
Glutamine        Q                 Gln                 CAA CAG
Arginine         R                 Arg                 AGA AGG CGA CGC CGG CGT
Serine           S                 Ser                 AGC AGT TCA TCC TCG TCT
Threonine        T                 Thr                 ACA ACC ACG ACT
Valine           V                 Val                 GTA GTC GTG GTT
Tryptophan       W                 Trp                 TGG
Tyrosine         Y                 Tyr                 TAC TAT









A nucleic acid population may comprise varied nucleic acids collectively encoding up to 20 codon variations at multiple positions. In such cases, each nucleic acid in the population comprises variation for codons at more than one position in the same nucleic acid. In some instances, each nucleic acid in the population comprises variation for codons at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more codons in a single nucleic acid. In some instances, each variant long nucleic acid comprises variation for codons at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more codons in a single long nucleic acid. In some instances, the variant nucleic acid population comprises variation for codons at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more codons in a single nucleic acid. In some instances, the variant nucleic acid population comprises variation for codons in at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more codons in a single long nucleic acid.
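
A minimal sketch of codon variation at a single site and at multiple sites is given below for illustration. It assumes the standard genetic code, uses one representative codon per amino acid (here simply the alphabetically first codon, a choice made only for this sketch), and operates on a hypothetical 9-base reference sequence; none of the names below come from the methods described herein.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# The 61 triplets that encode an amino acid (stop codons excluded).
SENSE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa != "*")

# One representative codon per amino acid (assumption: alphabetically first).
REPRESENTATIVE = {}
for codon in SENSE_CODONS:
    REPRESENTATIVE.setdefault(CODON_TABLE[codon], codon)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def replace_codon(dna: str, codon_index: int, codon: str) -> str:
    start = 3 * codon_index
    return dna[:start] + codon + dna[start + 3:]


def single_site_variants(reference: str, codon_index: int) -> list:
    """One variant per alternative amino acid at a single codon position."""
    ref_aa = CODON_TABLE[reference[3 * codon_index:3 * codon_index + 3]]
    return [replace_codon(reference, codon_index, codon)
            for aa, codon in sorted(REPRESENTATIVE.items()) if aa != ref_aa]


def multi_site_variants(reference: str, codon_indices: list) -> list:
    """All amino acid combinations at the chosen positions, excluding any
    combination that encodes the reference protein."""
    ref_protein = translate(reference)
    variants = []
    for combo in product(sorted(REPRESENTATIVE.values()),
                         repeat=len(codon_indices)):
        seq = reference
        for idx, codon in zip(codon_indices, combo):
            seq = replace_codon(seq, idx, codon)
        if translate(seq) != ref_protein:
            variants.append(seq)
    return variants


reference = "ATGGCATGG"  # hypothetical reference encoding Met-Ala-Trp
print(len(SENSE_CODONS))                              # 61
print(len(single_site_variants(reference, 1)))        # 19
print(len(multi_site_variants(reference, [1, 2])))    # 399
```

In this sketch a single varied site yields 19 amino acid variants, and two varied sites yield 20×20 − 1 = 399, which illustrates how quickly a population grows as positions are added.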


Highly Parallel Nucleic Acid Synthesis


Provided herein is a platform approach utilizing miniaturization, parallelization, and vertical integration of the end-to-end process from polynucleotide synthesis to gene assembly within nanowells on silicon to create a revolutionary synthesis platform. Devices described herein provide, with the same footprint as a 96-well plate, a silicon synthesis platform capable of increasing throughput by a factor of up to 1,000 or more compared to traditional synthesis methods, with production of up to approximately 1,000,000 or more polynucleotides, or 10,000 or more genes, in a single highly-parallelized run.


With the advent of next-generation sequencing, high resolution genomic data has become an important factor for studies that delve into the biological roles of various genes in both normal biology and disease pathogenesis. At the core of this research is the central dogma of molecular biology and the concept of “residue-by-residue transfer of sequential information.” Genomic information encoded in the DNA is transcribed into a message that is then translated into the protein that is the active product within a given biological pathway.


Another exciting area of study is the discovery, development, and manufacturing of therapeutic molecules focused on a highly-specific cellular target. High diversity DNA sequence libraries are at the core of development pipelines for targeted therapeutics. Gene mutants are used to express proteins in a design, build, and test protein engineering cycle that ideally culminates in an optimized gene for high expression of a protein with high affinity for its therapeutic target. As an example, consider the binding pocket of a receptor. The ability to test all sequence permutations of all residues within the binding pocket simultaneously will allow for a thorough exploration, increasing chances of success. Saturation mutagenesis, in which a researcher attempts to generate all possible mutations at a specific site within the receptor, represents one approach to this development challenge. Though costly, time-intensive, and labor-intensive, it enables each variant to be introduced into each position. In contrast, combinatorial mutagenesis, where a few selected positions or a short stretch of DNA may be modified extensively, generates an incomplete repertoire of variants with biased representation.


To accelerate the drug development pipeline, a library with the desired variants available for testing at the intended frequency and in the right position (in other words, a precision library) enables reduced costs as well as reduced turnaround time for screening. Provided herein are methods for synthesizing nucleic acid synthetic variant libraries which provide for precise introduction of each intended variant at the desired frequency. To the end user, this translates to the ability not only to thoroughly sample sequence space but also to query these hypotheses in an efficient manner, reducing cost and screening time. Genome-wide editing can elucidate important pathways, libraries can be built in which each variant and sequence permutation is tested for optimal functionality, and thousands of genes can be used to reconstruct entire pathways and genomes to re-engineer biological systems for drug discovery.


In a first example, a drug itself can be optimized using methods described herein. For example, to improve a specified function of an antibody, a variant polynucleotide library encoding for a portion of the antibody is designed and synthesized. A variant nucleic acid library for the antibody can then be generated by processes described herein (e.g., PCR mutagenesis followed by insertion into a vector). The antibody is then expressed in a production cell line and screened for enhanced activity. Example screens include examining modulation in binding affinity to an antigen, stability, or effector function (e.g., ADCC, complement, or apoptosis). Exemplary regions to optimize the antibody include, without limitation, the Fc region, Fab region, variable region of the Fab region, constant region of the Fab region, variable domain of the heavy chain or light chain (VH or VL), and specific complementarity-determining regions (CDRs) of VH or VL.


Nucleic acid libraries synthesized by methods described herein may be expressed in various cells associated with a disease state. Cells associated with a disease state include cell lines, tissue samples, primary cells from a subject, cultured cells expanded from a subject, or cells in a model system. Exemplary model systems include, without limitation, plant and animal models of a disease state.


To identify a variant molecule associated with prevention, reduction or treatment of a disease state, a variant nucleic acid library described herein is expressed in a cell associated with a disease state, or in a cell in which a disease state can be induced. In some instances, an agent is used to induce a disease state in cells. Exemplary tools for disease state induction include, without limitation, a Cre/Lox recombination system, LPS inflammation induction, and streptozotocin to induce hyperglycemia. The cells associated with a disease state may be cells from a model system or cultured cells, as well as cells from a subject having a particular disease condition. Exemplary disease conditions include a bacterial, fungal, viral, autoimmune, or proliferative disorder (e.g., cancer). In some instances, the variant nucleic acid library is expressed in the model system, cell line, or primary cells derived from a subject, and screened for changes in at least one cellular activity. Exemplary cellular activities include, without limitation, proliferation, cycle progression, cell death, adhesion, migration, reproduction, cell signaling, energy production, oxygen utilization, metabolic activity, aging, response to free radical damage, or any combination thereof.


Substrates


Devices used as a surface for polynucleotide synthesis may be in the form of substrates which include, without limitation, homogenous array surfaces, patterned array surfaces, channels, beads, gels, and the like. Provided herein are substrates comprising a plurality of clusters, wherein each cluster comprises a plurality of loci that support the attachment and synthesis of polynucleotides. In some instances, substrates comprise a homogenous array surface. For example, the homogenous array surface is a homogenous plate. The term “locus” as used herein refers to a discrete region on a structure which provides support for polynucleotides encoding for a single predetermined sequence to extend from the surface. In some instances, a locus is on a two dimensional surface, e.g., a substantially planar surface. In some instances, a locus is on a three-dimensional surface, e.g., a well, microwell, channel, or post. In some instances, a surface of a locus comprises a material that is actively functionalized to attach to at least one nucleotide for polynucleotide synthesis, or preferably, a population of identical nucleotides for synthesis of a population of polynucleotides. In some instances, polynucleotide refers to a population of polynucleotides encoding for the same nucleic acid sequence. In some cases, a surface of a substrate is inclusive of one or a plurality of surfaces of a substrate. The average error rates for polynucleotides synthesized within a library described herein using the systems and methods provided are often less than about 1 in 1000, less than about 1 in 2000, less than about 1 in 3000, or less, often without error correction.
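
To illustrate what such average error rates imply for full-length products, the sketch below estimates the fraction of polynucleotides expected to be error-free. It assumes a uniform per-base error rate with independent errors at each position, which is a simplification introduced for this example and not a statement about any particular synthesis chemistry; the 200-base length is likewise illustrative.

```python
def error_free_fraction(length_bases: int, error_rate_per_base: float) -> float:
    """Expected fraction of polynucleotides with no errors, assuming each
    base is independently wrong with the given probability."""
    if not 0.0 <= error_rate_per_base <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    return (1.0 - error_rate_per_base) ** length_bases


# Compare the three error rates named in the text for a 200-base product.
for denominator in (1000, 2000, 3000):
    fraction = error_free_fraction(200, 1.0 / denominator)
    print(f"1 in {denominator}: {fraction:.3f} error-free")
```

Under these assumptions, lowering the per-base error rate or shortening the polynucleotide raises the error-free fraction.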


Provided herein are surfaces that support the parallel synthesis of a plurality of polynucleotides having different predetermined sequences at addressable locations on a common support. In some instances, a substrate provides support for the synthesis of more than 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2,000; 5,000; 10,000; 20,000; 50,000; 100,000; 200,000; 300,000; 400,000; 500,000; 600,000; 700,000; 800,000; 900,000; 1,000,000; 1,200,000; 1,400,000; 1,600,000; 1,800,000; 2,000,000; 2,500,000; 3,000,000; 3,500,000; 4,000,000; 4,500,000; 5,000,000; 10,000,000 or more non-identical polynucleotides. In some cases, the surfaces provide support for the synthesis of more than 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2,000; 5,000; 10,000; 20,000; 50,000; 100,000; 200,000; 300,000; 400,000; 500,000; 600,000; 700,000; 800,000; 900,000; 1,000,000; 1,200,000; 1,400,000; 1,600,000; 1,800,000; 2,000,000; 2,500,000; 3,000,000; 3,500,000; 4,000,000; 4,500,000; 5,000,000; 10,000,000 or more polynucleotides encoding for distinct sequences. In some instances, at least a portion of the polynucleotides have an identical sequence or are configured to be synthesized with an identical sequence. In some instances, the substrate provides a surface environment for the growth of polynucleotides having at least 80, 90, 100, 120, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500 bases or more.


Provided herein are methods for polynucleotide synthesis on distinct loci of a substrate, wherein each locus supports the synthesis of a population of polynucleotides. In some cases, each locus supports the synthesis of a population of polynucleotides having a different sequence than a population of polynucleotides grown on another locus. In some instances, each polynucleotide sequence is synthesized with 1, 2, 3, 4, 5, 6, 7, 8, 9 or more redundancy across different loci within the same cluster of loci on a surface for polynucleotide synthesis. In some instances, the loci of a substrate are located within a plurality of clusters. In some instances, a substrate comprises at least 10, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 20000, 30000, 40000, 50000 or more clusters. In some instances, a substrate comprises more than 2,000; 5,000; 10,000; 100,000; 200,000; 300,000; 400,000; 500,000; 600,000; 700,000; 800,000; 900,000; 1,000,000; 1,100,000; 1,200,000; 1,300,000; 1,400,000; 1,500,000; 1,600,000; 1,700,000; 1,800,000; 1,900,000; 2,000,000; 2,500,000; 3,000,000; 3,500,000; 4,000,000; 4,500,000; 5,000,000; or 10,000,000 or more distinct loci. In some instances, a substrate comprises about 10,000 distinct loci. The number of loci within a single cluster is varied in different instances. In some cases, each cluster includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 130, 150, 200, 300, 400, 500 or more loci. In some instances, each cluster includes about 50-500 loci. In some instances, each cluster includes about 100-200 loci. In some instances, each cluster includes about 100-150 loci. In some instances, each cluster includes about 109, 121, 130 or 137 loci. In some instances, each cluster includes about 19, 20, 61, 64 or more loci.
Alternatively or in combination, polynucleotide synthesis occurs on a homogenous array surface.


In some instances, the number of distinct polynucleotides synthesized on a substrate is dependent on the number of distinct loci available in the substrate. In some instances, the density of loci within a cluster or surface of a substrate is at least or about 1, 10, 25, 50, 65, 75, 100, 130, 150, 175, 200, 300, 400, 500, 1,000 or more loci per mm². In some cases, a substrate comprises 10-500, 25-400, 50-500, 100-500, 150-500, 10-250, 50-250, 10-200, or 50-200 mm². In some instances, the distance between the centers of two adjacent loci within a cluster or surface is from about 10-500, from about 10-200, or from about 10-100 μm. In some instances, the distance between two centers of adjacent loci is greater than about 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 μm. In some instances, the distance between the centers of two adjacent loci is less than about 200, 150, 100, 80, 70, 60, 50, 40, 30, 20 or 10 μm. In some instances, each locus has a width of about 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100 μm. In some cases, each locus has a width of about 0.5-100, 0.5-50, or 10-75 μm.
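
The relationship between center-to-center spacing, loci density, and the number of loci a surface can hold may be sketched as follows. The sketch assumes loci arranged on a regular square grid, which is a geometry chosen only for illustration; other layouts (for example, hexagonal packing or loci grouped into clusters) would give different densities. Function names are hypothetical.

```python
def loci_per_mm2_square_grid(pitch_um: float) -> float:
    """Loci per square millimeter for a square grid with the given
    center-to-center spacing in micrometers."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    loci_per_mm = 1000.0 / pitch_um  # 1 mm = 1000 micrometers
    return loci_per_mm ** 2


def loci_capacity(density_per_mm2: float, area_mm2: float) -> int:
    """Approximate number of loci on a patterned area at a given density."""
    return int(density_per_mm2 * area_mm2)


density = loci_per_mm2_square_grid(100.0)  # 100 micrometer spacing
print(density)                             # 100.0 loci per square millimeter
print(loci_capacity(density, 200.0))       # 20000 loci on 200 square millimeters
```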


In some instances, the density of clusters within a substrate is at least or about 1 cluster per 100 mm2, 1 cluster per 10 mm2, 1 cluster per 5 mm2, 1 cluster per 4 mm2, 1 cluster per 3 mm2, 1 cluster per 2 mm2, 1 cluster per 1 mm2, 2 clusters per 1 mm2, 3 clusters per 1 mm2, 4 clusters per 1 mm2, 5 clusters per 1 mm2, 10 clusters per 1 mm2, 50 clusters per 1 mm2 or more. In some instances, a substrate comprises from about 1 cluster per 10 mm2 to about 10 clusters per 1 mm2. In some instances, the distance between the centers of two adjacent clusters is at least or about 50, 100, 200, 500, 1000, 2000, or 5000 um. In some cases, the distance between the centers of two adjacent clusters is between about 50-100, 50-200, 50-300, 50-500, and 100-2000 um. In some cases, the distance between the centers of two adjacent clusters is between about 0.05-50, 0.05-10, 0.05-5, 0.05-4, 0.05-3, 0.05-2, 0.1-10, 0.2-10, 0.3-10, 0.4-10, 0.5-10, 0.5-5, or 0.5-2 mm. In some cases, each cluster has a cross section of about 0.5 to about 2, about 0.5 to about 1, or about 1 to about 2 mm. In some cases, each cluster has a cross section of about 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 or 2 mm. In some cases, each cluster has an interior cross section of about 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.15, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 or 2 mm.


In some instances, a substrate is about the size of a standard 96 well plate, for example between about 100 and about 200 mm by between about 50 and about 150 mm. In some instances, a substrate has a diameter less than or equal to about 1000, 500, 450, 400, 300, 250, 200, 150, 100 or 50 mm. In some instances, the diameter of a substrate is between about 25-1000, 25-800, 25-600, 25-500, 25-400, 25-300, or 25-200 mm. In some instances, a substrate has a planar surface area of at least about 100; 200; 500; 1,000; 2,000; 5,000; 10,000; 12,000; 15,000; 20,000; 30,000; 40,000; 50,000 mm2 or more. In some instances, the thickness of a substrate is between about 50-2000, 50-1000, 100-1000, 200-1000, or 250-1000 mm.


Surface Materials


Substrates, devices, and reactors provided herein are fabricated from any variety of materials suitable for the methods, compositions, and systems described herein. In certain instances, substrate materials are fabricated to exhibit a low level of nucleotide binding. In some instances, substrate materials are modified to generate distinct surfaces that exhibit a high level of nucleotide binding. In some instances, substrate materials are transparent to visible and/or UV light. In some instances, substrate materials are sufficiently conductive, e.g., are able to form uniform electric fields across all or a portion of a substrate. In some instances, conductive materials are connected to an electric ground. In some instances, the substrate is heat conductive or insulated. In some instances, the materials are chemical resistant and heat resistant to support chemical or biochemical reactions, for example polynucleotide synthesis reaction processes. In some instances, a substrate comprises flexible materials. For flexible materials, materials can include, without limitation: nylon, both modified and unmodified, nitrocellulose, polypropylene, and the like. In some instances, a substrate comprises rigid materials. For rigid materials, materials can include, without limitation: glass; fused silica; silicon; plastics (for example polytetrafluoroethylene, polypropylene, polystyrene, polycarbonate, and blends thereof, and the like); metals (for example, gold, platinum, and the like). The substrate, solid support or reactors can be fabricated from a material selected from the group consisting of silicon, polystyrene, agarose, dextran, cellulosic polymers, polyacrylamides, polydimethylsiloxane (PDMS), and glass. The substrates/solid supports or the microstructures, reactors therein may be manufactured with a combination of materials listed herein or any other suitable material known in the art.


Surface Architecture


Provided herein are substrates for the methods, compositions, and systems described herein, wherein the substrates have a surface architecture suitable for the methods, compositions, and systems described herein. In some instances, a substrate comprises raised and/or lowered features. One benefit of having such features is an increase in surface area to support polynucleotide synthesis. In some instances, a substrate having raised and/or lowered features is referred to as a three-dimensional substrate. In some cases, a three-dimensional substrate comprises one or more channels. In some cases, one or more loci comprise a channel. In some cases, the channels are accessible to reagent deposition via a deposition device such as a material deposition device. In some cases, reagents and/or fluids collect in a larger well in fluid communication with one or more channels. For example, a substrate comprises a plurality of channels corresponding to a plurality of loci within a cluster, and the plurality of channels are in fluid communication with one well of the cluster. In some methods, a library of polynucleotides is synthesized in a plurality of loci of a cluster.


Provided herein are substrates for the methods, compositions, and systems described herein, wherein the substrates are configured for polynucleotide synthesis. In some instances, the structure is configured to allow for controlled flow and mass transfer paths for polynucleotide synthesis on a surface. In some instances, the configuration of a substrate allows for the controlled and even distribution of mass transfer paths, chemical exposure times, and/or wash efficacy during polynucleotide synthesis. In some instances, the configuration of a substrate allows for increased sweep efficiency, for example by providing sufficient volume for a growing polynucleotide such that the excluded volume by the growing polynucleotide does not take up more than 50, 45, 40, 35, 30, 25, 20, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1%, or less of the initially available volume that is available or suitable for growing the polynucleotide. In some instances, a three-dimensional structure allows for managed flow of fluid to allow for the rapid exchange of chemical exposure.


Provided herein are substrates for the methods, compositions, and systems described herein, wherein the substrates comprise structures suitable for the methods, compositions, and systems described herein. In some instances, segregation is achieved by physical structure. In some instances, segregation is achieved by differential functionalization of the surface generating active and passive regions for polynucleotide synthesis. In some instances, differential functionalization is achieved by alternating the hydrophobicity across the substrate surface, thereby creating water contact angle effects that cause beading or wetting of the deposited reagents. Employing larger structures can decrease splashing and cross-contamination of distinct polynucleotide synthesis locations with reagents of the neighboring spots. In some cases, a device, such as a material deposition device, is used to deposit reagents to distinct polynucleotide synthesis locations. Substrates having three-dimensional features are configured in a manner that allows for the synthesis of a large number of polynucleotides (e.g., more than about 10,000) with a low error rate (e.g., less than about 1:500, 1:1000, 1:1500, 1:2,000, 1:3,000, 1:5,000, or 1:10,000). In some cases, a substrate comprises features with a density of about or greater than about 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 300, 400 or 500 features per mm2.
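
The error rates recited above (e.g., about 1:1000) can be related to the expected fraction of error-free polynucleotides of a given length. The following Python sketch is illustrative only: it assumes errors occur independently at each position with the same per-base probability, which is a simplifying assumption not stated in this disclosure, and the function name is hypothetical.

```python
def error_free_fraction(per_base_error_rate: float, length_nt: int) -> float:
    """Expected fraction of polynucleotides with no errors.

    Assumes independent, identically distributed per-base errors
    (an illustrative model, not a measured property of any device).
    """
    if not 0.0 <= per_base_error_rate <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    if length_nt < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - per_base_error_rate) ** length_nt


if __name__ == "__main__":
    # Compare several of the error rates listed above for a 200-nucleotide product.
    for denominator in (500, 1000, 2000, 5000, 10000):
        fraction = error_free_fraction(1.0 / denominator, 200)
        print(f"1:{denominator} -> {fraction:.3f} error-free at 200 nt")
```

Under this model, lower per-base error rates matter increasingly as polynucleotide length grows, since the error-free fraction decays geometrically with length.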


A well of a substrate may have the same or different width, height, and/or volume as another well of the substrate. A channel of a substrate may have the same or different width, height, and/or volume as another channel of the substrate. In some instances, the diameter of a cluster or the diameter of a well comprising a cluster, or both, is between about 0.05-50, 0.05-10, 0.05-5, 0.05-4, 0.05-3, 0.05-2, 0.05-1, 0.05-0.5, 0.05-0.1, 0.1-10, 0.2-10, 0.3-10, 0.4-10, 0.5-10, 0.5-5, or 0.5-2 mm. In some instances, the diameter of a cluster or well or both is less than or about 5, 4, 3, 2, 1, 0.5, 0.1, 0.09, 0.08, 0.07, 0.06, or 0.05 mm. In some instances, the diameter of a cluster or well or both is between about 1.0 and 1.3 mm. In some instances, the diameter of a cluster or well, or both is about 1.150 mm. In some instances, the diameter of a cluster or well, or both is about 0.08 mm. The diameter of a cluster refers to clusters within a two-dimensional or three-dimensional substrate.


In some instances, the height of a well is from about 20-1000, 50-1000, 100-1000, 200-1000, 300-1000, 400-1000, or 500-1000 um. In some cases, the height of a well is less than about 1000, 900, 800, 700, or 600 um.


In some instances, a substrate comprises a plurality of channels corresponding to a plurality of loci within a cluster, wherein the height or depth of a channel is 5-500, 5-400, 5-300, 5-200, 5-100, 5-50, or 10-50 um. In some cases, the height of a channel is less than 100, 80, 60, 40, or 20 um.


In some instances, the diameter of a channel, locus (e.g., in a substantially planar substrate) or both channel and locus (e.g., in a three-dimensional substrate wherein a locus corresponds to a channel) is from about 1-1000, 1-500, 1-200, 1-100, 5-100, or 10-100 um, for example, about 90, 80, 70, 60, 50, 40, 30, 20 or 10 um. In some instances, the diameter of a channel, locus, or both channel and locus is less than about 100, 90, 80, 70, 60, 50, 40, 30, 20 or 10 um. In some instances, the distance between the center of two adjacent channels, loci, or channels and loci is from about 1-500, 1-200, 1-100, 5-200, 5-100, 5-50, or 5-30, for example, about 20 um.


Surface Modifications


Provided herein are methods for polynucleotide synthesis on a surface, wherein the surface comprises various surface modifications. In some instances, the surface modifications are employed for the chemical and/or physical alteration of a surface by an additive or subtractive process to change one or more chemical and/or physical properties of a substrate surface or a selected site or region of a substrate surface. For example, surface modifications include, without limitation, (1) changing the wetting properties of a surface, (2) functionalizing a surface, i.e., providing, modifying or substituting surface functional groups, (3) defunctionalizing a surface, i.e., removing surface functional groups, (4) otherwise altering the chemical composition of a surface, e.g., through etching, (5) increasing or decreasing surface roughness, (6) providing a coating on a surface, e.g., a coating that exhibits wetting properties that are different from the wetting properties of the surface, and/or (7) depositing particulates on a surface.


In some cases, the addition of a chemical layer on top of a surface (referred to as adhesion promoter) facilitates structured patterning of loci on a surface of a substrate. Exemplary surfaces for application of adhesion promotion include, without limitation, glass, silicon, silicon dioxide and silicon nitride. In some cases, the adhesion promoter is a chemical with a high surface energy. In some instances, a second chemical layer is deposited on a surface of a substrate. In some cases, the second chemical layer has a low surface energy. In some cases, surface energy of a chemical layer coated on a surface supports localization of droplets on the surface. Depending on the patterning arrangement selected, the proximity of loci and/or area of fluid contact at the loci are alterable.


In some instances, a substrate surface, or resolved loci, onto which nucleic acids or other moieties are deposited, e.g., for polynucleotide synthesis, are smooth or substantially planar (e.g., two-dimensional) or have irregularities, such as raised or lowered features (e.g., three-dimensional features). In some instances, a substrate surface is modified with one or more different layers of compounds. Such modification layers of interest include, without limitation, inorganic and organic layers such as metals, metal oxides, polymers, small organic molecules and the like.


In some instances, resolved loci of a substrate are functionalized with one or more moieties that increase and/or decrease surface energy. In some cases, a moiety is chemically inert. In some cases, a moiety is configured to support a desired chemical reaction, for example, one or more processes in a polynucleotide synthesis reaction. The surface energy, or hydrophobicity, of a surface is a factor for determining the affinity of a nucleotide to attach onto the surface. In some instances, a method for substrate functionalization comprises: (a) providing a substrate having a surface that comprises silicon dioxide; and (b) silanizing the surface using a suitable silanizing agent described herein or otherwise known in the art, for example, an organofunctional alkoxysilane molecule. Methods and functionalizing agents are described in U.S. Pat. No. 5,474,796, which is herein incorporated by reference in its entirety.


In some instances, a substrate surface is functionalized by contact with a derivatizing composition that contains a mixture of silanes, under reaction conditions effective to couple the silanes to the substrate surface, typically via reactive hydrophilic moieties present on the substrate surface. Silanization generally covers a surface through self-assembly with organofunctional alkoxysilane molecules. A variety of siloxane functionalizing reagents can further be used as currently known in the art, e.g., for lowering or increasing surface energy. The organofunctional alkoxysilanes are classified according to their organic functions.


Polynucleotide Synthesis


Methods of the current disclosure for polynucleotide synthesis may include processes involving phosphoramidite chemistry. In some instances, polynucleotide synthesis comprises coupling a base with phosphoramidite. Polynucleotide synthesis may comprise coupling a base by deposition of phosphoramidite under coupling conditions, wherein the same base is optionally deposited with phosphoramidite more than once, i.e., double coupling. Polynucleotide synthesis may comprise capping of unreacted sites. In some instances, capping is optional. Polynucleotide synthesis may also comprise oxidation or an oxidation step or oxidation steps. Polynucleotide synthesis may comprise deblocking, detritylation, and sulfurization. In some instances, polynucleotide synthesis comprises either oxidation or sulfurization. In some instances, between one or each step during a polynucleotide synthesis reaction, the device is washed, for example, using tetrazole or acetonitrile. Time frames for any one step in a phosphoramidite synthesis method may be less than about 2 min, 1 min, 50 sec, 40 sec, 30 sec, 20 sec and 10 sec.


Polynucleotide synthesis using a phosphoramidite method may comprise a subsequent addition of a phosphoramidite building block (e.g., nucleoside phosphoramidite) to a growing polynucleotide chain for the formation of a phosphite triester linkage. Phosphoramidite polynucleotide synthesis proceeds in the 3′ to 5′ direction. Phosphoramidite polynucleotide synthesis allows for the controlled addition of one nucleotide to a growing nucleic acid chain per synthesis cycle. In some instances, each synthesis cycle comprises a coupling step. Phosphoramidite coupling involves the formation of a phosphite triester linkage between an activated nucleoside phosphoramidite and a nucleoside bound to the substrate, for example, via a linker. In some instances, the nucleoside phosphoramidite is provided to the device activated. In some instances, the nucleoside phosphoramidite is provided to the device with an activator. In some instances, nucleoside phosphoramidites are provided to the device in a 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100-fold excess or more over the substrate-bound nucleosides. In some instances, the addition of nucleoside phosphoramidite is performed in an anhydrous environment, for example, in anhydrous acetonitrile. Following addition of a nucleoside phosphoramidite, the device is optionally washed. In some instances, the coupling step is repeated one or more additional times, optionally with a wash step between nucleoside phosphoramidite additions to the substrate. In some instances, a polynucleotide synthesis method used herein comprises 1, 2, 3 or more sequential coupling steps. Prior to coupling, in many cases, the nucleoside bound to the device is de-protected by removal of a protecting group, where the protecting group functions to prevent polymerization. A common protecting group is 4,4′-dimethoxytrityl (DMT).


Following coupling, phosphoramidite polynucleotide synthesis methods optionally comprise a capping step. In a capping step, the growing polynucleotide is treated with a capping agent. A capping step is useful to block unreacted substrate-bound 5′-OH groups after coupling from further chain elongation, preventing the formation of polynucleotides with internal base deletions. Further, phosphoramidites activated with 1H-tetrazole may react, to a small extent, with the O6 position of guanosine. Without being bound by theory, upon oxidation with I2/water, this side product, possibly via O6-N7 migration, may undergo depurination. The apurinic sites may end up being cleaved in the course of the final deprotection of the polynucleotide thus reducing the yield of the full-length product. The O6 modifications may be removed by treatment with the capping reagent prior to oxidation with I2/water. In some instances, inclusion of a capping step during polynucleotide synthesis decreases the error rate as compared to synthesis without capping. As an example, the capping step comprises treating the substrate-bound polynucleotide with a mixture of acetic anhydride and 1-methylimidazole. Following a capping step, the device is optionally washed.


In some instances, following addition of a nucleoside phosphoramidite, and optionally after capping and one or more wash steps, the device bound growing nucleic acid is oxidized. In the oxidation step, the phosphite triester is oxidized into a tetracoordinated phosphate triester, a protected precursor of the naturally occurring phosphate diester internucleoside linkage. In some instances, oxidation of the growing polynucleotide is achieved by treatment with iodine and water, optionally in the presence of a weak base (e.g., pyridine, lutidine, collidine). Oxidation may be carried out under anhydrous conditions using, e.g., tert-butyl hydroperoxide or (1S)-(+)-(10-camphorsulfonyl)-oxaziridine (CSO). In some methods, a capping step is performed following oxidation. A second capping step allows for device drying, as residual water from oxidation that may persist can inhibit subsequent coupling. Following oxidation, the device and growing polynucleotide are optionally washed. In some instances, the step of oxidation is substituted with a sulfurization step to obtain polynucleotide phosphorothioates, wherein any capping steps can be performed after the sulfurization. Many reagents are capable of efficient sulfur transfer, including but not limited to 3-((Dimethylaminomethylidene)amino)-3H-1,2,4-dithiazole-3-thione (DDTT), 3H-1,2-benzodithiol-3-one 1,1-dioxide (also known as Beaucage reagent), and N,N,N′,N′-Tetraethylthiuram disulfide (TETD).


In order for a subsequent cycle of nucleoside incorporation to occur through coupling, the protecting group on the 5′ end of the device bound growing polynucleotide is removed so that the primary hydroxyl group is reactive with a next nucleoside phosphoramidite. In some instances, the protecting group is DMT and deblocking occurs with trichloroacetic acid in dichloromethane. Conducting detritylation for an extended time or with stronger than recommended solutions of acids may lead to increased depurination of solid support-bound polynucleotide and thus reduce the yield of the desired full-length product. Methods and compositions of the disclosure described herein provide for controlled deblocking conditions limiting undesired depurination reactions. In some instances, the device bound polynucleotide is washed after deblocking. In some instances, efficient washing after deblocking contributes to synthesized polynucleotides having a low error rate.


Methods for the synthesis of polynucleotides typically involve an iterating sequence of the following steps: application of a protected monomer to an actively functionalized surface (e.g., locus) to link with either the activated surface, a linker or with a previously deprotected monomer; deprotection of the applied monomer so that it is reactive with a subsequently applied protected monomer; and application of another protected monomer for linking. One or more intermediate steps include oxidation or sulfurization. In some instances, one or more wash steps precede or follow one or all of the steps.
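
The iterating sequence of steps described above can be illustrated as an ordered list of operations. The following Python sketch is illustrative only: the function name, step labels, and the choice to treat the 3′-most nucleoside as the first monomer attached via a linker are assumptions made for this example, and no reaction times or reagents are modeled.

```python
from typing import List


def synthesis_steps(sequence_5to3: str,
                    capping: bool = True,
                    sulfurize: bool = False,
                    wash: bool = True) -> List[str]:
    """List the steps to build a polynucleotide written 5' to 3'.

    Phosphoramidite synthesis proceeds 3' to 5', so the 3'-most base
    is attached first and the remaining bases are added in reverse order.
    """
    if not sequence_5to3:
        return []
    order = sequence_5to3[::-1]  # order of addition: 3' end first
    steps = [f"attach {order[0]} to surface via linker"]
    for base in order[1:]:
        cycle = ["deblock (remove 5' protecting group)", f"couple {base}"]
        if capping:
            cycle.append("cap unreacted 5'-OH")
        cycle.append("sulfurize" if sulfurize else "oxidize")
        for step in cycle:
            steps.append(step)
            if wash:
                steps.append("wash")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(synthesis_steps("ACG"), start=1):
        print(number, step)
```

For the sequence ACG, the sketch attaches G first, then couples C and finally A, with each coupling preceded by deblocking and followed by optional capping and either oxidation or sulfurization.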


Methods for phosphoramidite-based polynucleotide synthesis comprise a series of chemical steps. In some instances, one or more steps of a synthesis method involve reagent cycling, where one or more steps of the method comprise application to the device of a reagent useful for the step. For example, reagents are cycled by a series of liquid deposition and vacuum drying steps. For substrates comprising three-dimensional features such as wells, microwells, channels and the like, reagents are optionally passed through one or more regions of the device via the wells and/or channels.


Methods and systems described herein relate to polynucleotide synthesis devices for the synthesis of polynucleotides. The synthesis may be in parallel. For example, at least or about at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 1000, 10000, 50000, 75000, 100000 or more polynucleotides can be synthesized in parallel. The total number of polynucleotides that may be synthesized in parallel may be from 2-100000, 3-50000, 4-10000, 5-1000, 6-900, 7-850, 8-800, 9-750, 10-700, 11-650, 12-600, 13-550, 14-500, 15-450, 16-400, 17-350, 18-300, 19-250, 20-200, 21-150, 22-100, 23-50, 24-45, 25-40, 30-35. Those of skill in the art appreciate that the total number of polynucleotides synthesized in parallel may fall within any range bound by any of these values, for example 25-100. The total number of polynucleotides synthesized in parallel may fall within any range defined by any of the values serving as endpoints of the range. Total molar mass of polynucleotides synthesized within the device or the molar mass of each of the polynucleotides may be at least or at least about 10, 20, 30, 40, 50, 100, 250, 500, 750, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 25000, 50000, 75000, 100000 picomoles, or more. The length of each of the polynucleotides or average length of the polynucleotides within the device may be at least or about at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 150, 200, 300, 400, 500 nucleotides, or more. The length of each of the polynucleotides or average length of the polynucleotides within the device may be at most or about at most 500, 400, 300, 200, 150, 100, 50, 45, 35, 30, 25, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10 nucleotides, or less. The length of each of the polynucleotides or average length of the polynucleotides within the device may fall within 10-500, 9-400, 11-300, 12-200, 13-150, 14-100, 15-50, 16-45, 17-40, 18-35, 19-25. Those of skill in the art appreciate that the length of each of the polynucleotides or average length of the polynucleotides within the device may fall within any range bound by any of these values, for example 100-300. The length of each of the polynucleotides or average length of the polynucleotides within the device may fall within any range defined by any of the values serving as endpoints of the range.


Methods for polynucleotide synthesis on a surface provided herein allow for synthesis at a fast rate. As an example, at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 125, 150, 175, 200 nucleotides per hour, or more are synthesized. Nucleotides include adenine, guanine, thymine, cytosine, uridine building blocks, or analogs/modified versions thereof. In some instances, libraries of polynucleotides are synthesized in parallel on a substrate. For example, a device comprising about or at least about 100; 1,000; 10,000; 30,000; 75,000; 100,000; 1,000,000; 2,000,000; 3,000,000; 4,000,000; or 5,000,000 resolved loci is able to support the synthesis of at least the same number of distinct polynucleotides, wherein a polynucleotide encoding a distinct sequence is synthesized on a resolved locus. In some instances, a library of polynucleotides is synthesized on a device with low error rates described herein in less than about three months, two months, one month, three weeks, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 days, 24 hours or less. In some instances, larger nucleic acids assembled from a polynucleotide library synthesized with low error rate using the substrates and methods described herein are prepared in less than about three months, two months, one month, three weeks, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 days, 24 hours or less.


In some instances, methods described herein provide for generation of a library of nucleic acids comprising variant nucleic acids differing at a plurality of codon sites. In some instances, a nucleic acid may have 1 site, 2 sites, 3 sites, 4 sites, 5 sites, 6 sites, 7 sites, 8 sites, 9 sites, 10 sites, 11 sites, 12 sites, 13 sites, 14 sites, 15 sites, 16 sites, 17 sites, 18 sites, 19 sites, 20 sites, 30 sites, 40 sites, 50 sites, or more of variant codon sites.


In some instances, the one or more sites of variant codon sites may be adjacent. In some instances, the one or more sites of variant codon sites may not be adjacent and separated by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codons.


In some instances, a nucleic acid may comprise multiple sites of variant codon sites, wherein all the variant codon sites are adjacent to one another, forming a stretch of variant codon sites. In some instances, a nucleic acid may comprise multiple sites of variant codon sites, wherein none of the variant codon sites are adjacent to one another. In some instances, a nucleic acid may comprise multiple sites of variant codon sites, wherein some of the variant codon sites are adjacent to one another, forming a stretch of variant codon sites, and some of the variant codon sites are not adjacent to one another.
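
A library of predetermined variants differing at a plurality of codon sites, adjacent or non-adjacent, can be enumerated computationally before synthesis. The following Python sketch is illustrative only: the function name, the zero-based codon indexing, and the example reference sequence and codon choices are hypothetical and are not taken from this disclosure.

```python
from itertools import product
from typing import Dict, List


def variant_library(reference: str, site_codons: Dict[int, List[str]]) -> List[str]:
    """Enumerate every combination of the allowed codons at the variant sites.

    reference   -- coding sequence whose length is a multiple of 3
    site_codons -- maps a zero-based codon index to the codons allowed there
    """
    if len(reference) % 3 != 0:
        raise ValueError("reference length must be a multiple of 3")
    codons = [reference[i:i + 3] for i in range(0, len(reference), 3)]
    sites = sorted(site_codons)
    for site in sites:
        if not 0 <= site < len(codons):
            raise ValueError(f"codon site {site} is outside the reference")
    library = []
    for choice in product(*(site_codons[site] for site in sites)):
        variant = list(codons)
        for site, codon in zip(sites, choice):
            variant[site] = codon
        library.append("".join(variant))
    return library


if __name__ == "__main__":
    # Hypothetical 4-codon reference with two non-adjacent variant sites (1 and 3).
    ref = "ATGGCTAAATGG"
    lib = variant_library(ref, {1: ["GCT", "GAT"], 3: ["TGG", "TTT", "CAT"]})
    print(len(lib), "variants")
    for seq in lib:
        print(seq)
```

The library size is the product of the number of codons allowed at each variant site, so each added site multiplies the number of distinct polynucleotides to be synthesized.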


Referring to the Figures, FIG. 7 illustrates an exemplary process workflow for synthesis of nucleic acids (e.g., genes) from shorter nucleic acids. The workflow is divided generally into phases: (1) de novo synthesis of a single stranded nucleic acid library, (2) joining nucleic acids to form larger fragments, (3) error correction, (4) quality control, and (5) shipment. Prior to de novo synthesis, an intended nucleic acid sequence or group of nucleic acid sequences is preselected. For example, a group of genes is preselected for generation.


Once large nucleic acids for generation are selected, a predetermined library of nucleic acids is designed for de novo synthesis. Various suitable methods are known for generating high density polynucleotide arrays. In the workflow example, a device surface layer is provided. In the example, chemistry of the surface is altered in order to improve the polynucleotide synthesis process. Areas of low surface energy are generated to repel liquid while areas of high surface energy are generated to attract liquids. The surface itself may be in the form of a planar surface or contain variations in shape, such as protrusions or microwells which increase surface area. In the workflow example, high surface energy molecules selected serve a dual function of supporting DNA chemistry, as disclosed in International Patent Application Publication WO/2015/021080, which is herein incorporated by reference in its entirety.


In situ preparation of polynucleotide arrays is performed on a solid support and utilizes a single nucleotide extension process to extend multiple oligomers in parallel. A deposition device, such as a material deposition device, is designed to release reagents in a step wise fashion such that multiple polynucleotides extend, in parallel, one residue at a time to generate oligomers with a predetermined nucleic acid sequence 702. In some instances, polynucleotides are cleaved from the surface at this stage. Cleavage includes gas cleavage, e.g., with ammonia or methylamine.


The generated polynucleotide libraries are placed in a reaction chamber. In this exemplary workflow, the reaction chamber (also referred to as “nanoreactor”) is a silicon coated well, containing PCR reagents and lowered onto the polynucleotide library 703. Prior to or after the sealing 704 of the polynucleotides, a reagent is added to release the polynucleotides from the substrate. In the exemplary workflow, the polynucleotides are released subsequent to sealing of the nanoreactor 705. Once released, fragments of single stranded polynucleotides hybridize in order to span an entire long range sequence of DNA. Partial hybridization 705 is possible because each synthesized polynucleotide is designed to have a small portion overlapping with at least one other polynucleotide in the pool.


After hybridization, a PCA reaction is commenced. During the polymerase cycles, the polynucleotides anneal to complementary fragments and gaps are filled in by a polymerase. Each cycle increases the length of various fragments randomly depending on which polynucleotides find each other. Complementarity amongst the fragments allows for forming a complete large span of double stranded DNA 706.
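
The role of overlapping portions in spanning a long sequence can be illustrated with a simplified tiling model. The following Python sketch is illustrative only: it models fragments on a single strand with a fixed overlap length, whereas an actual PCA reaction involves complementary strands and polymerase extension; the function names and parameters are hypothetical.

```python
from typing import List


def tile_fragments(sequence: str, length: int, overlap: int) -> List[str]:
    """Split a sequence into fragments that each overlap the next by `overlap` bases."""
    if not 0 < overlap < length:
        raise ValueError("overlap must be positive and shorter than fragment length")
    step = length - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(sequence[start:start + length])
        if start + length >= len(sequence):
            break
        start += step
    return fragments


def merge_fragments(fragments: List[str], overlap: int) -> str:
    """Rejoin ordered fragments, checking that each shared region matches."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        if assembled[-overlap:] != fragment[:overlap]:
            raise ValueError("adjacent fragments do not share the expected overlap")
        assembled += fragment[overlap:]
    return assembled


if __name__ == "__main__":
    target = "ATGCGTACGTTAGCCTAGGATCCA"  # hypothetical 24-base target
    parts = tile_fragments(target, length=10, overlap=4)
    print(parts)
    print(merge_fragments(parts, overlap=4) == target)
```

In this model each fragment shares a short region with its neighbor, which mirrors the design requirement that every synthesized polynucleotide overlap at least one other polynucleotide in the pool.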


After PCA is complete, the nanoreactor is separated from the device 707 and positioned for interaction with a device having primers for PCR 708. After sealing, the nanoreactor is subject to PCR 709 and the larger nucleic acids are amplified. After PCR 710, the nanochamber is opened 711, error correction reagents are added 712, the chamber is sealed 713 and an error correction reaction occurs to remove mismatched base pairs and/or strands with poor complementarity from the double stranded PCR amplification products 714. The nanoreactor is opened and separated 715. Error corrected product is next subject to additional processing steps, such as PCR and molecular bar coding, and then packaged 722 for shipment 723.


In some instances, quality control measures are taken. After error correction, quality control steps include for example interaction with a wafer having sequencing primers for amplification of the error corrected product 716, sealing the wafer to a chamber containing error corrected amplification product 717, and performing an additional round of amplification 718. The nanoreactor is opened 719 and the products are pooled 720 and sequenced 721. After an acceptable quality control determination is made, the packaged product 722 is approved for shipment 723.


In some instances, a nucleic acid generated by a workflow such as that in FIG. 7 is subject to mutagenesis using overlapping primers disclosed herein. In some instances, a library of primers is generated by in situ preparation on a solid support, utilizing a single nucleotide extension process to extend multiple oligomers in parallel. A deposition device, such as a material deposition device, is designed to release reagents in a step wise fashion such that multiple polynucleotides extend, in parallel, one residue at a time to generate oligomers with a predetermined nucleic acid sequence 702.


Computer Systems


Any of the systems described herein may be operably linked to a computer and may be automated through a computer either locally or remotely. In various instances, the methods and systems of the disclosure may further comprise software programs on computer systems and use thereof. Accordingly, computerized control for the synchronization of the dispense/vacuum/refill functions, such as orchestrating and synchronizing the material deposition device movement, dispense action, and vacuum actuation, is within the bounds of the disclosure. The computer systems may be programmed to interface between the user specified base sequence and the position of a material deposition device to deliver the correct reagents to specified regions of the substrate.
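By way of non-limiting illustration, the interface between a user specified base sequence and the deposition position can be sketched as a per-cycle schedule. The sketch assumes standard phosphoramidite chemistry, in which chains grow from the 3′ end toward the 5′ end; the locus coordinates, the data layout, and the function name are hypothetical and are not taken from the disclosure.

```python
# Sketch: turn user-specified sequences (written 5'->3') into a per-cycle map of
# which base is delivered to which locus. Loci and names are illustrative only.

def deposition_schedule(sequences_by_locus):
    """Return a list with one dict per cycle: {base: [loci receiving that base]}."""
    n_cycles = max(len(seq) for seq in sequences_by_locus.values())
    schedule = []
    for cycle in range(n_cycles):
        step = {}
        for locus, seq in sequences_by_locus.items():
            if cycle < len(seq):
                # Assumption: synthesis proceeds 3'->5', so read from the 3' end.
                base = seq[-1 - cycle]
                step.setdefault(base, []).append(locus)
        schedule.append(step)
    return schedule


# Two hypothetical loci addressed by (row, column).
loci = {(0, 0): "ACG", (0, 1): "TCG"}
schedule = deposition_schedule(loci)
```

Each entry of the schedule corresponds to one extension step, so all loci advance by one residue per cycle in parallel.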


The computer system 800 illustrated in FIG. 8 may be understood as a logical apparatus that can read instructions from media 811 and/or a network port 805, which can optionally be connected to server 809 having fixed media 812. The system, such as shown in FIG. 8 can include a CPU 801, disk drives 803, optional input devices such as keyboard 815 and/or mouse 816 and optional monitor 807. Data communication can be achieved through the indicated communication medium to a server at a local or a remote location. The communication medium can include any means of transmitting and/or receiving data. For example, the communication medium can be a network connection, a wireless connection or an internet connection. Such a connection can provide for communication over the World Wide Web. It is envisioned that data relating to the present disclosure can be transmitted over such networks or connections for reception and/or review by a party 822 as illustrated in FIG. 8.



FIG. 9 is a block diagram illustrating a first example architecture of a computer system 900 that can be used in connection with example instances of the present disclosure. As depicted in FIG. 9, the example computer system can include a processor 902 for processing instructions. Non-limiting examples of processors include: Intel Xeon™ processor, AMD Opteron™ processor, Samsung 32-bit RISC ARM 1176JZ(F)-S v1.0™ processor, ARM Cortex-A8 Samsung S5PC100™ processor, ARM Cortex-A8 Apple A4™ processor, Marvell PXA 930™ processor, or a functionally-equivalent processor. Multiple threads of execution can be used for parallel processing. In some instances, multiple processors or processors with multiple cores can also be used, whether in a single computer system, in a cluster, or distributed across systems over a network comprising a plurality of computers, cell phones, and/or personal data assistant devices.


As illustrated in FIG. 9, a high speed cache 904 can be connected to, or incorporated in, the processor 902 to provide a high speed memory for instructions or data that have been recently, or are frequently, used by processor 902. The processor 902 is connected to a north bridge 906 by a processor bus 908. The north bridge 906 is connected to random access memory (RAM) 910 by a memory bus 912 and manages access to the RAM 910 by the processor 902. The north bridge 906 is also connected to a south bridge 914 by a chipset bus 916. The south bridge 914 is, in turn, connected to a peripheral bus 918. The peripheral bus can be, for example, PCI, PCI-X, PCI Express, or other peripheral bus. The north bridge and south bridge are often referred to as a processor chipset and manage data transfer between the processor, RAM, and peripheral components on the peripheral bus 918. In some alternative architectures, the functionality of the north bridge can be incorporated into the processor instead of using a separate north bridge chip. In some instances, system 900 can include an accelerator card 922 attached to the peripheral bus 918. The accelerator can include field programmable gate arrays (FPGAs) or other hardware for accelerating certain processing. For example, an accelerator can be used for adaptive data restructuring or to evaluate algebraic expressions used in extended set processing.


Software and data are stored in external storage 924 and can be loaded into RAM 910 and/or cache 904 for use by the processor. The system 900 includes an operating system for managing system resources; non-limiting examples of operating systems include: Linux, Windows™, MACOS™, BlackBerry OS™, iOS™, and other functionally-equivalent operating systems, as well as application software running on top of the operating system for managing data storage and optimization in accordance with example instances of the present disclosure. In this example, system 900 also includes network interface cards (NICs) 920 and 921 connected to the peripheral bus for providing network interfaces to external storage, such as Network Attached Storage (NAS) and other computer systems that can be used for distributed parallel processing.



FIG. 10 is a diagram showing a network 1000 with a plurality of computer systems 1002a, and 1002b, a plurality of cell phones and personal data assistants 1002c, and Network Attached Storage (NAS) 1004a, and 1004b. In example instances, systems 1002a, 1002b, and 1002c can manage data storage and optimize data access for data stored in Network Attached Storage (NAS) 1004a and 1004b. A mathematical model can be used for the data and be evaluated using distributed parallel processing across computer systems 1002a, and 1002b, and cell phone and personal data assistant systems 1002c. Computer systems 1002a, and 1002b, and cell phone and personal data assistant systems 1002c can also provide parallel processing for adaptive data restructuring of the data stored in Network Attached Storage (NAS) 1004a and 1004b. FIG. 10 illustrates an example only, and a wide variety of other computer architectures and systems can be used in conjunction with the various instances of the present disclosure. For example, a blade server can be used to provide parallel processing. Processor blades can be connected through a back plane to provide parallel processing. Storage can also be connected to the back plane or as Network Attached Storage (NAS) through a separate network interface. In some example instances, processors can maintain separate memory spaces and transmit data through network interfaces, back plane or other connectors for parallel processing by other processors. In other instances, some or all of the processors can use a shared virtual address memory space.



FIG. 11 is a block diagram of a multiprocessor computer system 1100 using a shared virtual address memory space in accordance with an example instance. The system includes a plurality of processors 1102a-f that can access a shared memory subsystem 1104. The system incorporates a plurality of programmable hardware memory algorithm processors (MAPs) 1106a-f in the memory subsystem 1104. Each MAP 1106a-f can comprise a memory 1108a-f and one or more field programmable gate arrays (FPGAs) 1110a-f. The MAP provides a configurable functional unit and particular algorithms or portions of algorithms can be provided to the FPGAs 1110a-f for processing in close coordination with a respective processor. For example, the MAPs can be used to evaluate algebraic expressions regarding the data model and to perform adaptive data restructuring in example instances. In this example, each MAP is globally accessible by all of the processors for these purposes. In one configuration, each MAP can use Direct Memory Access (DMA) to access an associated memory 1108a-f, allowing it to execute tasks independently of, and asynchronously from, the respective microprocessor 1102a-f. In this configuration, a MAP can feed results directly to another MAP for pipelining and parallel execution of algorithms.


The above computer architectures and systems are examples only, and a wide variety of other computer, cell phone, and personal data assistant architectures and systems can be used in connection with example instances, including systems using any combination of general processors, co-processors, FPGAs and other programmable logic devices, system on chips (SOCs), application specific integrated circuits (ASICs), and other processing and logic elements. In some instances, all or part of the computer system can be implemented in software or hardware. Any variety of data storage media can be used in connection with example instances, including random access memory, hard drives, flash memory, tape drives, disk arrays, Network Attached Storage (NAS) and other local or distributed data storage devices and systems.


In example instances, the computer system can be implemented using software modules executing on any of the above or other computer architectures and systems. In other instances, the functions of the system can be implemented partially or completely in firmware, programmable logic devices such as field programmable gate arrays (FPGAs) as referenced in FIG. 9, system on chips (SOCs), application specific integrated circuits (ASICs), or other processing and logic elements. For example, the Set Processor and Optimizer can be implemented with hardware acceleration through the use of a hardware accelerator card, such as accelerator card 922 illustrated in FIG. 9.


The following examples are set forth to illustrate more clearly the principle and practice of embodiments disclosed herein to those skilled in the art and are not to be construed as limiting the scope of any claimed embodiments. Unless otherwise stated, all parts and percentages are on a weight basis.


EXAMPLES

The following examples are given for the purpose of illustrating various embodiments of the disclosure and are not meant to limit the present disclosure in any fashion. The present examples, along with the methods described herein are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the disclosure. Changes therein and other uses which are encompassed within the spirit of the disclosure as defined by the scope of the claims will occur to those skilled in the art.


Example 1: Functionalization of a Device Surface

A device was functionalized to support the attachment and synthesis of a library of polynucleotides. The device surface was first wet cleaned using a piranha solution comprising 90% H2SO4 and 10% H2O2 for 20 minutes. The device was rinsed in several beakers with DI water, held under a DI water gooseneck faucet for 5 min, and dried with N2. The device was subsequently soaked in NH4OH (1:100; 3 mL:300 mL) for 5 min, rinsed with DI water using a handgun, soaked in three successive beakers with DI water for 1 min each, and then rinsed again with DI water using the handgun. The device was then plasma cleaned by exposing the device surface to O2. A SAMCO PC-300 instrument was used to plasma etch O2 at 250 watts for 1 min in downstream mode.


The cleaned device surface was actively functionalized with a solution comprising N-(3-triethoxysilylpropyl)-4-hydroxybutyramide using a YES-1224P vapor deposition oven system with the following parameters: 0.5 to 1 torr, 60 min, 70° C., 135° C. vaporizer. The device surface was resist coated using a Brewer Science 200X spin coater. SPR™ 3612 photoresist was spin coated on the device at 2500 rpm for 40 sec. The device was pre-baked for 30 min at 90° C. on a Brewer hot plate. The device was subjected to photolithography using a Karl Suss MA6 mask aligner instrument. The device was exposed for 2.2 sec and developed for 1 min in MSF 26A. Remaining developer was rinsed with the handgun and the device soaked in water for 5 min. The device was baked for 30 min at 100° C. in the oven, followed by visual inspection for lithography defects using a Nikon L200. A descum process was used to remove residual resist using the SAMCO PC-300 instrument to O2 plasma etch at 250 watts for 1 min.


The device surface was passively functionalized with a 100 μL solution of perfluorooctyltrichlorosilane mixed with 10 μL light mineral oil. The device was placed in a chamber, pumped for 10 min, and then the valve was closed to the pump and left to stand for 10 min. The chamber was vented to air. The device was resist stripped by performing two soaks for 5 min in 500 mL NMP at 70° C. with ultrasonication at maximum power (9 on Crest system). The device was then soaked for 5 min in 500 mL isopropanol at room temperature with ultrasonication at maximum power. The device was dipped in 300 mL of 200 proof ethanol and blown dry with N2. The functionalized surface was activated to serve as a support for polynucleotide synthesis.


Example 2: Synthesis of a 50-Mer Sequence on an Oligonucleotide Synthesis Device

A two dimensional oligonucleotide synthesis device was assembled into a flowcell, which was connected to a DNA synthesizer (Applied Biosystems ABI 394 DNA Synthesizer). The two-dimensional oligonucleotide synthesis device, uniformly functionalized with N-(3-TRIETHOXYSILYLPROPYL)-4-HYDROXYBUTYRAMIDE (Gelest), was used to synthesize an exemplary polynucleotide of 50 bp (“50-mer polynucleotide”) using polynucleotide synthesis methods described herein.


The sequence of the 50-mer was as described in SEQ ID NO.: 48. 5′AGACAATCAACCATTTGGGGTGGACAGCCTTGACCTCTAGACTTCGGCAT##TTTTTTTTTT3′ (SEQ ID NO.: 48), where # denotes Thymidine-succinyl hexamide CED phosphoramidite (CLP-2244 from ChemGenes), which is a cleavable linker enabling the release of oligos from the surface during deprotection.


The synthesis was done using standard DNA synthesis chemistry (coupling, capping, oxidation, and deblocking) according to the protocol in Table 4 and an ABI synthesizer.









TABLE 4
Synthesis protocols

General DNA Synthesis
Process Name                                        Process Step                              Time (sec)
--------------------------------------------------  ----------------------------------------  ----------
WASH (Acetonitrile Wash Flow)                       Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  23
                                                    N2 System Flush                           4
                                                    Acetonitrile System Flush                 4
DNA BASE ADDITION (Phosphoramidite +                Activator Manifold Flush                  2
Activator Flow)                                     Activator to Flowcell                     6
                                                    Activator + Phosphoramidite to Flowcell   6
                                                    Activator to Flowcell                     0.5
                                                    Activator + Phosphoramidite to Flowcell   5
                                                    Activator to Flowcell                     0.5
                                                    Activator + Phosphoramidite to Flowcell   5
                                                    Activator to Flowcell                     0.5
                                                    Activator + Phosphoramidite to Flowcell   5
                                                    Incubate for 25 sec                       25
WASH (Acetonitrile Wash Flow)                       Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  15
                                                    N2 System Flush                           4
                                                    Acetonitrile System Flush                 4
DNA BASE ADDITION (Phosphoramidite +                Activator Manifold Flush                  2
Activator Flow)                                     Activator to Flowcell                     5
                                                    Activator + Phosphoramidite to Flowcell   18
                                                    Incubate for 25 sec                       25
WASH (Acetonitrile Wash Flow)                       Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  15
                                                    N2 System Flush                           4
                                                    Acetonitrile System Flush                 4
CAPPING (CapA + B, 1:1, Flow)                       CapA + B to Flowcell                      15
WASH (Acetonitrile Wash Flow)                       Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  15
                                                    Acetonitrile System Flush                 4
OXIDATION (Oxidizer Flow)                           Oxidizer to Flowcell                      18
WASH (Acetonitrile Wash Flow)                       Acetonitrile System Flush                 4
                                                    N2 System Flush                           4
                                                    Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  15
                                                    Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  15
                                                    N2 System Flush                           4
                                                    Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  23
                                                    N2 System Flush                           4
                                                    Acetonitrile System Flush                 4
DEBLOCKING (Deblock Flow)                           Deblock to Flowcell                       36
WASH (Acetonitrile Wash Flow)                       Acetonitrile System Flush                 4
                                                    N2 System Flush                           4
                                                    Acetonitrile System Flush                 4
                                                    Acetonitrile to Flowcell                  18
                                                    N2 System Flush                           4.13
                                                    Acetonitrile System Flush                 4.13
                                                    Acetonitrile to Flowcell                  15









The phosphoramidite/activator combination was delivered similar to the delivery of bulk reagents through the flowcell. No drying steps were performed as the environment stays “wet” with reagent the entire time.


The flow restrictor was removed from the ABI 394 synthesizer to enable faster flow. Without the flow restrictor, flow rates were roughly 100 uL/sec for amidites (0.1M in ACN), Activator (0.25M Benzoylthiotetrazole (“BTT”; 30-3070-xx from GlenResearch) in ACN), and Ox (0.02M I2 in 20% pyridine, 10% water, and 70% THF); roughly 200 uL/sec for acetonitrile (“ACN”) and capping reagents (1:1 mix of CapA and CapB, wherein CapA is acetic anhydride in THF/Pyridine and CapB is 16% 1-methylimidazole in THF); and roughly 300 uL/sec for Deblock (3% dichloroacetic acid in toluene), compared to ~50 uL/sec for all reagents with the flow restrictor. The time to completely push out Oxidizer was observed, the timing for chemical flow times was adjusted accordingly, and an extra ACN wash was introduced between different chemicals. After polynucleotide synthesis, the chip was deprotected in gaseous ammonia overnight at 75 psi. Five drops of water were applied to the surface to recover polynucleotides. The recovered polynucleotides were then analyzed on a BioAnalyzer small RNA chip.
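By way of non-limiting illustration, the approximate flow rates above can be combined with the step times of Table 4 to estimate the reagent volume delivered per step. Because the flow rates are stated only as rough values, the results are estimates; the dictionary layout and function name below are hypothetical.

```python
# Estimate reagent volume per step as (approximate flow rate) x (step time).
# Flow rates are the rough unrestricted values stated in the text (uL/sec).
FLOW_UL_PER_SEC = {
    "amidite": 100,
    "activator": 100,
    "oxidizer": 100,
    "acetonitrile": 200,
    "capping": 200,
    "deblock": 300,
}

# Selected single-reagent steps and their times (sec) from Table 4.
STEPS = [
    ("CapA + B to Flowcell", "capping", 15),
    ("Oxidizer to Flowcell", "oxidizer", 18),
    ("Deblock to Flowcell", "deblock", 36),
]


def step_volume_ul(reagent, seconds):
    """Return the estimated delivered volume in microliters."""
    return FLOW_UL_PER_SEC[reagent] * seconds


volumes = {name: step_volume_ul(reagent, sec) for name, reagent, sec in STEPS}
```

Under these assumptions, the 36 second deblock step corresponds to roughly 10.8 mL of Deblock per cycle.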


Example 3: Synthesis of a 100-Mer Sequence on an Oligonucleotide Synthesis Device

The same process as described in Example 2 for the synthesis of the 50-mer sequence was used for the synthesis of a 100-mer polynucleotide (“100-mer polynucleotide”; 5′CGGGATCCTTATCGTCATCGTCGTACAGATCCCGACCCATTTGCTGTCCACCAGTCATGCTAGCCATACCATGATGATGATGATGATGAGAACCCCGCAT##TTTTTTTTTT3′, where # denotes Thymidine-succinyl hexamide CED phosphoramidite (CLP-2244 from ChemGenes); SEQ ID NO.: 49) on two different silicon chips, the first one uniformly functionalized with N-(3-TRIETHOXYSILYLPROPYL)-4-HYDROXYBUTYRAMIDE and the second one functionalized with 5/95 mix of 11-acetoxyundecyltriethoxysilane and n-decyltriethoxysilane, and the polynucleotides extracted from the surface were analyzed on a BioAnalyzer instrument.


All ten samples from the two chips were further PCR amplified using a forward (5′ATGCGGGGTTCTCATCATC3′; SEQ ID NO.: 50) and a reverse (5′CGGGATCCTTATCGTCATCG3′; SEQ ID NO.: 51) primer in a 50 uL PCR mix (25 uL NEB Q5 mastermix, 2.5 uL 10 uM Forward primer, 2.5 uL 10 uM Reverse primer, 1 uL polynucleotide extracted from the surface, and water up to 50 uL) using the following thermal cycling program:


98° C., 30 sec


98° C., 10 sec; 63° C., 10 sec; 72° C., 10 sec; repeat 12 cycles


72° C., 2 min
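By way of non-limiting illustration, the primer pair and thermal cycling program above can be checked computationally. The sketch locates each primer on the 100-mer of SEQ ID NO.: 49 (linker and poly-T tail omitted) and sums the programmed hold times; ramp times are ignored, and the function and variable names are hypothetical.

```python
# Check where the primers of SEQ ID NOS.: 50 and 51 sit on the 100-mer of
# SEQ ID NO.: 49, and total the hold times of the thermal cycling program.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

TEMPLATE = (
    "CGGGATCCTTATCGTCATCGTCGTACAGATCCCGACCCATTTGCTGTCCACCAGTCATG"
    "CTAGCCATACCATGATGATGATGATGATGAGAACCCCGCAT"
)
FORWARD = "ATGCGGGGTTCTCATCATC"   # SEQ ID NO.: 50
REVERSE = "CGGGATCCTTATCGTCATCG"  # SEQ ID NO.: 51


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_site(template, primer):
    """Return ('+', index) if the primer sequence occurs in the template as
    written, ('-', index) if its reverse complement does, otherwise None."""
    idx = template.find(primer)
    if idx >= 0:
        return "+", idx
    idx = template.find(revcomp(primer))
    if idx >= 0:
        return "-", idx
    return None


# (repeats, [(temperature in deg C, hold time in sec), ...])
PROGRAM = [
    (1, [(98, 30)]),
    (12, [(98, 10), (63, 10), (72, 10)]),
    (1, [(72, 120)]),
]
total_hold_sec = sum(reps * sum(t for _, t in steps) for reps, steps in PROGRAM)
```

In this check the reverse primer is identical to the first 20 bases of the 100-mer, and the forward primer is the reverse complement of its last 19 bases, so the pair flanks the full 100-mer.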


The PCR products were also run on a BioAnalyzer, demonstrating sharp peaks at the 100-mer position. Next, the PCR amplified samples were cloned, and Sanger sequenced. Table 5 summarizes the results from the Sanger sequencing for samples taken from spots 1-5 from chip 1 and for samples taken from spots 6-10 from chip 2.









TABLE 5
Sequencing results

Spot    Error rate    Cycle efficiency
----    ----------    ----------------
1       1/763 bp      99.87%
2       1/824 bp      99.88%
3       1/780 bp      99.87%
4       1/429 bp      99.77%
5       1/1525 bp     99.93%
6       1/1615 bp     99.94%
7       1/531 bp      99.81%
8       1/1769 bp     99.94%
9       1/854 bp      99.88%
10      1/1451 bp     99.93%










Thus, the high quality and uniformity of the synthesized polynucleotides were repeated on two chips with different surface chemistries. Overall, 89% of the 100-mers that were sequenced were perfect sequences with no errors, corresponding to 233 out of 262.
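By way of non-limiting illustration, the figures reported for the sequenced 100-mers can be reproduced arithmetically. The relation used below, cycle efficiency = 1 − (error rate), is inferred from the tabulated values of Table 5 rather than stated in the text, and the function names are hypothetical.

```python
# Reproduce the cycle efficiencies of Table 5 and the overall fraction of
# perfect sequences. The formula is an inferred relation, not a stated one.

# Error rates from Table 5, expressed as one error per N bp, keyed by spot.
BP_PER_ERROR = {
    1: 763, 2: 824, 3: 780, 4: 429, 5: 1525,
    6: 1615, 7: 531, 8: 1769, 9: 854, 10: 1451,
}


def cycle_efficiency_percent(bp_per_error):
    """Per-base efficiency implied by an error rate of 1 per bp_per_error."""
    return round(100.0 * (1.0 - 1.0 / bp_per_error), 2)


def percent_perfect(perfect, total):
    """Whole-number percentage of error-free sequences."""
    return round(100.0 * perfect / total)


efficiencies = {spot: cycle_efficiency_percent(n) for spot, n in BP_PER_ERROR.items()}
overall = percent_perfect(233, 262)
```

For example, an error rate of 1/763 bp gives 1 − 1/763 ≈ 0.9987, which matches the 99.87% reported for spot 1.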


Table 6 summarizes error characteristics for the sequences obtained from the polynucleotide samples from spots 1-10.









TABLE 6
Error characteristics

Sample ID/Spot no.            OSA_0046/1          OSA_0047/2          OSA_0048/3          OSA_0049/4          OSA_0050/5          OSA_0051/6
----------------------------  ------------------  ------------------  ------------------  ------------------  ------------------  ------------------
Total Sequences               32                  32                  32                  32                  32                  32
Sequencing Quality            25 of 28            27 of 27            26 of 30            21 of 23            25 of 26            29 of 30
Oligo Quality                 23 of 25            25 of 27            22 of 26            18 of 21            24 of 25            25 of 29
ROI Match Count               2500                2698                2561                2122                2499                2666
ROI Mutation                  2                   2                   1                   3                   1                   0
ROI Multi Base Deletion       0                   0                   0                   0                   0                   0
ROI Small Insertion           1                   0                   0                   0                   0                   0
ROI Single Base Deletion      0                   0                   0                   0                   0                   0
Large Deletion Count          0                   0                   1                   0                   0                   1
Mutation: G > A               2                   2                   1                   2                   1                   0
Mutation: T > C               0                   0                   0                   1                   0                   0
ROI Error Count               3                   2                   2                   3                   1                   1
ROI Error Rate                Err: ~1 in 834      Err: ~1 in 1350     Err: ~1 in 1282     Err: ~1 in 708      Err: ~1 in 2500     Err: ~1 in 2667
ROI Minus Primer Error Rate   MP Err: ~1 in 763   MP Err: ~1 in 824   MP Err: ~1 in 780   MP Err: ~1 in 429   MP Err: ~1 in 1525  MP Err: ~1 in 1615

Sample ID/Spot no.            OSA_0052/7          OSA_0053/8          OSA_0054/9          OSA_0055/10
----------------------------  ------------------  ------------------  ------------------  ------------------
Total Sequences               32                  32                  32                  32
Sequencing Quality            27 of 31            29 of 31            28 of 29            25 of 28
Oligo Quality                 22 of 27            28 of 29            26 of 28            20 of 25
ROI Match Count               2625                2899                2798                2348
ROI Mutation                  2                   1                   2                   1
ROI Multi Base Deletion       0                   0                   0                   0
ROI Small Insertion           0                   0                   0                   0
ROI Single Base Deletion      0                   0                   0                   0
Large Deletion Count          1                   0                   0                   0
Mutation: G > A               2                   1                   2                   1
Mutation: T > C               0                   0                   0                   0
ROI Error Count               3                   1                   2                   1
ROI Error Rate                Err: ~1 in 876      Err: ~1 in 2900     Err: ~1 in 1400     Err: ~1 in 2349
ROI Minus Primer Error Rate   MP Err: ~1 in 531   MP Err: ~1 in 1769  MP Err: ~1 in 854   MP Err: ~1 in 1451
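By way of non-limiting illustration, the ROI Error Rate row of Table 6 can be related to the ROI Match Count and ROI Error Count rows. The relation used below, one error per (matches + errors) / errors bases with half-up rounding, is inferred from the tabulated values and is not stated in the text; the function name is hypothetical.

```python
# Relate "ROI Error Rate" to "ROI Match Count" and "ROI Error Count" in Table 6.
# The formula is an inferred relation that happens to reproduce the table.

# (ROI match count, ROI error count) per spot, copied from Table 6.
ROI_COUNTS = {
    1: (2500, 3), 2: (2698, 2), 3: (2561, 2), 4: (2122, 3), 5: (2499, 1),
    6: (2666, 1), 7: (2625, 3), 8: (2899, 1), 9: (2798, 2), 10: (2348, 1),
}


def bases_per_error(matches, errors):
    """Return N such that the error rate is about 1 in N bases (rounded half up)."""
    return int((matches + errors) / errors + 0.5)


roi_error_rate = {spot: bases_per_error(m, e) for spot, (m, e) in ROI_COUNTS.items()}
```

For spot 1, (2500 + 3) / 3 ≈ 834.3, consistent with the tabulated "~1 in 834".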










Example 4: Design of G Protein-Coupled Receptor Binding Domains Based on Conformational Ligand Interactions

G protein-coupled receptor (GPCR) binding domains were designed using interaction surfaces between conformational ligands that interact with GPCRs. Analysis of the interaction surfaces between chemokines and cytokines and the GPCRs indicated that the N-terminal peptide prior to the first conformational cysteine represents the activation peptide, and the core helical and beta-turn-beta topologies mediate interactions with the extracellular domain (ECD) of the GPCR.


An additional 254 GPCR ligands were designed based on cross-searching Uniprot and IUPHAR databases. The ligands represented 112 human, 71 rat, 4 pig, 1 sheep, and 1 cow derived interaction classes. The ligands were then collapsed to the following 101 cross-species ligand sequence annotations: ADM, ADM2, Agouti-related protein, Angiotensinogen, Annexin A1, Apelin, Apelin receptor early, Appetite regulating hormone, Beta-defensin 4A, C-C motif chemokine, C-X-C motif chemokine, Calcitonin, Calcitonin gene-related peptide, Cathepsin G, Cathepsin G (Fragment), Cholecystokinin, Complement C3, Complement C5, Complement C5 (Fragment), Corticoliberin, Cortistatin, Cytokine SCM-1 beta, Endothelin-2, Endothelin-3, Eotaxin, Fractalkine, Galanin peptides, Galanin-like peptide, Gastric inhibitory polypeptide, Gastrin, Gastrin-releasing peptide, Glucagon, Growth-regulated alpha protein, Heme-binding protein 1, Humanin, Insulin-like 3, Insulin-like peptide INSL5, Interleukin-8, Islet amyloid polypeptide, Kininogen-1, Lymphotactin, Metastasis-suppressor KiSS-1, Neurokinin-B, Neuromedin-B, Neuromedin-S, Neuromedin-U, Neuropeptide B, Neuropeptide S, Neuropeptide W, Neurotensin/neuromedin N, Orexigenic neuropeptide QRFP, Orexin, Oxytocin-neurophysin 1, Pancreatic prohormone, Parathyroid hormone, Parathyroid hormone-related protein, Peptide YY, Pituitary adenylate cyclase-activating, Platelet basic protein, Platelet factor 4, Prepronociceptin, Pro-FMRFamide-related neuropeptide FF, Pro-FMRFamide-related neuropeptide VF, Pro-MCH, Pro-neuropeptide Y, Pro-opiomelanocortin, Pro-thyrotropin releasing hormone, Proenkephalin-A, Proenkephalin-B, Progonadoliberin-1, Progonadoliberin-2, Prokineticin-1, Prokineticin-2, Prolactin-releasing peptide, Promotilin, Protachykinin-1, Protein Wnt-2, Protein Wnt-3a, Protein Wnt-4, Protein Wnt-5a, Protein Wnt-7b, Prothrombin, Proto-oncogene Wnt-1, Proto-oncogene Wnt-3, Putative uncharacterized protein, RCG55748, Retinoic acid receptor, Secretin, Somatoliberin, Somatostatin, Stromal cell-derived factor, T-kininogen 2, Tuberoinfundibular peptide of, Urocortin, Urocortin-2, Urocortin-3, Urotensin-2, Urotensin-2B, VEGF coregulated chemokine, VIP peptides, and Vasopressin-neurophysin 2-copeptin.


Structural analysis of the ligands was performed and indicated that a majority of them comprised the N-terminal activation peptide of about 11 amino acids. Motif variants were then created by trimming back the N-terminal activation peptide. As seen in Table 7, an exemplary set of variants was created based on the N-terminal activation peptide for stromal derived factor-1. The motif variants were also placed combinatorially at multiple positions in the CDR-H3. A total of 1016 motifs were extracted for placement in the CDR-H3. In addition, the motif variants were provided with variable boundary placement and with 5-20 substring variants that were also placed in the CDR-H3.









TABLE 7
Variant amino acid sequences for stromal derived factor-1

SEQ ID NO.    Variant    Amino Acid Sequence
----------    -------    -------------------
53            1          ggggSDYKPVSLSYR
54            2          ggggDYKPVSLSYR
55            3          ggggYKPVSLSYR
56            4          ggggKPVSLSYR
57            5          ggggPVSLSYR
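By way of non-limiting illustration, the trimming that yields the variants of Table 7 can be sketched as successive removal of one N-terminal residue, with each trimmed peptide placed after the "gggg" linker shown in the table. The function name and parameters are hypothetical.

```python
# Generate N-terminally trimmed motif variants, each prefixed with a linker.

def n_terminal_trims(peptide, n_variants, linker="gggg"):
    """Return n_variants sequences, removing one more N-terminal residue each time."""
    return [linker + peptide[i:] for i in range(n_variants)]


# Activation peptide as it appears in variant 1 of Table 7.
sdf1_variants = n_terminal_trims("SDYKPVSLSYR", 5)
```

With the 11-residue peptide of variant 1 as input, the five outputs correspond to SEQ ID NOS: 53 to 57.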










As seen in Table 8, an exemplary set of variants was created for interleukin-8 based on the following sequence:

(SEQ ID NO: 52)
MTSKLAVALLAAFLISAALCEGAVLPRSAKELRCQCIKTYSKPFHPKFIKELRVIESGPHCANTEIIVKLSDGRELCLDPKENWVQRVVEKFLKRAENS.













TABLE 8
Variant amino acid sequences for interleukin-8

SEQ ID NO.    Variant    Amino Acid Sequence
----------    -------    ----------------------
58            1          ggggSAALCEGAVLPRSA
59            2          ggggAALCEGAVLPRSA
60            3          ggggALCEGAVLPRSA
61            4          ggggLCEGAVLPRSA
62            5          ggggCEGAVLPRSA
63            6          ggggSAALCEGAVLPRSAKE
64            7          ggggAALCEGAVLPRSAKE
65            8          ggggALCEGAVLPRSAKE
66            9          ggggLCEGAVLPRSAKE
67            10         ggggCEGAVLPRSAKE
68            11         ggggSAALCEGAVLPRSAKELR
69            12         ggggAALCEGAVLPRSAKELR
70            13         ggggALCEGAVLPRSAKELR
71            14         ggggLCEGAVLPRSAKELR
72            15         ggggCEGAVLPRSAKELR
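By way of non-limiting illustration, the fifteen variants of Table 8 can be reproduced as windows of SEQ ID NO: 52 that combine five N-terminal start positions with three C-terminal end positions. The zero-based indices below were obtained by aligning the tabulated variants to SEQ ID NO: 52 and are not stated in the text; the function name is hypothetical.

```python
# Reproduce Table 8 as windows over the interleukin-8 sequence (SEQ ID NO: 52).
IL8 = (
    "MTSKLAVALLAAFLISAALCEGAVLPRSAKELRCQCIKTY"
    "SKPFHPKFIKELRVIESGPHCANTEIIVKLSDGRELCLDP"
    "KENWVQRVVEKFLKRAENS"
)


def window_variants(seq, starts, ends, linker="gggg"):
    """Return linker + seq[start:end] for every end (outer loop) and start."""
    return [linker + seq[s:e] for e in ends for s in starts]


# Starts 15..19 trim the N-terminus; ends 29, 31, 33 extend the C-terminus.
il8_variants = window_variants(IL8, range(15, 20), (29, 31, 33))
```

Varying both boundaries in this way is one reading of the variable boundary placement described for the motif variants.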










Example 5: Design of G Protein-Coupled Receptor Binding Domains Based on Peptide Ligand Interactions

GPCR binding domains were designed based on interaction surfaces between peptide ligands that interact with class B GPCRs. About 66 different ligands were used and include the following ligand sequence annotations: Adrenomedullin, Amylin, Angiotensin, Angiotensin I, Angiotensin II, Angiotensin III, Apelin, Apstatin, Big Endothelin, Big Gastrin, Bradykinin, Caerulein, Calcitonin, Calcitonin Gene Related Peptide, CGRP, Cholecystokinin, Endothelin, Endothelin 1, Endothelin 2, Endothelin 3, GIP, GIPs, GLP, Galanin, Gastrin, Ghrelin, Glucagon, IAPP, Kisspeptin, Mca, Metasti, Neuromedin, Neuromedin N, Neuropeptide, Neuropeptide F, Neuropeptide Y, Neurotensin, Nociceptin, Orexin, Orexin A, Orphanin, Oxytocin, Oxytocin Galanin, PACAP, PACAPs, PAR (Protease Activated Receptor) Peptides, PAR-1 Agonist, Pramlintide, Scyliorhinin I, Secretin, Senktide, Somatostatin, Somatostatin 14, Somatostatin 28, Substance P, Urotensin II, VIP, VIPs, Vasopressin, Xenin, cinnamoyl, furoyl, gastrin, holecystokinin, α-Mating Factor Pheromone. It was observed that the peptides formed a stabilized interaction with the GPCR extracellular domain (ECD).


Motif variants were generated based on the interaction surface of the peptides with the ECD as well as with the N-terminal GPCR ligand interaction surface. This was done using structural modeling. Exemplary motif variants were created based on glucagon like peptide's interaction with its GPCR as seen in Table 9. The motif variant sequences were generated using the following sequence from glucagon like peptide:











(SEQ ID NO: 73)



HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRG.













TABLE 9
Variant amino acid sequences for glucagon like peptide

SEQ ID NO.    Variant    Amino Acid Sequence
----------    -------    ------------------------------------------
74            1          sggggsggggsggggHAEGTFTSDVSSYLEGQAAKEFIAWLV
75            2          sggggsggggsggggAEGTFTSDVSSYLEGQAAKEFIAWLV
76            3          sggggsggggsggggEGTFTSDVSSYLEGQAAKEFIAWLV
77            4          sggggsggggsggggGTFTSDVSSYLEGQAAKEFIAWLV
78            5          sggggsggggsggggTFTSDVSSYLEGQAAKEFIAWLV
79            6          sggggsggggsggggFTSDVSSYLEGQAAKEFIAWLV
80            7          sggggsggggsggggTSDVSSYLEGQAAKEFIAWLV
81            8          sggggsggggsggggSDVSSYLEGQAAKEFIAWLV
82            9          sggggsggggsggggDVSSYLEGQAAKEFIAWLV









Example 6: Design of G Protein-Coupled Receptor Binding Domains Based on Small Molecule Interactions

GPCR binding domains were designed based on interaction surfaces between small molecule ligands that interact with GPCRs. By analyzing multiple GPCR ligands, an amino acid library of Tyr, Pro, Phe, His, and Gly was designed as being able to recapitulate many of the structural contacts of these ligands. An exemplary motif variant that was generated based on these observations comprises the following sequence:











(SEQ ID NO: 83)
sgggg(F, G, H, P, Y)(F, G, H, P, Y)(F, G, H, P, Y)(F, G, H, P, Y)(F, G, H, P, Y)(F, G, H, P, Y)(F, G, H, P, Y)(F, G, H, P, Y).
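By way of non-limiting illustration, the degenerate motif of SEQ ID NO: 83 can be enumerated combinatorially. The sketch assumes the eight variable positions shown in the sequence, each drawn independently from F, G, H, P, and Y; the resulting count is computed here and is not a figure stated in the text. The function names are hypothetical.

```python
# Enumerate the motif of SEQ ID NO: 83: an "sgggg" linker followed by eight
# positions, each one of F, G, H, P, or Y.
from itertools import islice, product

LINKER = "sgggg"
RESIDUES = "FGHPY"
POSITIONS = 8  # counted from the sequence as printed


def library_size():
    """Number of distinct sequences the degenerate motif can encode."""
    return len(RESIDUES) ** POSITIONS


def iter_variants():
    """Lazily yield every concrete sequence of the motif."""
    for combo in product(RESIDUES, repeat=POSITIONS):
        yield LINKER + "".join(combo)


first_three = list(islice(iter_variants(), 3))
```

Under these assumptions the motif spans 5^8 = 390,625 sequences, which is why the generator is lazy rather than building the full list.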






Example 7: Design of G Protein-Coupled Binding Domains Based on Extracellular Domain Interactions

GPCR binding domains were designed based on interaction surfaces on extracellular domains (ECDs) and extracellular loops (ECLs) of GPCRs. About 2,257 GPCRs from human (356), mouse (369), rat (259), cow (102), pig (60), primate, fish, fly, and over 200 other organisms were analyzed, and it was observed that ECDs provide multiple complementary contacts to other loops and helices of the GPCR at a length of 15 amino acids. Further analysis of the ECLs from the about 2,257 GPCRs and all solved structures of GPCRs demonstrated that the N-terminal ECD1 and ECL2 comprise longer extracellular sequences and provide GPCR extracellular contacts.


Motif variants were then generated based on these sequences. Exemplary variants based on the following sequence from retinoic acid induced protein 3 (GPRC5A) were generated:











(SEQ ID NO: 84)



EYIVLTMNRTNVNVFSELSAPRRNED.













TABLE 10
Amino acid sequences

SEQ ID NO.    Variant    Amino Acid Sequence
----------    -------    ------------------------
85            1          YIVLTMNRTNVNVFSELSAPRRNE
86            2          IVLTMNRTNVNVFSELSAPRRN
87            3          VLTMNRTNVNVFSELSAPRR
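By way of non-limiting illustration, the variants of Table 10 are consistent with trimming one additional residue from each end of SEQ ID NO: 84 per variant. This symmetric trimming rule is inferred from the tabulated sequences rather than stated in the text, and the function name is hypothetical.

```python
# Reproduce Table 10 by trimming i residues from both ends of SEQ ID NO: 84.
GPRC5A_ECD = "EYIVLTMNRTNVNVFSELSAPRRNED"  # SEQ ID NO: 84


def symmetric_trims(seq, n_variants):
    """Return variants with 1..n_variants residues removed from each terminus."""
    return [seq[i:len(seq) - i] for i in range(1, n_variants + 1)]


gprc5a_variants = symmetric_trims(GPRC5A_ECD, 3)
```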










Example 8: Design of Antibody Scaffolds

To generate scaffolds, structural analysis, repertoire sequencing analysis of the heavy chain, and specific analysis of heterodimer high-throughput sequencing datasets were performed. Each heavy chain was associated with each light chain scaffold. Each heavy chain scaffold was assigned 5 different long CDR-H3 loop options. Each light chain scaffold was assigned 5 different L3 scaffolds. The heavy chain CDR-H3 stems were chosen from the frequently observed long H3 loop stems (10 amino acids on the N-terminus and the C-terminus) found both across individuals and across V-gene segments. The light chain scaffold L3s were chosen from heterodimers comprising long H3s. Direct heterodimers based on information from the Protein Data Bank (PDB) and deep sequencing datasets were used in which CDR H1, H2, L1, L2, L3, and CDR-H3 stems were fixed. The various scaffolds were then formatted for display on phage to assess for expression.


Structural Analysis


About 2,017 antibody structures were analyzed from which 22 structures with long CDR-H3s of at least 25 amino acids in length were observed. The heavy chains included the following: IGHV1-69, IGHV3-30, IGHV4-49, and IGHV3-21. The light chains identified included the following: IGLV3-21, IGKV3-11, IGKV2-28, IGKV1-5, IGLV1-51, IGLV1-44, and IGKV1-13. In the analysis, four heterodimer combinations were observed multiple times including: IGHV4-59/61-IGLV3-21, IGHV3-21-IGKV2-28, IGHV1-69-IGKV3-11, and IGHV1-69-IGKV1-5. An analysis of sequences and structures identified intra-CDR-H3 disulfide bonds in a few structures with packing of bulky side chains such as tyrosine in the stem providing support for long H3 stability. Secondary structures including beta-turn-beta sheets and a “hammerhead” subdomain were also observed.


Repertoire Analysis


A repertoire analysis was performed on 1,083,875 IgM+/CD27− naïve B cell receptor (BCR) sequences and 1,433,011 CD27+ sequences obtained by unbiased 5′RACE from 12 healthy controls. The 12 healthy controls comprised equal numbers of males and females and were made up of 4 Caucasian, 4 Asian, and 4 Hispanic individuals. The repertoire analysis demonstrated that less than 1% of the human repertoire comprises BCRs with CDR-H3s longer than 21 amino acids. A V-gene bias was observed in the long CDR3 subrepertoire, with IGHV1-69, IGHV4-34, IGHV1-18, and IGHV1-8 showing preferential enrichment in BCRs with long H3 loops. A bias against long loops was observed for IGHV3-23, IGHV4-59/61, IGHV5-51, IGHV3-48, IGHV3-53/66, IGHV3-15, IGHV3-74, IGHV3-73, IGHV3-72, and IGHV2-70. The IGHV4-34 scaffold was demonstrated to be autoreactive and had a short half-life.


Viable N-terminal and C-terminal CDR-H3 scaffold variations for long loops were also designed based on the 5′RACE reference repertoire. About 81,065 CDR-H3s with a length of 22 amino acids or greater were observed. By comparing across V-gene scaffolds, scaffold-specific H3 stem variation was avoided so as to allow the scaffold diversity to be cloned into multiple scaffold references.
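The repertoire statistics described above (the fraction of sequences with long CDR-H3s and the per-V-gene bias) can be illustrated with a minimal Python sketch. The function name `long_h3_stats`, the record layout, the enrichment definition (a V gene's share of long-H3 sequences divided by its share of all sequences), and the toy records are assumptions made for illustration; they are not the analysis pipeline used in this example.

```python
from collections import Counter

def long_h3_stats(records, min_len=22):
    """Summarize long CDR-H3 usage.

    records: iterable of (v_gene, cdr_h3_amino_acid_sequence) pairs.
    min_len: a CDR-H3 counts as "long" at this length or greater
             (22 corresponds to "longer than 21 amino acids").
    Returns (overall fraction of long CDR-H3s, per-V-gene enrichment).
    """
    total = Counter()
    long_counts = Counter()
    for v_gene, cdr_h3 in records:
        total[v_gene] += 1
        if len(cdr_h3) >= min_len:
            long_counts[v_gene] += 1

    n_total = sum(total.values())
    n_long = sum(long_counts.values())
    overall = n_long / n_total if n_total else 0.0

    # Enrichment > 1 suggests a bias toward long loops, < 1 a bias against.
    enrichment = {}
    for v_gene, count in total.items():
        share_all = count / n_total
        share_long = long_counts[v_gene] / n_long if n_long else 0.0
        enrichment[v_gene] = share_long / share_all
    return overall, enrichment

# Hypothetical toy records, not data from the repertoire described above.
records = [
    ("IGHV1-69", "A" * 24),
    ("IGHV1-69", "A" * 12),
    ("IGHV3-23", "A" * 10),
    ("IGHV3-23", "A" * 14),
]
overall, enrichment = long_h3_stats(records)
print(overall, enrichment)
```

With real data, each record would carry the V-gene call and the CDR-H3 amino acid sequence from one annotated BCR read.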


Heterodimer Analysis


Heterodimer analysis was performed on scaffolds having sequences as seen in FIGS. 12A-12C. Variant sequences and lengths of the scaffolds were assayed.


Structural Analysis


Structural analysis was performed using GPCR scaffolds, and variant sequences and lengths were assayed. See FIG. 13.


Example 9: Generation of GPCR Antibody Libraries

Based on GPCR-ligand interaction surfaces and scaffold arrangements, libraries were designed and de novo synthesized. See Examples 4-8. Referring to FIG. 5, 10 variant sequences were designed for the variable domain, heavy chain 503, 237 variant sequences were designed for the heavy chain complementarity determining region 3 507, and 44 variant sequences were designed for the variable domain, light chain 513. The sequences were synthesized as three fragments as seen in FIG. 6 following similar methods as described in Examples 1-3.


Following de novo synthesis, 10 variant sequences were generated for the variable domain, heavy chain 602, 236 variant sequences were generated for the heavy chain complementarity determining region 3 604, and 43 variant sequences were designed for a region comprising the variable domain 606, light chain and CDR-L3, of which 9 variants for the variable domain, light chain were designed. This resulted in a library with about 10^5 diversity (10×236×43). This was confirmed using next generation sequencing (NGS) with 16 million reads. As seen in FIG. 14, the normalized sequencing reads for each of the 10 variants for the variable domain, heavy chain were about 1. As seen in FIG. 15, the normalized sequencing reads for each of the 43 variants for the variable domain, light chain were about 1. As seen in FIG. 16, the normalized sequencing reads for the 236 variant sequences for the heavy chain complementarity determining region 3 were about 1.
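The library size and read-uniformity figures in this paragraph can be checked with a short Python sketch. The normalization shown (each variant's read count divided by the mean count across variants) is one plausible reading of "normalized sequencing reads" and is an assumption, as are the function names and the toy read counts.

```python
from math import prod

def library_diversity(variant_counts):
    """Combinatorial diversity: product of the variant counts per fragment."""
    return prod(variant_counts)

def normalized_reads(read_counts):
    """Divide each variant's read count by the mean count across variants.

    A perfectly uniform library gives 1.0 for every variant.
    """
    mean = sum(read_counts.values()) / len(read_counts)
    return {name: n / mean for name, n in read_counts.items()}

# 10 VH variants x 236 CDR-H3 variants x 43 VL/CDR-L3 variants
diversity = library_diversity([10, 236, 43])
print(diversity)  # 101480, i.e. about 10^5

# Average sequencing depth per library member at 16 million reads
mean_depth = 16_000_000 / diversity
print(round(mean_depth))  # about 158 reads per variant

# Hypothetical read counts for three variants (illustration only)
toy_counts = {"variant_a": 900, "variant_b": 1000, "variant_c": 1100}
norm = normalized_reads(toy_counts)
print(norm)
```

Normalized values clustered near 1 indicate that no variant is strongly over- or under-represented.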


The various light and heavy chains were then tested for expression and protein folding. Referring to FIGS. 17A-17D, the 10 variant sequences for variable domain, heavy chain included the following: IGHV1-18, IGHV1-69, IGHV1-8, IGHV3-21, IGHV3-23, IGHV3-30/33rn, IGHV3-28, IGHV3-74, IGHV4-39, and IGHV4-59/61. Of the 10 variant sequences, IGHV1-18, IGHV1-69, and IGHV3-30/33rn exhibited improved characteristics such as improved thermostability. Referring to FIGS. 18A-18F, 9 variant sequences for variable domain, light chain included the following: IGKV1-39, IGKV1-9, IGKV2-28, IGKV3-11, IGKV3-15, IGKV3-20, IGKV4-1, IGLV1-51, and IGLV2-14. Of the 9 variant sequences, IGKV1-39, IGKV3-15, IGLV1-51, and IGLV2-14 exhibited improved characteristics such as improved thermostability.


Example 10: Expression of GPCR Antibody Libraries in HEK293 Cells

Following generation of GPCR antibody libraries as in Example 13, about 47 GPCRs were selected for screening. GPCR constructs about 1.8 kb to about 4.5 kb in size were designed in a pCDNA3.1 vector. The GPCR constructs were then synthesized following similar methods as described in Examples 2-4 including hierarchical assembly. Of the 47 GPCR constructs, 46 GPCR constructs were synthesized.


The synthesized GPCR constructs were transfected in HEK293 and assayed for expression using immunofluorescence. Referring to FIGS. 19A-19C, HEK293 cells were transfected with the GPCR constructs comprising an N-terminally hemagglutinin (HA)-tagged human Y1 receptor. Following 24-48 hours of transfection, cells were washed with phosphate buffered saline (PBS) and fixed with 4% paraformaldehyde. Cells were stained using fluorescent primary antibody directed towards the HA tag or secondary antibodies comprising a fluorophore and DAPI to visualize the nuclei in blue. Referring to FIGS. 19A-19C, human Y1 receptor was visualized on the cell surface in non-permeabilized cells and on the cell surface and intracellularly in permeabilized cells.


GPCR constructs were also visualized by designing GPCR constructs comprising auto-fluorescent proteins. Referring to FIGS. 20A-20C, human Y1 receptor comprised EYFP fused to its C-terminus, and human Y5 receptor comprised ECFP fused to its C-terminus. HEK293 cells were transfected with human Y1 receptor or co-transfected with human Y1 receptor and human Y5 receptor. Following transfection, cells were washed and fixed with 4% paraformaldehyde. Cells were stained with DAPI. Localization of human Y1 receptor and human Y5 receptor was visualized by fluorescence microscopy.


Example 11: Design of Immunoglobulin Library

An immunoglobulin scaffold library was designed for placement of GPCR binding domains and for improving stability for a range of GPCR binding domain encoding sequences. The immunoglobulin scaffold included a VH domain attached with a VL domain with a linker. Variant nucleic acid sequences were generated for the framework elements and CDR elements of the VH domain and VL domain. The structure of the design is shown in FIG. 21A. A full domain architecture is shown in FIG. 21B. Sequences for the leader, linker, and pIII are listed in Table 11.









TABLE 11
Nucleotide sequences

SEQ ID NO | Domain | Sequence
88 | Leader | GCAGCCGCTGGCTTGCTGCTGCTGGCAGCTCAGCCGGCCATGGCC
89 | Linker | GCTAGCGGTGGAGGCGGTTCAGGCGGAGGTGGCTCTGGCGGTGGCGGATCGCATGCATCC
90 | pIII | CGCGCGGCCGCTGGAAGCGGCTCCCACCATCACCATCACCAT










The VL domains that were designed include IGKV1-39, IGKV3-15, IGLV1-51, and IGLV2-14. Each of the four VL domains was assembled with its respective four invariant framework elements (FW1, FW2, FW3, FW4) and three variable CDR elements (L1, L2, L3). For IGKV1-39, there were 490 variants designed for L1, 420 variants designed for L2, and 824 variants designed for L3, resulting in a diversity of 1.7×10^8 (490*420*824). For IGKV3-15, there were 490 variants designed for L1, 265 variants designed for L2, and 907 variants designed for L3, resulting in a diversity of 1.2×10^8 (490*265*907). For IGLV1-51, there were 184 variants designed for L1, 151 variants designed for L2, and 824 variants designed for L3, resulting in a diversity of 2.3×10^7 (184*151*824). For IGLV2-14, there were 967 variants designed for L1, 535 variants designed for L2, and 922 variants designed for L3, resulting in a diversity of 4.8×10^8 (967*535*922). Table 12 lists the amino acid sequences and nucleotide sequences for the four framework elements (FW1, FW2, FW3, FW4) for IGLV1-51. Table 13 lists the three variable CDR elements (L1, L2, L3) for IGLV1-51. Variant amino acid sequences and nucleotide sequences for the four framework elements (FW1, FW2, FW3, FW4) and the three variable CDR elements (L1, L2, L3) were also designed for IGKV1-39, IGKV3-15, and IGLV2-14.
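The per-scaffold diversity figures above follow directly from multiplying the L1, L2, and L3 variant counts. A small Python sketch of that arithmetic is given below; the dictionary name `CDR_VARIANTS` and the helper `vl_diversity` are illustrative names, not part of the described method.

```python
# (L1, L2, L3) variant counts per VL scaffold, as stated in the text
CDR_VARIANTS = {
    "IGKV1-39": (490, 420, 824),
    "IGKV3-15": (490, 265, 907),
    "IGLV1-51": (184, 151, 824),
    "IGLV2-14": (967, 535, 922),
}

def vl_diversity(l1, l2, l3):
    """Number of distinct L1/L2/L3 combinations on one fixed framework."""
    return l1 * l2 * l3

diversities = {name: vl_diversity(*counts) for name, counts in CDR_VARIANTS.items()}

for name, d in diversities.items():
    # Exact count, followed by the rounded scientific notation used in the text
    print(f"{name}: {d:,} (~{d:.1e})")
```

The exact products are 169,579,200, 117,773,950, 22,894,016, and 476,992,090, which round to the 1.7×10^8, 1.2×10^8, 2.3×10^7, and 4.8×10^8 values stated for the four scaffolds.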









TABLE 12
Sequences for IGLV1-51 framework elements

Element | SEQ ID NO | Amino Acid Sequence | SEQ ID NO | Nucleotide Sequence
IGLV1-51
FW1 | 91 | QSVLTQPPSVSAAPGQKVTISC | 92 | CAGTCTGTGTTGACGCAGCCGCCCTCAGTGTCTGCGGCCCCAGGACAGAAGGTCACCATCTCCTGC
FW2 | 93 | WYQQLPGTAPKLLIY | 94 | TGGTATCAGCAGCTCCCAGGAACAGCCCCCAAACTCCTCATTTAT
FW3 | 95 | GIPDRFSGSKSGTSATLGITGLQTGDEADYY | 96 | GGGATTCCTGACCGATTCTCTGGCTCCAAGTCTGGCACGTCAGCCACCCTGGGCATCACCGGACTCCAGACTGGGGACGAGGCCGATTATTAC
FW4 | 97 | GGGTKLTVL | 98 | GGCGGAGGGACCAAGCTGACCGTCCTA
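As a consistency check on the framework and CDR sequences, the nucleotide sequences can be translated with the standard genetic code and compared with the listed amino acid sequences. The Python sketch below does this for the four IGLV1-51 framework elements and for the first listed L1, L2, and L3 entries (SEQ ID NOS: 99/282, 465/616, and 767/1591), then joins them in the conventional FW1-L1-FW2-L2-FW3-L3-FW4 order. The choice of those three CDR entries and the helper names are assumptions for illustration; any designed combination could be substituted.

```python
from itertools import product

# Standard genetic code, enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(nt):
    """Translate an in-frame nucleotide sequence into amino acids."""
    if len(nt) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))

# element: (amino acid sequence, nucleotide sequence)
PARTS = {
    "FW1": ("QSVLTQPPSVSAAPGQKVTISC",
            "CAGTCTGTGTTGACGCAGCCGCCCTCAGTGTCTGCGGCCCCAGGACAGAAGGTCACCATCTCCTGC"),
    "FW2": ("WYQQLPGTAPKLLIY",
            "TGGTATCAGCAGCTCCCAGGAACAGCCCCCAAACTCCTCATTTAT"),
    "FW3": ("GIPDRFSGSKSGTSATLGITGLQTGDEADYY",
            "GGGATTCCTGACCGATTCTCTGGCTCCAAGTCTGGCACGTCAGCCACCCTGGGCATCACC"
            "GGACTCCAGACTGGGGACGAGGCCGATTATTAC"),
    "FW4": ("GGGTKLTVL", "GGCGGAGGGACCAAGCTGACCGTCCTA"),
    # First listed CDR entries, chosen only as an example combination
    "L1": ("SGSSSNIGSNHVS", "TCTGGAAGCAGCTCCAACATTGGGAGTAATCATGTATCC"),
    "L2": ("DNNKRPP", "GACAATAATAAGCGACCCCCA"),
    "L3": ("CGTWDTSLSAVVF", "TGCGGAACATGGGATACCAGCCTGAGTGCTGTGGTGTTC"),
}

# Each nucleotide sequence should encode its paired amino acid sequence.
for name, (aa, nt) in PARTS.items():
    assert translate(nt) == aa, name

# Conventional variable-domain layout: frameworks interleaved with CDRs.
ORDER = ["FW1", "L1", "FW2", "L2", "FW3", "L3", "FW4"]
vl_nt = "".join(PARTS[name][1] for name in ORDER)
vl_aa = translate(vl_nt)
print(vl_aa)
```

Because every element is a whole number of codons, concatenation preserves the reading frame, which is what allows the CDR variants to be swapped independently on a fixed framework.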
















TABLE 13
Sequences for IGLV1-51 CDR elements

IGLV1-51-L1
SEQ ID NO | Amino Acid Sequence | SEQ ID NO | Nucleotide Sequence
99 | SGSSSNIGSNHVS | 282 | TCTGGAAGCAGCTCCAACATTGGGAGTAATCATGTATCC
100 | SGSSSNIGNNYLS | 283 | TCTGGAAGCAGCTCCAACATTGGGAATAATTATCTATCC
101 | SGSSSNIANNYVS | 284 | TCTGGAAGCAGCTCCAACATTGCGAATAATTATGTATCC
102 | SGSSPNIGNNYVS | 285 | TCTGGAAGCAGCCCCAACATTGGGAATAATTATGTATCG
103 | SGSRSNIGSNYVS | 286 | TCTGGAAGCAGATCCAATATTGGGAGTAATTATGTTTCG
104 | SGSSSNVGDNYVS | 287 | TCTGGAAGCAGCTCCAACGTTGGCGATAATTATGTTTCC
105 | SGSSSNIGIQYVS | 288 | TCTGGAAGCAGCTCCAACATTGGGATTCAATATGTATCC
106 | SGSSSNVGNNFVS | 289 | TCTGGAAGCAGCTCCAATGTTGGTAACAATTTTGTCTCC
107 | SGSASNIGNNYVS | 290 | TCTGGAAGCGCCTCCAACATTGGGAATAATTATGTATCC
108 | SGSGSNIGNNDVS | 291 | TCTGGAAGCGGCTCCAATATTGGGAATAATGATGTGTCC
109 | SGSISNIGNNYVS | 292 | TCTGGAAGCATCTCCAACATTGGTAATAATTATGTATCC
110 | SGSISNIGKNYVS | 293 | TCTGGAAGCATCTCCAACATTGGGAAAAATTATGTGTCG
111 | SGSSSNIGHNYVS | 294 | TCTGGAAGCAGCTCCAACATTGGGCATAATTATGTATCG
112 | PGSSSNIGNNYVS | 295 | CCTGGAAGCAGCTCCAACATTGGGAATAATTATGTATCC
113 | SGSTSNIGIHYVS | 296 | TCTGGAAGCACCTCCAACATTGGAATTCATTATGTATCC
114 | SGSSSNIGSHYVS | 297 | TCTGGAAGCAGCTCCAACATTGGCAGTCATTATGTTTCC
115 | SGSSSNIGNEYVS | 298 | TCCGGAAGCAGCTCCAACATTGGAAATGAATATGTATCC
116 | SGSTSNIGNNYIS | 299 | TCTGGAAGCACCTCCAACATTGGAAATAATTATATATCG
117 | SGSSSNIGNHFVS | 300 | TCTGGAAGCAGCTCCAATATTGGGAATCATTTTGTATCG
118 | SGSSSNIGNNYVA | 301 | TCTGGAAGCAGCTCCAACATTGGGAATAATTATGTGGCC
119 | SGSSSNIGSYYVS | 302 | TCTGGAAGCAGCTCCAACATTGGAAGTTATTATGTATCC
120 | SGSGFNIGNNYVS | 303 | TCTGGAAGTGGTTTCAACATTGGGAATAATTATGTCTCT
121 | SGSTSNIGNNYVS | 304 | TCTGGAAGCACCTCCAACATTGGGAATAATTATGTGTCC
122 | SGSSSDIGNNYVS | 305 | TCTGGAAGCAGCTCCGACATTGGCAATAATTATGTATCC
123 | SGSSSNIGNNVVS | 306 | TCTGGAAGCAGCTCCAACATTGGGAATAATGTTGTATCC
124 | SGSKSNIGKNYVS | 307 | TCTGGAAGCAAGTCTAACATTGGGAAAAATTATGTATCC
125 | SGSSTNIGNNYVS | 308 | TCTGGAAGCAGCACCAACATTGGGAATAATTATGTATCC
126 | SGSISNIGDNYVS | 309 | TCTGGAAGCATCTCCAACATTGGGGATAATTATGTATCC
127 | SGSSSNIGSKDVS | 310 | TCTGGAAGCAGCTCCAACATTGGGAGTAAGGATGTATCA
128 | SGSSSNIENNDVS | 311 | TCTGGAAGCAGCTCCAACATTGAGAATAATGATGTATCG
129 | SGSSSNIGNHYVS | 312 | TCTGGAAGCAGCTCCAACATTGGGAATCATTATGTATCC
130 | SGSSSNIGKDFVS | 313 | TCTGGAAGCAGCTCCAACATTGGGAAGGATTTTGTCTCC
131 | SGSTSNIGSNFVS | 314 | TCTGGCAGTACTTCCAACATCGGAAGTAATTTTGTTTCC
132 | SGSTSNIGHNYVS | 315 | TCTGGAAGCACCTCCAACATTGGGCATAATTATGTATCC
133 | SASSSNIGNNYVS | 316 | TCTGCAAGCAGCTCCAACATTGGGAATAATTATGTATCC
134 | SGSSSSIGNNYVS | 317 | TCTGGAAGCAGCTCCAGCATTGGCAATAATTATGTATCC
135 | SGSSSTIGNNYVS | 318 | TCTGGAAGCAGCTCCACCATTGGGAATAATTATGTATCC
136 | SGSSSNIENNYVS | 319 | TCTGGAAGCAGCTCCAACATTGAAAATAATTATGTATCC
137 | SGSSSNIGNQYVS | 320 | TCTGGAAGCAGCTCCAACATTGGGAATCAGTATGTATCC
138 | SGSSSNIGNNYVF | 321 | TCTGGAAGCAGCTCCAACATTGGGAATAATTATGTATTC
139 | SGSSSNIGRNYVS | 322 | TCTGGAAGCAGCTCCAACATTGGGAGGAATTATGTCTCC
140 | SGGSSNIGNYYVS | 323 | TCTGGAGGCAGCTCCAACATTGGAAATTATTATGTATCG
141 | SGSSSNIGDNYVS | 324 | TCTGGAAGCAGCTCCAACATTGGAGATAATTATGTCTCC
142 | SGGSSNIGINYVS | 325 | TCTGGAGGCAGCTCCAACATTGGAATTAATTATGTATCC
143 | SGGSSNIGKNYVS | 326 | TCTGGAGGCAGCTCCAACATTGGGAAGAATTATGTATCC
144 | SGSSSNIGKRSVS | 327 | TCTGGAAGCAGCTCCAACATTGGGAAGAGATCTGTATCG
145 | SGSRSNIGNNYVS | 328 | TCTGGAAGCAGATCCAACATTGGGAATAACTATGTATCC
146 | SGSSSNIGNNLVS | 329 | TCGGGAAGCAGCTCCAACATTGGGAATAATCTTGTTTCC
147 | SGSSSNIGINYVS | 330 | TCTGGAAGCAGCTCCAACATTGGGATCAATTATGTATCC
148 | SGSSSNIGNNFVS | 331 | TCTGGAAGCAGCTCCAACATCGGGAATAATTTTGTATCC
149 | SGTSSNIGRNFVS | 332 | TCTGGAACCAGCTCCAACATTGGCAGAAATTTTGTATCC
150 | SGRRSNIGNNYVS | 333 | TCTGGAAGGAGGTCCAACATTGGAAATAATTATGTGTCC
151 | SGGSFNIGNNYVS | 334 | TCTGGAGGCAGCTTCAATATTGGGAATAATTATGTATCC
152 | SGSTSNIGENYVS | 335 | TCTGGAAGCACTTCCAACATTGGGGAGAATTATGTGTCC
153 | SGSSSNIGSDYVS | 336 | TCTGGAAGCAGCTCCAATATTGGGAGTGATTATGTATCC
154 | SGTSSNIGSNYVS | 337 | TCTGGAACCAGCTCCAACATTGGGAGTAATTATGTATCC
155 | SGSSSNIGTNFVS | 338 | TCTGGAAGCAGCTCCAACATTGGGACTAATTTTGTATCC
156 | SGSSSNFGNNYVS | 339 | TCTGGAAGCAGCTCCAACTTTGGGAATAATTATGTATCC
157 | SGSTSNIGNNHVS | 340 | TCTGGAAGCACCTCCAACATTGGGAATAATCATGTATCC
158 | SGSSSNIGNDFVS | 341 | TCTGGAAGCAGCTCCAACATTGGGAATGATTTTGTATCC
159 | SGSSSDIGDNYVS | 342 | TCTGGAAGCAGCTCCGACATTGGCGATAATTATGTGTCC
160 | SGSSSNIGKYYVS | 343 | TCTGGAAGCAGCTCCAACATTGGGAAATATTATGTATCC
161 | SGSSSNIGGNYVS | 344 | TCTGGAAGCAGCTCCAACATTGGCGGTAATTATGTATCC
162 | SGSSSNTGNNYVS | 345 | TCTGGAAGCAGCTCCAACACTGGGAATAATTATGTATCC
163 | SGSSSNVGNNYVS | 346 | TCTGGAAGCAGCTCCAACGTTGGGAATAATTATGTGTCT
164 | SGSSSNIANNFVS | 347 | TCTGGAAGCAGCTCCAACATTGCGAATAATTTTGTATCC
165 | SGSSSNIGNDYVS | 348 | TCTGGAAGCAGCTCCAACATTGGGAATGATTATGTATCC
166 | SGSTSNIENNYVS | 349 | TCTGGAAGCACCTCCAATATTGAGAATAATTATGTTTCC
167 | SGGSSNIGNNDVS | 350 | TCTGGAGGCAGCTCCAATATTGGCAATAATGATGTGTCC
168 | SGSTSNIGNHYVS | 351 | TCTGGAAGCACCTCCAACATTGGGAATCATTATGTATCC
169 | SGSSSNIGDNDVS | 352 | TCAGGAAGCAGCTCCAATATTGGGGATAATGATGTATCC
170 | SGYSSNIGNNYVS | 353 | TCTGGATACAGCTCCAACATTGGGAATAATTATGTATCC
171 | SGSGSNIGNNFVS | 354 | TCTGGAAGCGGCTCCAACATTGGAAATAATTTTGTATCC
172 | SGSSSNIWNNYVS | 355 | TCTGGAAGCAGCTCCAACATTTGGAATAATTATGTATCC
173 | FGSSSNIGNNYVS | 356 | TTTGGAAGCAGCTCCAACATTGGGAATAATTATGTATCC
174 | SGSSSNIEKNYVS | 357 | TCTGGAAGCAGCTCCAACATTGAGAAGAATTATGTATCC
175 | SGSRSNIGNYYVS | 358 | TCTGGAAGTAGATCCAATATTGGAAATTATTATGTATCC
176 | SGTKSNIGNNYVS | 359 | TCTGGAACCAAGTCAAACATTGGGAATAATTATGTATCT
177 | SGSTSNIGNYYVS | 360 | TCTGGAAGCACCTCCAACATTGGGAATTATTATGTATCC
178 | SGTSSNIGNNYVA | 361 | TCTGGAACCAGCTCCAACATTGGGAATAATTATGTGGCC
179 | PGTSSNIGNNYVS | 362 | CCTGGAACCAGCTCCAACATTGGGAATAATTATGTATCC
180 | SGSTSNIGINYVS | 363 | TCCGGAAGCACCTCCAACATTGGGATTAATTATGTATCC
181 | SGSSSNIGSNLVS | 364 | TCTGGAAGCAGCTCCAACATTGGGAGTAATCTGGTATCC
182 | SGSSSNIENNHVS | 365 | TCTGGAAGCAGCTCCAACATTGAGAATAATCATGTATCC
183 | SGTRSNIGNNYVS | 366 | TCTGGAACCAGGTCCAACATCGGCAATAATTATGTTTCG
184 | SGSTSNIGDNYVS | 367 | TCTGGAAGCACCTCCAACATTGGGGACAATTATGTTTCC
185 | SGGSSNIGKNFVS | 368 | TCTGGAGGCAGTTCCAACATTGGGAAGAATTTTGTATCC
186 | SGSRSDIGNNYVS | 369 | TCTGGAAGCAGGTCCGACATTGGGAATAATTATGTATCC
187 | SGTSSNIGNNDVS | 370 | TCTGGAACTAGCTCCAACATTGGGAATAATGATGTATCC
188 | SGSSSNIGSKYVS | 371 | TCTGGAAGCAGCTCCAACATTGGGAGTAAATATGTATCA
189 | SGSSFNIGNNYVS | 372 | TCTGGAAGCAGCTTCAACATTGGGAATAATTATGTATCC
190 | SGSSSNIGNTYVS | 373 | TCTGGAAGCAGCTCCAACATTGGGAATACTTATGTATCC
191 | SGSSSNIGDNHVS | 374 | TCTGGAAGCAGCTCCAATATTGGGGATAATCATGTATCC
192 | SGSSSNIGNNHVS | 375 | TCTGGAAGCAGCTCCAACATTGGCAATAATCATGTTTCC
193 | SGSTSNIGNNDVS | 376 | TCTGGAAGCACCTCCAACATTGGGAATAATGATGTATCC
194 | SGSRSNVGNNYVS | 377 | TCTGGAAGCAGATCCAACGTTGGCAATAATTATGTTTCA
195 | SGGTSNIGKNYVS | 378 | TCCGGAGGCACCTCCAACATTGGGAAGAATTATGTGTCT
196 | SGSSSNIADNYVS | 379 | TCTGGAAGCAGCTCCAACATTGCCGATAATTATGTTTCC
197 | SGSSSNIGANYVS | 380 | TCTGGAAGCAGCTCCAACATTGGCGCCAATTATGTATCC
198 | SGSSSNIGSNYVA | 381 | TCTGGAAGCAGCTCCAACATTGGGAGTAATTATGTGGCC
199 | SGSSSNIGNNFLS | 382 | TCTGGAAGCAGCTCCAACATTGGGAACAATTTTCTCTCC
200 | SGRSSNIGKNYVS | 383 | TCTGGAAGAAGCTCCAACATTGGGAAGAATTATGTATCC
201 | SGSSPNIGANYVS | 384 | TCTGGAAGCAGCCCCAACATTGGGGCTAATTATGTATCC
202 | SGSSSNIGPNYVS | 385 | TCCGGAAGCAGCTCCAACATTGGGCCTAATTATGTGTCC
203 | SGSSSTIGNNYIS | 386 | TCTGGAAGCAGCTCCACCATTGGGAATAATTATATATCC
204 | SGSSSNIGNYFVS | 387 | TCTGGAAGCAGCTCCAACATTGGGAATTATTTTGTATCC
205 | SGSRSNIGNNFVS | 388 | TCTGGAAGCCGCTCCAACATTGGTAATAATTTTGTATCC
206 | SGGSSNIGSNFVS | 389 | TCTGGAGGCAGCTCCAACATTGGGAGTAATTTTGTATCC
207 | SGSSSNIGYNYVS | 390 | TCTGGAAGCAGCTCCAACATTGGGTATAATTATGTATCC
208 | SGTSSNIENNYVS | 391 | TCTGGAACCAGCTCGAACATTGAGAACAATTATGTATCC
209 | SGSSSNIGNYYVS | 392 | TCTGGAAGTAGCTCCAACATTGGGAATTATTATGTATCC
210 | SGSTSNIGKNYVS | 393 | TCTGGAAGCACCTCCAACATTGGGAAGAATTATGTATCC
211 | SGSSSNIGTYYVS | 394 | TCTGGAAGCAGTTCCAACATTGGGACTTATTATGTCTCT
212 | SGSSSNVGKNYVS | 395 | TCTGGAAGCAGCTCCAACGTTGGGAAAAATTATGTATCT
213 | SGSTSNIGDNFVS | 396 | TCTGGAAGCACCTCCAACATTGGGGATAATTTTGTATCC
214 | SGSTSNIGTNYVS | 397 | TCTGGAAGCACCTCCAACATTGGAACTAATTATGTTTCC
215 | SGGTSNIGNNYVS | 398 | TCTGGAGGTACTTCCAACATTGGGAATAATTATGTCTCC
216 | SGSYSNIGNNYVS | 399 | TCTGGAAGCTACTCCAATATTGGGAATAATTATGTATCC
217 | SGSSSNIEDNYVS | 400 | TCTGGAAGCAGCTCCAACATTGAAGATAATTATGTATCC
218 | SGSSSNIGKHYVS | 401 | TCTGGAAGCAGCTCCAACATTGGGAAACATTATGTATCC
219 | SGSGSNIGSNYVS | 402 | TCCGGTTCCGGCTCAAACATTGGAAGTAATTATGTCTCC
220 | SGSSSNIGNNYIS | 403 | TCTGGAAGCAGCTCCAACATTGGAAATAATTATATATCA
221 | SGASSNIGNNYVS | 404 | TCTGGAGCCAGTTCCAACATTGGGAATAATTATGTTTCC
222 | SGRTSNIGNNYVS | 405 | TCTGGACGCACCTCCAACATCGGGAACAATTATGTATCC
223 | SGGSSNIGSNYVS | 406 | TCTGGAGGCAGCTCCAATATTGGGAGTAATTACGTATCC
224 | SGSGSNIGNNYVS | 407 | TCTGGAAGCGGCTCCAACATTGGGAATAATTATGTATCC
225 | SGSTSNIGSNYVS | 408 | TCTGGAAGCACCTCCAACATTGGGAGTAATTATGTATCC
226 | SGSSSSIGNNYVA | 409 | TCTGGAAGCAGCTCCAGCATTGGGAATAATTATGTGGCG
227 | SGSSSNLGNNYVS | 410 | TCTGGAAGCAGTTCCAACCTTGGAAATAATTATGTATCC
228 | SGTSSNIGKNYVS | 411 | TCTGGAACCAGCTCCAACATTGGGAAAAATTATGTATCC
229 | SGSSSDIGNKYIS | 412 | TCTGGAAGCAGCTCCGATATTGGGAACAAGTATATATCC
230 | SGSSSNIGSNYIS | 413 | TCTGGAAGCAGCTCCAACATTGGAAGTAATTACATATCC
231 | SGSTSNIGANYVS | 414 | TCTGGAAGCACCTCCAACATTGGGGCTAACTATGTGTCC
232 | SGSSSNIGNKYVS | 415 | TCTGGAAGCAGCTCCAACATTGGGAATAAGTATGTATCC
233 | SGSSSNIGNNYGS | 416 | TCTGGAAGCAGCTCCAACATTGGGAATAATTATGGATCC
234 | SGSTSNIANNYVS | 417 | TCTGGAAGCACCTCCAACATTGCGAATAATTATGTATCC
235 | SGSYSNIGSNYVS | 418 | TCTGGAAGCTACTCCAATATTGGGAGTAATTATGTATCC
236 | SGSSSNIGSNFVS | 419 | TCTGGAAGCAGCTCCAACATTGGGAGTAATTTTGTATCC
237 | SGSSSNLENNYVS | 420 | TCTGGAAGCAGCTCCAATCTTGAGAATAATTATGTATCC
238 | SGSISNIGSNYVS | 421 | TCTGGAAGCATCTCCAATATTGGCAGTAATTATGTATCC
239 | SGSSSDIGSNYVS | 422 | TCTGGAAGCAGCTCCGACATTGGGAGTAATTATGTATCC
240 | SGSSSNIGTNYVS | 423 | TCTGGAAGCAGCTCCAACATTGGGACTAATTATGTATCC
241 | SGSSSNIGKNFVS | 424 | TCTGGAAGCAGCTCCAACATTGGGAAGAATTTTGTATCC
242 | SGSSSNIGNNFIS | 425 | TCTGGAAGCAGCTCCAACATTGGGAATAATTTTATATCC
243 | SGGSSNIGNNYVS | 426 | TCTGGAGGCAGCTCCAACATTGGCAATAATTATGTTTCC
244 | SGSSSNIGENYVS | 427 | TCTGGAAGCAGCTCCAACATTGGGGAGAATTATGTATCC
245 | SGSSSNIGNNFVA | 428 | TCTGGAAGCAGCTCCAATATTGGGAATAATTTTGTGGCC
246 | SGGSSNIGNNYVA | 429 | TCTGGAGGCAGCTCCAACATTGGGAATAATTATGTAGCC
247 | SGSSSHIGNNYVS | 430 | TCTGGAAGCAGCTCCCACATTGGAAATAATTATGTATCC
248 | SGSSSNIGSNDVS | 431 | TCTGGAAGCAGCTCCAATATTGGAAGTAATGATGTATCG
249 | SGSSSNIGNNYVT | 432 | TCTGGAAGCAGCTCCAACATTGGGAATAATTATGTAACC
250 | SGSSSNIGNNPVS | 433 | TCTGGAAGCAGCTCCAACATTGGGAATAATCCTGTATCC
251 | SGGSSNIGNHYVS | 434 | TCTGGAGGCAGCTCCAATATTGGGAATCATTATGTATCC
252 | SGTSSNIGNNYVS | 435 | TCTGGAACCAGCTCCAACATTGGGAATAATTATGTATCC
253 | SGSSSNIGSNYVS | 436 | TCTGGAAGCAGCTCCAACATTGGAAGTAATTATGTCTCG
254 | SGGTSNIGSNYVS | 437 | TCTGGAGGCACCTCCAACATTGGAAGTAATTATGTATCC
255 | SGSKSNIGNNYVS | 438 | TCTGGAAGCAAGTCCAACATTGGGAATAATTATGTATCC
256 | SGRSSNIGNNYVS | 439 | TCTGGAAGAAGCTCCAACATTGGGAATAATTATGTATCG
257 | SGSSSNVGSNYVS | 440 | TCTGGAAGCAGCTCCAACGTTGGGAGTAATTATGTTTCC
258 | SGSTSNIGNNFVS | 441 | TCTGGAAGCACCTCCAATATTGGGAATAATTTTGTATCC
259 | SGSNFNIGNNYVS | 442 | TCTGGAAGCAACTTCAACATTGGGAATAATTATGTCTCC
260 | SGSTSNIGYNYVS | 443 | TCTGGAAGCACCTCCAATATTGGATATAATTATGTATCC
261 | SGSSSNIVSNYVS | 444 | TCTGGAAGCAGCTCCAATATTGTAAGTAATTATGTATCC
262 | SGTSSNIGNNFVS | 445 | TCTGGAACCAGCTCCAACATTGGGAATAATTTTGTATCC
263 | SGSSSNIGRNFVS | 446 | TCTGGAAGCAGCTCCAACATTGGGAGGAATTTTGTGTCC
264 | SGTTSNIGNNYVS | 447 | TCTGGAACGACCTCCAACATTGGGAATAATTATGTCTCC
265 | SGSSSNIGNNDVS | 448 | TCTGGAAGCAGCTCCAACATTGGGAATAATGATGTATCC
266 | SGSSSNIGNHDVS | 449 | TCTGGAAGCAGCTCCAACATTGGGAATCATGATGTATCC
267 | SGSSSNIGSSHVS | 450 | TCTGGAAGCAGCTCCAACATTGGAAGTAGTCATGTATCC
268 | SGSSSNIGIHYVS | 451 | TCTGGAAGCAGCTCCAACATTGGGATTCATTATGTATCC
269 | SGGGSNIGYNYVS | 452 | TCTGGAGGCGGCTCCAACATTGGCTATAATTATGTCTCC
270 | SGSSSNIGDHYVS | 453 | TCTGGAAGCAGCTCCAACATTGGGGATCATTATGTGTCG
271 | SGSSSNLGKNYVS | 454 | TCTGGAAGCAGCTCCAACCTTGGGAAGAATTATGTATCT
272 | SGSSSNIGDNFVS | 455 | TCTGGAAGCAGCTCCAACATTGGCGATAATTTTGTATCC
273 | SGSTSNIEKNYVS | 456 | TCTGGAAGCACCTCCAACATTGAGAAAAACTATGTATCG
274 | SGSSSNIGKDYVS | 457 | TCTGGAAGCAGCTCCAACATTGGGAAGGATTATGTATCC
275 | SGSSSNIGKNYVS | 458 | TCTGGAAGCAGCTCCAACATTGGGAAGAATTATGTATCC
276 | SGSSSNIGNNYVS | 459 | TCTGGAAGCAGCTCCAACATTGGGAATAATTATGTATCC
277 | SGSSSNIGNNYAS | 460 | TCTGGAAGCAGCTCCAACATTGGGAATAATTATGCCTCC
278 | SGISSNIGNNYVS | 461 | TCTGGAATCAGCTCCAACATTGGGAATAATTATGTATCC
279 | TGSSSNIGNNYVS | 462 | ACTGGAAGCAGCTCCAACATTGGGAATAATTATGTATCC
280 | SGTSSNIGNNHVS | 463 | TCTGGAACCAGCTCCAACATTGGGAATAATCATGTTTCC
281 | SGSRSNIGKNYVS | 464 | TCTGGAAGTCGTTCCAACATTGGGAAAAATTATGTATCC










IGLV1-51-L2
SEQ ID NO | Amino Acid Sequence | SEQ ID NO | Nucleotide Sequence
465 | DNNKRPP | 616 | GACAATAATAAGCGACCCCCA
466 | ENNRRPS | 617 | GAGAATAATAGGCGACCCTCA
467 | DNNKQPS | 618 | GACAATAATAAGCAACCCTCA
468 | DNNKRPL | 619 | GACAATAACAAGCGACCCTTG
469 | DNDKRPA | 620 | GACAATGATAAGCGACCCGCA
470 | DNHERPS | 621 | GACAATCATGAGCGACCCTCA
471 | ENRKRPS | 622 | GAAAACCGTAAGCGACCCTCA
472 | DNDQRPS | 623 | GACAATGATCAGCGACCCTCA
473 | ENYKRPS | 624 | GAGAATTATAAGCGACCCTCA
474 | ENTKRPS | 625 | GAAAATACTAAGCGACCCTCA
475 | DTEKRPS | 626 | GACACTGAGAAGAGGCCCTCA
476 | DNDKRPP | 627 | GACAATGATAAGCGACCCCCA
477 | DHNKRPS | 628 | GACCATAATAAGCGACCCTCA
478 | GNNERPS | 629 | GGCAATAATGAGCGACCCTCA
479 | DTSKRPS | 630 | GACACTAGTAAGCGACCCTCA
480 | EYNKRPS | 631 | GAATATAATAAGCGCCCCTCA
481 | ENIKRPS | 632 | GAAAATATTAAGCGACCCTCA
482 | DNVKRPS | 633 | GACAATGTTAAGCGACCCTCA
483 | ENDKRSS | 634 | GAAAACGATAAACGATCCTCA
484 | ENNKRHS | 635 | GAAAATAATAAGCGACACTCA
485 | GNDQRPS | 636 | GGAAATGATCAGCGACCCTCA
486 | DNDRRPS | 637 | GACAATGATAGGCGACCCTCA
487 | DNHKRPS | 638 | GACAATCATAAGCGGCCCTCA
488 | DNNDRPS | 639 | GACAATAATGACCGACCCTCA
489 | ENNQRPS | 640 | GAGAATAATCAGCGACCCTCA
490 | DNNQRPS | 641 | GACAATAATCAGCGACCCTCA
491 | ENVKRPS | 642 | GAGAATGTTAAGCGACCCTCA
492 | DTYKRPS | 643 | GACACTTATAAGAGACCCTCA
493 | NNNNRPS | 644 | AACAATAATAACCGACCCTCA
494 | GNNNRPS | 645 | GGCAATAATAATCGACCCTCA
495 | ENDQRPS | 646 | GAAAATGATCAGCGACCCTCA
496 | DNNKRAS | 647 | GACAATAATAAGCGAGCCTCA
497 | DNDKRPL | 648 | GACAATGATAAGCGACCCTTA
498 | DTDERPS | 649 | GACACTGATGAGCGACCTTCA
499 | DNRKRPS | 650 | GACAATAGGAAGCGACCCTCA
500 | DNDARPS | 651 | GACAATGATGCTCGACCCTCA
501 | DNNKRLS | 652 | GACAATAATAAGCGACTCTCA
502 | DNDKRAS | 653 | GACAATGATAAGCGAGCCTCA
503 | DNTERPS | 654 | GACAATACTGAGCGACCCTCA
504 | DNNIRPS | 655 | GACAATAATATTCGACCCTCA
505 | DNKRRPS | 656 | GACAATAAGAGGCGACCCTCA
506 | DDNNRPS | 657 | GACGATAATAACCGACCCTCA
507 | ANNRRPS | 658 | GCGAATAATCGACGACCCTCA
508 | DNDKRLS | 659 | GACAATGATAAGCGACTGTCA
509 | DNNKRPA | 660 | GACAATAATAAGCGACCCGCA
510 | DNYRRPS | 661 | GACAATTATAGACGTCCCTCA
511 | ANDQRPS | 662 | GCCAATGATCAGCGACCCTCA
512 | DNDKRRS | 663 | GACAATGATAAGCGACGCTCA
513 | DKNERPS | 664 | GACAAGAATGAGCGACCCTCA
514 | DNKERPS | 665 | GACAATAAGGAGCGACCCTCA
515 | DNNKGPS | 666 | GACAATAATAAGGGACCCTCA
516 | ENDRRPS | 667 | GAAAATGATAGACGACCCTCA
517 | ENDERPS | 668 | GAAAATGATGAGCGACCCTCA
518 | QNNKRPS | 669 | CAAAATAATAAGCGACCCTCA
519 | DNRERPS | 670 | GACAATCGTGAGCGACCCTCA
520 | DNNRRPS | 671 | GACAATAATAGACGACCCTCA
521 | GNNRRPS | 672 | GGAAATAATAGGCGACCCTCA
522 | DNDNRPS | 673 | GACAATGATAACCGACCCTCA
523 | EDNKRPS | 674 | GAAGATAATAAGCGACCCTCA
524 | DDDERPS | 675 | GACGATGATGAGCGGCCCTCA
525 | ASNKRPS | 676 | GCAAGTAATAAGCGACCCTCA
526 | DNNKRSS | 677 | GACAATAATAAGCGATCCTCA
527 | QNNERPS | 678 | CAAAATAATGAGCGACCCTCA
528 | DDDRRPS | 679 | GACGATGATAGGCGACCCTCA
529 | NNDKRPS | 680 | AACAATGATAAGCGACCCTCA
530 | DNNNRPS | 681 | GACAATAATAACCGACCCTCA
531 | DNNVRPS | 682 | GACAATAATGTGCGACCCTCA
532 | ENNERPS | 683 | GAAAATAATGAGCGACCCTCA
533 | DNNHRPS | 684 | GACAATAATCACCGACCCTCA
534 | DNDERPS | 685 | GACAATGATGAGCGCCCCTCG
535 | DNIRRPS | 686 | GACAATATCCGGCGACCCTCA
536 | DFNKRPS | 687 | GACTTTAATAAGCGACCCTCA
537 | ETNKRPS | 688 | GAAACTAATAAGCGACCCTCA
538 | NDNKRPS | 689 | AACGATAATAAGCGACCCTCA
539 | DDNKRPS | 690 | GACGATAATAAGCGACCCTCA
540 | DNYKRPS | 691 | GACAATTATAAGCGACCCTCA
541 | HNNKRPS | 692 | CACAATAATAAGCGACCCTCA
542 | DNHQRPS | 693 | GACAATCATCAGCGACCCTCA
543 | DNYKRAS | 694 | GACAATTATAAGCGAGCCTCA
544 | DNIKRPS | 695 | GACAATATTAAGCGACCCTCA
545 | DTHKRPS | 696 | GACACTCATAAGCGACCCTCA
546 | DTNRRPS | 697 | GACACTAATAGGCGACCCTCT
547 | DTNQRPS | 698 | GACACTAATCAGCGACCCTCA
548 | ESDKRPS | 699 | GAAAGTGATAAGCGACCCTCA
549 | DNDKRSS | 700 | GACAATGATAAGCGATCTTCG
550 | GSNKRPS | 701 | GGCAGTAATAAGCGACCCTCA
551 | DNNKRVS | 702 | GACAATAACAAGCGAGTTTCA
552 | NNNRRPS | 703 | AACAATAATAGGCGACCCTCA
553 | DNFKRPS | 704 | GACAATTTTAAGCGACCCTCA
554 | ENDKRPS | 705 | GAAAATGATAAACGACCCTCA
555 | ENNKRLS | 706 | GAAAATAATAAGCGACTCTCA
556 | ADNKRPS | 707 | GCAGATAATAAGCGACCCTCA
557 | EDNERPS | 708 | GAAGATAATGAGCGCCCCTCA
558 | DTDQRPS | 709 | GACACTGATCAGCGACCCTCA
559 | DNYQRPS | 710 | GACAATTATCAGCGACCCTCA
560 | DENKRPS | 711 | GACGAGAATAAGCGACCCTCA
561 | DTNKRPS | 712 | GACACTAATAAGCGACCCTCA
562 | DDYRRPS | 713 | GACGATTATCGGCGACCCTCA
563 | DNDKRHS | 714 | GACAACGATAAGCGGCACTCA
564 | ENDNRPS | 715 | GAAAATGATAATCGACCCTCA
565 | DDNERPS | 716 | GACGATAATGAGCGCCCCTCA
566 | DNKKRPS | 717 | GACAATAAGAAGCGACCCTCA
567 | DVDKRPS | 718 | GACGTTGATAAGCGACCCTCA
568 | ENKKRPS | 719 | GAAAATAAAAAACGACCCTCT
569 | VNDKRPS | 720 | GTCAATGATAAGCGACCCTCA
570 | DNDHRPS | 721 | GACAATGATCACCGACCCTCA
571 | DINKRPS | 722 | GACATTAATAAGCGACCCTCA
572 | ANNERPS | 723 | GCCAATAATGAGCGACCCTCA
573 | DNENRPS | 724 | GACAATGAAAACCGACCGTCA
574 | GDDKRPS | 725 | GGCGATGATAAGCGACCCTCA
575 | ANNQRPS | 726 | GCCAATAATCAGCGACCTTCA
576 | DDDKRPS | 727 | GACGATGATAAGCGACCCTCA
577 | YNNKRPS | 728 | TACAATAATAAGCGGCCCTCA
578 | EDDKRPS | 729 | GAAGATGATAAGCGACCCTCA
579 | ENNNRPS | 730 | GAAAACAATAACCGACCCTCG
580 | DNNLRPS | 731 | GACAATAATCTGCGACCCTCA
581 | ESNKRPS | 732 | GAGAGTAACAAGCGACCCTCA
582 | DTDKRPS | 733 | GACACTGATAAGCGGCCCTCA
583 | DDDQRPS | 734 | GACGATGATCAGCGACCCTCA
584 | VNNKRPS | 735 | GTGAATAATAAGAGACCCTCC
585 | DDYKRPS | 736 | GACGATTATAAGCGACCCTCA
586 | DNTKRPS | 737 | GACAATACTAAGCGACCCTCA
587 | DDTERPS | 738 | GACGATACTGAGCGACCCTCA
588 | GNDKRPS | 739 | GGCAATGATAAGCGACCCTCA
589 | DNEKRPS | 740 | GACAATGAAAAGCGACCCTCA
590 | DNDDRPS | 741 | GACAATGATGACCGACCCTCA
591 | DDNRRPS | 742 | GACGATAATAGGCGTCCCTCA
592 | GNNKRPS | 743 | GGCAATAATAAGCGACCCTCA
593 | ANDKRPS | 744 | GCCAATGATAAGCGACCCTCA
594 | DNNKRHS | 745 | GACAATAATAAGCGACACTCA
595 | DDNQRPS | 746 | GACGACAATCAGCGACCCTCA
596 | GNDRRPS | 747 | GGCAATGATAGGCGACCCTCA
597 | DNHNRPS | 748 | GACAATCATAACCGACCCTCA
598 | DNYERPS | 749 | GACAATTATGAGCGACCCTCA
599 | ENNKRSS | 750 | GAAAATAATAAGCGATCCTCA
600 | DDHKRPS | 751 | GACGATCATAAGCGGCCCTCA
601 | DNNKRRS | 752 | GACAATAATAAACGACGTTCA
602 | DNDKRPS | 753 | GACAATGATAAGCGACCGTCA
603 | DKNKRPS | 754 | GACAAGAATAAGCGACCCTCA
604 | DNNKRPS | 755 | GACAATAATAAGCGACCCTCA
605 | DIDKRPS | 756 | GACATTGATAAGCGACCCTCA
606 | DDKKRPS | 757 | GACGATAAGAAGCGACCCTCA
607 | ANNKRPS | 758 | GCCAATAATAAGCGACCCTCA
608 | DNDKGPS | 759 | GACAATGATAAGGGACCCTCA
609 | EDNRRPS | 760 | GAAGATAATAGGCGACCCTCA
610 | ENNKRPS | 761 | GAGAATAATAAGCGACCCTCA
611 | NNNKRPS | 762 | AACAATAATAAGCGACCCTCA
612 | DNNERPS | 763 | GACAATAATGAGCGACCCTCA
613 | DNIQRPS | 764 | GACAATATTCAGCGACCCTCA
614 | DNNYRPS | 765 | GACAATAATTACCGACCCTCA
615 | DNYNRPS | 766 | GACAATTATAACCGACCCTCA










IGLV1-51-L3










767
CGTWDTSL
1591
TGCGGAACATGGGATACCAGCCTGAGTGCTGTGGTG



SAVVF

TTC





768
CGTWDTSL
1592
TGCGGAACATGGGATACCAGCCTGAGTGCTGGGGTG



SAGVF

TTC





769
CGTWDTSL
1593
TGCGGAACATGGGATACCAGCCTGAGTGCTTGGGTG



SAWVF

TTC





770
CGTWDRSL
1594
TGCGGAACATGGGATAGGAGCCTGAGTGCGGGGGT



SAGVF

GTTC





771
CGTWDRSL
1595
TGCGGAACATGGGATAGGAGCCTGAGTGCTTGGGTA



SAWVF

TTT





772
CGTWDTSL
1596
TGCGGAACATGGGATACCAGCCTGAGTGGTGGGGTG



SGGVF

TTC





773
CGTWDTSL
1597
TGCGGAACATGGGATACTAGCCTGCGTGCTGGCGTC



RAGVF

TTC





774
CGTWDRSL
1598
TGCGGAACATGGGATAGGAGCCTGAGTGTTTGGGTG



SVWVF

TTC





775
CGTWDTSL
1599
TGCGGAACATGGGATACCAGTCTGAGTGTTGTGGTC



SVVVF

TTC





776
CGTWDTSL
1600
TGCGGAACGTGGGATACCAGCCTGAGTGCTGCGGTG



SAAVF

TTC





777
CGAWDTSL
1601
TGCGGAGCATGGGATACCAGCCTGAGTGCTGGAGTG



SAGVF

TTC





778
CATWDTSL
1602
TGCGCAACATGGGATACCAGCCTGAGTGCTGTGGTA



SAVVF

TTC





779
CATWDTSL
1603
TGCGCAACATGGGATACCAGCCTGAGTGCTGGTGTG



SAGVF

TTC





780
CGTWESSL
1604
TGTGGAACATGGGAGAGCAGCCTGAGTGCTTGGGTG



SAWVF

TTC





781
CGTWDTTL
1605
TGCGGAACATGGGATACCACCCTGAGTGCGGGTGTC



SAGVF

TTC





782
CGTWDTSL
1606
TGCGGAACATGGGATACTAGCCTGAGTGTGTGGGTG



SVWVF

TTC





783
CGTWDTSL
1607
TGCGGAACATGGGATACTAGCCTGAGTGTTGGGGTG



SVGVF

TTC





784
CGTWDTSL
1608
TGCGGAACATGGGACACCAGTCTGAGCACTGGCGTC



STGVF

TTC





785
CGTWDTSL
1609
TGCGGAACATGGGATACCAGCCTGAGTGGTGTGGTC



SGVVF

TTC





786
CGTWDTSL
1610
TGCGGAACATGGGATACCAGCCTGAGTGCTTATGTC



SAYVF

TTC





787
CGTWDTSL
1611
TGCGGAACATGGGATACCAGCCTGAGTGCTGAGGTG



SAEVF

TTC





788
CGTWDTGL
1612
TGCGGAACATGGGATACCGGCCTGAGTGCTGGGGTA



SAGVF

TTC





789
CGTWDRSL
1613
TGCGGAACGTGGGATAGGAGCCTGAGTGCTTATGTC



SAYVF

TTC





790
CGTWDRSL
1614
TGCGGAACATGGGATAGGAGCCTCAGTGCCGTGGTA



SAVVF

TTC





791
CGTWDNTL
1615
TGCGGAACATGGGATAACACCCTGAGTGCGTGGGTG



SAWVF

TTC





792
CGTWDNRL
1616
TGCGGAACATGGGATAACAGGCTGAGTGCTGGGGT



SAGVF

GTTC





793
CGTWDISLS
1617
TGCGGAACATGGGACATCAGCCTGAGTGCTTGGGTG



AWVF

TTC





794
CGTWHSSL
1618
TGCGGAACATGGCATAGCAGCCTGAGTGCTGGGGTA



SAGVF

TTC





795
CGTWGSSL
1619
TGCGGAACATGGGGTAGCAGTTTGAGTGCTTGGGTG



SAWVF

TTC





796
CGTWESSL
1620
TGCGGAACATGGGAGAGCAGCCTGAGTGGTTGGGT



SGWVF

GTTC





797 | CGTWESSLSAVVF | 1621 | TGCGGAACATGGGAGAGCAGCCTGAGTGCTGTGGTTTTC
798 | CGTWDYSLSAVVF | 1622 | TGCGGAACATGGGATTACAGCCTGAGTGCTGTGGTATTC
799 | CGTWDYSLSAGVF | 1623 | TGCGGAACATGGGATTACAGCCTGAGTGCTGGGGTATTC
800 | CGTWDVSLSVGVF | 1624 | TGCGGAACATGGGATGTCAGCCTGAGTGTTGGAGTGTTC
801 | CGTWDTTLSAVVF | 1625 | TGCGGAACATGGGATACCACCCTGAGTGCTGTGGTTTTC
802 | CGTWDTTLNIGVF | 1626 | TGCGGAACATGGGATACCACTCTGAATATTGGGGTGTTC
803 | CGTWDTSLTAVVF | 1627 | TGCGGAACATGGGATACCAGCCTGACTGCTGTGGTATTC
804 | CGTWDTSLTAAVF | 1628 | TGCGGAACCTGGGATACCAGCCTGACTGCTGCTGTGTTC
805 | CGTWDTSLSVGLF | 1629 | TGCGGCACATGGGATACCAGCCTGAGTGTGGGGCTATTC
806 | CGTWDTSLSGRVF | 1630 | TGCGGAACCTGGGATACCAGCCTGAGTGGTAGGGTGTTC
807 | CGTWDTSLSGAVF | 1631 | TGCGGAACATGGGATACCAGCCTGAGTGGTGCAGTGTTC
808 | CGTWDTSLSAGLF | 1632 | TGCGGAACATGGGATACCAGCCTGAGTGCTGGCCTGTTC
809 | CGTWDTSLSAGGVF | 1633 | TGCGGAACATGGGATACCAGCCTGAGTGCTGGAGGGGTCTTC
810 | CGTWDTSLRAYVF | 1634 | TGCGGAACATGGGATACCAGCCTGCGTGCTTATGTCTTC
811 | CGTWDTSLRAWVF | 1635 | TGCGGAACATGGGATACTAGTTTGCGTGCTTGGGTATTC
812 | CGTWDTSLNTGVF | 1636 | TGCGGAACATGGGATACCAGCCTGAATACTGGGGTATTC
813 | CGTWDTSLNIWVF | 1637 | TGCGGAACATGGGATACCAGCCTGAATATTTGGGTGTTC
814 | CGTWDTSLNIGVF | 1638 | TGCGGAACATGGGATACAAGCCTGAATATTGGGGTGTTC
815 | CGTWDTSLIAVVF | 1639 | TGCGGAACATGGGATACCAGCCTGATTGCTGTGGTGTTC
816 | CGTWDRSLSGWVF | 1640 | TGCGGAACGTGGGATAGGAGCCTGAGTGGTTGGGTGTTC
817 | CGTWDNRLSGWVF | 1641 | TGCGGAACATGGGATAACAGGCTGAGTGGTTGGGTGTTC
818 | CGTWDKSLSAVVF | 1642 | TGCGGAACGTGGGATAAGAGCCTGAGTGCTGTGGTCTTC
819 | CGTWDKGLSAWVF | 1643 | TGCGGAACATGGGATAAAGGCCTGAGTGCTTGGGTGTTC
820 | CGTWDISLSAGVF | 1644 | TGCGGAACATGGGATATCAGCCTGAGTGCTGGGGTGTTC
821 | CGTWDESLSGGEVVF | 1645 | TGCGGAACATGGGATGAGAGCCTGAGTGGTGGCGAGGTGGTCTTC
822 | CGTWDASLSAWVF | 1646 | TGCGGAACATGGGATGCCAGCCTGAGTGCCTGGGTGTTC
823 | CGTWDAGLSAWVF | 1647 | TGCGGAACTTGGGATGCCGGCCTGAGTGCTTGGGTGTTC
824 | CGAWDTSLSAWVF | 1648 | TGCGGAGCATGGGATACCAGCCTGAGTGCTTGGGTGTTC
825 | CGAWDTSLSAVVF | 1649 | TGCGGAGCATGGGATACCAGCCTGAGTGCTGTGGTGTTC
826 | CGAWDTSLRAGVF | 1650 | TGCGGAGCATGGGATACCAGCCTGCGTGCTGGGGTTTTC
827 | CATWDTSVSAWVF | 1651 | TGCGCAACATGGGATACCAGCGTGAGTGCTTGGGTGTTC
828 | CATWDTSLSAWVF | 1652 | TGCGCAACATGGGATACCAGCCTGAGTGCGTGGGTGTTC
829 | CATWDNTLSAGVF | 1653 | TGCGCAACATGGGACAACACCCTGAGTGCTGGGGTGTTC
830 | CAAWDRSLSVWVF | 1654 | TGCGCAGCATGGGATAGGAGCCTGAGTGTTTGGGTGTTC
831 | CYTWHSSLRGGVF | 1655 | TGCTACACATGGCATTCCAGTCTGCGTGGTGGGGTGTTC
832 | CVTWTSSPSAWVF | 1656 | TGCGTAACGTGGACTAGTAGCCCGAGTGCTTGGGTGTTC
833 | CVTWRGGLVLF | 1657 | TGCGTGACATGGCGTGGTGGCCTTGTGTTGTTC
834 | CVTWDTSLTSVVL | 1658 | TGCGTAACATGGGATACCAGCCTGACTTCTGTGGTACTC
835 | CVTWDTSLSVYWVF | 1659 | TGCGTAACATGGGATACCAGCCTGAGTGTTTATTGGGTGTTC
836 | CVTWDTSLSAWVF | 1660 | TGCGTTACATGGGATACCAGCCTGAGTGCCTGGGTGTTC
837 | CVTWDTDLSVALF | 1661 | TGCGTCACATGGGATACCGACCTCAGCGTTGCGCTCTTC
838 | CVTWDRSLSGWVF | 1662 | TGCGTAACATGGGATAGGAGCCTGAGTGGTTGGGTGTTC
839 | CVTWDRSLREVLF | 1663 | TGCGTAACATGGGATCGCAGCCTGAGAGAGGTGTTATTC
840 | CVTWDRSLRAVVF | 1664 | TGCGTAACATGGGATCGCAGCCTGAGAGCGGTGGTATTC
841 | CVTWDRSLDAGVF | 1665 | TGCGTAACATGGGACAGGAGCCTCGATGCTGGGGTTTTC
842 | CVTWDNTLSAGVF | 1666 | TGCGTGACATGGGATAACACCCTGAGTGCTGGGGTCTTC
843 | CVTWDNNLFGVVF | 1667 | TGCGTAACATGGGATAACAACCTGTTTGGTGTGGTCTTC
844 | CVSWDTSLSGAVF | 1668 | TGCGTATCATGGGATACCAGCCTGAGTGGTGCGGTATTC
845 | CVSWDTSLSAGVF | 1669 | TGCGTCTCATGGGATACCAGCCTGAGTGCTGGGGTATTC
846 | CTTWFRTPSDVVF | 1670 | TGCACAACATGGTTTAGGACTCCGAGTGATGTGGTCTTC
847 | CTTWFRTASDVVF | 1671 | TGCACAACATGGTTTAGGACTGCGAGTGATGTGGTCTTC
848 | CTTWDYGLSVVF | 1672 | TGCACAACGTGGGATTACGGTCTGAGTGTCGTCTTC
849 | CTARDTSLSPGGVF | 1673 | TGCACAGCAAGGGATACCAGCCTGAGTCCTGGCGGGGTCTTC
850 | CSTWNTRPSDVVF | 1674 | TGCTCAACATGGAATACGAGGCCGAGTGATGTGGTGTTC
851 | CSTWESSLTTVVF | 1675 | TGTTCAACATGGGAGAGCAGTTTGACTACTGTGGTCTTC
852 | CSTWDTSLTNVLF | 1676 | TGCTCAACATGGGATACCAGCCTCACTAATGTGCTATTC
853 | CSTWDTSLSGVVF | 1677 | TGCTCAACATGGGATACCAGCCTGAGTGGAGTAGTCTTC
854 | CSTWDHSLKAALF | 1678 | TGCTCAACATGGGATCACAGCCTGAAAGCTGCACTGTTC
855 | CSTWDARLSVRVF | 1679 | TGCTCAACCTGGGATGCGAGGCTGAGTGTCCGGGTGTTC
856 | CSSYTSSSTWVF | 1680 | TGCTCCTCATATACAAGCAGCAGCACTTGGGTGTTC
857 | CSSYATRGLRVLF | 1681 | TGCAGCTCATACGCAACCCGCGGCCTTCGTGTGTTGTTC
858 | CSSWDATLSVRIF | 1682 | TGTTCATCATGGGACGCCACCCTGAGTGTTCGCATATTC
859 | CQVWEGSSDHWVF | 1683 | TGTCAGGTGTGGGAGGGTAGTAGTGATCATTGGGTGTTC
860 | CQTWDNRLSAVVF | 1684 | TGCCAAACCTGGGATAACAGACTGAGTGCTGTGGTGTTC
861 | CQTWDHSLHVGVF | 1685 | TGTCAAACGTGGGATCACAGCCTGCATGTTGGGGTGTTC
862 | CQSYDDILNVWVL | 1686 | TGCCAGTCCTATGACGACATCTTGAATGTTTGGGTCCTT
863 | CNTWDKSLTSELF | 1687 | TGCAATACATGGGATAAGAGTTTGACTTCTGAACTCTTC
864 | CLTWDRSLNVRVF | 1688 | TGCTTAACATGGGATCGCAGCCTGAATGTGAGGGTGTTC
865 | CLTWDHSLTAYVF | 1689 | TGCCTAACATGGGACCACAGCCTGACTGCTTATGTCTTC
866 | CLTRDTSLSAPVF | 1690 | TGCTTAACAAGGGATACCAGTCTGAGTGCCCCTGTGTTC
867 | CKTWESGLNFGHVF | 1691 | TGCAAAACATGGGAAAGTGGCCTTAATTTTGGCCACGTCTTC
868 | CKTWDTSLSAVVF | 1692 | TGCAAAACATGGGATACCAGCCTGAGTGCTGTGGTCTTC
869 | CGVWDVSLGAGVF | 1693 | TGCGGAGTCTGGGATGTCAGTCTGGGTGCTGGGGTGTTC
870 | CGVWDTTPSAVLF | 1694 | TGCGGAGTCTGGGATACCACCCCGAGTGCCGTTCTTTTC
871 | CGVWDTTLSAVLF | 1695 | TGCGGAGTCTGGGATACCACCCTGAGTGCCGTTCTTTTC
872 | CGVWDTSLGVF | 1696 | TGCGGAGTATGGGATACCAGCCTGGGGGTCTTC
873 | CGVWDTNLGKWVF | 1697 | TGCGGGGTATGGGATACCAACCTGGGTAAATGGGTTTTC
874 | CGVWDTGLDAGWVF | 1698 | TGTGGAGTTTGGGATACTGGCCTGGATGCTGGTTGGGTGTTC
875 | CGVWDNVLEAYVF | 1699 | TGCGGAGTGTGGGATAACGTCCTGGAGGCCTATGTCTTC
876 | CGVWDISLSANWVF | 1700 | TGCGGAGTCTGGGATATCAGCCTGAGTGCTAATTGGGTGTTC
877 | CGVWDHSLGIWAF | 1701 | TGCGGAGTATGGGATCACAGCCTGGGGATTTGGGCCTTC
878 | CGVWDDILTAEVF | 1702 | TGCGGAGTTTGGGATGATATTCTGACTGCTGAAGTGTTC
879 | CGVRDTSLGVF | 1703 | TGCGGAGTTCGGGATACCAGCCTGGGGGTCTTC
880 | CGTYDTSLPAWVF | 1704 | TGCGGAACATACGATACGAGCCTGCCTGCTTGGGTGTTT
881 | CGTYDNLVFGYVF | 1705 | TGCGGAACTTACGATAATCTTGTATTTGGTTATGTCTTC
882 | CGTYDDRLREVF | 1706 | TGCGGAACATACGATGATAGACTCAGAGAGGTGTTC
883 | CGTWVTSLSAGVF | 1707 | TGCGGAACGTGGGTTACCAGCCTGAGTGCTGGGGTGTTC
884 | CGTWVSSLTTVVF | 1708 | TGCGGAACATGGGTTAGCAGCCTGACTACTGTAGTATTC
885 | CGTWVSSLNVWVF | 1709 | TGCGGAACATGGGTTAGCAGCCTGAACGTCTGGGTGTTC
886 | CGTWVGRFWVF | 1710 | TGCGGAACATGGGTTGGCAGGTTTTGGGTATTC
887 | CGTWSGGPSGHWLF | 1711 | TGCGGAACATGGTCTGGCGGCCCGAGTGGCCATTGGTTGTTC
888 | CGTWSGGLSGHWLF | 1712 | TGCGGAACATGGTCTGGCGGCCTGAGTGGCCATTGGTTGTTC
889 | CGTWQTGREAVLF | 1713 | TGCGGAACGTGGCAGACCGGCCGGGAGGCTGTCCTATTT
890 | CGTWQSRLRWVF | 1714 | TGCGGAACGTGGCAGAGCAGGCTGAGGTGGGTGTTC
891 | CGTWQSRLGWVF | 1715 | TGCGGAACGTGGCAGAGCAGGCTGGGGTGGGTGTTC
892 | CGTWPRSLSAVWVF | 1716 | TGCGGAACATGGCCTAGGAGCCTGAGTGCTGTTTGGGTGTTC
893 | CGTWNNYLSAGDVVF | 1717 | TGCGGAACATGGAATAACTACCTGAGTGCTGGCGATGTGGTTTTC
894 | CGTWLGSQSPYWVF | 1718 | TGCGGAACATGGCTTGGCAGCCAGAGTCCTTATTGGGTCTTC
895 | CGTWHTGLSAYVF | 1719 | TGCGGAACATGGCATACCGGCCTGAGTGCTTATGTCTTC
896 | CGTWHSTLSAGHWVF | 1720 | TGCGGAACATGGCATAGTACCCTGAGTGCTGGCCATTGGGTGTTC
897 | CGTWHSSLSTWVF | 1721 | TGCGGAACATGGCATAGTAGCCTGAGTACTTGGGTGTTC
898 | CGTWHSSLSAYVF | 1722 | TGCGGAACATGGCATAGCAGCCTGAGTGCCTATGTCTTC
899 | CGTWHSSLSAVVF | 1723 | TGCGGAACATGGCATAGCAGCCTGAGTGCTGTGGTATTC
900 | CGTWHSGLSGWVF | 1724 | TGCGGAACGTGGCATTCCGGCCTGAGTGGGTGGGTTTTC
901 | CGTWHNTLRNVIF | 1725 | TGCGGAACATGGCATAACACCCTGCGTAATGTGATATTC
902 | CGTWHASLTAVF | 1726 | TGCGGAACATGGCATGCCAGCCTGACTGCTGTGTTC
903 | CGTWGWYGSQRGVVF | 1727 | TGCGGGACATGGGGATGGTATGGCAGCCAGAGAGGCGTCGTCTTC
904 | CGTWGWYGGQRGVVF | 1728 | TGCGGGACATGGGGATGGTATGGCGGCCAGAGAGGCGTCGTCTTC
905 | CGTWGTSLSAWVF | 1729 | TGCGGAACCTGGGGAACCAGCCTGAGTGCTTGGGTGTTC
906 | CGTWGSSLTTGLF | 1730 | TGCGGAACCTGGGGTAGCAGCCTGACTACTGGCCTGTTC
907 | CGTWGSSLTAYVF | 1731 | TGCGGAACATGGGGTAGCAGCCTGACTGCCTATGTCTTC
908 | CGTWGSSLSVVF | 1732 | TGCGGAACATGGGGTAGCAGCCTGAGTGTTGTGTTC
909 | CGTWGSSLSGGVF | 1733 | TGCGGAACATGGGGTAGCAGCCTGAGTGGTGGGGTGTTC
910 | CGTWGSSLSAYWVF | 1734 | TGCGGAACATGGGGTAGCAGCCTGAGTGCTTATTGGGTGTTC
911 | CGTWGSSLSAYVVF | 1735 | TGCGGAACATGGGGTAGCAGCCTGAGTGCTTATGTGGTGTTC
912 | CGTWGSSLSAYVF | 1736 | TGCGGAACATGGGGTAGCAGCCTGAGTGCTTATGTCTTC
913 | CGTWGSSLSAVVF | 1737 | TGCGGAACGTGGGGTAGTAGCCTGAGTGCTGTGGTGTTC
914 | CGTWGSSLSAPYVF | 1738 | TGCGGAACATGGGGTAGCAGCCTGAGTGCTCCTTATGTCTTC
915 | CGTWGSSLSAPVF | 1739 | TGCGGAACATGGGGTAGCAGCCTGAGTGCCCCGGTGTTC
916 | CGTWGSSLSAGVF | 1740 | TGCGGAACATGGGGTAGCAGCCTGAGTGCTGGGGTGTTC
917 | CGTWGSSLSAGLF | 1741 | TGCGGAACTTGGGGTAGCAGCCTGAGTGCTGGACTGTTC
918 | CGTWGSSLSAGALF | 1742 | TGCGGAACATGGGGTAGCAGCCTGAGTGCTGGGGCACTCTTC
919 | CGTWGSSLRAWVF | 1743 | TGCGGAACATGGGGCAGTAGCCTGCGTGCTTGGGTGTTC
920 | CGTWFTSLASGVF | 1744 | TGCGGAACCTGGTTTACTAGTCTGGCTAGTGGGGTTTTC
921 | CGTWETSLSVVVI | 1745 | TGCGGAACTTGGGAGACCAGTCTGAGTGTCGTGGTCATC
922 | CGTWETSLSGVF | 1746 | TGCGGAACATGGGAGACCAGCCTGAGTGGTGTCTTC
923 | CGTWETSLSDWVF | 1747 | TGCGGAACATGGGAAACCAGCCTGAGTGATTGGGTATTC
924 | CGTWETSLSAGVF | 1748 | TGCGGAACATGGGAGACCAGCCTGAGTGCTGGGGTATTC
925 | CGTWETSLNYVAF | 1749 | TGCGGAACATGGGAAACCAGCCTTAATTATGTGGCCTTC
926 | CGTWETSLNTWLL | 1750 | TGCGGAACATGGGAGACCAGCCTGAATACTTGGTTGCTC
927 | CGTWETSESGNYIF | 1751 | TGCGGAACATGGGAGACCAGCGAGAGTGGTAATTACATCTTC
928 | CGTWETRLGTWVI | 1752 | TGCGGAACATGGGAAACCAGACTGGGTACTTGGGTGATC
929 | CGTWETQLYWVF | 1753 | TGCGGAACATGGGAGACCCAGTTATATTGGGTGTTC
930 | CGTWETGLSAGEVF | 1754 | TGCGGAACATGGGAGACTGGCCTAAGTGCTGGAGAGGTGTTC
931 | CGTWESTLSVFLF | 1755 | TGCGGAACTTGGGAAAGCACCCTGAGTGTTTTCCTATTC
932 | CGTWESSLTVVVF | 1756 | TGCGGGACATGGGAAAGTAGCCTGACTGTTGTGGTCTTC
933 | CGTWESSLTGVVF | 1757 | TGCGGAACATGGGAAAGTAGCCTGACTGGAGTGGTATTC
934 | CGTWESSLTGFVF | 1758 | TGCGGAACATGGGAAAGCAGCCTGACTGGTTTTGTCTTC
935 | CGTWESSLSVGVF | 1759 | TGTGGAACATGGGAGAGCAGCCTGAGTGTTGGGGTGTTC
936 | CGTWESSLSEWVF | 1760 | TGCGGAACCTGGGAAAGTAGCCTCAGTGAATGGGTGTTC
937 | CGTWESSLSAVF | 1761 | TGCGGAACATGGGAGAGCAGCCTGAGTGCTGTATTC
938 | CGTWESSLSAGYIF | 1762 | TGCGGAACATGGGAGAGCAGCCTGAGTGCTGGTTATATCTTC
939 | CGTWESSLSAGVF | 1763 | TGCGGAACATGGGAGAGCAGCCTGAGTGCTGGAGTGTTC
940 | CGTWESSLSAGPVF | 1764 | TGCGGAACATGGGAAAGCAGCCTGAGCGCTGGCCCGGTGTTC
941 | CGTWESSLSAGGQVF | 1765 | TGCGGAACATGGGAAAGCAGCCTGAGTGCTGGAGGCCAGGTGTTC
942 | CGTWESSLSAFGGYVF | 1766 | TGCGGAACATGGGAGAGCAGCCTGAGTGCCTTCGGCGGTTATGTCTTC
943 | CGTWESSLRVWVF | 1767 | TGCGGAACATGGGAAAGCAGCCTGAGGGTTTGGGTGTTC
944 | CGTWESSLFTGPWVF | 1768 | TGCGGAACATGGGAAAGCAGCCTCTTTACTGGGCCTTGGGTGTTC
945 | CGTWESLSATYVF | 1769 | TGCGGAACATGGGAGAGCCTGAGTGCCACCTATGTCTTC
946 | CGTWESGLSAGVF | 1770 | TGCGGAACATGGGAGAGCGGCCTGAGTGCTGGTGTCTTC
947 | CGTWESDFWVF | 1771 | TGCGGAACATGGGAAAGCGACTTTTGGGTGTTT
948 | CGTWENRLSAVVF | 1772 | TGCGGTACATGGGAAAACAGACTGAGTGCTGTGGTCTTC
949 | CGTWENRLSAGVF | 1773 | TGCGGAACATGGGAAAACAGACTGAGTGCCGGGGTATTC
950 | CGTWEISLTTSVVF | 1774 | TGCGGAACATGGGAAATCAGCCTGACTACTTCTGTGGTATTC
951 | CGTWEISLSTSVVF | 1775 | TGCGGAACATGGGAAATCAGCCTGAGTACTTCTGTGGTATTC
952 | CGTWEGSLSVVF | 1776 | TGCGGAACATGGGAAGGCAGCCTCAGTGTTGTTTTC
953 | CGTWEGSLRVF | 1777 | TGCGGAACATGGGAAGGCAGCCTGAGGGTGTTC
954 | CGTWEGSLRHVF | 1778 | TGCGGAACATGGGAGGGCAGCCTGAGGCACGTGTTC
955 | CGTWDYSPVRAGVF | 1779 | TGCGGAACATGGGATTACAGCCCTGTACGTGCTGGGGTGTTC
956 | CGTWDYSLSVYLF | 1780 | TGCGGAACGTGGGATTACAGCCTGAGTGTTTATCTCTTC
957 | CGTWDYSLSSGVVF | 1781 | TGCGGAACATGGGATTACAGCCTGAGTTCTGGCGTGGTATTC
958 | CGTWDYSLSAWVF | 1782 | TGCGGAACATGGGATTACAGCCTGAGTGCCTGGGTGTTC
959 | CGTWDYSLSAEVF | 1783 | TGCGGAACATGGGATTACAGTCTGAGTGCTGAGGTGTTC
960 | CGTWDYSLRRAIF | 1784 | TGCGGAACATGGGATTACAGCCTGCGTCGTGCGATATTC
961 | CGTWDWSLILQLF | 1785 | TGCGGAACATGGGATTGGAGCCTCATTCTTCAATTGTTC
962 | CGTWDVTLHTGVF | 1786 | TGCGGAACATGGGATGTCACCTTGCATACTGGGGTGTTC
963 | CGTWDVTLHIGVF | 1787 | TGCGGAACATGGGATGTCACCTTGCATATTGGGGTGTTC
964 | CGTWDVTLHAGVF | 1788 | TGCGGAACATGGGATGTCACCTTGCATGCTGGGGTGTTC
965 | CGTWDVSLYSGGVF | 1789 | TGCGGAACATGGGATGTCAGTTTGTATAGTGGCGGGGTCTTC
966 | CGTWDVSLTSFVF | 1790 | TGTGGAACATGGGATGTCAGCCTGACTTCTTTCGTCTTC
967 | CGTWDVSLSVGVL | 1791 | TGCGGAACATGGGATGTCAGCCTGAGTGTTGGGGTGCTC
968 | CGTWDVSLSAGDVVF | 1792 | TGCGGAACGTGGGATGTCAGCCTGAGTGCTGGCGATGTAGTTTTC
969 | CGTWDVSLNVVVF | 1793 | TGCGGAACATGGGATGTCAGCCTGAATGTCGTGGTTTTC
970 | CGTWDVSLNTQVF | 1794 | TGCGGAACATGGGATGTCAGCCTGAATACTCAGGTGTTC
971 | CGTWDVSLGALF | 1795 | TGCGGCACATGGGATGTGAGCCTGGGTGCGCTGTTC
972 | CGTWDVNLKTVVF | 1796 | TGCGGAACGTGGGACGTTAATCTGAAAACTGTCGTTTTC
973 | CGTWDVILSAEVF | 1797 | TGCGGAACATGGGATGTCATCCTGAGTGCTGAGGTATTC
974 | CGTWDTTVSAVVF | 1798 | TGCGGAACATGGGATACCACCGTGAGTGCTGTGGTTTTC
975 | CGTWDTTLTAWVF | 1799 | TGCGGAACATGGGATACCACCCTGACTGCCTGGGTGTTC
976 | CGTWDTTLSVFLF | 1800 | TGCGGAACATGGGACACCACCTTGAGTGTTTTCCTATTC
977 | CGTWDTSVSAGVF | 1801 | TGCGGGACTTGGGATACCAGTGTGAGTGCTGGGGTGTTC
978 | CGTWDTSVISWVF | 1802 | TGCGGAACATGGGATACCAGTGTGATTTCTTGGGTTTTC
979 | CGTWDTSRSSLYVVF | 1803 | TGCGGAACATGGGATACCAGTCGGAGTTCTCTCTATGTGGTCTTC
980 | CGTWDTSRSAWVF | 1804 | TGCGGAACATGGGATACCAGCCGGAGTGCTTGGGTATTC
981 | CGTWDTSRNPGGIF | 1805 | TGCGGAACATGGGATACCAGCCGGAATCCTGGAGGAATTTTC
982 | CGTWDTSRGHVF | 1806 | TGCGGAACATGGGACACCAGTCGGGGTCATGTTTTC
983 | CGTWDTSPSTGQVLF | 1807 | TGCGGAACATGGGATACCAGCCCGAGTACTGGCCAGGTGCTTTTC
984 | CGTWDTSPSAWVF | 1808 | TGCGGAACATGGGATACCAGCCCGAGTGCCTGGGTGTTC
985 | CGTWDTSLTWVF | 1809 | TGCGGAACATGGGATACTAGCCTGACCTGGGTGTTC
986 | CGTWDTSLTWFAVF | 1810 | TGCGGAACATGGGATACCAGCCTGACGTGGTTCGCAGTGTTC
987 | CGTWDTSLTVVVF | 1811 | TGCGGAACATGGGATACCAGCCTGACTGTTGTGGTATTC
988 | CGTWDTSLTTSWVF | 1812 | TGCGGAACATGGGATACCAGCCTGACTACTTCTTGGGTGTTC
989 | CGTWDTSLTTGPFWVF | 1813 | TGCGGAACATGGGATACCAGCCTGACCACTGGTCCTTTTTGGGTGTTC
990 | CGTWDTSLTPFYVF | 1814 | TGCGGAACATGGGATACCAGCCTGACTCCTTTTTATGTCTTC
991 | CGTWDTSLTAYVF | 1815 | TGCGGAACATGGGATACCAGCCTGACTGCTTATGTCTTC
992 | CGTWDTSLTAWVF | 1816 | TGCGGAACATGGGATACCAGCCTGACTGCTTGGGTGTTC
993 | CGTWDTSLTAWGVF | 1817 | TGCGGAACATGGGATACCAGCCTGACTGCGTGGGGGGTGTTC
994 | CGTWDTSLTAVVL | 1818 | TGCGGCACATGGGATACCAGCCTGACTGCGGTGGTTCTC
995 | CGTWDTSLTARVF | 1819 | TGCGGAACCTGGGATACCAGCCTGACTGCTCGGGTTTTC
996 | CGTWDTSLTAIVF | 1820 | TGCGGAACATGGGATACCAGCCTGACTGCGATTGTCTTC
997 | CGTWDTSLTAGVF | 1821 | TGCGGAACATGGGATACCAGCCTGACTGCTGGTGTCTTC
998 | CGTWDTSLSVYVF | 1822 | TGCGGAACATGGGATACCAGCCTGAGTGTTTATGTCTTC
999 | CGTWDTSLSVVF | 1823 | TGCGGAACATGGGATACCAGCCTGAGTGTGGTGTTC
1000 | CGTWDTSLSVGEF | 1824 | TGCGGGACATGGGATACCAGCCTGAGTGTTGGGGAATTC
1001 | CGTWDTSLSTWVF | 1825 | TGCGGAACATGGGATACCAGCCTGAGTACTTGGGTGTTC
1002 | CGTWDTSLSTVVF | 1826 | TGCGGAACATGGGATACCAGCCTGAGTACTGTGGTATTC
1003 | CGTWDTSLSTGQVLF | 1827 | TGCGGAACATGGGATACCAGCCTGAGTACTGGCCAGGTGCTTTTC
1004 | CGTWDTSLSTGPLWVF | 1828 | TGCGGCACATGGGATACCAGCCTGAGCACTGGTCCTCTTTGGGTGTTC
1005 | CGTWDTSLSSYVF | 1829 | TGCGGAACTTGGGATACCAGCCTGAGTTCTTATGTCTTC
1006 | CGTWDTSLSSVVF | 1830 | TGCGGAACATGGGATACCAGCCTGAGTTCTGTGGTCTTC
1007 | CGTWDTSLSSRYIF | 1831 | TGCGGAACATGGGATACCAGCCTGAGTTCTAGATACATATTC
1008 | CGTWDTSLSSRFIF | 1832 | TGCGGAACATGGGATACCAGCCTGAGTTCTAGATTCATATTC
1009 | CGTWDTSLSSGWVF | 1833 | TGCGGAACATGGGATACCAGCCTGAGTTCTGGGTGGGTGTTC
1010 | CGTWDTSLSRYVF | 1834 | TGCGGAACATGGGATACCAGCCTGAGTCGGTATGTGTTC
1011 | CGTWDTSLSQWLF | 1835 | TGCGGAACTTGGGATACCAGTCTGAGTCAATGGCTGTTC
1012 | CGTWDTSLSPGLWVF | 1836 | TGCGGAACATGGGATACCAGCCTGAGTCCTGGCCTTTGGGTGTTC
1013 | CGTWDTSLSNYVF | 1837 | TGCGGAACATGGGATACCAGCCTGAGTAATTATGTCTTC
1014 | CGTWDTSLSIWVF | 1838 | TGCGGAACATGGGATACCAGCCTAAGTATTTGGGTGTTC
1015 | CGTWDTSLSIGPFWVF | 1839 | TGCGGCACATGGGATACCAGCCTGAGCATTGGTCCTTTTTGGGTGTTC
1016 | CGTWDTSLSGWVF | 1840 | TGCGGAACATGGGATACCAGCCTGAGTGGTTGGGTGTTC
1017 | CGTWDTSLSGTVF | 1841 | TGCGGAACATGGGATACCAGCCTGAGTGGTACAGTGTTC
1018 | CGTWDTSLSGGQVF | 1842 | TGCGGAACATGGGATACTAGTCTGAGTGGTGGCCAGGTGTTC
1019 | CGTWDTSLSGGIF | 1843 | TGCGGAACATGGGATACCAGCCTGAGTGGTGGGATATTC
1020 | CGTWDTSLSGEDVVI | 1844 | TGCGGAACATGGGATACCAGCCTGAGTGGTGAGGATGTGGTAATC
1021 | CGTWDTSLSFLYAF | 1845 | TGCGGAACATGGGATACCAGCCTGAGTTTCCTTTATGCTTTC
1022 | CGTWDTSLSEVVF | 1846 | TGCGGAACATGGGATACCAGCCTGAGTGAGGTCGTATTC
1023 | CGTWDTSLSEVF | 1847 | TGCGGAACATGGGATACCAGCCTGAGTGAAGTGTTC
1024 | CGTWDTSLSENWVF | 1848 | TGCGGAACATGGGATACTAGCCTGAGTGAAAATTGGGTGTTC
1025 | CGTWDTSLSAYIF | 1849 | TGCGGAACATGGGATACCAGCCTGAGTGCCTACATATTC
1026 | CGTWDTSLSAVVL | 1850 | TGCGGAACATGGGATACCAGCCTGAGTGCTGTGGTACTC
1027 | CGTWDTSLSAVF | 1851 | TGCGGAACATGGGATACCAGCCTGAGTGCTGTTTTC
1028 | CGTWDTSLSARVF | 1852 | TGCGGAACATGGGATACCAGCCTGAGTGCCCGGGTGTTC
1029 | CGTWDTSLSARQVF | 1853 | TGCGGCACATGGGATACCAGCCTGAGTGCCCGCCAGGTATTC
1030 | CGTWDTSLSALVF | 1854 | TGCGGAACATGGGATACCAGCCTGAGTGCTTTGGTTTTC
1031 | CGTWDTSLSAKVF | 1855 | TGCGGAACATGGGATACCAGCCTGAGTGCTAAGGTGTTC
1032 | CGTWDTSLSAKIF | 1856 | TGCGGAACATGGGATACCAGCCTGAGTGCGAAAATCTTC
1033 | CGTWDTSLSAKAVF | 1857 | TGCGGAACATGGGATACCAGCCTGAGTGCCAAGGCGGTATTC
1034 | CGTWDTSLSAHAVF | 1858 | TGCGGAACATGGGATACCAGCCTGAGTGCCCATGCTGTGTTC
1035 | CGTWDTSLSAGYVF | 1859 | TGCGGAACATGGGATACCAGCCTGAGTGCTGGCTATGTCTTC
1036 | CGTWDTSLSAGRWVF | 1860 | TGCGGAACATGGGACACCAGTCTGAGTGCTGGCCGCTGGGTGTTC
1037 | CGTWDTSLSAGIF | 1861 | TGCGGAACATGGGATACCAGCCTGAGTGCTGGGATATTC
1038 | CGTWDTSLSAGGFRVF | 1862 | TGCGGAACATGGGATACCAGCCTGAGTGCTGGTGGGTTCCGGGTCTTC
1039 | CGTWDTSLSAGAF | 1863 | TGCGGAACATGGGATACCAGCCTGAGTGCTGGGGCATTC
1040 | CGTWDTSLSADWFF | 1864 | TGCGGAACATGGGATACCAGTCTGAGTGCTGATTGGTTTTTC
1041 | CGTWDTSLSADEYVF | 1865 | TGCGGAACATGGGATACCAGCCTGAGTGCTGATGAATATGTCTTC
1042 | CGTWDTSLSAAWVF | 1866 | TGCGGCACATGGGATACCAGCCTGAGTGCGGCTTGGGTGTTC
1043 | CGTWDTSLSAALF | 1867 | TGCGGAACATGGGATACCAGCCTGAGTGCTGCGCTATTC
1044 | CGTWDTSLSAAGVF | 1868 | TGCGGAACATGGGATACCAGCCTGAGTGCTGCGGGGGTTTTC
1045 | CGTWDTSLRVVVF | 1869 | TGCGGAACATGGGATACCAGCCTGAGAGTTGTGGTTTTC
1046 | CGTWDTSLRTWVF | 1870 | TGCGGAACATGGGATACCAGCCTGAGAACCTGGGTATTC
1047 | CGTWDTSLRGAVF | 1871 | TGCGGAACGTGGGATACCAGCCTGAGGGGTGCAGTGTTC
1048 | CGTWDTSLRAVVF | 1872 | TGCGGAACATGGGATACCAGCCTGCGTGCTGTGGTATTC
1049 | CGTWDTSLNVVYVF | 1873 | TGCGGAACATGGGATACAAGCCTGAATGTAGTTTATGTCTTC
1050 | CGTWDTSLNTYLF | 1874 | TGCGGAACATGGGATACCAGCCTCAACACCTACCTGTTC
1051 | CGTWDTSLNFAWLF | 1875 | TGCGGAACATGGGATACTAGCCTGAACTTCGCTTGGCTGTTC
1052 | CGTWDTSLLVWLF | 1876 | TGCGGCACATGGGATACCAGCCTTCTTGTGTGGCTTTTC
1053 | CGTWDTSLKTWVF | 1877 | TGCGGAACATGGGATACCAGTCTGAAGACGTGGGTGTTC
1054 | CGTWDTSLIVWVF | 1878 | TGCGGAACATGGGATACCAGTCTGATTGTCTGGGTGTTC
1055 | CGTWDTSLITGVF | 1879 | TGCGGAACATGGGATACCAGCCTAATTACTGGGGTGTTC
1056 | CGTWDTSLISVVF | 1880 | TGCGGAACATGGGATACCAGCCTGATTAGCGTGGTATTC
1057 | CGTWDTSLIAYVF | 1881 | TGCGGAACATGGGATACCAGCCTGATTGCTTATGTCTTC
1058 | CGTWDTSLHTELF | 1882 | TGCGGAACATGGGATACCAGCCTGCACACTGAGTTGTTC
1059 | CGTWDTSLGSYVF | 1883 | TGCGGAACTTGGGATACCAGCCTGGGTTCTTATGTCTTC
1060 | CGTWDTSLGSLWVF | 1884 | TGCGGAACATGGGATACCAGCCTGGGTTCTCTTTGGGTGTTC
1061 | CGTWDTSLGSGVF | 1885 | TGCGGTACATGGGATACCAGCCTGGGTTCTGGGGTATTC
1062 | CGTWDTSLGGRGVF | 1886 | TGCGGAACTTGGGATACCAGTCTGGGTGGTAGAGGGGTCTTC
1063 | CGTWDTSLGAWVF | 1887 | TGCGGAACATGGGATACCAGCCTGGGTGCTTGGGTGTTC
1064 | CGTWDTSLGAVVF | 1888 | TGCGGAACATGGGATACCAGCCTGGGTGCCGTGGTATTC
1065 | CGTWDTSLGAGVF | 1889 | TGCGGAACATGGGATACCAGCCTGGGTGCTGGGGTATTC
1066 | CGTWDTSLGAGLF | 1890 | TGCGGAACATGGGATACCAGCCTGGGTGCTGGCCTATTC
1067 | CGTWDTSLDAVVF | 1891 | TGCGGAACATGGGATACCAGTCTGGATGCTGTGGTTTTC
1068 | CGTWDTSLDAVLF | 1892 | TGCGGGACTTGGGATACCAGCCTGGATGCTGTGCTGTTC
1069 | CGTWDTSLAWVF | 1893 | TGCGGAACATGGGATACCAGCCTGGCTTGGGTGTTC
1070 | CGTWDTSLATGLF | 1894 | TGCGGAACATGGGATACCAGCCTGGCGACTGGACTGTTC
1071 | CGTWDTSLAPVVF | 1895 | TGCGGGACATGGGATACCAGCCTGGCCCCTGTAGTCTTC
1072 | CGTWDTRLTIVIF | 1896 | TGCGGAACATGGGACACCCGCCTGACTATTGTGATCTTC
1073 | CGTWDTRLSVWLF | 1897 | TGTGGAACATGGGACACCAGGCTGAGTGTTTGGCTGTTC
1074 | CGTWDTRLSVGVF | 1898 | TGCGGAACGTGGGACACCAGACTGAGTGTTGGGGTTTTC
1075 | CGTWDTRLSTVIF | 1899 | TGCGGCACATGGGATACCAGACTGAGTACTGTAATTTTC
1076 | CGTWDTRLSSVVF | 1900 | TGCGGAACATGGGATACCCGCCTGAGTTCTGTGGTCTTC
1077 | CGTWDTRLSIVVF | 1901 | TGCGGAACATGGGATACCCGCCTGAGTATTGTGGTTTTC
1078 | CGTWDTRLSAYVVF | 1902 | TGCGGAACATGGGATACCAGACTGAGTGCCTATGTGGTATTC
1079 | CGTWDTRLSAWVF | 1903 | TGCGGAACCTGGGACACCCGCCTGAGTGCGTGGGTGTTC
1080 | CGTWDTRLSAVVF | 1904 | TGCGGAACATGGGATACCAGACTGAGTGCTGTGGTGTTC
1081 | CGTWDTRLSAGLF | 1905 | TGCGGAACATGGGATACCCGCCTGAGTGCTGGGTTGTTC
1082 | CGTWDTRLSAGGVF | 1906 | TGCGGAACATGGGATACCAGACTGAGTGCTGGTGGGGTGTTC
1083 | CGTWDTRLNVWLF | 1907 | TGCGGAACATGGGATACCAGATTGAATGTGTGGCTATTC
1084 | CGTWDTNREVVLL | 1908 | TGCGGAACATGGGATACCAACCGGGAAGTTGTGCTCCTC
1085 | CGTWDTNLRAHVF | 1909 | TGCGGAACATGGGATACCAACCTGCGTGCCCATGTCTTC
1086 | CGTWDTNLPAVVF | 1910 | TGCGGAACATGGGATACTAATCTGCCCGCTGTAGTGTTC
1087 | CGTWDTNLGGVF | 1911 | TGCGGAACATGGGACACCAATTTGGGTGGGGTGTTC
1088 | CGTWDTIVSIGVF | 1912 | TGCGGAACATGGGATACCATCGTGAGTATTGGGGTGTTC
1089 | CGTWDTILSAVVF | 1913 | TGCGGAACATGGGATACCATCCTGAGTGCGGTGGTGTTC
1090 | CGTWDTILSAEVF | 1914 | TGCGGCACATGGGATACCATCCTGAGTGCTGAGGTGTTC
1091 | CGTWDTHLGVVF | 1915 | TGCGGAACATGGGATACCCACCTGGGTGTGGTTTTC
1092 | CGTWDTGPSPHWLF | 1916 | TGCGGAACATGGGATACCGGCCCGAGCCCTCATTGGCTGTTC
1093 | CGTWDTGLTFGGVF | 1917 | TGCGGAACATGGGATACCGGCCTGACTTTTGGAGGCGTGTTC
1094 | CGTWDTGLTAFVF | 1918 | TGCGGAACATGGGATACCGGCCTGACTGCTTTTGTCTTC
1095 | CGTWDTGLSVWVF | 1919 | TGCGGAACATGGGATACCGGCCTGAGTGTTTGGGTGTTC
1096 | CGTWDTGLSTGIF | 1920 | TGCGGAACATGGGATACCGGCCTGAGTACTGGGATTTTC
1097 | CGTWDTGLSSLLF | 1921 | TGCGGAACATGGGATACCGGCCTGAGTTCCCTGCTCTTC
1098 | CGTWDTGLSIVVF | 1922 | TGCGGAACGTGGGACACCGGCCTGAGTATTGTGGTGTTC
1099 | CGTWDTGLSFVVF | 1923 | TGCGGAACGTGGGACACCGGCCTGAGTTTTGTGGTGTTC
1100 | CGTWDTGLSAWVF | 1924 | TGCGGAACATGGGATACCGGCCTGAGTGCTTGGGTGTTC
1101 | CGTWDTGLSAGVVF | 1925 | TGCGGAACATGGGATACCGGCCTGAGTGCTGGTGTGGTATTC
1102 | CGTWDTGLRGWIF | 1926 | TGCGGAACATGGGATACCGGTCTGAGGGGTTGGATTTTC
1103 | CGTWDTELSAGVF | 1927 | TGCGGAACATGGGATACCGAGCTAAGTGCGGGGGTCTTC
1104 | CGTWDTALTAGVF | 1928 | TGCGGAACGTGGGATACCGCCCTGACTGCTGGGGTGTTC
1105 | CGTWDTALSLVVF | 1929 | TGCGGAACATGGGATACTGCCCTGAGTCTTGTGGTCTTC
1106 | CGTWDTALSAWLF | 1930 | TGCGGAACATGGGATACCGCCCTGAGTGCCTGGCTGTTC
1107 | CGTWDTALSAGVF | 1931 | TGCGGCACATGGGATACCGCCCTGAGTGCTGGGGTGTTC
1108 | CGTWDTALRGVLF | 1932 | TGCGGAACATGGGATACCGCCCTGCGTGGCGTGCTGTTC
1109 | CGTWDTALKEWLF | 1933 | TGCGGAACATGGGATACCGCCCTGAAAGAATGGCTGTTC
1110 | CGTWDRTLTAGDVLF | 1934 | TGCGGAACATGGGATAGGACCCTGACTGCTGGCGATGTGCTCTTC
1111 | CGTWDRSVTYVF | 1935 | TGCGGAACATGGGATAGAAGCGTGACTTATGTCTTC
1112 | CGTWDRSRNEWVF | 1936 | TGCGGAACATGGGATCGCAGCCGAAATGAATGGGTGTTC
1113 | CGTWDRSLTVWVF | 1937 | TGCGGAACATGGGATCGCAGTCTGACTGTTTGGGTCTTC
1114 | CGTWDRSLTPGWLF | 1938 | TGCGGAACATGGGATCGCAGCCTGACTCCTGGGTGGTTGTTC
1115 | CGTWDRSLTAWVF | 1939 | TGCGGAACATGGGATAGAAGCCTGACTGCTTGGGTGTTC
1116 | CGTWDRSLSVVVF | 1940 | TGCGGAACATGGGACCGCAGCCTGAGTGTTGTGGTATTC
1117 | CGTWDRSLSVVF | 1941 | TGCGGCACATGGGATCGCAGCCTGAGTGTAGTCTTC
1118 | CGTWDRSLSVQLF | 1942 | TGCGGAACATGGGATAGGAGCCTGAGTGTTCAATTGTTC
1119 | CGTWDRSLSVLWVF | 1943 | TGCGGAACATGGGATCGCAGCCTCAGTGTTCTTTGGGTGTTC
1120 | CGTWDRSLSVGLF | 1944 | TGCGGAACATGGGATCGCAGCCTGAGTGTTGGATTATTC
1121 | CGTWDRSLSTWVF | 1945 | TGCGGAACATGGGATCGCAGCCTGAGTACTTGGGTGTTC
1122 | CGTWDRSLSTHWVL | 1946 | TGCGGAACATGGGATAGAAGCCTGAGTACTCATTGGGTGCTC
1123 | CGTWDRSLSTHWVF | 1947 | TGCGGAACATGGGATAGAAGCCTGAGTACTCATTGGGTGTTC
1124 | CGTWDRSLSSAVF | 1948 | TGCGGAACCTGGGATCGAAGCCTGAGTTCTGCGGTGTTC
1125 | CGTWDRSLSPSYVF | 1949 | TGCGGAACATGGGACAGAAGCCTGAGTCCCTCTTATGTCTTC
1126 | CGTWDRSLSGEVF | 1950 | TGCGGAACATGGGATAGGAGCCTGAGTGGTGAGGTGTTC
1127 | CGTWDRSLSGAVF | 1951 | TGCGGAACATGGGATAGGAGCCTGAGTGGTGCGGTGTTC
1128 | CGTWDRSLSAVAF | 1952 | TGCGGAACATGGGATCGCAGCCTGAGTGCTGTGGCATTC
1129 | CGTWDRSLSAGGEF | 1953 | TGCGGAACATGGGATAGGAGCCTGAGTGCCGGGGGGGAATTC
1130 | CGTWDRSLSAFWVF | 1954 | TGCGGAACATGGGATCGCAGCCTGAGTGCTTTTTGGGTGTTC
1131 | CGTWDRSLSAAVF | 1955 | TGCGGAACATGGGATAGGAGCCTGAGTGCTGCGGTGTTC
1132 | CGTWDRSLSAALF | 1956 | TGCGGAACATGGGATAGGAGCCTGAGTGCTGCACTCTTC
1133 | CGTWDRSLRVF | 1957 | TGCGGAACATGGGATCGCAGCCTGAGAGTGTTC
1134 | CGTWDRSLNWVF | 1958 | TGCGGTACATGGGACAGAAGCCTTAATTGGGTGTTC
1135 | CGTWDRSLNVYVF | 1959 | TGCGGAACATGGGATCGCAGCCTGAATGTTTATGTCTTC
1136 | CGTWDRSLNVGVF | 1960 | TGCGGAACATGGGATAGGAGCCTGAATGTTGGGGTGTTC
1137 | CGTWDRSLHVVF | 1961 | TGCGGAACATGGGATCGGAGCCTGCATGTGGTCTTC
1138 | CGTWDRSLGGWVF | 1962 | TGTGGAACATGGGATCGCAGCCTGGGTGGTTGGGTGTTC
1139 | CGTWDRSLGAFWVF | 1963 | TGCGGAACATGGGATCGCAGCCTGGGTGCTTTTTGGGTGTTC
1140 | CGTWDRSLFWVF | 1964 | TGCGGAACATGGGATAGAAGCCTGTTTTGGGTGTTC
1141 | CGTWDRSLAAGVF | 1965 | TGCGGAACGTGGGATCGCAGCCTGGCTGCTGGGGTGTTC
1142 | CGTWDRRLSGVVF | 1966 | TGCGGAACATGGGATAGGAGGTTGAGTGGTGTCGTATTC
1143 | CGTWDRRLSDVVF | 1967 | TGCGGAACGTGGGATCGCCGCCTAAGTGATGTGGTATTC
1144 | CGTWDRRLSAVVF | 1968 | TGCGGAACATGGGATAGGAGGCTGAGTGCTGTGGTATTC
1145 | CGTWDRRLNVAFF | 1969 | TGCGGAACATGGGATAGACGCCTGAATGTTGCGTTCTTC
1146 | CGTWDRRLLAVF | 1970 | TGTGGAACATGGGATAGGAGGCTGCTTGCTGTTTTC
1147 | CGTWDRNLRAVVF | 1971 | TGCGGAACTTGGGATAGGAACCTGCGCGCCGTGGTCTTC
1148 | CGTWDRLSAGVF | 1972 | TGCGGAACATGGGATAGGCTGAGTGCTGGGGTGTTC
1149 | CGTWDRGPNTGVF | 1973 | TGCGGAACATGGGATAGAGGCCCGAATACTGGGGTATTC
1150 | CGTWDRGLNTVYVF | 1974 | TGCGGAACATGGGATAGAGGCCTGAATACTGTTTACGTCTTC
1151 | CGTWDNYVSAPWVF | 1975 | TGCGGAACATGGGATAACTATGTGAGTGCCCCTTGGGTGTTC
1152 | CGTWDNYLSAGDVVF | 1976 | TGCGGAACATGGGATAACTACCTGAGTGCTGGCGATGTGGTTTTC
1153 | CGTWDNYLRAGVF | 1977 | TGCGGAACATGGGATAACTACCTGAGAGCTGGGGTCTTC
1154 | CGTWDNYLGAVVF | 1978 | TGCGGAACATGGGACAATTATCTGGGTGCCGTGGTTTTC
1155 | CGTWDNYLGAGVF | 1979 | TGCGGAACATGGGATAACTACCTGGGTGCGGGGGTGTTC
1156 | CGTWDNTVSAPWVF | 1980 | TGCGGAACATGGGATAACACCGTGAGTGCCCCTTGGGTTTTC
1157 | CGTWDNTLSLWVF | 1981 | TGCGGAACATGGGATAACACCCTGAGTCTTTGGGTGTTC
1158 | CGTWDNTLSAGVF | 1982 | TGCGGAACATGGGATAACACCCTGAGTGCTGGGGTCTTC
1159 | CGTWDNTLLTVLF | 1983 | TGCGGAACATGGGACAACACTCTGCTTACTGTGTTATTC
1160 | CGTWDNRLSSVIF | 1984 | TGCGGAACATGGGATAACAGACTGAGTAGTGTGATTTTC
1161 | CGTWDNRLSAVVF | 1985 | TGCGGAACATGGGATAACAGGTTGAGTGCTGTGGTCTTC
1162 | CGTWDNRLSAGGIF | 1986 | TGCGGAACATGGGATAACAGGCTGAGTGCTGGTGGGATATTC
1163 | CGTWDNRLSAEVF | 1987 | TGCGGAACATGGGATAACAGACTGAGTGCTGAGGTGTTC
1164 | CGTWDNRLRVGVL | 1988 | TGTGGAACATGGGATAACAGACTGCGTGTTGGGGTTCTC
1165 | CGTWDNRLLENVF | 1989 | TGCGGAACATGGGATAATCGCCTGCTTGAGAATGTCTTC
1166 | CGTWDNNLRAVF | 1990 | TGCGGAACATGGGATAACAACCTGCGTGCTGTCTTC
1167 | CGTWDNNLRAGVF | 1991 | TGCGGAACTTGGGATAATAACCTGCGTGCTGGAGTGTTC
1168 | CGTWDNNLGGGRVF | 1992 | TGCGGAACATGGGACAACAATTTGGGCGGTGGCCGGGTGTTC
1169 | CGTWDNNLGAGVL | 1993 | TGCGGAACATGGGATAACAACCTGGGTGCTGGCGTCCTC
1170 | CGTWDNNLGAGVF | 1994 | TGCGGAACATGGGATAACAACCTGGGTGCTGGCGTCTTC
1171 | CGTWDNILSAAVF | 1995 | TGCGGAACTTGGGATAACATCCTGAGCGCTGCGGTGTTC
1172 | CGTWDNILDAGVF | 1996 | TGCGGAACCTGGGATAACATCTTGGATGCAGGGGTTTTC
1173 | CGTWDNDLSGWLF | 1997 | TGCGGAACATGGGATAACGACCTGAGTGGTTGGCTGTTC
1174 | CGTWDNDLSAWVF | 1998 | TGCGGAACATGGGATAACGACCTGAGTGCCTGGGTGTTC
1175 | CGTWDLTLGGVVF | 1999 | TGCGGAACATGGGATCTCACCCTGGGTGGTGTGGTGTTC
1176 | CGTWDLSLSAGVF | 2000 | TGCGGAACATGGGATCTCAGCCTGAGTGCTGGGGTATTC
1177 | CGTWDLSLKEWVF | 2001 | TGCGGAACATGGGATCTCAGCCTGAAAGAATGGGTGTTC
1178 | CGTWDLSLDAVVF | 2002 | TGCGGAACGTGGGATCTCAGCCTGGATGCTGTTGTTTTC
1179 | CGTWDLKVF | 2003 | TGCGGAACCTGGGACCTGAAGGTTTTC
1180 | CGTWDKTLSVWVF | 2004 | TGCGGAACATGGGATAAGACTCTGAGTGTTTGGGTGTTC
1181 | CGTWDKSLSVWVF | 2005 | TGCGGAACATGGGATAAGAGCCTGAGTGTTTGGGTGTTC
1182 | CGTWDKSLSGVVF | 2006 | TGCGGAACATGGGATAAGAGCCTGAGTGGTGTGGTATTT
1183 | CGTWDKSLSDWVF | 2007 | TGCGGAACATGGGATAAGAGCCTGAGTGATTGGGTGTTC
1184 | CGTWDKSLSALVF | 2008 | TGCGGAACATGGGATAAGAGCCTGAGTGCTTTGGTTTTC
1185 | CGTWDKSLSAGVF | 2009 | TGCGGAACATGGGATAAGAGCCTGAGTGCTGGCGTCTTC
1186 | CGTWDKSLSADVF | 2010 | TGCGGAACATGGGATAAGAGCCTGAGTGCCGACGTCTTC
1187 | CGTWDKRLTIVVF | 2011 | TGCGGAACATGGGATAAACGCCTGACTATTGTGGTCTTC
1188 | CGTWDKRLSAWVL | 2012 | TGCGGAACATGGGATAAACGCCTGAGTGCCTGGGTGCTC
1189 | CGTWDKNLRAVVF | 2013 | TGCGGAACATGGGATAAGAACCTGCGTGCTGTGGTCTTC
1190 | CGTWDITLSGFVF | 2014 | TGCGGAACATGGGATATCACCCTGAGTGGGTTTGTCTTC
1191 | CGTWDITLHTGVF | 2015 | TGCGGAACATGGGATATCACCTTGCATACTGGAGTATTC
1192 | CGTWDISVTVVF | 2016 | TGCGGAACATGGGATATCAGTGTGACTGTGGTGTTC
1193 | CGTWDISVRGYAF | 2017 | TGCGGAACATGGGATATCAGTGTGAGGGGTTATGCCTTC
1194 | CGTWDISRWVF | 2018 | TGCGGAACATGGGATATCAGCCGTTGGGTTTTC
1195 | CGTWDISPSAWVF | 2019 | TGCGGAACATGGGATATCAGCCCGAGTGCTTGGGTGTTC
1196 | CGTWDISLSVWVF | 2020 | TGCGGAACATGGGATATTAGCCTGAGTGTCTGGGTGTTC
1197 | CGTWDISLSVVF | 2021 | TGCGGAACATGGGATATCAGCCTGAGTGTTGTATTC
1198 | CGTWDISLSSVVF | 2022 | TGCGGAACTTGGGATATCAGCCTGAGTTCTGTGGTGTTC
1199 | CGTWDISLSHWLF | 2023 | TGCGGAACATGGGATATCAGCCTGAGTCACTGGTTGTTC
1200 | CGTWDISLSGWVF | 2024 | TGCGGAACATGGGATATCAGTCTGAGTGGTTGGGTGTTC
1201 | CGTWDISLSGRVF | 2025 | TGCGGAACATGGGATATCAGCCTGAGTGGTCGAGTGTTC
1202 | CGTWDISLSAWAF | 2026 | TGCGGAACATGGGACATCAGCCTGAGTGCTTGGGCGTTC
1203 | CGTWDISLSAVVF | 2027 | TGCGGAACATGGGATATCAGCCTGAGTGCTGTGGTTTTC
1204 | CGTWDISLSAVIF | 2028 | TGCGGGACATGGGACATCAGCCTGAGTGCTGTGATATTC
1205 | CGTWDISLSAVF | 2029 | TGCGGAACATGGGATATCAGCCTGAGTGCTGTGTTC
1206 | CGTWDISLSARVF | 2030 | TGCGGAACATGGGATATCAGCCTGAGTGCCCGGGTGTTC
1207 | CGTWDISLSALVF | 2031 | TGCGGAACATGGGATATCAGCCTGAGTGCCCTGGTGTTC
1208 | CGTWDISLSAHVF | 2032 | TGCGGAACATGGGATATTAGCCTGAGTGCCCATGTCTTC
1209 | CGTWDISLSAGVVF | 2033 | TGCGGAACATGGGATATCAGCCTGAGTGCTGGGGTGGTATTC
1210 | CGTWDISLSAGPYVF | 2034 | TGCGGAACATGGGATATCAGCCTGAGTGCCGGCCCTTATGTCTTC
1211 | CGTWDISLSAGGVF | 2035 | TGCGGCACATGGGATATCAGCCTGAGTGCTGGAGGGGTGTTC
1212 | CGTWDISLSAEVF | 2036 | TGCGGAACATGGGATATCAGCCTGAGTGCTGAGGTTTTC
1213 | CGTWDISLSAAVF | 2037 | TGCGGAACATGGGATATCAGCCTGAGTGCTGCTGTGTTC
1214 | CGTWDISLRAVF | 2038 | TGCGGAACATGGGATATCAGCCTGCGTGCTGTGTTC
1215 | CGTWDISLNTGVF | 2039 | TGCGGAACATGGGATATTAGCCTGAATACTGGGGTGTTC
1216 | CGTWDISLNNYVF | 2040 | TGCGGAACATGGGATATCAGCCTAAATAATTATGTCTTC
1217 | CGTWDISLIAGVF | 2041 | TGCGGAACATGGGATATCAGCCTAATTGCTGGGGTATTC
1218 | CGTWDISLHTWLF | 2042 | TGCGGAACATGGGATATCAGCCTGCATACTTGGCTGTTC
1219 | CGTWDIRLTDELLF | 2043 | TGCGGAACATGGGATATCCGCCTGACCGATGAGCTGTTATTC
1220 | CGTWDIRLSGFVF | 2044 | TGCGGAACATGGGATATCAGACTGAGCGGTTTTGTTTTC
1221 | CGTWDINLGAGGLYVF | 2045 | TGCGGAACATGGGATATCAACCTGGGTGCTGGGGGCCTTTATGTCTTC
1222 | CGTWDIILSAEVF | 2046 | TGCGGAACATGGGATATCATCCTGAGTGCTGAGGTATTC
1223 | CGTWDHTLSAVF | 2047 | TGCGGAACATGGGATCACACCCTGAGTGCTGTCTTC
1224 | CGTWDHTLLTVLF | 2048 | TGCGGAACATGGGACCACACTCTGCTTACTGTGTTATTC
1225 | CGTWDHSLTAVVF | 2049 | TGCGGAACATGGGATCACAGCCTGACTGCTGTGGTATTC
1226 | CGTWDHSLTAGIF | 2050 | TGCGGAACCTGGGATCACAGCCTGACTGCTGGGATATTC
1227 | CGTWDHSLSVVLF | 2051 | TGCGGAACATGGGATCACAGCCTGAGTGTTGTATTATTC
1228 | CGTWDHSLSLVF | 2052 | TGCGGAACATGGGATCACAGCCTGAGTTTGGTATTC
1229 | CGTWDHSLSIGVF | 2053 | TGCGGAACATGGGATCACAGCCTGTCTATTGGGGTTTTC
1230 | CGTWDHSLSAGVF | 2054 | TGCGGAACATGGGATCACAGCCTGAGTGCTGGGGTGTTC
1231 | CGTWDHSLSAFVF | 2055 | TGTGGAACTTGGGATCACAGCCTGAGTGCTTTCGTGTTC
1232 | CGTWDHSLSAAVF | 2056 | TGCGGAACATGGGATCACAGTCTGAGTGCTGCTGTTTTC
1233 | CGTWDHNLRAVF | 2057 | TGCGGAACATGGGACCACAATCTGCGTGCTGTCTTC
1234 | CGTWDFTLSVGRF | 2058 | TGCGGGACATGGGATTTCACCCTGAGTGTTGGGCGCTTC
1235 | CGTWDFTLSAPVF | 2059 | TGCGGAACATGGGATTTCACCCTGAGTGCTCCTGTCTTC
1236 | CGTWDFSVSAGWVF | 2060 | TGCGGAACGTGGGATTTCAGCGTGAGTGCTGGGTGGGTGTTC
1237 | CGTWDFSLTTWLF | 2061 | TGCGGAACGTGGGATTTCAGTCTTACTACCTGGTTATTC
1238 | CGTWDFSLSVWVF | 2062 | TGCGGAACATGGGATTTCAGCCTGAGTGTTTGGGTGTTC
1239 | CGTWDFSLSTGVF | 2063 | TGCGGAACATGGGATTTCAGCCTGAGTACTGGGGTTTTC
1240 | CGTWDFSLSGVVF | 2064 | TGCGGCACATGGGATTTCAGCCTGAGTGGTGTGGTATTC
1241 | CGTWDFSLSGFVF | 2065 | TGCGGAACATGGGATTTCAGCCTGAGTGGTTTCGTGTTC
1242 | CGTWDFSLSAGVF | 2066 | TGCGGAACATGGGATTTCAGCCTGAGTGCTGGGGTGTTC
1243 | CGTWDETVRGWVF | 2067 | TGCGGAACATGGGATGAAACCGTGAGAGGTTGGGTGTTC
1244 | CGTWDESLRSWVF | 2068 | TGCGGAACATGGGATGAAAGTCTGAGAAGCTGGGTGTTC
1245 | CGTWDERQTDESYVF | 2069 | TGCGGAACTTGGGATGAGAGGCAGACTGATGAGTCCTATGTCTTC
1246 | CGTWDERLVAGQVF | 2070 | TGCGGAACATGGGATGAGAGACTCGTTGCTGGCCAGGTCTTC
1247 | CGTWDERLSPGAFF | 2071 | TGCGGAACATGGGATGAGAGACTGAGTCCTGGAGCTTTTTTC
1248 | CGTWDEKVF | 2072 | TGCGGAACATGGGATGAGAAGGTGTTC
1249 | CGTWDEGQTTDFFVF | 2073 | TGCGGAACCTGGGATGAAGGCCAGACTACTGATTTCTTTGTCTTC
1250 | CGTWDDTLAGVVF | 2074 | TGCGGAACATGGGATGACACCCTGGCTGGTGTGGTCTTC
1251 | CGTWDDRLTSAVF | 2075 | TGCGGAACATGGGATGACAGGCTGACTTCTGCGGTCTTC
1252 | CGTWDDRLFVVVF | 2076 | TGCGGAACATGGGATGACAGACTGTTTGTTGTGGTATTC
1253 | CGTWDDNLRGWVF | 2077 | TGCGGAACATGGGATGATAACCTGAGAGGTTGGGTGTTC
1254 | CGTWDDNLRGVVF | 2078 | TGCGGAACATGGGATGACAACCTGCGTGGTGTCGTGTTC
1255 | CGTWDDNLNIGRVF | 2079 | TGCGGAACCTGGGATGACAATTTGAATATTGGAAGGGTGTTC
1256 | CGTWDDILSAVIF | 2080 | TGCGGAACATGGGATGACATCCTGAGTGCTGTGATATTC
1257 | CGTWDDILRGWVF | 2081 | TGCGGAACATGGGATGATATCCTGAGAGGTTGGGTGTTC
1258 | CGTWDATLSPGWLF | 2082 | TGCGGAACATGGGATGCCACCCTGAGTCCTGGGTGGTTATTC
1259 | CGTWDASVTSWVF | 2083 | TGCGGAACATGGGATGCCAGCGTGACTTCTTGGGTGTTC
1260 | CGTWDASLTSVVF | 2084 | TGCGGAACATGGGATGCCAGCCTGACTTCTGTGGTCTTC
1261 | CGTWDASLSVWVF | 2085 | TGCGGAACATGGGATGCCAGCCTGAGTGTTTGGGTGTTC
1262 | CGTWDASLSVPWVF | 2086 | TGCGGAACATGGGATGCCAGCCTGAGTGTTCCTTGGGTGTTC
1263 | CGTWDASLSVAVF | 2087 | TGCGGAACATGGGATGCCAGCCTGAGTGTGGCGGTATTC
1264 | CGTWDASLSTWVF | 2088 | TGCGGAACATGGGATGCCAGCCTGAGTACCTGGGTATTC
1265 | CGTWDASLSGVVF | 2089 | TGCGGAACATGGGATGCCAGCCTGAGTGGTGTGGTATTC
1266 | CGTWDASLSGGGEF | 2090 | TGCGGAACATGGGATGCCAGCCTGAGTGGTGGGGGAGAATTC
1267 | CGTWDASLSAGVF | 2091 | TGCGGAACATGGGATGCCAGCCTGAGTGCTGGGGTGTTC
1268 | CGTWDASLSAGLF | 2092 | TGCGGAACATGGGATGCCAGCCTGAGTGCTGGGCTTTTC
1269 | CGTWDASLSAEVF | 2093 | TGTGGCACATGGGATGCCAGCCTGAGTGCTGAAGTCTTC
1270 | CGTWDASLSADFWVF | 2094 | TGCGGAACATGGGATGCCAGCCTGAGTGCTGACTTTTGGGTGTTC
1271 | CGTWDASLRVFF | 2095 | TGCGGAACATGGGATGCCAGCCTGAGAGTCTTCTTC
1272 | CGTWDASLRAVVL | 2096 | TGCGGAACATGGGATGCCAGTCTGAGGGCTGTGGTACTC
1273 | CGTWDASLNIWVF | 2097 | TGCGGAACATGGGATGCCAGCCTGAATATTTGGGTTTTC
1274 | CGTWDASLKNLVF | 2098 | TGCGGGACATGGGATGCCAGCCTGAAGAATCTGGTCTTC
1275
CGTWDASL
2099
TGCGGAACATGGGATGCCAGCCTGGGTGCCTGGGTA



GAWVF

TTC





1276
CGTWDASL
2100
TGCGGAACATGGGATGCCAGCCTGGGTGCTGTGGTC



GAVVF

TTC





1277
CGTWDASL
2101
TGCGGAACATGGGATGCCAGCCTGGGTGCGGGGGTC



GAGVF

TTC





1278
CGTWDARL
2102
TGCGGAACATGGGATGCTAGGCTGAGTGGCCTTTAT



SGLYVF

GTCTTC





1279
CGTWDARL
2103
TGTGGAACCTGGGATGCGAGACTGGGTGGTGCAGTC



GGAVF

TTC





1280
CGTWDANL
2104
TGCGGAACATGGGATGCCAATCTGCGTGCTGGGGTC



RAGVF

TTC





1281
CGTWDAIIS
2105
TGCGGAACATGGGATGCTATCATAAGTGGTTGGGTG



GWVF

TTC





1282
CGTWDAG
2106
TGCGGAACATGGGATGCCGGCCAGAGTGTTTGGGTG



QSVWVF

TTC





1283
CGTWDAGL
2107
TGCGGCACATGGGATGCCGGGCTGACTGGCCTTTAT



TGLYVF

GTCTTC





1284
CGTWDAGL
2108
TGCGGAACTTGGGATGCCGGTCTGAGTGTTTATGTC



SVYVF

TTC





1285
CGTWDAGL
2109
TGCGGGACATGGGATGCCGGCCTGAGTACTGGGGTC



STGVF

TTC





1286
CGTWDAGL
2110
TGCGGAACATGGGATGCCGGCCTGAGTGGGGACGTT



SGDVF

TTC





1287
CGTWDAGL
2111
TGCGGAACATGGGATGCCGGCCTGAGTGCTGGTTAT



SAGYVF

GTCTTC





1288
CGTWDAGL
2112
TGCGGAACATGGGATGCCGGCCTGCGTGTTTGGGTG



RVWVF

TTC





1289
CGTWDAGL
2113
TGCGGAACATGGGATGCCGGCCTGAGGGAAATTTTC



REIF







1290
CGTWASSL
2114
TGCGGAACATGGGCCAGCAGCCTGAGTTCTTGGGTG



SSWVF

TTC





1291
CGTWAGSL
2115
TGCGGAACATGGGCTGGCAGCCTGAGTGGTCATGTC



SGHVF

TTC





1292
CGTWAGSL
2116
TGCGGAACATGGGCTGGCAGCCTGAGTGCCGCTTGG



SAAWVF

GTGTTC





1293
CGTWAGSL
2117
TGCGGAACATGGGCTGGCAGCCTGAATGTTTATTGG



NVYWVF

GTGTTC





1294
CGTWAGNL
2118
TGCGGAACATGGGCTGGCAACCTGAGACCTAATTGG



RPNVVVF

GTGTTC





1295
CGTRGSLG
2119
TGCGGAACAAGGGGTAGCCTGGGTGGTGCGGTGTTC



GAVF







1296
CGTRDTTLS
2120
TGCGGAACAAGGGATACCACCCTGAGTGTCCCGGTG



VPVF

TTC





1297
CGTRDTSL
2121
TGCGGAACACGGGATACCAGCCTCAATATTGAAATC



NIEIF

TTC





1298
CGTRDTSL
2122
TGTGGAACACGGGATACCAGCCTGAATGATGTCTTC



NDVF







1299
CGTRDTRL
2123
TGCGGAACACGGGATACCCGCCTGAGTATTGTGGTT



SIVVF

TTC





1300
CGTRDTILS
2124
TGCGGCACACGGGATACCATCCTGAGTGCTGAGGTG



AEVF

TTC





1301
CGTRDRSLS
2125
TGCGGAACACGGGATAGAAGCCTGAGTGGTTGGGT



GWVF

GTTC





1302
CGSWYYNV
2126
TGCGGATCATGGTATTACAATGTCTTCCTTTTC



FLF







1303
CGSWHSSL
2127
TGCGGATCTTGGCATAGCAGCCTCAACCTTGTCGTCT



NLVVF

TC





1304
CGSWGSGL
2128
TGCGGATCATGGGGTAGTGGCCTGAGTGCCCCTTAT



SAPYVF

GTCTTC





1305
CGSWESGL
2129
TGCGGTTCGTGGGAAAGCGGCCTGGGTGCTTGGCTG



GAWLF

TTC





1306
CGSWDYGL
2130
TGCGGATCCTGGGATTACGGCCTCCTACTCTTC



LLF







1307
CGSWDVSL
2131
TGCGGTTCATGGGATGTCAGCCTGACTGCTGTTTTC



TAVF







1308
CGSWDVSL
2132
TGCGGATCCTGGGATGTCAGTCTCAATGTTGGCATTT



NVGIF

TC





1309
CGSWDTTL
2133
TGCGGATCATGGGATACCACCCTGCGTGCTTGGGTG



RAWVF

TTC





1310
CGSWDTSP
2134
TGCGGCTCGTGGGATACCAGCCCTGTCCGTGCTTGG



VRAWVF

GTGTTC





1311
CGSWDTSL
2135
TGCGGATCATGGGATACCAGCCTGAGTGTTTGGGTG



SVWVF

TTC





1312
CGSWDTSL
2136
TGCGGATCATGGGATACCAGCCTGAGTGCTGAGGTG



SAEVF

TTC





1313
CGSWDTSL
2137
TGCGGCTCGTGGGATACCAGCCTGCGTGCTTGGGTG



RAWVF

TTC





1314
CGSWDTSL
2138
TGCGGCTCGTGGGATACCAGCCTGCGTGCTTGGGCG



RAWAF

TTC





1315
CGSWDTSL
2139
TGCGGATCATGGGATACCAGCCTGGATGCTAGGCTG



DARLF

TTC





1316
CGSWDTILL
2140
TGCGGATCATGGGATACCATCCTGCTTGTCTATGTCT



VYVF

TC





1317
CGSWDRW
2141
TGCGGATCATGGGATCGCTGGCAGGCTGCTGTCTTC



QAAVF







1318
CGSWDRSL
2142
TGCGGATCATGGGATAGGAGCCTGAGTGGGTATGTC



SGYVF

TTC





1319
CGSWDRSL
2143
TGCGGATCATGGGATAGAAGCCTGAGTGCTTATGTC



SAYVF

TTC





1320
CGSWDRSL
2144
TGCGGATCATGGGATAGGAGCCTGAGTGCCGTGGTT



SAVVF

TTC





1321
CGSWDNTL
2145
TGCGGATCATGGGATAACACCTTGGGTGTTGTTCTCT



GVVLF

TC





1322
CGSWDNRL
2146
TGCGGATCGTGGGATAACAGACTAAGTACTGTCATC



STVIF

TTC





1323
CGSWDNRL
2147
TGCGGAAGCTGGGATAATCGATTGAACACTGTGATT



NTVIF

TTC





1324
CGSWDLSP
2148
TGCGGTTCATGGGATCTCAGCCCTGTACGTGTCCTTG



VRVLVF

TGTTC





1325
CGSWDLSL
2149
TGCGGATCATGGGATCTCAGCCTGAGTGCTGTCGTT



SAVVF

TTC





1326
CGSWDKNL
2150
TGCGGATCATGGGATAAAAACCTGCGTGCTGTGCTG



RAVLF

TTC





1327
CGSWDISLS
2151
TGCGGCTCATGGGATATCAGCCTGAGTGCTGGGGTG



AGVF

TTC





1328
CGSWDIRLS
2152
TGCGGATCATGGGATATCAGACTGAGTGCAGAGGTC



AEVF

TTC





1329
CGSWDIKL
2153
TGCGGATCATGGGACATCAAACTGAATATTGGGGTA



NIGVF

TTC





1330
CGSWDFSL
2154
TGCGGATCATGGGATTTCAGTCTCAATTATTTTGTCT



NYFVF

TC





1331
CGSWDASL
2155
TGCGGATCATGGGATGCCAGCCTGAGTACTGAGGTG



STEVF

TTC





1332
CGSWDAGL
2156
TGCGGATCCTGGGATGCCGGCCTGCGTGGCTGGGTT



RGWVF

TTC





1333
CGRWESSL
2157
TGCGGAAGATGGGAGAGCAGCCTGGGTGCTGTGGTT



GAVVF

TTC





1334
CGRWDFSL
2158
TGCGGAAGATGGGATTTTAGTCTGAGTGCTTATGTC



SAYVF

TTC





1335
CGQWDND
2159
TGCGGACAATGGGATAACGACCTGAGTGTTTGGGTG



LSVWVF

TTC





1336
CGPWHSSV
2160
TGCGGACCCTGGCATAGCAGCGTGACTAGTGGCCAC



TSGHVL

GTGCTC





1337
CGLWDASL
2161
TGCGGATTATGGGATGCCAGCCTGAGTGCTCCTACT



SAPTWVF

TGGGTGTTC





1338
CGIWHTSLS
2162
TGTGGAATATGGCACACTAGCCTGAGTGCTTGGGTG



AWVF

TTC





1339
CGIWDYSL
2163
TGCGGAATATGGGATTACAGCCTGGATACTTGGGTG



DTWVF

TTC





1340
CGIWDTSLS
2164
TGCGGCATATGGGATACCAGCCTGAGTGCTTGGGTG



AWVF

TTC





1341
CGIWDTRL
2165
TGCGGAATTTGGGATACCAGGCTGAGTGTTTATGTC



SVYVF

TTC





1342
CGIWDTRL
2166
TGCGGAATTTGGGATACCAGGCTGAGTGTTTATATC



SVYIF

TTC





1343
CGIWDTNL
2167
TGTGGAATATGGGATACGAATCTGGGTTATCTCTTC



GYLF







1344
CGIWDTGL
2168
TGCGGTATATGGGATACCGGCCTGAGTGCTGTGGTA



SAVVF

TTC





1345
CGIWDRSLS
2169
TGCGGAATATGGGATCGCAGCCTGAGTGCTTGGGTG



AWVF

TTT





1346
CGIRDTRLS
2170
TGCGGAATTCGGGATACCAGGCTGAGTGTTTATGTC



VYVF

TTC





1347
CGGWSSRL
2171
TGCGGAGGATGGAGTAGCAGACTGGGTGTTGGCCCA



GVGPVF

GTGTTT





1348
CGGWGSGL
2172
TGCGGAGGATGGGGTAGCGGCCTGAGTGCTTGGGTG



SAWVF

TTC





1349
CGGWDTSL
2173
TGCGGAGGATGGGATACCAGCCTGAGTGCTTGGGTG



SAWVF

TTC





1350
CGGWDRGL
2174
TGCGGAGGATGGGATAGGGGCCTGGATGCTTGGGTT



DAWVF

TTC





1351
CGAWRNN
2175
TGCGGAGCATGGCGTAATAACGTGTGGGTGTTC



VWVF







1352
CGAWNRRL
2176
TGCGGAGCATGGAACAGGCGCCTGAATCCTCATTCT



NPHSHWVF

CATTGGGTGTTC





1353
CGAWHNK
2177
TGCGGAGCCTGGCACAACAAACTGAGCGCGGTCTTC



LSAVF







1354
CGAWGSSL
2178
TGCGGAGCATGGGGTAGCAGCCTGAGAGCTAGTGTC



RASVF

TTC





1355
CGAWGSGL
2179
TGCGGAGCATGGGGTAGCGGCCTGAGTGCTTGGGTG



SAWVF

TTC





1356
CGAWESSL
2180
TGCGGAGCATGGGAAAGTAGCCTGAGTGCCCCTTAT



SAPYVF

GTCTTC





1357
CGAWESSL
2181
TGCGGAGCATGGGAGAGCAGCCTCAATGTTGGACTG



NVGLI

ATC





1358
CGAWESGR
2182
TGCGGAGCATGGGAGAGCGGCCGGAGTGCTGGGGT



SAGVVF

GGTGTTC





1359
CGAWDYSV
2183
TGCGGAGCTTGGGATTACAGTGTGAGTGGTTGGGTG



SGWVF

TTC





1360
CGAWDYSL
2184
TGCGGAGCATGGGATTACAGCCTGACTGCCGGAGTA



TAGVF

TTC





1361
CGAWDYRL
2185
TGCGGAGCCTGGGATTACAGACTGAGTGCCGTGCTA



SAVLF

TTC





1362
CGAWDVRL
2186
TGCGGAGCGTGGGATGTTCGTCTGGATGTTGGGGTG



DVGVF

TTC





1363
CGAWDTYS
2187
TGCGGAGCATGGGATACCTACAGTTATGTCTTC



YVF







1364
CGAWDTTL
2188
TGCGGAGCATGGGATACGACCCTGAGTGGTGTGGTA



SGVVF

TTC





1365
CGAWDTTL
2189
TGCGGAGCGTGGGATACTACCCTGAGTGCTGTGATA



SAVIF

TTC





1366
CGAWDTSQ
2190
TGCGGCGCATGGGATACCAGCCAGGGTGCGTCTTAT



GASYVF

GTCTTT





1367
CGAWDTSP
2191
TGCGGAGCATGGGATACCAGCCCTGTACGTGCTGGG



VRAGVF

GTGTTC





1368
CGAWDTSL
2192
TGCGGAGCATGGGATACCAGCCTGTGGCTTTTC



WLF







1369
CGAWDTSL
2193
TGCGGAGCATGGGATACCAGCCTGACTGTTTATGTC



TVYVF

TTC





1370
CGAWDTSL
2194
TGCGGAGCATGGGACACCAGTCTGACTGCTGGGGTG



TAGVF

TTC





1371
CGAWDTSL
2195
TGCGGAGCTTGGGATACCAGCCTGAGTACTGTGGTT



STVVF

TTC





1372
CGAWDTSL
2196
TGCGGAGCATGGGATACCAGCCTGAGTTCTAGATAC



SSRYIF

ATATTC





1373
CGAWDTSL
2197
TGCGGAGCATGGGATACCAGCCTGAGTGGTTATGTC



SGYVF

TTC





1374
CGAWDTSL
2198
TGCGGAGCCTGGGATACCAGCCTGAGTGGCTGGGTG



SGWVF

TTC





1375
CGAWDTSL
2199
TGCGGAGCATGGGATACCAGTCTGAGTGGTGTGCTA



SGVLF

TTC





1376
CGAWDTSL
2200
TGCGGAGCTTGGGATACCAGCTTGAGTGGTCTTGTT



SGLVF

TTC





1377
CGAWDTSL
2201
TGCGGAGCTTGGGATACCAGCTTGAGTGGTTTTGTTT



SGFVF

TC





1378
CGAWDTSL
2202
TGCGGAGCATGGGATACCAGCCTGAGTGGTGAGGTC



SGEVF

TTT





1379
CGAWDTSL
2203
TGCGGAGCTTGGGATACCAGCTTGAGTGATTTTGTTT



SDFVF

TC





1380
CGAWDTSL
2204
TGCGGAGCATGGGATACCAGCCTGCGAACTGCGATA



RTAIF

TTC





1381
CGAWDTSL
2205
TGCGGAGCATGGGATACCAGCCTGCGGCTTTTC



RLF







1382
CGAWDTSL
2206
TGCGGAGCATGGGATACCAGCCTGAATGTTCATGTC



NVHVF

TTC





1383
CGAWDTSL
2207
TGCGGAGCATGGGATACCAGCCTCAATAAATGGGTG



NKWVF

TTC





1384
CGAWDTRL
2208
TGCGGAGCATGGGATACCCGCCTCAGTGCGCGGCTG



SARLF

TTC





1385
CGAWDTRL
2209
TGCGGAGCATGGGATACCAGACTGAGGGGTTTTATT



RGFIF

TTC





1386
CGAWDTNL
2210
TGCGGAGCATGGGATACTAATTTGGGGAATGTTCTC



GNVLL

CTC





1387
CGAWDTNL
2211
TGCGGGGCATGGGATACCAACCTGGGTAAATGGGTT



GKWVF

TTC





1388
CGAWDTGL
2212
TGCGGAGCATGGGATACCGGCCTTGAGTGGTATGTT



EWYVF

TTT





1389
CGAWDRTS
2213
TGCGGAGCATGGGATAGGACTTCTGGATTGTGGCTT



GLWLF

TTC





1390
CGAWDRSL
2214
TGCGGAGCGTGGGATCGTAGCCTGGTTGCTGGACTC



VAGLF

TTC





1391
CGAWDRSL
2215
TGCGGAGCGTGGGATAGAAGCCTGACTGTTTATGTC



TVYVF

TTC





1392
CGAWDRSL
2216
TGCGGAGCATGGGATAGAAGCCTGAGTGGTTATGTC



SGYVF

TTC





1393
CGAWDRSL
2217
TGCGGAGCATGGGATAGAAGCCTGAGTGCTTATGTC



SAYVF

TTC





1394
CGAWDRSL
2218
TGCGGAGCATGGGATAGAAGCCTGAGTGCGGTGGT



SAVVF

ATTC





1395
CGAWDRSL
2219
TGCGGAGCATGGGATCGCAGCCTGAGTGCTGGGGTT



SAGVF

TTC





1396
CGAWDRSL
2220
TGCGGAGCGTGGGATCGCAGCCTGCGTATTGTGGTA



RIVVF

TTC





1397
CGAWDRSL
2221
TGCGGAGCATGGGATAGAAGTCTGAGGGCTTACGTC



RAYVF

TTC





1398
CGAWDRSL
2222
TGCGGAGCATGGGATAGAAGTCTGAATGTTTGGCTG



NVWLF

TTC





1399
CGAWDRGL
2223
TGCGGCGCCTGGGATAGGGGCCTGAATGTCGGTTGG



NVGWLF

CTTTTC





1400
CGAWDNRL
2224
TGCGGCGCATGGGATAATAGACTGAGTATTTTGGCC



SILAF

TTC





1401
CGAWDND
2225
TGCGGAGCTTGGGATAATGACCTGACAGCTTATGTC



LTAYVF

TTC





1402
CGAWDFSL
2226
TGCGGGGCATGGGATTTCAGCCTGACTCCTCTCTTC



TPLF







1403
CGAWDDY
2227
TGCGGAGCCTGGGATGACTATCGGGGTGTGAGTATT



RGVSIYVF

TATGTCTTC





1404
CGAWDDRP
2228
TGTGGAGCATGGGATGACCGGCCTTCGAGTGCCGTG



SSAVVF

GTTTTC





1405
CGAWDDRL
2229
TGCGGAGCATGGGATGACAGACTGACTGTCGTTGTT



TVVVF

TTC





1406
CGAWDDRL
2230
TGCGGAGCGTGGGATGACAGGCTGGGTGCTGTGTTC



GAVF







1407
CGAWDASL
2231
TGCGGAGCGTGGGATGCCAGCCTGAATCCTGGCCGG



NPGRAF

GCATTC





1408
CGAWDAG
2232
TGCGGAGCATGGGATGCCGGCCTGAGGGAAATTTTC



LREIF







1409
CGAWAGSP
2233
TGCGGAGCTTGGGCTGGCAGTCCGAGTCCTTGGGTT



SPWVF

TTC





1410
CGAFDTTLS
2234
TGCGGAGCATTCGACACCACCCTGAGTGCTGGCGTT



AGVF

TTC





1411
CETWESSLS
2235
TGCGAAACATGGGAGAGCAGCCTGAGTGTTGGGGTC



VGVF

TTC





1412
CETWESSL
2236
TGCGAAACATGGGAAAGCAGCCTGAGGGTTTGGGT



RVWVF

GTTC





1413
CETWDTSL
2237
TGCGAAACGTGGGATACCAGCCTGAGTGGTGGGGTG



SGGVF

TTC





1414
CETWDTSL
2238
TGCGAAACATGGGATACCAGCCTGAGTGACTTTTAT



SDFYVF

GTCTTC





1415
CETWDTSL
2239
TGCGAAACATGGGATACCAGCCTGAGTGCCCTCTTC



SALF







1416
CETWDTSL
2240
TGCGAAACATGGGATACCAGCCTGCGTGCTGAAGTC



RAEVF

TTC





1417
CETWDTSL
2241
TGCGAAACATGGGATACCAGCCTGAATGTTGTGGTA



NVVVF

TTC





1418
CETWDTSL
2242
TGCGAAACATGGGATACCAGCCTGGGTGCCGTGGTG



GAVVF

TTC





1419
CETWDRSL
2243
TGCGAAACATGGGATAGAAGCCTGAGTGGTGTGGTA



SGVVF

TTC





1420
CETWDRSL
2244
TGCGAAACATGGGATAGGAGCCTGAGTGCTTGGGTG



SAWVF

TTT





1421
CETWDRSL
2245
TGCGAAACATGGGATCGCAGCCTGAGTGCTGTGGTC



SAVVF

TTC





1422
CETWDRGL
2246
TGCGAGACGTGGGATAGAGGCCTGAGTGTTGTGGTT



SVVVF

TTC





1423
CETWDRGL
2247
TGCGAAACATGGGATAGGGGCCTGAGTGCAGTGGT



SAVVF

ATTC





1424
CETWDHTL
2248
TGCGAAACATGGGATCACACCCTGAGTGTTGTGATA



SVVIF

TTC





1425
CETWDASL
2249
TGCGAAACATGGGATGCCAGCCTGACTGTTGTGTTA



TVVLF

TTC





1426
CETWDASL
2250
TGCGAAACATGGGATGCCAGCCTGAGTGCTGGGGTG



SAGVF

TTC





1427
CETWDAGL
2251
TGCGAAACGTGGGATGCCGGCCTGAGTGAGGTGGTG



SEVVF

TTC





1428
CETFDTSLS
2252
TGCGAAACATTTGATACCAGCCTGAGTGTTGTAGTC



VVVF

TTC





1429
CETFDTSLN
2253
TGCGAAACATTTGATACCAGCCTAAATATTGTAGTC



IVVF

TTT





1430
CESWDRSRI
2254
TGCGAATCATGGGATAGAAGCCGGATTGGTGTGGTC



GVVF

TTC








1431
CESWDRSL
2255
TGCGAAAGTTGGGACAGGAGTCTGAGTGCCCGGGTG



SARVY

TAC





1432
CESWDRSL
2256
TGCGAATCCTGGGATAGGAGCCTGCGTGCCGTGGTC



RAVVF

TTC





1433
CESWDRSLI
2257
TGCGAATCTTGGGATCGTAGTTTGATTGTGGTGTTC



VVF







1434
CESWDNNL
2258
TGCGAAAGTTGGGATAACAATTTAAATGAGGTGGTT



NEVVF

TTC





1435
CEIWESSPS
2259
TGCGAAATATGGGAGAGCAGCCCGAGTGCTGACGA



ADDLVF

TTTGGTGTTC





1436
CEAWDTSL
2260
TGCGAAGCATGGGATACCAGCCTGAGTGGTGCGGTG



SGAVF

TTC





1437
CEAWDTSL
2261
TGCGAAGCATGGGATACCAGCCTGAGTGCCGGGGTG



SAGVF

TTC





1438
CEAWDTSL
2262
TGCGAAGCATGGGATACCAGCCTGGGTGGTGGGGTG



GGGVF

TTC





1439
CEAWDRSL
2263
TGCGAAGCATGGGATCGCAGCCTGACTGGTAGCCTG



TGSLF

TTC





1440
CEAWDRGL
2264
TGCGAAGCGTGGGATAGGGGCCTGAGTGCAGTGGT



SAVVF

ATTC





1441
CEAWDNIL
2265
TGCGAAGCCTGGGATAACATCCTGAGTACTGTGGTG



STVVF

TTC





1442
CEAWDISLS
2266
TGCGAAGCATGGGACATCAGCCTGAGTGCTGGGGTG



AGVF

TTC





1443
CEAWDADL
2267
TGCGAAGCATGGGATGCCGACCTGAGTGGTGCGGTG



SGAVF

TTC





1444
CATWTGSF
2268
TGCGCAACATGGACTGGTAGTTTCAGAACTGGCCAT



RTGHYVF

TATGTCTTC





1445
CATWSSSP
2269
TGCGCAACATGGAGTAGCAGTCCCAGGGGGTGGGT



RGWVF

GTTC





1446
CATWHYSL
2270
TGCGCAACATGGCATTACAGCCTGAGTGCTGGCCGA



SAGRVF

GTGTTC





1447
CATWHTSL
2271
TGCGCAACATGGCATACCAGCCTGAGTATTGTGCAG



SIVQF

TTC





1448
CATWHSTL
2272
TGCGCAACATGGCATAGCACCCTGAGTGCTGATGTG



SADVLF

CTTTTC





1449
CATWHSSL
2273
TGCGCAACATGGCATAGCAGCCTGAGTGCTGGCCGA



SAGRLF

CTCTTC





1450
CATWHIAR
2274
TGCGCAACATGGCATATCGCTCGGAGTGCCTGGGTG



SAWVF

TTC





1451
CATWGSSQ
2275
TGCGCAACATGGGGTAGTAGTCAGAGTGCCGTGGTA



SAVVF

TTC





1452
CATWGSSL
2276
TGCGCAACATGGGGTAGCAGCCTGAGTGCTGGGGGT



SAGGVF

GTTTTC





1453
CATWEYSL
2277
TGTGCAACATGGGAATACAGCCTGAGTGTTGTGCTG



SVVLF

TTC





1454
CATWETTR
2278
TGCGCAACATGGGAGACCACCCGACGTGCCTCTTTT



RASFVF

GTCTTC





1455
CATWETSL
2279
TGCGCAACATGGGAGACCAGCCTGAATGTTTATGTC



NVYVF

TTC





1456
CATWETSL
2280
TGCGCAACATGGGAAACTAGCCTGAATGTTGTGGTC



NVVVF

TTC





1457
CATWETSL
2281
TGCGCAACATGGGAGACCAGCCTGAATCTTTATGTC



NLYVF

TTC





1458
CATWETGL
2282
TGCGCAACATGGGAGACTGGCCTAAGTGCTGGAGA



SAGEVF

GGTGTTC





1459
CATWESTL
2283
TGCGCGACGTGGGAGAGTACCCTAAGTGTTGTGGTT



SVVVF

TTC





1460
CATWESSL
2284
TGCGCAACGTGGGAGAGCAGCCTGAGTATTTTTGTC



SIFVF

TTC





1461
CATWESSL
2285
TGCGCAACATGGGAAAGCAGCCTCAACACTTTTTAT



NTFYVF

GTCTTC





1462
CATWESRV
2286
TGCGCAACATGGGAGAGTAGGGTGGATACTCGAGG



DTRGLLF

GTTGTTATTC





1463
CATWESGL
2287
TGCGCAACATGGGAGAGCGGCCTGAGTGGTGCGGG



SGAGVF

GGTGTTC





1464
CATWEGSL
2288
TGCGCAACATGGGAAGGCAGCCTCAACACTTTTTAT



NTFYVF

GTCTTC





1465
CATWDYSL
2289
TGCGCAACTTGGGATTATAGCCTGAGTGCTGTGGTG



SAVVF

TTC





1466
CATWDYRL
2290
TGCGCAACATGGGATTACAGACTGAGTATTGTGGTA



SIVVF

TTC





1467
CATWDYNL
2291
TGCGCAACATGGGATTATAACCTGGGAGCTGCGGTG



GAAVF

TTC





1468
CATWDVTL
2292
TGCGCCACATGGGATGTCACCCTGGGTGTCTTGCAT



GVLHF

TTC





1469
CATWDTTL
2293
TGCGCAACATGGGATACAACACTGAGTGTCTGGGTC



SVWVF

TTC





1470
CATWDTTL
2294
TGCGCAACATGGGATACCACCCTGAGTGTAGTACTT



SVVLF

TTC





1471
CATWDTTL
2295
TGCGCAACATGGGATACCACCCTGAGTGTTGAGGTC



SVEVF

TTC





1472
CATWDTSP
2296
TGCGCAACATGGGATACCAGCCCCAGCCTGAGTGGT



SLSGFWVF

TTTTGGGTGTTC





1473
CATWDTSL
2297
TGCGCAACATGGGATACCAGCCTGACTGGTGTGGTA



TGVVF

TTC





1474
CATWDTSL
2298
TGCGCAACATGGGATACCAGCCTGACTGGTGCGGTG



TGAVF

TTC





1475
CATWDTSL
2299
TGCGCAACATGGGATACCAGCCTGACTGCCTGGGTA



TAWVF

TTC





1476
CATWDTSL
2300
TGCGCAACATGGGATACCAGCCTGACTGCTGTGGTT



TAVVF

TTC





1477
CATWDTSL
2301
TGCGCAACATGGGATACTAGCCTGACTGCTAAGGTG



TAKVF

TTC





1478
CATWDTSL
2302
TGCGCAACATGGGACACCAGCCTGAGTGTTGTGGTT



SVVVF

TTC





1479
CATWDTSL
2303
TGCGCTACTTGGGATACCAGCCTGAGTGTTGGGGTA



SVGVF

TTT





1480
CATWDTSL
2304
TGCGCAACATGGGATACCAGCCTGAGTTCTTGGGTG



SSWVF

TTC





1481
CATWDTSL
2305
TGCGCAACATGGGATACCAGCCTGAGTGGTGGGGTA



SGGVL

CTC





1482
CATWDTSL
2306
TGCGCAACATGGGATACCAGCCTGAGTGGTGGGGTG



SGGVF

TTC





1483
CATWDTSL
2307
TGCGCAACATGGGATACCAGCCTGAGTGGTGGCCGA



SGGRVF

GTGTTC





1484
CATWDTSL
2308
TGCGCAACATGGGATACCAGCCTGAGTGGTGACCGA



SGDRVF

GTGTTC





1485
CATWDTSL
2309
TGCGCAACGTGGGATACTAGCCTGAGTGAAGGGGTG



SEGVF

TTC





1486
CATWDTSL
2310
TGCGCAACCTGGGATACCAGCCTGAGTGCCGTGGTG



SAVVL

CTC





1487
CATWDTSL
2311
TGCGCAACATGGGATACCAGCCTGAGTGCTGTCTTC



SAVF







1488
CATWDTSL
2312
TGCGCGACATGGGATACCAGCCTGAGTGCTCGGGTG



SARVF

TTC





1489
CATWDTSL
2313
TGCGCAACATGGGATACCAGCCTGAGTGCCTTATTC



SALF







1490
CATWDTSL
2314
TGCGCAACATGGGATACCAGCCTGAGTGCTCATGTC



SAHVF

TTC





1491
CATWDTSL
2315
TGCGCAACATGGGATACCAGCCTGAGTGCTGGCCGG



SAGRVF

GTGTTC





1492
CATWDTSL
2316
TGCGCAACATGGGATACCAGCCTGAGTGCGGAGGTC



SAEVF

TTC





1493
CATWDTSL
2317
TGCGCAACATGGGATACCAGCCTGAGTGCTGATGCT



SADAGGGV

GGTGGGGGGGTCTTC



F







1494
CATWDTSL
2318
TGCGCAACATGGGATACCAGCCTGCGTGTCGTGGTA



RVVVF

TTC





1495
CATWDTSL
2319
TGCGCAACATGGGATACCAGCCTGAGAGGGGTGTTC



RGVF







1496
CATWDTSL
2320
TGCGCAACATGGGATACCAGCCTGCCTGCGTGGGTG



PAWVF

TTC





1497
CATWDTSL
2321
TGTGCAACATGGGATACCAGCCTGAATGTTGGGGTA



NVGVF

TTC





1498
CATWDTSL
2322
TGCGCAACATGGGATACCAGCCTGGGTATTGTGTTA



GIVLF

TTT





1499
CATWDTSL
2323
TGCGCAACATGGGACACCAGCCTGGGTGCGCGTGTG



GARVVF

GTCTTC





1500
CATWDTSL
2324
TGTGCAACGTGGGATACCAGTCTAGGTGCCTTGTTC



GALF







1501
CATWDTSL
2325
TGCGCAACATGGGATACCAGCCTGGCGACTGGACTG



ATGLF

TTC





1502
CATWDTSL
2326
TGCGCAACATGGGATACCAGCCTGGCTGCCTGGGTA



AAWVF

TTC





1503
CATWDTRL
2327
TGCGCAACCTGGGATACCAGGCTGAGTGCTGTGGTC



SAVVF

TTC





1504
CATWDTRL
2328
TGCGCAACATGGGATACCAGGCTGAGTGCTGGGGTG



SAGVF

TTC





1505
CATWDTRL
2329
TGTGCAACGTGGGACACACGTCTACTTATTACGGTT



LITVF

TTC





1506
CATWDTLL
2330
TGCGCAACATGGGACACCCTCCTGAGTGTTGAACTC



SVELF

TTC





1507
CATWDTGR
2331
TGCGCAACATGGGATACTGGCCGCAATCCTCATGTG



NPHVVF

GTCTTC





1508
CATWDTGL
2332
TGCGCAACATGGGATACCGGCCTGTCTTCGGTGTTG



SSVLF

TTC





1509
CATWDTGL
2333
TGCGCAACGTGGGATACCGGCCTGAGTGCGGTTTTC



SAVF







1510
CATWDRTL
2334
TGCGCTACGTGGGATAGGACCCTGAGTATTGGAGTC



SIGVF

TTC





1511
CATWDRSV
2335
TGCGCAACGTGGGATCGCAGTGTGACTGCTGTGCTC



TAVLF

TTC





1512
CATWDRSL
2336
TGCGCAACCTGGGATAGGAGCCTGAGTGGTGTGGTG



SGVVF

TTC





1513
CATWDRSL
2337
TGCGCAACATGGGATAGAAGCCTGAGTGCTGTGGTC



SAVVF

TTC





1514
CATWDRSL
2338
TGCGCAACATGGGATAGAAGCCTGAGTGCTGTTCCT



SAVPWVF

TGGGTGTTC





1515
CATWDRSL
2339
TGCGCAACATGGGATCGCAGCCTGAGTGCTGGGGTG



SAGVF

TTC





1516
CATWDRSL
2340
TGCGCAACGTGGGATAGGAGCCTGCGTGCTGGGGTG



RAGVF

TTC





1517
CATWDRSL
2341
TGCGCAACATGGGATCGCAGTCTGAATGTTTATGTC



NVYVL

CTC





1518
CATWDRIL
2342
TGCGCAACGTGGGATCGCATCCTGAGCGCTGAGGTG



SAEVF

TTC





1519
CATWDRGL
2343
TGCGCAACGTGGGATAGAGGCCTGAGTACTGGGGTG



STGVF

TTC





1520
CATWDNYL
2344
TGCGCAACATGGGATAACTACCTGGGTGCTGCCGTG



GAAVF

TTC





1521
CATWDNTP
2345
TGCGCAACATGGGATAACACGCCTTCGAATATTGTG



SNIVVF

GTATTC





1522
CATWDNTL
2346
TGCGCAACATGGGATAATACACTGAGTGTGTGGGTC



SVWVF

TTC





1523
CATWDNTL
2347
TGCGCAACATGGGATAACACCCTGAGTGTCAATTGG



SVNWVF

GTGTTC





1524
CATWDNTL
2348
TGCGCAACCTGGGATAACACACTGAATGTCTTTTAT



NVFYVF

GTTTTC





1525
CATWDNRL
2349
TGTGCGACATGGGATAATCGGCTCAGTTCTGTGGTC



SSVVF

TTC





1526
CATWDNRL
2350
TGCGCAACATGGGATAACCGCCTGAGTGCTGGGGTG



SAGVL

CTC





1527
CATWDNRL
2351
TGCGCAACGTGGGATAACAGGCTGAGTGCTGGGGTG



SAGVF

TTC





1528
CATWDNRD
2352
TGCGCAACATGGGATAACAGGGATTGGGTCTTC



WVF







1529
CATWDNNL
2353
TGCGCAACATGGGATAACAACCTGGGTGCTGGGGTG



GAGVF

TTC





1530
CATWDNKL
2354
TGCGCAACATGGGATAACAAGCTGACTTCTGGGGTC



TSGVF

TTC





1531
CATWDNIL
2355
TGCGCAACATGGGATAACATCCTGAGTGCCTGGGTG



SAWVF

TTT





1532
CATWDNDI
2356
TGCGCAACCTGGGACAACGATATACATTCTGGGCTG



HSGLF

TTC





1533
CATWDLSL
2357
TGCGCAACTTGGGATCTCAGCCTGAGTGCCCTGTTC



SALF







1534
CATWDITLS
2358
TGCGCAACATGGGATATCACCCTGAGTGCTGAGGTG



AEVF

TTC





1535
CATWDISPS
2359
TGCGCAACGTGGGATATCAGCCCGAGTGCTGGCGGG



AGGVF

GTGTTC





1536
CATWDISLS
2360
TGCGCAACATGGGATATCAGTCTAAGTACTGGCCGG



TGRAVF

GCTGTGTTC





1537
CATWDISLS
2361
TGCGCAACATGGGATATCAGTCTGAGTCAGGTATTC



QVF







1538
CATWDIRL
2362
TGCGCAACATGGGATATCAGGCTGAGTAGTGGAGTG



SSGVF

TTC





1539
CATWDIGP
2363
TGCGCAACGTGGGATATCGGCCCGAGTGCTGGCGGG



SAGGVF

GTGTTC





1540
CATWDHSR
2364
TGCGCAACATGGGATCACAGCCGGGCTGGTGTGCTA



AGVLF

TTC





1541
CATWDHSP
2365
TGCGCAACATGGGATCACAGTCCGAGTGTTGGAGAA



SVGEVF

GTCTTC





1542
CATWDHSL
2366
TGCGCAACATGGGATCACAGCCTGCGTGTTGGGGTG



RVGVF

TTC





1543
CATWDHSL
2367
TGCGCAACATGGGATCACAGCCTGAACATTGGGGTG



NIGVF

TTC





1544
CATWDHSL
2368
TGCGCAACATGGGATCACAGCCTGGGTCTTTGGGCA



GLWAF

TTC





1545
CATWDHNL
2369
TGCGCCACATGGGATCACAATCTGCGTCTTGTTTTC



RLVF







1546
CATWDHIL
2370
TGCGCGACTTGGGATCACATCCTGGCTTCTGGGGTG



ASGVF

TTC





1547
CATWDFSL
2371
TGCGCAACATGGGATTTCAGCCTGAGTGTTTGGGTG



SVWVF

TTC





1548
CATWDFSL
2372
TGCGCAACATGGGATTTCAGCCTGAGTGCTTGGGTG



SAWVF

TTC





1549
CATWDDTL
2373
TGCGCAACATGGGATGACACCCTCACTGCTGGTGTG



TAGVF

TTC





1550
CATWDDRL
2374
TGCGCAACATGGGACGACAGGCTGAGTGCTGTGCTT



SAVLF

TTC





1551
CATWDDRL
2375
TGCGCAACATGGGATGACAGGCTGGATGCTGCGGTG



DAAVF

TTC





1552
CATWDATL
2376
TGCGCAACATGGGATGCGACCCTGAATACTGGGGTG



NTGVF

TTC





1553
CATWDASL
2377
TGCGCAACATGGGATGCCAGCCTGAGTGTTTGGCTG



SVWLL

CTC





1554
CATWDASL
2378
TGCGCGACATGGGATGCCAGCCTGAGTGGTGGGGTG



SGGVF

TTC





1555
CATRDTTLS
2379
TGCGCAACACGGGATACCACCCTCAGCGCCGTTCTG



AVLF

TTC





1556
CATLGSSLS
2380
TGCGCTACATTGGGTAGTAGCCTGAGTCTCTGGGTG



LWVF

TTC





1557
CATIETSLP
2381
TGCGCAACAATCGAAACTAGCCTGCCTGCCTGGGTA



AWVF

TTC





1558
CATGDRSL
2382
TGCGCAACAGGGGACAGAAGCCTGACTGTTGAGGT



TVEVF

ATTC





1559
CATGDLGL
2383
TGCGCTACAGGGGATCTCGGCCTGACCATAGTCTTC



TIVF







1560
CASWDYRG
2384
TGCGCATCATGGGATTACAGGGGGAGATCTGGTTGG



RSGWVF

GTGTTC





1561
CASWDTTL
2385
TGCGCATCATGGGATACCACCCTGAATGTTGGGGTG



NVGVF

TTC





1562
CASWDTTL
2386
TGCGCTTCATGGGATACCACCCTGGGTTTTGTGTTAT



GFVLF

TC





1563
CASWDTSL
2387
TGCGCATCATGGGATACCAGCCTGAGTGGTGGTTAT



SGGYVF

GTCTTC





1564
CASWDTSL
2388
TGCGCATCATGGGATACCAGCCTCCGTGCTGGGGTG



RAGVF

TTC





1565
CASWDTSL
2389
TGCGCATCATGGGATACCAGCCTGGGTGCTGGGGTG



GAGVF

TTC





1566
CASWDRGL
2390
TGCGCATCATGGGACAGAGGCCTGAGTGCAGTGGTG



SAVVF

TTC





1567
CASWDNVL
2391
TGTGCTAGTTGGGATAACGTCCTGCGTGGTGTGGTA



RGVVF

TTC





1568
CASWDNRL
2392
TGCGCGTCATGGGATAACAGGCTGACTGCCGTGGTT



TAVVF

TTC





1569
CASWDASL
2393
TGCGCATCATGGGATGCAAGCCTGTCCGTCGCTTTC



SVAF







1570
CASWDAGL
2394
TGCGCTTCGTGGGATGCCGGCCTGAGTTCTTATGTCT



SSYVF

TC





1571
CASGDTSLS
2395
TGCGCATCCGGGGATACCAGCCTGAGTGGTGTGATA



GVIF

TTC





1572
CARWHTSL
2396
TGCGCAAGATGGCATACGAGCCTAAGTATTTGGGTC



SIWVF

TTC





1573
CAIWDTGL
2397
TGCGCAATATGGGATACCGGCCTGAGTCCTGGCCAA



SPGQVAF

GTTGCCTTC





1574
CAAWHSGL
2398
TGCGCAGCATGGCATAGCGGCCTGGGTCTCCCGGTC



GLPVF

TTC





1575
CAAWDYSL
2399
TGCGCAGCATGGGATTACAGCCTGAGTGCTGGGGTG



SAGVF

TTC





1576
CAAWDTTL
2400
TGCGCAGCCTGGGATACTACCCTGCGTGTTAGGCTG



RVRLF

TTC





1577
CAAWDTSL
2401
TGCGCAGCATGGGATACCAGCCTGACTGCCTGGGTT



TAWVF

TTC





1578
CAAWDTSL
2402
TGCGCAGCATGGGATACCAGCTTGAGTGGTGGGGTG



SGGVF

TTC





1579
CAAWDTSL
2403
TGCGCAGCATGGGATACCAGCCTGAGTGGCGAGGCT



SGEAVF

GTGTTC





1580
CAAWDTSL
2404
TGCGCAGCATGGGATACCAGCTTGAGTGGTGCGGTG



SGAVF

TTC





1581
CAAWDTSL
2405
TGCGCAGCATGGGATACCAGCCTGAGTGCCTGGGTG



SAWVF

TTC





1582
CAAWDTSL
2406
TGCGCAGCATGGGATACCAGCCTGAGTGCTGGGGTA



SAGVF

TTC





1583
CAAWDTSL
2407
TGCGCAGCATGGGATACCAGCCTGGATACTTATGTC



DTYVF

TTC





1584
CAAWDTRL
2408
TGCGCTGCATGGGATACCCGTCTGAGTGGTGTGTTA



SGVLF

TTC





1585
CAAWDTRL
2409
TGCGCAGCATGGGATACCAGGCTGAGTGCTGGGGTG



SAGVF

TTC





1586
CAAWDRSL
2410
TGCGCAGCATGGGATCGCAGTCTGAGTACTGGAGTT



STGVF

TTC





1587
CAAWDIRR
2411
TGCGCAGCGTGGGATATCCGCCGGTCTGTCCTTTTC



SVLF







1588
CAAWDHT
2412
TGCGCTGCGTGGGATCACACTCAGCGTCTTTCCTTC



QRLSF







1589
CAAWDHSL
2413
TGCGCAGCATGGGATCACAGCCTGAGTGCTGGCCAG



SAGQVF

GTGTTC





1590
CAAVDTGL
2414
TGCGCAGCAGTCGATACTGGTCTGAAAGAATGGGTG



KEWVF

TTC









The CDRs were prescreened to contain no amino acid liabilities, cryptic splice sites or nucleotide restriction sites. The CDR variation was observed in at least two individuals and comprises the near-germline space of single, double and triple mutations. The order of assembly is seen in FIG. 21C.
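The prescreening described above can be pictured as a filter applied to each candidate CDR. The following is a minimal sketch only. The disclosure does not list the liabilities or restriction sites that were screened, so the motifs below (an N-linked glycosylation sequon, Asn-Gly deamidation, Asp-Gly isomerization, and the EcoRI and BamHI recognition sites) are assumptions chosen for illustration, the example sequences are made up, and the cryptic splice site check is omitted. All function names are hypothetical.

```python
import re

# Assumed amino acid liability motifs (illustrative, not from the disclosure).
LIABILITY_PATTERNS = {
    "N-glycosylation sequon": re.compile(r"N[^P][ST]"),
    "Asn deamidation": re.compile(r"NG"),
    "Asp isomerization": re.compile(r"DG"),
}

# Assumed restriction sites to exclude (illustrative, not from the disclosure).
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}


def find_liabilities(protein: str) -> list[str]:
    """Return the names of the assumed liability motifs found in a CDR."""
    protein = protein.upper()
    return [name for name, pattern in LIABILITY_PATTERNS.items()
            if pattern.search(protein)]


def find_restriction_sites(dna: str) -> list[str]:
    """Return the names of the assumed restriction sites found in a CDR."""
    dna = dna.upper()
    return [name for name, site in RESTRICTION_SITES.items() if site in dna]


def passes_prescreen(protein: str, dna: str) -> bool:
    """A candidate passes only if neither check reports a hit."""
    return not find_liabilities(protein) and not find_restriction_sites(dna)


# Made-up candidates, used only to exercise the filter.
print(find_liabilities("CARNGSW"))
print(find_restriction_sites("TTGAATTCAA"))
print(passes_prescreen("CAKW", "TGCGCAAAATGG"))
```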


The VH domains that were designed include IGHV1-69 and IGHV3-30. Each of the two heavy chain VH domains is assembled from its four invariant framework elements (FW1, FW2, FW3, FW4) and three variable CDR elements (H1, H2, H3). For IGHV1-69, 417 variants were designed for H1 and 258 variants were designed for H2. For IGHV3-30, 535 variants were designed for H1 and 165 variants were designed for H2. For the CDR H3, the same cassette was used in both IGHV1-69 and IGHV3-30, since both designs use an identical FW4 and the edge of FW3 is also identical for both IGHV1-69 and IGHV3-30. The CDR H3 comprises an N-terminus element and a C-terminus element that are combinatorially joined to a central middle element to generate 1×10^10 diversity. The N-terminal and middle elements overlap with a “GGG” glycine codon. The middle and C-terminal elements overlap with a “GGT” glycine codon. The CDR H3 comprises 5 subpools that were assembled separately. The various N-terminus and C-terminus elements comprise sequences as seen in Table 14.
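The variant counts above give a sense of scale for each VH design. The following is a minimal sketch of the arithmetic, assuming that every designed H1, H2 and H3 variant can pair freely within a domain. The disclosure states only the per-CDR figures, so the products are a theoretical upper bound computed here for illustration, and the names used are hypothetical.

```python
# Variant counts per VH domain, as stated in the text.
DESIGNED_VARIANTS = {
    "IGHV1-69": {"H1": 417, "H2": 258},
    "IGHV3-30": {"H1": 535, "H2": 165},
}

# CDR H3 diversity stated in the text (shared cassette for both domains).
H3_DIVERSITY = 10**10


def theoretical_diversity(h1: int, h2: int, h3: int = H3_DIVERSITY) -> int:
    """Upper bound on distinct H1/H2/H3 combinations, assuming free pairing."""
    return h1 * h2 * h3


for domain, counts in DESIGNED_VARIANTS.items():
    h1_h2 = counts["H1"] * counts["H2"]
    total = theoretical_diversity(counts["H1"], counts["H2"])
    print(f"{domain}: H1 x H2 = {h1_h2:,}; with H3 = {total:.3e}")
```

Under this assumption the H1 and H2 combinations alone number 107,586 for IGHV1-69 and 88,275 for IGHV3-30, each of which is then multiplied by the CDR H3 diversity.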









TABLE 14

Sequences for N-terminus and C-terminus elements

Element    SEQ ID NO    Sequence
Stem A     2415         CARDLRELECEEWT XXX SRGPCVDPRGVAGSFDVW
Stem B     2416         CARDMYYDF XXX EVVPADDAFDIW
Stem C     2417         CARDGRGSLPRPKGGP XXX YDSSEDSGGAFDIW
Stem D     2418         CARANQHF XXX GYHYYGMDVW
Stem E     2419         CAKHMSMQ XXX RADLVGDAFDVW
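The stems of Table 14 can be read as templates. The following is a minimal sketch, assuming that “XXX” marks the position of the central middle element. The middle element used in the example is the portion of SEQ ID NO: 2420 (Table 15) that lies between the two Stem E flanks, and the function names are hypothetical.

```python
PLACEHOLDER = " XXX "

# Stem templates as listed in Table 14.
STEMS = {
    "Stem A": "CARDLRELECEEWT XXX SRGPCVDPRGVAGSFDVW",
    "Stem B": "CARDMYYDF XXX EVVPADDAFDIW",
    "Stem C": "CARDGRGSLPRPKGGP XXX YDSSEDSGGAFDIW",
    "Stem D": "CARANQHF XXX GYHYYGMDVW",
    "Stem E": "CAKHMSMQ XXX RADLVGDAFDVW",
}


def split_stem(stem: str) -> tuple[str, str]:
    """Split a stem template into its N-terminus and C-terminus elements."""
    n_term, c_term = stem.split(PLACEHOLDER)
    return n_term, c_term


def fill_stem(stem: str, middle: str) -> str:
    """Place a middle element between the two flanks of a stem template."""
    n_term, c_term = split_stem(stem)
    return n_term + middle + c_term


# Middle element taken from SEQ ID NO: 2420 with the Stem E flanks removed.
middle = "EGAVTGEGQAAKEFIAWLVKGRV"
print(fill_stem(STEMS["Stem E"], middle))
```

Filling Stem E with this middle element reproduces SEQ ID NO: 2420 as listed in Table 15.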









Example 12. Enrichment for GPCR GLP1R Binding Proteins

Antibodies having CDR-H3 regions comprising variant fragments of a GPCR binding protein were generated by methods described herein and were panned using cell-based methods to identify variants that are enriched for binding to particular GPCRs, as described in Example 10.


Variants of the GLP1 C-terminus peptide were identified (listed in Table 15) that, when embedded in the CDR-H3 region of an antibody, were repeatedly and selectively enriched for binding to GPCR GLP1R.









TABLE 15

Sequences of GLP1 embedded in CDR-H3

SEQ ID NO    Sequence
2420         CAKHMSMQEGAVTGEGQAAKEFIAWLVKGRVRADLVGDAFDVW
2421         CARDGRGSLPRPKGGPQTVGEGQAAKEFIAWLVKGGLTYDSSEDSGGAFDIW
2422         CAKHMSMQDYLVIGEGQAAKEFIAWLVKGGPARADLVGDAFDVW
2423         CAKHMSMQEGAVTGEGQDAKEFIAWLVKGRVRADLVGDAFDVW
2424         WAKHMSMQEGAVTGEGQAAKEFIAWLVKGRVRADLVGDAFDVW
2425         CARDGRGSLPRPKGGPQTVGEGQAAKEFIAWLVKGRVRADLVGDAFDVW
2426         CARANQHFYEQEGTFTSDVSSYLEGQAAKEFIAWLVKGGIRGYHYYGMDVW
2427         CARANQHFTELHGEGQAAKEFIAWLVKGRGQIDIGYHYYGMDVW
2428         CARANQHFLGAGVSSYLEGQAAKEFIAWLVKGDTTGYHYYGMDVW
2429         CARANQHFLDKGTFTSDVSSYLEGQAAKEFIAWLVKGIYPGYHYYGMDVW
2430         CARANQHFGTLSAGEGQAAKEFIAWLVKGGSQYDSSEDSGGAFDIW
2431         CARANQHFGLHAQGEGQAAKEFIAWLVKGSGTYGYHYYGMDVW
2432         CARANQHFGGKGEGQAAKEFIAWLVKGGGSGAGYHYYGMDVW
2433         CAKQMSMQEGAVTGEGQAAKEFIAWLVKGRVRADLVGDAFDVW
2434         CAKHMSMQEGAVTGEGQAAKEFIAWLVKGGPARADLVGDAFDVW
2435         CAKHMSMQEGAVTGEGQAAKEFIAWLVKGGLTYDSSEDSGGAFDIW
2436         CAKHMSMQDYLVIGEGQAAKEFIAWLVKGRVRADLVGDAFDVW
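The enriched sequences of Table 15 can be related back to the stem elements of Table 14 by simple prefix and suffix matching. The following is a minimal sketch, assuming exact matches to the N-terminus and C-terminus elements of Table 14; the function name classify is hypothetical, and only four of the seventeen sequences are shown.

```python
# N-terminus and C-terminus elements of the stems in Table 14.
N_TERMINI = {
    "Stem A": "CARDLRELECEEWT",
    "Stem B": "CARDMYYDF",
    "Stem C": "CARDGRGSLPRPKGGP",
    "Stem D": "CARANQHF",
    "Stem E": "CAKHMSMQ",
}
C_TERMINI = {
    "Stem A": "SRGPCVDPRGVAGSFDVW",
    "Stem B": "EVVPADDAFDIW",
    "Stem C": "YDSSEDSGGAFDIW",
    "Stem D": "GYHYYGMDVW",
    "Stem E": "RADLVGDAFDVW",
}

# A subset of Table 15, keyed by SEQ ID NO.
TABLE_15 = {
    2420: "CAKHMSMQEGAVTGEGQAAKEFIAWLVKGRVRADLVGDAFDVW",
    2424: "WAKHMSMQEGAVTGEGQAAKEFIAWLVKGRVRADLVGDAFDVW",
    2425: "CARDGRGSLPRPKGGPQTVGEGQAAKEFIAWLVKGRVRADLVGDAFDVW",
    2430: "CARANQHFGTLSAGEGQAAKEFIAWLVKGGSQYDSSEDSGGAFDIW",
}


def classify(sequence: str):
    """Return (N-terminus stem, C-terminus stem, middle element).

    A stem name is None when no exact match is found; the middle element
    is only reported when both flanks are identified.
    """
    n_name = next((name for name, flank in N_TERMINI.items()
                   if sequence.startswith(flank)), None)
    c_name = next((name for name, flank in C_TERMINI.items()
                   if sequence.endswith(flank)), None)
    middle = None
    if n_name and c_name:
        start = len(N_TERMINI[n_name])
        end = len(sequence) - len(C_TERMINI[c_name])
        middle = sequence[start:end]
    return n_name, c_name, middle


for seq_id, sequence in TABLE_15.items():
    print(seq_id, classify(sequence))
```

On this reading, SEQ ID NO: 2425 pairs the Stem C N-terminus with the Stem E C-terminus and SEQ ID NO: 2430 pairs the Stem D N-terminus with the Stem C C-terminus, while SEQ ID NO: 2424 has no exact N-terminus match because it begins with W rather than C.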









While preferred embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims
  • 1. An antibody, wherein the antibody comprises a CDR-H3 comprising a sequence of any one of SEQ ID NOS: 2420 to 2436.
  • 2. The antibody of claim 1, wherein the antibody comprises a CDR-H3 comprising a sequence of any one of SEQ ID NOS: 2420 to 2436; and wherein the antibody is a monoclonal antibody, a polyclonal antibody, a bi-specific antibody, a multispecific antibody, a grafted antibody, a human antibody, a humanized antibody, a synthetic antibody, a chimeric antibody, a camelized antibody, a single-chain Fvs (scFv), a single chain antibody, a Fab fragment, a F(ab′)2 fragment, a Fd fragment, a Fv fragment, a single-domain antibody, an isolated complementarity determining region (CDR), a diabody, a fragment comprised of only a single monomeric variable domain, disulfide-linked Fvs (sdFv), an intrabody, an anti-idiotypic (anti-Id) antibody, or antigen-binding fragments thereof.
  • 3. A method for treatment of a metabolic disorder, comprising administering to a subject in need thereof the antibody of claim 1.
  • 4. The method of claim 3, wherein the metabolic disorder is Type II diabetes or obesity.
  • 5. A protein library comprising a plurality of proteins, wherein each of the proteins of the plurality of proteins comprises an immunoglobulin scaffold, wherein the immunoglobulin scaffold comprises a CDR-H3 loop that comprises a sequence variant of a GPCR binding domain.
  • 6. A protein library comprising a plurality of proteins, wherein the plurality of proteins comprises sequence encoding for different GPCR binding domains, and wherein the length of each GPCR binding domain is about 20 to about 80 amino acids.
  • 7. An antibody library comprising a plurality of antibodies, wherein each antibody comprises: a. a CDR-H3 loop comprising an amino acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 2420 to 2436; and b. a variable domain of light chain (VL) comprising an amino acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 91, 93, 95, 97, 100, 118, 138, 233, 249, 276, 277, 465, 467, 496, 501, 509, 515, 526, 594, 604, 803, 1059, and 1219, wherein each of the antibodies comprises a GPCR binding domain, and wherein the GPCR binding domain is a ligand for a GPCR.
  • 8. The antibody library of claim 7, wherein the antibody further comprises one or more domains selected from variable domain of heavy chain (VH), constant domain of light chain (CL), and constant domain of heavy chain (CH).
  • 9. The antibody library of claim 7, wherein the GPCR binding domains comprise a peptidomimetic or a small molecule mimetic.
  • 10. The antibody library of claim 7, wherein the ligand for a GPCR is a non-antibody ligand.
  • 11. The antibody library of claim 7, wherein the CDR-H3 loop comprises an amino acid sequence of any one of SEQ ID NOs: 2420 to 2436.
  • 12. The antibody library of claim 7, wherein the variable domain of light chain (VL) comprises an amino acid sequence of any one of SEQ ID NOs: 91, 93, 95, 97, 100, 118, 138, 233, 249, 276, 277, 465, 467, 496, 501, 509, 515, 526, 594, 604, 803, 1059, and 1219.
  • 13. The antibody library of claim 7, wherein the variable domain of light chain (VL) comprises an amino acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 91, 93, 95, and 97.
  • 14. The antibody library of claim 7, wherein the variable domain of light chain (VL) comprises an amino acid sequence of any one of SEQ ID NOs: 91, 93, 95, and 97.
  • 15. The antibody library of claim 7, wherein the variable domain of light chain (VL) comprises an amino acid sequence having at least about 90% sequence identity to any one of SEQ ID NOs: 276, 604, 803, 1059, and 1219.
  • 16. The antibody library of claim 7, wherein the variable domain of light chain (VL) comprises an amino acid sequence of any one of SEQ ID NOs: 276, 604, 803, 1059, and 1219.
  • 17. The antibody library of claim 7, wherein the antibody is a monoclonal antibody, a polyclonal antibody, a bi-specific antibody, a multispecific antibody, a grafted antibody, a human antibody, a humanized antibody, a synthetic antibody, a chimeric antibody, a camelized antibody, a single-chain Fvs (scFv), a single chain antibody, a Fab fragment, a F(ab′)2 fragment, a Fd fragment, a Fv fragment, a single-domain antibody, an isolated complementarity determining region (CDR), a diabody, a fragment comprised of only a single monomeric variable domain, disulfide-linked Fvs (sdFv), an intrabody, an anti-idiotypic (anti-Id) antibody, or antigen-binding fragments thereof.
  • 18. A method of inhibiting GLP1R activity, comprising administering the antibody library of claim 7.
  • 19. A method for treatment of a metabolic disorder, comprising administering the antibody library of claim 7.
CROSS-REFERENCE

This application is a Divisional Application of U.S. patent application Ser. No. 16/128,372, filed on Sep. 11, 2018, which claims the benefit of U.S. Provisional Patent Application No. 62/556,863, filed on Sep. 11, 2017, each of which is incorporated herein by reference in its entirety. The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Sep. 20, 2018, is named 44854-741_401_SL.txt and is 943,473 bytes in size.

Provisional Applications (1)
Number Date Country
62556863 Sep 2017 US
Divisions (1)
Number Date Country
Parent 16128372 Sep 2018 US
Child 17747764 US