NON-HUMAN ANIMALS HAVING AN IMMUNOGLOBULIN HEAVY CHAIN VARIABLE REGION THAT INCLUDES AN ENGINEERED DIVERSITY CLUSTER AND USES THEREOF

BACKGROUND

Antibody-based therapeutics offer significant promise in the treatment of several diseases. A variety of formats, including monoclonal, murine, chimeric, humanized, human, full-length, Fab, pegylated, radiolabeled, drug-conjugated, multi-specific, etc. are being developed (see e.g., Reichert, J. M., 2012, mAbs 4:3, 413-415; Nixon, A. E. et al., 2014, mAbs 6:1, 73-85; incorporated herein by reference). Of the more than 40 therapeutic antibody agents that have received marketing approval in the United States or Europe, all have been generated with technologies that rely on assembly of traditional antibody genes from human and/or non-human (e.g., mouse) sources by in vitro (e.g., phage display) or in vivo (e.g., genetically engineered animals) systems. Still, development of particularly effective antibody agents that bind intractable targets remains a challenge.

SUMMARY

Disclosed herein is the recognition that it is desirable to engineer non-human animals to permit improved in vivo systems for identifying and developing new antibody-based therapeutics and, in some embodiments, antibody agents (e.g., monoclonal antibodies and/or fragments thereof), which can be used for the treatment of a variety of diseases characterized by intractable disease targets. Further, disclosed herein is the recognition that non-human animals having an engineered heavy chain diversity (D_H) cluster/region within an immunoglobulin heavy chain variable region (e.g., a heterologous immunoglobulin heavy chain variable region), in particular, an engineered D_H cluster (or D_H region) containing nucleotide coding sequences not naturally present within an immunoglobulin heavy chain variable region, and/or otherwise expressing, containing, or producing antibodies containing complementarity determining regions (CDRs) that are characterized by diversity that directs binding to particular antigens, are desirable, for example, for use in identifying and developing antibody-based therapeutics, which may target, e.g., membrane-spanning or cytoplasmic polypeptides. In some embodiments, non-human animals disclosed herein are in vivo systems for development of antibodies and/or antibody-based therapeutics for administration to humans.

In some embodiments, a non-human animal is provided, whose genome, e.g., germline genome, comprises an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, wherein the engineered D_Hregion includes one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof.

In another aspect, a non-human animal whose genome, e.g., germline genome, is modified to comprise an immunoglobulin heavy chain variable region that includes an engineered D_H region, wherein the engineered D_H region includes one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof, may be further modified to express a single rearranged light chain, e.g., a common or universal light chain (ULC).

A single rearranged light chain variable gene sequence operably linked to a light chain constant region, also referred to as a common or universal light chain (ULC), may be encoded by a light chain locus comprising a single rearranged V_L:J_L gene sequence. In some embodiments, the light chain locus comprises a single rearranged V_L:J_L gene sequence in which the V_L sequence is a Vκ gene sequence. In some aspects, the Vκ sequence is selected from Vκ1-39 or Vκ3-20. In some aspects, the J_L sequence is a Jκ gene sequence, e.g., a Jκ1 sequence, a Jκ2 sequence, a Jκ3 sequence, a Jκ4 sequence, or a Jκ5 sequence, etc. In some embodiments, the light chain locus comprises a single rearranged Vκ:Jκ sequence selected from the group consisting of Vκ1-39Jκ5 and Vκ3-20Jκ1. In one embodiment, the light chain locus comprises a single rearranged Vκ:Jκ sequence of Vκ1-39Jκ5. In another embodiment, the light chain locus comprises a single rearranged Vκ:Jκ sequence of Vκ3-20Jκ1. In some embodiments, the single rearranged variable gene sequence is operably linked to a non-human light chain constant region gene, e.g., an endogenous non-human light chain constant region gene. In another embodiment, the single rearranged variable gene sequence is operably linked to a human light chain constant region gene. In some aspects, the single rearranged variable gene sequence is a human V:J sequence inserted into the endogenous immunoglobulin light chain locus such that the resulting non-human animal does not comprise functional unrearranged V and/or J gene segments in one or more light chain loci.

In some embodiments, an isolated non-human cell or tissue is provided, whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, wherein the engineered D_Hregion includes one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof, and optionally, a common or universal light chain. In some embodiments, a cell is from a lymphoid or myeloid lineage. In some embodiments, a cell is a lymphocyte. In some embodiments, a cell is selected from a B cell, dendritic cell, macrophage, monocyte, and a T cell. In some embodiments, a tissue is selected from adipose, bladder, brain, breast, bone marrow, eye, heart, intestine, kidney, liver, lung, lymph node, muscle, pancreas, plasma, serum, skin, spleen, stomach, thymus, testis, ovum, and a combination thereof.

In some embodiments, an immortalized cell made from an isolated non-human cell as described herein is provided.

In some embodiments, a non-human embryonic stem (ES) cell is provided, whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, wherein the engineered D_Hregion includes one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof, and optionally, a common light chain. In some embodiments, a non-human embryonic stem cell is a rodent embryonic stem cell. In some certain embodiments, a rodent embryonic stem cell is a mouse embryonic stem cell and is from a 129 strain, C57BL strain, or a mixture thereof. In some certain embodiments, a rodent embryonic stem cell is a mouse embryonic stem cell and is a mixture of 129 and C57BL strains.

In some embodiments, a non-human germ cell is provided, whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, wherein the engineered D_Hregion includes one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof, and optionally, a common light chain. In some embodiments, a non-human germ cell is a rodent germ cell. In some certain embodiments, a rodent germ cell is a mouse germ cell and is from a 129 strain, C57BL strain, or a mixture thereof. In some certain embodiments, a rodent germ cell is a mouse germ cell and is a mixture of 129 and C57BL strains.

In some embodiments, use of a non-human embryonic stem cell or germ cell as described herein to make a non-human animal is provided. In some certain embodiments, a non-human embryonic stem cell or germ cell is a mouse embryonic stem cell or germ cell and is used to make a mouse comprising an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, and optionally a common light chain, as described herein. In some certain embodiments, a non-human embryonic stem cell or germ cell is a rat embryonic stem cell or germ cell and is used to make a rat comprising an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, and optionally a common light chain as described herein.

In some embodiments, a non-human embryo comprising, made from, obtained from, or generated from a non-human embryonic stem cell as described herein is provided. In some certain embodiments, a non-human embryo is a rodent embryo; in some embodiments, a mouse embryo; in some embodiments, a rat embryo.

In some embodiments, use of a non-human embryo described herein to make a non-human animal is provided. In some certain embodiments, a non-human embryo is a mouse embryo and is used to make a mouse comprising an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, and optionally a common light chain locus, as described herein. In some certain embodiments, a non-human embryo is a rat embryo and is used to make a rat comprising an immunoglobulin heavy chain variable region that includes an engineered D_Hregion and optionally a common light chain locus, as described herein.

In some embodiments, a kit is provided, comprising an isolated non-human cell or tissue as described herein, an immortalized cell as described herein, a non-human embryonic stem cell as described herein, a non-human embryo as described herein, or a non-human animal as described herein.

In some embodiments, a kit as described herein is provided, for use in the manufacture and/or development of a drug (e.g., an antibody or antigen-binding fragment thereof) for therapy or diagnosis.

In some embodiments, a kit as described herein is provided, for use in the manufacture and/or development of a drug (e.g., an antibody or antigen-binding fragment thereof) for the treatment, prevention or amelioration of a disease, disorder or condition.

In some embodiments, a transgene, nucleic acid construct, DNA construct, or targeting vector as described herein is provided. In some certain embodiments, a transgene, nucleic acid construct, DNA construct, or targeting vector comprises an engineered D_Hregion as described herein. In some certain embodiments, a transgene, nucleic acid construct, DNA construct, or targeting vector comprises a DNA fragment that includes one or more nucleotide coding sequences described herein. In some certain embodiments, a transgene, nucleic acid construct, DNA construct, or targeting vector comprises an engineered D_Hregion that comprises one or more nucleotide coding sequences selected from Table 3 or Table 4, which one or more nucleotide coding sequences are each flanked by a recombination signal sequence selected from FIG. 2. In some certain embodiments, a transgene, nucleic acid construct, DNA construct, or targeting vector further comprises one or more selection markers. In some certain embodiments, a transgene, nucleic acid construct, DNA construct, or targeting vector further comprises one or more site-specific recombination sites (e.g., lox, Frt, or combinations thereof). In some certain embodiments, a transgene, nucleic acid construct, DNA construct, or targeting vector is depicted in any one of FIGS. 3A, 3B, 4A, 4B, 7A, 7B, 8A and 8B.

In some embodiments, use of a transgene, nucleic acid construct, DNA construct, or targeting vector as described herein to make a non-human embryonic stem cell, non-human cell, non-human embryo and/or non-human animal is provided.

In some embodiments, a non-immunoglobulin polypeptide of interest is a chemokine receptor. In some embodiments, a chemokine receptor is selected from the group consisting of a CC-chemokine receptor (or β-chemokine receptor), a CXC-chemokine receptor, a CX3C-chemokine receptor and an XC-chemokine receptor. In some embodiments, a chemokine receptor is an atypical chemokine receptor (ACKR). In some embodiments, an ACKR is selected from the group consisting of ACKR1, ACKR2, ACKR3 and ACKR4. In some certain embodiments, an ACKR is ACKR2, also referred to as D6 chemokine decoy receptor.

In some embodiments, a non-immunoglobulin polypeptide of interest is a toxin. In some embodiments, a toxin is a toxin that is found in the venom of a tarantula, spider, scorpion or sea anemone.

In some embodiments, a non-immunoglobulin polypeptide of interest is a conotoxin or a tarantula toxin. In some embodiments, a conotoxin is selected from the group consisting of α-conotoxin, δ-conotoxin, κ-conotoxin, μ-conotoxin, ω-conotoxin and combinations thereof. In some certain embodiments, a conotoxin is μ-conotoxin. In some certain embodiments, a tarantula toxin is ProTxI, ProTxII, Huwentoxin-IV (HWTX-IV), or combinations thereof.

In some embodiments, an engineered D_Hregion includes 5, 10, 15, 20, 25 or more nucleotide sequences that each encode an extracellular portion of a D6 chemokine decoy receptor, or that each encode a portion of a conotoxin (e.g., μ-conotoxin), or that each encode a portion of a tarantula toxin (e.g., ProTxI, ProTxII, etc.), or combinations thereof.

In some embodiments, an engineered D_Hregion includes 25 nucleotide sequences that each encode an extracellular portion of a D6 chemokine decoy receptor or includes 26 nucleotide sequences that each encode a portion of a conotoxin (e.g., μ-conotoxin) and/or a tarantula toxin (e.g., ProTxI, ProTxII, etc.).

In some embodiments, an extracellular portion of a D6 chemokine decoy receptor is selected from the group consisting of an N-terminal region, an extracellular loop, and combinations thereof.

In some embodiments, a portion of a conotoxin as described herein includes a sequence that comprises one or more disulfide bonds. In some embodiments, a portion of a conotoxin as described herein includes a sequence that lacks one or more disulfide bonds as compared to a conotoxin sequence that appears in nature (e.g., a reference or parental conotoxin sequence). In some embodiments, a portion of a conotoxin as described herein includes a sequence that exhibits a number and/or pattern of disulfide bonds that is the same or different as compared to a conotoxin sequence that appears in nature (e.g., a reference or parental conotoxin sequence).

In some embodiments, a portion of a tarantula toxin as described herein includes a sequence that comprises a cysteine knot motif that appears in a tarantula toxin sequence found in nature. In some embodiments, a portion of a tarantula toxin as described herein includes a sequence that is or comprises a cysteine knot peptide(s). In some embodiments, a portion of a tarantula toxin as described herein includes a sequence that lacks one or more disulfide bonds as compared to a tarantula toxin sequence that appears in nature (e.g., a reference or parental tarantula toxin sequence). In some embodiments, a portion of a tarantula toxin as described herein includes a sequence that exhibits a number and/or pattern of disulfide bonds that is the same or different as compared to a tarantula toxin sequence that appears in nature (e.g., a reference or parental tarantula toxin sequence).

In some embodiments, an engineered D_H region includes 25 or 26 nucleotide sequences that are each at least 50%, 60%, 70%, 80%, 90%, 95%, or at least 98% identical to a nucleotide sequence that appears in Table 3 or Table 4 and/or that each encode an amino acid sequence that appears in Table 3 or Table 4. In some embodiments, an engineered D_H region includes 25 or 26 nucleotide sequences that each encode an amino acid sequence that is substantially identical or identical to an amino acid sequence that appears in Table 3 or Table 4, or that has the same function as an amino acid sequence that appears in Table 3 or Table 4.
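By way of illustration only, the percent identity thresholds recited above can be checked with a short calculation. The sketch below assumes that the candidate and reference sequences have already been aligned to equal length, with "-" marking gaps, and that identity is counted over all aligned positions; neither the alignment method nor the denominator is specified herein, and other conventions are possible. The function names and the example sequences are hypothetical and do not come from Table 3 or Table 4.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: the inputs are already aligned; '-' marks a gap and a
    gap position never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    a, b = seq_a.upper(), seq_b.upper()
    # Count positions where both sequences carry the same non-gap residue.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_threshold(candidate: str, reference: str, threshold: float) -> bool:
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical 10-nucleotide sequences differing at one position.
reference = "ACGTACGTAC"
candidate = "ACGTACGTAA"
print(percent_identity(candidate, reference))
print(meets_threshold(candidate, reference, 90.0))
```

The same function applies to aligned amino acid sequences, since it only compares characters position by position.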

In some embodiments, one or more nucleotide sequences comprise one or more nucleotide substitutions that increase somatic hypermutation of the one or more nucleotide sequences.

In some embodiments, an engineered D_Hregion further includes a first and a second recombination signal sequence flanking each of the one or more nucleotide sequences. In some certain embodiments, a first recombination signal sequence comprises a sequence that is at least 50%, 60%, 70%, 80%, 90%, 95%, or at least 98% identical to a first recombination signal sequence that appears in FIG. 2. In some certain embodiments, a first recombination signal sequence comprises a sequence that is substantially identical or identical to a first recombination signal sequence that appears in FIG. 2. In some certain embodiments, a second recombination signal sequence comprises a sequence that is at least 50%, 60%, 70%, 80%, 90%, 95%, or at least 98% identical to a second recombination signal sequence that appears in FIG. 2. In some certain embodiments, a second recombination signal sequence comprises a sequence that is substantially identical or identical to a second recombination signal sequence that appears in FIG. 2. In some embodiments, first and second recombination signal sequences are selected from FIG. 2.
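The flanked arrangement described above, in which each nucleotide coding sequence sits between a first and a second recombination signal sequence, can be pictured with the following minimal sketch. It is a schematic of the layout only, not a design tool. All names are hypothetical, and the bracketed strings are placeholders rather than sequences from FIG. 2, Table 3 or Table 4; any intergenic sequence between units is reduced to an optional `spacer` argument, which is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FlankedSegment:
    """One coding sequence with its two flanking recombination signal sequences."""
    first_rss: str   # placeholder for a first recombination signal sequence
    coding: str      # placeholder for a nucleotide coding sequence
    second_rss: str  # placeholder for a second recombination signal sequence

    def sequence(self) -> str:
        # Layout of a single unit: first RSS, coding sequence, second RSS.
        return self.first_rss + self.coding + self.second_rss


def build_engineered_region(codings: Iterable[str], first_rss: str,
                            second_rss: str, spacer: str = "") -> str:
    """Concatenate flanked units in order, separated by an optional spacer."""
    units = [FlankedSegment(first_rss, c, second_rss) for c in codings]
    return spacer.join(u.sequence() for u in units)


# Placeholder tokens only; no real sequences are implied.
region = build_engineered_region(["<CDS1>", "<CDS2>"], "<RSS-A>", "<RSS-B>", spacer="...")
print(region)
```

Using the same pair of placeholders for every unit is a simplification; the text above permits the first and second recombination signal sequences to be selected individually.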

In some embodiments, the genome of a provided non-human animal lacks one or more wild-type endogenous D_Hgene segments. In some certain embodiments, the genome of a provided non-human animal lacks all or substantially all wild-type endogenous D_Hgene segments. In some embodiments, the genome of a provided non-human animal lacks one or more wild-type endogenous recombination signal sequences.

In some embodiments, an engineered D_Hregion comprises one or more wild-type human D_Hgene segments. In some certain embodiments, an engineered D_Hregion comprises a human D_H6-25 gene segment. In some certain embodiments, an engineered D_Hregion lacks a human D_H6-25 gene segment.

In some embodiments, an immunoglobulin heavy chain variable region is operably linked to an immunoglobulin heavy chain constant region.

In some embodiments, an immunoglobulin heavy chain constant region is an endogenous immunoglobulin heavy chain constant region.

In some embodiments, an immunoglobulin heavy chain variable region is an unrearranged human immunoglobulin heavy chain variable region, e.g., comprising at least one human (h) unrearranged V_H gene segment and/or at least one human (h) unrearranged J_H gene segment flanking an engineered (e) D_H region. In some embodiments, an immunoglobulin heavy chain variable region comprises a plurality of human (h) unrearranged V_H gene segments and/or a plurality of human (h) unrearranged J_H gene segments flanking an engineered (e) D_H region. In some embodiments, the unrearranged human immunoglobulin heavy chain variable region is operably linked to an immunoglobulin heavy chain constant region, e.g., a non-human immunoglobulin heavy chain constant region, e.g., at an endogenous non-human heavy chain locus.

In some embodiments, a human immunoglobulin heavy chain variable region comprises a rearranged human immunoglobulin heavy chain variable region, wherein the rearranged human immunoglobulin heavy chain comprises at least one human (h) unrearranged V_Hgene segment and/or at least one human (h) unrearranged J_Hgene segment recombined with an engineered (e) D_Hregion to form a rearranged (h)V_H/(e)D_H/(h)J_Hgene sequence, that may be operably linked to the heavy chain constant region. In some embodiments, such recombination occurs in a B cell during B cell development.

Accordingly, a non-human animal described herein may comprise

(i) a germ cell comprising an unrearranged human heavy chain variable region comprising

(a) at least one or a plurality of human (h) unrearranged V_H gene segments,

(b) at least one or a plurality of human (h) unrearranged J_H gene segments, and

(c) an engineered (e) D_H region flanked by (a) and (b), wherein (a), (b), and (c) recombine to form a rearranged human heavy chain variable region hV_H/eD_H/hJ_H sequence; and

(ii) a somatic cell, e.g., a B cell, comprising the rearranged human hV_H/eD_H/hJ_Hgene sequence, wherein the rearranged hV_H/eD_H/hJ_Hgene sequence comprises a CDR3 encoding sequence comprising one or more nucleotide sequences that encode a non-immunoglobulin polypeptide of interest, or portion thereof, or somatically hypermutated variant thereof. In some embodiments, the human unrearranged or rearranged human heavy chain variable region is operably linked to a heavy chain constant region, which may be a non-human heavy chain constant region, e.g., at a non-human endogenous heavy chain locus. In some embodiments, an immunoglobulin heavy chain variable region is a human immunoglobulin heavy chain variable region. In some embodiments, the B cell further expresses the rearranged hV_H/eD_H/hJ_Hgene sequence operably linked to a heavy chain constant region as an immunoglobulin heavy chain-like polypeptide comprising a CDR3 comprising a non-immunoglobulin polypeptide of interest, portion thereof, or somatically hypermutated variant thereof.

In some embodiments, a non-human animal as described herein further comprises a humanized light chain locus. In some embodiments, a non-human animal as disclosed herein comprises

(i) in a germ cell:

(a) an immunoglobulin heavy chain locus comprising an unrearranged immunoglobulin heavy chain variable region and an immunoglobulin heavy chain constant region, wherein the unrearranged immunoglobulin heavy chain variable region comprises at least one unrearranged V_H gene segment (which may be a human unrearranged V_H gene segment, e.g., an hV_H), an engineered D_H region (which may include a human D_H gene segment and/or engineered D_H gene segments, e.g., hDH), and at least one unrearranged, optionally human, J_H gene segment, wherein the V_H gene segment(s), engineered D_H region, and J_H gene segment(s) are operably linked such that they can recombine, e.g., in a B cell during B cell development, to form a rearranged immunoglobulin heavy chain hV_H/eD_H/hJ_H variable region gene sequence in operable linkage with an immunoglobulin heavy chain constant region, and

(b) an immunoglobulin light chain locus comprising human V_Land/or J_Lgene segments, and which may encode a (human) common light chain; and

(ii) in a somatic cell, e.g., a B cell,

(a) the rearranged immunoglobulin heavy chain hV_H/eD_H/hJ_H variable region gene sequence in operable linkage with an immunoglobulin heavy chain constant region, wherein the engineered D_H region comprises a sequence encoding a non-immunoglobulin polypeptide of interest or portion thereof, and wherein the rearranged hV_H/eD_H/hJ_H gene sequence comprises a CDR3 encoding sequence comprising one or more nucleotide sequences that encode a non-immunoglobulin polypeptide of interest, or portion thereof, or somatically hypermutated variant thereof, and

(b) the humanized and/or common immunoglobulin light chain locus or a somatically hypermutated variant thereof. In some embodiments, the human unrearranged or rearranged human heavy chain variable region is operably linked to a heavy chain constant region, which may be a non-human heavy chain constant region, e.g., at a non-human endogenous heavy chain locus. In some embodiments, an immunoglobulin heavy chain variable region is a human immunoglobulin heavy chain variable region. In some embodiments, the B cell further expresses the rearranged hV_H/eD_H/hJ_Hgene sequence operably linked to a heavy chain constant region as an immunoglobulin heavy chain-like polypeptide comprising a CDR3 comprising a non-immunoglobulin polypeptide of interest, portion thereof, or somatically hypermutated variant thereof and the human(ized) and/or common light chain as a tetrameric immunoglobulin-like antigen binding protein, wherein the tetramer comprises a dimer of the immunoglobulin heavy chain-like polypeptide, each covalently bound to the human(ized) and/or common light chain.

In some embodiments, a method of making a non-human animal whose genome contains an immunoglobulin heavy chain variable region that includes an engineered D_H region is provided, the method comprising (a) inserting a DNA fragment into a non-human embryonic stem cell, said DNA fragment comprising one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof; (b) obtaining the non-human embryonic stem cell generated in (a); and (c) creating a non-human animal using the embryonic stem cell of (b).

In one embodiment, the method of making a non-human animal as disclosed herein comprises (a) obtaining a first non-human animal whose genome contains an immunoglobulin heavy chain variable region that includes an engineered D_H region as disclosed herein, and (b) breeding the first non-human animal of (a) with a second non-human animal, which in one aspect may be of a different strain from the first non-human animal, wherein the second non-human animal expresses a universal light chain, and wherein the breeding results in offspring that produce, e.g., comprise, a genetically engineered heavy chain comprising an amino acid sequence of a non-immunoglobulin protein (or portion thereof), e.g., in a CDR3, and a genetically engineered rearranged light chain (single rearranged light chain; ULC).

In some embodiments, a DNA fragment includes 5, 10, 15, 20, 25 or more nucleotide sequences that each encode an extracellular portion of a D6 chemokine decoy receptor or that each encode a portion of a conotoxin (e.g., μ-conotoxin), a tarantula toxin, or combinations thereof. In some certain embodiments, a DNA fragment includes 25 nucleotide sequences that each encode an extracellular portion of a D6 chemokine decoy receptor or includes 26 nucleotide sequences that each encode a portion of a conotoxin (e.g., μ-conotoxin) and/or a tarantula toxin (e.g., ProTxI, ProTxII, etc.). In some certain embodiments, a DNA fragment further comprises first and second recombination signal sequences flanking each of the 25 or 26 nucleotide sequences.

In some embodiments, a method of making a non-human animal whose genome contains an immunoglobulin heavy chain variable region that includes an engineered D_Hregion is provided, the method comprising modifying the genome of a non-human animal so that it comprises an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, which engineered D_Hregion comprises one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof, thereby making said non-human animal.

In some embodiments, the genome of the non-human animal is modified to include 5, 10, 15, 20, 25 or more nucleotide sequences that each encode an extracellular portion of a D6 chemokine decoy receptor or that each encode a portion of a conotoxin (e.g., μ-conotoxin), a tarantula toxin, or combinations thereof. In some certain embodiments, the genome of the non-human animal is modified to include 25 nucleotide sequences that each encode an extracellular portion of a D6 chemokine decoy receptor or is modified to include 26 nucleotide sequences that each encode a portion of a conotoxin (e.g., μ-conotoxin) and/or a tarantula toxin (e.g., ProTxI, ProTxII, etc.). In some certain embodiments, the genome of the non-human animal is modified to further include first and second recombination signal sequences flanking each of the 25 or 26 nucleotide sequences.

In some embodiments, a method of producing an antibody in a non-human animal is provided, the method comprising the steps of (a) immunizing a non-human animal with an antigen, which non-human animal has a genome comprising an immunoglobulin heavy chain variable region that includes an engineered D_Hregion, wherein the engineered D_Hregion includes one or more nucleotide sequences that each encode a non-immunoglobulin polypeptide of interest, or portion thereof; (b) maintaining the non-human animal under conditions sufficient that the non-human animal produces an immune response to the antigen; and (c) recovering an antibody from the non-human animal, or a non-human animal cell, that binds the antigen. In some certain embodiments, a non-human cell is a B cell. In some certain embodiments, a non-human cell is a hybridoma.

In some embodiments, a non-human animal is provided whose germ cell genome comprises a human immunoglobulin heavy chain variable region that comprises one or more unrearranged human V_H gene segments, an engineered D_H region, and one or more unrearranged human J_H gene segments, which engineered D_H region includes (i) one or more nucleotide sequences that each encode an extracellular portion of an atypical chemokine receptor (ACKR); and (ii) first and second recombination signal sequences flanking each of the one or more nucleotide sequences of (i) so that the one or more unrearranged human V_H gene segments, the engineered D_H region, and the one or more unrearranged human J_H gene segments recombine, e.g., in a B cell, such that the non-human animal comprises a B cell genome comprising a human immunoglobulin heavy chain variable region that comprises a rearranged hV_H/eD_H/hJ_H gene sequence; wherein the human immunoglobulin heavy chain variable region is operably linked to one or more endogenous immunoglobulin heavy chain constant region genes so that the non-human animal is characterized in that when it is immunized with an antigen, the rearranged hV_H/eD_H/hJ_H gene sequence operably linked to an immunoglobulin heavy chain constant region gene encodes an antibody comprising a human heavy chain variable domain encoded by one of the human V_H gene segments (or portion thereof), the engineered D_H region (or portion thereof), and one of the human J_H gene segments (or portion thereof) operably linked to non-human animal heavy chain constant domains encoded by the one or more endogenous immunoglobulin constant region genes, and wherein the antibody shows specific binding to the antigen. In some embodiments, the germ cell and, e.g., B cell, of the non-human animal further comprises an immunoglobulin light chain locus encoding a common light chain such that the antibody further comprises the common light chain.

In some embodiments, a non-human animal is provided whose germ cell genome comprises a human immunoglobulin heavy chain variable region that comprises one or more unrearranged human (h) V_H gene segments, an engineered (e) D_H region, and one or more unrearranged human (h) J_H gene segments, which engineered D_H region includes (i) one or more nucleotide sequences that each encode a portion of a toxin (e.g., a μ-conotoxin, tarantula toxin, or combinations thereof); and (ii) first and second recombination signal sequences flanking each of the one or more nucleotide sequences of (i) so that the one or more unrearranged human V_H gene segments, the engineered D_H region, and the one or more unrearranged human J_H gene segments recombine, e.g., in a B cell, such that the non-human animal comprises a B cell genome comprising a human immunoglobulin heavy chain variable region that comprises a rearranged hV_H/eD_H/hJ_H gene sequence; wherein the human immunoglobulin heavy chain variable region is operably linked to one or more endogenous immunoglobulin heavy chain constant region genes so that the non-human animal is characterized in that when it is immunized with an antigen, the rearranged hV_H/eD_H/hJ_H gene sequence operably linked to an immunoglobulin heavy chain constant region gene encodes an antibody comprising a human heavy chain variable domain encoded by one of the human V_H gene segments (or portion thereof), the engineered D_H region (or portion thereof), and one of the human J_H gene segments (or portion thereof) operably linked to one or more heavy chain constant domains encoded by the one or more endogenous immunoglobulin constant region genes, and wherein the antibody shows specific binding to the antigen. In some embodiments, the germ cell and, e.g., B cell, of the non-human animal further comprises an immunoglobulin light chain locus encoding a common light chain such that the antibody further comprises the common light chain.

In some embodiments, a provided non-human animal further comprises an insertion of one or more human V_L gene segments and one or more human J_L gene segments into an endogenous light chain locus. In some embodiments, human V_L and J_L segments are Vκ and Jκ gene segments and are inserted into an endogenous κ light chain locus. In some embodiments, human Vκ and Jκ gene segments are operably linked to a rodent Cκ gene (e.g., a mouse or a rat Cκ gene). In some embodiments, human V_L and J_L segments are Vλ and Jλ gene segments and are inserted into an endogenous λ light chain locus. In some embodiments, human Vλ and Jλ gene segments are operably linked to a rodent Cλ gene (e.g., a mouse or a rat Cλ gene). In some embodiments, a single rearranged human light chain variable region gene sequence is operably linked to an endogenous non-human light chain constant region gene. In some embodiments, a single rearranged human Vκ/Jκ gene sequence is operably linked to an endogenous Cκ gene (e.g., a mouse or rat Cκ gene). In some embodiments, a single rearranged Vλ/Jλ gene sequence is operably linked to an endogenous Cλ gene.

In some embodiments, a provided non-human animal is homozygous, heterozygous or hemizygous for an engineered D_Hregion as described herein. In some embodiments, a provided non-human animal is transgenic for an engineered D_Hregion as described herein.

Disclosed herein are also cells, e.g., B cells, or hybridomas derived therefrom by fusion with a myeloma cell, each comprising a rearranged (h)V_H/eD_H/(h)J_Hsequence, which may be operably linked to a human or non-human heavy chain constant region comprising one or more heavy chain constant region genes. Such cell, e.g., B cell, may be isolated from a non-human animal, e.g., rodent (e.g., rat, mouse, etc.) as described herein.

Also described herein are nucleotide sequences comprising a rearranged variable (h)V_H/eD_H/(h)J_Hsequence, which may be operably linked to a human or non-human heavy chain constant region comprising one or more heavy chain constant region genes. Such nucleotide sequences may be isolated from a non-human animal, e.g., rodent (e.g., rat, mouse, etc.) or non-human cell as described herein.

In some embodiments, use of a non-human animal as described herein in the manufacture and/or development of a drug or vaccine for use in medicine, such as use as a medicament, is provided.

In some embodiments, use of a non-human animal as described herein in the manufacture of a medicament for the treatment of a disease, disorder or condition is provided.

In some embodiments, use of a non-human animal as described herein in the manufacture and/or development of an antibody that binds a chemokine or voltage-gated sodium (Na_V) channel is provided.

In some embodiments, use of a non-human animal as described herein in the manufacture of a medicament for the treatment or detection of a disease characterized by chemokine or voltage-gated sodium (Na_V) channel expression or function is provided (e.g., aberrant expression or function).

In some embodiments, a non-human animal as described herein is provided for use in the manufacture and/or development of a drug for therapy or diagnosis.

In some embodiments, a non-human animal as described herein is provided for use in the manufacture of a medicament for the treatment, prevention or amelioration of a disease, disorder or condition.

In some embodiments, a non-human animal as described herein is provided for use in the manufacture and/or development of an antibody that binds a chemokine or voltage-gated sodium (Na_V) channel.

In some embodiments, a disease, disorder or condition is an inflammatory disease, disorder or condition. In some embodiments, a disease, disorder or condition is characterized by chemokine expression or function (e.g., aberrant chemokine expression or function).

In some embodiments, a disease, disorder or condition is a pain disease, disorder or condition. In some embodiments, a disease, disorder or condition is characterized by ion channel expression or function (e.g., aberrant Na_Vchannel expression or function).

In many embodiments, a non-human animal provided herein is a rodent; in some embodiments, a mouse; in some embodiments, a rat.

As used in this application, the terms “about” and “approximately” are used as equivalents. Any numerals used in this application with or without about/approximately are meant to cover any normal fluctuations appreciated by one of ordinary skill in the relevant art.

Other features, objects, and advantages of the non-human animals, cells, nucleic acids and compositions disclosed herein are apparent in the detailed description of certain embodiments that follows. It should be understood, however, that the detailed description, while indicating certain embodiments, is given by way of illustration only, not limitation.

BRIEF DESCRIPTION OF THE DRAWING

The Drawing included herein, which is composed of the following Figures, is for illustration purposes only and not for limitation.

FIGS. 1A-1D show exemplary optimization of selected D6 chemokine decoy receptor coding sequences to include somatic hypermutation hotspots. FIG. 1A: optimized Nterm domain of D6 chemokine decoy receptor with locations of natural (broken line fill) and artificial (diagonal line fill) RGYW activation-induced cytidine deaminase (AID) hotspots; FIG. 1B: optimized EC1 domain of D6 chemokine decoy receptor with locations of natural (broken line fill) and artificial (diagonal line fill) RGYW AID hotspots; FIG. 1C: optimized EC2 domain of D6 chemokine decoy receptor with locations of natural (broken line fill) and artificial (diagonal line fill) RGYW AID hotspots; FIG. 1D: optimized EC3 domain of D6 chemokine decoy receptor with locations of natural (broken line fill) and artificial (diagonal line fill) RGYW AID hotspots.
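By way of illustration only, the location of RGYW hotspots in a coding sequence can be found with a simple motif scan. The following is a minimal sketch, not the method used to prepare the Figures; it assumes the standard IUPAC reading of the motif (R = A or G, Y = C or T, W = A or T) and also reports the reverse-complement motif WRCY, which corresponds to RGYW on the opposite strand. The function name and the input sequence are hypothetical and are not taken from the disclosed sequences.

```python
import re

# IUPAC codes: R = A or G, Y = C or T, W = A or T.
# Lookahead patterns are used so that overlapping motifs are all reported.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")  # reverse complement of RGYW


def find_hotspots(seq):
    """Return sorted (position, motif, strand) tuples for RGYW/WRCY motifs.

    Positions are 0-based offsets on the given (top) strand. Strand "+"
    marks an RGYW match; strand "-" marks a WRCY match, i.e. an RGYW
    motif on the complementary strand.
    """
    seq = seq.upper()
    hits = [(m.start(), m.group(1), "+") for m in RGYW.finditer(seq)]
    hits += [(m.start(), m.group(1), "-") for m in WRCY.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical 8-nt input chosen only to exercise the scan.
    for position, motif, strand in find_hotspots("AGCTAGCA"):
        print(position, motif, strand)
```

A scan of this kind can be run before and after codon substitution to distinguish motifs already present in a sequence (natural) from those introduced by optimization (artificial).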

FIG. 2 shows a table of exemplary optimized 5′ and 3′ recombination signal sequences (RSSs) designed for each nucleotide coding sequence to allow for efficient recombination frequency and equal usage of the nucleotide coding sequences during V(D)J recombination: global RSS consensus 5′ RSS (SEQ ID NO:51); global RSS consensus 3′ RSS (SEQ ID NO:52); human D_Hconsensus 5′ RSS (SEQ ID NO:53); human D_Hconsensus 3′ RSS (SEQ ID NO:54); mouse D_Hconsensus 5′ RSS (SEQ ID NO:55); mouse D_Hconsensus 3′ RSS (SEQ ID NO:56); optimized RSS 5′ RSS (SEQ ID NO:57); optimized RSS 3′ RSS (SEQ ID NO:58); 1-1 opt 5′ RSS (SEQ ID NO:59); 1-1 opt 3′ RSS (SEQ ID NO:60); 1-7 opt 5′ RSS (SEQ ID NO:61); 1-7 opt 3′ RSS (SEQ ID NO:62); 1-14 ORF opt 5′ RSS (SEQ ID NO:63); 1-14 ORF opt 3′ RSS (SEQ ID NO:64); 1-20 opt 5′ RSS (SEQ ID NO:65); 1-20 opt 3′ RSS (SEQ ID NO:66); 1-26 opt 5′ RSS (SEQ ID NO:67); 1-26 opt 3′ RSS (SEQ ID NO:68); 2-2*02 opt 5′ RSS (SEQ ID NO:69); 2-2*02 opt 3′ RSS (SEQ ID NO:70); 2-8*01 opt 5′ RSS (SEQ ID NO:71); 2-8*01 opt 3′ RSS (SEQ ID NO:72); 2-15 opt 5′ RSS (SEQ ID NO:73); 2-15 opt 3′ RSS (SEQ ID NO:74); 2-21*02 opt 5′ RSS (SEQ ID NO:75); 2-21*02 opt 3′ RSS (SEQ ID NO:76); 3-3*01 opt 5′ RSS (SEQ ID NO:77); 3-3*01 opt 3′ RSS (SEQ ID NO:78); 3-9 opt 5′ RSS (SEQ ID NO:79); 3-9 opt 3′ RSS (SEQ ID NO:80); 3-10*01 opt 5′ RSS (SEQ ID NO:81); 3-10*01 opt 3′ RSS (SEQ ID NO:82); 3-16*02 opt 5′ RSS (SEQ ID NO:83); 3-16*02 opt 3′ RSS (SEQ ID NO:84); 3-22 opt 5′ RSS (SEQ ID NO:85); 3-22 opt 3′ RSS (SEQ ID NO:86); 4-4 opt 5′ RSS (SEQ ID NO:87); 4-4 opt 3′ RSS (SEQ ID NO:88); 4-11 ORF opt 5′ RSS (SEQ ID NO:89); 4-11 ORF opt 3′ RSS (SEQ ID NO:90); 4-17 opt 5′ RSS (SEQ ID NO:91); 4-17 opt 3′ RSS (SEQ ID NO:92); 4-23 ORF opt 5′ RSS (SEQ ID NO:93); 4-23 ORF opt 3′ RSS (SEQ ID NO:94); 5-5 opt 5′ RSS (SEQ ID NO:95); 5-5 opt 3′ RSS (SEQ ID NO:96); 5-12 opt 5′ RSS (SEQ ID NO:97); 5-12 opt 3′ RSS (SEQ ID NO:98); 5-18 opt 5′ RSS (SEQ ID NO:99); 5-18 opt 3′ RSS (SEQ ID NO:100); 5-24 ORF opt 5′ RSS (SEQ ID NO:101); 5-24 ORF opt 3′ RSS (SEQ ID NO:102); 6-6 opt 5′ RSS (SEQ ID NO:103); 6-6 opt 3′ RSS (SEQ ID NO:104); 6-13 opt 5′ RSS (SEQ ID NO:105); 6-13 opt 3′ RSS (SEQ ID NO:106); 6-19 opt 5′ RSS (SEQ ID NO:107); 6-19 opt 3′ RSS (SEQ ID NO:108); 6-25 (not optimized) 5′ RSS (SEQ ID NO:109); 6-25 (not optimized) 3′ RSS (SEQ ID NO:110); 7-27 (not optimized) 5′ RSS (SEQ ID NO:111); 7-27 (not optimized) 3′ RSS (SEQ ID NO:112). Bold and italicized font for global RSS consensus and 7-27 indicates match to global RSS consensus sequence based on rodent and human RSS from immunoglobulin and T cell receptor V, D and J gene segments; bold font for mouse D_Hconsensus indicates match to mouse immunoglobulin D_Hconsensus sequence; bold font for human D_Hconsensus, optimized RSS and all remaining RSS (e.g., 1-1 opt, 1-7 opt, 1-20 opt, etc.) indicates match to the human immunoglobulin D_Hconsensus sequence.
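For illustration, the position-by-position comparison that the bold font in FIG. 2 conveys can be sketched as follows. This is a hedged example only: the sequences of SEQ ID NOs:51-112 are not reproduced in this description, so the example uses the widely reported canonical RSS heptamer CACAGTG as a stand-in consensus, and the query sequence and function names are hypothetical.

```python
def match_mask(rss, consensus):
    """Return a list of booleans, True where rss matches consensus.

    Both sequences must have the same length; comparison is
    case-insensitive.
    """
    if len(rss) != len(consensus):
        raise ValueError("sequences must be the same length")
    return [a == b for a, b in zip(rss.upper(), consensus.upper())]


def render_matches(rss, consensus):
    """Render rss with matching positions in upper case and mismatches
    in lower case (a plain-text substitute for bold font)."""
    mask = match_mask(rss, consensus)
    return "".join(
        base.upper() if same else base.lower()
        for base, same in zip(rss, mask)
    )


if __name__ == "__main__":
    # Assumption: canonical heptamer used as a placeholder consensus.
    CANONICAL_HEPTAMER = "CACAGTG"
    # Hypothetical query differing from the consensus at one position.
    print(render_matches("CACTGTG", CANONICAL_HEPTAMER))
```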

FIGS. 3A-3B show an illustration, not to scale, of an exemplary strategy for construction of a targeting vector for integration into rodent embryonic stem (ES) cells to create a rodent whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered diversity cluster (i.e., D_Hregion), which diversity cluster includes one or more nucleotide sequences that each encode a portion of a non-immunoglobulin polypeptide (e.g., an extracellular portion of a D6 chemokine decoy receptor). FIG. 3A: four initial steps highlighting (1) de novo synthesis of D6 coding sequences, (2) AgeI/EcoRI digestion and ligation of a selection cassette (e.g., neomycin) and a D6 DNA fragment, (3) SnaBI digestion of D6 DNA fragments and NotI/AscI digestion of a BAC vector (pBACe3.6), and (4) one-step isothermal assembly of digested DNA fragments to create a contiguous engineered diversity cluster of D6 chemokine decoy receptor coding sequences in place of traditional D_Hsegments; FIG. 3B: additional step for creating a targeting vector for integration into the genome of rodent ES cells, (5) PI-SceI/I-CeuI digestion and ligation of 25 synthetic D6 chemokine decoy receptor coding sequences into a BAC clone to append 5′ and 3′ homology arms containing human immunoglobulin V_HDNA and J_HDNA, respectively. Various restriction enzyme recognition sites are indicated for each of the depicted DNA fragments; lp: loxP site; neo: neomycin selection cassette driven by ubiquitin promoter; cm: chloramphenicol selection cassette; frt: Flippase recognition target sequence; hyg: hygromycin selection cassette; Ei: murine heavy chain intronic enhancer; IgM: murine immunoglobulin M constant region gene; lox: loxP site sequence of pBACe3.6 vector.

FIGS. 4A-4B show an illustration, not to scale, of an alternative exemplary strategy to assemble D6 chemokine decoy receptor coding sequences by sequential ligation for construction of a targeting vector for integration into rodent embryonic stem (ES) cells to create a rodent whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered diversity cluster (i.e., an engineered D_Hregion), which diversity cluster includes one or more nucleotide sequences that each encode an extracellular portion of a D6 chemokine decoy receptor. FIG. 4A: four initial steps highlighting (1) de novo synthesis of D6 coding sequences, (2) AgeI/EcoRI digestion and ligation of a selection cassette (e.g., neomycin) and a D6 DNA fragment, (3) NotI/AscI digestion and ligation of a D6 DNA fragment into a BAC vector (pBACe3.6), and (4) PacI/NsiI digestion and ligation of D6 fragments into a BAC vector backbone; FIG. 4B: two additional steps for creating a targeting vector for integration into the genome of rodent ES cells, (5) PI-SceI/I-CeuI digestion and ligation of an additional D6 DNA fragment into the BAC vector backbone, and (6) NsiI/I-CeuI digestion and ligation of a final D6 DNA fragment to create 25 synthetic D6 chemokine decoy receptor coding sequences in a BAC vector backbone. Various restriction enzyme recognition sites are indicated for each of the depicted DNA fragments; lp: loxP site sequence; neo: neomycin selection cassette driven by ubiquitin promoter; cm: chloramphenicol selection cassette.

FIG. 5 shows an exemplary screening strategy using genetic material of drug-resistant colonies after electroporation screened by TAQMAN™ and karyotyping. Names and approximate locations, not to scale (line encompassed by oval), of various primer/probe sets (see Table 7) are indicated below various alleles shown (not to scale). hyg: hygromycin selection cassette driven by ubiquitin promoter; neo: neomycin selection cassette driven by ubiquitin promoter; L: loxP site sequence; Frt: Flippase recognition target sequence.

FIGS. 6A-6L show exemplary optimization of selected μ-conotoxin and tarantula toxin coding sequences to include somatic hypermutation hotspots. FIG. 6A: optimized KIIIA fl of μ-conotoxin with locations of artificial (diagonal line fill) RGYW activation-induced cytidine deaminase (AID) hotspots; FIG. 6B: optimized KIIIA mini (top) and KIIIA midi (bottom) of μ-conotoxin with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6C: optimized PIIIA fl of μ-conotoxin with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6D: optimized PIIIA mini (top) and PIIIA midi (bottom) of μ-conotoxin with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6E: optimized SmIIIA fl of μ-conotoxin with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6F: optimized SmIIIA mini (top) and SmIIIA midi (bottom) of μ-conotoxin with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6G: optimized ProTxII tarantula toxin with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6H: optimized tarantula toxin ProTxII C1SC4S (top), ProTxII C2SC5S (middle) and ProTxII C3SC6S (bottom) with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6I: optimized SmIIIA SSRW loop (left), SmIIIA SSKW loop (middle) and PIIIA RSRQ loop (right) of μ-conotoxin with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6J: optimized KIIIA or SmIIIA mini/midi of μ-conotoxin in DO segment locations with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6K: optimized SmIIIA or PIIIA mini/midi of μ-conotoxin in D_H3 or D_H1 segment locations with locations of artificial (diagonal line fill) RGYW AID hotspots; FIG. 6L: optimized SSRW or RSRQ loops of μ-conotoxin in D_H2 segment locations with locations of artificial (diagonal line fill) RGYW AID hotspots.

FIGS. 7A-7B show an illustration, not to scale, of an exemplary strategy for construction of a targeting vector for integration into rodent embryonic stem (ES) cells to create a rodent whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered diversity cluster (i.e., an engineered D_Hregion), which diversity cluster includes one or more nucleotide sequences that each encode a portion of a non-immunoglobulin polypeptide (e.g., a portion of μ-conotoxin and/or tarantula toxin). FIG. 7A: four initial steps highlighting (1) de novo synthesis of toxin coding sequences, (2) AgeI/EcoRI digestion and ligation of a selection cassette (e.g., neomycin) and a toxin DNA fragment (TX-D_H1166), (3) SnaBI digestion of toxin DNA fragments and NotI/AscI digestion of a BAC vector (pBACe3.6), and (4) one-step isothermal assembly of digested DNA fragments to create an engineered diversity cluster comprising contiguous toxin coding sequences in place of one or more, and optionally, all functional D_Hgene segments; FIG. 7B: additional step for creating a targeting vector for integration into the genome of rodent ES cells, (5) PI-SceI/I-CeuI digestion and ligation of 26 synthetic toxin coding sequences into a BAC clone to append 5′ and 3′ homology arms containing human immunoglobulin V_HDNA and J_HDNA, respectively. Various restriction enzyme recognition sites are indicated for each of the depicted DNA fragments; lp: loxP site; neo: neomycin selection cassette driven by ubiquitin promoter; cm: chloramphenicol selection cassette; frt: Flippase recognition target sequence; hyg: hygromycin selection cassette; Ei: murine heavy chain intronic enhancer; IgM: murine immunoglobulin M constant region gene; lox: loxP site sequence of pBACe3.6 vector.

FIGS. 8A-8B show an illustration, not to scale, of an alternative exemplary strategy to assemble toxin (e.g., μ-conotoxin and tarantula toxin) coding sequences by sequential ligation for construction of a targeting vector for integration into rodent embryonic stem (ES) cells to create a rodent whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered diversity cluster (i.e., D_Hregion), which diversity cluster includes one or more nucleotide sequences that each encode a portion of a toxin peptide (e.g., μ-conotoxin and tarantula toxin ProTxII). FIG. 8A: four initial steps highlighting (1) de novo synthesis of toxin coding sequences, (2) AgeI/EcoRI digestion and ligation of a selection cassette (e.g., neomycin) and a toxin DNA fragment (TX-D_H1166), (3) NotI/AscI digestion and ligation of a toxin DNA fragment into a BAC vector (pBACe3.6), and (4) PacI/NsiI digestion and ligation of toxin DNA fragments into a BAC vector backbone; FIG. 8B: two additional steps for creating a targeting vector for integration into the genome of rodent ES cells, (5) PI-SceI/I-CeuI digestion and ligation of an additional toxin DNA fragment into the BAC vector backbone, and (6) NsiI/I-CeuI digestion and ligation of a final toxin DNA fragment to create 26 synthetic toxin coding sequences in a BAC vector backbone. Various restriction enzyme recognition sites are indicated for each of the depicted DNA fragments; lp: loxP site; neo: neomycin selection cassette driven by ubiquitin promoter; cm: chloramphenicol selection cassette.

FIGS. 9A-9D show representative contour plots of lymphocytes in spleen harvested from VELOCIMMUNE® (VI) and mice homozygous for an engineered D_Hregion containing toxin coding sequences (6579ho/1293ho, “TX-D_Hho”; a rodent strain having a genome comprising a homozygous immunoglobulin heavy chain locus containing a plurality of human V_H, engineered D_Hsegments including toxin coding sequences in the place of traditional D_Hsegments, and J_Hsegments operably linked to a rodent immunoglobulin heavy chain constant region including rodent heavy chain enhancers and regulatory regions, and containing an inserted nucleotide sequence encoding one or more murine Adam6 genes [e.g., U.S. Pat. Nos. 8,642,835 and 8,697,940]; and a homozygous immunoglobulin κ light chain locus containing human Vκ and Jκ gene segments operably linked to a rodent Cκ region gene including rodent κ light chain enhancers), and stained for cell surface expression of various cell markers. FIG. 9A: representative contour plot of lymphocytes from spleen gated on singlets illustrating expression of CD19 (y-axis) and CD3 (x-axis). FIG. 9B: representative contour plot of lymphocytes from spleen singlets gated on CD19⁺ illustrating expression of immunoglobulin D (IgD, y-axis) and immunoglobulin M (IgM, x-axis); mature (CD19⁺ IgD⁺ IgM^int) and transitional (CD19⁺ IgD^intIgM⁺) B cells are indicated on each dot plot. FIG. 9C: representative contour plot of lymphocytes from spleen singlets gated on CD19⁺ illustrating expression of Igλ (y-axis) or Igκ (x-axis) light chain. FIG. 9D: representative contour plots of B cell maturation illustrating lymphocytes from spleen singlets gated on CD19⁺ and showing expression of [from left to right] CD93 (y-axis) and B220 (x-axis), IgM (y-axis) and CD23 (x-axis), CD21/35 (y-axis) and IgM (x-axis), B220 (y-axis) and CD23 (x-axis), and IgD (y-axis) and IgM (x-axis). Top row: VELOCIMMUNE® mice; Bottom row: TX-D_Hho (6579ho/1293ho) mice. 
Specific B cell populations are indicated on each dot plot: Immature (CD19⁺ CD93⁺ B220⁺), mature (CD19⁺ CD93⁻ B220⁺), T1 (CD19⁺CD93⁺ B220⁺ IgM⁺ CD23⁻), T2 (CD19⁺ CD93⁺ B220⁺ IgM⁺ CD23⁺), T3 (CD19⁺ CD93⁺ B220⁺ IgM^intCD23⁺), MZ (CD19⁺ CD93⁻ B220⁺ CD21/35⁺ IgM⁺ CD23⁻), MZ precursor (CD19⁺ CD93⁻ B220⁺ CD21/35⁺ IgM⁺ CD23⁺), Fol I (CD19⁺ CD93⁻ B220⁺ CD21/35^intIgM^intIgD⁺), and Fol II (CD19⁺ CD93⁻ B220⁺ CD21/35^intIgM⁺ IgD⁺).
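For illustration only, the marker definitions listed above can be written as a lookup table and used to name the populations whose definition a given marker profile satisfies. This is a minimal sketch of the listed definitions, not the gating software or analysis used to produce the Figures; the function name and the example profile are hypothetical, and expression levels are reduced to the three symbols "+", "-" and "int".

```python
# Marker definitions as listed in the Figure description.
# Levels: "+" positive, "-" negative, "int" intermediate.
POPULATIONS = {
    "Immature": {"CD19": "+", "CD93": "+", "B220": "+"},
    "Mature": {"CD19": "+", "CD93": "-", "B220": "+"},
    "T1": {"CD19": "+", "CD93": "+", "B220": "+", "IgM": "+", "CD23": "-"},
    "T2": {"CD19": "+", "CD93": "+", "B220": "+", "IgM": "+", "CD23": "+"},
    "T3": {"CD19": "+", "CD93": "+", "B220": "+", "IgM": "int", "CD23": "+"},
    "MZ": {"CD19": "+", "CD93": "-", "B220": "+", "CD21/35": "+",
           "IgM": "+", "CD23": "-"},
    "MZ precursor": {"CD19": "+", "CD93": "-", "B220": "+", "CD21/35": "+",
                     "IgM": "+", "CD23": "+"},
    "Fol I": {"CD19": "+", "CD93": "-", "B220": "+", "CD21/35": "int",
              "IgM": "int", "IgD": "+"},
    "Fol II": {"CD19": "+", "CD93": "-", "B220": "+", "CD21/35": "int",
               "IgM": "+", "IgD": "+"},
}


def matching_populations(profile):
    """Return the names of all populations whose every listed marker
    level is matched by the given profile (marker -> level)."""
    return [
        name
        for name, gate in POPULATIONS.items()
        if all(profile.get(marker) == level for marker, level in gate.items())
    ]


if __name__ == "__main__":
    # Hypothetical profile of a single cell.
    cell = {"CD19": "+", "CD93": "+", "B220": "+", "IgM": "+", "CD23": "-"}
    print(matching_populations(cell))
```

Because the transitional subsets (T1 to T3) are defined within the immature gate, a single profile may satisfy more than one definition, which is why the function returns a list.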

FIGS. 10A-10D show representative contour plots of lymphocytes in bone marrow harvested from VELOCIMMUNE® (VI) and mice homozygous for an engineered D_Hregion containing toxin coding sequences (6579ho/1293ho, “TX-D_Hho”; a rodent strain having a genome comprising a homozygous immunoglobulin heavy chain locus containing a plurality of human V_H, engineered D_Hsegments including toxin coding sequences in the place of traditional D_Hsegments, and J_Hsegments operably linked to a rodent immunoglobulin heavy chain constant region including rodent heavy chain enhancers and regulatory regions, and containing an inserted nucleotide sequence encoding one or more murine Adam6 genes [e.g., U.S. Pat. Nos. 8,642,835 and 8,697,940]; and a homozygous immunoglobulin κ light chain locus containing human Vκ and Jκ gene segments operably linked to a rodent Cκ region gene including rodent κ light chain enhancers), and stained for cell surface expression of various cell markers. FIG. 10A: representative contour plot of lymphocytes from bone marrow gated on singlets illustrating expression of CD19 (y-axis) and CD3 (x-axis). FIG. 10B: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ IgM^−/lowIgD⁻ illustrating expression of c-kit (y-axis) and CD43 (x-axis); pre- (c-kit⁻ CD43⁻) and pro-B (c-kit⁺ CD43⁺) cells are indicated on each dot plot. FIG. 10C: representative contour plot of lymphocytes from bone marrow gated on singlets illustrating expression of IgM (y-axis) and B220 (x-axis); immature (IgM^{int to +} B220^int) and mature (IgM^{int to +} B220⁺) B cells are indicated on each dot plot. FIG. 10D: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ IgM^{int to +} B220^int(top row) and CD19⁺ IgM^{int to +} B220⁺ (bottom row) illustrating expression of Igλ (y-axis) or Igκ (x-axis) light chain.

FIGS. 11A-11D show representative contour plots of lymphocytes in spleen harvested from VELOCIMMUNE® (VI) and mice heterozygous for an engineered D_Hregion containing D6 coding sequences (6590het, “D6-D_Hhet”; a rodent strain having a genome comprising a heterozygous immunoglobulin heavy chain locus containing a plurality of human V_H, engineered D_Hsegments including D6 chemokine decoy receptor coding sequences in the place of traditional D_Hsegments, and J_Hsegments operably linked to a rodent immunoglobulin heavy chain constant region including rodent heavy chain enhancers and regulatory regions, and containing an inserted nucleotide sequence encoding one or more murine Adam6 genes [e.g., U.S. Pat. Nos. 8,642,835 and 8,697,940]), and stained for cell surface expression of various cell markers. FIG. 11A: representative contour plot of lymphocytes from spleen gated on singlets illustrating expression of CD19 (y-axis) and CD3 (x-axis). FIG. 11B: representative contour plot of lymphocytes from spleen gated on CD19⁺ singlets illustrating expression of IgD (y-axis) and IgM (x-axis); mature (CD19⁺ IgD⁺ IgM^int) and transitional (CD19⁺ IgD^intIgM⁺) B cells are indicated on each dot plot. FIG. 11C: representative contour plot of lymphocytes from spleen gated on CD19⁺ singlets illustrating expression of Igλ (y-axis) or Igκ (x-axis) light chain. FIG. 11D: representative contour plots of B cell maturation illustrating lymphocytes from spleen gated on CD19⁺ singlets and showing expression of [from left to right] CD93 (y-axis) and B220 (x-axis), IgM (y-axis) and CD23 (x-axis), CD21/35 (y-axis) and IgM (x-axis), B220 (y-axis) and CD23 (x-axis), and IgD (y-axis) and IgM (x-axis). Top row: VELOCIMMUNE® mice; Bottom row: D6-D_Hhet (6590het) mice. 
Specific B cell populations are indicated on each dot plot: Immature (CD19⁺ CD93⁺ B220⁺), mature (CD19⁺ CD93⁻ B220⁺), T1 (CD19⁺ CD93⁺ B220⁺ IgM⁺ CD23⁻), T2 (CD19⁺ CD93⁺ B220⁺ IgM⁺ CD23⁺), T3 (CD19⁺ CD93⁺ B220⁺ IgM^intCD23⁺), MZ (CD19⁺ CD93⁻ B220⁺ CD21/35⁺ IgM⁺ CD23⁻), MZ precursor (CD19⁺ CD93⁻ B220⁺ CD21/35⁺ IgM⁺ CD23⁺), Fol I (CD19⁺ CD93⁻ B220⁺ CD21/35^intIgM^intIgD⁺), and Fol II (CD19⁺ CD93⁻ B220⁺ CD21/35^intIgM⁺ IgD⁺).

FIGS. 12A-12D show representative contour plots of lymphocytes in bone marrow harvested from VELOCIMMUNE® (VI) and mice heterozygous for an engineered D_Hregion containing D6 coding sequences (6590het, “D6-D_Hhet”; a rodent strain having a genome comprising a heterozygous immunoglobulin heavy chain locus containing a plurality of human V_H, engineered D_Hsegments including D6 chemokine decoy receptor coding sequences in the place of traditional D_Hsegments, and J_Hsegments operably linked to a rodent immunoglobulin heavy chain constant region including rodent heavy chain enhancers and regulatory regions, and containing an inserted nucleotide sequence encoding one or more murine Adam6 genes [e.g., U.S. Pat. Nos. 8,642,835 and 8,697,940]), and stained for cell surface expression of various cell markers. FIG. 12A: representative contour plot of lymphocytes from bone marrow gated on singlets illustrating expression of CD19 (y-axis) and CD3 (x-axis). FIG. 12B: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ IgM^{− to low}IgD⁻ illustrating expression of c-kit (y-axis) and CD43 (x-axis); pre-B (c-kit⁻ CD43⁻) and pro-B (c-kit⁺ CD43⁺) cells are indicated on each dot plot. FIG. 12C: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ illustrating expression of IgM (y-axis) and B220 (x-axis); immature (IgM^{int to +} B220^int) and mature (IgM^{int to +} B220⁺), pre- and pro-B cells (IgM^{− to low}B220^int) are indicated on each dot plot. FIG. 12D: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ IgM^{int to +} B220^int(top row) and CD19⁺ IgM^{int to +} B220⁺ (bottom row) illustrating expression of Igλ (y-axis) or Igκ (x-axis) light chain.

FIGS. 13A-13D show representative contour plots of lymphocytes in spleen harvested from VELOCIMMUNE® (VI) and mice homozygous for an engineered D_Hregion containing D6 coding sequences (6590ho/1293ho, “D6-D_Hho”; a rodent strain having a genome comprising a homozygous immunoglobulin heavy chain locus containing a plurality of human V_H, engineered D_Hsegments including D6 chemokine decoy receptor coding sequences in the place of traditional D_Hsegments, and J_Hsegments operably linked to a rodent immunoglobulin heavy chain constant region including rodent heavy chain enhancers and regulatory regions, and containing an inserted nucleotide sequence encoding one or more murine Adam6 genes [e.g., U.S. Pat. Nos. 8,642,835 and 8,697,940]; and a homozygous immunoglobulin κ light chain locus containing human Vκ and Jκ gene segments operably linked to a rodent Cκ region gene including rodent κ light chain enhancers), and stained for cell surface expression of various cell markers. FIG. 13A: representative contour plot of lymphocytes from spleen gated on singlets illustrating expression of CD19 (y-axis) and CD3 (x-axis). FIG. 13B: representative contour plot of lymphocytes from spleen gated on CD19⁺ illustrating expression of IgD (y-axis) and IgM (x-axis); mature (CD19⁺ IgD⁺ IgM^int) and transitional (CD19⁺ IgD^intIgM⁺) B cells are indicated on each dot plot. FIG. 13C: representative contour plot of lymphocytes from spleen gated on CD19⁺ illustrating expression of Igλ (y-axis) or Igκ (x-axis) light chain. FIG. 13D: representative contour plots of B cell maturation illustrating lymphocytes from spleen gated on CD19⁺ singlets and showing expression of [from left to right] CD93 (y-axis) and B220 (x-axis), IgM (y-axis) and CD23 (x-axis), CD21/35 (y-axis) and IgM (x-axis), B220 (y-axis) and CD23 (x-axis), and IgD (y-axis) and IgM (x-axis). Top row: VELOCIMMUNE® mice; Bottom row: D6-D_Hho (6590ho/1293ho) mice. 
Specific B cell populations are indicated on each dot plot: Immature (CD19⁺ CD93⁺ B220⁺), mature (CD19⁺ CD93⁻ B220⁺), T1 (CD19⁺ CD93⁺ B220⁺ IgM⁺ CD23⁻), T2 (CD19⁺ CD93⁺ B220⁺ IgM⁺ CD23⁺), T3 (CD19⁺ CD93⁺ B220⁺ IgM^intCD23⁺), MZ (CD19⁺ CD93⁻ B220⁺ CD21/35⁺ IgM⁺ CD23⁻), MZ precursor (CD19⁺ CD93⁻ B220⁺ CD21/35⁺ IgM⁺ CD23⁺), Fol I (CD19⁺ CD93⁻ B220⁺ CD21/35^intIgM^intIgD⁺) and Fol II (CD19⁺ CD93⁻ B220⁺ CD21/35^intIgM⁺ IgD⁺).

FIGS. 14A-14D show representative contour plots of lymphocytes in bone marrow harvested from VELOCIMMUNE® (VI) and mice homozygous for an engineered D_Hregion containing D6 coding sequences (6590ho/1293ho, “D6-D_Hho”; a rodent strain having a genome comprising a homozygous immunoglobulin heavy chain locus containing a plurality of human V_H, engineered D_Hsegments including D6 chemokine decoy receptor coding sequences in the place of traditional D_Hsegments, and J_Hsegments operably linked to a rodent immunoglobulin heavy chain constant region including rodent heavy chain enhancers and regulatory regions, and containing an inserted nucleotide sequence encoding one or more murine Adam6 genes [e.g., U.S. Pat. Nos. 8,642,835 and 8,697,940]; and a homozygous immunoglobulin κ light chain locus containing human Vκ and Jκ gene segments operably linked to a rodent Cκ region gene including rodent κ light chain enhancers), and stained for cell surface expression of various cell markers. FIG. 14A: representative contour plot of lymphocytes from bone marrow gated on singlets illustrating expression of CD19 (y-axis) and CD3 (x-axis). FIG. 14B: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ IgM^{− to low}IgD⁻ illustrating expression of c-kit (y-axis) and CD43 (x-axis); pre-B (c-kit⁻ CD43⁻) and pro-B (c-kit⁺ CD43⁺) cells are indicated on each dot plot. FIG. 14C: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ illustrating expression of IgM (y-axis) and B220 (x-axis); immature (IgM^{int to +} B220^int) and mature (IgM^{int to +} B220⁺), pre- and pro-B cells (IgM^{− to low}B220^int) are indicated on each dot plot. FIG. 14D: representative contour plot of lymphocytes from bone marrow gated on CD19⁺ IgM^{int to +} B220^int(top row) and CD19⁺ IgM^{int to +} B220⁺ (bottom row) illustrating expression of Igλ (y-axis) or Igκ (x-axis) light chain.

FIG. 15 shows representative usage frequency of toxin coding sequences in an engineered D_Hregion in amplified RNA from spleen and bone marrow (combined all V_H-families, not reflective of quantitative V_Husage) of three 6579ho/1293ho mice (“TX-D_Hho”, supra). The y-axis indicates the name of each toxin coding sequence within the engineered D_Hregion. The x-axis indicates the frequency (percentage of sequences) of each toxin coding sequence among analyzed sequence reads.
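For illustration, a usage frequency expressed as a percentage of analyzed sequence reads can be computed as sketched below. This is a hedged example and not the analysis pipeline used to generate the Figure; it assumes each read has already been assigned the name of one coding sequence, and the function name and segment labels ("seg_a", etc.) are hypothetical placeholders rather than data from the disclosed mice.

```python
from collections import Counter


def usage_frequency(assignments):
    """Return {name: percent of reads} for an iterable of per-read
    coding-sequence assignments. An empty input yields an empty dict."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Placeholder assignments, one label per analyzed read.
    reads = ["seg_a", "seg_a", "seg_b", "seg_c"]
    for name, percent in sorted(usage_frequency(reads).items()):
        print(f"{name}: {percent:.1f}%")
```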

FIG. 16 shows representative percent usage of human V_Hgene segments in amplified RNA from spleen and bone marrow (combined all V_H-families, not reflective of quantitative V_Husage) of three 6579ho/1293ho mice (“TX-D_Hho”, supra). The x-axis indicates the name of each human V_Hgene segment within the humanized heavy chain variable region.

FIG. 17 shows representative percent usage of human J_Hgene segments in amplified RNA from spleen and bone marrow (combined all V_H-families, not reflective of quantitative J_Husage) of three 6579ho/1293ho mice (“TX-D_Hho”, supra). The x-axis indicates the name of each human J_Hgene segment within the humanized heavy chain variable region.

FIG. 18 shows representative usage frequency of selected D6 coding sequences in an engineered D_Hregion in amplified RNA from spleen and bone marrow (combined all V_H-families, not reflective of quantitative V_Husage) of three 6590het mice (“D6-D_Hhet”, supra). The y-axis indicates the name of selected D6 coding sequences within the engineered D_Hregion. The x-axis indicates the frequency (percentage of sequences) of D6 coding sequences among analyzed sequence reads. BM: bone marrow.

FIG. 19 shows representative percent usage of human V_Hgene segments in amplified RNA from spleen and bone marrow (combined all V_H-families, not reflective of quantitative V_Husage) of three 6590het mice (“D6-D_Hhet”, supra). The x-axis indicates the name of each human V_Hgene segment within the humanized heavy chain variable region. BM: bone marrow.

FIG. 20 shows representative percent usage of human J_Hgene segments in amplified RNA from spleen and bone marrow (combined all V_H-families, not reflective of quantitative J_Husage) of three 6590het mice (“D6-D_Hhet”, supra). The x-axis indicates the name of each human J_Hgene segment within the humanized heavy chain variable region. BM: bone marrow.

FIG. 21 shows the titer above background (y-axis) from control and 6579HO/1634 animals (x-axis) after immunization with an engineered soluble form of a cell surface protein.

DEFINITIONS

Those skilled in the art, reading the present disclosure, will be aware of various modifications that may be equivalent to such described embodiments, or otherwise within the scope of the instant disclosure. In general, terminology used herein is in accordance with its understood meaning in the art, unless clearly indicated otherwise. Explicit definitions of certain terms are provided herein and below; meanings of these and other terms in particular instances throughout this specification will be clear to those skilled in the art from context. Additional definitions for the following terms and other terms are set forth throughout the specification. References cited within this specification, or relevant portions thereof, are incorporated herein by reference.

Administration: refers to the administration of a composition to a subject or system (e.g., to a cell, organ, tissue, organism, or relevant component or set of components thereof). Those of ordinary skill will appreciate that route of administration may vary depending, for example, on the subject or system to which the composition is being administered, the nature of the composition, the purpose of the administration, etc. For example, in certain embodiments, administration to an animal subject (e.g., to a human or a rodent) may be bronchial (including by bronchial instillation), buccal, enteral, interdermal, intra-arterial, intradermal, intragastric, intramedullary, intramuscular, intranasal, intraperitoneal, intrathecal, intravenous, intraventricular, mucosal, nasal, oral, rectal, subcutaneous, sublingual, topical, tracheal (including by intratracheal instillation), transdermal, vaginal and/or vitreal. In some embodiments, administration may involve intermittent dosing. In some embodiments, administration may involve continuous dosing (e.g., perfusion) for at least a selected period of time.

The term “antibody” includes typical immunoglobulin molecules comprising four polypeptide chains, two heavy (H) chains (each of which may comprise an amino acid sequence encoded by an engineered D_H cluster) and two light (L) chains (each of which may be a common light chain) inter-connected by disulfide bonds. The term also includes an immunoglobulin that is reactive to an antigen or fragment thereof. Suitable antibodies include, but are not limited to, human antibodies, primatized antibodies, chimeric antibodies, monoclonal antibodies, monospecific antibodies, polyclonal antibodies, polyspecific antibodies, nonspecific antibodies, bispecific antibodies, multispecific antibodies, humanized antibodies, synthetic antibodies, recombinant antibodies, hybrid antibodies, mutated antibodies, grafted antibodies, conjugated antibodies (i.e., antibodies conjugated or fused to other proteins, radiolabels, cytotoxins), and in vitro-generated antibodies. A skilled artisan will readily recognize common antibody isotypes, e.g., antibodies having a heavy chain constant region selected from the group consisting of IgG, IgA, IgM, IgD, and IgE, and any subclass thereof (e.g., IgG1, IgG2, IgG3, and IgG4).

Approximately: As applied to one or more values of interest, refers to a value that is similar to a stated reference value. In certain embodiments, the term “approximately” or “about” refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).
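By way of non-limiting illustration only, the percentage window described above can be sketched in Python. The function name `is_approximately` and the default tolerance of 10% are hypothetical choices made for this sketch and are not part of the disclosure; the exception for values that would exceed 100% of a possible value is not modeled.

```python
def is_approximately(value, reference, tolerance_pct=10.0):
    """Return True if value is within tolerance_pct percent of reference,
    in either direction (greater than or less than).

    tolerance_pct is an illustrative parameter; the definition above lists
    25%, 20%, ..., 1%, or less as possible windows.
    """
    if reference == 0:
        # A percentage window around zero is degenerate; require equality.
        return value == 0
    # Compare without dividing to avoid needless floating-point rounding.
    return abs(value - reference) * 100.0 <= tolerance_pct * abs(reference)


if __name__ == "__main__":
    print(is_approximately(108, 100, tolerance_pct=10))  # inside the window
    print(is_approximately(126, 100, tolerance_pct=25))  # outside the window
```

A value 8% above the reference falls inside a 10% window, whereas a value 26% above the reference falls outside a 25% window.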

Biologically active: refers to a characteristic of any agent that has activity in a biological system, in vitro or in vivo (e.g., in an organism). For instance, an agent that, when present in an organism, has a biological effect within that organism is considered to be biologically active. In particular embodiments, where a protein or polypeptide is biologically active, a portion of that protein or polypeptide that shares at least one biological activity of the protein or polypeptide is typically referred to as a “biologically active” portion.

Comparable: refers to two or more agents, entities, situations, sets of conditions, etc. that may not be identical to one another but that are sufficiently similar to permit comparison there between so that conclusions may reasonably be drawn based on differences or similarities observed. Those of ordinary skill in the art will understand, in context, what degree of identity is required in any given circumstance for two or more such agents, entities, situations, sets of conditions, etc. to be considered comparable.

The phrase “complementarity determining region,” or the term “CDR,” includes an amino acid sequence encoded by a nucleic acid sequence of an organism's immunoglobulin genes that normally (i.e., in a wild-type animal) appears between two framework regions in a variable region of a light or a heavy chain of an immunoglobulin molecule (e.g., an antibody or a T cell receptor). A CDR can be encoded by, for example, a germline sequence or a rearranged or unrearranged sequence, and, for example, by a naive or a mature B cell or a T cell. A CDR can be somatically mutated (e.g., vary from a sequence encoded in an animal's germline), humanized, and/or modified with amino acid substitutions, additions, or deletions. In some circumstances (e.g., for a CDR3), CDRs can be encoded by two or more sequences (e.g., germline sequences) that are not contiguous (e.g., in an unrearranged nucleic acid sequence) but are contiguous in a B cell nucleic acid sequence, e.g., as the result of splicing or connecting the sequences (e.g., V-D-J recombination to form a heavy chain CDR3).

Conservative: in reference to a conservative amino acid substitution, refers to substitution of an amino acid residue by another amino acid residue having a side chain R group with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of interest of a protein, for example, the ability of a receptor to bind to a ligand. Examples of groups of amino acids that have side chains with similar chemical properties include: aliphatic side chains such as glycine, alanine, valine, leucine, and isoleucine; aliphatic-hydroxyl side chains such as serine and threonine; amide-containing side chains such as asparagine and glutamine; aromatic side chains such as phenylalanine, tyrosine, and tryptophan; basic side chains such as lysine, arginine, and histidine; acidic side chains such as aspartic acid and glutamic acid; and sulfur-containing side chains such as cysteine and methionine. Conservative amino acid substitution groups include, for example, valine/leucine/isoleucine, phenylalanine/tyrosine, lysine/arginine, alanine/valine, glutamate/aspartate, and asparagine/glutamine. In some embodiments, a conservative amino acid substitution can be a substitution of any native residue in a protein with alanine, as used in, for example, alanine scanning mutagenesis. In some embodiments, a conservative substitution is made that has a positive value in the PAM250 log-likelihood matrix disclosed in Gonnet, G. H. et al., 1992, Science 256:1443-1445, hereby incorporated by reference. In some embodiments, a substitution is a moderately conservative substitution wherein the substitution has a nonnegative value in the PAM250 log-likelihood matrix.
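As a non-limiting sketch, the exemplary conservative substitution groups recited above can be expressed as a simple lookup. The one-letter residue codes are the standard ones; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical and chosen for this illustration. The sketch covers only the listed groups and does not model the alanine-scanning or PAM250-based embodiments.

```python
# Exemplary groups from the definition above, in one-letter code:
# valine/leucine/isoleucine, phenylalanine/tyrosine, lysine/arginine,
# alanine/valine, glutamate/aspartate, asparagine/glutamine.
CONSERVATIVE_GROUPS = [
    {"V", "L", "I"},
    {"F", "Y"},
    {"K", "R"},
    {"A", "V"},
    {"E", "D"},
    {"N", "Q"},
]


def is_conservative(original, replacement):
    """Return True if both residues share one of the exemplary groups.

    Identical residues are not a substitution and return False.
    """
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("L", "I"))  # same group (V/L/I)
    print(is_conservative("A", "L"))  # no listed group contains both
```

Note that the groups overlap (valine appears in two of them), so membership is tested group by group rather than by assigning each residue a single class.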

Control: refers to the art-understood meaning of a “control” being a standard against which results are compared. Typically, controls are used to augment integrity in experiments by isolating variables in order to make a conclusion about such variables. In some embodiments, a control is a reaction or assay that is performed simultaneously with a test reaction or assay to provide a comparator. A “control” may refer to a “control animal.” A “control animal” may have a modification as described herein, a modification that is different as described herein, or no modification (i.e., a wild-type animal). In one experiment, a “test” (i.e., a variable being tested) is applied. In a second experiment, the “control,” the variable being tested is not applied. In some embodiments, a control is a historical control (i.e., of a test or assay performed previously, or an amount or result that is previously known). In some embodiments, a control is or comprises a printed or otherwise saved record. A control may be a positive control or a negative control.

Disruption: refers to the result of a homologous recombination event with a DNA molecule (e.g., with an endogenous homologous sequence such as a gene or gene locus). In some embodiments, a disruption may achieve or represent an insertion, deletion, substitution, replacement, missense mutation, or a frame-shift of a DNA sequence(s), or any combination thereof. Insertions may include the insertion of entire genes, fragments of genes, e.g., exons, which may be of an origin other than the endogenous sequence (e.g., a heterologous sequence), or coding sequences derived or isolated from a particular gene of interest. In some embodiments, a disruption may increase expression and/or activity of a gene or gene product (e.g., of a protein encoded by a gene). In some embodiments, a disruption may decrease expression and/or activity of a gene or gene product. In some embodiments, a disruption may alter sequence of a gene or an encoded gene product (e.g., an encoded protein). In some embodiments, a disruption may truncate or fragment a gene or an encoded gene product (e.g., an encoded protein). In some embodiments, a disruption may extend a gene or an encoded gene product. In some such embodiments, a disruption may achieve assembly of a fusion protein. In some embodiments, a disruption may affect level, but not activity, of a gene or gene product. In some embodiments, a disruption may affect activity, but not level, of a gene or gene product. In some embodiments, a disruption may have no significant effect on level of a gene or gene product. In some embodiments, a disruption may have no significant effect on activity of a gene or gene product. In some embodiments, a disruption may have no significant effect on either level or activity of a gene or gene product.

Determining, measuring, evaluating, assessing, assaying and analyzing: are used interchangeably to refer to any form of measurement, and include determining if an element is present or not. These terms include quantitative and/or qualitative determinations. Assaying may be relative or absolute. “Assaying for the presence of” can be determining the amount of something present and/or determining whether or not it is present.

Endogenous locus or endogenous gene: refers to a genetic locus found in a parent or reference organism prior to introduction of an alteration, disruption, deletion, insertion, modification, substitution or replacement as described herein. In some embodiments, the endogenous locus comprises a sequence, in whole or in part, found in nature. In some embodiments, the endogenous locus is a wild-type locus. In some embodiments, a reference organism is a wild-type organism. In some embodiments, a reference organism is an engineered organism. In some embodiments, a reference organism is a laboratory-bred organism (whether wild-type or engineered).

Endogenous promoter: refers to a promoter that is naturally associated, e.g., in a wild-type organism, with an endogenous gene.

Engineered: refers, in general, to the aspect of having been manipulated by the hand of man. For example, in some embodiments, a polynucleotide may be considered to be “engineered” when two or more sequences that are not linked together in that order in nature are manipulated by the hand of man to be directly linked to one another in the engineered polynucleotide. In some particular such embodiments, an engineered polynucleotide may comprise a regulatory sequence that is found in nature in operative association with a first coding sequence but not in operative association with a second coding sequence, and that is linked by the hand of man so that it is operatively associated with the second coding sequence. Alternatively, or additionally, in some embodiments, first and second nucleic acid sequences that each encode polypeptide elements or domains that in nature are not linked to one another may be linked to one another in a single engineered polynucleotide. Comparably, in some embodiments, a cell or organism may be considered to be “engineered” if it has been manipulated so that its genetic information is altered (e.g., new genetic material not previously present has been introduced, or previously present genetic material has been altered or removed). As is common practice and is understood by those in the art, progeny of an engineered polynucleotide or cell are typically still referred to as “engineered” even though the actual manipulation was performed on a prior entity. Furthermore, as will be appreciated by those skilled in the art, a variety of methodologies are available through which “engineering” as described herein may be achieved. For example, in some embodiments, “engineering” may involve selection or design (e.g., of nucleic acid sequences, polypeptide sequences, cells, tissues, and/or organisms) through use of computer systems programmed to perform analysis or comparison, or otherwise to analyze, recommend, and/or select sequences, alterations, etc. 
Alternatively, or additionally, in some embodiments, “engineering” may involve use of in vitro chemical synthesis methodologies and/or recombinant nucleic acid technologies such as, for example, nucleic acid amplification (e.g., via the polymerase chain reaction), hybridization, mutation, transformation, transfection, etc., and/or any of a variety of controlled mating methodologies. As will be appreciated by those skilled in the art, a variety of such established techniques (e.g., for recombinant DNA, oligonucleotide synthesis, and tissue culture and transformation (e.g., electroporation, lipofection, etc.)) are well known in the art and described in various general and more specific references that are cited and/or discussed throughout the present specification. See e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989).

Gene: refers to a DNA sequence in a chromosome that codes for a product (e.g., an RNA product and/or a polypeptide product). In some embodiments, a gene includes coding sequence (i.e., sequence that encodes a particular product). In some embodiments, a gene includes non-coding sequence. In some particular embodiments, a gene may include both coding (e.g., exonic) and non-coding (e.g., intronic) sequence. In some embodiments, a gene may include one or more regulatory sequences (e.g., promoters, enhancers, etc.) and/or intron sequences that, for example, may control or impact one or more aspects of gene expression (e.g., cell-type-specific expression, inducible expression, etc.). For the purpose of clarity we note that, as used in the present application, the term “gene” generally refers to a portion of a nucleic acid that encodes a polypeptide; the term may optionally encompass regulatory sequences, as will be clear from context to those of ordinary skill in the art. This definition is not intended to exclude application of the term “gene” to non-protein-coding expression units but rather to clarify that, in most cases, the term as used in this document refers to a polypeptide-coding nucleic acid.

The phrase “gene segment,” or “segment” includes reference to a V (light or heavy) or D or J (light or heavy) immunoglobulin gene segment, which includes unrearranged sequences at immunoglobulin loci (in e.g., humans and mice) that can participate in a rearrangement (mediated by, e.g., endogenous recombinases) to form a rearranged V/J (light) or V/D/J (heavy) sequence. Unless indicated otherwise, the V, D, and J segments comprise recombination signal sequences (RSS) that allow for V/J recombination or V/D/J recombination according to the 12/23 rule. Unless indicated otherwise, the segments further comprise sequences with which they are associated in nature or functional equivalents thereof (e.g., for V segments, promoter(s) and leader(s)).
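The 12/23 rule referenced above can be illustrated with a minimal Python sketch: two gene segments are permitted to join only when one is flanked by a recombination signal sequence (RSS) having a 12-bp spacer and the other by an RSS having a 23-bp spacer. The function name `can_join` and the spacer assignments in the example (23-bp spacers for V_H and J_H segments, 12-bp spacers on both sides of a D_H segment) are assumptions reflecting the arrangement typically described for heavy chain loci; they are not taken from the present disclosure.

```python
def can_join(spacer_a_bp, spacer_b_bp):
    """Return True if two RSS spacer lengths satisfy the 12/23 rule,
    i.e., exactly one spacer is 12 bp and the other is 23 bp."""
    return {spacer_a_bp, spacer_b_bp} == {12, 23}


# Assumed (typical) heavy chain arrangement, for illustration only.
HEAVY_CHAIN_RSS = {
    "VH_3prime": 23,
    "DH_5prime": 12,
    "DH_3prime": 12,
    "JH_5prime": 23,
}

if __name__ == "__main__":
    rss = HEAVY_CHAIN_RSS
    print(can_join(rss["DH_3prime"], rss["JH_5prime"]))  # D-to-J joining
    print(can_join(rss["VH_3prime"], rss["DH_5prime"]))  # V-to-DJ joining
    print(can_join(rss["VH_3prime"], rss["JH_5prime"]))  # direct V-to-J joining
```

Under these assumed spacer assignments, D-to-J and V-to-D joining satisfy the rule, while direct V-to-J joining (23 with 23) does not.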

The term “germline” in reference to an immunoglobulin nucleic acid sequence includes a nucleic acid sequence that can be passed to progeny, e.g., the germline genome that may be found in a germ cell.

The phrase “heavy chain,” or “immunoglobulin heavy chain” includes an immunoglobulin heavy chain sequence, including immunoglobulin heavy chain constant region sequence, from any organism. Heavy chain variable domains include three heavy chain complementarity determining regions (CDRs) and four FR regions, unless otherwise specified. Fragments of heavy chains include CDRs, CDRs and FRs, and combinations thereof. A typical heavy chain has, following the variable domain (from N-terminal to C-terminal), a CH1 domain, a hinge, a CH2 domain, a CH3 domain, and a CH4 domain (in the context of IgM or IgE). A functional fragment of a heavy chain includes a fragment that is capable of specifically recognizing an epitope (e.g., recognizing the epitope with a KD in the micromolar, nanomolar, or picomolar range), that is capable of expressing and secreting from a cell, and that comprises at least one CDR. A heavy chain variable domain is encoded by a variable region gene sequence, which generally comprises V_H, D_H, and J_H segments derived from a repertoire of V_H, D_H, and J_H segments present in the germline. Sequences, locations and nomenclature for V, D, and J heavy chain segments for various organisms can be viewed at the website of the International Immunogenetics Information System (IMGT) found at www.imgt.org.

The phrase “light chain” includes an immunoglobulin light chain sequence from any organism, and unless otherwise specified includes human kappa and lambda light chains and a VpreB, as well as surrogate light chains. Light chain variable domains typically include three light chain complementarity determining regions (CDRs) and four framework (FR) regions, unless otherwise specified. Generally, a full-length light chain includes, from amino terminus to carboxyl terminus, a variable domain that includes FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4, and a light chain constant region. A light chain variable domain is encoded by a light chain variable region gene sequence, which generally comprises V_L and J_L gene segments, derived from a repertoire of V_L and J_L gene segments present in the germline. Sequences, locations and nomenclature for V and J light chain segments for various organisms can be viewed at the website of the International Immunogenetics Information System (IMGT) found at www.imgt.org. Light chains include those, e.g., that do not selectively bind either a first or a second epitope selectively bound by the epitope-binding protein in which they appear. Light chains also include those that bind and recognize, or assist the heavy chain with binding and recognizing, one or more epitopes selectively bound by the epitope-binding protein in which they appear. The phrase light chain includes a “common light chain,” also referred to as a “universal light chain” (ULC).

Common or universal light chains (ULCs) include those derived from an immunoglobulin light chain locus comprising a single rearranged immunoglobulin light chain variable region encoding sequence operably linked with a light chain constant region, wherein expression of the immunoglobulin light chain locus produces only a light chain derived from the single rearranged immunoglobulin light chain variable region operably linked to the light chain constant region regardless of the inclusion of other nucleic acid sequences, e.g., other light chain gene segments, in the immunoglobulin light chain locus. Universal light chains include those encoded by a human Vκ1-39Jκ gene (e.g., Vκ1-39Jκ5 gene) or a human Vκ3-20Jκ gene (e.g., Vκ3-20Jκ1 gene), and include somatically mutated (e.g., affinity matured) versions of the same.

Heterologous: refers to an agent or entity from a different source. For example, when used in reference to a polypeptide, gene, or gene product present in a particular cell or organism, the term clarifies that the relevant polypeptide or fragment thereof, gene or fragment thereof, or gene product or fragment thereof: 1) was engineered by the hand of man; 2) was introduced into the cell or organism (or a precursor thereof) through the hand of man (e.g., via genetic engineering); and/or 3) is not naturally produced by or present in the relevant cell or organism (e.g., the relevant cell type or organism type). As used herein, the term “heterologous” also includes a polypeptide or fragment thereof, gene or fragment thereof, or gene product or fragment thereof that is normally present in a particular native cell or organism, but has been modified, for example, by mutation or placement under the control of non-naturally associated and, in some embodiments, non-endogenous regulatory elements (e.g., a promoter).

Host cell: refers to a cell into which a heterologous (e.g., exogenous) nucleic acid or protein has been introduced. Persons of skill upon reading this disclosure will understand that the term refers not only to the particular subject cell, but also to the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term “host cell” as used herein. In some embodiments, a host cell is or comprises a prokaryotic or eukaryotic cell. In general, a host cell is any cell that is suitable for receiving and/or producing a heterologous nucleic acid or protein, regardless of the Kingdom of life to which the cell is designated. Exemplary cells include those of prokaryotes and eukaryotes (single-cell or multiple-cell), bacterial cells (e.g., strains of Escherichia coli, Bacillus spp., Streptomyces spp., etc.), mycobacteria cells, fungal cells, yeast cells (e.g., Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, Pichia methanolica, etc.), plant cells, insect cells (e.g., SF-9, SF-21, baculovirus-infected insect cells, Trichoplusia ni, etc.), non-human animal cells, human cells, or cell fusions such as, for example, hybridomas or quadromas. In some embodiments, the cell is a human, monkey, ape, hamster, rat, or mouse cell. In some embodiments, the cell is eukaryotic and is selected from the following cells: CHO (e.g., CHO K1, DXB-11 CHO, Veggie-CHO), COS (e.g., COS-7), retinal cell, Vero, CV1, kidney (e.g., HEK293, 293 EBNA, MSR 293, MDCK, HaK, BHK), HeLa, HepG2, WI38, MRC 5, Colo205, HB 8065, HL-60, (e.g., BHK21), Jurkat, Daudi, A431 (epidermal), CV-1, U937, 3T3, L cell, C127 cell, SP2/0, NS-0, MMT 060562, Sertoli cell, BRL 3A cell, HT1080 cell, myeloma cell, tumor cell, and a cell line derived from an aforementioned cell. 
In some embodiments, the cell comprises one or more viral genes, e.g., a retinal cell that expresses a viral gene (e.g., a PER.C6® cell). In some embodiments, a host cell is or comprises an isolated cell. In some embodiments, a host cell is part of a tissue. In some embodiments, a host cell is part of an organism.

Identity: used in connection with a comparison of sequences, refers to identity as determined by a number of different algorithms known in the art that can be used to measure nucleotide and/or amino acid sequence identity. In some embodiments, identities as described herein are determined using a ClustalW v. 1.83 (slow) alignment employing an open gap penalty of 10.0, an extend gap penalty of 0.1, and using a Gonnet similarity matrix (MACVECTOR™ 10.0.2, MacVector Inc., 2008).
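For illustration only, once two sequences have been aligned (e.g., by an alignment program such as the one named above), percent identity can be computed by counting matching columns. The sketch below is a simplified Python example that operates on pre-aligned strings; it does not reproduce the ClustalW alignment, its gap penalties, or the Gonnet matrix. The function name `percent_identity` and the choice of denominator (all columns except those in which both sequences have a gap) are assumptions of this sketch; other conventions exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # uninformative column
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Three of four columns match in each toy alignment below.
    print(percent_identity("ACGT", "ACGA"))
    print(percent_identity("AC-T", "ACGT"))
```

The same function can be applied to aligned amino acid sequences, since it compares characters column by column without regard to alphabet.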

In vitro: refers to events that occur in an artificial environment, e.g., in a test tube or reaction vessel, in cell culture, etc., rather than within a multi-cellular organism.

In vivo: refers to events that occur within a multi-cellular organism, such as a human or a non-human animal. In the context of cell-based systems, the term may be used to refer to events that occur within a living cell (as opposed to, for example, in vitro systems).

Isolated: refers to a substance and/or entity that has been (1) separated from at least some of the components with which it was associated when initially produced (whether in nature and/or in an experimental setting), and/or (2) designed, produced, prepared, and/or manufactured by the hand of man. Isolated substances and/or entities may be separated from about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% of the other components with which they were initially associated. In some embodiments, isolated agents are about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% pure. As used herein, a substance is “pure” if it is substantially free of other components. In some embodiments, as will be understood by those skilled in the art, a substance may still be considered “isolated” or even “pure”, after having been combined with certain other components such as, for example, one or more carriers or excipients (e.g., buffer, solvent, water, etc.); in such embodiments, percent isolation or purity of the substance is calculated without including such carriers or excipients. To give but one example, in some embodiments, a biological polymer such as a polypeptide or polynucleotide that occurs in nature is considered to be “isolated” when: a) by virtue of its origin or source of derivation is not associated with some or all of the components that accompany it in its native state in nature; b) it is substantially free of other polypeptides or nucleic acids of the same species from the species that produces it in nature; or c) is expressed by or is otherwise in association with components from a cell or other expression system that is not of the species that produces it in nature. 
Thus, for instance, in some embodiments, a polypeptide that is chemically synthesized or is synthesized in a cellular system different from that which produces it in nature is considered to be an “isolated” polypeptide. Alternatively or additionally, in some embodiments, a polypeptide that has been subjected to one or more purification techniques may be considered to be an “isolated” polypeptide to the extent that it has been separated from other components: a) with which it is associated in nature; and/or b) with which it was associated when initially produced.
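As a non-limiting arithmetic illustration of the statement above that percent purity is calculated without including carriers or excipients, the following Python sketch computes purity from a table of component masses. The function name `percent_purity`, the component names, and the masses are hypothetical and serve only to show the calculation.

```python
def percent_purity(masses, substance, excluded=()):
    """Percent of `substance` among all components not listed in `excluded`.

    masses:   dict mapping component name -> mass (any consistent unit)
    excluded: carriers or excipients (e.g., buffer, solvent, water) that are
              left out of the calculation
    """
    counted = {name: m for name, m in masses.items() if name not in excluded}
    total = sum(counted.values())
    if substance not in counted or total <= 0:
        raise ValueError("substance must be among the counted components")
    return 100.0 * counted[substance] / total


if __name__ == "__main__":
    # Hypothetical preparation: values are illustrative, not measured data.
    prep = {"antibody": 9.5, "host_protein": 0.5, "buffer": 90.0}
    print(percent_purity(prep, "antibody", excluded={"buffer"}))
    print(percent_purity(prep, "antibody"))
```

In this hypothetical preparation the antibody accounts for 9.5 of the 10.0 mass units that remain once the buffer is excluded, whereas including the buffer in the denominator would understate its purity.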

Non-human animal: refers to any vertebrate organism that is not a human. In some embodiments, a non-human animal is a cyclostome, a bony fish, a cartilaginous fish (e.g., a shark or a ray), an amphibian, a reptile, a mammal, or a bird. In some embodiments, a non-human mammal is a primate, a goat, a sheep, a pig, a dog, a cow, or a rodent. In some embodiments, a non-human animal is a rodent such as a rat or a mouse.

Nucleic acid: in its broadest sense, refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a “nucleic acid” is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. As will be clear from context, in some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides); in some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising individual nucleic acid residues. In some embodiments, a “nucleic acid” is or comprises RNA; in some embodiments, a “nucleic acid” is or comprises DNA. In some embodiments, a “nucleic acid” is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a “nucleic acid” is, comprises, or consists of one or more nucleic acid analogs. In some embodiments, a nucleic acid analog differs from a “nucleic acid” in that it does not utilize a phosphodiester backbone. For example, in some embodiments, a “nucleic acid” is, comprises, or consists of one or more “peptide nucleic acids”, which are known in the art and have peptide bonds instead of phosphodiester bonds in the backbone. Alternatively, or additionally, in some embodiments, a “nucleic acid” has one or more phosphorothioate and/or 5′-N-phosphoramidite linkages rather than phosphodiester bonds. In some embodiments, a “nucleic acid” is, comprises, or consists of one or more natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine). 
In some embodiments, a “nucleic acid” is, comprises, or consists of one or more nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 2-thiocytidine, methylated bases, intercalated bases, and combinations thereof). In some embodiments, a “nucleic acid” comprises one or more modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose) as compared with those in natural nucleic acids. In some embodiments, a “nucleic acid” has a nucleotide sequence that encodes a functional gene product such as an RNA or protein. In some embodiments, a “nucleic acid” has a nucleotide sequence that encodes a polypeptide fragment (e.g., a peptide). In some embodiments, a “nucleic acid” includes one or more introns. In some embodiments, a “nucleic acid” includes one or more exons. In some embodiments, a “nucleic acid” includes one or more coding sequences. In some embodiments, a “nucleic acid” is prepared by one or more of isolation from a natural source, enzymatic synthesis by polymerization based on a complementary template (in vivo or in vitro), reproduction in a recombinant cell or system, and chemical synthesis. In some embodiments, a “nucleic acid” is at least 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000 or more residues long. In some embodiments, a “nucleic acid” is single stranded; in some embodiments, a “nucleic acid” is double stranded. 
In some embodiments, a “nucleic acid” has a nucleotide sequence comprising at least one element that encodes, or is the complement of a sequence that encodes, a polypeptide or fragment thereof. In some embodiments, a “nucleic acid” has enzymatic activity.

Operably linked: refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. A control sequence “operably linked” to a coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequences. “Operably linked” sequences include both expression control sequences that are contiguous with the gene of interest and expression control sequences that act in trans or at a distance to control the gene of interest. The term “expression control sequence”, as used herein, refers to polynucleotide sequences, which are necessary to affect the expression and processing of coding sequences to which they are ligated. “Expression control sequences” include: appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (i.e., Kozak consensus sequence); sequences that enhance protein stability; and when desired, sequences that enhance protein secretion. The nature of such control sequences differs depending upon the host organism. For example, in prokaryotes, such control sequences generally include promoter, ribosomal binding site and transcription termination sequence, while in eukaryotes typically, such control sequences include promoters and transcription termination sequence. The term “control sequences” is intended to include components whose presence is essential for expression and processing, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.

Physiological conditions: includes its art-understood meaning referencing conditions under which cells or organisms live and/or reproduce. In some embodiments, the term refers to conditions of the external or internal milieu that may occur in nature for an organism or cell system. In some embodiments, physiological conditions are those conditions present within the body of a human or non-human animal, especially those conditions present at and/or within a surgical site. Physiological conditions typically include, e.g., a temperature range of 20−40° C., atmospheric pressure of 1 atm, pH of 6-8, glucose concentration of 1-20 mM, oxygen concentration at atmospheric levels, and gravity as it is encountered on earth. In some embodiments, conditions in a laboratory are manipulated and/or maintained at physiological conditions. In some embodiments, physiological conditions are encountered in an organism (e.g., non-human animal).

Polypeptide: refers to any polymeric chain of amino acids. In some embodiments, a polypeptide has an amino acid sequence that occurs in nature. In some embodiments, a polypeptide has an amino acid sequence that does not occur in nature. In some embodiments, a polypeptide has an amino acid sequence that contains portions that occur in nature separately from one another (i.e., from two or more different organisms, for example, human and non-human portions). In some embodiments, a polypeptide has an amino acid sequence that is engineered in that it is designed and/or produced through action of the hand of man. In some embodiments, a polypeptide may comprise or consist of a plurality of fragments, each of which is found in the same parent polypeptide in a different spatial arrangement relative to one another than is found in the polypeptide of interest (e.g., fragments that are directly linked in the parent may be spatially separated in the polypeptide of interest or vice versa, and/or fragments may be present in a different order in the polypeptide of interest than in the parent), so that the polypeptide of interest is a derivative of its parent polypeptide.

Recombinant: refers to polypeptides that are designed, engineered, prepared, expressed, created or isolated by recombinant means, such as polypeptides expressed using a recombinant expression vector transfected into a host cell, polypeptides isolated from a recombinant, combinatorial human polypeptide library (Hoogenboom H. R., 1997 TIB Tech. 15:62-70; Hoogenboom H., and Chames P., 2000, Immunology Today 21:371-378; Azzazy H., and Highsmith W. E., 2002, Clin. Biochem. 35:425-445; Gavilondo J. V., and Larrick J. W., 2002, BioTechniques 29:128-145), antibodies isolated from an animal (e.g., a mouse) that is transgenic for human immunoglobulin genes (see e.g., Taylor, L. D., et al., 1992, Nucl. Acids Res. 20:6287-6295; Little M. et al., 2000, Immunology Today 21:364-370; Kellermann S. A. and Green L. L., 2002, Current Opinion in Biotechnology 13:593-597; Murphy, A. J., et al., 2014, Proc. Natl. Acad. Sci. U.S.A 111(14):5153-5158) or polypeptides prepared, expressed, created or isolated by any other means that involves splicing selected sequence elements to one another. In some embodiments, one or more of such selected sequence elements is found in nature. In some embodiments, one or more of such selected sequence elements is designed in silico. In some embodiments, one or more such selected sequence elements result from mutagenesis (e.g., in vivo or in vitro) of a known sequence element, e.g., from a natural or synthetic source. For example, in some embodiments, a recombinant polypeptide comprises sequences found in the genome (or polypeptide) of a source organism of interest (e.g., human, mouse, etc.). In some embodiments, a recombinant polypeptide comprises sequences that occur in nature separately from one another in two different organisms (e.g., a human and a non-human organism).
In some embodiments, a recombinant polypeptide has an amino acid sequence that resulted from mutagenesis (e.g., in vitro or in vivo, for example in a non-human animal), so that the amino acid sequences of the recombinant polypeptides are sequences that, while originating from and related to polypeptide sequences, may not naturally exist within the genome of a non-human animal in vivo.

Reference: is intended to describe a standard or control agent, animal, cohort, individual, population, sample, sequence or value against which an agent, animal, cohort, individual, population, sample, sequence or value of interest is compared. In some embodiments, a reference agent, animal, cohort, individual, population, sample, sequence or value is tested and/or determined substantially simultaneously with the testing or determination of the agent, animal, cohort, individual, population, sample, sequence or value of interest. In some embodiments, a reference agent, animal, cohort, individual, population, sample, sequence or value is a historical reference, optionally embodied in a tangible medium. In some embodiments, a reference may refer to a control. As used herein, a “reference” may refer to a “reference animal”. A “reference animal” may have a modification as described herein, a modification that is different as described herein or no modification (i.e., a wild-type animal). Typically, as would be understood by those skilled in the art, a reference agent, animal, cohort, individual, population, sample, sequence or value is determined or characterized under conditions comparable to those utilized to determine or characterize the agent, animal (e.g., a mammal), cohort, individual, population, sample, sequence or value of interest.

Immunoglobulins participate in a cellular mechanism, termed somatic hypermutation, which produces affinity-matured antibody variants characterized by high affinity to their target. Although somatic hypermutation largely occurs within the CDRs of antibody variable regions, mutations are preferentially targeted to certain sequence motifs that are referred to as hot spots, e.g., RGYW activation-induced cytidine deaminase (AID) hotspots (see, e.g., Li, Z. et al., 2004, Genes Dev. 18:1-11; Teng, G. and F. N. Papavasiliou, 2007, Annu. Rev. Genet. 41:107-20; hereby incorporated by reference). The non-immunoglobulin peptides of interest, or portion thereof, disclosed herein useful for the generation of an engineered D_H region may comprise one or more natural and/or artificial hotspots. The phrase “somatically mutated” includes reference to a nucleic acid sequence from a B cell that has undergone class-switching, wherein the nucleic acid sequence of an immunoglobulin variable region (e.g., nucleotide sequence encoding a heavy chain variable domain or including a heavy chain CDR or FR sequence) in the class-switched B cell is not identical to the nucleic acid sequence in the B cell prior to class-switching, such as, for example, a difference in a CDR or framework nucleic acid sequence between a B cell that has not undergone class-switching and a B cell that has undergone class-switching. “Somatically mutated” includes reference to nucleic acid sequences from affinity-matured B cells that are not identical to corresponding immunoglobulin variable region sequences in B cells that are not affinity-matured (i.e., sequences in the genome of germline cells).
The phrase “somatically mutated” also includes reference to an immunoglobulin variable region nucleic acid sequence from a B cell after exposure of the B cell to an epitope of interest, wherein the nucleic acid sequence differs from the corresponding nucleic acid sequence prior to exposure of the B cell to the epitope of interest. The phrase “somatically mutated” refers to sequences from binding proteins that have been generated in an animal, e.g., a mouse having human immunoglobulin variable region nucleic acid sequences, in response to an immunogen challenge, and that result from the selection processes inherently operative in such an animal.
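The RGYW hotspot motif mentioned above can be located in a nucleotide sequence with a simple pattern scan. The following is a minimal sketch only, using the standard IUPAC codes (R = A or G, Y = C or T, W = A or T); the function name find_hotspots and the choice to report the reverse-complement motif WRCY as a minus-strand hit are illustrative assumptions and are not part of the disclosure.

```python
import re

# IUPAC ambiguity codes: R = A/G, Y = C/T, W = A/T.
# An RGYW motif on one strand reads as WRCY on the complementary strand.
# Lookahead patterns are used so that overlapping motifs are all reported.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")


def find_hotspots(seq):
    """Return sorted (position, motif, strand) tuples; positions are 0-based.

    Hypothetical helper for illustration: "+" marks RGYW on the given strand,
    "-" marks WRCY, i.e., RGYW on the complementary strand.
    """
    seq = seq.upper()
    hits = [(m.start(), m.group(1), "+") for m in RGYW.finditer(seq)]
    hits += [(m.start(), m.group(1), "-") for m in WRCY.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    for pos, motif, strand in find_hotspots("ccGGTAcc"):
        print(pos, motif, strand)
```

A scan of this kind could be used, for example, to count candidate hotspots in a nucleotide coding sequence before and after artificial hotspots are introduced.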

Substantially: refers to the qualitative condition of exhibiting total or near-total extent or degree of a characteristic or property of interest. One of ordinary skill in the biological arts will understand that biological and chemical phenomena rarely, if ever, go to completion and/or proceed to completeness or achieve or avoid an absolute result. The term “substantially” is therefore used herein to capture the potential lack of completeness inherent in many biological and chemical phenomena.

Substantial homology: refers to a comparison between amino acid or nucleic acid sequences. As will be appreciated by those of ordinary skill in the art, two sequences are generally considered to be “substantially homologous” if they contain homologous residues in corresponding positions. Homologous residues may be identical residues. Alternatively, homologous residues may be non-identical residues with appropriately similar structural and/or functional characteristics. For example, as is well known by those of ordinary skill in the art, certain amino acids are typically classified as “hydrophobic” or “hydrophilic” amino acids, and/or as having “polar” or “non-polar” side chains. Substitution of one amino acid for another of the same type may often be considered a “homologous” substitution. Typical amino acid categorizations are summarized below.

Amino Acid       3-Letter   1-Letter   Side-chain Polarity   Side-chain Charge   Hydropathy Index
Alanine          Ala        A          Nonpolar              Neutral             1.8
Arginine         Arg        R          Polar                 Positive            −4.5
Asparagine       Asn        N          Polar                 Neutral             −3.5
Aspartic acid    Asp        D          Polar                 Negative            −3.5
Cysteine         Cys        C          Nonpolar              Neutral             2.5
Glutamic acid    Glu        E          Polar                 Negative            −3.5
Glutamine        Gln        Q          Polar                 Neutral             −3.5
Glycine          Gly        G          Nonpolar              Neutral             −0.4
Histidine        His        H          Polar                 Positive            −3.2
Isoleucine       Ile        I          Nonpolar              Neutral             4.5
Leucine          Leu        L          Nonpolar              Neutral             3.8
Lysine           Lys        K          Polar                 Positive            −3.9
Methionine       Met        M          Nonpolar              Neutral             1.9
Phenylalanine    Phe        F          Nonpolar              Neutral             2.8
Proline          Pro        P          Nonpolar              Neutral             −1.6
Serine           Ser        S          Polar                 Neutral             −0.8
Threonine        Thr        T          Polar                 Neutral             −0.7
Tryptophan       Trp        W          Nonpolar              Neutral             −0.9
Tyrosine         Tyr        Y          Polar                 Neutral             −1.3
Valine           Val        V          Nonpolar              Neutral             4.2

Ambiguous Amino Acids                3-Letter   1-Letter
Asparagine or aspartic acid          Asx        B
Glutamine or glutamic acid           Glx        Z
Leucine or Isoleucine                Xle        J
Unspecified or unknown amino acid    Xaa        X

As is well known in this art, amino acid or nucleic acid sequences may be compared using any of a variety of algorithms, including those available in commercial computer programs such as BLASTN for nucleotide sequences and BLASTP, gapped BLAST, and PSI-BLAST for amino acid sequences. Exemplary such programs are described in Altschul, S. F. et al., 1990, J. Mol. Biol., 215(3): 403-410; Altschul, S. F. et al., 1997, Methods in Enzymology; Altschul, S. F. et al., 1997, Nucleic Acids Res., 25:3389-3402; Baxevanis, A. D., and B. F. F. Ouellette (eds.) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley, 1998; and Misener et al. (eds.) Bioinformatics Methods and Protocols (Methods in Molecular Biology, Vol. 132), Humana Press, 1998. In addition to identifying homologous sequences, the programs mentioned above typically provide an indication of the degree of homology. In some embodiments, two sequences are considered to be substantially homologous if at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more of their corresponding residues are homologous over a relevant stretch of residues. In some embodiments, the relevant stretch is a complete sequence. In some embodiments, the relevant stretch is at least 9, 10, 11, 12, 13, 14, 15, 16, 17 or more residues. In some embodiments, the relevant stretch includes contiguous residues along a complete sequence. In some embodiments, the relevant stretch includes discontinuous residues along a complete sequence, for example, noncontiguous residues brought together by the folded conformation of a polypeptide or a portion thereof. In some embodiments, the relevant stretch is at least 10, 15, 20, 25, 30, 35, 40, 45, 50, or more residues.
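The percentage comparison described above can be illustrated with a short calculation. The sketch below assumes two sequences that have already been aligned to equal length without gaps, and it treats two residues as homologous when they are identical or share both the polarity and the charge category listed in the amino acid categorizations above. That grouping rule, the function names, and the default threshold and stretch length are illustrative assumptions; programs such as BLASTP use substitution scoring matrices rather than this simple classification.

```python
# Side-chain categories (polarity, charge) for the twenty standard amino
# acids, keyed by one-letter code, as listed in the categorization table.
GROUPS = {
    "nonpolar/neutral": "ACGILMFPWV",
    "polar/neutral": "NQSTY",
    "polar/positive": "RHK",
    "polar/negative": "DE",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def percent_homologous(a, b):
    """Percent of corresponding residues that are identical or same-category.

    Assumes a and b are pre-aligned, ungapped and of equal length.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    homologous = 0
    for x, y in zip(a.upper(), b.upper()):
        cx = CLASS_OF.get(x)
        if x == y or (cx is not None and cx == CLASS_OF.get(y)):
            homologous += 1
    return 100.0 * homologous / len(a)


def substantially_homologous(a, b, threshold=90.0, min_stretch=10):
    """Apply an illustrative percent threshold over a minimum stretch length."""
    return len(a) >= min_stretch and percent_homologous(a, b) >= threshold


if __name__ == "__main__":
    print(percent_homologous("AVLKDESTNQ", "AILRDESTNQ"))
    print(substantially_homologous("AVLKDESTNQ", "DVLKDESTNQ"))
```

Restricting the comparison to identical residues only (dropping the category test) yields a percent identity in place of a percent homology.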

Substantial identity: refers to a comparison between amino acid or nucleic acid sequences. As will be appreciated by those of ordinary skill in the art, two sequences are generally considered to be “substantially identical” if they contain identical residues in corresponding positions. As is well known in this art, amino acid or nucleic acid sequences may be compared using any of a variety of algorithms, including those available in commercial computer programs such as BLASTN for nucleotide sequences and BLASTP, gapped BLAST, and PSI-BLAST for amino acid sequences. Exemplary such programs are described in Altschul, S. F. et al., 1990, J. Mol. Biol., 215(3): 403-410; Altschul, S. F. et al., 1997, Methods in Enzymology; Altschul, S. F. et al., 1997, Nucleic Acids Res., 25:3389-3402; Baxevanis, A. D., and B. F. F. Ouellette (eds.) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley, 1998; and Misener et al. (eds.) Bioinformatics Methods and Protocols (Methods in Molecular Biology, Vol. 132), Humana Press, 1998. In addition to identifying identical sequences, the programs mentioned above typically provide an indication of the degree of identity. In some embodiments, two sequences are considered to be substantially identical if at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more of their corresponding residues are identical over a relevant stretch of residues. In some embodiments, the relevant stretch is a complete sequence. In some embodiments, the relevant stretch is at least 10, 15, 20, 25, 30, 35, 40, 45, 50, or more residues.

Transformation: refers to any process by which exogenous DNA is introduced into a host cell. Transformation may occur under natural or artificial conditions using various methods well known in the art. Transformation may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. In some embodiments, a particular transformation methodology is selected based on the host cell being transformed and may include, but is not limited to, viral infection, electroporation, mating, and lipofection. In some embodiments, a “transformed” cell is stably transformed in that the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome. In some embodiments, a transformed cell transiently expresses introduced nucleic acid for limited periods of time.

Targeting vector or targeting construct: refers to a polynucleotide molecule that comprises a targeting region. A targeting region comprises a sequence that is identical or substantially identical to a sequence in a target cell, tissue or animal and provides for integration of the targeting construct into a position within the genome of the cell, tissue or animal via homologous recombination. Targeting regions that target using site-specific recombinase recognition sites (e.g., loxP or Frt sites) are also included. In some embodiments, a targeting construct as described herein further comprises a nucleic acid sequence or gene of particular interest, a selectable marker, control and/or regulatory sequences, and other nucleic acid sequences that allow for recombination mediated through exogenous addition of proteins that aid in or facilitate recombination involving such sequences. In some embodiments, a targeting construct further comprises a gene of interest in whole or in part, wherein the gene of interest is a heterologous gene that encodes a polypeptide, in whole or in part, that has a similar function as a protein encoded by an endogenous sequence. In some embodiments, a targeting construct further comprises a humanized gene of interest, in whole or in part, wherein the humanized gene of interest encodes a polypeptide, in whole or in part, that has a similar function as a polypeptide encoded by an endogenous sequence. In some embodiments, a targeting construct (or targeting vector) may comprise a nucleic acid sequence manipulated by the hand of man. For example, in some embodiments, a targeting construct (or targeting vector) may be constructed to contain an engineered or recombinant polynucleotide that contains two or more sequences that are not linked together in that order in nature yet are manipulated by the hand of man to be directly linked to one another in the engineered or recombinant polynucleotide.

Transgene or transgene construct: refers to a nucleic acid sequence (encoding e.g., a polypeptide of interest, in whole or in part) that has been introduced into a cell by the hand of man such as by the methods described herein. A transgene could be partly or entirely heterologous, i.e., foreign, to the transgenic animal or cell into which it is introduced. A transgene can include one or more transcriptional regulatory sequences and any other nucleic acid, such as introns or promoters, which may be necessary for expression of a selected nucleic acid sequence.

Transgenic animal, transgenic non-human animal or Tg⁺: may be used interchangeably and refer to any non-naturally occurring non-human animal in which one or more of the cells of the non-human animal contain heterologous nucleic acid and/or gene encoding a polypeptide of interest, in whole or in part. In some embodiments, a heterologous nucleic acid and/or gene is introduced into the cell, directly or indirectly by introduction into a precursor cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classic breeding techniques, but rather is directed to introduction of recombinant DNA molecule(s). This molecule may be integrated within a chromosome, or it may be extrachromosomally replicating DNA. The term “Tg⁺” includes animals that are heterozygous or homozygous for a heterologous nucleic acid and/or gene, and/or animals that have single or multi-copies of a heterologous nucleic acid and/or gene.

Variant: refers to an entity that shows significant structural identity with a reference entity, but differs structurally from the reference entity in the presence or level of one or more chemical moieties as compared with the reference entity. In many embodiments, a “variant” also differs functionally from its reference entity. In general, whether a particular entity is properly considered to be a “variant” of a reference entity is based on its degree of structural identity with the reference entity. As will be appreciated by those skilled in the art, any biological or chemical reference entity has certain characteristic structural elements. A “variant”, by definition, is a distinct chemical entity that shares one or more such characteristic structural elements. To give but a few examples: a small molecule may have a characteristic core structural element (e.g., a macrocycle core) and/or one or more characteristic pendent moieties so that a variant of the small molecule is one that shares the core structural element and the characteristic pendent moieties but differs in other pendent moieties and/or in types of bonds present (single vs. double, E vs. Z, etc.) within the core; a polypeptide may have a characteristic sequence element comprised of a plurality of amino acids having designated positions relative to one another in linear or three-dimensional space and/or contributing to a particular biological function; and a nucleic acid may have a characteristic sequence element comprised of a plurality of nucleotide residues having designated positions relative to one another in linear or three-dimensional space. For example, a “variant polypeptide” may differ from a reference polypeptide as a result of one or more differences in amino acid sequence and/or one or more differences in chemical moieties (e.g., carbohydrates, lipids, etc.) covalently attached to the polypeptide backbone.
In some embodiments, a “variant polypeptide” shows an overall sequence identity with a reference polypeptide that is at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 99%. Alternatively or additionally, in some embodiments, a “variant polypeptide” does not share at least one characteristic sequence element with a reference polypeptide. In some embodiments, the reference polypeptide has one or more biological activities. In some embodiments, a “variant polypeptide” shares one or more of the biological activities of the reference polypeptide. In some embodiments, a “variant polypeptide” lacks one or more of the biological activities of the reference polypeptide. In some embodiments, a “variant polypeptide” shows a reduced level of one or more biological activities as compared with the reference polypeptide. In many embodiments, a polypeptide of interest is considered to be a “variant” of a parent or reference polypeptide if the polypeptide of interest has an amino acid sequence that is identical to that of the parent but for a small number of sequence alterations at particular positions. Typically, fewer than 20%, 15%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, or 2% of the residues in the variant are substituted as compared with the parent. In some embodiments, a “variant” has 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 substituted residue(s) as compared with a parent. Often, a “variant” has a very small number (e.g., fewer than 5, 4, 3, 2, or 1) of substituted functional residues (i.e., residues that participate in a particular biological activity). Furthermore, a “variant” typically has not more than 5, 4, 3, 2, or 1 additions or deletions, and often has no additions or deletions, as compared with the parent.
Moreover, any additions or deletions are typically fewer than about 25, about 20, about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 10, about 9, about 8, about 7, about 6, and commonly are fewer than about 5, about 4, about 3, or about 2 residues. In some embodiments, the parent or reference polypeptide is one found in nature. As will be understood by those of ordinary skill in the art, a plurality of variants of a particular polypeptide of interest may commonly be found in nature, particularly when the polypeptide of interest is an infectious agent polypeptide.

Vector: refers to a nucleic acid molecule capable of transporting another nucleic acid with which it is associated. In some embodiments, vectors are capable of extra-chromosomal replication and/or expression of nucleic acids to which they are linked in a host cell such as a eukaryotic and/or prokaryotic cell. Vectors capable of directing the expression of operably linked genes are referred to herein as “expression vectors.”

Wild-type: includes its art-understood meaning that refers to an entity having a structure and/or activity as found in nature in a “normal” (as contrasted with mutant, diseased, altered, etc.) state or context. Those of ordinary skill in the art will appreciate that wild-type genes and polypeptides often exist in multiple different forms (e.g., alleles).

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

Disclosed herein are, among other things, transgenic non-human animals having heterologous genetic material encoding one or more portions (functional fragments, binding portions, etc.) of a polypeptide of interest, which heterologous genetic material is inserted into the diversity cluster (i.e., D_H region) of an immunoglobulin heavy chain variable region so that the heterologous genetic material is operably linked with heavy chain variable (V_H) and joining (J_H) segments. It is contemplated that such non-human animals demonstrate a capacity to generate antibodies to intractable disease targets. It is also contemplated that such non-human animals demonstrate an antibody population characterized by heavy chain variable regions having an increase in CDR3 diversity as compared to an antibody population having immunoglobulin heavy chain variable CDR3 diversity generated from traditional immunoglobulin D_H gene segments (or immunoglobulin D_H gene segments that appear in nature). Therefore, the non-human animals described herein may be useful for the development of antibody-based therapeutics that bind particular antigens, in particular, antigens associated with low and/or poor immunogenicity. In particular, disclosed herein is the introduction of exemplary nucleotide coding sequences that each encode a portion of a polypeptide of interest (e.g., an extracellular portion of an atypical chemokine receptor, a portion of a conotoxin, or a portion of a tarantula toxin) into the D_H region of an immunoglobulin heavy chain variable region resulting in expression of antibodies having heavy chain variable regions and, in particular, CDR3 regions, generated from V(D)J recombination involving an inserted nucleotide coding sequence.
In some embodiments, inserted nucleotide coding sequences of a polypeptide of interest (e.g., an atypical chemokine receptor (ACKR), conotoxin, tarantula toxin, or combinations thereof) replace all or substantially all traditional D_H segments (i.e., wild-type D_H segments) within an immunoglobulin heavy chain diversity cluster (i.e., D_H region) as described herein. In some embodiments, inserted nucleotide coding sequences of a polypeptide of interest (e.g., an atypical chemokine receptor (ACKR), conotoxin, tarantula toxin, or combinations thereof) partially replace one or more traditional D_H segments within an immunoglobulin heavy chain diversity cluster (i.e., D_H region) as described herein. In some embodiments, nucleotide coding sequences of a polypeptide of interest (e.g., an atypical chemokine receptor (ACKR), conotoxin, tarantula toxin, or combinations thereof) are inserted into one or more traditional D_H segments within an immunoglobulin heavy chain diversity cluster (i.e., D_H region) as described herein, so that said nucleotide coding sequences are flanked by sequences that are normally or naturally found in or associated with the one or more traditional D_H segments. In some embodiments, one or more traditional D_H segments remain intact within an immunoglobulin heavy chain diversity cluster as described herein. In some embodiments, one or more traditional D_H segments are deleted, removed or otherwise rendered non-functional from an immunoglobulin heavy chain diversity cluster as described herein. In some certain embodiments, an immunoglobulin heavy chain diversity cluster as described herein lacks all or substantially all traditional D_H segments. In some certain embodiments, an immunoglobulin heavy chain diversity cluster as described herein comprises synthetic D_H segments made or generated using nucleotide coding sequences described herein.
Such transgenic non-human animals provide an in vivo system for identifying and developing antibodies and/or antibody-based therapeutics that bind disease targets beyond the targeting capabilities of established drug discovery technologies. Further, such transgenic non-human animals provide a useful animal model system for the development of antibodies and/or antibody-based therapeutics centered on or designed for disrupting protein-protein interactions that are central to various diseases and/or disease pathologies that affect humans.

In some embodiments, non-human animals described herein comprise an immunoglobulin heavy chain variable region containing an engineered diversity cluster (i.e., an engineered D_H region) characterized by the presence of one or more nucleotide coding sequences corresponding to a portion(s) of a polypeptide of interest such as, for example, an extracellular domain of an ACKR (e.g., a D6 chemokine decoy receptor), a toxin (e.g., an ion channel blocker such as, for example, a conotoxin, spider toxin, tarantula toxin, sea anemone toxin, or scorpion toxin), a G-protein-coupled receptor, long heavy chain CDRs of selected antibodies (e.g., neutralizing antibodies that bind viruses including, for example, HIV, HCV, HPV, influenza, etc.), glucagon-like peptide-1 receptor agonists (e.g., exenatide, liraglutide, lixisenatide, albiglutide, dulaglutide, taspoglutide, etc.), heavy chain diversity (D_H) gene segments from a non-human species (e.g., bird, chicken, cow, rabbit, swine, etc.), etc. In such embodiments, antibodies containing CDR3s generated from recombination involving such nucleotide coding sequences can be characterized as having increased diversity to direct binding to particular antigens (e.g., membrane-spanning polypeptides). In some embodiments, antibodies produced by non-human animals described herein have an immunoglobulin heavy chain variable region sequence that contains a CDR3 region corresponding to a peptide encoded by the one or more nucleotide coding sequences. In some embodiments, non-human animals described herein comprise heavy chain variable (V_H) and joining (J_H) gene segments operably linked with the one or more nucleotide coding sequences so that V(D)J recombination occurs between said V_H, J_H and one or more nucleotide coding sequences to create a heavy chain variable region that binds an antigen of interest.
In some embodiments, non-human animals described herein comprise a plurality of V_H and J_H gene segments operably linked to 5, 10, 15, 20, 25 or more (e.g., a plurality) nucleotide coding sequences at an immunoglobulin heavy chain variable region in the genome of the non-human animal. In many embodiments, V_H and J_H segments are human V_H and human J_H gene segments. In some embodiments, non-human animals described herein further comprise a human or humanized immunoglobulin light chain locus (e.g., κ and/or λ) such that the non-human animals produce antibodies comprising human variable regions (i.e., heavy and light) and non-human constant regions. In some certain embodiments, said human or humanized immunoglobulin light chain locus comprises human V_L and J_L gene segments operably linked to a rodent light chain constant region (e.g., a rodent Cκ or Cλ). In some embodiments, non-human animals described herein further comprise an immunoglobulin light chain locus as described in U.S. Patent Application Publication Nos. 2011-0195454 A1, 2012-0021409 A1, 2012-0192300 A1, 2013-0045492 A1, 2013-0185821 A1, 2013-0198880 A1, 2013-0302836 A1, 2015-0059009 A1; International Patent Application Publication Nos. WO 2011/097603, WO 2012/148873, WO 2013/134263, WO 2013/184761, WO 2014/160179, WO 2014/160202; all of which are hereby incorporated by reference.

Various aspects of the compositions and methods are described in detail in the following sections. The use of sections is not meant to limit any embodiment. Each section can apply to any embodiment specifically described. In this application, the use of “or” means “and/or” unless stated otherwise.

V(D)J Recombination

A series of recombination events, involving several genetic components, serves to assemble immunoglobulins from an ordered arrangement of gene segments (e.g., V, D and J). This assembly of gene segments is known to be imprecise and, therefore, immunoglobulin diversity is achieved both by combination of different gene segments and formation of unique junctions through imprecise joining. Further diversity is generated through a process known as somatic hypermutation in which the variable region sequence of immunoglobulins is altered to increase affinity and specificity for antigen. The immunoglobulin is a Y-shaped polypeptide composed of two identical heavy and two identical light chains, each of which has two structural components: one variable domain and one constant domain. It is the variable domains of heavy and light chains that are formed by the assembly of gene segments, while constant domains are fused to variable domains through RNA splicing. Although the mechanism of assembling (or joining) gene segments is similar for heavy and light chains, only one joining event is required for light chains (i.e., V to J) while two are required for heavy chains (i.e., D to J and V to DJ).

The assembly of gene segments for heavy and light chain variable regions is guided by conserved noncoding DNA sequences that flank each gene segment, termed recombination signal sequences (RSSs), which ensure DNA rearrangements at precise locations relative to V, D and J coding sequences (see, e.g., Ramsden, D. A. et al., 1994, Nuc. Acids Res. 22(10):1785-96). Each RSS consists of a conserved block of seven nucleotides (heptamer) that is contiguous with a coding sequence (e.g., a V segment) followed by a conserved spacer (either 12 or 23 bp) and a second conserved block of nine nucleotides (nonamer). Although considerable sequence divergence among individuals is tolerated, the length of these sequences typically does not vary. Recombination between immunoglobulin gene segments follows a rule commonly referred to as the 12/23 rule, in which gene segments flanked by an RSS with a 12 bp spacer are typically joined to a gene segment flanked by a 23 bp spacer (see, e.g., Hiom, K. and M. Gellert, 1998, Mol. Cell. 1(7):1011-9). The sequence of an RSS has been reported to influence the efficiency and/or the frequency of recombination with a particular gene segment (see, e.g., Ramsden, D. A. and G. E. Wu, 1991, Proc. Natl. Acad. Sci. U.S.A. 88:10721-5; Boubnov, N. V. et al., 1995, Nuc. Acids Res. 23:1060-7; Ezekiel, U. R. et al., 1995, Immunity 2:381-9; Sadofsky, M. et al., 1995, Genes Dev. 9:2193-9; Cuomo, C. A. et al., 1996, Mol. Cell Biol. 16:5683-90; Ramsden, D. A. et al., 1996, EMBO J 15:3197-3206). Indeed, many reports point to a highly biased and variable usage of gene segments, in particular, D_H segments, among individuals.
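The RSS layout and the 12/23 rule described above can be illustrated with a minimal Python sketch. The class and function names (RSS, obeys_12_23_rule) are hypothetical and introduced only for illustration; the heptamer and nonamer shown are the commonly cited consensus sequences, and the spacers are placeholder runs of N rather than sequences taken from this disclosure.

```python
from dataclasses import dataclass

HEPTAMER_LEN = 7
NONAMER_LEN = 9
ALLOWED_SPACER_LENGTHS = (12, 23)


@dataclass(frozen=True)
class RSS:
    """A recombination signal sequence: heptamer, spacer, nonamer."""
    heptamer: str
    spacer: str
    nonamer: str

    def is_well_formed(self) -> bool:
        # Element lengths are conserved even when their sequences diverge.
        return (
            len(self.heptamer) == HEPTAMER_LEN
            and len(self.nonamer) == NONAMER_LEN
            and len(self.spacer) in ALLOWED_SPACER_LENGTHS
        )


def obeys_12_23_rule(a: RSS, b: RSS) -> bool:
    """True when one RSS has a 12 bp spacer and the other a 23 bp spacer."""
    if not (a.is_well_formed() and b.is_well_formed()):
        return False
    return {len(a.spacer), len(b.spacer)} == {12, 23}


# Placeholder RSSs (spacer content is arbitrary; only its length matters here).
rss_12 = RSS("CACAGTG", "N" * 12, "ACAAAAACC")
rss_23 = RSS("CACAGTG", "N" * 23, "ACAAAAACC")
```

In this sketch, a pairing such as obeys_12_23_rule(rss_12, rss_23) is permitted, while two RSSs with equal spacer lengths are rejected. The check models only spacer-length compatibility and does not attempt to model the sequence-dependent efficiency effects reported in the references cited above.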

In some embodiments, non-human animals described herein comprise one or more RSSs flanking coding sequences that is or are optimized for recombination to a V and/or a J gene segment. In various embodiments, coding sequences are inserted into an immunoglobulin heavy chain locus in the place of (or within) traditional D_H gene segments (or D_H gene segments that appear in nature). Optimization of RSSs may be achieved using standard techniques known in the art such as, for example, site-directed mutagenesis of known RSS sequences or in silico generation of synthetic RSS sequences followed by de novo synthesis. To give but one example, an RSS that is associated with low or poor recombination efficiency and/or frequency may be optimized by comparison to an RSS that is associated with a high or optimal recombination efficiency and/or frequency. Recombination efficiency and/or frequency may be determined, in some embodiments, by usage frequencies of gene segments in a population of antibody sequences (e.g., from an individual or group of individuals; see e.g., Arnaout, R. et al., 2011, PLoS One 6(8):e22365; Glanville, J. et al., 2011, Proc. Natl. Acad. Sci. U.S.A. 108(50):20066-71). Thus, non-human animals described herein may, in some embodiments, comprise one or more optimized RSSs flanking coding sequences (or gene segments) so that recombination of coding sequences (or gene segments) occurs at equal or about equal frequencies. Exemplary optimized RSSs are set forth in FIG. 2.
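As a rough illustration of how usage frequencies of gene segments might be tabulated from a population of antibody sequences, the following Python sketch counts segment assignments and reports how far the observed usage departs from equal usage. The function names and the segment labels are hypothetical; assigning a segment to each antibody sequence (e.g., by alignment) is assumed to have been done beforehand and is not shown.

```python
from collections import Counter
from typing import Dict, Iterable


def segment_usage_frequencies(assignments: Iterable[str]) -> Dict[str, float]:
    """Fraction of sequences assigned to each gene segment label."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {segment: n / total for segment, n in counts.items()}


def max_deviation_from_equal_usage(freqs: Dict[str, float]) -> float:
    """Largest absolute gap between an observed frequency and 1/N."""
    if not freqs:
        return 0.0
    expected = 1.0 / len(freqs)
    return max(abs(f - expected) for f in freqs.values())


# Hypothetical per-sequence segment assignments (labels are placeholders).
example_assignments = ["D_A", "D_A", "D_B", "D_C"]
example_freqs = segment_usage_frequencies(example_assignments)
example_bias = max_deviation_from_equal_usage(example_freqs)
```

Under these assumptions, a segment whose frequency falls well below the equal-usage expectation would be a candidate for RSS optimization, and a deviation near zero would indicate recombination at about equal frequencies.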

Assembly of gene segments to form heavy and light chain variable regions results in the formation of antigen-binding regions (or sites) of immunoglobulins. Such antigen-binding regions are characterized, in part, by the presence of hypervariable regions, which are commonly referred to as complementarity determining regions (CDRs). There are three CDRs for both heavy and light chains (i.e., for a total of six CDRs) with both CDR1 and CDR2 being entirely encoded by the V gene segment. CDR3, however, is encoded by the sequence resulting from the joining of the V and J segments for light chains, and the V, D and J segments for heavy chains. Thus, the additional gene segment employed during recombination to form a heavy chain variable region coding sequence significantly increases the diversity of the antigen-binding sites of heavy chains.
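The contribution of the additional segment to combinatorial diversity can be made concrete with a small Python sketch. The segment counts used here are placeholders chosen for illustration only and are not taken from this disclosure; the calculation also ignores junctional imprecision and somatic hypermutation, both of which add further diversity.

```python
from typing import Optional


def combinatorial_diversity(n_v: int, n_j: int, n_d: Optional[int] = None) -> int:
    """Number of distinct segment combinations.

    Light chains join V to J only; heavy chains also use a D segment
    (or, in an engineered D_H region, an inserted coding sequence).
    """
    if min(n_v, n_j) < 0 or (n_d is not None and n_d < 0):
        raise ValueError("segment counts must be non-negative")
    if n_d is None:
        return n_v * n_j
    return n_v * n_d * n_j


# Placeholder counts (assumptions for illustration).
light_chain_combinations = combinatorial_diversity(n_v=40, n_j=5)
heavy_chain_combinations = combinatorial_diversity(n_v=40, n_j=6, n_d=25)
```

With these placeholder counts, the heavy chain calculation is multiplied by the number of available D positions, which is the sense in which the extra joining event increases the diversity of heavy chain antigen-binding sites.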

Chemokines and Chemokine Receptors

Immune and inflammatory responses are complex biological processes involving several types of immune cells and molecular components. Migration of leukocytes has been reported as an important factor in the initiation, maintenance and resolution of immune and inflammatory responses, some of which is achieved through the action of chemokines and their receptors. Indeed, several chemokines and corresponding receptors have been reported. Chemokine receptors have a structure characterized by a seven-transmembrane domain that couples to a G-protein for signal transduction in the intracellular compartment and are divided into multiple different families corresponding to the subsets of chemokines they bind: CC-chemokine receptors (β-chemokine receptors), CXC-chemokine receptors, CX3C-chemokine receptors and XC-chemokine receptors. In addition to these traditional chemokine receptors, other chemokine receptors (termed atypical chemokine receptors or ACKRs) that share a similar structure yet lack the ability to initiate signaling in response to ligand binding have been reported (see, e.g., Bonecchi, R. et al., 2010, Curr. Top. Microbiol. Immunol. 341:15-36; Nibbs, R. J. B. and G. J. Graham, 2013, Nature Rev. 13:815-29).

Among ACKRs, ACKR2 (also known as CCBP2 and D6 chemokine decoy receptor) is a seven-transmembrane protein receptor similar to other G protein-coupled receptors and encoded by the CCBP2 gene (Nibbs, R. J. B. et al., 1997, J. Biol. Chem. 272(51):32078-83). In contrast to other chemokine receptors, ACKR2 contains a DKYLEIV motif in the second intracellular loop in place of the canonical DRYLAIV motif and is incapable of signaling through Gα_i proteins. ACKR2 is expressed on lymphatic endothelial cells of skin, gut and lung, as well as on B cells and dendritic cells. ACKR2 is a promiscuous receptor for many pro-inflammatory β-chemokines (CC chemokines), but does not bind constitutive CCL chemokines. ACKR2 has been suggested to be a scavenger receptor that internalizes CC chemokines and targets them for lysosomal degradation thereby limiting inflammatory responses via clearance of CC chemokines (Jamieson, T. et al., 2005, Nature Immunol. 6(4):403-11; Bonecchi, R. et al., supra; Hansell, C. A. H. et al., 2011, Immunol. Cell Biol. 89(2):197-206; Nibbs, R. J. B. and G. J. Graham, supra).

Peptide Toxins

Toxins are naturally occurring substances found in plants and animals that can be poisonous to humans. For example, cone snails produce neurotoxic peptides, called conotoxins, in their venom that modulate the activity of various receptors in humans. In particular, several conotoxins have been shown to modulate ion channels (e.g., Na_V channels). Typically, conotoxins are peptides 10 to 30 amino acids in length that include one or more disulfide bonds, and are characterized based on the target upon which they act: α-conotoxins (acetylcholine receptors), δ-conotoxins (voltage-gated sodium channels), κ-conotoxins (potassium channels), μ-conotoxins (voltage-gated sodium channels) and ω-conotoxins (voltage-gated calcium channels). Conotoxins are known to be highly polymorphic among species of snails and, as a result, the genes that encode them are not conserved (see, for example, Terlau, H. and B. M. Olivera, 2004, Physiol. Rev. 84(1):41-68; Biggs, J. S. et al., 2010, Mol. Phylogenet. Evol. 56(1):1-12; Olivera, B. M. et al., 2012, Ann. N.Y. Acad. Sci. 1267(1):61-70; Wong, E. S. and K. Belov, 2012, Gene 496(1):1-7).

Among conotoxins, μ-conotoxins have been reported to have two types of cysteine patterns and act on voltage-gated sodium channels in muscle tissue (Cruz, L. J. et al., 1985, J. Biol. Chem. 260(16):9280-8; Zeikus, R. D. et al., 1985, J. Biol. Chem. 260(16):9280-8; McIntosh, J. M. and R. M. Jones, 2001, Toxicon. 39(10):1447-51; Nielsen, K. J. et al., 2002, J. Biol. Chem. 277(30):27247-55; Floresca, C. Z., 2003, Toxicol. Appl. Pharmacol. 190(2):95-101; Priest, B. T. et al., 2007, Toxicon. 49(2):194-201; Schmalhofer, W. A. et al., 2008, Mol. Pharmacol. 74(5):1476-84; Ekberg, J. et al., 2008, Int. J. Biochem. Cell Biol. 40(11):2363-8). Indeed, μ-conotoxins have been the subject of investigation for their potential pharmacological use (see, e.g., Olivera, B. M. and R. W. Teichert, 2007, Mol. Interv. 7(5):251-60; Stevens, M. et al., 2012, J. Biol. Chem. 287(37):31382-92). Although several studies have been conducted to elucidate the mechanism of action of various μ-conotoxins, much remains unknown.

Other examples of toxins include the venom of sea anemones, scorpions, spiders and tarantulas, which have been reported to act on ion channels by inhibiting activation and blocking neuronal transmission. Indeed, several scorpion toxins have been identified and their structures solved (see, e.g., Rochat, H. and J. Gregoire, 1983, Toxicon. 21(1):153-62; Zhou, X. H. et al., 1989, Biochem. J. 257(2):509-17; Granier, C. et al., 1990, FEBS Lett. 261(2):423-6). Further, neurotoxin-based libraries have been developed for potassium channels using toxins from scorpion venom (see, e.g., Takacs, Z. et al., 2009, Proc. Natl. Acad. Sci. U.S.A. 106(52):22211-6). To give yet another example, peptides from tarantula venom have been reported to specifically act on voltage-gated sodium channels such as Na_V1.5 and Na_V1.7 (see, e.g., Priest, B. T. et al., 2007, Toxicon. 49(2):194-201; Xiao, Y. et al., 2010, Mol. Pharmacol. 78(6):1124-34). Described herein is the finding that a particularly useful set of nucleotide coding sequences for construction of an engineered D_H region is or comprises nucleotide coding sequences from tarantula toxin and μ-conotoxin sequences (see, e.g., Table 4). Disclosed herein is the use of toxin coding sequences that act on voltage-gated sodium channels (e.g., Na_V1.7) for construction of an engineered D_H region. The methods described herein can be employed to utilize any set of coding sequences derived from any desired toxin peptide(s), or combination of toxin peptides (or toxin peptide sequence fragments) from multiple (i.e., two, three, four, five, etc.) toxin peptides as desired.

Provided In Vivo Systems

Described herein is the recognition that particular antigens are associated with low and/or poor immunogenicity and, therefore, are poor targets for antibody-based therapeutics. Indeed, many disease targets (e.g., membrane-spanning proteins) have been characterized as intractable or undruggable. Thus, disclosed herein is the creation of an in vivo system for the development of antibodies and antibody-based therapeutics that overcome deficiencies associated with established drug discovery technologies. The present disclosure specifically demonstrates the construction of a transgenic rodent whose genome comprises an immunoglobulin heavy chain variable region that includes an engineered D_H region, which engineered D_H region includes one or more heterologous nucleotide coding sequences that each encode a portion (e.g., an extracellular portion, binding portion, functional fragment, etc.) of a heterologous polypeptide or peptide of interest. The methods described herein can be adapted to employ a set of heterologous nucleotide coding sequences that each encode a portion of any polypeptide or peptide of interest (e.g., a membrane-spanning protein, a toxin, a G-protein-coupled receptor, etc.) for creating an engineered D_H region. The engineered D_H region, once integrated into an immunoglobulin heavy chain variable region (i.e., placed in operable linkage with V and J gene segments and/or one or more constant regions), provides for recombination of gene segments (i.e., V and J) with the one or more heterologous nucleotide coding sequences to generate antibodies characterized by heavy chains having added diversity (i.e., CDR3 diversity) to direct binding to particular antigens.

Described herein is the recognition that particularly useful heterologous polypeptides of interest from which to design a set of nucleotide coding sequences (or combinations of nucleotide coding sequences) for constructing an engineered D_H region as described herein include chemokine receptors, conotoxins, tarantula toxins and/or combinations thereof.

In some embodiments, chemokine receptors include CC-chemokine receptors, CXC-chemokine receptors, CX3C-chemokine receptors, XC-chemokine receptors and combinations thereof.

In some embodiments, CC-chemokine receptors (also known as β-chemokine receptors) include CCR1, CCR2, CCR3, CCR4, CCR5, CCR6, CCR7, CCR8, CCR9, CCR10 and CCR11.

In some embodiments, CXC-chemokine receptors include CXCR1, CXCR2, CXCR3, CXCR4, CXCR5, CXCR6 and CXCR7.

In some embodiments, CX3C-chemokine receptors include CX3CR1.

In some embodiments, XC-chemokine receptors include XCR1.

In some embodiments, conotoxins include α-conotoxins, δ-conotoxins, κ-conotoxins, μ-conotoxins, ω-conotoxins and combinations thereof.

In some embodiments, conotoxins are or comprise μ-conotoxins.

In some embodiments, tarantula toxins include ProTxI, ProTxII, Huwentoxin-IV (HWTX-IV), and combinations thereof.

Without wishing to be bound by any particular theory, we note that data provided herein demonstrate that, in some embodiments, rodents whose genome comprises an immunoglobulin heavy chain variable locus that includes an engineered D_H region characterized by the inclusion of one or more heterologous nucleotide coding sequences derived from an extracellular portion of a heterologous atypical chemokine receptor (e.g., ACKR2, also known as D6 chemokine decoy receptor) effectively generate an immunoglobulin heavy chain variable region locus that produces antibodies characterized by CDR3s having added diversity to bind ligands of a heterologous atypical chemokine receptor (e.g., a heterologous D6 chemokine decoy receptor). We also note that data provided herein demonstrate that, in some embodiments, rodents whose genome comprises an immunoglobulin heavy chain variable locus that includes an engineered D_H region characterized by the inclusion of one or more heterologous nucleotide coding sequences derived from a portion of one or more toxins (e.g., μ-conotoxin and/or ProTxII) effectively generate an immunoglobulin heavy chain variable region locus that produces antibodies characterized by CDR3s having added diversity to bind a heterologous voltage-gated sodium channel (e.g., a heterologous Na_V channel).

In particular, the present disclosure specifically demonstrates, among other things, exemplary nucleotide coding sequences from a human atypical chemokine receptor (ACKR) such as, for example, ACKR2 (i.e., D6 chemokine decoy receptor) that are particularly useful for integration into a D_H region of an immunoglobulin heavy chain locus for the generation of antibodies characterized by CDR3s having diversity resulting from recombination of the nucleotide coding sequences with V_H and J_H segments, and that block several inflammatory cytokines (e.g., CCL2, CCL3, CCL4, CCL5, CCL7, CCL8, CCL11, CCL12, CCL13, CCL14, CCL17, CCL22 and CCL3L1). The present disclosure also demonstrates exemplary nucleotide coding sequences from a conotoxin (e.g., μ-conotoxin) and a tarantula toxin (e.g., ProTxII) that are particularly useful for integration into a D_H region of an immunoglobulin heavy chain locus for the generation of antibodies characterized by CDR3s having diversity resulting from recombination of the nucleotide coding sequences with V_H and J_H segments, and that block and/or inhibit the activation and/or function of a voltage-gated sodium channel(s) (e.g., Na_V1.7). Thus, the present disclosure, in at least some embodiments, embraces the development of an in vivo system for generating antibodies and/or antibody-based therapeutics to intractable disease targets.

Exemplary human ACKRs (along with their associated ligands) are set forth in Table 1 (see, e.g., Nibbs, R. J. B. and G. J. Graham, 2013, Nature Reviews 13:815-29). Exemplary toxins (along with their associated targets) are set forth in Table 2 (see, e.g., Terlau, H. and B. M. Olivera, 2004, Physiol. Rev. 84(1):41-68).

TABLE 1

ACKR                                              Ligands

ACKR1 (also known as DARC)                        CCL2, CCL5, CCL7, CCL11, CCL13, CCL14, CCL17, CXCL1, CXCL2, CXCL3, CXCL5, CXCL6, CXCL8, CXCL11

ACKR2 (also known as D6)                          CCL2, CCL3, CCL4, CCL5, CCL7, CCL8, CCL11, CCL12, CCL13, CCL14, CCL17, CCL22, CCL3L1

ACKR3 (also known as CXCR7)                       CXCL11, CXCL12

ACKR4 (also known as CCRL1, CCX-CKR or CCR11)     CCL19, CCL21, CCL25

TABLE 2

Toxin             Target

α-conotoxins      Acetylcholine receptors

δ-conotoxins      Voltage-gated sodium channels

κ-conotoxins      Potassium channels

μ-conotoxins      Voltage-gated sodium channels

ω-conotoxins      N-type voltage-dependent calcium channels

ProTxI            Selective Ca_v3.1 channel blocker; inhibits Na_v1 subtypes and K_v2.1 channels

ProTxII           Selective Na_v1.7 inhibitor

Huwentoxin-IV     Selective Na_v1.7 inhibitor

ACKR2 (D6 Chemokine Decoy Receptor) Coding Sequences

Exemplary nucleotide coding sequences (DNA and amino acid (AA)) of a human ACKR2 (D6 chemokine decoy receptor) for construction of an engineered D_H region as described herein are set forth in Table 3. The set of human ACKR2 nucleotide coding sequences set forth in Table 3 is characterized by four extracellular domains, four extracellular domains with Cys to Ser substitutions to remove disulfide bonds, four Cys crossovers (Nterm-EC3, EC3-Nterm, EC1-EC2, EC2-EC1), four Cys crossovers with Cys to Ser substitutions to remove disulfide bonds, two loop fusions (with Cys retained; Nterm+EC3, EC1+EC2), and seven partial domains.
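The relationships among the sequence classes listed above can be sketched in Python at the amino acid level. The four domain sequences below are copied from Table 3 (SEQ ID NOs: 2, 4, 6 and 8). The rule used here, splitting each domain at its single cysteine, is an inference from comparing the Table 3 sequences and is not stated as the method by which the sequences were designed; all function names are hypothetical.

```python
from typing import Tuple

# Extracellular domain amino acid sequences copied from Table 3.
DOMAINS = {
    "Nterm": "MAATASPQPLATEDADSENSSFYYYDYLDEVAFMLCRKDAVVSFGKVFLP",
    "EC1": "SFLCK",
    "EC2": "QTHENPKGVWNCHADFGGHGTIWKLFLRFQQNLL",
    "EC3": "LHTLLDLQVFGNCEVSQHLDYA",
}


def split_at_cys(seq: str) -> Tuple[str, str]:
    """Return the parts before and after the single Cys (Cys excluded)."""
    if seq.count("C") != 1:
        raise ValueError("expected exactly one cysteine")
    n_part, c_part = seq.split("C")
    return n_part, c_part


def cys_to_ser(seq: str) -> str:
    """Replace every Cys with Ser (the '-S' variants)."""
    return seq.replace("C", "S")


def cys_crossover(first: str, second: str) -> str:
    """N-terminal part of one domain, Cys, C-terminal part of another."""
    n_part, _ = split_at_cys(DOMAINS[first])
    _, c_part = split_at_cys(DOMAINS[second])
    return n_part + "C" + c_part


def loop_fusion(first: str, second: str) -> str:
    """Two complete domains joined end to end, Cys retained."""
    return DOMAINS[first] + DOMAINS[second]
```

Under this inferred rule, cys_crossover("Nterm", "EC3") reproduces the Nterm-EC3 sequence (SEQ ID NO: 18), cys_to_ser applied to a crossover reproduces the corresponding "-S" sequence, and split_at_cys yields the "-N" and "-C" partial domains.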

TABLE 3

Nterm DNA

ATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGATTCTGAGAATAGCAGCT

TCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCTGCCGGAAGGATGCTGTGGTTAG

CTTTGGCAAAGTTTTCCTGCCA (SEQ ID NO: 1)

Nterm AA

MAATASPQPLATEDADSENSSFYYYDYLDEVAFMLCRKDAVVSFGKVFLP (SEQ ID NO: 2)

EC1 DNA

AGCTTCTTGTGCAAG (SEQ ID NO: 3)

EC1 AA

SFLCK (SEQ ID NO: 4)

EC2 DNA

CAAACCCATGAAAACCCCAAGGGAGTTTGGAACTGCCATGCCGATTTCGGCGGGCATGGCACC

ATTTGGAAGCTCTTCCTCCGGTTCCAGCAGAACCTGCTA (SEQ ID NO: 5)

EC2 AA

QTHENPKGVWNCHADFGGHGTIWKLFLRFQQNLL (SEQ ID NO: 6)

EC3 DNA

CTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACTGTGAGGTTAGCCAGCATCTAGACTATG

CC (SEQ ID NO: 7)

EC3 AA

LHTLLDLQVFGNCEVSQHLDYA (SEQ ID NO: 8)

Nterm-S DNA

ATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGATTCTGAGAATAGCAGCT

TCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCAGCCGGAAGGATGCTGTGGTTAG

CTTTGGCAAAGTTTTCCTGCCA (SEQ ID NO: 9)

Nterm-S AA

MAATASPQPLATEDADSENSSFYYYDYLDEVAFMLSRKDAVVSFGKVFLP (SEQ ID NO: 10)

EC1-S DNA

AGCTTCTTGAGCAAG (SEQ ID NO: 11)

EC1-S AA

SFLSK (SEQ ID NO: 12)

EC2-S DNA

CAAACCCATGAAAACCCCAAGGGAGTTTGGAACAGCCATGCCGATTTCGGCGGGCATGGCACC

ATTTGGAAGCTCTTCCTCCGGTTCCAGCAGAACCTGCTA (SEQ ID NO: 13)

EC2-S AA

QTHENPKGVWNSHADFGGHGTIWKLFLRFQQNLL (SEQ ID NO: 14)

EC3-S DNA

CTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACAGTGAGGTTAGCCAGCATCTAGACTATG

CC (SEQ ID NO: 15)

EC3-S AA

LHTLLDLQVFGNSEVSQHLDYA (SEQ ID NO: 16)

Nterm-EC3 DNA

ATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGATTCTGAGAATAGCAGCT

TCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCTGCGAGGTTAGCCAGCATCTAGA

CTATGCC (SEQ ID NO: 17)

Nterm-EC3 AA

MAATASPQPLATEDADSENSSFYYYDYLDEVAFMLCEVSQHLDYA (SEQ ID NO: 18)

EC3-Nterm DNA

CTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACTGTCGGAAGGATGCTGTGGTTAGCTTTG

GCAAAGTTTTCCTGCCA (SEQ ID NO: 19)

EC3-Nterm AA

LHTLLDLQVFGNCRKDAVVSFGKVFLP (SEQ ID NO: 20)

EC1-EC2 DNA

AGCTTCTTGTGCCATGCCGATTTCGGCGGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCCA

GCAGAACCTGCTA (SEQ ID NO: 21)

EC1-EC2 AA

SFLCHADFGGHGTIWKLFLRFQQNLL (SEQ ID NO: 22)

EC2-EC1 DNA

CAAACCCATGAAAACCCCAAGGGAGTTTGGAACTGCAAG (SEQ ID NO: 23)

EC2-EC1 AA

QTHENPKGVWNCK (SEQ ID NO: 24)

Nterm-EC3-S DNA

ATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGATTCTGAGAATAGCAGCT

TCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCAGCGAGGTTAGCCAGCATCTAGA

CTATGCC (SEQ ID NO: 25)

Nterm-EC3-S AA

MAATASPQPLATEDADSENSSFYYYDYLDEVAFMLSEVSQHLDYA (SEQ ID NO: 26)

EC3-Nterm-S DNA

CTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACAGTCGGAAGGATGCTGTGGTTAGCTTTG

GCAAAGTTTTCCTGCCA (SEQ ID NO: 27)

EC3-Nterm-S AA

LHTLLDLQVFGNSRKDAVVSFGKVFLP (SEQ ID NO: 28)

EC1-EC2-S DNA

AGCTTCTTGAGCCATGCCGATTTCGGCGGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCCA

GCAGAACCTGCTA (SEQ ID NO: 29)

EC1-EC2-S AA

SFLSHADFGGHGTIWKLFLRFQQNLL (SEQ ID NO: 30)

EC2-EC1-S DNA

CAAACCCATGAAAACCCCAAGGGAGTTTGGAACAGCAAG (SEQ ID NO: 31)

EC2-EC1-S AA

QTHENPKGVWNSK (SEQ ID NO: 32)

Nterm + EC3 DNA

ATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGATTCTGAGAATAGCAGCT

TCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCTGCCGGAAGGATGCTGTGGTTAG

CTTTGGCAAAGTTTTCCTGCCACTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACTGTGAG

GTTAGCCAGCATCTAGACTATGCC (SEQ ID NO: 33)

Nterm + EC3 AA

MAATASPQPLATEDADSENSSFYYYDYLDEVAFMLCRKDAVVSFGKVFLPLHTLLDLQVFGNCEVS

QHLDYA (SEQ ID NO: 34)

EC1 + EC2 DNA

AGCTTCTTGTGCAAGCAAACCCATGAAAACCCCAAGGGAGTTTGGAACTGCCATGCCGATTTCG

GCGGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCCAGCAGAACCTGCTA

(SEQ ID NO: 35)

EC1 + EC2 AA

SFLCKQTHENPKGVWNCHADFGGHGTIWKLFLRFQQNLL (SEQ ID NO: 36)

Nterm-N DNA

ATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGATTCTGAGAATAGCAGCT

TCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTC (SEQ ID NO: 37)

Nterm-N AA

MAATASPQPLATEDADSENSSFYYYDYLDEVAFML (SEQ ID NO: 38)

Nterm-C DNA

CGGAAGGATGCTGTGGTTAGCTTTGGCAAAGTTTTCCTGCCA (SEQ ID NO: 39)

Nterm-C AA

RKDAVVSFGKVFLP (SEQ ID NO: 40)

EC1-N DNA

AGCTTCTTG (SEQ ID NO: 41)

EC1-N AA

SFL (SEQ ID NO: 42)

EC2-N DNA

CAAACCCATGAAAACCCCAAGGGAGTTTGGAAC (SEQ ID NO: 43)

EC2-N AA

QTHENPKGVWN (SEQ ID NO: 44)

EC2-C DNA

CATGCCGATTTCGGCGGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCCAGCAGAACCTGC

TA (SEQ ID NO: 45)

EC2-C AA

HADFGGHGTIWKLFLRFQQNLL (SEQ ID NO: 46)

EC3-N DNA

CTGCATACCCTGCTGGACCTGCAAGTATTCGGCAAC (SEQ ID NO: 47)

EC3-N AA

LHTLLDLQVFGN (SEQ ID NO: 48)

EC3-C DNA

GAGGTTAGCCAGCATCTAGACTATGCC (SEQ ID NO: 49)

EC3-C AA

EVSQHLDYA (SEQ ID NO: 50)
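The correspondence between the DNA and AA entries of Table 3 can be checked with a short Python sketch that translates each DNA sequence using the standard genetic code and lists the codons at which two equal-length sequences differ. The function names are hypothetical, and the sequences in the example are copied from Table 3 (SEQ ID NOs: 3, 11 and 49).

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_differences(dna_a: str, dna_b: str) -> List[Tuple[int, str, str]]:
    """List (codon index, codon in a, codon in b) where the sequences differ."""
    if len(dna_a) != len(dna_b) or len(dna_a) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    diffs = []
    for i in range(0, len(dna_a), 3):
        a, b = dna_a[i:i + 3], dna_b[i:i + 3]
        if a != b:
            diffs.append((i // 3, a, b))
    return diffs


EC1_DNA = "AGCTTCTTGTGCAAG"        # SEQ ID NO: 3
EC1_S_DNA = "AGCTTCTTGAGCAAG"      # SEQ ID NO: 11
EC3_C_DNA = "GAGGTTAGCCAGCATCTAGACTATGCC"  # SEQ ID NO: 49
```

Comparing EC1_DNA with EC1_S_DNA in this way shows a single codon change (TGC to AGC), which is the Cys to Ser substitution described for the "-S" variants.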

Exemplary DNA fragments containing human ACKR2 (D6 chemokine decoy receptor) nucleotide coding sequences for construction of an engineered D_H region are provided below.

D6-DH1166 (SEQ ID NO: 131) includes D6 coding sequences inserted in positions corresponding to D_H1-1 to D_H6-6:

TACGTAGCCGTTTCGATCCTCCCGAATTGACTAGTGGGTAGGCCTGGCGGCCGCTGCCAT

TTCATTACCTCTTTCTCCGCACCCGACATAGATACCGGTGGATTCGAATTCTCCCCGTTGA

AGCTGACCTGCCCAGAGGGGCCTGGGCCCACCCCACACACCGGGGCGGAATGTGTACAG

GCCCCGGTCTCTGTGGGTGTTCCGCTAACTGGGGCTCCCAGTGCTCACCCCACAACTAAA

GCGAGCCCCAGCCTCCAGAGCCCCCGAAGGAGATGCCGCCCACAAGCCCAGCCCCCATC

CAGGAGGCCCCAGAGCTCAGGGCGCCGGGGCGGATTTTGTACAGCCCCGAGTCACTGTG

CGGAAGGATGCTGTGGTTAGCTTTGGCAAAGTTTTCCTGCCACCACAGTGAGAAAAACTG

TGTCAAAAACCGTCTCCTGGCCCCTGCTGGAGGCCGCGCCAGAGAGGGGAGCAGCCGCC

CCGAACCTAGGTCCTGCTCAGCTCACACGACCCCCAGCACCCAGAGCACAACGGAGTCC

CCATTGAATGGTGAGGACGGGGACCAGGGCTCCAGGGGGTCATGGAAGGGGCTGGACCC

CATCCTACTGCTATGGTCCCAGTGCTCCTGGCCAGAACTGACCCTACCACCGACAAGAGT

CCCTCAGGGAAACGGGGGTCACTGGCACCTCCCAGCATCAACCCCAGGCAGCACAGGCA

TAAACCCCACATCCAGAGCCGACTCCAGGAGCAGAGACACCCCAGTACCCTGGGGGACA

CCGACCCTGATGACTCCCCACTGGAATCCACCCCAGAGTCCACCAGGACCAAAGACCCC

GCCCCTGTCTCTGTCCCTCACTCAGGACCTGCTGCGGGGCGGGCCATGAGACCAGACTCG

GGCTTAGGGAACACCACTGTGGCCCCAACCTCGACCAGGCCACAGGCCCTTCCTTCCTGC

CCTGCGGCAGCACAGACTTTGGGGTCTGTGCAGAGAGGAATCACAGAGGCCCCAGGCTG

AGGTGGTGGGGGTGGAAGACCCCCAGGAGGTGGCCCACTTCCCTTCCTCCCAGCTGGAA

CCCACCATGACCTTCTTAAGATAGGGGTGTCATCCGAGGCAGGTCCTCCATGGAGCTCCC

TTCAGGCTCCTCCCCGGTCCTCACTAGGCCTCAGTCCCGGCTGCGGGAATGCAGCCACCA

CAGGCACACCAGGCAGCCCAGACCCAGCCAGCCTGCAGTGCCCAAGCCCACATTCTGGA

GCAGAGCAGGCTGTGTCTGGGAGAGTCTGGGCTCCCCACCGCCCCCCCGCACACCCCAC

CCACCCCTGTCCAGGCCCTATGCAGGAGGGTCAGAGCCCCCCATGGGGTATGGACTTAG

GGTCTCACTCACGTGGCTCCCCTCCTGGGTGAAGGGGTCTCATGCCCAGATCCCCACAGC

AGAGCTGGTCAAAGGTGGAGGCAGTGGCCCCAGGGCCACCCTGACCTGGACCCTCAGGC

TCCTCTAGCCCTGGCTGCCCTGCTGTCCCTGGGAGGCCTGGACTCCACCAGACCACAGGT

CCAGGGCACCGCCCATAGGTGCTGCCCACACTCAGTTCACAGGAAGAAGATAAGCTCCA

GACCCCCAAGACTGGGACCTGCCTTCCTGCCACCGCTTGTAGCTCCAGACCTCCGTGCCT

CCCCCGACCACTTACACACGGGCCAGGGAGCTGTTCCACAAAGATCAACCCCAAACCGG

GACCGCCTGGCACTCGGGCCGCTGCCACTTCCCTCTCCATTTGTTCCCAGCACCTCTGTGC

TCCCTCCCTCCTCCCTCCTTCAGGGGAACAGCCTGTGCAGCCCCTCCCTGCACCCCACACC

CTGGGGAGGCCCAACCCTGCCTCCAGCCCTTTCTCCCCCGCTGCTCTTCCTGCCCATCCAG

ACAACCCTGGGGTCCCATCCCTGCAGCCTACACCCTGGTCTCCACCCAGACCCCTGTCTC

TCCCTCCAGACACCCCTCCCAGGCCAACCCTGCACATGCAGGCCCTCCCCTTTTCTGCTGC

CAGAGCCTCAGTTTCTACCCTCTGTGCCTACCCCCTGCCTCCTCCTGCCCACAACTCGAGC

TCTTCCTCTCCTGGGGCCCCTGAGCCATGGCACTGACCGTGCACTCCCACCCCCACACTG

CCCATGCCCTCACCTTCCTCCTGGACACTCTGACCCCGCTCCCCTCTTGGACCCAGCCCTG

GTATTTCCAGGACAAAGGCTCACCCAAGTCTTCCCCATGCAGGCCCTTGCCCTCACTGCC

CGGTTACACGGCAGCCTCCTGTGCACAGAAGCAGGGAGCTCAGCCCTTCCACAGGCAGA

AGGCACTGAAAGAAATCGGCCTCCAGCACCCTGATGCACGTCCGCCTGTGTCTCTCACTG

CCCGCACCTGCAGGGAGGCTCGGCACTCCCTGTAAAGACGAGGGATCCAGGCAGCAACA

TCATGGGAGAATGCAGGGCTCCCAGACAGCCCAGCCCTCTCGCAGGCCTCTCCTGGGAA

GAGACCTGCAGCCACCACTGAACAGCCACGGAGCCCGCTGGATAGTAACTGAGTCAGTG

ACCGACCTGGAGGGCAGGGGAGCAGTGAACCGGAGCCCAGACCATAGGGACAGAGACC

AGCCGCTGACATCCCGAGCCCCTCACTGGCGGCCCCAGAACACCGCGTGGAAACAGAAC

AGACCCACATTCCCACCTGGAACAGGGCAGACACTGCTGAGCCCCCAGCACCAGCCCTG

AGAAACACCAGGCAACGGCATCAGAGGGGGCTCCTGAGAAAGAAAGGAGGGGAGGTCT

CCTTCACCAGCAAGTACTTCCCTTGACCAAAAACAGGGTCCACGCAACTCCCCCAGGACA

AAGGAGGAGCCCCCTGTACAGCACTGGGCTCAGAGTCCTCTCCCACACACCCTGAGTTTC

AGACAAAAACCCCCTGGAAATCATAGTATCAGCAGGAGAACTAGCCAGAGACAGCAAG

AGGGGACTCAGTGACTCCCGCGGGGACAGGAGGATTTTGTGGGGGCTCGTGTCACTGTG

CTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACTGTGAGGTTAGCCAGCATCTAGAC

TATGCCCACAGTGACACAGCCCCATTCAAAAACCCCTGCTGTAAACGCTTCCACTTCTGG

AGCTGAGGGGCTGGGGGGAGCGTCTGGGAAGTAGGGCCTAGGGGTGGCCATCAATGCCC

AAAACGCACCAGACTCCCCCCCAGACATCACCCCACTGGCCAGTGAGCAGAGTAAACAG

AAAATGAGAAGCAGCTGGGAAGCTTGCACAGGCCCCAAGGAAAGAGCTTTGGCGGGTGT

GCAAGAGGGGATGCGGGCAGAGCCTGAGCAGGGCCTTTTGCTGTTTCTGCTTTCCTGTGC

AGATAGTTCCATAAACTGGTGTTCAAGATCGATGGCTGGGAGTGAGCCCAGGAGGACAG

TGTGGGAAGGGCACAGGGAAGGAGAAGCAGCCGCTATCCTACACTGTCATCTTTCAAGA

GTTTGCCCTGTGCCCACAATGCTGCATCATGGGATGCTTAACAGCTGATGTAGACACAGC

TAAAGAGAGAATCAGTGAAATGGATTTGCAGCACAGATCTGAATAAATTCTCCAGAATG

TGGAGCCACACAGAAGCAAGCACAAGGAAAGTGCCTGATGCAAGGGCAAAGTACAGTG

TGTACCTTCAGGCTGGGCACAGACACTCTGAAAAGCCTTGGCAGGAACTCCCTGCAACA

AAGCAGAGCCCTGCAGGCAATGCCAGCTCCAGAGCCCTCCCTGAGAGCCTCATGGGCAA

AGATGTGCACAACAGGTGTTTCTCATAGCCCCAAACTGAGAATGAAGCAAACAGCCATC

TGAAGGAAAACAGGCAAATAAACGATGGCAGGTTCATGAAATGCAAACCCAGACAGCC

AGAAGGACAACAGTGAGGGTTACAGGTGACTCTGTGGTTGAGTTCATGACAATGCTGAG

TAATTGGAGTAACAAAGGAAAGTCCAAAAAATACTTTCAATGTGATTTCTTCTAAATAAA

ATTTACAGCCGGCAAAATGAACTATCTTCTTAAGGGATAAACTTTCCACTAGGAAAACTA

TAAGGAAAATCAAGAAAAGGATGATCACATAAACACAGTGGTCGTTACTTCTACTGGGG

AAGGAAGAGGGTATGAACTGAGACACACAGGGTTGGCAAGTCTCCTAACAAGAACAGA

ACAAATACATTACAGTACCTTGAAAACAGCAGTTAAAATTCTAAATTGCAAGAAGAGGA

AAATGCACACAGCTGTGTTTAGAAAATTCTCAGTCCAGCACTGTTCATAATAGCAAAGAC

ATTAACCCAGGTTGGATAAATAAACGATGACACAGGCAATTGCACAATGATACAGACAT

ACATTCAGTATATGAGACATTGATGATGTATCCCCAAAGAAATGACTTTAAAGAGAAAA

GGCCTGATATGTGGTGGCACTCACCTCCCTGGGCATCCCCGGACAGGCTGCAGGCACACT

GTGTGGCAGGGCAGGCTGGTACCTGCTGGCAGCTCCTGGGGCCTGATGTGGAGCAGGCA

CAGAGCCGTATCCCCCCGAGGACATATACCCCCAAGGACGGCACAGTTGGTACATTCCG

GAGACAAGCAACTCAGCCACACTCCCAGGCCAGAGCCCGAGAGGGACGCCCATGCACA

GGGAGGCAGAGCCCAGCTCCTCCACAGCCAGCAGCACCCGTGCAGGGGCCGCCATCTGG

CAGGCACAGAGCATGGGCTGGGAGGAGGGGCAGGGACACCAGGCAGGGTTGGCACCAA

CTGAAAATTACAGAAGTCTCATACATCTACCTCAGCCTTGCCTGACCTGGGCCTCACCTG

ACCTGGACCTCACCTGGCCTGGACCTCACCTGGCCTAGACCTCACCTCTGGGCTTCACCT

GAGCTCGGCCTCACCTGACTTGGACCTTGCCTGTCCTGAGCTCACATGATCTGGGCCTCA

CCTGACCTGGGTTTCACCTGACCTGGGCTTCACCTGACCTGGGCCTCATCTGACCTGGGC

CTCACTGGCCTGGACCTCACCTGGCCTGGGCTTCACCTGGCCTCAGGCCTCATCTGCACC

TGCTCCAGGTCTTGCTGGAACCTCAGTAGCACTGAGGCTGCAGGGGCTCATCCAGGGTTG

CAGAATGACTCTAGAACCTCCCACATCTCAGCTTTCTGGGTGGAGGCACCTGGTGGCCCA

GGGAATATAAAAAGCCTGAATGATGCCTGCGTGATTTGGGGGCAATTTATAAACCCAAA

AGGACATGGCCATGCAGCGGGTAGGGACAATACAGACAGATATCAGCCTGAAATGGAG

CCTCAGGGCACAGGTGGGCACGGACACTGTCCACCTAAGCCAGGGGCAGACCCGAGTGT

CCCCGCAGTAGACCTGAGAGCGCTGGGCCCACAGCCTCCCCTCGGTGCCCTGCTACCTCC

TCAGGTCAGCCCTGGACATCCCGGGTTTCCCCAGGCCTGGCGGTAGGATTTTGTTGAGGT

CTGTGTCACTGTGCATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCG

ATTCTGAGAATAGCAGCTTCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCT

GCCGGAAGGATGCTGTGGTTAGCTTTGGCAAAGTTTTCCTGCCACCACAGTGTCACAGAG

TCCATCAAAAACCCATCCCTGGGAACCTTCTGCCACAGCCCTCCCTGTGGGGCACCGCCG

CGTGCCATGTTAGGATTTTGACTGAGGACACAGCACCATGGGTATGGTGGCTACCGCAGC

AGTGCAGCCCGTGACCCAAACACACAGGGCAGCAGGCACAACAGACAAGCCCACAAGT

GACCACCCTGAGCTCCTGCCTGCCAGCCCTGGAGACCATGAAACAGATGGCCAGGATTA

TCCCATAGGTCAGCCAGACCTCAGTCCAACAGGTCTGCATCGCTGCTGCCCTCCAATACC

AGTCCGGATGGGGACAGGGCTGGCCCACATTACCATTTGCTGCCATCCGGCCAACAGTCC

CAGAAGCCCCTCCCTCAAGGCTGGGCCACATGTGTGGACCCTGAGAGCCCCCCATGTCTG

AGTAGGGGCACCAGGAAGGTGGGGCTGGCCCTGTGCACTGTCCCTGCCCCTGTGGTCCCT

GGCCTGCCTGGCCCTGACACCTGGGCCTCTCCTGGGTCATTTCCAAGACAGAAGACATTC

CCAGGACAGCTGGAGCTGGGAGTCCATCATCCTGCCTGGCCGTCCTGAGTCCTGCGCCTT

TCCAAACCTCACCCGGGAAGCCAACAGAGGAATCACCTCCCACAGGCAGAGACAAAGAC

CTTCCAGAAATCTCTGTCTCTCTCCCCAGTGGGCACCCTCTTCCAGGGCAGTCCTCAGTGA

TATCACAGTGGGAACCCACATCTGGATCGGGACTGCCCCCAGAACACAAGATGGCCCAC

AGGGACAGCCCCACAGCCCAGCCCTTCCCAGACCCCTAAAAGGCGTCCCACCCCCTGCA

TCTGCCCCAGGGCTCAAACTCCAGGAGGACTGACTCCTGCACACCCTCCTGCCAGACATC

ACCTCAGCCCCTCCTGGAAGGGACAGGAGCGCGCAAGGGTGAGTCAGACCCTCCTGCCC

TCGATGGCAGGCGGAGAAGATTCAGAAAGGTCTGAGATCCCCAGGACGCAGCACCACTG

TCAATGGGGGCCCCAGACGCCTGGACCAGGGCCTGCGTGGGAAAGGCCTCTGGGCACAC

TCAGGGGGATTTTGTGAAGGGTCCTCCCACTGTGCATGGCAGCTACTGCCAGCCCGCAGC

CACTGGCTACTGAGGATGCCGATTCTGAGAATAGCAGCTTCTACTACTATGACTACCTGG

ATGAAGTAGCTTTCATGCTCTGCCGGAAGGATGCTGTGGTTAGCTTTGGCAAAGTTTTCC

TGCCACTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACTGTGAGGTTAGCCAGCATC

TAGACTATGCCCACAGTGATGAACCCAGCATCAAAAACCGACCGGACTCCCAAGGTTTA

TGCACACTTCTCCGCTCAGAGCTCTCCAGGATCAGAAGAGCCGGGCCCAAGGGTTTCTGC

CCAGACCCTCGGCCTCTAGGGACATCTTGGCCATGACAGCCCATGGGCTGGTGCCCCACA

CATCGTCTGCCTTCAAACAAGGGCTTCAGAGGGCTCTGAGGTGACCTCACTGATGACCAC

AGGTGCCCTGGCCCCTTCCCCACCAGCTGCACCAGACCCCGTCATGACAGATGCCCCGAT

TCCAACAGCCAATTCCTGGGGCCAGGAATCGCTGTAGACACCAGCCTCCTTCCAACACCT

CCTGCCAATTGCCTGGATTCCCATCCCGGTTGGAATCAAGAGGACAGCATCCCCCAGGCT

CCCAACAGGCAGGACTCCCACACCCTCCTCTGAGAGGCCGCTGTGTTCCGTAGGGCCAG

GCTGCAGACAGTCCCCCTCACCTGCCACTAGACAAATGCCTGCTGTAGATGTCCCCACCT

GGAAAATACCACTCATGGAGCCCCCAGCCCCAGGTACAGCTGTAGAGAGAGTCTCTGAG

GCCCCTAAGAAGTAGCCATGCCCAGTTCTGCCGGGACCCTCGGCCAGGCTGACAGGAGT

GGACGCTGGAGCTGGGCCCATACTGGGCCACATAGGAGCTCACCAGTGAGGGCAGGAGA

GCACATGCCGGGGAGCACCCAGCCTCCTGCTGACCAGAGGCCCGTCCCAGAGCCCAGGA

GGCTGCAGAGGCCTCTCCAGGGGGACACTGTGCATGTCTGGTCCCTGAGCAGCCCCCCAC

GTCCCCAGTCCTGGGGGCCCCTGGCACAGCTGTCTGGACCCTCTCTATTCCCTGGGAAGC

TCCTCCTGACAGCCCCGCCTCCAGTTCCAGGTGTGGATTTTGTCAGGGGGTGTCACACTG

TGCAGCTTCTTGTGCAAGCACAGTGGTGCTGCCCATATCAAAAACCAGGCCAAGTAGAC

AGGCCCCTGCTGTGCAGCCCCAGGCCTCCAGCTCACCTGCTTCTCCTGGGGCTCTCAAGG

CTGCTGTTTTCTGCACTCTCCCCTCTGTGGGGAGGGTTCCCTCAGTGGGAGATCTGTTCTC

AACATCCCACGGCCTCATTCCTGCAAGGAAGGCCAATGGATGGGCAACCTCACATGCCG

CGGCTAAGATAGGGTGGGCAGCCTGGCGGGGACAGGACATCCTGCTGGGGTATCTGTCA

CTGTGCCTAGTGGGGCACTGGCTCCCAAACAACGCAGTCCTTGCCAAAATCCCCACGGCC

TCCCCCGCTAGGGGCTGGCCTGATCTCCTGCAGTCCTAGGAGGCTGCTGACCTCCAGAAT

GGCTCCGTCCCCAGTTCCAGGGCGAGAGCAGATCCCAGGCCGGCTGCAGACTGGGAGGC

CACCCCCTCCTTCCCAGGGTTCACTGCAGGTGACCAGGGCAGGAAATGGCCTGAACACA

GGGATAACCGGGCCATCCCCCAACAGAGTCCACCCCCTCCTGCTCTGTACCCCGCACCCC

CCAGGCCAGCCCATGACATCCGACAACCCCACACCAGAGTCACTGCCCGGTGCTGCCCT

AGGGAGGACCCCTCAGCCCCCACCCTGTCTAGAGGACTGGGGAGGACAGGACACGCCCT

CTCCTTATGGTTCCCCCACCTGGCTCTGGCTGGGACCCTTGGGGTGTGGACAGAAAGGAC

GCTTGCCTGATTGGCCCCCAGGAGCCCAGAACTTCTCTCCAGGGACCCCAGCCCGAGCAC

CCCCTTACCCAGGACCCAGCCCTGCCCCTCCTCCCCTCTGCTCTCCTCTCATCACCCCATG

GGAATCCAGAATCCCCAGGAAGCCATCAGGAAGGGCTGAGGGAGGAAGTGGGGCCACT

GCACCACCAGGCAGGAGGCTCTGTCTTTGTGAACCCAGGGAGGTGCCAGCCTCCTAGAG

GGTATGGTCCACCCTGCCTATGGCTCCCACAGTGGCAGGCTGCAGGGAAGGACCAGGGA

CGGTGTGGGGGAGGGCTCAGGGCCCCGCGGGTGCTCCATCTTGGATGAGCCTATCTCTCT

CACCCACGGACTCGCCCACCTCCTCTTCACCCTGGCCACACGTCGTCCACACCATCCTAA

GTCCCACCTACACCAGAGCCGGCACAGCCAGTGCAGACAGAGGCTGGGGTGCAGGGGG

GCCGACTGGGCAGCTTCGGGGAGGGAGGAATGGAGGAAGGGGAGTTCAGTGAAGAGGC

CCCCCTCCCCTGGGTCCAGGATCCTCCTCTGGGACCCCCGGATCCCATCCCCTCCAGGCT

CTGGGAGGAGAAGCAGGATGGGAGAATCTGTGCGGGACCCTCTCACAGTGGAATACCTC

CACAGCGGCTCAGGCCAGATACAAAAGCCCCTCAGTGAGCCCTCCACTGCAGTGCTGGG

CCTGGGGGCAGCCGCTCCCACACAGGATGAACCCAGCACCCCGAGGATGTCCTGCCAGG

GGGAGCTCAGAGCCATGAAGGAGCAGGATATGGGACCCCCGATACAGGCACAGACCTC

AGCTCCATTCAGGACTGCCACGTCCTGCCCTGGGAGGAACCCCTTTCTCTAGTCCCTGCA

GGCCAGGAGGCAGCTGACTCCTGACTTGGACGCCTATTCCAGACACCAGACAGAGGGGC

AGGCCCCCCAGAACCAGGGATGAGGACGCCCCGTCAAGGCCAGAAAAGACCAAGTTGC

GCTGAGCCCAGCAAGGGAAGGTCCCCAAACAAACCAGGAGGATTTTGTAGGTGTCTGTG

TCACTGTGCAAACCCATGAAAACCCCAAGGGAGTTTGGAACTGCCATGCCGATTTCGGC

GGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCCAGCAGAACCTGCTACCACAGTGA

CACTCGCCAGGTCAAAAACCCCATCCCAAGTCAGCGGAATGCAGAGAGAGCAGGGAGG

ACATGTTTAGGATCTGAGGCCGCACCTGACACCCAGGCCAGCAGACGTCTCCTGTCCACG

GCACCCTGCCATGTCCTGCATTTCTGGAAGAACAAGGGCAGGCTGAAGGGGGTCCAGGA

CCAGGAGATGGGTCCGCTCTACCCAGAGAAGGAGCCAGGCAGGACACAAGCCCCCACGC

GTGGGCTCGTAGTTTGACGTGCGTGAAGTGTGGGTAAGAAAGTACGTA

D6-DH17613 (SEQ ID NO: 132) includes D6 coding sequences inserted in positions corresponding to D_H1-7 to D_H6-13:

GCGGCCGCTGCCATTTCATTACCTCTTTCTCCGCACCCGACATAGATTACGTAACGCGTG

GGCTCGTAGTTTGACGTGCGTGAAGTGTGGGTAAGAAAGTCCCCATTGAGGCTGACCTGC

CCAGAGGGTCCTGGGCCCACCCAACACACCGGGGCGGAATGTGTGCAGGCCTCGGTCTC

TGTGGGTGTTCCGCTAGCTGGGGCTCACAGTGCTCACCCCACACCTAAAACGAGCCACAG

CCTCCGGAGCCCCTGAAGGAGACCCCGCCCACAAGCCCAGCCCCCACCCAGGAGGCCCC

AGAGCACAGGGCGCCCCGTCGGATTTTGTACAGCCCCGAGTCACTGTGCAGCTTCTTGCA

CAGTGAGAAAAGCTTCGTCAAAAACCGTCTCCTGGCCACAGTCGGAGGCCCCGCCAGAG

AGGGGAGCAGCCACCCCAAACCCATGTTCTGCCGGCTCCCATGACCCCGTGCACCTGGA

GCCCCACGGTGTCCCCACTGGATGGGAGGACAAGGGCCGGGGGCTCCGGCGGGTCGGGG

CAGGGGCTTGATGGCTTCCTTCTGCCGTGGCCCCATTGCCCCTGGCTGGAGTTGACCCTTC

TGACAAGTGTCCTCAGAGAGTCAGGGATCAGTGGCACCTCCCAACATCAACCCCACGCA

GCCCAGGCACAAACCCCACATCCAGGGCCAACTCCAGGAACAGAGACACCCCAATACCC

TGGGGGACCCCGACCCTGATGACTCCCGTCCCATCTCTGTCCCTCACTTGGGGCCTGCTG

CGGGGCGAGCACTTGGGAGCAAACTCAGGCTTAGGGGACACCACTGTGGGCCTGACCTC

GAGCAGGCCACAGACCCTTCCCTCCTGCCCTGGTGCAGCACAGACTTTGGGGTCTGGGCA

GGGAGGAACTTCTGGCAGGTCACCAAGCACAGAGCCCCCAGGCTGAGGTGGCCCCAGGG

GGAACCCCAGCAGGTGGCCCACTACCCTTCCTCCCAGCTGGACCCCATGTCTTCCCCAAG

ATAGGGGTGCCATCCAAGGCAGGTCCTCCATGGAGCCCCCTTCAGGCTCCTCTCCAGACC

CCACTGGGCCTCAGTCCCCACTCTAGGAATGCAGCCACCACGGGCACACCAGGCAGCCC

AGGCCCAGCCACCCTGCAGTGCCCAAGCCCACACCCTGGAGGAGAGCAGGGTGCGTCTG

GGAGGGGCTGGGCTCCCCACCCCCACCCCCACCTGCACACCCCACCCACCCTTGCCCGGG

CCCCCTGCAGGAGGGTCAGAGCCCCCATGGGATATGGACTTAGGGTCTCACTCACGCAC

CTCCCCTCCTGGGAGAAGGGGTCTCATGCCCAGATCCCCCCAGCAGCGCTGGTCACAGGT

AGAGGCAGTGGCCCCAGGGCCACCCTGACCTGGCCCCTCAGGCTCCTCTAGCCCTGGCTG

CCCTGCTGTCCCTGGGAGGCCTGGGCTCCACCAGACCACAGGTCTAGGGCACCGCCCAC

ACTGGGGCCGCCCACACACAGCTCACAGGAAGAAGATAAGCTCCAGACCCCCAGGCCCG

GGACCTGCCTTGCTGCTACGACTTCCTGCCCCAGACCTCGTTGCCCTCCCCCGTCCACTTA

CACACAGGCCAGGAAGCTGTTCCCACACAGACCAACCCCAGACGGGGACCACCTGGCAC

TCAGGTCACTGCCATTTCCTTCTCCATTCACTTCCAATGCCTCTGTGCTTCCTCCCTCCTCC

TTCCTTCGGGGGAGCACCCTGTGCAGCTCCTCCCTGCAGTCCACACCCTGGGGAGACCCG

ACCCTGCAGCCCACACCCTGGGGAGACCTGACCCTCCTCCAGCCCTTTCTCCCCCGCTGC

TCTTGCCACCCACCAAGACAGCCCTGGGGTCCTGTCCCTACAGCCCCCACCCAGTTCTCT

ACCTAGACCCGTCTTCCTCCCTCTAAACACCTCTCCCAGGCCAACCCTACACCTGCAGGC

CCTCCCCTCCACTGCCAAAGACCCTCAGTTTCTCCTGCCTGTGCCCACCCCCGTGCTCCTC

CTGCCCACAGCTCGAGCTCTTCCTCTCCTAGGGCCCCTGAGGGATGGCATTGACCGTGCC

CTCGCACCCACACACTGCCCATGCCCTCACATTCCTCCTGGCCACTCCAGCCCCACTCCCC

TCTCAGGCCTGGCTCTGGTATTTCTGGGACAAAGCCTTACCCAAGTCTTTCCCATGCAGG

CCTGGGCCCTTACCCTCACTGCCCGGTTACAGGGCAGCCTCCTGTGCACAGAAGCAGGGA

GCTCAGCCCTTCCACAGGCAGAAGGCACTGAAAGAAATCGGCCTCCAGCGCCTTGACAC

ACGTCTGCCTGTGTCTCTCACTGCCCGCACCTGCAGGGAGGCTCGGCACTCCCTCTAAAG

ACGAGGGATCCAGGCAGCAGCATCACAGGAGAATGCAGGGCTACCAGACATCCCAGTCC

TCTCACAGGCCTCTCCTGGGAAGAGACCTGAAGACGCCCAGTCAACGGAGTCTAACACC

AAACCTCCCTGGAGGCCGATGGGTAGTAACGGAGTCATTGCCAGACCTGGAGGCAGGGG

AGCAGTGAGCCCGAGCCCACACCATAGGGCCAGAGGACAGCCACTGACATCCCAAGCCA

CTCACTGGTGGTCCCACAACACCCCATGGAAAGAGGACAGACCCACAGTCCCACCTGGA

CCAGGGCAGAGACTGCTGAGACCCAGCACCAGAACCAACCAAGAAACACCAGGCAACA

GCATCAGAGGGGGCTCTGGCAGAACAGAGGAGGGGAGGTCTCCTTCACCAGCAGGCGCT

TCCCTTGACCGAAGACAGGATCCATGCAACTCCCCCAGGACAAAGGAGGAGCCCCTTGT

TCAGCACTGGGCTCAGAGTCCTCTCCAAGACACCCAGAGTTTCAGACAAAAACCCCCTG

GAATGCACAGTCTCAGCAGGAGAGCCAGCCAGAGCCAGCAAGATGGGGCTCAGTGACA

CCCGCAGGGACAGGAGGATTTTGTGGGGGCTCGTGTCACTGTGCTGCATACCCTGCTGGA

CCTGCAAGTATTCGGCAACAGTGAGGTTAGCCAGCATCTAGACTATGCCCACAGTGACA

CAGCCCCATTCAAAAACCCCTACTGCAAACGCATTCCACTTCTGGGGCTGAGGGGCTGGG

GGAGCGTCTGGGAAATAGGGCTCAGGGGTGTCCATCAATGCCCAAAACGCACCAGACTC

CCCTCCATACATCACACCCACCAGCCAGCGAGCAGAGTAAACAGAAAATGAGAAGCAAG

CTGGGGAAGCTTGCACAGGCCCCAAGGAAAGAGCTTTGGCGGGTGTGTAAGAGGGGATG

CGGGCAGAGCCTGAGCAGGGCCTTTTGCTGTTTCTGCTTTCCTGTGCAGAGAGTTCCATA

AACTGGTGTTCGAGATCAATGGCTGGGAGTGAGCCCAGGAGGACAGCGTGGGAAGAGC

ACAGGGAAGGAGGAGCAGCCGCTATCCTACACTGTCATCTTTCGAAAGTTTGCCTTGTGC

CCACACTGCTGCATCATGGGATGCTTAACAGCTGATGTAGACACAGCTAAAGAGAGAAT

CAGTGAGATGGATTTGCAGCACAGATCTGAATAAATTCTCCAGAATGTGGAGCAGCACA

GAAGCAAGCACACAGAAAGTGCCTGATGCAAGGACAAAGTTCAGTGGGCACCTTCAGGC

ATTGCTGCTGGGCACAGACACTCTGAAAAGCCCTGGCAGGAACTCCCTGTGACAAAGCA

GAACCCTCAGGCAATGCCAGCCCCAGAGCCCTCCCTGAGAGCCTCATGGGCAAAGATGT

GCACAACAGGTGTTTCTCATAGCCCCAAACTGAGAGCAAAGCAAACGTCCATCTGAAGG

AGAACAGGCAAATAAACGATGGCAGGTTCATGAAATGCAAACCCAGACAGCCACAAGC

ACAAAAGTACAGGGTTATAAGCGACTCTGGTTGAGTTCATGACAATGCTGAGTAATTGG

AGTAACAAAGTAAACTCCAAAAAATACTTTCAATGTGATTTCTTCTAAATAAAATTTACA

CCCTGCAAAATGAACTGTCTTCTTAAGGGATACATTTCCCAGTTAGAAAACCATAAAGAA

AACCAAGAAAAGGATGATCACATAAACACAGTGGTGGTTACTTCTGCTGGGGAAGGAAG

AGGGTATGAACTGAGATACACAGGGTGGGCAAGTCTCCTAACAAGAACAGAACGAATAC

ATTACAGTACCTTGAAAACAGCAGTTAAACTTCTAAATTGCAAGAAGAGGAAAATGCAC

ACAGTTGTGTTTAGAAAATTCTCAGTCCAGCACTGTTCATAATAGCAAAGACATTAACCC

AGGTCGGATAAATAAGCGATGACACAGGCAATTGCACAATGATACAGACATATATTTAG

TATATGAGACATCGATGATGTATCCCCAAATAAACGACTTTAAAGAGATAAAGGGCTGA

TGTGTGGTGGCATTCACCTCCCTGGGATCCCCGGACAGGTTGCAGGCTCACTGTGCAGCA

GGGCAGGCGGGTACCTGCTGGCAGTTCCTGGGGCCTGATGTGGAGCAAGCGCAGGGCCA

TATATCCCGGAGGACGGCACAGTCAGTGAATTCCAGAGAGAAGCAACTCAGCCACACTC

CCCAGGCAGAGCCCGAGAGGGACGCCCACGCACAGGGAGGCAGAGCCCAGCACCTCCG

CAGCCAGCACCACCTGCGCACGGGCCACCACCTTGCAGGCACAGAGTGGGTGCTGAGAG

GAGGGGCAGGGACACCAGGCAGGGTGAGCACCCAGAGAAAACTGCAGACGCCTCACAC

ATCCACCTCAGCCTCCCCTGACCTGGACCTCACTGGCCTGGGCCTCACTTAACCTGGGCT

TCACCTGACCTTGGCCTCACCTGACTTGGACCTCGCCTGTCCCAAGCTTTACCTGACCTGG

GCCTCAACTCACCTGAACGTCTCCTGACCTGGGTTTAACCTGTCCTGGAACTCACCTGGC

CTTGGCTTCCCCTGACCTGGACCTCATCTGGCCTGGGCTTCACCTGGCCTGGGCCTCACCT

GACCTGGACCTCATCTGGCCTGGACCTCACCTGGCCTGGACTTCACCTGGCCTGGGCTTC

ACCTGACCTGGACCTCACCTGGCCTCGGGCCTCACCTGCACCTGCTCCAGGTCTTGCTGG

AGCCTGAGTAGCACTGAGGGTGCAGAAGCTCATCCAGGGTTGGGGAATGACTCTAGAAG

TCTCCCACATCTGACCTTTCTGGGTGGAGGCAGCTGGTGGCCCTGGGAATATAAAAATCT

CCAGAATGATGACTCTGTGATTTGTGGGCAACTTATGAACCCGAAAGGACATGGCCATG

GGGTGGGTAGGGACATAGGGACAGATGCCAGCCTGAGGTGGAGCCTCAGGACACAGGT

GGGCACGGACACTATCCACATAAGCGAGGGATAGACCCGAGTGTCCCCACAGCAGACCT

GAGAGCGCTGGGCCCACAGCCTCCCCTCAGAGCCCTGCTGCCTCCTCCGGTCAGCCCTGG

ACATCCCAGGTTTCCCCAGGCCTGGCGGTAGGATTTTGTTGAGGTCTGTGTCACTGTGCA

TGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGATTCTGAGAATAGC

AGCTTCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCCACAGTGTCACAGAG

TCCATCAAAAACCCATGCCTGGAAGCTTCCCGCCACAGCCCTCCCCATGGGGCCCTGCTG

CCTCCTCAGGTCAGCCCCGGACATCCCGGGTTTCCCCAGGCTGGGCGGTAGGATTTTGTT

GAGGTCTGTGTCACTGTGCCAAACCCATGAAAACCCCAAGGGAGTTTGGAACAGCCATG

CCGATTTCGGCGGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCCAGCAGAACCTGC

TACCCACAGTGTCACAGAGTCCATCAAAAACCCATCCCTGGGAGCCTCCCGCCACAGCCC

TCCCTGCAGGGGACCGGTACGTGCCATGTTAGGATTTTGATCGAGGAGACAGCACCATG

GGTATGGTGGCTACCACAGCAGTGCAGCCTGTGACCCAAACCCGCAGGGCAGCAGGCAC

GATGGACAGGCCCGTGACTGACCACGCTGGGCTCCAGCCTGCCAGCCCTGGAGATCATG

AAACAGATGGCCAAGGTCACCCTACAGGTCATCCAGATCTGGCTCCGAGGGGTCTGCAT

CGCTGCTGCCCTCCCAACGCCAGTCCAAATGGGACAGGGACGGCCTCACAGCACCATCT

GCTGCCATCAGGCCAGCGATCCCAGAAGCCCCTCCCTCAAGGCTGGGCACATGTGTGGA

CACTGAGAGCCCTCATATCTGAGTAGGGGCACCAGGAGGGAGGGGCTGGCCCTGTGCAC

TGTCCCTGCCCCTGTGGTCCCTGGCCTGCCTGGCCCTGACACCTGAGCCTCTCCTGGGTCA

TTTCCAAGACAGAAGACATTCCTGGGGACAGCCGGAGCTGGGCGTCGCTCATCCTGCCC

GGCCGTCCTGAGTCCTGCTCATTTCCAGACCTCACCGGGGAAGCCAACAGAGGACTCGCC

TCCCACATTCAGAGACAAAGAACCTTCCAGAAATCCCTGCCTCTCTCCCCAGTGGACACC

CTCTTCCAGGACAGTCCTCAGTGGCATCACAGCGGCCTGAGATCCCCAGGACGCAGCAC

CGCTGTCAATAGGGGCCCCAAATGCCTGGACCAGGGCCTGCGTGGGAAAGGCCTCTGGC

CACACTCGGGGATTTTGTGAAGGGCCCTCCCACTGTGCCAGCTTCTTGAGCCATGCCGAT

TTCGGCGGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCCAGCAGAACCTGCTACCA

CAGTGATGAACCCAGTGTCAAAAACCGGCTGGAAACCCAGGGGCTGTGTGCACGCCTCA

GCTTGGAGCTCTCCAGGAGCACAAGAGCCGGGCCCAAGGATTTGTGCCCAGACCCTCAG

CCTCTAGGGACACCTGGGTCATCTCAGCCTGGGCTGGTGCCCTGCACACCATCTTCCTCC

AAATAGGGGCTTCAGAGGGCTCTGAGGTGACCTCACTCATGACCACAGGTGACCTGGCC

CTTCCCTGCCAGCTATACCAGACCCTGTCTTGACAGATGCCCCGATTCCAACAGCCAATT

CCTGGGACCCTGAATAGCTGTAGACACCAGCCTCATTCCAGTACCTCCTGCCAATTGCCT

GGATTCCCATCCTGGCTGGAATCAAGAAGGCAGCATCCGCCAGGCTCCCAACAGGCAGG

ACTCCCGCACACCCTCCTCTGAGAGGCCGCTGTGTTCCGCAGGGCCAGGCCCTGGACAGT

TCCCCTCACCTGCCACTAGAGAAACACCTGCCATTGTCGTCCCCACCTGGAAAAGACCAC

TCGTGGAGCCCCCAGCCCCAGGTACAGCTGTAGAGACAGTCCTCGAGGCCCCTAAGAAG

GAGCCATGCCCAGTTCTGCCGGGACCCTCGGCCAGGCCGACAGGAGTGGACGCTGGAGC

TGGGCCCACACTGGGCCACATAGGAGCTCACCAGTGAGGGCAGGAGAGCACATGCCGGG

GAGCACCCAGCCTCCTGCTGACCAGAGGCCCGTCCCAGAGCCCAGGAGGCTGCAGAGGC

CTCTCCAGGGAGACACTGTGCATGTCTGGTACCTAAGCAGCCCCCCACGTCCCCAGTCCT

GGGGGCCCCTGGCTCAGCTGTCTGGGCCCTCCCTGCTCCCTGGGAAGCTCCTCCTGACAG

CCCCGCCTCCAGTTCCAGGTGTGGATTTTGTCAGGCGATGTCACACTGTGCAGCTTCTTG

AGCAAGCACAGTGGTGCCGCCCATATCAAAAACCAGGCCAAGTAGACAGGCCCCTGCTG

CGCAGCCCCAGGCATCCACTTCACCTGCTTCTCCTGGGGCTCTCAAGGCTGCTGTCTGTCC

TCTGGCCCTCTGTGGGGAGGGTTCCCTCAGTGGGAGGTCTGTGCTCCAGGGCAGGGATGA

TTGAGATAGAAATCAAAGGCTGGCAGGGAAAGGCAGCTTCCCGCCCTGAGAGGTGCAGG

CAGCACCACGGAGCCACGGAGTCACAGAGCCACGGAGCCCCCATTGTGGGCATTTGAGA

GTGCTGTGCCCCCGGCAGGCCCAGCCCTGATGGGGAAGCCTGTCCCATCCCACAGCCCG

GGTCCCACGGGCAGCGGGCACAGAAGCTGCCAGGTTGTCCTCTATGATCCTCATCCCTCC

AGCAGCATCCCCTCCACAGTGGGGAAACTGAGGCTTGGAGCACCACCCGGCCCCCTGGA

AATGAGGCTGTGAGCCCAGACAGTGGGCCCAGAGCACTGTGAGTACCCCGGCAGTACCT

GGCTGCAGGGATCAGCCAGAGATGCCAAACCCTGAGTGACCAGCCTACAGGAGGATCCG

GCCCCACCCAGGCCACTCGATTAATGCTCAACCCCCTGCCCTGGAGACCTCTTCCAGTAC

CACCAGCAGCTCAGCTTCTCAGGGCCTCATCCCTGCAAGGAAGGTCAAGGGCTGGGCCT

GCCAGAAACACAGCACCCTCCCTAGCCCTGGCTAAGACAGGGTGGGCAGACGGCTGTGG

ACGGGACATATTGCTGGGGCATTTCTCACTGTCACTTCTGGGTGGTAGCTCTGACAAAAA

CGCAGACCCTGCCAAAATCCCCACTGCCTCCCGCTAGGGGCTGGCCTGGAATCCTGCTGT

CCTAGGAGGCTGCTGACCTCCAGGATGGCTCCGTCCCCAGTTCCAGGGCGAGAGCAGAT

CCCAGGCAGGCTGTAGGCTGGGAGGCCACCCCTGCCCTTGCCGGGGTTGAATGCAGGTG

CCCAAGGCAGGAAATGGCATGAGCACAGGGATGACCGGGACATGCCCCACCAGAGTGC

GCCCCTTCCTGCTCTGCACCCTGCACCCCCCAGGCCAGCCCACGACGTCCAACAACTGGG

CCTGGGTGGCAGCCCCACCCAGACAGGACAGACCCAGCACCCTGAGGAGGTCCTGCCAG

GGGGAGCTAAGAGCCATGAAGGAGCAAGATATGGGGCCCCCGATACAGGCACAGATGT

CAGCTCCATCCAGGACCACCCAGCCCACACCCTGAGAGGAACGTCTGTCTCCAGCCTCTG

CAGGTCGGGAGGCAGCTGACCCCTGACTTGGACCCCTATTCCAGACACCAGACAGAGGC

GCAGGCCCCCCAGAACCAGGGTTGAGGGACGCCCCGTCAAAGCCAGACAAAACCAAGG

GGTGTTGAGCCCAGCAAGGGAAGGCCCCCAAACAGACCAGGAGGATTTTGTAGGTGTCT

GTGTCACTGTGCATGGCAGCTACTGCCAGCCCGCAGCCACTGGCTACTGAGGATGCCGAT

TCTGAGAATAGCAGCTTCTACTACTATGACTACCTGGATGAAGTAGCTTTCATGCTCAGC

CGGAAGGATGCTGTGGTTAGCTTTGGCAAAGTTTTCCTGCCACCCACAGTGACACTCACC

CAGTCAAAAACCCCATTCCAAGTCAGCGGAAGCAGAGAGAGCAGGGAGGACACGTTTA

GGATCTGAGACTGCACCTGACACCCAGGCCAGCAGACGTCTCCCCTCCAGGGCACCCCA

CCCTGTCCTGCATTTCTGCAAGATCAGGGGCGGCCTGAGGGGGGGTCTAGGGTGAGGAG

ATGGGTCCCCTGTACACCAAGGAGGAGTTAGGCAGGTCCCGAGCACTCTTAATTAAACG

ACGCCTCGAATGGAACTACTACAACGAATGGTTGCTCTACGTAATGCATTCGCTACCTTA

GGACCGTTATAGTTAGGCGCGCC

D6-DH114619 (SEQ ID NO: 133) includes D6 coding sequence inserted in positions corresponding to D_H1-14 to D_H6-19:

TACGTATTAATTAAACGACGCCTCGAATGGAACTACTACAACGAATGGTTGCTCTCCCCA

TTGAGGCTGACCTGCCCAGAGAGTCCTGGGCCCACCCCACACACCGGGGCGGAATGTGT

GCAGGCCTCGGTCTCTGTGGGTGTTCCGCTAGCTGGGGCTCACAGTGCTCACCCCACACC

TAAAATGAGCCACAGCCTCCGGAGCCCCCGCAGGAGACCCCGCCCACAAGCCCAGCCCC

CACCCAGGAGGCCCCAGAGCTCAGGGCGCCCCGTCGGATTTTGTACAGCCCCGAGTCAC

TGTGCAAACCCATGAAAACCCCAAGGGAGTTTGGAACCACAGTGAGAATAGCTACGTCA

AAAACCGTCCAGTGGCCACTGCCGGAGGCCCCGCCAGAGAGGGCAGCAGCCACTCTGAT

CCCATGTCCTGCCGGCTCCCATGACCCCCAGCACGCGGAGCCCCACAGTGTCCCCACTGG

ATGGGAGGACAAGAGCTGGGGATTCCGGCGGGTCGGGGCAGGGGCTTGATCGCATCCTT

CTGCCGTGGCTCCAGTGCCCCTGGCTGGAGTTGACCCTTCTGACAAGTGTCCTCAGAGAG

ACAGGCATCACCGGCGCCTCCCAACATCAACCCCAGGCAGCACAGGCACAAACCCCACA

TCCAGAGCCAACTCCAGGAGCAGAGACACCCCAATACCCTGGGGGACCCCGACCCTGAT

GACTTCCCACTGGAATTCGCCGTAGAGTCCACCAGGACCAAAGACCCTGCCTCTGCCTCT

GTCCCTCACTCAGGACCTGCTGCCGGGCGAGGCCTTGGGAGCAGACTTGGGCTTAGGGG

ACACCAGTGTGACCCCGACCTTGACCAGGACGCAGACCTTTCCTTCCTTTCCTGGGGCAG

CACAGACTTTGGGGTCTGGGCCAGGAGGAACTTCTGGCAGGTCGCCAAGCACAGAGGCC

ACAGGCTGAGGTGGCCCTGGAAAGACCTCCAGGAGGTGGCCACTCCCCTTCCTCCCAGCT

GGACCCCATGTCCTCCCCAAGATAAGGGTGCCATCCAAGGCAGGTGCTCCTTGGAGCCCC

ATTCAGACTCCTCCCTGGACCCCACTGGGCCTCAGTCCCAGCTCTGGGGATGAAGCCACC

ACAAGCACACCAGGCAGCCCAGGCCCAGCCACCCTGCAGTGCCCAAGCACACACTCTGG

AGCAGAGCAGGGTGCCTCTGGGAGGGGCTGAGCTCCCCACCCCACCCCCACCTGCACAC

CCCACCCACCCCTGCCCAGCGGCTCTGCAGGAGGGTCAGAGCCCCACATGGGGTATGGA

CTTAGGGTCTCACTCACGTGGCTCCCATCATGAGTGAAGGGGCCTCAAGCCCAGGTTCCC

ACAGCAGCGCCTGTCGCAAGTGGAGGCAGAGGCCCGAGGGCCACCCTGACCTGGTCCCT

GAGGTTCCTGCAGCCCAGGCTGCCCTGCTGTCCCTGGGAGGCCTGGGCTCCACCAGACCA

CAGGTCCAGGGCACCGGGTGCAGGAGCCACCCACACACAGCTCACAGGAAGAAGATAA

GCTCCAGACCCCCAGGGCCAGAACCTGCCTTCCTGCTACTGCTTCCTGCCCCAGACCTGG

GCGCCCTCCCCCGTCCACTTACACACAGGCCAGGAAGCTGTTCCCACACAGAACAACCCC

AAACCAGGACCGCCTGGCACTCAGGTGGCTGCCATTTCCTTCTCCATTTGCTCCCAGCGC

CTCTGTCCTCCCTGGTTCCTCCTTCGGGGGAACAGCCTGTGCAGCCAGTCCCTGCAGCCC

ACACCCTGGGGAGACCCAACCCTGCCTGGGGCCCTTCCAACCCTGCTGCTCTTACTGCCC

ACCCAGAAAACTCTGGGGTCCTGTCCCTGCAGTCCCTACCCTGGTCTCCACCCAGACCCC

TGTGTATCACTCCAGACACCCCTCCCAGGCAAACCCTGCACCTGCAGGCCCTGTCCTCTT

CTGTCGCTAGAGCCTCAGTTTCTCCCCCCTGTGCCCACACCCTACCTCCTCCTGCCCACAA

CTCTAACTCTTCTTCTCCTGGAGCCCCTGAGCCATGGCATTGACCCTGCCCTCCCACCACC

CACAGCCCATGCCCTCACCTTCCTCCTGGCCACTCCGACCCCGCCCCCTCTCAGGCCAAG

CCCTGGTATTTCCAGGACAAAGGCTCACCCAAGTCTTTCCCAGGCAGGCCTGGGCTCTTG

CCCTCACTTCCCGGTTACACGGGAGCCTCCTGTGCACAGAAGCAGGGAGCTCAGCCCTTC

CACAGGCAGAAGGCACTGAAAGAAATCGGCCTCCAGCACCTTGACACACGTCCGCCCGT

GTCTCTCACTGCCCGCACCTGCAGGGAGGCTCCGCACTCCCTCTAAAGACAAGGGATCCA

GGCAGCAGCATCACGGGAGAATGCAGGGCTCCCAGACATCCCAGTCCTCTCACAGGCCT

CTCCTGGGAAGAGACCTGCAGCCACCACCAAACAGCCACAGAGGCTGCTGGATAGTAAC

TGAGTCAATGACCGACCTGGAGGGCAGGGGAGCAGTGAGCCGGAGCCCATACCATAGG

GACAGAGACCAGCCGCTGACATCCCGAGCTCCTCAATGGTGGCCCCATAACACACCTAG

GAAACATAACACACCCACAGCCCCACCTGGAACAGGGCAGAGACTGCTGAGCCCCCAGC

ACCAGCCCCAAGAAACACCAGGCAACAGTATCAGAGGGGGCTCCCGAGAAAGAGAGGA

GGGGAGATCTCCTTCACCATCAAATGCTTCCCTTGACCAAAAACAGGGTCCACGCAACTC

CCCCAGGACAAAGGAGGAGCCCCCTATACAGCACTGGGCTCAGAGTCCTCTCTGAGACA

CCCTGAGTTTCAGACAACAACCCGCTGGAATGCACAGTCTCAGCAGGAGAACAGACCAA

AGCCAGCAAAAGGGACCTCGGTGACACCAGTAGGGACAGGAGGATTTTGTGGGGGCTCG

TGTCACTGTGCAAACCCATGAAAACCCCAAGGGAGTTTGGAACTGCAAGCACAGTGACA

CAGACCCATTCAAAAACCCCTACTGCAAACACACCCACTCCTGGGGCTGAGGGGCTGGG

GGAGCGTCTGGGAAGTAGGGTCCAGGGGTGTCTATCAATGTCCAAAATGCACCAGACTC

CCCGCCAAACACCACCCCACCAGCCAGCGAGCAGGGTAAACAGAAAATGAGAGGCTCTG

GGAAGCTTGCACAGGCCCCAAGGAAAGAGCTTTGGCGGGTGTGCAAGAGGGGATGCAG

GCAGAGCCTGAGCAGGGCCTTTTGCTGTTTCTGCTTTCCTGTGCAGAGAGTTCCATAAAC

TGGTGTTCAAGATCAGTGGCTGGGAATGAGCCCAGGAGGGCAGTCTGTGGGAAGAGCAC

AGGGAAGGAGGAGCAGCCGCTATCCTACACTGTCATCTTTCAAAAGTTTGCCTTGTGACC

ACACTATTGCATCATGGGATGCTTAAGAGCTGATGTAGACACAGCTAAAGAGAGAATCA

GTGAGATGAATTTGCAGCATAGATCTGAATAAACTCTCCAGAATGTGGAGCAGTACAGA

AGCAAACACACAGAAAGTGCCTGATGCAAGGACAAAGTTCAGTGGGCACCTTCAGGCAT

TGCTGCTGGGCACAGACACTCTGAAAAGCCTTGGCAGGATCTCCCTGCGACAAAGCAGA

ACCCTCAGGCAATGCCAGCCCCAGAGCCCTCCCTGAGAGCGTCATGGGGAAAGATGTGC

AGAACAGCTGATTATCATAGACTCAAACTGAGAACAGAGCAAACGTCCATCTGAAGAAC

AGTCAAATAAGCAATGGTAGGTTCATGCAATGCAAACCCAGACAGCCAGGGGACAACAG

TAGAGGGCTACAGGCGGCTTTGCGGTTGAGTTCATGACAATGCTGAGTAATTGGAGTAA

CAGAGGAAAGCCCAAAAAATACTTTTAATGTGATTTCTTCTAAATAAAATTTACACCAGG

CAAAATGAACTGTCTTCTTAAGGGATAAACTTTCCCCTGGAAAAACTACAAGGAAAATT

AAGAAAACGATGATCACATAAACACAGTTGTGGTTACTTCTACTGGGGAAGGAAGAGGG

TATGAGCTGAGACACACAGAGTCGGCAAGTCTCCAAGCAAGCACAGAACGAATACATTA

CAGTACCTTGAATACAGCAGTTAAACTTCTAAATCGCAAGAACAGGAAAATGCACACAG

CTGTGTTTAGAAAATTCTCAGTCCAGCACTATTCATAATAGCAAAGACATTAACCCAGGT

TGGATAAATAAATGATGACACAGGCAATTGCACAATGATACAGACATACATTTAGTACA

TGAGACATCGATGATGTATCCCCAAAGAAATGACTTTAAAGAGAAAAGGCCTGATGTGT

GGTGGCACTCACCTCCCTGGGATCCCCGGACAGGTTGCAGGCACACTGTGTGGCAGGGC

AGGCTGGTACATGCTGGCAGCTCCTGGGGCCTGATGTGGAGCAAGCGCAGGGCTGTATA

CCCCCAAGGATGGCACAGTCAGTGAATTCCAGAGAGAAGCAGCTCAGCCACACTGCCCA

GGCAGAGCCCGAGAGGGACGCCCACGTACAGGGAGGCAGAGCCCAGCTCCTCCACAGC

CACCACCACCTGTGCACGGGCCACCACCTTGCAGGCACAGAGTGGGTGCTGAGAGGAGG

GGCAGGGACACCAGGCAGGGTGAGCACCCAGAGAAAACTGCAGAAGCCTCACACATCC

ACCTCAGCCTCCCCTGACCTGGACCTCACCTGGTCTGGACCTCACCTGGCCTGGGCCTCA

CCTGACCTGGACCTCACCTGGCCTGGGCTTCACCTGACCTGGACCTCACCTGGCCTCCGG

CCTCACCTGCACCTGCTCCAGGTCTTGCTGGAACCTGAGTAGCACTGAGGCTGCAGAAGC

TCATCCAGGGTTGGGGAATGACTCTGGAACTCTCCCACATCTGACCTTTCTGGGTGGAGG

CATCTGGTGGCCCTGGGAATATAAAAAGCCCCAGAATGGTGCCTGCGTGATTTGGGGGC

AATTTATGAACCCGAAAGGACATGGCCATGGGGTGGGTAGGGACATAGGGACAGATGCC

AGCCTGAGGTGGAGCCTCAGGACACAGTTGGACGCGGACACTATCCACATAAGCGAGGG

ACAGACCCGAGTGTTCCTGCAGTAGACCTGAGAGCGCTGGGCCCACAGCCTCCCCTCGGT

GCCCTGCTGCCTCCTCAGGTCAGCCCTGGACATCCCGGGTTTCCCCAGGCCAGATGGTAG

GATTTTGTTGAGGTCTGTGTCACTGTGCATGGCAGCTACTGCCAGCCCGCAGCCACTGGC

TACTGAGGATGCCGATTCTGAGAATAGCAGCTTCTACTACTATGACTACCTGGATGAAGT

AGCTTTCATGCTCTGCGAGGTTAGCCAGCATCTAGACTATGCCCACAGTGTCACACGGTC

CATCAAAAACCCATGCCACAGCCCTCCCCGCAGGGGACCGCCGCGTGCCATGTTACGATT

TTGATCGAGGACACAGCGCCATGGGTATGGTGGCTACCACAGCAGTGCAGCCCATGACC

CAAACACACAGGGCAGCAGGCACAATGGACAGGCCTGTGAGTGACCATGCTGGGCTCCA

GCCCGCCAGCCCCGGAGACCATGAAACAGATGGCCAAGGTCACCCCACAGTTCAGCCAG

ACATGGCTCCGTGGGGTCTGCATCGCTGCTGCCCTCTAACACCAGCCCAGATGGGGACAA

GGCCAACCCCACATTACCATCTCCTGCTGTCCACCCAGTGGTCCCAGAAGCCCCTCCCTC

ATGGCTGAGCCACATGTGTGAACCCTGAGAGCACCCCATGTCAGAGTAGGGGCAGCAGA

AGGGCGGGGCTGGCCCTGTGCACTGTCCCTGCACCCATGGTCCCTCGCCTGCCTGGCCCT

GACACCTGAGCCTCTTCTGAGTCATTTCTAAGATAGAAGACATTCCCGGGGACAGCCGGA

GCTGGGCGTCGCTCATCCCGCCCGGCCGTCCTGAGTCCTGCTTGTTTCCAGACCTCACCA

GGGAAGCCAACAGAGGACTCACCTCACACAGTCAGAGACAAAGAACCTTCCAGAAATCC

CTGTCTCACTCCCCAGTGGGCACCTTCTTCCAGGACATTCCTCGGTCGCATCACAGCAGG

CACCCACATCTGGATCAGGACGGCCCCCAGAACACAAGATGGCCCATGGGGACAGCCCC

ACAACCCAGGCCTTCCCAGACCCCTAAAAGGCGTCCCACCCCCTGCACCTGCCCCAGGGC

TAAAAATCCAGGAGGCTTGACTCCCGCATACCCTCCAGCCAGACATCACCTCAGCCCCCT

CCTGGAGGGGACAGGAGCCCGGGAGGGTGAGTCAGACCCACCTGCCCTCGATGGCAGGC

GGGGAAGATTCAGAAAGGCCTGAGATCCCCAGGACGCAGCACCACTGTCAATGGGGGCC

CCAGACGCCTGGACCAGGGCCTGCGTGGGAAAGGCCGCTGGGCACACTCAGGGGGATTT

TGTGAAGGCCCCTCCCACTGTGCAGCTTCTTGTGCAAGCAAACCCATGAAAACCCCAAGG

GAGTTTGGAACTGCCATGCCGATTTCGGCGGGCATGGCACCATTTGGAAGCTCTTCCTCC

GGTTCCAGCAGAACCTGCTACCACAGTGATGAAACTAGCATCAAAAACCGGCCGGACAC

CCAGGGACCATGCACACTTCTCAGCTTGGAGCTCTCCAGGACCAGAAGAGTCAGGTCTG

AGGGTTTGTAGCCAGACCCTCGGCCTCTAGGGACACCCTGGCCATCACAGCGGATGGGC

TGGTGCCCCACATGCCATCTGCTCCAAACAGGGGCTTCAGAGGGCTCTGAGGTGACTTCA

CTCATGACCACAGGTGCCCTGGCCCCTTCCCCGCCAGCTACACCGAACCCTGTCCCAACA

GCTGCCCCAGTTCCAACAGCCAATTCCTGGGGCCCAGAATTGCTGTAGACACCAGCCTCG

TTCCAGCACCTCCTGCCAATTGCCTGGATTCACATCCTGGCTGGAATCAAGAGGGCAGCA

TCCGCCAGGCTCCCAACAGGCAGGACTCCCGCACACCCTCCTCTGAGAGGCCGCTGTGTT

CCGCAGGGCCAGGCCCTGGACAGTTCCCCTCACCTGCCACTAGAGAAACACCTGCCATTG

TCGTCCCCACCTGGAAAAGACCACTCGTGGAGCCCCCAGCCCCAGGTACAGCTGTAGAG

AGACTCCCCGAGGGATCTAAGAAGGAGCCATGCGCAGTTCTGCCGGGACCCTCGGCCAG

GCCGACAGGAGTGGACACTGGAGCTGGGCCCACACTGGGCCACATAGGAGCTCACCAGT

GAGGGCAGGAGAGCACATGCCGGGGAGCACCCAGCCTCCTGCTGACCAGAGGCCCGTCC

CAGAGCCCAGGAGGCTGCAGAGGCCTCTCCAGGGGGACACTGTGCATGTCTGGTCCCTG

AGCAGCCCCCCACGTCCCCAGTCCTGGGGGCCCCTGGCACAGCTGTCTGGACCCTCCCTG

TTCCCTGGGAAGCTCCTCCTGACAGCCCCGCCTCCAGTTCCAGGTGTGGATTTTGTCAGG

GGGTGTCACACTGTGCTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACTGTCGGAAG

GATGCTGTGGTTAGCTTTGGCAAAGTTTTCCTGCCACCACAGTGGTGCTGCCCATATCAA

AAACCAGGCCAAGTAGACAGGCCCCTGCTGTGCAGCCCCAGGCCTCCACTTCACCTGCTT

CTCCTGGGGCTCTCAAGGTCACTGTTGTCTGTACTCTGCCCTCTGTGGGGAGGGTTCCCTC

AGTGGGAGGTCTGTTCTCAACATCCCAGGGCCTCATGTCTGCACGGAAGGCCAATGGAT

GGGCAACCTCACATGCCGCGGCTAAGATAGGGTGGGCAGCCTGGCGGGGGACAGTACAT

ACTGCTGGGGTGTCTGTCACTGTGCCTAGTGGGGCACTGGCTCCCAAACAACGCAGTCCT

CGCCAAAATCCCCACAGCCTCCCCTGCTAGGGGCTGGCCTGATCTCCTGCAGTCCTAGGA

GGCTGCTGACCTCCAGAATGTCTCCGTCCCCAGTTCCAGGGCGAGAGCAGATCCCAGGCC

GGCTGCAGACTGGGAGGCCACCCCCTCCTTCCCAGGGTTCACTGGAGGTGACCAAGGTA

GGAAATGGCCTTAACACAGGGATGACTGCGCCATCCCCCAACAGAGTCAGCCCCCTCCT

GCTCTGTACCCCGCACCCCCCAGGCCAGTCCACGAAAACCAGGGCCCCACATCAGAGTC

ACTGCCTGGCCCGGCCCTGGGGCGGACCCCTCAGCCCCCACCCTGTCTAGAGGACTTGGG

GGGACAGGACACAGGCCCTCTCCTTATGGTTCCCCCACCTGCCTCCGGCCGGGACCCTTG

GGGTGTGGACAGAAAGGACACCTGCCTAATTGGCCCCCAGGAACCCAGAACTTCTCTCC

AGGGACCCCAGCCCGAGCACCCCCTTACCCAGGACCCAGCCCTGCCCCTCCTCCCCTCTG

CTCTCCTCTCATCACCCCATGGGAATCCGGTATCCCCAGGAAGCCATCAGGAAGGGCTGA

AGGAGGAAGCGGGGCCGTGCACCACCGGGCAGGAGGCTCCGTCTTCGTGAACCCAGGGA

AGTGCCAGCCTCCTAGAGGGTATGGTCCACCCTGCCTGGGGCTCCCACCGTGGCAGGCTG

CGGGGAAGGACCAGGGACGGTGTGGGGGAGGGCTCAGGGCCCTGCGGGTGCTCCTCCAT

CTTCGGTGAGCCTCCCCCTTCACCCACCGTCCCGCCCACCTCCTCTCCACCCTGGCTGCAC

GTCTTCCACACCATCCTGAGTCCTACCTACACCAGAGCCAGCAAAGCCAGTGCAGACAA

AGGCTGGGGTGCAGGGGGGCTGCCAGGGCAGCTTCGGGGAGGGAAGGATGGAGGGAGG

GGAGGTCAGTGAAGAGGCCCCCTTCCCCTGGGTCCAGGATCCTCCTCTGGGACCCCCGGA

TCCCATCCCCTCCTGGCTCTGGGAGGAGAAGCAGGATGGGAGAATCTGTGCGGGACCCT

CTCACAGTGGAATATCCCCACAGCGGCTCAGGCCAGACCCAAAAGCCCCTCAGTGAGCC

CTCCACTGCAGTCCTGGGCCTGGGTAGCAGCCCCTCCCACAGAGGACAGACCCAGCACC

CCGAAGAAGTCCTGCCAGGGGGAGCTCAGAGCCATGAAAGAGCAGGATATGGGGTCCCC

GATACAGGCACAGACCTCAGCTCCATCCAGGCCCACCGGGACCCACCATGGGAGGAACA

CCTGTCTCCGGGTTGTGAGGTAGCTGGCCTCTGTCTCGGACCCCACTCCAGACACCAGAC

AGAGGGGCAGGCCCCCCAAAACCAGGGTTGAGGGATGATCCGTCAAGGCAGACAAGAC

CAAGGGGCACTGACCCCAGCAAGGGAAGGCTCCCAAACAGACGAGGAGGATTTTGTAGC

TGTCTGTATCACTGTGCAGCTTCTTGTGCCATGCCGATTTCGGCGGGCATGGCACCATTTG

GAAGCTCTTCCTCCGGTTCCAGCAGAACCTGCTACCACAGTGACACTCGCCAGGTCAAAA

ACCCCGTCCCAAGTCAGCGGAAGCAGAGAGAGCAGGGAGGACACGTTTAGGATCTGAG

GCCGCACCTGACACCCAGGGCAGCAGACGTCTCCCCTCCAGGGCACCCTCCACCGTCCTG

CGTTTCTTCAAGAATAGGGGCGGCCTGAGGGGGTCCAGGGCCAGGCGATAGGTCCCCTC

TACCCCAAGGAGGAGCCAGGCAGGACCCGAGCACCGATGCATCTAACGCAGTCATGTAA

TGCTGGGTGACAGTCAGTTCGCCTACGTA

D6-DH120126 (SEQ ID NO: 134) includes D6 coding sequence inserted in positions corresponding to D_H1-20 to D_H1-26:

TACGTAATGCATCTAACGCAGTCATGTAATGCTGGGTGACAGTCAGTTCGCCTCCCCATT

GAGGCTGACCTGCCCAGACGGGCCTGGGCCCACCCCACACACCGGGGCGGAATGTGTGC

AGGCCCCAGTCTCTGTGGGTGTTCCGCTAGCTGGGGCCCCCAGTGCTCACCCCACACCTA

AAGCGAGCCCCAGCCTCCAGAGCCCCCTAAGCATTCCCCGCCCAGCAGCCCAGCCCCTG

CCCCCACCCAGGAGGCCCCAGAGCTCAGGGCGCCTGGTCGGATTTTGTACAGCCCCGAG

TCACTGTGCATGCCGATTTCGGCGGGCATGGCACCATTTGGAAGCTCTTCCTCCGGTTCC

AGCAGAACCTGCTACCACAGTGAGAAAAACTGTGTCAAAAACCGACTCCTGGCAGCAGT

CGGAGGCCCCGCCAGAGAGGGGAGCAGCCGGCCTGAACCCATGTCCTGCCGGTTCCCAT

GACCCCCAGCACCCAGAGCCCCACGGTGTCCCCGTTGGATAATGAGGACAAGGGCTGGG

GGCTCCGGTGGTTTGCGGCAGGGACTTGATCACATCCTTCTGCTGTGGCCCCATTGCCTCT

GGCTGGAGTTGACCCTTCTGACAAGTGTCCTCAGAAAGACAGGGATCACCGGCACCTCC

CAATATCAACCCCAGGCAGCACAGACACAAACCCCACATCCAGAGCCAACTCCAGGAGC

AGAGACACCCCAACACTCTGGGGGACCCCAACCGTGATAACTCCCCACTGGAATCCGCC

CCAGAGTCTACCAGGACCAAAGGCCCTGCCCTGTCTCTGTCCCTCACTCAGGGCCTCCTG

CAGGGCGAGCGCTTGGGAGCAGACTCGGTCTTAGGGGACACCACTGTGGGCCCCAACTT

TGATGAGGCCACTGACCCTTCCTTCCTTTCCTGGGGCAGCACAGACTTTGGGGTCTGGGC

AGGGAAGAACTACTGGCTGGTGGCCAATCACAGAGCCCCCAGGCCGAGGTGGCCCCAAG

AAGGCCCTCAGGAGGTGGCCACTCCACTTCCTCCCAGCTGGACCCCAGGTCCTCCCCAAG

ATAGGGGTGCCATCCAAGGCAGGTCCTCCATGGAGCCCCCTTCAGACTCCTCCCGGGACC

CCACTGGACCTCAGTCCCTGCTCTGGGAATGCAGCCACCACAAGCACACCAGGAAGCCC

AGGCCCAGCCACCCTGCAGTGGGCAAGCCCACACTCTGGAGCAGAGCAGGGTGCGTCTG

GGAGGGGCTAACCTCCCCACCCCCCACCCCCCATCTGCACACAGCCACCTACCACTGCCC

AGACCCTCTGCAGGAGGGCCAAGCCACCATGGGGTATGGACTTAGGGTCTCACTCACGT

GCCTCCCCTCCTGGGAGAAGGGGCCTCATGCCCAGATCCCTGCAGCACTAGACACAGCT

GGAGGCAGTGGCCCCAGGGCCACCCTGACCTGGCATCTAAGGCTGCTCCAGCCCAGACA

GCACTGCCGTTCCTGGGAAGCCTGGGCTCCACCAGACCACAGGTCCAGGGCACAGCCCA

CAGGAGCCACCCACACACAGCTCACAGGAAGAAGATAAGCTCCAGACCCCAGGGCGGG

ACCTGCCTTCCTGCCACCACTTACACACAGGCCAGGGAGCTGTTCCCACACAGATCAACC

CCAAACCGGGACTGCCTGGCACTAGGGTCACTGCCATTTCCCTCTCCATTCCCTCCCAGT

GCCTCTGTGCTCCCTCCTTCTGGGGAACACCCTGTGCAGCCCCTCCCTGCAGCCCACACG

CTGGGGAGACCCCACCCTGCCTCGGGCCTTTTCTACCTGCTGCACTTGCCGCCCACCCAA

ACAACCCTGGGTACGTGACCCTGCAGTCCTCACCCTGATCTGCAACCAGACCCCTGTCCC

TCCCTCTAAACACCCCTCCCAGGCCAACTCTGCACCTGCAGGCCCTCCGCTCTTCTGCCAC

AAGAGCCTCAGGTTTTCCTACCTGTGCCCACCCCCTAACCCCTCCTGCCCACAACTTGAG

TTCTTCCTCTCCTGGAGCCCTTGAGCCATGGCACTGACCCTACACTCCCACCCACACACTG

CCCATGCCATCACCTTCCTCCTGGACACTCTGACCCCGCTCCCCTCCCTCTCAGACCCGGC

CCTGGTATTTCCAGGACAAAGGCTCACCCAAGTCTTCCCCATGCAGGCCCTTGCCCTCAC

TGCCTGGTTACACGGGAGCCTCCTGTGCGCAGAAGCAGGGAGCTCAGCTCTTCCACAGG

CAGAAGGCACTGAAAGAAATCAGCCTCCAGTGCCTTGACACACGTCCGCCTGTGTCTCTC

ACTGCCTGCACCTGCAGGGAGGCTCCGCACTCCCTCTAAAGATGAGGGATCCAGGCAGC

AACATCACGGGAGAATGCAGGGCTCCCAGACAGCCCAGCCCTCTCGCAGGCCTCTCCTG

GGAAGAGACCTGCAGCCACCACTGAACAGCCACGGAGGTCGCTGGATAGTAACCGAGTC

AGTGACCGACCTGGAGGGCAGGGGAGCAGTGAACCGGAGCCCATACCATAGGGACAGA

GACCAGCCGCTAACATCCCGAGCCCCTCACTGGCGGCCCCAGAACACCCCGTGGAAAGA

GAACAGACCCACAGTCCCACCTGGAACAGGGCAGACACTGCTGAGCCCCCAGCACCAGC

CCCAAGAAACACTAGGCAACAGCATCAGAGGGGGCTCCTGAGAAAGAGAGGAGGGGAG

GTCTCCTTCACCATCAAATGCTTCCCTTGACCAAAAACAGGGTCCACGCAACTCCCCCAG

GACAAAGGAGGAGCCCCCTGTACAGCACTGGGCTCAGAGTCCTCTCTGAGACAGGCTCA

GTTTCAGACAACAACCCGCTGGAATGCACAGTCTCAGCAGGAGAGCCAGGCCAGAGCCA

GCAAGAGGAGACTCGGTGACACCAGTCTCCTGTAGGGACAGGAGGATTTTGTGGGGGTT

CGTGTCACTGTGCAAACCCATGAAAACCCCAAGGGAGTTTGGAACAGCAAGCACAGTGA

CACAACCCCATTCAAAAACCCCTACTGCAAACGCACCCACTCCTGGGACTGAGGGGCTG

GGGGAGCGTCTGGGAAGTATGGCCTAGGGGTGTCCATCAATGCCCAAAATGCACCAGAC

TCTCCCCAAGACATCACCCCACCAGCCAGTGAGCAGAGTAAACAGAAAATGAGAAGCAG

CTGGGAAGCTTGCACAGGCCCCAAGGAAAGAGCTTTGGCAGGTGTGCAAGAGGGGATGT

GGGCAGAGCCTCAGCAGGGCCTTTTGCTGTTTCTGCTTTCCTGTGCAGAGAGTTCCATAA

ACTGGTATTCAAGATCAATGGCTGGGAGTGAGCCCAGGAGGACAGTGTGGGAAGAGCAC

AGGGAAGGAGGAGCAGCCGCTATCCTACACTGTCATCTTTTGAAAGTTTGCCCTGTGCCC

ACAATGCTGCATCATGGGATGCTTAACAGCTGATGTAGACACAGCTAAAGAGAGAATCA

GTGAAATGGATTTGCAGCACAGATCTGAATAAATCCTCCAGAATGTGGAGCAGCACAGA

AGCAAGCACACAGAAAGTGCCTGATGCCAAGGCAAAGTTCAGTGGGCACCTTCAGGCAT

TGCTGCTGGGCACAGACACTCTGAAAAGCACTGGCAGGAACTGCCTGTGACAAAGCAGA

ACCCTCAGGCAATGCCAGCCCTAGAGCCCTTCCTGAGAACCTCATGGGCAAAGATGTGC

AGAACAGCTGTTTGTCATAGCCCCAAACTATGGGGCTGGACAAAGCAAACGTCCATCTG

AAGGAGAACAGACAAATAAACGATGGCAGGTTCATGAAATGCAAACTAGGACAGCCAG

AGGACAACAGTAGAGAGCTACAGGCGGCTTTGCGGTTGAGTTCATGACAATGCTGAGTA

ATTGGAGTAACAGAGGAAAGCCCAAAAAATACTTTTAATGTGATTTCTTCTAAATAAAAT

TTACACCCGGCAAAATGAACTATCTTCTTAAGGGATAAACTTTCCCCTGGAAAAACTATA

AGGAAAATCAAGAAAACGATGATCACATAAACACAGTGGTGGTTACTTCTACTGGGGAA

GGAAGAGGGTATGAGCTGAGACACACAGAGTCGGCAAGTCTCCTAACAAGAACAGAAC

AAATACATTACAGTACCTTGAAAACAGCAGTTAAACTTCTAAATCGCAAGAAGAGGAAA

ATGCACACACCTGTGTTTAGAAAATTCTCAGTCCAGCACTGTTCATAATAGCAAAGACAT

TAACCCAGGTTGGATAAATAAGCGATGACACAGGCAATTGCACAATGATACAGACATAC

ATTCAGTATATGAGACATCGATGATGTATCCCCAAAGAAATGACTTTAAAGAGAAAAGG

CCTGATGTGTGGTGGCAATCACCTCCCTGGGCATCCCCGGACAGGCTGCAGGCTCACTGT

GTGGCAGGGCAGGCAGGCACCTGCTGGCAGCTCCTGGGGCCTGATGTGGAGCAGGCACA

GAGCTGTATATCCCCAAGGAAGGTACAGTCAGTGCATTCCAGAGAGAAGCAACTCAGCC

ACACTCCCTGGCCAGAACCCAAGATGCACACCCATGCACAGGGAGGCAGAGCCCAGCAC

CTCCGCAGCCACCACCACCTGCGCACGGGCCACCACCTTGCAGGCACAGAGTGGGTGCT

GAGAGGAGGGGCAGGGACACCAGGCAGGGTGAGCACCCAGAGAAAACTGCAGAAGCCT

CACACATCCCTCACCTGGCCTGGGCTTCACCTGACCTGGACCTCACCTGGCCTCGGGCCT

CACCTGCACCTGCTCCAGGTCTTGCTGGAGCCTGAGTAGCACTGAGGCTGTAGGGACTCA

TCCAGGGTTGGGGAATGACTCTGCAACTCTCCCACATCTGACCTTTCTGGGTGGAGGCAC

CTGGTGGCCCAGGGAATATAAAAAGCCCCAGAATGATGCCTGTGTGATTTGGGGGCAAT

TTATGAACCCGAAAGGACATGGCCATGGGGTGGGTAGGGACAGTAGGGACAGATGTCAG

CCTGAGGTGAAGCCTCAGGACACAGGTGGGCATGGACAGTGTCCACCTAAGCGAGGGAC

AGACCCGAGTGTCCCTGCAGTAGACCTGAGAGCGCTGGGCCCACAGCCTCCCCTCGGGG

CCCTGCTGCCTCCTCAGGTCAGCCCTGGACATCCCGGGTTTCCCCAGGCCTGGCGGTAGG

ATTTTGTTGAGGTCTGTGTCACTGTGCATGGCAGCTACTGCCAGCCCGCAGCCACTGGCT

ACTGAGGATGCCGATTCTGAGAATAGCAGCTTCTACTACTATGACTACCTGGATGAAGTA

GCTTTCATGCTCAGCGAGGTTAGCCAGCATCTAGACTATGCCCACAGTGTCACAGAGTCC

ATCAAAAACCCATGCCTGGGAGCCTCCCACCACAGCCCTCCCTGCGGGGGACCGCTGCA

TGCCGTGTTAGGATTTTGATCGAGGACACGGCGCCATGGGTATGGTGGCTACCACAGCA

GTGCAGCCCATGACCCAAACACACGGGGCAGCAGAAACAATGGACAGGCCCACAAGTG

ACCATGATGGGCTCCAGCCCACCAGCCCCAGAGACCATGAAACAGATGGCCAAGGTCAC

CCTACAGGTCATCCAGATCTGGCTCCAAGGGGTCTGCATCGCTGCTGCCCTCCCAACGCC

AAACCAGATGGAGACAGGGCCGGCCCCATAGCACCATCTGCTGCCGTCCACCCAGCAGT

CCCGGAAGCCCCTCCCTGAACGCTGGGCCACGTGTGTGAACCCTGCGAGCCCCCCATGTC

AGAGTAGGGGCAGCAGGAGGGCGGGGCTGGCCCTGTGCACTGTCACTGCCCCTGTGGTC

CCTGGCCTGCCTGGCCCTGACACCTGAGCCTCTCCTGGGTCATTTCCAAGACATTCCCAG

GGACAGCCGGAGCTGGGAGTCGCTCATCCTGCCTGGCTGTCCTGAGTCCTGCTCATTTCC

AGACCTCACCAGGGAAGCCAACAGAGGACTCACCTCACACAGTCAGAGACAACGAACCT

TCCAGAAATCCCTGTTTCTCTCCCCAGTGAGAGAAACCCTCTTCCAGGGTTTCTCTTCTCT

CCCACCCTCTTCCAGGACAGTCCTCAGCAGCATCACAGCGGGAACGCACATCTGGATCA

GGACGGCCCCCAGAACACGCGATGGCCCATGGGGACAGCCCAGCCCTTCCCAGACCCCT

AAAAGGTATCCCCACCTTGCACCTGCCCCAGGGCTCAAACTCCAGGAGGCCTGACTCCTG

CACACCCTCCTGCCAGATATCACCTCAGCCCCCTCCTGGAGGGGACAGGAGCCCGGGAG

GGTGAGTCAGACCCACCTGCCCTCAATGGCAGGCGGGGAAGATTCAGAAAGGCCTGAGA

TCCCCAGGACGCAGCACCACTGTCAATGGGGGCCCCAGACGCCTGGACCAGGGCCTGTG

TGGGAAAGGCCTCTGGCCACACTCAGGGGGATTTTGTGAAGGGCCCTCCCACTGTGGAG

GTTAGCCAGCATCTAGACTATGCCCACAGTGATGAAACCAGCATCAAAAACCGACCGGA

CTCGCAGGGTTTATGCACACTTCTCGGCTCGGAGCTCTCCAGGAGCACAAGAGCCAGGCC

CGAGGGTTTGTGCCCAGACCCTCGGCCTCTAGGGACACCCGGGCCATCTTAGCCGATGGG

CTGATGCCCTGCACACCGTGTGCTGCCAAACAGGGGCTTCAGAGGGCTCTGAGGTGACTT

CACTCATGACCACAGGTGCCCTGGTCCCTTCACTGCCAGCTGCACCAGACCCTGTTCCGA

GAGATGCCCCAGTTCCAAAAGCCAATTCCTGGGGCCGGGAATTACTGTAGACACCAGCC

TCATTCCAGTACCTCCTGCCAATTGCCTGGATTCCCATCCTGGCTGGAATCAAGAGGGCA

GCATCCGCCAGGCTCCCAACAGGCAGGACTCCCACACACCCTCCTCTGAGAGGCCGCTGT

GTTCCGCAGGGCCAGGCCGCAGACAGTTCCCCTCACCTGCCCATGTAGAAACACCTGCCA

TTGTCGTCCCCACCTGGCAAAGACCACTTGTGGAGCCCCCAGCCCCAGGTACAGCTGTAG

AGAGAGTCCTCGAGGCCCCTAAGAAGGAGCCATGCCCAGTTCTGCCGGGACCCTCGGCC

AGGCCGACAGGAGTGGACGCTGGAGCTGGGCCCACACTGGGCCACATAGGAGCTCACCA

GTGAGGGCAGGAGAGCACATGCCGGGGAGCACCCAGCCTCCTGCTGACCAGAGACCCGT

CCCAGAGCCCAGGAGGCTGCAGAGGCCTCTCCAGGGGGACACAGTGCATGTCTGGTCCC

TGAGCAGCCCCCAGGCTCTCTAGCACTGGGGGCCCCTGGCACAGCTGTCTGGACCCTCCC

TGTTCCCTGGGAAGCTCCTCCTGACAGCCCCGCCTCCAGTTCCAGGTGTGGATTTTGTCA

GGGGGTGCCACACTGTGCTGCATACCCTGCTGGACCTGCAAGTATTCGGCAACAGTCGG

AAGGATGCTGTGGTTAGCTTTGGCAAAGTTTTCCTGCCACCACAGTGGTGCCGCCCATAT

CAAAAACCAGGCCAAGTAGACAGACCCCTGCCACGCAGCCCCAGGCCTCCAGCTCACCT

GCTTCTCCTGGGGCTCTCAAGGCTGCTGTCTGCCCTCTGGCCCTCTGTGGGGAGGGTTCCC

TCAGTGGGAGGTCTGTGCTCCAGGGCAGGGATGACTGAGATAGAAATCAAAGGCTGGCA

GGGAAAGGCAGCTTCCCGCCCTGAGAGGTGCAGGCAGCACCACAGAGCCATGGAGTCAC

AGAGCCACGGAGCCCCCAGTGTGGGCGTGTGAGGGTGCTGGGCTCCCGGCAGGCCCAGC

CCTGATGGGGAAGCCTGCCCCGTCCCACAGCCCAGGTCCCCAGGGGCAGCAGGCACAGA

AGCTGCCAAGCTGTGCTCTACGATCCTCATCCCTCCAGCAGCATCCACTCCACAGTGGGG

AAACTGAGCCTTGGAGAACCACCCAGCCCCCTGGAAACAAGGCGGGGAGCCCAGACAGT

GGGCCCAGAGCACTGTGTGTATCCTGGCACTAGGTGCAGGGACCACCCGGAGATCCCCA

TCACTGAGTGGCCAGCCTGCAGAAGGACCCAACCCCAACCAGGCCGCTTGATTAAGCTC

CATCCCCCTGTCCTGGGAACCTCTTCCCAGCGCCACCAACAGCTCGGCTTCCCAGGCCCT

CATCCCTCCAAGGAAGGCCAAAGGCTGGGCCTGCCAGGGGCACAGTACCCTCCCTTGCC

CTGGCTAAGACAGGGTGGGCAGACGGCTGCAGATAGGACATATTGCTGGGGCATCTTGC

TCTGTGACTACTGGGTACTGGCTCTCAACGCAGACCCTACCAAAATCCCCACTGCCTCCC

CTGCTAGGGGCTGGCCTGGTCTCCTCCTGCTGTCCTAGGAGGCTGCTGACCTCCAGGATG

GCTTCTGTCCCCAGTTCTAGGGCCAGAGCAGATCCCAGGCAGGCTGTAGGCTGGGAGGC

CACCCCTGTCCTTGCCGAGGTTCAGTGCAGGCACCCAGGACAGGAAATGGCCTGAACAC

AGGGATGACTGTGCCATGCCCTACCTAAGTCCGCCCCTTTCTACTCTGCAACCCCCACTC

CCCAGGTCAGCCCATGACGACCAACAACCCAACACCAGAGTCACTGCCTGGCCCTGCCC

TGGGGAGGACCCCTCAGCCCCCACCCTGTCTAGAGGAGTTGGGGGGACAGGACACAGGC

TCTCTCCTTATGGTTCCCCCACCTGGCTCCTGCCGGGACCCTTGGGGTGTGGACAGAAAG

GACGCCTGCCTAATTGGCCCCCAGGAACCCAGAACTTCTCTCCAGGGACCCCAGCCCGA

GCACCCCCTTACCCAGGACCCAGCCCTGCCCCTCCTCCCCTCTGCTCTCCTCTCATCACTC

CATGGGAATCCAGAATCCCCAGGAAGCCATCAGGAAGGGCTGAAGGAGGAAGCGGGGC

CGCTGCACCACCGGGCAGGAGGCTCCGTCTTCGTGAACCCAGGGAAGTGCCAGCCTCCT

AGAGGGTATGGTCCACCCTGCCTGGGGCTCCCACCGTGGCAGGCTGCGGGGAAGGACCA

GGGACGGTGTGGGGGAGGGCTCAGGGCCCTGCAGGTGCTCCATCTTGGATGAGCCCATC

CCTCTCACCCACCGACCCGCCCACCTCCTCTCCACCCTGGCCACACGTCGTCCACACCAT

CCTGAGTCCCACCTACACCAGAGCCAGCAGAGCCAGTGCAGACAGAGGCTGGGGTGCAG

GGGGGCCGCCAGGGCAGCTTTGGGGAGGGAGGAATGGAGGAAGGGGAGGTCAGTGAAG

AGGCCCCCCTCCCCTGGGTCTAGGATCCACCTTTGGGACCCCCGGATCCCATCCCCTCCA

GGCTCTGGGAGGAGAAGCAGGATGGGAGATTCTGTGCAGGACCCTCTCACAGTGGAATA

CCTCCACAGCGGCTCAGGCCAGATACAAAAGCCCCTCAGTGAGCCCTCCACTGCAGTGC

AGGGCCTGGGGGCAGCCCCTCCCACAGAGGACAGACCCAGCACCCCGAAGAAGTCCTGC

CAGGGGGAGCTCAGAGCCATGAAGGAGCAAGATATGGGGACCCCAATACTGGCACAGA

CCTCAGCTCCATCCAGGCCCACCAGGACCCACCATGGGTGGAACACCTGTCTCCGGCCCC

TGCTGGCTGTGAGGCAGCTGGCCTCTGTCTCGGACCCCCATTCCAGACACCAGACAGAGG

GACAGGCCCCCCAGAACCAGTGTTGAGGGACACCCCTGTCCAGGGCAGCCAAGTCCAAG

AGGCGCGCTGAGCCCAGCAAGGGAAGGCCCCCAAACAAACCAGGAGGTTTCTGAAGCTG

TCTGTGTCACAGTCGGGCATAGCCACGGCTACCACAATGACACTGGGCAGGACAGAAAC

CCCATCCCAAGTCAGCCGAAGGCAGAGAGAGCAGGCAGGACACATTTAGGATCTGAGGC

CACACCTGACACTCAAGCCAACAGATGTCTCCCCTCCAGGGCGCCCTGCCCTGTTCAGTG

TTCCTGAGAAAACAGGGGCAGCCTGAGGGGATCCAGGGCCAGGAGATGGGTCCCCTCTA

CCCCGAGGAGGAGCCAGGCGGGAATCCCAGCCCCCTCCCCATTGAGGCCATCCTGCCCA

GAGGGGCCCGGACCCACCCCACACACCCAGGCAGAATGTGTGCAGGCCTCAGGCTCTGT

GGGTGCCGCTAGCTGGGGCTGCCAGTCCTCACCCCACACCTAAGGTGAGCCACAGCCGC

CAGAGCCTCCACAGGAGACCCCACCCAGCAGCCCAGCCCCTACCCAGGAGGCCCCAGAG

CTCAGGGCGCCTGGGTGGATTTTGTACAGCCCCGAGTCACTGTGCTGCATACCCTGCTGG

ACCTGCAAGTATTCGGCAACCACAGTGAGAAAAGCTATGTCAAAAACCGTCTCCCGGCC

ACTGCTGGAGGCCCAGCCAGAGAAGGGACCAGCCGCCCGAACATACGACCTTCCCAGAC

CTCATGACCCCCAGCACTTGGAGCTCCACAGTGTCCCCATTGGATGGTGAGGATGGGGGC

CGGGGCCATCTGCACCTCCCAACATCACCCCCAGGCAGCACAGGCACAAACCCCAAATC

CAGAGCCGACACCAGGAACACAGACACCCCAATACCCTGGGGGACCCTGGCCCTGGTGA

CTTCCCACTGGGATCCACCCCCGTGTCCACCTGGATCAAAGACCCCACCGCTGTCTCTGT

CCCTCACTCAGGGCCTGCTGAGGGGCGGGTGCTTTGGAGCAGACTCAGGTTTAGGGGCC

ACCATTGTGGGGCCCAACCTCGACCAGGACACAGATTTTTCTTTCCTGCCCTGGGGCAAC

ACAGACTTTGGGGTCTGTGCAGGGAGGACCTTCTGGAAAGTCACCAAGCACAGAGCCCT

GACTGAGGTGGTCTCAGGAAGACCCCCAGGAGGGGGCTTGTGCCCCTTCCTCTCATGTGG

ACCCCATGCCCCCCAAGATAGGGGCATCATGCAGGGCAGGTCCTCCATGCAGCCACCAC

TAGGCAACTCCCTGGCGCCGGTCCCCACTGCGCCTCCATCCCGGCTCTGGGGATGCAGCC

ACCATGGCCACACCAGGCAGCCCGGGTCCAGCAACCCTGCAGTGCCCAAGCCCTTGGCA

GGATTCCCAGAGGCTGGAGCCCACCCCTCCTCATCCCCCCACACCTGCACACACACACCT

ACCCCCTGCCCAGTCCCCCTCCAGGAGGGTTGGAGCCGCCCATAGGGTGGGGGCTCCAG

GTCTCACTCACTCGCTTCCCTTCCTGGGCAAAGGAGCCTCGTGCCCCGGTCCCCCCTGAC

GGCGCTGGGCACAGGTGTGGGTACTGGGCCCCAGGGCTCCTCCAGCCCCAGCTGCCCTG

CTCTCCCTGGGAGGCCTGGGCACCACCAGACCACCAGTCCAGGGCACAGCCCCAGGGAG

CCGCCCACTGCCAGCTCACAGGAAGAAGATAAGCTTCAGACCCTCAGGGCCGGGAGCTG

CCTTCCTGCCACCCCTTCCTGCCCCAGACCTCCATGCCCTCCCCCAACCACTTACACACAA

GCCAGGGAGCTGTTTCCACACAGTTCAACCCCAAACCAGGACGGCCTGGCACTCGGGTC

ACTGCCATTTCTGTCTGCATTCGCTCCCAGCGCCCCTGTGTTCCCTCCCTCCTCCCTCCTTC

CTTTCTTCCTGCATTGGGTTCATGCCGCAGAGTGCCAGGTGCAGGTCAGCCCTGAGCTTG

GGGTCACCTCCTCACTGAAGGCAGCCTCAGGGTGCCCAGGGGCAGGCAGGGTGGGGGTG

AGGCTTCCAGCTCCAACCGCTTCGCTACCTTAGGACCGTTATAGTTAGGCGCGCCGTCGA

CCAATTCTCATGTTTGACAGCTTATCATCGAATTTCTACGTA

Toxin Coding Sequences

Exemplary nucleotide coding sequences (DNA) of toxins (e.g., μ-conotoxin and tarantula toxin ProTxII), and the amino acid (AA) sequences they encode, for construction of an engineered D_H region as described herein are set forth in Table 4.

TABLE 4

SmIIIA C1SC4S DNA

GAGAGAAGCTGCAATGGCAGACGCGGCTGCAGCAGCAGATGGAGCCGCGATCATAGCAGGTG

CTGC (SEQ ID NO: 180)

SmIIIA C1SC4S AA

ERSCNGRRGCSSRWSRDHSRCC (SEQ ID NO: 181)

CSSRWC DNA

AGGATATTGTAGCAGCAGATGGTGCTATACC (SEQ ID NO: 182)

CSSRWC AA

GYCSSRWCYT (SEQ ID NO: 183)

KIIIA mini DNA

GTATTACGATTGCAACTGCAGCAGATGGCGCGACCATAGCAGGTGCTGCTATTATACC (SEQ ID NO: 184)

KIIIA mini AA

YYDCNCSRWRDHSRCCYYT (SEQ ID NO: 185)

PIIIA C1SC4S DNA

GAGAGGCTTAGCTGTGGCTTCCCTAAGAGCTGCCGCAGCAGGCAAAGCAAGCCTCACAGATGC

TGC (SEQ ID NO: 186)

PIIIA C1SC4S AA

ERLSCGFPKSCRSRQSKPHRCC (SEQ ID NO: 187)

ProTxII C1SC4S DNA

TACAGCCAGAAGTGGATGTGGACTTGCGATAGTGAGAGGAAGTGCAGTGAGGGTATGGTATGC

CGGCTGTGGTGTAAGAAGAAGCTCTGG (SEQ ID NO: 188)

ProTxII C1SC4S AA

YSQKWMWTCDSERKCSEGMVCRLWCKKKLW (SEQ ID NO: 189)

KIIIA C1SC4S DNA

AGCTGCAACTGCAGCAGCAAATGGAGCCGCGACCATAGCAGGTGCTGC (SEQ ID NO: 190)

KIIIA C1SC4S AA

SCNCSSKWSRDHSRCC (SEQ ID NO: 191)

SmIIIA mini DNA

GAGAGATGCAATGGCAGACGCGGCTGCAGCAGATGGCGCGATCATAGCAGGTGCTGC (SEQ ID NO: 192)

SmIIIA mini AA

ERCNGRRGCSRWRDHSRCC (SEQ ID NO: 193)

RSRQ insertion DNA

AGGATATTGTACTAATCGGAGCAGGCAGGGTGTATGCTATACC (SEQ ID NO: 194)

RSRQ insertion AA

GYCTNRSRQGVCYT (SEQ ID NO: 195)

KIIIA midi DNA

GTATTACGATTGCAACTGCAGCAGATGGGCTCGCGACCATAGCAGGTGCTGCTATTATAAC

(SEQ ID NO: 196)

KIIIA midi AA

YYDCNCSRWARDHSRCCYYN (SEQ ID NO: 197)

SmIIIA mini DNA

GTATTACTATGAGAGATGCAATGGCAGACGCGGCTGCAGCAGATGGCGCGATCATAGCAGGTG

CTGCTATTATAAC (SEQ ID NO: 198)

SmIIIA mini AA

YYYERCNGRRGCSRWRDHSRCCYYN (SEQ ID NO: 199)

PIIIA mini DNA

GAGAGGCTTTGTGGCTTCCCTAAGAGCTGCAGCAGGCAAAAGCCTCACAGATGCTGC (SEQ ID NO: 200)

PIIIA mini AA

ERLCGFPKSCSRQKPHRCC (SEQ ID NO: 201)

ProTxII C2SC5S DNA

TACTGCCAGAAGTGGATGTGGACTAGCGATAGTGAGAGGAAGTGCTGTGAGGGTATGGTAAGC

CGGCTGTGGTGTAAGAAGAAGCTCTGG (SEQ ID NO: 202)

ProTxII C2SC5S AA

YCQKWMWTSDSERKCCEGMVSRLWCKKKLW (SEQ ID NO: 203)

KIIIA mini DNA

TGCAACTGCAGCAGATGGCGCGACCATAGCAGGTGCTG (SEQ ID NO: 204)

KIIIA mini AA

CNCSRWRDHSRC (SEQ ID NO: 205)

SmIIIA fl DNA

GAGAGATGCTGCAATGGCAGACGCGGCTGCAGCAGCAGATGGTGCCGCGATCATAGCAGGTGC

TGC (SEQ ID NO: 206)

SmIIIA fl AA

ERCCNGRRGCSSRWCRDHSRCC (SEQ ID NO: 207)

SSRW insertion DNA

AGGATATTGTAGTGGTAGCAGCAGATGGGGTAGCTGCTACTCC (SEQ ID NO: 208)

SSRW insertion AA

GYCSGSSRWGSCYS (SEQ ID NO: 209)

SmIIIA midi DNA

GTATTATGATTACGAGAGAGCTTGCAATGGCAGACGCGGCTGCAGCAGATGGGCTCGCGATCA

TAGCAGGTGCTGCTATCGTTATACC (SEQ ID NO: 210)

SmIIIA midi AA

YYDYERACNGRRGCSRWARDHSRCCYRYT (SEQ ID NO: 211)

PIIIA midi DNA

GAGAGGCTTGCTTGTGGCTTCCCTAAGAGCTGCAGCAGGCAAGCTAAGCCTCACAGATGCTGC

(SEQ ID NO: 212)

PIIIA midi AA

ERLACGFPKSCSRQAKPHRCC (SEQ ID NO: 213)

ProTxII C3SC6S DNA

TACTGCCAGAAGTGGATGTGGACTTGCGATAGTGAGAGGAAGAGCTGTGAGGGTATGGTATGC

CGGCTGTGGAGTAAGAAGAAGCTCTGG (SEQ ID NO: 214)

ProTxII C3SC6S AA

YCQKWMWTCDSERKSCEGMVCRLWSKKKLW (SEQ ID NO: 215)

KIIIA midi DNA

TGCAACTGCAGCAGATGGGCTCGCGACCATAGCAGGTGCTG (SEQ ID NO: 216)

KIIIA midi AA

CNCSRWARDHSRC (SEQ ID NO: 217)

SmIIIA midi DNA

GAGAGAGCTTGCAATGGCAGACGCGGCTGCAGCAGATGGGCTCGCGATCATAGCAGGTGCTGC

(SEQ ID NO: 218)

SmIIIA midi AA

ERACNGRRGCSRWARDHSRCC (SEQ ID NO: 219)

CRSRQC DNA

AGCATATTGTCGGAGCAGGCAGTGCTATTCC (SEQ ID NO: 220)

CRSRQC AA

AYCRSRQCYS (SEQ ID NO: 221)

PIIIA midi DNA

GTATTACTATGAGAGGCTTGCTTGTGGCTTCCCTAAGAGCTGCAGCAGGCAAGCTAAGCCTCAC

AGATGCTGCTATTACTAC (SEQ ID NO: 222)

PIIIA midi AA

YYYERLACGFPKSCSRQAKPHRCCYYY (SEQ ID NO: 223)

PIIIA fl DNA

GAGAGGCTTTGCTGTGGCTTCCCTAAGAGCTGCCGCAGCAGGCAATGCAAGCCTCACAGATGCT

GC (SEQ ID NO: 224)

PIIIA fl AA

ERLCCGFPKSCRSRQCKPHRCC (SEQ ID NO: 225)

ProTxII fl DNA

TACTGCCAGAAGTGGATGTGGACTTGCGATAGTGAGAGGAAGTGCTGTGAGGGTATGGTATGC

CGGCTGTGGTGTAAGAAGAAGCTCTGG (SEQ ID NO: 226)

ProTxII fl AA

YCQKWMWTCDSERKCCEGMVCRLWCKKKLW (SEQ ID NO: 227)

KIIIA fl DNA

TGCTGCAACTGCAGCAGCAAATGGTGCCGCGACCATAGCAGGTGCTGC (SEQ ID NO: 228)

KIIIA fl AA

CCNCSSKWCRDHSRCC (SEQ ID NO: 229)

PIIIA mini DNA

GGTATAGTGGGGAGAGGCTTTGTGGCTTCCCTAAGAGCTGCAGCAGGCAAAAGCCTCACAGAT

GCTGCAGCTACTAC (SEQ ID NO: 230)

PIIIA mini AA

YSGERLCGFPKSCSRQKPHRCCSYY (SEQ ID NO: 231)
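The DNA/AA pairs of Table 4 can be cross-checked by translating each DNA entry with the standard genetic code. The following is a minimal Python sketch of such a check, not part of the disclosed methods; the helper name `translate` and its `frame` parameter are illustrative assumptions. Some Table 4 entries begin with one or more flanking nucleotides ahead of the first codon, so the reading-frame offset is passed explicitly.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA coding strand, skipping `frame` leading bases.

    Hypothetical helper for illustration; whitespace and line wraps are
    ignored, and a trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


kiiia_c1sc4s = "AGCTGCAACTGCAGCAGCAAATGGAGCCGCGACCATAGCAGGTGCTGC"  # SEQ ID NO: 190
cssrwc = "AGGATATTGTAGCAGCAGATGGTGCTATACC"  # SEQ ID NO: 182

print(translate(kiiia_c1sc4s))     # compare with SEQ ID NO: 191
print(translate(cssrwc, frame=1))  # compare with SEQ ID NO: 183
```

In this sketch, SEQ ID NO: 190 is read from its first nucleotide, whereas SEQ ID NO: 182 is read from its second nucleotide (`frame=1`).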

Exemplary DNA fragments containing toxin nucleotide coding sequences for construction of an engineered D_H region are provided below.

TX-DH1166 (SEQ ID NO: 232) includes toxin coding sequences inserted in positions corresponding to D_H1-1 to D_H6-6:

TACGTAGCCGTTTCGATCCTCCCGAATTGACTAGTGGGTAGGCCTGGCGGCCGCTGCCAT

TTCATTACCTCTTTCTCCGCACCCGACATAGATACCGGTGGATTCGAATTCTCCCCGTTGA

AGCTGACCTGCCCAGAGGGGCCTGGGCCCACCCCACACACCGGGGCGGAATGTGTACAG

GCCCCGGTCTCTGTGGGTGTTCCGCTAACTGGGGCTCCCAGTGCTCACCCCACAACTAAA

GCGAGCCCCAGCCTCCAGAGCCCCCGAAGGAGATGCCGCCCACAAGCCCAGCCCCCATC

CAGGAGGCCCCAGAGCTCAGGGCGCCGGGGCGGATTTTGTACAGCCCCGAGTCACTGTG

GAGAGAAGCTGCAATGGCAGACGCGGCTGCAGCAGCAGATGGAGCCGCGATCATAGCA

GGTGCTGCCACAGTGAGAAAAACTGTGTCAAAAACCGTCTCCTGGCCCCTGCTGGAGGC

CGCGCCAGAGAGGGGAGCAGCCGCCCCGAACCTAGGTCCTGCTCAGCTCACACGACCCC

CAGCACCCAGAGCACAACGGAGTCCCCATTGAATGGTGAGGACGGGGACCAGGGCTCCA

GGGGGTCATGGAAGGGGCTGGACCCCATCCTACTGCTATGGTCCCAGTGCTCCTGGCCAG

AACTGACCCTACCACCGACAAGAGTCCCTCAGGGAAACGGGGGTCACTGGCACCTCCCA

GCATCAACCCCAGGCAGCACAGGCATAAACCCCACATCCAGAGCCGACTCCAGGAGCAG

AGACACCCCAGTACCCTGGGGGACACCGACCCTGATGACTCCCCACTGGAATCCACCCC

AGAGTCCACCAGGACCAAAGACCCCGCCCCTGTCTCTGTCCCTCACTCAGGACCTGCTGC

GGGGCGGGCCATGAGACCAGACTCGGGCTTAGGGAACACCACTGTGGCCCCAACCTCGA

CCAGGCCACAGGCCCTTCCTTCCTGCCCTGCGGCAGCACAGACTTTGGGGTCTGTGCAGA

GAGGAATCACAGAGGCCCCAGGCTGAGGTGGTGGGGGTGGAAGACCCCCAGGAGGTGG

CCCACTTCCCTTCCTCCCAGCTGGAACCCACCATGACCTTCTTAAGATAGGGGTGTCATC

CGAGGCAGGTCCTCCATGGAGCTCCCTTCAGGCTCCTCCCCGGTCCTCACTAGGCCTCAG

TCCCGGCTGCGGGAATGCAGCCACCACAGGCACACCAGGCAGCCCAGACCCAGCCAGCC

TGCAGTGCCCAAGCCCACATTCTGGAGCAGAGCAGGCTGTGTCTGGGAGAGTCTGGGCT

CCCCACCGCCCCCCCGCACACCCCACCCACCCCTGTCCAGGCCCTATGCAGGAGGGTCAG

AGCCCCCCATGGGGTATGGACTTAGGGTCTCACTCACGTGGCTCCCCTCCTGGGTGAAGG

GGTCTCATGCCCAGATCCCCACAGCAGAGCTGGTCAAAGGTGGAGGCAGTGGCCCCAGG

GCCACCCTGACCTGGACCCTCAGGCTCCTCTAGCCCTGGCTGCCCTGCTGTCCCTGGGAG

GCCTGGACTCCACCAGACCACAGGTCCAGGGCACCGCCCATAGGTGCTGCCCACACTCA

GTTCACAGGAAGAAGATAAGCTCCAGACCCCCAAGACTGGGACCTGCCTTCCTGCCACC

GCTTGTAGCTCCAGACCTCCGTGCCTCCCCCGACCACTTACACACGGGCCAGGGAGCTGT

TCCACAAAGATCAACCCCAAACCGGGACCGCCTGGCACTCGGGCCGCTGCCACTTCCCTC

TCCATTTGTTCCCAGCACCTCTGTGCTCCCTCCCTCCTCCCTCCTTCAGGGGAACAGCCTG

TGCAGCCCCTCCCTGCACCCCACACCCTGGGGAGGCCCAACCCTGCCTCCAGCCCTTTCT

CCCCCGCTGCTCTTCCTGCCCATCCAGACAACCCTGGGGTCCCATCCCTGCAGCCTACAC

CCTGGTCTCCACCCAGACCCCTGTCTCTCCCTCCAGACACCCCTCCCAGGCCAACCCTGC

ACATGCAGGCCCTCCCCTTTTCTGCTGCCAGAGCCTCAGTTTCTACCCTCTGTGCCTACCC

CCTGCCTCCTCCTGCCCACAACTCGAGCTCTTCCTCTCCTGGGGCCCCTGAGCCATGGCAC

TGACCGTGCACTCCCACCCCCACACTGCCCATGCCCTCACCTTCCTCCTGGACACTCTGAC

CCCGCTCCCCTCTTGGACCCAGCCCTGGTATTTCCAGGACAAAGGCTCACCCAAGTCTTC

CCCATGCAGGCCCTTGCCCTCACTGCCCGGTTACACGGCAGCCTCCTGTGCACAGAAGCA

GGGAGCTCAGCCCTTCCACAGGCAGAAGGCACTGAAAGAAATCGGCCTCCAGCACCCTG

ATGCACGTCCGCCTGTGTCTCTCACTGCCCGCACCTGCAGGGAGGCTCGGCACTCCCTGT

AAAGACGAGGGATCCAGGCAGCAACATCATGGGAGAATGCAGGGCTCCCAGACAGCCC

AGCCCTCTCGCAGGCCTCTCCTGGGAAGAGACCTGCAGCCACCACTGAACAGCCACGGA

GCCCGCTGGATAGTAACTGAGTCAGTGACCGACCTGGAGGGCAGGGGAGCAGTGAACCG

GAGCCCAGACCATAGGGACAGAGACCAGCCGCTGACATCCCGAGCCCCTCACTGGCGGC

CCCAGAACACCGCGTGGAAACAGAACAGACCCACATTCCCACCTGGAACAGGGCAGACA

CTGCTGAGCCCCCAGCACCAGCCCTGAGAAACACCAGGCAACGGCATCAGAGGGGGCTC

CTGAGAAAGAAAGGAGGGGAGGTCTCCTTCACCAGCAAGTACTTCCCTTGACCAAAAAC

AGGGTCCACGCAACTCCCCCAGGACAAAGGAGGAGCCCCCTGTACAGCACTGGGCTCAG

AGTCCTCTCCCACACACCCTGAGTTTCAGACAAAAACCCCCTGGAAATCATAGTATCAGC

AGGAGAACTAGCCAGAGACAGCAAGAGGGGACTCAGTGACTCCCGCGGGGACAGGAGG

ATTTTGTGGGGGCTCGTGTCACTGTGAGGATATTGTAGCAGCAGATGGTGCTATACCCAC

AGTGACACAGCCCCATTCAAAAACCCCTGCTGTAAACGCTTCCACTTCTGGAGCTGAGGG

GCTGGGGGGAGCGTCTGGGAAGTAGGGCCTAGGGGTGGCCATCAATGCCCAAAACGCAC

CAGACTCCCCCCCAGACATCACCCCACTGGCCAGTGAGCAGAGTAAACAGAAAATGAGA

AGCAGCTGGGAAGCTTGCACAGGCCCCAAGGAAAGAGCTTTGGCGGGTGTGCAAGAGG

GGATGCGGGCAGAGCCTGAGCAGGGCCTTTTGCTGTTTCTGCTTTCCTGTGCAGATAGTT

CCATAAACTGGTGTTCAAGATCGATGGCTGGGAGTGAGCCCAGGAGGACAGTGTGGGAA

GGGCACAGGGAAGGAGAAGCAGCCGCTATCCTACACTGTCATCTTTCAAGAGTTTGCCCT

GTGCCCACAATGCTGCATCATGGGATGCTTAACAGCTGATGTAGACACAGCTAAAGAGA

GAATCAGTGAAATGGATTTGCAGCACAGATCTGAATAAATTCTCCAGAATGTGGAGCCA

CACAGAAGCAAGCACAAGGAAAGTGCCTGATGCAAGGGCAAAGTACAGTGTGTACCTTC

AGGCTGGGCACAGACACTCTGAAAAGCCTTGGCAGGAACTCCCTGCAACAAAGCAGAGC

CCTGCAGGCAATGCCAGCTCCAGAGCCCTCCCTGAGAGCCTCATGGGCAAAGATGTGCA

CAACAGGTGTTTCTCATAGCCCCAAACTGAGAATGAAGCAAACAGCCATCTGAAGGAAA

ACAGGCAAATAAACGATGGCAGGTTCATGAAATGCAAACCCAGACAGCCAGAAGGACA

ACAGTGAGGGTTACAGGTGACTCTGTGGTTGAGTTCATGACAATGCTGAGTAATTGGAGT

AACAAAGGAAAGTCCAAAAAATACTTTCAATGTGATTTCTTCTAAATAAAATTTACAGCC

GGCAAAATGAACTATCTTCTTAAGGGATAAACTTTCCACTAGGAAAACTATAAGGAAAA

TCAAGAAAAGGATGATCACATAAACACAGTGGTCGTTACTTCTACTGGGGAAGGAAGAG

GGTATGAACTGAGACACACAGGGTTGGCAAGTCTCCTAACAAGAACAGAACAAATACAT

TACAGTACCTTGAAAACAGCAGTTAAAATTCTAAATTGCAAGAAGAGGAAAATGCACAC

AGCTGTGTTTAGAAAATTCTCAGTCCAGCACTGTTCATAATAGCAAAGACATTAACCCAG

GTTGGATAAATAAACGATGACACAGGCAATTGCACAATGATACAGACATACATTCAGTA

TATGAGACATTGATGATGTATCCCCAAAGAAATGACTTTAAAGAGAAAAGGCCTGATAT

GTGGTGGCACTCACCTCCCTGGGCATCCCCGGACAGGCTGCAGGCACACTGTGTGGCAG

GGCAGGCTGGTACCTGCTGGCAGCTCCTGGGGCCTGATGTGGAGCAGGCACAGAGCCGT

ATCCCCCCGAGGACATATACCCCCAAGGACGGCACAGTTGGTACATTCCGGAGACAAGC

AACTCAGCCACACTCCCAGGCCAGAGCCCGAGAGGGACGCCCATGCACAGGGAGGCAG

AGCCCAGCTCCTCCACAGCCAGCAGCACCCGTGCAGGGGCCGCCATCTGGCAGGCACAG

AGCATGGGCTGGGAGGAGGGGCAGGGACACCAGGCAGGGTTGGCACCAACTGAAAATT

ACAGAAGTCTCATACATCTACCTCAGCCTTGCCTGACCTGGGCCTCACCTGACCTGGACC

TCACCTGGCCTGGACCTCACCTGGCCTAGACCTCACCTCTGGGCTTCACCTGAGCTCGGC

CTCACCTGACTTGGACCTTGCCTGTCCTGAGCTCACATGATCTGGGCCTCACCTGACCTG

GGTTTCACCTGACCTGGGCTTCACCTGACCTGGGCCTCATCTGACCTGGGCCTCACTGGC

CTGGACCTCACCTGGCCTGGGCTTCACCTGGCCTCAGGCCTCATCTGCACCTGCTCCAGG

TCTTGCTGGAACCTCAGTAGCACTGAGGCTGCAGGGGCTCATCCAGGGTTGCAGAATGA

CTCTAGAACCTCCCACATCTCAGCTTTCTGGGTGGAGGCACCTGGTGGCCCAGGGAATAT

AAAAAGCCTGAATGATGCCTGCGTGATTTGGGGGCAATTTATAAACCCAAAAGGACATG

GCCATGCAGCGGGTAGGGACAATACAGACAGATATCAGCCTGAAATGGAGCCTCAGGGC

ACAGGTGGGCACGGACACTGTCCACCTAAGCCAGGGGCAGACCCGAGTGTCCCCGCAGT

AGACCTGAGAGCGCTGGGCCCACAGCCTCCCCTCGGTGCCCTGCTACCTCCTCAGGTCAG

CCCTGGACATCCCGGGTTTCCCCAGGCCTGGCGGTAGGATTTTGTTGAGGTCTGTGTCAC

TGTGGTATTACGATTGCAACTGCAGCAGATGGCGCGACCATAGCAGGTGCTGCTATTATA

CCCACAGTGTCACAGAGTCCATCAAAAACCCATCCCTGGGAACCTTCTGCCACAGCCCTC

CCTGTGGGGCACCGCCGCGTGCCATGTTAGGATTTTGACTGAGGACACAGCACCATGGGT

ATGGTGGCTACCGCAGCAGTGCAGCCCGTGACCCAAACACACAGGGCAGCAGGCACAAC

AGACAAGCCCACAAGTGACCACCCTGAGCTCCTGCCTGCCAGCCCTGGAGACCATGAAA

CAGATGGCCAGGATTATCCCATAGGTCAGCCAGACCTCAGTCCAACAGGTCTGCATCGCT

GCTGCCCTCCAATACCAGTCCGGATGGGGACAGGGCTGGCCCACATTACCATTTGCTGCC

ATCCGGCCAACAGTCCCAGAAGCCCCTCCCTCAAGGCTGGGCCACATGTGTGGACCCTG

AGAGCCCCCCATGTCTGAGTAGGGGCACCAGGAAGGTGGGGCTGGCCCTGTGCACTGTC

CCTGCCCCTGTGGTCCCTGGCCTGCCTGGCCCTGACACCTGGGCCTCTCCTGGGTCATTTC

CAAGACAGAAGACATTCCCAGGACAGCTGGAGCTGGGAGTCCATCATCCTGCCTGGCCG

TCCTGAGTCCTGCGCCTTTCCAAACCTCACCCGGGAAGCCAACAGAGGAATCACCTCCCA

CAGGCAGAGACAAAGACCTTCCAGAAATCTCTGTCTCTCTCCCCAGTGGGCACCCTCTTC

CAGGGCAGTCCTCAGTGATATCACAGTGGGAACCCACATCTGGATCGGGACTGCCCCCA

GAACACAAGATGGCCCACAGGGACAGCCCCACAGCCCAGCCCTTCCCAGACCCCTAAAA

GGCGTCCCACCCCCTGCATCTGCCCCAGGGCTCAAACTCCAGGAGGACTGACTCCTGCAC

ACCCTCCTGCCAGACATCACCTCAGCCCCTCCTGGAAGGGACAGGAGCGCGCAAGGGTG

AGTCAGACCCTCCTGCCCTCGATGGCAGGCGGAGAAGATTCAGAAAGGTCTGAGATCCC

CAGGACGCAGCACCACTGTCAATGGGGGCCCCAGACGCCTGGACCAGGGCCTGCGTGGG

AAAGGCCTCTGGGCACACTCAGGGGGATTTTGTGAAGGGTCCTCCCACTGTGGAGAGGC

TTAGCTGTGGCTTCCCTAAGAGCTGCCGCAGCAGGCAAAGCAAGCCTCACAGATGCTGC

CACAGTGATGAACCCAGCATCAAAAACCGACCGGACTCCCAAGGTTTATGCACACTTCTC

CGCTCAGAGCTCTCCAGGATCAGAAGAGCCGGGCCCAAGGGTTTCTGCCCAGACCCTCG

GCCTCTAGGGACATCTTGGCCATGACAGCCCATGGGCTGGTGCCCCACACATCGTCTGCC

TTCAAACAAGGGCTTCAGAGGGCTCTGAGGTGACCTCACTGATGACCACAGGTGCCCTG

GCCCCTTCCCCACCAGCTGCACCAGACCCCGTCATGACAGATGCCCCGATTCCAACAGCC

AATTCCTGGGGCCAGGAATCGCTGTAGACACCAGCCTCCTTCCAACACCTCCTGCCAATT

GCCTGGATTCCCATCCCGGTTGGAATCAAGAGGACAGCATCCCCCAGGCTCCCAACAGG

CAGGACTCCCACACCCTCCTCTGAGAGGCCGCTGTGTTCCGTAGGGCCAGGCTGCAGACA

GTCCCCCTCACCTGCCACTAGACAAATGCCTGCTGTAGATGTCCCCACCTGGAAAATACC

ACTCATGGAGCCCCCAGCCCCAGGTACAGCTGTAGAGAGAGTCTCTGAGGCCCCTAAGA

AGTAGCCATGCCCAGTTCTGCCGGGACCCTCGGCCAGGCTGACAGGAGTGGACGCTGGA

GCTGGGCCCATACTGGGCCACATAGGAGCTCACCAGTGAGGGCAGGAGAGCACATGCCG

GGGAGCACCCAGCCTCCTGCTGACCAGAGGCCCGTCCCAGAGCCCAGGAGGCTGCAGAG

GCCTCTCCAGGGGGACACTGTGCATGTCTGGTCCCTGAGCAGCCCCCCACGTCCCCAGTC

CTGGGGGCCCCTGGCACAGCTGTCTGGACCCTCTCTATTCCCTGGGAAGCTCCTCCTGAC

AGCCCCGCCTCCAGTTCCAGGTGTGGATTTTGTCAGGGGGTGTCACACTGTGTACAGCCA

GAAGTGGATGTGGACTTGCGATAGTGAGAGGAAGTGCAGTGAGGGTATGGTATGCCGGC

TGTGGTGTAAGAAGAAGCTCTGGCACAGTGGTGCTGCCCATATCAAAAACCAGGCCAAG

TAGACAGGCCCCTGCTGTGCAGCCCCAGGCCTCCAGCTCACCTGCTTCTCCTGGGGCTCT

CAAGGCTGCTGTTTTCTGCACTCTCCCCTCTGTGGGGAGGGTTCCCTCAGTGGGAGATCT

GTTCTCAACATCCCACGGCCTCATTCCTGCAAGGAAGGCCAATGGATGGGCAACCTCACA

TGCCGCGGCTAAGATAGGGTGGGCAGCCTGGCGGGGACAGGACATCCTGCTGGGGTATC

TGTCACTGTGCCTAGTGGGGCACTGGCTCCCAAACAACGCAGTCCTTGCCAAAATCCCCA

CGGCCTCCCCCGCTAGGGGCTGGCCTGATCTCCTGCAGTCCTAGGAGGCTGCTGACCTCC

AGAATGGCTCCGTCCCCAGTTCCAGGGCGAGAGCAGATCCCAGGCCGGCTGCAGACTGG

GAGGCCACCCCCTCCTTCCCAGGGTTCACTGCAGGTGACCAGGGCAGGAAATGGCCTGA

ACACAGGGATAACCGGGCCATCCCCCAACAGAGTCCACCCCCTCCTGCTCTGTACCCCGC

ACCCCCCAGGCCAGCCCATGACATCCGACAACCCCACACCAGAGTCACTGCCCGGTGCT

GCCCTAGGGAGGACCCCTCAGCCCCCACCCTGTCTAGAGGACTGGGGAGGACAGGACAC

GCCCTCTCCTTATGGTTCCCCCACCTGGCTCTGGCTGGGACCCTTGGGGTGTGGACAGAA

AGGACGCTTGCCTGATTGGCCCCCAGGAGCCCAGAACTTCTCTCCAGGGACCCCAGCCCG

AGCACCCCCTTACCCAGGACCCAGCCCTGCCCCTCCTCCCCTCTGCTCTCCTCTCATCACC

CCATGGGAATCCAGAATCCCCAGGAAGCCATCAGGAAGGGCTGAGGGAGGAAGTGGGG

CCACTGCACCACCAGGCAGGAGGCTCTGTCTTTGTGAACCCAGGGAGGTGCCAGCCTCCT

AGAGGGTATGGTCCACCCTGCCTATGGCTCCCACAGTGGCAGGCTGCAGGGAAGGACCA

GGGACGGTGTGGGGGAGGGCTCAGGGCCCCGCGGGTGCTCCATCTTGGATGAGCCTATC

TCTCTCACCCACGGACTCGCCCACCTCCTCTTCACCCTGGCCACACGTCGTCCACACCATC

CTAAGTCCCACCTACACCAGAGCCGGCACAGCCAGTGCAGACAGAGGCTGGGGTGCAGG

GGGGCCGACTGGGCAGCTTCGGGGAGGGAGGAATGGAGGAAGGGGAGTTCAGTGAAGA

GGCCCCCCTCCCCTGGGTCCAGGATCCTCCTCTGGGACCCCCGGATCCCATCCCCTCCAG

GCTCTGGGAGGAGAAGCAGGATGGGAGAATCTGTGCGGGACCCTCTCACAGTGGAATAC

CTCCACAGCGGCTCAGGCCAGATACAAAAGCCCCTCAGTGAGCCCTCCACTGCAGTGCT

GGGCCTGGGGGCAGCCGCTCCCACACAGGATGAACCCAGCACCCCGAGGATGTCCTGCC

AGGGGGAGCTCAGAGCCATGAAGGAGCAGGATATGGGACCCCCGATACAGGCACAGAC

CTCAGCTCCATTCAGGACTGCCACGTCCTGCCCTGGGAGGAACCCCTTTCTCTAGTCCCT

GCAGGCCAGGAGGCAGCTGACTCCTGACTTGGACGCCTATTCCAGACACCAGACAGAGG

GGCAGGCCCCCCAGAACCAGGGATGAGGACGCCCCGTCAAGGCCAGAAAAGACCAAGT

TGCGCTGAGCCCAGCAAGGGAAGGTCCCCAAACAAACCAGGAGGATTTTGTAGGTGTCT

GTGTCACTGTGAGCTGCAACTGCAGCAGCAAATGGAGCCGCGACCATAGCAGGTGCTGC

CACAGTGACACTCGCCAGGTCAAAAACCCCATCCCAAGTCAGCGGAATGCAGAGAGAGC

AGGGAGGACATGTTTAGGATCTGAGGCCGCACCTGACACCCAGGCCAGCAGACGTCTCC

TGTCCACGGCACCCTGCCATGTCCTGCATTTCTGGAAGAACAAGGGCAGGCTGAAGGGG

GTCCAGGACCAGGAGATGGGTCCGCTCTACCCAGAGAAGGAGCCAGGCAGGACACAAG

CCCCCACGCGTGGGCTCGTAGTTTGACGTGCGTGAAGTGTGGGTAAGAAAGTACGTA
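Because the fragment above is printed with line wraps, a toxin coding sequence from Table 4 may span two printed lines. The following is a minimal Python sketch, under the assumption that the printed lines are simply concatenated, for locating a Table 4 coding sequence within a fragment; the helper name `find_insert` is hypothetical. It uses three consecutive lines excerpted from TX-DH1166 (SEQ ID NO: 232) and the SmIIIA C1SC4S coding sequence (SEQ ID NO: 180).

```python
def find_insert(fragment_lines, coding_seq):
    """Return the 0-based offset of coding_seq in the joined fragment, or -1.

    Hypothetical helper for illustration; whitespace and line wraps in both
    the fragment and the coding sequence are removed before searching.
    """
    fragment = "".join("".join(line.split()) for line in fragment_lines).upper()
    query = "".join(coding_seq.split()).upper()
    return fragment.find(query)


# Three consecutive printed lines excerpted from TX-DH1166 (SEQ ID NO: 232).
excerpt = [
    "CAGGAGGCCCCAGAGCTCAGGGCGCCGGGGCGGATTTTGTACAGCCCCGAGTCACTGTG",
    "GAGAGAAGCTGCAATGGCAGACGCGGCTGCAGCAGCAGATGGAGCCGCGATCATAGCA",
    "GGTGCTGCCACAGTGAGAAAAACTGTGTCAAAAACCGTCTCCTGGCCCCTGCTGGAGGC",
]

# SmIIIA C1SC4S DNA (SEQ ID NO: 180), wrapped as printed in Table 4.
smiiia_c1sc4s = """
GAGAGAAGCTGCAATGGCAGACGCGGCTGCAGCAGCAGATGGAGCCGCGATCATAGCAGGTG
CTGC
"""

offset = find_insert(excerpt, smiiia_c1sc4s)
# The coding sequence begins at the start of the second excerpted line
# and runs into the third.
print(offset, offset == len(excerpt[0]))
```

The same search can be repeated with the other Table 4 coding sequences against the full fragment text.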

TX-DH17613 (SEQ ID NO: 233) includes toxin coding sequences inserted in positions corresponding to D_H1-7 to D_H6-13:

GCGGCCGCTGCCATTTCATTACCTCTTTCTCCGCACCCGACATAGATTACGTAACGCGTG

GGCTCGTAGTTTGACGTGCGTGAAGTGTGGGTAAGAAAGTCCCCATTGAGGCTGACCTGC

CCAGAGGGTCCTGGGCCCACCCAACACACCGGGGCGGAATGTGTGCAGGCCTCGGTCTC

TGTGGGTGTTCCGCTAGCTGGGGCTCACAGTGCTCACCCCACACCTAAAACGAGCCACAG

CCTCCGGAGCCCCTGAAGGAGACCCCGCCCACAAGCCCAGCCCCCACCCAGGAGGCCCC

AGAGCACAGGGCGCCCCGTCGGATTTTGTACAGCCCCGAGTCACTGTGGAGAGATGCAA

TGGCAGACGCGGCTGCAGCAGATGGCGCGATCATAGCAGGTGCTGCCACAGTGAGAAAA

GCTTCGTCAAAAACCGTCTCCTGGCCACAGTCGGAGGCCCCGCCAGAGAGGGGAGCAGC

CACCCCAAACCCATGTTCTGCCGGCTCCCATGACCCCGTGCACCTGGAGCCCCACGGTGT

CCCCACTGGATGGGAGGACAAGGGCCGGGGGCTCCGGCGGGTCGGGGCAGGGGCTTGAT

GGCTTCCTTCTGCCGTGGCCCCATTGCCCCTGGCTGGAGTTGACCCTTCTGACAAGTGTCC

TCAGAGAGTCAGGGATCAGTGGCACCTCCCAACATCAACCCCACGCAGCCCAGGCACAA

ACCCCACATCCAGGGCCAACTCCAGGAACAGAGACACCCCAATACCCTGGGGGACCCCG

ACCCTGATGACTCCCGTCCCATCTCTGTCCCTCACTTGGGGCCTGCTGCGGGGCGAGCAC

TTGGGAGCAAACTCAGGCTTAGGGGACACCACTGTGGGCCTGACCTCGAGCAGGCCACA

GACCCTTCCCTCCTGCCCTGGTGCAGCACAGACTTTGGGGTCTGGGCAGGGAGGAACTTC

TGGCAGGTCACCAAGCACAGAGCCCCCAGGCTGAGGTGGCCCCAGGGGGAACCCCAGCA

GGTGGCCCACTACCCTTCCTCCCAGCTGGACCCCATGTCTTCCCCAAGATAGGGGTGCCA

TCCAAGGCAGGTCCTCCATGGAGCCCCCTTCAGGCTCCTCTCCAGACCCCACTGGGCCTC

AGTCCCCACTCTAGGAATGCAGCCACCACGGGCACACCAGGCAGCCCAGGCCCAGCCAC

CCTGCAGTGCCCAAGCCCACACCCTGGAGGAGAGCAGGGTGCGTCTGGGAGGGGCTGGG

CTCCCCACCCCCACCCCCACCTGCACACCCCACCCACCCTTGCCCGGGCCCCCTGCAGGA

GGGTCAGAGCCCCCATGGGATATGGACTTAGGGTCTCACTCACGCACCTCCCCTCCTGGG

AGAAGGGGTCTCATGCCCAGATCCCCCCAGCAGCGCTGGTCACAGGTAGAGGCAGTGGC

CCCAGGGCCACCCTGACCTGGCCCCTCAGGCTCCTCTAGCCCTGGCTGCCCTGCTGTCCC

TGGGAGGCCTGGGCTCCACCAGACCACAGGTCTAGGGCACCGCCCACACTGGGGCCGCC

CACACACAGCTCACAGGAAGAAGATAAGCTCCAGACCCCCAGGCCCGGGACCTGCCTTG

CTGCTACGACTTCCTGCCCCAGACCTCGTTGCCCTCCCCCGTCCACTTACACACAGGCCA

GGAAGCTGTTCCCACACAGACCAACCCCAGACGGGGACCACCTGGCACTCAGGTCACTG

CCATTTCCTTCTCCATTCACTTCCAATGCCTCTGTGCTTCCTCCCTCCTCCTTCCTTCGGGG

GAGCACCCTGTGCAGCTCCTCCCTGCAGTCCACACCCTGGGGAGACCCGACCCTGCAGCC

CACACCCTGGGGAGACCTGACCCTCCTCCAGCCCTTTCTCCCCCGCTGCTCTTGCCACCCA

CCAAGACAGCCCTGGGGTCCTGTCCCTACAGCCCCCACCCAGTTCTCTACCTAGACCCGT

CTTCCTCCCTCTAAACACCTCTCCCAGGCCAACCCTACACCTGCAGGCCCTCCCCTCCACT

GCCAAAGACCCTCAGTTTCTCCTGCCTGTGCCCACCCCCGTGCTCCTCCTGCCCACAGCTC

GAGCTCTTCCTCTCCTAGGGCCCCTGAGGGATGGCATTGACCGTGCCCTCGCACCCACAC

ACTGCCCATGCCCTCACATTCCTCCTGGCCACTCCAGCCCCACTCCCCTCTCAGGCCTGGC

TCTGGTATTTCTGGGACAAAGCCTTACCCAAGTCTTTCCCATGCAGGCCTGGGCCCTTAC

CCTCACTGCCCGGTTACAGGGCAGCCTCCTGTGCACAGAAGCAGGGAGCTCAGCCCTTCC

ACAGGCAGAAGGCACTGAAAGAAATCGGCCTCCAGCGCCTTGACACACGTCTGCCTGTG

TCTCTCACTGCCCGCACCTGCAGGGAGGCTCGGCACTCCCTCTAAAGACGAGGGATCCAG

GCAGCAGCATCACAGGAGAATGCAGGGCTACCAGACATCCCAGTCCTCTCACAGGCCTC

TCCTGGGAAGAGACCTGAAGACGCCCAGTCAACGGAGTCTAACACCAAACCTCCCTGGA

GGCCGATGGGTAGTAACGGAGTCATTGCCAGACCTGGAGGCAGGGGAGCAGTGAGCCCG

AGCCCACACCATAGGGCCAGAGGACAGCCACTGACATCCCAAGCCACTCACTGGTGGTC

CCACAACACCCCATGGAAAGAGGACAGACCCACAGTCCCACCTGGACCAGGGCAGAGA

CTGCTGAGACCCAGCACCAGAACCAACCAAGAAACACCAGGCAACAGCATCAGAGGGG

GCTCTGGCAGAACAGAGGAGGGGAGGTCTCCTTCACCAGCAGGCGCTTCCCTTGACCGA

AGACAGGATCCATGCAACTCCCCCAGGACAAAGGAGGAGCCCCTTGTTCAGCACTGGGC

TCAGAGTCCTCTCCAAGACACCCAGAGTTTCAGACAAAAACCCCCTGGAATGCACAGTCT

CAGCAGGAGAGCCAGCCAGAGCCAGCAAGATGGGGCTCAGTGACACCCGCAGGGACAG

GAGGATTTTGTGGGGGCTCGTGTCACTGTGAGGATATTGTACTAATCGGAGCAGGCAGG

GTGTATGCTATACCCACAGTGACACAGCCCCATTCAAAAACCCCTACTGCAAACGCATTC

CACTTCTGGGGCTGAGGGGCTGGGGGAGCGTCTGGGAAATAGGGCTCAGGGGTGTCCAT

CAATGCCCAAAACGCACCAGACTCCCCTCCATACATCACACCCACCAGCCAGCGAGCAG

AGTAAACAGAAAATGAGAAGCAAGCTGGGGAAGCTTGCACAGGCCCCAAGGAAAGAGC

TTTGGCGGGTGTGTAAGAGGGGATGCGGGCAGAGCCTGAGCAGGGCCTTTTGCTGTTTCT

GCTTTCCTGTGCAGAGAGTTCCATAAACTGGTGTTCGAGATCAATGGCTGGGAGTGAGCC

CAGGAGGACAGCGTGGGAAGAGCACAGGGAAGGAGGAGCAGCCGCTATCCTACACTGT

CATCTTTCGAAAGTTTGCCTTGTGCCCACACTGCTGCATCATGGGATGCTTAACAGCTGA

TGTAGACACAGCTAAAGAGAGAATCAGTGAGATGGATTTGCAGCACAGATCTGAATAAA

TTCTCCAGAATGTGGAGCAGCACAGAAGCAAGCACACAGAAAGTGCCTGATGCAAGGAC

AAAGTTCAGTGGGCACCTTCAGGCATTGCTGCTGGGCACAGACACTCTGAAAAGCCCTG

GCAGGAACTCCCTGTGACAAAGCAGAACCCTCAGGCAATGCCAGCCCCAGAGCCCTCCC

TGAGAGCCTCATGGGCAAAGATGTGCACAACAGGTGTTTCTCATAGCCCCAAACTGAGA

GCAAAGCAAACGTCCATCTGAAGGAGAACAGGCAAATAAACGATGGCAGGTTCATGAA

ATGCAAACCCAGACAGCCACAAGCACAAAAGTACAGGGTTATAAGCGACTCTGGTTGAG

TTCATGACAATGCTGAGTAATTGGAGTAACAAAGTAAACTCCAAAAAATACTTTCAATGT

GATTTCTTCTAAATAAAATTTACACCCTGCAAAATGAACTGTCTTCTTAAGGGATACATTT

CCCAGTTAGAAAACCATAAAGAAAACCAAGAAAAGGATGATCACATAAACACAGTGGT

GGTTACTTCTGCTGGGGAAGGAAGAGGGTATGAACTGAGATACACAGGGTGGGCAAGTC

TCCTAACAAGAACAGAACGAATACATTACAGTACCTTGAAAACAGCAGTTAAACTTCTA

AATTGCAAGAAGAGGAAAATGCACACAGTTGTGTTTAGAAAATTCTCAGTCCAGCACTG

TTCATAATAGCAAAGACATTAACCCAGGTCGGATAAATAAGCGATGACACAGGCAATTG

CACAATGATACAGACATATATTTAGTATATGAGACATCGATGATGTATCCCCAAATAAAC

GACTTTAAAGAGATAAAGGGCTGATGTGTGGTGGCATTCACCTCCCTGGGATCCCCGGAC

AGGTTGCAGGCTCACTGTGCAGCAGGGCAGGCGGGTACCTGCTGGCAGTTCCTGGGGCC

TGATGTGGAGCAAGCGCAGGGCCATATATCCCGGAGGACGGCACAGTCAGTGAATTCCA

GAGAGAAGCAACTCAGCCACACTCCCCAGGCAGAGCCCGAGAGGGACGCCCACGCACA

GGGAGGCAGAGCCCAGCACCTCCGCAGCCAGCACCACCTGCGCACGGGCCACCACCTTG

CAGGCACAGAGTGGGTGCTGAGAGGAGGGGCAGGGACACCAGGCAGGGTGAGCACCCA

GAGAAAACTGCAGACGCCTCACACATCCACCTCAGCCTCCCCTGACCTGGACCTCACTGG

CCTGGGCCTCACTTAACCTGGGCTTCACCTGACCTTGGCCTCACCTGACTTGGACCTCGCC

TGTCCCAAGCTTTACCTGACCTGGGCCTCAACTCACCTGAACGTCTCCTGACCTGGGTTTA

ACCTGTCCTGGAACTCACCTGGCCTTGGCTTCCCCTGACCTGGACCTCATCTGGCCTGGG

CTTCACCTGGCCTGGGCCTCACCTGACCTGGACCTCATCTGGCCTGGACCTCACCTGGCC

TGGACTTCACCTGGCCTGGGCTTCACCTGACCTGGACCTCACCTGGCCTCGGGCCTCACC

TGCACCTGCTCCAGGTCTTGCTGGAGCCTGAGTAGCACTGAGGGTGCAGAAGCTCATCCA

GGGTTGGGGAATGACTCTAGAAGTCTCCCACATCTGACCTTTCTGGGTGGAGGCAGCTGG

TGGCCCTGGGAATATAAAAATCTCCAGAATGATGACTCTGTGATTTGTGGGCAACTTATG

AACCCGAAAGGACATGGCCATGGGGTGGGTAGGGACATAGGGACAGATGCCAGCCTGA

GGTGGAGCCTCAGGACACAGGTGGGCACGGACACTATCCACATAAGCGAGGGATAGACC

CGAGTGTCCCCACAGCAGACCTGAGAGCGCTGGGCCCACAGCCTCCCCTCAGAGCCCTG

CTGCCTCCTCCGGTCAGCCCTGGACATCCCAGGTTTCCCCAGGCCTGGCGGTAGGATTTT

GTTGAGGTCTGTGTCACTGTGGTATTACGATTGCAACTGCAGCAGATGGGCTCGCGACCA

TAGCAGGTGCTGCTATTATAACCACAGTGTCACAGAGTCCATCAAAAACCCATGCCTGGA

AGCTTCCCGCCACAGCCCTCCCCATGGGGCCCTGCTGCCTCCTCAGGTCAGCCCCGGACA

TCCCGGGTTTCCCCAGGCTGGGCGGTAGGATTTTGTTGAGGTCTGTGTCACTGTGGTATT

ACTATGAGAGATGCAATGGCAGACGCGGCTGCAGCAGATGGCGCGATCATAGCAGGTGC

TGCTATTATAACCACAGTGTCACAGAGTCCATCAAAAACCCATCCCTGGGAGCCTCCCGC

CACAGCCCTCCCTGCAGGGGACCGGTACGTGCCATGTTAGGATTTTGATCGAGGAGACA

GCACCATGGGTATGGTGGCTACCACAGCAGTGCAGCCTGTGACCCAAACCCGCAGGGCA

GCAGGCACGATGGACAGGCCCGTGACTGACCACGCTGGGCTCCAGCCTGCCAGCCCTGG

AGATCATGAAACAGATGGCCAAGGTCACCCTACAGGTCATCCAGATCTGGCTCCGAGGG

GTCTGCATCGCTGCTGCCCTCCCAACGCCAGTCCAAATGGGACAGGGACGGCCTCACAG

CACCATCTGCTGCCATCAGGCCAGCGATCCCAGAAGCCCCTCCCTCAAGGCTGGGCACAT

GTGTGGACACTGAGAGCCCTCATATCTGAGTAGGGGCACCAGGAGGGAGGGGCTGGCCC

TGTGCACTGTCCCTGCCCCTGTGGTCCCTGGCCTGCCTGGCCCTGACACCTGAGCCTCTCC

TGGGTCATTTCCAAGACAGAAGACATTCCTGGGGACAGCCGGAGCTGGGCGTCGCTCAT

CCTGCCCGGCCGTCCTGAGTCCTGCTCATTTCCAGACCTCACCGGGGAAGCCAACAGAGG

ACTCGCCTCCCACATTCAGAGACAAAGAACCTTCCAGAAATCCCTGCCTCTCTCCCCAGT

GGACACCCTCTTCCAGGACAGTCCTCAGTGGCATCACAGCGGCCTGAGATCCCCAGGAC

GCAGCACCGCTGTCAATAGGGGCCCCAAATGCCTGGACCAGGGCCTGCGTGGGAAAGGC

CTCTGGCCACACTCGGGGATTTTGTGAAGGGCCCTCCCACTGTGGAGAGGCTTTGTGGCT

TCCCTAAGAGCTGCAGCAGGCAAAAGCCTCACAGATGCTGCCACAGTGATGAACCCAGT

GTCAAAAACCGGCTGGAAACCCAGGGGCTGTGTGCACGCCTCAGCTTGGAGCTCTCCAG

GAGCACAAGAGCCGGGCCCAAGGATTTGTGCCCAGACCCTCAGCCTCTAGGGACACCTG

GGTCATCTCAGCCTGGGCTGGTGCCCTGCACACCATCTTCCTCCAAATAGGGGCTTCAGA

GGGCTCTGAGGTGACCTCACTCATGACCACAGGTGACCTGGCCCTTCCCTGCCAGCTATA

CCAGACCCTGTCTTGACAGATGCCCCGATTCCAACAGCCAATTCCTGGGACCCTGAATAG

CTGTAGACACCAGCCTCATTCCAGTACCTCCTGCCAATTGCCTGGATTCCCATCCTGGCTG

GAATCAAGAAGGCAGCATCCGCCAGGCTCCCAACAGGCAGGACTCCCGCACACCCTCCT

CTGAGAGGCCGCTGTGTTCCGCAGGGCCAGGCCCTGGACAGTTCCCCTCACCTGCCACTA

GAGAAACACCTGCCATTGTCGTCCCCACCTGGAAAAGACCACTCGTGGAGCCCCCAGCC

CCAGGTACAGCTGTAGAGACAGTCCTCGAGGCCCCTAAGAAGGAGCCATGCCCAGTTCT

GCCGGGACCCTCGGCCAGGCCGACAGGAGTGGACGCTGGAGCTGGGCCCACACTGGGCC

ACATAGGAGCTCACCAGTGAGGGCAGGAGAGCACATGCCGGGGAGCACCCAGCCTCCTG

CTGACCAGAGGCCCGTCCCAGAGCCCAGGAGGCTGCAGAGGCCTCTCCAGGGAGACACT

GTGCATGTCTGGTACCTAAGCAGCCCCCCACGTCCCCAGTCCTGGGGGCCCCTGGCTCAG

CTGTCTGGGCCCTCCCTGCTCCCTGGGAAGCTCCTCCTGACAGCCCCGCCTCCAGTTCCA

GGTGTGGATTTTGTCAGGCGATGTCACACTGTGTACTGCCAGAAGTGGATGTGGACTAGC

GATAGTGAGAGGAAGTGCTGTGAGGGTATGGTAAGCCGGCTGTGGTGTAAGAAGAAGCT

CTGGCACAGTGGTGCCGCCCATATCAAAAACCAGGCCAAGTAGACAGGCCCCTGCTGCG

CAGCCCCAGGCATCCACTTCACCTGCTTCTCCTGGGGCTCTCAAGGCTGCTGTCTGTCCTC

TGGCCCTCTGTGGGGAGGGTTCCCTCAGTGGGAGGTCTGTGCTCCAGGGCAGGGATGATT

GAGATAGAAATCAAAGGCTGGCAGGGAAAGGCAGCTTCCCGCCCTGAGAGGTGCAGGC

AGCACCACGGAGCCACGGAGTCACAGAGCCACGGAGCCCCCATTGTGGGCATTTGAGAG

TGCTGTGCCCCCGGCAGGCCCAGCCCTGATGGGGAAGCCTGTCCCATCCCACAGCCCGG

GTCCCACGGGCAGCGGGCACAGAAGCTGCCAGGTTGTCCTCTATGATCCTCATCCCTCCA

GCAGCATCCCCTCCACAGTGGGGAAACTGAGGCTTGGAGCACCACCCGGCCCCCTGGAA

ATGAGGCTGTGAGCCCAGACAGTGGGCCCAGAGCACTGTGAGTACCCCGGCAGTACCTG

GCTGCAGGGATCAGCCAGAGATGCCAAACCCTGAGTGACCAGCCTACAGGAGGATCCGG

CCCCACCCAGGCCACTCGATTAATGCTCAACCCCCTGCCCTGGAGACCTCTTCCAGTACC

ACCAGCAGCTCAGCTTCTCAGGGCCTCATCCCTGCAAGGAAGGTCAAGGGCTGGGCCTG

CCAGAAACACAGCACCCTCCCTAGCCCTGGCTAAGACAGGGTGGGCAGACGGCTGTGGA

CGGGACATATTGCTGGGGCATTTCTCACTGTCACTTCTGGGTGGTAGCTCTGACAAAAAC

GCAGACCCTGCCAAAATCCCCACTGCCTCCCGCTAGGGGCTGGCCTGGAATCCTGCTGTC

CTAGGAGGCTGCTGACCTCCAGGATGGCTCCGTCCCCAGTTCCAGGGCGAGAGCAGATC

CCAGGCAGGCTGTAGGCTGGGAGGCCACCCCTGCCCTTGCCGGGGTTGAATGCAGGTGC

CCAAGGCAGGAAATGGCATGAGCACAGGGATGACCGGGACATGCCCCACCAGAGTGCG

CCCCTTCCTGCTCTGCACCCTGCACCCCCCAGGCCAGCCCACGACGTCCAACAACTGGGC

CTGGGTGGCAGCCCCACCCAGACAGGACAGACCCAGCACCCTGAGGAGGTCCTGCCAGG

GGGAGCTAAGAGCCATGAAGGAGCAAGATATGGGGCCCCCGATACAGGCACAGATGTC

AGCTCCATCCAGGACCACCCAGCCCACACCCTGAGAGGAACGTCTGTCTCCAGCCTCTGC

AGGTCGGGAGGCAGCTGACCCCTGACTTGGACCCCTATTCCAGACACCAGACAGAGGCG

CAGGCCCCCCAGAACCAGGGTTGAGGGACGCCCCGTCAAAGCCAGACAAAACCAAGGG

GTGTTGAGCCCAGCAAGGGAAGGCCCCCAAACAGACCAGGAGGATTTTGTAGGTGTCTG

TGTCACTGTGTGCAACTGCAGCAGATGGCGCGACCATAGCAGGTGCTGCCACAGTGACA

CTCACCCAGTCAAAAACCCCATTCCAAGTCAGCGGAAGCAGAGAGAGCAGGGAGGACA

CGTTTAGGATCTGAGACTGCACCTGACACCCAGGCCAGCAGACGTCTCCCCTCCAGGGCA

CCCCACCCTGTCCTGCATTTCTGCAAGATCAGGGGCGGCCTGAGGGGGGGTCTAGGGTG

AGGAGATGGGTCCCCTGTACACCAAGGAGGAGTTAGGCAGGTCCCGAGCACTCTTAATT

AAACGACGCCTCGAATGGAACTACTACAACGAATGGTTGCTCTACGTAATGCATTCGCTA

CCTTAGGACCGTTATAGTTAGGCGCGCC

TX-DH114619 (SEQ ID NO: 234) includes toxin coding sequences inserted in positions corresponding to D_H1-14 to D_H6-19:

TACGTATTAATTAAACGACGCCTCGAATGGAACTACTACAACGAATGGTTGCTCTCCCCA

TTGAGGCTGACCTGCCCAGAGAGTCCTGGGCCCACCCCACACACCGGGGCGGAATGTGT

GCAGGCCTCGGTCTCTGTGGGTGTTCCGCTAGCTGGGGCTCACAGTGCTCACCCCACACC

TAAAATGAGCCACAGCCTCCGGAGCCCCCGCAGGAGACCCCGCCCACAAGCCCAGCCCC

CACCCAGGAGGCCCCAGAGCTCAGGGCGCCCCGTCGGATTTTGTACAGCCCCGAGTCAC

TGTGGAGAGATGCTGCAATGGCAGACGCGGCTGCAGCAGCAGATGGTGCCGCGATCATA

GCAGGTGCTGCCACAGTGAGAATAGCTACGTCAAAAACCGTCCAGTGGCCACTGCCGGA

GGCCCCGCCAGAGAGGGCAGCAGCCACTCTGATCCCATGTCCTGCCGGCTCCCATGACCC

CCAGCACGCGGAGCCCCACAGTGTCCCCACTGGATGGGAGGACAAGAGCTGGGGATTCC

GGCGGGTCGGGGCAGGGGCTTGATCGCATCCTTCTGCCGTGGCTCCAGTGCCCCTGGCTG

GAGTTGACCCTTCTGACAAGTGTCCTCAGAGAGACAGGCATCACCGGCGCCTCCCAACAT

CAACCCCAGGCAGCACAGGCACAAACCCCACATCCAGAGCCAACTCCAGGAGCAGAGA

CACCCCAATACCCTGGGGGACCCCGACCCTGATGACTTCCCACTGGAATTCGCCGTAGAG

TCCACCAGGACCAAAGACCCTGCCTCTGCCTCTGTCCCTCACTCAGGACCTGCTGCCGGG

CGAGGCCTTGGGAGCAGACTTGGGCTTAGGGGACACCAGTGTGACCCCGACCTTGACCA

GGACGCAGACCTTTCCTTCCTTTCCTGGGGCAGCACAGACTTTGGGGTCTGGGCCAGGAG

GAACTTCTGGCAGGTCGCCAAGCACAGAGGCCACAGGCTGAGGTGGCCCTGGAAAGACC

TCCAGGAGGTGGCCACTCCCCTTCCTCCCAGCTGGACCCCATGTCCTCCCCAAGATAAGG

GTGCCATCCAAGGCAGGTGCTCCTTGGAGCCCCATTCAGACTCCTCCCTGGACCCCACTG

GGCCTCAGTCCCAGCTCTGGGGATGAAGCCACCACAAGCACACCAGGCAGCCCAGGCCC

AGCCACCCTGCAGTGCCCAAGCACACACTCTGGAGCAGAGCAGGGTGCCTCTGGGAGGG

GCTGAGCTCCCCACCCCACCCCCACCTGCACACCCCACCCACCCCTGCCCAGCGGCTCTG

CAGGAGGGTCAGAGCCCCACATGGGGTATGGACTTAGGGTCTCACTCACGTGGCTCCCA

TCATGAGTGAAGGGGCCTCAAGCCCAGGTTCCCACAGCAGCGCCTGTCGCAAGTGGAGG

CAGAGGCCCGAGGGCCACCCTGACCTGGTCCCTGAGGTTCCTGCAGCCCAGGCTGCCCTG

CTGTCCCTGGGAGGCCTGGGCTCCACCAGACCACAGGTCCAGGGCACCGGGTGCAGGAG

CCACCCACACACAGCTCACAGGAAGAAGATAAGCTCCAGACCCCCAGGGCCAGAACCTG

CCTTCCTGCTACTGCTTCCTGCCCCAGACCTGGGCGCCCTCCCCCGTCCACTTACACACAG

GCCAGGAAGCTGTTCCCACACAGAACAACCCCAAACCAGGACCGCCTGGCACTCAGGTG

GCTGCCATTTCCTTCTCCATTTGCTCCCAGCGCCTCTGTCCTCCCTGGTTCCTCCTTCGGGG

GAACAGCCTGTGCAGCCAGTCCCTGCAGCCCACACCCTGGGGAGACCCAACCCTGCCTG

GGGCCCTTCCAACCCTGCTGCTCTTACTGCCCACCCAGAAAACTCTGGGGTCCTGTCCCT

GCAGTCCCTACCCTGGTCTCCACCCAGACCCCTGTGTATCACTCCAGACACCCCTCCCAG

GCAAACCCTGCACCTGCAGGCCCTGTCCTCTTCTGTCGCTAGAGCCTCAGTTTCTCCCCCC

TGTGCCCACACCCTACCTCCTCCTGCCCACAACTCTAACTCTTCTTCTCCTGGAGCCCCTG

AGCCATGGCATTGACCCTGCCCTCCCACCACCCACAGCCCATGCCCTCACCTTCCTCCTG

GCCACTCCGACCCCGCCCCCTCTCAGGCCAAGCCCTGGTATTTCCAGGACAAAGGCTCAC

CCAAGTCTTTCCCAGGCAGGCCTGGGCTCTTGCCCTCACTTCCCGGTTACACGGGAGCCT

CCTGTGCACAGAAGCAGGGAGCTCAGCCCTTCCACAGGCAGAAGGCACTGAAAGAAATC

GGCCTCCAGCACCTTGACACACGTCCGCCCGTGTCTCTCACTGCCCGCACCTGCAGGGAG

GCTCCGCACTCCCTCTAAAGACAAGGGATCCAGGCAGCAGCATCACGGGAGAATGCAGG

GCTCCCAGACATCCCAGTCCTCTCACAGGCCTCTCCTGGGAAGAGACCTGCAGCCACCAC

CAAACAGCCACAGAGGCTGCTGGATAGTAACTGAGTCAATGACCGACCTGGAGGGCAGG

GGAGCAGTGAGCCGGAGCCCATACCATAGGGACAGAGACCAGCCGCTGACATCCCGAGC

TCCTCAATGGTGGCCCCATAACACACCTAGGAAACATAACACACCCACAGCCCCACCTG

GAACAGGGCAGAGACTGCTGAGCCCCCAGCACCAGCCCCAAGAAACACCAGGCAACAG

TATCAGAGGGGGCTCCCGAGAAAGAGAGGAGGGGAGATCTCCTTCACCATCAAATGCTT

CCCTTGACCAAAAACAGGGTCCACGCAACTCCCCCAGGACAAAGGAGGAGCCCCCTATA

CAGCACTGGGCTCAGAGTCCTCTCTGAGACACCCTGAGTTTCAGACAACAACCCGCTGGA

ATGCACAGTCTCAGCAGGAGAACAGACCAAAGCCAGCAAAAGGGACCTCGGTGACACC

AGTAGGGACAGGAGGATTTTGTGGGGGCTCGTGTCACTGTGAGGATATTGTAGTGGTAG

CAGCAGATGGGGTAGCTGCTACTCCCACAGTGACACAGACCCATTCAAAAACCCCTACT

GCAAACACACCCACTCCTGGGGCTGAGGGGCTGGGGGAGCGTCTGGGAAGTAGGGTCCA

GGGGTGTCTATCAATGTCCAAAATGCACCAGACTCCCCGCCAAACACCACCCCACCAGC

CAGCGAGCAGGGTAAACAGAAAATGAGAGGCTCTGGGAAGCTTGCACAGGCCCCAAGG

AAAGAGCTTTGGCGGGTGTGCAAGAGGGGATGCAGGCAGAGCCTGAGCAGGGCCTTTTG

CTGTTTCTGCTTTCCTGTGCAGAGAGTTCCATAAACTGGTGTTCAAGATCAGTGGCTGGG

AATGAGCCCAGGAGGGCAGTCTGTGGGAAGAGCACAGGGAAGGAGGAGCAGCCGCTAT

CCTACACTGTCATCTTTCAAAAGTTTGCCTTGTGACCACACTATTGCATCATGGGATGCTT

AAGAGCTGATGTAGACACAGCTAAAGAGAGAATCAGTGAGATGAATTTGCAGCATAGAT

CTGAATAAACTCTCCAGAATGTGGAGCAGTACAGAAGCAAACACACAGAAAGTGCCTGA

TGCAAGGACAAAGTTCAGTGGGCACCTTCAGGCATTGCTGCTGGGCACAGACACTCTGA

AAAGCCTTGGCAGGATCTCCCTGCGACAAAGCAGAACCCTCAGGCAATGCCAGCCCCAG

AGCCCTCCCTGAGAGCGTCATGGGGAAAGATGTGCAGAACAGCTGATTATCATAGACTC

AAACTGAGAACAGAGCAAACGTCCATCTGAAGAACAGTCAAATAAGCAATGGTAGGTTC

ATGCAATGCAAACCCAGACAGCCAGGGGACAACAGTAGAGGGCTACAGGCGGCTTTGCG

GTTGAGTTCATGACAATGCTGAGTAATTGGAGTAACAGAGGAAAGCCCAAAAAATACTT

TTAATGTGATTTCTTCTAAATAAAATTTACACCAGGCAAAATGAACTGTCTTCTTAAGGG

ATAAACTTTCCCCTGGAAAAACTACAAGGAAAATTAAGAAAACGATGATCACATAAACA

CAGTTGTGGTTACTTCTACTGGGGAAGGAAGAGGGTATGAGCTGAGACACACAGAGTCG

GCAAGTCTCCAAGCAAGCACAGAACGAATACATTACAGTACCTTGAATACAGCAGTTAA

ACTTCTAAATCGCAAGAACAGGAAAATGCACACAGCTGTGTTTAGAAAATTCTCAGTCC

AGCACTATTCATAATAGCAAAGACATTAACCCAGGTTGGATAAATAAATGATGACACAG

GCAATTGCACAATGATACAGACATACATTTAGTACATGAGACATCGATGATGTATCCCCA

AAGAAATGACTTTAAAGAGAAAAGGCCTGATGTGTGGTGGCACTCACCTCCCTGGGATC

CCCGGACAGGTTGCAGGCACACTGTGTGGCAGGGCAGGCTGGTACATGCTGGCAGCTCC

TGGGGCCTGATGTGGAGCAAGCGCAGGGCTGTATACCCCCAAGGATGGCACAGTCAGTG

AATTCCAGAGAGAAGCAGCTCAGCCACACTGCCCAGGCAGAGCCCGAGAGGGACGCCC

ACGTACAGGGAGGCAGAGCCCAGCTCCTCCACAGCCACCACCACCTGTGCACGGGCCAC

CACCTTGCAGGCACAGAGTGGGTGCTGAGAGGAGGGGCAGGGACACCAGGCAGGGTGA

GCACCCAGAGAAAACTGCAGAAGCCTCACACATCCACCTCAGCCTCCCCTGACCTGGAC

CTCACCTGGTCTGGACCTCACCTGGCCTGGGCCTCACCTGACCTGGACCTCACCTGGCCT

GGGCTTCACCTGACCTGGACCTCACCTGGCCTCCGGCCTCACCTGCACCTGCTCCAGGTC

TTGCTGGAACCTGAGTAGCACTGAGGCTGCAGAAGCTCATCCAGGGTTGGGGAATGACT

CTGGAACTCTCCCACATCTGACCTTTCTGGGTGGAGGCATCTGGTGGCCCTGGGAATATA

AAAAGCCCCAGAATGGTGCCTGCGTGATTTGGGGGCAATTTATGAACCCGAAAGGACAT

GGCCATGGGGTGGGTAGGGACATAGGGACAGATGCCAGCCTGAGGTGGAGCCTCAGGA

CACAGTTGGACGCGGACACTATCCACATAAGCGAGGGACAGACCCGAGTGTTCCTGCAG

TAGACCTGAGAGCGCTGGGCCCACAGCCTCCCCTCGGTGCCCTGCTGCCTCCTCAGGTCA

GCCCTGGACATCCCGGGTTTCCCCAGGCCAGATGGTAGGATTTTGTTGAGGTCTGTGTCA

CTGTGGTATTATGATTACGAGAGAGCTTGCAATGGCAGACGCGGCTGCAGCAGATGGGC

TCGCGATCATAGCAGGTGCTGCTATCGTTATACCCACAGTGTCACACGGTCCATCAAAAA

CCCATGCCACAGCCCTCCCCGCAGGGGACCGCCGCGTGCCATGTTACGATTTTGATCGAG

GACACAGCGCCATGGGTATGGTGGCTACCACAGCAGTGCAGCCCATGACCCAAACACAC

AGGGCAGCAGGCACAATGGACAGGCCTGTGAGTGACCATGCTGGGCTCCAGCCCGCCAG

CCCCGGAGACCATGAAACAGATGGCCAAGGTCACCCCACAGTTCAGCCAGACATGGCTC

CGTGGGGTCTGCATCGCTGCTGCCCTCTAACACCAGCCCAGATGGGGACAAGGCCAACC

CCACATTACCATCTCCTGCTGTCCACCCAGTGGTCCCAGAAGCCCCTCCCTCATGGCTGA

GCCACATGTGTGAACCCTGAGAGCACCCCATGTCAGAGTAGGGGCAGCAGAAGGGCGGG

GCTGGCCCTGTGCACTGTCCCTGCACCCATGGTCCCTCGCCTGCCTGGCCCTGACACCTG

AGCCTCTTCTGAGTCATTTCTAAGATAGAAGACATTCCCGGGGACAGCCGGAGCTGGGC

GTCGCTCATCCCGCCCGGCCGTCCTGAGTCCTGCTTGTTTCCAGACCTCACCAGGGAAGC

CAACAGAGGACTCACCTCACACAGTCAGAGACAAAGAACCTTCCAGAAATCCCTGTCTC

ACTCCCCAGTGGGCACCTTCTTCCAGGACATTCCTCGGTCGCATCACAGCAGGCACCCAC

ATCTGGATCAGGACGGCCCCCAGAACACAAGATGGCCCATGGGGACAGCCCCACAACCC

AGGCCTTCCCAGACCCCTAAAAGGCGTCCCACCCCCTGCACCTGCCCCAGGGCTAAAAAT

CCAGGAGGCTTGACTCCCGCATACCCTCCAGCCAGACATCACCTCAGCCCCCTCCTGGAG

GGGACAGGAGCCCGGGAGGGTGAGTCAGACCCACCTGCCCTCGATGGCAGGCGGGGAA

GATTCAGAAAGGCCTGAGATCCCCAGGACGCAGCACCACTGTCAATGGGGGCCCCAGAC

GCCTGGACCAGGGCCTGCGTGGGAAAGGCCGCTGGGCACACTCAGGGGGATTTTGTGAA

GGCCCCTCCCACTGTGGAGAGGCTTGCTTGTGGCTTCCCTAAGAGCTGCAGCAGGCAAGC

TAAGCCTCACAGATGCTGCCACAGTGATGAAACTAGCATCAAAAACCGGCCGGACACCC

AGGGACCATGCACACTTCTCAGCTTGGAGCTCTCCAGGACCAGAAGAGTCAGGTCTGAG

GGTTTGTAGCCAGACCCTCGGCCTCTAGGGACACCCTGGCCATCACAGCGGATGGGCTG

GTGCCCCACATGCCATCTGCTCCAAACAGGGGCTTCAGAGGGCTCTGAGGTGACTTCACT

CATGACCACAGGTGCCCTGGCCCCTTCCCCGCCAGCTACACCGAACCCTGTCCCAACAGC

TGCCCCAGTTCCAACAGCCAATTCCTGGGGCCCAGAATTGCTGTAGACACCAGCCTCGTT

CCAGCACCTCCTGCCAATTGCCTGGATTCACATCCTGGCTGGAATCAAGAGGGCAGCATC

CGCCAGGCTCCCAACAGGCAGGACTCCCGCACACCCTCCTCTGAGAGGCCGCTGTGTTCC

GCAGGGCCAGGCCCTGGACAGTTCCCCTCACCTGCCACTAGAGAAACACCTGCCATTGTC

GTCCCCACCTGGAAAAGACCACTCGTGGAGCCCCCAGCCCCAGGTACAGCTGTAGAGAG

ACTCCCCGAGGGATCTAAGAAGGAGCCATGCGCAGTTCTGCCGGGACCCTCGGCCAGGC

CGACAGGAGTGGACACTGGAGCTGGGCCCACACTGGGCCACATAGGAGCTCACCAGTGA

GGGCAGGAGAGCACATGCCGGGGAGCACCCAGCCTCCTGCTGACCAGAGGCCCGTCCCA

GAGCCCAGGAGGCTGCAGAGGCCTCTCCAGGGGGACACTGTGCATGTCTGGTCCCTGAG

CAGCCCCCCACGTCCCCAGTCCTGGGGGCCCCTGGCACAGCTGTCTGGACCCTCCCTGTT

CCCTGGGAAGCTCCTCCTGACAGCCCCGCCTCCAGTTCCAGGTGTGGATTTTGTCAGGGG

GTGTCACACTGTGTACTGCCAGAAGTGGATGTGGACTTGCGATAGTGAGAGGAAGAGCT

GTGAGGGTATGGTATGCCGGCTGTGGAGTAAGAAGAAGCTCTGGCACAGTGGTGCTGCC

CATATCAAAAACCAGGCCAAGTAGACAGGCCCCTGCTGTGCAGCCCCAGGCCTCCACTT

CACCTGCTTCTCCTGGGGCTCTCAAGGTCACTGTTGTCTGTACTCTGCCCTCTGTGGGGAG

GGTTCCCTCAGTGGGAGGTCTGTTCTCAACATCCCAGGGCCTCATGTCTGCACGGAAGGC

CAATGGATGGGCAACCTCACATGCCGCGGCTAAGATAGGGTGGGCAGCCTGGCGGGGGA

CAGTACATACTGCTGGGGTGTCTGTCACTGTGCCTAGTGGGGCACTGGCTCCCAAACAAC

GCAGTCCTCGCCAAAATCCCCACAGCCTCCCCTGCTAGGGGCTGGCCTGATCTCCTGCAG

TCCTAGGAGGCTGCTGACCTCCAGAATGTCTCCGTCCCCAGTTCCAGGGCGAGAGCAGAT

CCCAGGCCGGCTGCAGACTGGGAGGCCACCCCCTCCTTCCCAGGGTTCACTGGAGGTGA

CCAAGGTAGGAAATGGCCTTAACACAGGGATGACTGCGCCATCCCCCAACAGAGTCAGC

CCCCTCCTGCTCTGTACCCCGCACCCCCCAGGCCAGTCCACGAAAACCAGGGCCCCACAT

CAGAGTCACTGCCTGGCCCGGCCCTGGGGCGGACCCCTCAGCCCCCACCCTGTCTAGAGG

ACTTGGGGGGACAGGACACAGGCCCTCTCCTTATGGTTCCCCCACCTGCCTCCGGCCGGG

ACCCTTGGGGTGTGGACAGAAAGGACACCTGCCTAATTGGCCCCCAGGAACCCAGAACT

TCTCTCCAGGGACCCCAGCCCGAGCACCCCCTTACCCAGGACCCAGCCCTGCCCCTCCTC

CCCTCTGCTCTCCTCTCATCACCCCATGGGAATCCGGTATCCCCAGGAAGCCATCAGGAA

GGGCTGAAGGAGGAAGCGGGGCCGTGCACCACCGGGCAGGAGGCTCCGTCTTCGTGAAC

CCAGGGAAGTGCCAGCCTCCTAGAGGGTATGGTCCACCCTGCCTGGGGCTCCCACCGTG

GCAGGCTGCGGGGAAGGACCAGGGACGGTGTGGGGGAGGGCTCAGGGCCCTGCGGGTG

CTCCTCCATCTTCGGTGAGCCTCCCCCTTCACCCACCGTCCCGCCCACCTCCTCTCCACCC

TGGCTGCACGTCTTCCACACCATCCTGAGTCCTACCTACACCAGAGCCAGCAAAGCCAGT

GCAGACAAAGGCTGGGGTGCAGGGGGGCTGCCAGGGCAGCTTCGGGGAGGGAAGGATG

GAGGGAGGGGAGGTCAGTGAAGAGGCCCCCTTCCCCTGGGTCCAGGATCCTCCTCTGGG

ACCCCCGGATCCCATCCCCTCCTGGCTCTGGGAGGAGAAGCAGGATGGGAGAATCTGTG

CGGGACCCTCTCACAGTGGAATATCCCCACAGCGGCTCAGGCCAGACCCAAAAGCCCCT

CAGTGAGCCCTCCACTGCAGTCCTGGGCCTGGGTAGCAGCCCCTCCCACAGAGGACAGA

CCCAGCACCCCGAAGAAGTCCTGCCAGGGGGAGCTCAGAGCCATGAAAGAGCAGGATAT

GGGGTCCCCGATACAGGCACAGACCTCAGCTCCATCCAGGCCCACCGGGACCCACCATG

GGAGGAACACCTGTCTCCGGGTTGTGAGGTAGCTGGCCTCTGTCTCGGACCCCACTCCAG

ACACCAGACAGAGGGGCAGGCCCCCCAAAACCAGGGTTGAGGGATGATCCGTCAAGGC

AGACAAGACCAAGGGGCACTGACCCCAGCAAGGGAAGGCTCCCAAACAGACGAGGAGG

ATTTTGTAGCTGTCTGTATCACTGTGTGCAACTGCAGCAGATGGGCTCGCGACCATAGCA

GGTGCTGCCACAGTGACACTCGCCAGGTCAAAAACCCCGTCCCAAGTCAGCGGAAGCAG

AGAGAGCAGGGAGGACACGTTTAGGATCTGAGGCCGCACCTGACACCCAGGGCAGCAG

ACGTCTCCCCTCCAGGGCACCCTCCACCGTCCTGCGTTTCTTCAAGAATAGGGGCGGCCT

GAGGGGGTCCAGGGCCAGGCGATAGGTCCCCTCTACCCCAAGGAGGAGCCAGGCAGGA

CCCGAGCACCGATGCATCTAACGCAGTCATGTAATGCTGGGTGACAGTCAGTTCGCCTAC

GTA

TX-DH120126 (SEQ ID NO: 235) includes toxin coding sequence inserted in positions corresponding to D_H1-20 to D_H1-26:

TACGTAATGCATCTAACGCAGTCATGTAATGCTGGGTGACAGTCAGTTCGCCTCCCCATT

GAGGCTGACCTGCCCAGACGGGCCTGGGCCCACCCCACACACCGGGGCGGAATGTGTGC

AGGCCCCAGTCTCTGTGGGTGTTCCGCTAGCTGGGGCCCCCAGTGCTCACCCCACACCTA

AAGCGAGCCCCAGCCTCCAGAGCCCCCTAAGCATTCCCCGCCCAGCAGCCCAGCCCCTG

CCCCCACCCAGGAGGCCCCAGAGCTCAGGGCGCCTGGTCGGATTTTGTACAGCCCCGAG

TCACTGTGGAGAGAGCTTGCAATGGCAGACGCGGCTGCAGCAGATGGGCTCGCGATCAT

AGCAGGTGCTGCCACAGTGAGAAAAACTGTGTCAAAAACCGACTCCTGGCAGCAGTCGG

AGGCCCCGCCAGAGAGGGGAGCAGCCGGCCTGAACCCATGTCCTGCCGGTTCCCATGAC

CCCCAGCACCCAGAGCCCCACGGTGTCCCCGTTGGATAATGAGGACAAGGGCTGGGGGC

TCCGGTGGTTTGCGGCAGGGACTTGATCACATCCTTCTGCTGTGGCCCCATTGCCTCTGGC

TGGAGTTGACCCTTCTGACAAGTGTCCTCAGAAAGACAGGGATCACCGGCACCTCCCAAT

ATCAACCCCAGGCAGCACAGACACAAACCCCACATCCAGAGCCAACTCCAGGAGCAGAG

ACACCCCAACACTCTGGGGGACCCCAACCGTGATAACTCCCCACTGGAATCCGCCCCAG

AGTCTACCAGGACCAAAGGCCCTGCCCTGTCTCTGTCCCTCACTCAGGGCCTCCTGCAGG

GCGAGCGCTTGGGAGCAGACTCGGTCTTAGGGGACACCACTGTGGGCCCCAACTTTGAT

GAGGCCACTGACCCTTCCTTCCTTTCCTGGGGCAGCACAGACTTTGGGGTCTGGGCAGGG

AAGAACTACTGGCTGGTGGCCAATCACAGAGCCCCCAGGCCGAGGTGGCCCCAAGAAGG

CCCTCAGGAGGTGGCCACTCCACTTCCTCCCAGCTGGACCCCAGGTCCTCCCCAAGATAG

GGGTGCCATCCAAGGCAGGTCCTCCATGGAGCCCCCTTCAGACTCCTCCCGGGACCCCAC

TGGACCTCAGTCCCTGCTCTGGGAATGCAGCCACCACAAGCACACCAGGAAGCCCAGGC

CCAGCCACCCTGCAGTGGGCAAGCCCACACTCTGGAGCAGAGCAGGGTGCGTCTGGGAG

GGGCTAACCTCCCCACCCCCCACCCCCCATCTGCACACAGCCACCTACCACTGCCCAGAC

CCTCTGCAGGAGGGCCAAGCCACCATGGGGTATGGACTTAGGGTCTCACTCACGTGCCTC

CCCTCCTGGGAGAAGGGGCCTCATGCCCAGATCCCTGCAGCACTAGACACAGCTGGAGG

CAGTGGCCCCAGGGCCACCCTGACCTGGCATCTAAGGCTGCTCCAGCCCAGACAGCACT

GCCGTTCCTGGGAAGCCTGGGCTCCACCAGACCACAGGTCCAGGGCACAGCCCACAGGA

GCCACCCACACACAGCTCACAGGAAGAAGATAAGCTCCAGACCCCAGGGCGGGACCTGC

CTTCCTGCCACCACTTACACACAGGCCAGGGAGCTGTTCCCACACAGATCAACCCCAAAC

CGGGACTGCCTGGCACTAGGGTCACTGCCATTTCCCTCTCCATTCCCTCCCAGTGCCTCTG

TGCTCCCTCCTTCTGGGGAACACCCTGTGCAGCCCCTCCCTGCAGCCCACACGCTGGGGA

GACCCCACCCTGCCTCGGGCCTTTTCTACCTGCTGCACTTGCCGCCCACCCAAACAACCC

TGGGTACGTGACCCTGCAGTCCTCACCCTGATCTGCAACCAGACCCCTGTCCCTCCCTCT

AAACACCCCTCCCAGGCCAACTCTGCACCTGCAGGCCCTCCGCTCTTCTGCCACAAGAGC

CTCAGGTTTTCCTACCTGTGCCCACCCCCTAACCCCTCCTGCCCACAACTTGAGTTCTTCC

TCTCCTGGAGCCCTTGAGCCATGGCACTGACCCTACACTCCCACCCACACACTGCCCATG

CCATCACCTTCCTCCTGGACACTCTGACCCCGCTCCCCTCCCTCTCAGACCCGGCCCTGGT

ATTTCCAGGACAAAGGCTCACCCAAGTCTTCCCCATGCAGGCCCTTGCCCTCACTGCCTG

GTTACACGGGAGCCTCCTGTGCGCAGAAGCAGGGAGCTCAGCTCTTCCACAGGCAGAAG

GCACTGAAAGAAATCAGCCTCCAGTGCCTTGACACACGTCCGCCTGTGTCTCTCACTGCC

TGCACCTGCAGGGAGGCTCCGCACTCCCTCTAAAGATGAGGGATCCAGGCAGCAACATC

ACGGGAGAATGCAGGGCTCCCAGACAGCCCAGCCCTCTCGCAGGCCTCTCCTGGGAAGA

GACCTGCAGCCACCACTGAACAGCCACGGAGGTCGCTGGATAGTAACCGAGTCAGTGAC

CGACCTGGAGGGCAGGGGAGCAGTGAACCGGAGCCCATACCATAGGGACAGAGACCAG

CCGCTAACATCCCGAGCCCCTCACTGGCGGCCCCAGAACACCCCGTGGAAAGAGAACAG

ACCCACAGTCCCACCTGGAACAGGGCAGACACTGCTGAGCCCCCAGCACCAGCCCCAAG

AAACACTAGGCAACAGCATCAGAGGGGGCTCCTGAGAAAGAGAGGAGGGGAGGTCTCC

TTCACCATCAAATGCTTCCCTTGACCAAAAACAGGGTCCACGCAACTCCCCCAGGACAAA

GGAGGAGCCCCCTGTACAGCACTGGGCTCAGAGTCCTCTCTGAGACAGGCTCAGTTTCAG

ACAACAACCCGCTGGAATGCACAGTCTCAGCAGGAGAGCCAGGCCAGAGCCAGCAAGA

GGAGACTCGGTGACACCAGTCTCCTGTAGGGACAGGAGGATTTTGTGGGGGTTCGTGTC

ACTGTGAGCATATTGTCGGAGCAGGCAGTGCTATTCCCACAGTGACACAACCCCATTCAA

AAACCCCTACTGCAAACGCACCCACTCCTGGGACTGAGGGGCTGGGGGAGCGTCTGGGA

AGTATGGCCTAGGGGTGTCCATCAATGCCCAAAATGCACCAGACTCTCCCCAAGACATC

ACCCCACCAGCCAGTGAGCAGAGTAAACAGAAAATGAGAAGCAGCTGGGAAGCTTGCA

CAGGCCCCAAGGAAAGAGCTTTGGCAGGTGTGCAAGAGGGGATGTGGGCAGAGCCTCA

GCAGGGCCTTTTGCTGTTTCTGCTTTCCTGTGCAGAGAGTTCCATAAACTGGTATTCAAGA

TCAATGGCTGGGAGTGAGCCCAGGAGGACAGTGTGGGAAGAGCACAGGGAAGGAGGAG

CAGCCGCTATCCTACACTGTCATCTTTTGAAAGTTTGCCCTGTGCCCACAATGCTGCATCA

TGGGATGCTTAACAGCTGATGTAGACACAGCTAAAGAGAGAATCAGTGAAATGGATTTG

CAGCACAGATCTGAATAAATCCTCCAGAATGTGGAGCAGCACAGAAGCAAGCACACAGA

AAGTGCCTGATGCCAAGGCAAAGTTCAGTGGGCACCTTCAGGCATTGCTGCTGGGCACA

GACACTCTGAAAAGCACTGGCAGGAACTGCCTGTGACAAAGCAGAACCCTCAGGCAATG

CCAGCCCTAGAGCCCTTCCTGAGAACCTCATGGGCAAAGATGTGCAGAACAGCTGTTTGT

CATAGCCCCAAACTATGGGGCTGGACAAAGCAAACGTCCATCTGAAGGAGAACAGACAA

ATAAACGATGGCAGGTTCATGAAATGCAAACTAGGACAGCCAGAGGACAACAGTAGAG

AGCTACAGGCGGCTTTGCGGTTGAGTTCATGACAATGCTGAGTAATTGGAGTAACAGAG

GAAAGCCCAAAAAATACTTTTAATGTGATTTCTTCTAAATAAAATTTACACCCGGCAAAA

TGAACTATCTTCTTAAGGGATAAACTTTCCCCTGGAAAAACTATAAGGAAAATCAAGAA

AACGATGATCACATAAACACAGTGGTGGTTACTTCTACTGGGGAAGGAAGAGGGTATGA

GCTGAGACACACAGAGTCGGCAAGTCTCCTAACAAGAACAGAACAAATACATTACAGTA

CCTTGAAAACAGCAGTTAAACTTCTAAATCGCAAGAAGAGGAAAATGCACACACCTGTG

TTTAGAAAATTCTCAGTCCAGCACTGTTCATAATAGCAAAGACATTAACCCAGGTTGGAT

AAATAAGCGATGACACAGGCAATTGCACAATGATACAGACATACATTCAGTATATGAGA

CATCGATGATGTATCCCCAAAGAAATGACTTTAAAGAGAAAAGGCCTGATGTGTGGTGG

CAATCACCTCCCTGGGCATCCCCGGACAGGCTGCAGGCTCACTGTGTGGCAGGGCAGGC

AGGCACCTGCTGGCAGCTCCTGGGGCCTGATGTGGAGCAGGCACAGAGCTGTATATCCC

CAAGGAAGGTACAGTCAGTGCATTCCAGAGAGAAGCAACTCAGCCACACTCCCTGGCCA

GAACCCAAGATGCACACCCATGCACAGGGAGGCAGAGCCCAGCACCTCCGCAGCCACCA

CCACCTGCGCACGGGCCACCACCTTGCAGGCACAGAGTGGGTGCTGAGAGGAGGGGCAG

GGACACCAGGCAGGGTGAGCACCCAGAGAAAACTGCAGAAGCCTCACACATCCCTCACC

TGGCCTGGGCTTCACCTGACCTGGACCTCACCTGGCCTCGGGCCTCACCTGCACCTGCTC

CAGGTCTTGCTGGAGCCTGAGTAGCACTGAGGCTGTAGGGACTCATCCAGGGTTGGGGA

ATGACTCTGCAACTCTCCCACATCTGACCTTTCTGGGTGGAGGCACCTGGTGGCCCAGGG

AATATAAAAAGCCCCAGAATGATGCCTGTGTGATTTGGGGGCAATTTATGAACCCGAAA

GGACATGGCCATGGGGTGGGTAGGGACAGTAGGGACAGATGTCAGCCTGAGGTGAAGC

CTCAGGACACAGGTGGGCATGGACAGTGTCCACCTAAGCGAGGGACAGACCCGAGTGTC

CCTGCAGTAGACCTGAGAGCGCTGGGCCCACAGCCTCCCCTCGGGGCCCTGCTGCCTCCT

CAGGTCAGCCCTGGACATCCCGGGTTTCCCCAGGCCTGGCGGTAGGATTTTGTTGAGGTC

TGTGTCACTGTGGTATTACTATGAGAGGCTTGCTTGTGGCTTCCCTAAGAGCTGCAGCAG

GCAAGCTAAGCCTCACAGATGCTGCTATTACTACCACAGTGTCACAGAGTCCATCAAAA

ACCCATGCCTGGGAGCCTCCCACCACAGCCCTCCCTGCGGGGGACCGCTGCATGCCGTGT

TAGGATTTTGATCGAGGACACGGCGCCATGGGTATGGTGGCTACCACAGCAGTGCAGCC

CATGACCCAAACACACGGGGCAGCAGAAACAATGGACAGGCCCACAAGTGACCATGAT

GGGCTCCAGCCCACCAGCCCCAGAGACCATGAAACAGATGGCCAAGGTCACCCTACAGG

TCATCCAGATCTGGCTCCAAGGGGTCTGCATCGCTGCTGCCCTCCCAACGCCAAACCAGA

TGGAGACAGGGCCGGCCCCATAGCACCATCTGCTGCCGTCCACCCAGCAGTCCCGGAAG

CCCCTCCCTGAACGCTGGGCCACGTGTGTGAACCCTGCGAGCCCCCCATGTCAGAGTAGG

GGCAGCAGGAGGGCGGGGCTGGCCCTGTGCACTGTCACTGCCCCTGTGGTCCCTGGCCTG

CCTGGCCCTGACACCTGAGCCTCTCCTGGGTCATTTCCAAGACATTCCCAGGGACAGCCG

GAGCTGGGAGTCGCTCATCCTGCCTGGCTGTCCTGAGTCCTGCTCATTTCCAGACCTCAC

CAGGGAAGCCAACAGAGGACTCACCTCACACAGTCAGAGACAACGAACCTTCCAGAAAT

CCCTGTTTCTCTCCCCAGTGAGAGAAACCCTCTTCCAGGGTTTCTCTTCTCTCCCACCCTC

TTCCAGGACAGTCCTCAGCAGCATCACAGCGGGAACGCACATCTGGATCAGGACGGCCC

CCAGAACACGCGATGGCCCATGGGGACAGCCCAGCCCTTCCCAGACCCCTAAAAGGTAT

CCCCACCTTGCACCTGCCCCAGGGCTCAAACTCCAGGAGGCCTGACTCCTGCACACCCTC

CTGCCAGATATCACCTCAGCCCCCTCCTGGAGGGGACAGGAGCCCGGGAGGGTGAGTCA

GACCCACCTGCCCTCAATGGCAGGCGGGGAAGATTCAGAAAGGCCTGAGATCCCCAGGA

CGCAGCACCACTGTCAATGGGGGCCCCAGACGCCTGGACCAGGGCCTGTGTGGGAAAGG

CCTCTGGCCACACTCAGGGGGATTTTGTGAAGGGCCCTCCCACTGTGGAGAGGCTTTGCT

GTGGCTTCCCTAAGAGCTGCCGCAGCAGGCAATGCAAGCCTCACAGATGCTGCCACAGT

GATGAAACCAGCATCAAAAACCGACCGGACTCGCAGGGTTTATGCACACTTCTCGGCTC

GGAGCTCTCCAGGAGCACAAGAGCCAGGCCCGAGGGTTTGTGCCCAGACCCTCGGCCTC

TAGGGACACCCGGGCCATCTTAGCCGATGGGCTGATGCCCTGCACACCGTGTGCTGCCAA

ACAGGGGCTTCAGAGGGCTCTGAGGTGACTTCACTCATGACCACAGGTGCCCTGGTCCCT

TCACTGCCAGCTGCACCAGACCCTGTTCCGAGAGATGCCCCAGTTCCAAAAGCCAATTCC

TGGGGCCGGGAATTACTGTAGACACCAGCCTCATTCCAGTACCTCCTGCCAATTGCCTGG

ATTCCCATCCTGGCTGGAATCAAGAGGGCAGCATCCGCCAGGCTCCCAACAGGCAGGAC

TCCCACACACCCTCCTCTGAGAGGCCGCTGTGTTCCGCAGGGCCAGGCCGCAGACAGTTC

CCCTCACCTGCCCATGTAGAAACACCTGCCATTGTCGTCCCCACCTGGCAAAGACCACTT

GTGGAGCCCCCAGCCCCAGGTACAGCTGTAGAGAGAGTCCTCGAGGCCCCTAAGAAGGA

GCCATGCCCAGTTCTGCCGGGACCCTCGGCCAGGCCGACAGGAGTGGACGCTGGAGCTG

GGCCCACACTGGGCCACATAGGAGCTCACCAGTGAGGGCAGGAGAGCACATGCCGGGG

AGCACCCAGCCTCCTGCTGACCAGAGACCCGTCCCAGAGCCCAGGAGGCTGCAGAGGCC

TCTCCAGGGGGACACAGTGCATGTCTGGTCCCTGAGCAGCCCCCAGGCTCTCTAGCACTG

GGGGCCCCTGGCACAGCTGTCTGGACCCTCCCTGTTCCCTGGGAAGCTCCTCCTGACAGC

CCCGCCTCCAGTTCCAGGTGTGGATTTTGTCAGGGGGTGCCACACTGTGTACTGCCAGAA

GTGGATGTGGACTTGCGATAGTGAGAGGAAGTGCTGTGAGGGTATGGTATGCCGGCTGT

GGTGTAAGAAGAAGCTCTGGCACAGTGGTGCCGCCCATATCAAAAACCAGGCCAAGTAG

ACAGACCCCTGCCACGCAGCCCCAGGCCTCCAGCTCACCTGCTTCTCCTGGGGCTCTCAA

GGCTGCTGTCTGCCCTCTGGCCCTCTGTGGGGAGGGTTCCCTCAGTGGGAGGTCTGTGCT

CCAGGGCAGGGATGACTGAGATAGAAATCAAAGGCTGGCAGGGAAAGGCAGCTTCCCG

CCCTGAGAGGTGCAGGCAGCACCACAGAGCCATGGAGTCACAGAGCCACGGAGCCCCCA

GTGTGGGCGTGTGAGGGTGCTGGGCTCCCGGCAGGCCCAGCCCTGATGGGGAAGCCTGC

CCCGTCCCACAGCCCAGGTCCCCAGGGGCAGCAGGCACAGAAGCTGCCAAGCTGTGCTC

TACGATCCTCATCCCTCCAGCAGCATCCACTCCACAGTGGGGAAACTGAGCCTTGGAGAA

CCACCCAGCCCCCTGGAAACAAGGCGGGGAGCCCAGACAGTGGGCCCAGAGCACTGTGT

GTATCCTGGCACTAGGTGCAGGGACCACCCGGAGATCCCCATCACTGAGTGGCCAGCCT

GCAGAAGGACCCAACCCCAACCAGGCCGCTTGATTAAGCTCCATCCCCCTGTCCTGGGA

ACCTCTTCCCAGCGCCACCAACAGCTCGGCTTCCCAGGCCCTCATCCCTCCAAGGAAGGC

CAAAGGCTGGGCCTGCCAGGGGCACAGTACCCTCCCTTGCCCTGGCTAAGACAGGGTGG

GCAGACGGCTGCAGATAGGACATATTGCTGGGGCATCTTGCTCTGTGACTACTGGGTACT

GGCTCTCAACGCAGACCCTACCAAAATCCCCACTGCCTCCCCTGCTAGGGGCTGGCCTGG

TCTCCTCCTGCTGTCCTAGGAGGCTGCTGACCTCCAGGATGGCTTCTGTCCCCAGTTCTAG

GGCCAGAGCAGATCCCAGGCAGGCTGTAGGCTGGGAGGCCACCCCTGTCCTTGCCGAGG

TTCAGTGCAGGCACCCAGGACAGGAAATGGCCTGAACACAGGGATGACTGTGCCATGCC

CTACCTAAGTCCGCCCCTTTCTACTCTGCAACCCCCACTCCCCAGGTCAGCCCATGACGA

CCAACAACCCAACACCAGAGTCACTGCCTGGCCCTGCCCTGGGGAGGACCCCTCAGCCC

CCACCCTGTCTAGAGGAGTTGGGGGGACAGGACACAGGCTCTCTCCTTATGGTTCCCCCA

CCTGGCTCCTGCCGGGACCCTTGGGGTGTGGACAGAAAGGACGCCTGCCTAATTGGCCCC

CAGGAACCCAGAACTTCTCTCCAGGGACCCCAGCCCGAGCACCCCCTTACCCAGGACCC

AGCCCTGCCCCTCCTCCCCTCTGCTCTCCTCTCATCACTCCATGGGAATCCAGAATCCCCA

GGAAGCCATCAGGAAGGGCTGAAGGAGGAAGCGGGGCCGCTGCACCACCGGGCAGGAG

GCTCCGTCTTCGTGAACCCAGGGAAGTGCCAGCCTCCTAGAGGGTATGGTCCACCCTGCC

TGGGGCTCCCACCGTGGCAGGCTGCGGGGAAGGACCAGGGACGGTGTGGGGGAGGGCT

CAGGGCCCTGCAGGTGCTCCATCTTGGATGAGCCCATCCCTCTCACCCACCGACCCGCCC

ACCTCCTCTCCACCCTGGCCACACGTCGTCCACACCATCCTGAGTCCCACCTACACCAGA

GCCAGCAGAGCCAGTGCAGACAGAGGCTGGGGTGCAGGGGGGCCGCCAGGGCAGCTTT

GGGGAGGGAGGAATGGAGGAAGGGGAGGTCAGTGAAGAGGCCCCCCTCCCCTGGGTCT

AGGATCCACCTTTGGGACCCCCGGATCCCATCCCCTCCAGGCTCTGGGAGGAGAAGCAG

GATGGGAGATTCTGTGCAGGACCCTCTCACAGTGGAATACCTCCACAGCGGCTCAGGCC

AGATACAAAAGCCCCTCAGTGAGCCCTCCACTGCAGTGCAGGGCCTGGGGGCAGCCCCT

CCCACAGAGGACAGACCCAGCACCCCGAAGAAGTCCTGCCAGGGGGAGCTCAGAGCCAT

GAAGGAGCAAGATATGGGGACCCCAATACTGGCACAGACCTCAGCTCCATCCAGGCCCA

CCAGGACCCACCATGGGTGGAACACCTGTCTCCGGCCCCTGCTGGCTGTGAGGCAGCTG

GCCTCTGTCTCGGACCCCCATTCCAGACACCAGACAGAGGGACAGGCCCCCCAGAACCA

GTGTTGAGGGACACCCCTGTCCAGGGCAGCCAAGTCCAAGAGGCGCGCTGAGCCCAGCA

AGGGAAGGCCCCCAAACAAACCAGGAGGTTTCTGAAGCTGTCTGTGTCACAGTCTGCTG

CAACTGCAGCAGCAAATGGTGCCGCGACCATAGCAGGTGCTGCCACAATGACACTGGGC

AGGACAGAAACCCCATCCCAAGTCAGCCGAAGGCAGAGAGAGCAGGCAGGACACATTT

AGGATCTGAGGCCACACCTGACACTCAAGCCAACAGATGTCTCCCCTCCAGGGCGCCCT

GCCCTGTTCAGTGTTCCTGAGAAAACAGGGGCAGCCTGAGGGGATCCAGGGCCAGGAGA

TGGGTCCCCTCTACCCCGAGGAGGAGCCAGGCGGGAATCCCAGCCCCCTCCCCATTGAG

GCCATCCTGCCCAGAGGGGCCCGGACCCACCCCACACACCCAGGCAGAATGTGTGCAGG

CCTCAGGCTCTGTGGGTGCCGCTAGCTGGGGCTGCCAGTCCTCACCCCACACCTAAGGTG

AGCCACAGCCGCCAGAGCCTCCACAGGAGACCCCACCCAGCAGCCCAGCCCCTACCCAG

GAGGCCCCAGAGCTCAGGGCGCCTGGGTGGATTTTGTACAGCCCCGAGTCACTGTGGGT

ATAGTGGGGAGAGGCTTTGTGGCTTCCCTAAGAGCTGCAGCAGGCAAAAGCCTCACAGA

TGCTGCAGCTACTACCACAGTGAGAAAAGCTATGTCAAAAACCGTCTCCCGGCCACTGCT

GGAGGCCCAGCCAGAGAAGGGACCAGCCGCCCGAACATACGACCTTCCCAGACCTCATG

ACCCCCAGCACTTGGAGCTCCACAGTGTCCCCATTGGATGGTGAGGATGGGGGCCGGGG

CCATCTGCACCTCCCAACATCACCCCCAGGCAGCACAGGCACAAACCCCAAATCCAGAG

CCGACACCAGGAACACAGACACCCCAATACCCTGGGGGACCCTGGCCCTGGTGACTTCC

CACTGGGATCCACCCCCGTGTCCACCTGGATCAAAGACCCCACCGCTGTCTCTGTCCCTC

ACTCAGGGCCTGCTGAGGGGCGGGTGCTTTGGAGCAGACTCAGGTTTAGGGGCCACCAT

TGTGGGGCCCAACCTCGACCAGGACACAGATTTTTCTTTCCTGCCCTGGGGCAACACAGA

CTTTGGGGTCTGTGCAGGGAGGACCTTCTGGAAAGTCACCAAGCACAGAGCCCTGACTG

AGGTGGTCTCAGGAAGACCCCCAGGAGGGGGCTTGTGCCCCTTCCTCTCATGTGGACCCC

ATGCCCCCCAAGATAGGGGCATCATGCAGGGCAGGTCCTCCATGCAGCCACCACTAGGC

AACTCCCTGGCGCCGGTCCCCACTGCGCCTCCATCCCGGCTCTGGGGATGCAGCCACCAT

GGCCACACCAGGCAGCCCGGGTCCAGCAACCCTGCAGTGCCCAAGCCCTTGGCAGGATT

CCCAGAGGCTGGAGCCCACCCCTCCTCATCCCCCCACACCTGCACACACACACCTACCCC

CTGCCCAGTCCCCCTCCAGGAGGGTTGGAGCCGCCCATAGGGTGGGGGCTCCAGGTCTC

ACTCACTCGCTTCCCTTCCTGGGCAAAGGAGCCTCGTGCCCCGGTCCCCCCTGACGGCGC

TGGGCACAGGTGTGGGTACTGGGCCCCAGGGCTCCTCCAGCCCCAGCTGCCCTGCTCTCC

CTGGGAGGCCTGGGCACCACCAGACCACCAGTCCAGGGCACAGCCCCAGGGAGCCGCCC

ACTGCCAGCTCACAGGAAGAAGATAAGCTTCAGACCCTCAGGGCCGGGAGCTGCCTTCC

TGCCACCCCTTCCTGCCCCAGACCTCCATGCCCTCCCCCAACCACTTACACACAAGCCAG

GGAGCTGTTTCCACACAGTTCAACCCCAAACCAGGACGGCCTGGCACTCGGGTCACTGCC

ATTTCTGTCTGCATTCGCTCCCAGCGCCCCTGTGTTCCCTCCCTCCTCCCTCCTTCCTTTCT

TCCTGCATTGGGTTCATGCCGCAGAGTGCCAGGTGCAGGTCAGCCCTGAGCTTGGGGTCA

CCTCCTCACTGAAGGCAGCCTCAGGGTGCCCAGGGGCAGGCAGGGTGGGGGTGAGGCTT

CCAGCTCCAACCGCTTCGCTACCTTAGGACCGTTATAGTTAGGCGCGCCGTCGACCAATT

CTCATGTTTGACAGCTTATCATCGAATTTCTACGTA

DNA Constructs

Typically, a polynucleotide molecule containing one or more nucleotide coding sequences that each encodes a non-immunoglobulin polypeptide of interest, or portion thereof (e.g., an extracellular portion of an ACKR2 polypeptide or a portion of a toxin peptide) is inserted into a vector, preferably a DNA vector, in order to replicate the polynucleotide molecule in a suitable host cell.

Due to their size, one or more nucleotide coding sequences can be cloned directly from cDNA sources available from commercial suppliers or designed in silico based on published sequences available from GenBank. Alternatively, bacterial artificial chromosome (BAC) libraries can provide heterologous nucleotide coding sequences from genes of interest (e.g., a heterologous ACKR2 gene or a toxin-encoding sequence). BAC libraries contain an average insert size of 100-150 kb and are capable of harboring inserts as large as 300 kb (Shizuya, et al., 1992, Proc. Natl. Acad. Sci., USA 89:8794-8797; Swiatek, et al., 1993, Genes and Development 7:2071-2084; Kim, et al., 1996, Genomics 34:213-218; herein incorporated by reference). For example, human and mouse genomic BAC libraries have been constructed and are commercially available (e.g., Invitrogen, Carlsbad, Calif.). Genomic BAC libraries can also serve as a source of heterologous coding sequences as well as transcriptional control regions.

Alternatively, heterologous nucleotide coding sequences may be isolated, cloned and/or transferred from yeast artificial chromosomes (YACs). An entire heterologous gene or locus can be cloned and contained within one or a few YACs. If multiple YACs are employed and contain regions of overlapping homology, they can be recombined within yeast host strains to produce a single construct representing the entire locus. YAC arms can be additionally modified with mammalian selection cassettes by retrofitting to assist in introducing the constructs into embryonic stem cells or embryos by methods known in the art and/or described herein.

As described above, exemplary DNA and amino acid sequences for use in constructing an engineered D_H region of an immunoglobulin heavy chain locus are provided in Tables 3 and 4, respectively. Other heterologous nucleotide coding sequences can also be found in the GenBank database or other sequence databases known in the art. For example, the mRNA and amino acid sequences of human ACKR2 can be found at GenBank accession numbers NM_001296.4 and NP_001287.2, respectively, and are hereby incorporated by reference. Also, for example, DNA and amino acid sequences of an α-conotoxin can be found at GenBank accession numbers JX177132.1 and AFR68318.1, respectively; of a δ-conotoxin at GenBank accession numbers KR013220.1 and AKD43185.1, respectively; of a κ-conotoxin at GenBank accession numbers DQ311073.1 and ABD33865.1, respectively; of a μ-conotoxin at GenBank accession numbers AY207469.1 and AA048588.1, respectively; and/or of an ω-conotoxin at GenBank accession numbers M84612.1 and AAA81590.1, respectively; all of which are hereby incorporated by reference. Further, for example, sequences of a toxin from the tarantula Grammostola spatulata can be found at GenBank accession numbers 1TYK_A and 1LUP_A, of SGTX-I can be found at GenBank accession number 1LA4_A, of Huwentoxin-IV (HWTX-IV) can be found at GenBank accession number P83303.2, of Protoxin-I (ProTxI) at GenBank accession number 2M9L_A, of Protoxin-2 (ProTxII) at GenBank accession number P83476.1; all of which are hereby incorporated by reference.

DNA constructs containing one or more nucleotide coding sequences as described herein, in some embodiments, comprise human ACKR2 DNA sequences encoding an extracellular portion of a human ACKR2 polypeptide operably linked to recombination signal sequences (RSSs, i.e., flanked by a 5′ RSS and 3′ RSS) for recombination with immunoglobulin gene segments (e.g., V_H and J_H) in a transgenic non-human animal. In some embodiments, DNA constructs containing one or more nucleotide coding sequences as described herein comprise toxin DNA sequences encoding a portion of a μ-conotoxin and/or tarantula toxin peptide operably linked to recombination signal sequences (RSSs, i.e., flanked by a 5′ RSS and 3′ RSS) for recombination with immunoglobulin gene segments (e.g., V_H and J_H) in a transgenic non-human animal. Recombination signal sequences may be identical or substantially identical with recombination signal sequences found in nature (e.g., genomic) or may be engineered by the hand of man (e.g., optimized). In some embodiments, RSSs are genomic in origin, and include a sequence or sequences that are found in an immunoglobulin heavy chain locus found in nature (e.g., a human or rodent immunoglobulin heavy chain locus). For example, a DNA construct can include recombination signal sequences located in the 5′-flanking and/or 3′-flanking regions of a nucleotide coding sequence encoding a non-immunoglobulin polypeptide of interest, or portion thereof (e.g., an extracellular portion of a heterologous ACKR2 polypeptide or a portion of a toxin peptide), operably linked to the nucleotide coding sequence in a manner capable of recombining the nucleotide coding sequence with a V_H and/or J_H gene segment. In some embodiments, recombination signal sequences comprise a sequence naturally associated with a traditional D_H gene segment (i.e., an RSS found in nature). In some embodiments, recombination signal sequences comprise a sequence that is not naturally associated with a traditional D_H gene segment. In some embodiments, recombination signal sequences comprise a sequence that is optimized for recombination with V_H and J_H gene segments. In some embodiments, recombination signal sequences operably linked to one or more nucleotide coding sequences each encoding a non-immunoglobulin polypeptide of interest, or portion thereof (e.g., an extracellular portion of an ACKR2 polypeptide or a portion of a toxin peptide) provide for recombination at a level similar to, greater than, or less than the level of recombination in the animal from which the sequence is obtained. If additional flanking sequences are useful in optimizing recombination of the one or more nucleotide coding sequences, such sequences can be cloned using existing sequences as probes. Additional sequences necessary for maximizing recombination and/or expression of a heavy chain variable region containing a nucleotide coding sequence of a non-immunoglobulin polypeptide (e.g., an ACKR2 or toxin) can be obtained from genomic sequences or other sources depending on the desired outcome.
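
By way of illustration only, the flanking arrangement described above (a coding sequence placed between a 5′ RSS and a 3′ RSS) can be sketched in Python as follows. The sketch assumes the canonical consensus heptamer (CACAGTG) and nonamer (ACAAAAACC) separated by a 12-bp spacer, which is general knowledge about V(D)J recombination and is not taken from the optimized RSSs of FIG. 2. The function names, the spacer bases and the example coding sequence are hypothetical placeholders.

```python
# Illustrative sketch only: consensus RSS elements from general V(D)J
# knowledge, NOT the optimized RSSs of FIG. 2 (assumption).
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"
SPACER_LEN = 12  # assumption: 12-bp spacers on both sides, as for D_H segments

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _check_spacer(spacer: str) -> None:
    if len(spacer) != SPACER_LEN:
        raise ValueError(f"spacer must be {SPACER_LEN} bp, got {len(spacer)}")


def three_prime_rss(spacer: str) -> str:
    """3' RSS read on the coding strand: heptamer, spacer, nonamer."""
    _check_spacer(spacer)
    return HEPTAMER + spacer + NONAMER


def five_prime_rss(spacer: str) -> str:
    """5' RSS read on the coding strand: nonamer, spacer, heptamer,
    with both elements in reverse-complement orientation so that the
    heptamer abuts the coding sequence."""
    _check_spacer(spacer)
    return reverse_complement(NONAMER) + spacer + reverse_complement(HEPTAMER)


def flank_coding_sequence(coding: str, spacer5: str, spacer3: str) -> str:
    """Place a coding sequence between a 5' RSS and a 3' RSS."""
    return five_prime_rss(spacer5) + coding + three_prime_rss(spacer3)


# Hypothetical placeholder inputs, not sequences from Table 3.
segment = flank_coding_sequence("GGGAAATTT", "A" * 12, "C" * 12)
print(segment)
```

In this sketch the heptamer of each RSS sits immediately adjacent to the coding sequence, which is the orientation in which the recombination machinery cleaves at the coding/heptamer border.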

In various embodiments, one or more nucleotide coding sequences as described herein are each flanked 5′ and/or 3′ by optimized RSSs. Exemplary optimized RSSs that may be used are provided in FIG. 2 and described in Example 1.

In various embodiments, one or more nucleotide coding sequences as described herein are each flanked 5′ by an optimized RSS having a sequence at least 50% (e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) identical to a 5′ RSS that appears in FIG. 2.

In various embodiments, one or more nucleotide coding sequences as described herein are each flanked 5′ by an optimized RSS having a sequence that is substantially identical or identical to a 5′ RSS that appears in FIG. 2.

In various embodiments, one or more nucleotide coding sequences are each flanked 3′ by an optimized RSS having a sequence at least 50% (e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) identical to a 3′ RSS that appears in FIG. 2.

In various embodiments, one or more nucleotide coding sequences as described herein are each flanked 3′ by an optimized RSS having a sequence that is substantially identical or identical to a 3′ RSS that appears in FIG. 2.

In various embodiments, one or more nucleotide coding sequences as described herein are each flanked 5′ and 3′ by optimized RSSs, each 5′ and 3′ RSS having a sequence at least 50% (e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) identical to 5′ and 3′ RSSs that appear in FIG. 2.

In various embodiments, one or more nucleotide coding sequences as described herein are each flanked 5′ and 3′ by optimized RSSs, each 5′ and 3′ RSS having a sequence that is substantially identical or identical to 5′ and 3′ RSSs that appear in FIG. 2.

In various embodiments, one or more nucleotide coding sequences as described herein are each flanked by 5′ and 3′ RSSs that are selected from FIG. 2.
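
The percent-identity thresholds recited above can be illustrated with a minimal Python sketch. It assumes the simplest possible comparison, an ungapped, position-by-position match between two sequences of equal length; in practice, identity is usually determined from an alignment that may include gaps, and nothing here is asserted to be the method used to evaluate the RSSs of FIG. 2. The function names and example sequences are hypothetical.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percent of positions at which two equal-length sequences match.

    Assumption: ungapped comparison; sequences of unequal length are
    rejected rather than aligned.
    """
    if len(candidate) != len(reference):
        raise ValueError("sequences must have equal length for ungapped comparison")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(candidate: str, reference: str,
                             threshold: float = 50.0) -> bool:
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical example: a heptamer differing from CACAGTG at one position.
print(round(percent_identity("CACTGTG", "CACAGTG"), 1))
print(meets_identity_threshold("CACTGTG", "CACAGTG", threshold=85.0))
```

A gapped alignment tool would be needed to compare sequences of different lengths; this sketch deliberately leaves that case out.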

DNA constructs can be prepared using methods known in the art. For example, a DNA construct can be prepared as part of a larger plasmid. Such preparation allows the cloning and selection of the correct constructions in an efficient manner as is known in the art. DNA fragments containing one or more nucleotide coding sequences as described herein can be located between convenient restriction sites on the plasmid so that they can be easily isolated from the remaining plasmid sequences for incorporation into the desired animal.

Various methods employed in preparation of plasmids and transformation of host organisms are known in the art. For other suitable expression systems for both prokaryotic and eukaryotic cells, as well as general recombinant procedures, see Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, J. et al., Cold Spring Harbor Laboratory Press: 1989.

Production of Non-Human Animals Having an Engineered D_H Region

Non-human animals are provided that express antibodies characterized by heavy chain CDR3 diversity that directs binding to particular antigens, which diversity results from integration of one or more nucleotide coding sequences, each encoding a non-immunoglobulin polypeptide of interest, or portion thereof (e.g., an extracellular portion of an atypical chemokine receptor such as, e.g., ACKR2, or a portion of a toxin peptide), into an immunoglobulin heavy chain variable region in the genome of the non-human animal. Suitable examples described herein include rodents, in particular, mice.

One or more heterologous nucleotide coding sequences, in some embodiments, comprise genetic material from a heterologous species (e.g., humans, spiders, scorpions, snails, tarantulas, sea anemones, etc.), wherein the heterologous nucleotide coding sequences each encode a non-immunoglobulin polypeptide of interest, or portion thereof (e.g., an extracellular portion of an ACKR polypeptide or a portion of a toxin peptide) that comprises the encoded portion of the genetic material from the heterologous species. In some embodiments, heterologous nucleotide coding sequences described herein comprise nucleotide coding sequences of a heterologous species that encode a non-immunoglobulin polypeptide of interest, or portion thereof (e.g., an extracellular portion of a heterologous ACKR polypeptide or a portion of a toxin peptide), which portion of a non-immunoglobulin polypeptide of interest appears in an immunoglobulin heavy chain, in particular, a heavy chain CDR3, that is expressed by a B cell of a non-human animal as described herein. Non-human animals, embryos, cells and targeting constructs for making non-human animals, non-human embryos, and cells containing said heterologous nucleotide coding sequences are also provided.

In various embodiments, one or more heterologous nucleotide coding sequences are inserted into a D_H region of an immunoglobulin heavy chain variable region within the genome of a non-human animal. In some embodiments, a D_H region (or portion thereof) of an immunoglobulin heavy chain variable region is not deleted (i.e., is intact). In some embodiments, a D_H region (or portion thereof) of an immunoglobulin heavy chain variable region is altered, disrupted, deleted or replaced with one or more heterologous nucleotide coding sequences (e.g., one or more heterologous ACKR2 or one or more heterologous toxin nucleotide coding sequences). In some embodiments, all or substantially all of a D_H region is replaced with one or more heterologous nucleotide coding sequences; in certain embodiments, one or more traditional D_H gene segments are not deleted or replaced in a D_H region of an immunoglobulin heavy chain variable region. In some embodiments, a D_H region as described herein is a synthetic D_H region, which synthetic D_H region comprises one or more heterologous nucleotide coding sequences as described herein. In some embodiments, a D_H region is a human D_H region. In some embodiments, a D_H region is a murine D_H region. In some embodiments, an engineered D_H region (or portion thereof) as described herein is inserted into an immunoglobulin heavy chain variable region so that said engineered D_H region (or portion thereof) is operably linked with one or more V_H gene segments and/or one or more J_H gene segments. In some embodiments, one or more heterologous nucleotide coding sequences are inserted into one of the two copies of an immunoglobulin heavy chain variable region, giving rise to a non-human animal that is heterozygous with respect to the one or more heterologous nucleotide coding sequences (i.e., an engineered D_H region). In some embodiments, a non-human animal is provided that is homozygous for one or more heterologous nucleotide coding sequences (i.e., an engineered D_H region). In some embodiments, a non-human animal is provided that is heterozygous for one or more heterologous nucleotide coding sequences (i.e., an engineered D_H region).

In some embodiments, a non-human animal described herein contains a human immunoglobulin heavy chain variable region that includes a D_H region that contains one or more heterologous nucleotide coding sequences within its genome (e.g., randomly integrated). Thus, such non-human animals can be described as having a human immunoglobulin heavy chain transgene containing an engineered D_H region. An engineered D_H region can be detected using a variety of methods including, for example, PCR, Western blot, Southern blot, restriction fragment length polymorphism (RFLP), or a gain or loss of allele assay. In some embodiments, a non-human animal described herein is heterozygous with respect to an engineered D_H region as described herein. In some embodiments, a non-human animal described herein is homozygous with respect to an engineered D_H region as described herein. In some embodiments, a non-human animal described herein is hemizygous with respect to an engineered D_H region as described herein. In some embodiments, a non-human animal described herein contains one or more copies of an engineered D_H region as described herein.
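
The zygosity terms used above can be illustrated with a short Python sketch that maps copy numbers, such as might be estimated by a gain or loss of allele assay, to a zygosity label. The sketch assumes a single autosomal locus in a diploid genome and assumes that whole-number copy counts for the engineered allele and for the unmodified (native) allele are already available; the function name, the labels and the decision rules are hypothetical and do not describe any particular assay protocol.

```python
def classify_zygosity(engineered_copies: int, native_copies: int) -> str:
    """Map allele copy numbers at one locus to a zygosity label.

    Assumptions (illustrative only): one autosomal locus, diploid genome,
    integer copy counts already determined by some assay.
    """
    if engineered_copies < 0 or native_copies < 0:
        raise ValueError("copy numbers must be non-negative")
    if engineered_copies == 0:
        return "non-engineered"   # no engineered allele detected
    if native_copies >= 1:
        return "heterozygous"     # engineered and native alleles both present
    if engineered_copies == 1:
        return "hemizygous"       # single engineered copy, no counterpart allele
    return "homozygous"           # two or more engineered copies, no native allele


# Hypothetical copy-number results.
for counts in [(0, 2), (1, 1), (1, 0), (2, 0)]:
    print(counts, classify_zygosity(*counts))
```

For a randomly integrated transgene the native-allele count at the integration site is generally not informative, so a different rule set would be needed in that case.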

In some embodiments, one or more heterologous ACKR nucleotide coding sequences disclosed herein are heterologous ACKR2 nucleotide coding sequences. In some embodiments, one or more heterologous ACKR2 nucleotide coding sequences are human.

In some embodiments, one or more heterologous toxin nucleotide coding sequences described herein are heterologous μ-conotoxin nucleotide coding sequences, heterologous tarantula toxin nucleotide coding sequences and/or combinations thereof.

In various embodiments, one or more heterologous nucleotide coding sequences described herein include one or more nucleotide coding sequences that each have a sequence at least 50% (e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) identical to one or more nucleotide coding sequences that appear in Table 3 or Table 4.
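
By way of non-limiting illustration, a percent identity threshold such as the one recited above could be checked computationally. The following Python sketch assumes that the two sequences have already been aligned (e.g., by a separate alignment tool) and are supplied as equal-length strings in which "-" denotes a gap. The function names `percent_identity` and `meets_identity_threshold`, the gap convention, and the choice to count matches over all aligned columns are assumptions made for this sketch only and are not a definition of identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption of this sketch: '-' marks a gap, columns in which both
    sequences have a gap are ignored, and identity is the number of
    matching non-gap columns divided by the number of remaining columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 50.0) -> bool:
    """True if the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

In this sketch, a candidate sequence would be compared in turn against each reference sequence of interest, and the threshold argument would be set to whichever recited value (e.g., 50, 90 or 95) is being evaluated.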

In various embodiments, one or more heterologous nucleotide coding sequences described herein include one or more nucleotide coding sequences that each have a sequence that is substantially identical or identical to one or more nucleotide coding sequences that appear in Table 3 or Table 4.

In various embodiments, one or more heterologous nucleotide coding sequences described herein are selected from Table 3 and/or Table 4.

In various embodiments, an engineered D_H region described herein comprises one or more heterologous nucleotide coding sequences that each have a sequence that is at least 50% (e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) identical to one or more nucleotide coding sequences that appear in Table 3 or Table 4.

In various embodiments, an engineered D_H region described herein comprises one or more heterologous nucleotide coding sequences that each have a sequence that is substantially identical or identical to one or more nucleotide coding sequences that appear in Table 3 or Table 4.

In various embodiments, an engineered D_H region described herein comprises 5, 10, 15, 20 or 25 heterologous ACKR2 nucleotide coding sequences, each of which has a sequence that is identical to a respective one of 5, 10, 15, 20 or 25 ACKR2 nucleotide coding sequences that appear in Table 3.

In various embodiments, an engineered D_H region described herein comprises 5, 10, 15, 20, 25 or 26 heterologous toxin nucleotide coding sequences, each of which has a sequence that is identical to a respective one of 5, 10, 15, 20, 25 or 26 toxin nucleotide coding sequences that appear in Table 4.

In various embodiments, an engineered D_H region described herein comprises one or more heterologous nucleotide coding sequences that are each flanked by a 5′ recombination signal sequence (5′ RSS) and a 3′ recombination signal sequence (3′ RSS), which 5′ RSS and 3′ RSS each have a sequence that is at least 50% (e.g., 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more) identical to a 5′ RSS and a 3′ RSS that appear in FIG. 2.

In various embodiments, an engineered D_H region described herein comprises one or more heterologous nucleotide coding sequences that are each flanked by a 5′ recombination signal sequence (5′ RSS) and a 3′ recombination signal sequence (3′ RSS), which 5′ RSS and 3′ RSS each have a sequence that is substantially identical or identical to a 5′ RSS and a 3′ RSS that appear in FIG. 2.
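
By way of non-limiting illustration, the arrangement of a heterologous nucleotide coding sequence flanked by a 5′ RSS and a 3′ RSS can be modeled as a simple record. The Python sketch below is illustrative only: the class `FlankedSegment` and the helper `make_placeholder_segment` are hypothetical names, and the RSS sequences are not those of FIG. 2. Instead, the sketch assumes a generic heptamer-spacer-nonamer layout built from the commonly cited consensus heptamer (CACAGTG) and nonamer (ACAAAAACC) with a 12-nucleotide spacer of unspecified bases (N), and assumes that the 5′ RSS is the reverse complement of the 3′ RSS.

```python
from dataclasses import dataclass

# Placeholder consensus elements (assumptions of this sketch, NOT the
# sequences that appear in FIG. 2).
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; 'N' is left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class FlankedSegment:
    """A coding sequence together with its 5' RSS and 3' RSS."""
    five_prime_rss: str
    coding: str
    three_prime_rss: str

    def assemble(self) -> str:
        """Concatenate 5' RSS, coding sequence and 3' RSS in 5'-to-3' order."""
        return self.five_prime_rss + self.coding + self.three_prime_rss


def make_placeholder_segment(coding: str, spacer: str = "N" * 12) -> FlankedSegment:
    """Build a flanked segment from placeholder RSS elements.

    The 3' RSS is heptamer-spacer-nonamer; the 5' RSS is modeled as its
    reverse complement, so that each heptamer abuts the coding sequence.
    """
    three_prime = HEPTAMER + spacer + NONAMER
    five_prime = reverse_complement(three_prime)
    return FlankedSegment(five_prime, coding.upper(), three_prime)
```

For example, `make_placeholder_segment("acgtacgt").assemble()` yields a 64-nucleotide string in which the hypothetical 8-nucleotide coding sequence sits between two 28-nucleotide placeholder RSSs. In practice, the placeholder elements would be replaced by the actual 5′ RSS and 3′ RSS sequences of interest.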