Methods of transcription activator like effector assembly

Abstract
The disclosure describes methods that include providing a first nucleic acid having a sequence encoding a first set comprising one or more transcription activator-like effector (TALE) repeat domains and/or one or more portions of one or more TALE repeat domains; contacting the first nucleic acid with a first enzyme, wherein the first enzyme creates a first ligatable end; providing a second nucleic acid having a sequence encoding a second set comprising one or more TALE repeat domains and/or one or more portions of one or more TALE repeat domains; contacting the second nucleic acid with a second enzyme, wherein the second enzyme creates a second ligatable end, and wherein the first and second ligatable ends are compatible; and ligating the first and second nucleic acids through the first and second ligatable ends to produce a first ligated nucleic acid, wherein the first ligated nucleic acid is linked to a solid support, and wherein the first ligated nucleic acid encodes a polypeptide comprising said first and second sets.
Description
SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jul. 11, 2012, is named 2953936W.txt and is 459,673 bytes in size.


TECHNICAL FIELD

This invention relates to methods of producing nucleic acids encoding peptides and polypeptides comprising multiple transcription activator-like effector (TALE) repeat domains, and to the proteins themselves.


BACKGROUND

TALE proteins of plant pathogenic bacteria in the genus Xanthomonas play important roles in disease, or trigger defense, by binding host DNA and activating effector-specific host genes (see, e.g., Gu et al., 2005, Nature 435:1122; Yang et al., 2006 Proc. Natl. Acad. Sci. USA 103:10503; Kay et al., 2007, Science 318:648; Sugio et al., 2007, Proc. Natl. Acad. Sci. USA 104:10720; and Romer et al., 2007, Science 318:645). Specificity for nucleic acid sequences depends on an effector-variable number of imperfect, typically ~33-35 amino acid repeats (Schornack et al., 2006, J. Plant Physiol. 163:256). Each repeat binds to one nucleotide in the target sequence, and the specificity of each repeat for its nucleotide is largely context-independent, allowing for the development of custom sequence-specific TALE proteins (Moscou et al., 2009, Science 326:1501; Boch et al., 2009, Science 326:1509-1512).


SUMMARY

This application is based, at least in part, on the development of rapid, simple, and easily automatable methods for assembling nucleic acids encoding custom TALE repeat array proteins.


Accordingly, this disclosure features a process that includes: (a) providing a first nucleic acid having a sequence encoding a first set comprising one or more (e.g., two or more, three or more, four or more, five or more, six or more, one to six, two to six, three to six, four to six, five or six, one to five, two to five, three to five, four or five, one to four, two to four, three or four, one to three, two or three, one or two, one, two, three, four, five, or six) transcription activator-like effector (TALE) repeat domains and/or one or more portions of one or more TALE repeat domains; (b) contacting the first nucleic acid with a first enzyme, wherein the first enzyme creates a first ligatable end; (c) providing a second nucleic acid having a sequence encoding a second set comprising one or more (e.g., two or more, three or more, four or more, five or more, six or more, one to six, two to six, three to six, four to six, five or six, one to five, two to five, three to five, four or five, one to four, two to four, three or four, one to three, two or three, one or two, one, two, three, four, five, or six) TALE repeat domains and/or one or more portions of one or more TALE repeat domains; (d) contacting the second nucleic acid with a second enzyme, wherein the second enzyme creates a second ligatable end, and wherein the first and second ligatable ends are compatible; and (e) ligating the first and second nucleic acids through the first and second ligatable ends to produce a first ligated nucleic acid, wherein the first ligated nucleic acid is linked to a solid support, and wherein the first ligated nucleic acid encodes a polypeptide comprising said first and second sets.


In some embodiments, the methods include linking the first nucleic acid to a solid support prior to (b) contacting the first nucleic acid with the first enzyme or prior to (e) ligating the first and second nucleic acids. In some embodiments, the methods include linking the first ligated nucleic acid to a solid support.


In some embodiments, the first set is N-terminal to the second set in the polypeptide. In some embodiments, the second set is N-terminal to the first set in the polypeptide.


In some embodiments, the first and second enzymes are a first and second restriction endonuclease, wherein the first restriction endonuclease cleaves at a site within the first nucleic acid and creates a first cut end, and the second restriction endonuclease cleaves at a site within the second nucleic acid and creates a second cut end, and wherein the first and second ligatable ends are the first and second cut ends. When restriction endonucleases are used, the first ligated nucleic acid cannot include a restriction site recognized by the first restriction endonuclease.


The process can further include: (f) contacting the first ligated nucleic acid with a third enzyme, wherein the third enzyme creates a third ligatable end; (g) providing a third nucleic acid comprising a sequence encoding a third set comprising one or more (e.g., two or more, three or more, four or more, five or more, six or more, one to six, two to six, three to six, four to six, five or six, one to five, two to five, three to five, four or five, one to four, two to four, three or four, one to three, two or three, one or two, one, two, three, four, five, or six) TALE repeat domains and/or one or more portions of one or more TALE repeat domains; (h) contacting the third nucleic acid with a fourth enzyme, wherein the fourth enzyme creates a fourth ligatable end, and wherein the third and fourth ligatable ends are compatible; and (i) ligating the first ligated and third nucleic acids through the third and fourth ligatable ends to produce a second ligated nucleic acid linked to the solid support, wherein the second ligated nucleic acid encodes a polypeptide comprising said first, second, and third sets.


In some embodiments, the third and fourth enzymes are a third and fourth restriction endonuclease, wherein the third restriction endonuclease cleaves at a site within the first ligated nucleic acid and creates a third cut end, and the fourth restriction endonuclease cleaves at a site within the third nucleic acid and creates a fourth cut end, and wherein the third and fourth ligatable ends are the third and fourth cut ends.


In some embodiments, the ligated nucleic acid does not include a restriction site recognized by the first endonuclease, and the first and third restriction endonucleases are the same. In some embodiments, the second and fourth restriction endonucleases are the same.


The process can further include: (j) contacting the second ligated nucleic acid with a fifth enzyme, wherein the fifth enzyme creates a fifth ligatable end; (k) providing a fourth nucleic acid having a sequence encoding a fourth set comprising one or more (e.g., two or more, three or more, four or more, five or more, six or more, one to six, two to six, three to six, four to six, five or six, one to five, two to five, three to five, four or five, one to four, two to four, three or four, one to three, two or three, one or two, one, two, three, four, five, or six) TALE repeat domains and/or one or more portions of one or more TALE repeat domains; (l) contacting the fourth nucleic acid with a sixth enzyme, wherein the sixth enzyme creates a sixth ligatable end, and wherein the fifth and sixth ligatable ends are compatible; and (m) ligating the second ligated and fourth nucleic acids through the fifth and sixth ligatable ends to produce a third ligated nucleic acid linked to the solid support, wherein the third ligated nucleic acid encodes a polypeptide comprising said first, second, third, and fourth sets. One of ordinary skill would recognize that the process can be repeated with similar additional steps. Such methods are included within this disclosure.


In some embodiments, the fifth and sixth enzymes are a fifth and sixth restriction endonuclease, wherein the fifth restriction endonuclease cleaves at a site within the second ligated nucleic acid and creates a fifth cut end, and the sixth restriction endonuclease cleaves at a site within the fourth nucleic acid and creates a sixth cut end, and wherein the fifth and sixth ligatable ends are the fifth and sixth cut ends.


In some embodiments, the second ligated nucleic acid does not include a restriction site recognized by the first endonuclease, and the first, third, and fifth restriction endonucleases are the same.


In some embodiments, the second, fourth, and sixth restriction endonucleases are the same.


In some embodiments, the solid support and linked nucleic acid are isolated, e.g., following any of the above steps (a)-(m).


In some embodiments, the second, third, or fourth set comprises one to four TALE repeat domains.


In some embodiments, the ligatable ends include an overhang of 1-10 nucleotides. In some embodiments, the ligatable ends are blunt ends. In some embodiments, an overhang can be generated using an exonuclease and polymerase in the presence of one or more nucleotides.
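
As a hedged illustration of what "compatible" ligatable ends can mean for single-stranded overhangs, the following Python sketch treats two overhangs of the same polarity (both 5′ or both 3′) as compatible when one is the reverse complement of the other, and treats two blunt ends (empty overhangs) as compatible with each other. The function names and example sequences are hypothetical and are not taken from this disclosure.

```python
# Illustrative sketch only; function names and sequences are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def overhangs_compatible(first: str, second: str) -> bool:
    """Check whether two overhangs can anneal for ligation.

    Both overhangs are written 5'->3' and are assumed to have the same
    polarity (both 5' or both 3'). An empty string denotes a blunt end.
    """
    if len(first) != len(second):
        return False
    if len(first) > 10:  # this sketch covers blunt ends and 1-10 nt overhangs
        return False
    return reverse_complement(first) == second.upper()


print(overhangs_compatible("CTGA", "TCAG"))  # complementary 4-nt overhangs
print(overhangs_compatible("CTGA", "CTGA"))  # identical, not complementary
```

A real design would also need to avoid palindromic overhangs that allow a fragment to ligate to itself; that check is omitted here for brevity.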


In some embodiments, an enzyme or restriction endonuclease used in the above processes is a type IIS restriction endonuclease.


The processes can further comprise unlinking a ligated nucleic acid from the solid support and inserting the ligated nucleic acid (or a processed derivative thereof comprising the TALE repeat array coding sequences) into a vector, e.g., an expression vector. The expression vector can include a sequence encoding an effector domain (e.g., a nuclease domain) configured to create a sequence encoding a fusion protein of the polypeptide and the effector domain. The expression vector can be inserted into a cell to affect the cell directly or for expression of the polypeptide or fusion protein. When the polypeptide or fusion protein is to be expressed, the processes can further include expressing and purifying the polypeptide or fusion protein.


In another aspect, this disclosure features TALE proteins that bind to a target nucleotide sequence (e.g., a “half site”) disclosed herein (e.g., in Table 6 or 7), TALE nucleases that include the TALE proteins, pairs of TALE proteins (e.g., TALENs) that bind to the target sites disclosed herein (e.g., in Table 6 or 7), and nucleic acids that encode any of the above. In some embodiments, the TALE proteins, TALE nucleases, and pairs of TALE proteins (e.g., TALENs) are those disclosed in Example 7. The nucleic acids encoding the TALE proteins, TALE nucleases, and pairs of TALE proteins (e.g., TALENs) can be those disclosed in Example 7 or other sequences that encode the proteins disclosed in Example 7. The disclosure also includes vectors and cells that include the nucleic acids encoding the TALE proteins, TALE nucleases, or pairs of TALE proteins (e.g., TALENs) disclosed herein and methods of expressing the TALE proteins, TALE nucleases, or pairs of TALE proteins (e.g., TALENs) that include culturing the cells. The methods of expressing the TALE proteins, TALE nucleases, or pairs of TALE proteins (e.g., TALENs) can also include isolating the TALE proteins, TALE nucleases, or pairs of TALE proteins (e.g., TALENs) from the cell culture.


In another aspect, the invention features a set, archive, or library of nucleic acids (e.g., plasmids) that include sequences encoding one or more TALE domains. In some embodiments, the set, archive, or library includes sequences encoding one, two, three, and/or four (or more than four (e.g., five, six, or more)) TALE repeat domains. In some embodiments, the set, library, or archive of nucleic acids includes sequences encoding TALE repeat domains that bind to nucleotide sequences having one, two, three, four (or more than four (e.g., five, six, or more)) nucleotides. In some embodiments, the set, library, or archive includes restriction sites (e.g., sites for type IIS restriction endonucleases) surrounding the sequences encoding the TALE repeat domains.


The methods described herein provide several advantages, including avoiding extensive PCR amplification of the TALE repeats, thereby avoiding the introduction of mutations from PCR errors. Further, TALE repeat arrays of any desired length can be constructed, and the methods can be easily multiplexed and/or automated.


Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.


The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.





DESCRIPTION OF DRAWINGS


FIG. 1 is a schematic depiction of an exemplary method of assembling a nucleic acid encoding a TALE protein.



FIG. 2 is a schematic depiction of exemplary archives of nucleic acids encoding single (one-mer), two-mer, three-mer, and four-mer TALE repeat domains.



FIG. 3 depicts the sequence of the pUC57-ΔBsaI plasmid. This plasmid is identical to plasmid pUC57 except for mutation of a single base (in bold, underlined and lowercase) that destroys a BsaI restriction site.



FIG. 4A depicts the polypeptide sequences of exemplary TALE repeats of type α/ε, β, γ, and δ. Polymorphic residues characteristic of each type are indicated in bold and italic. The hypervariable triplet SNI for binding to A is indicated in underscore.



FIG. 4B depicts the polynucleotide sequences of the exemplary TALE repeats of FIG. 4A.



FIGS. 5A-5B depict the common sequence of expression plasmids pJDS70, pJDS71, pJDS74, pJDS76, and pJDS78. The region of the variable sequences is depicted as XXXXXXXXX (underlined and bold).



FIG. 6 is a schematic diagram of the enhanced green fluorescent protein (eGFP) gene and the location of the binding sites for synthetic TALE proteins described herein.



FIG. 7 is a bar graph depicting the % of TALE nuclease-modified, eGFP-negative cells at 2 and 5 days following transfection with plasmids encoding TALE nucleases designed to bind and cleave the eGFP reporter gene.



FIG. 8 is a depiction of the sequences of insertion-deletion mutants of eGFP induced by TALE nucleases. Deleted bases are indicated by dashes and inserted bases indicated by double underlining; the TALEN target half-sites are single underlined. The net number of bases inserted or deleted is shown to the right.



FIG. 9 is a depiction of an electrophoresis gel of assembled DNA fragments encoding 17-mer TALE array preparations.



FIG. 10 is a depiction of an electrophoresis gel of 16-mer TALE array preparations.



FIGS. 11A-11B depict the nucleotide (11A) and polypeptide (11B) sequence of engineered DR-TALE-0003.



FIGS. 12A-12B depict the nucleotide (12A) and polypeptide (12B) sequence of engineered DR-TALE-0006.



FIGS. 13A-13B depict the nucleotide (13A) and polypeptide (13B) sequence of engineered DR-TALE-0005.



FIGS. 14A-14B depict the nucleotide (14A) and polypeptide (14B) sequence of engineered DR-TALE-0010.



FIGS. 15A-15B depict the nucleotide (15A) and polypeptide (15B) sequence of engineered DR-TALE-0023.



FIGS. 16A-16B depict the nucleotide (16A) and polypeptide (16B) sequence of engineered DR-TALE-0025.



FIGS. 17A-17B depict the nucleotide (17A) and polypeptide (17B) sequence of engineered DR-TALE-0020.



FIGS. 18A-18B depict the nucleotide (18A) and polypeptide (18B) sequence of engineered DR-TALE-0022.



FIG. 19A is a bar graph depicting activities of 48 TALEN pairs and four ZFN pairs in the EGFP gene-disruption assay. Percentages of EGFP-negative cells as measured 2 and 5 days following transfection of U2OS cells bearing a chromosomally integrated EGFP reporter gene with nuclease-encoding plasmids are shown. Mean percent disruption of EGFP and standard error of the mean from three independent transfections are shown.



FIG. 19B is a bar graph depicting mean EGFP-disruption activities from FIG. 19A, grouped by length of the TALENs.



FIG. 20A is a graph depicting the ratio of mean percent EGFP disruption values from day 2 to day 5. Ratios were calculated for groups of each length TALEN using the data from FIG. 19B. Values greater than 1 indicate a decrease in the average of EGFP-disrupted cells at day 5 relative to day 2.



FIG. 20B is a graph depicting the ratio of mean tdTomato-positive cells from day 2 to day 5 grouped by various lengths of TALENs. tdTomato-encoding control plasmids were transfected together with nuclease-encoding plasmids on day 0.



FIGS. 21A-E depict DNA sequences and frequencies of assembled TALEN-induced mutations at endogenous human genes. For each endogenous gene target, the wild-type (WT) sequence is shown at the top with the TALEN target half-sites underlined and the translation start codon of the gene (ATG) indicated by a box. Deletions are indicated by dashes and insertions by lowercase letters and double underlining. The sizes of the insertions (+) or deletions (Δ) are indicated to the right of each mutated site. The number of times that each mutant was isolated is shown in parentheses. Mutation frequencies are calculated as the number of mutants identified divided by the total number of sequences analyzed. Note that for several of the genes, we also identified larger deletions that extend beyond the sequences of the TALEN target sites.



FIG. 22 is a schematic depiction of an exemplary method of assembling a nucleic acid encoding a TALE protein containing TALE repeat domains or portions of TALE repeat domains.





DETAILED DESCRIPTION

The methods described herein can be used to assemble engineered proteins containing TALE repeat domains for binding to specific sequences of interest. Assembling long arrays (e.g., 12 or more) of TALE repeat domains can be challenging because the repeats differ only at a small number of amino acids within their highly conserved ~33-35 amino acid consensus sequence. PCR assembly can lead to the introduction of unwanted mutations. Hierarchical assembly methods that involve one or more passages of intermediate plasmid constructs in E. coli can also be problematic, both because the highly repetitive nature of these constructs can make them unstable and prone to recombination and because the need to passage these intermediate constructs makes these approaches difficult to automate.


TAL Effectors


TAL effectors of plant pathogenic bacteria in the genus Xanthomonas play important roles in disease, or trigger defense, by binding host DNA and activating effector-specific host genes. Specificity depends on an effector-variable number of imperfect, typically ~33-35 amino acid repeats. Polymorphisms are present primarily at repeat positions 12 and 13, which are referred to herein as the “repeat variable-diresidue” (RVD). The RVDs of TAL effectors correspond to the nucleotides in their target sites in a direct, linear fashion, one RVD to one nucleotide, with some degeneracy and no apparent context dependence. In some embodiments, the polymorphic region that grants nucleotide specificity may be expressed as a triresidue or triplet, e.g., encompassing residues 11, 12, and 13.
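
The position-based definition of the RVD can be sketched in a few lines of Python. This is a minimal illustration, assuming a repeat supplied as a one-letter amino acid string numbered from position 1; the function name and the 34-residue example repeat are assumptions chosen for illustration and are not copied from the figures.

```python
# Illustrative sketch only; the helper name and example repeat are assumptions.
def extract_rvd(repeat: str, triplet: bool = False) -> str:
    """Return residues 12-13 (the RVD) or residues 11-13 (the triplet).

    Positions are 1-based, so residue 12 is at Python index 11.
    """
    if len(repeat) < 13:
        raise ValueError("repeat is too short to contain positions 12 and 13")
    start = 10 if triplet else 11
    return repeat[start:13]


# Hypothetical 34-residue repeat used only to exercise the function.
example_repeat = "LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG"

print(extract_rvd(example_repeat))                # NI
print(extract_rvd(example_repeat, triplet=True))  # SNI
```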


Each DNA binding repeat can include an RVD that determines recognition of a base pair in the target DNA sequence, wherein each DNA binding repeat is responsible for recognizing one base pair in the target DNA sequence, and wherein the RVD comprises, but is not limited to, one or more of the following: HA for recognizing C; ND for recognizing C; HI for recognizing C; HN for recognizing G; NA for recognizing G; SN for recognizing G or A; YG for recognizing T; and NK for recognizing G; and one or more of: HD for recognizing C; NG for recognizing T; NI for recognizing A; NN for recognizing G or A; NS for recognizing A or C or G or T; N* for recognizing C or T, wherein * represents a gap in the second position of the RVD; HG for recognizing T; H* for recognizing T, wherein * represents a gap in the second position of the RVD; and IG for recognizing T.
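
The one-RVD-to-one-nucleotide correspondence lends itself to a simple lookup table. The Python sketch below encodes a subset of the RVDs listed above and shows how a target sequence might be translated into an ordered list of RVDs, and how an RVD array can be checked against a target. The choice of one "preferred" RVD per base and all function names are assumptions made for illustration, not a design rule stated in this disclosure.

```python
# Illustrative sketch only; covers a subset of the RVDs listed in the text.
RVD_SPECIFICITY = {
    "HD": {"C"},
    "NG": {"T"},
    "NI": {"A"},
    "NN": {"G", "A"},
    "NS": {"A", "C", "G", "T"},
    "N*": {"C", "T"},
    "HG": {"T"},
    "IG": {"T"},
    "HN": {"G"},
    "NA": {"G"},
}

# Hypothetical default choice of one RVD per base (an assumption).
PREFERRED_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvds_for_target(target: str) -> list:
    """Translate a DNA target (5'->3') into one RVD per nucleotide."""
    return [PREFERRED_RVD[base] for base in target.upper()]


def array_recognizes(rvds: list, target: str) -> bool:
    """Return True if every RVD can recognize the base at its position."""
    target = target.upper()
    if len(rvds) != len(target):
        return False
    return all(base in RVD_SPECIFICITY[rvd] for rvd, base in zip(rvds, target))


print(rvds_for_target("TGCA"))  # ['NG', 'NN', 'HD', 'NI']
```

Because NN recognizes G or A, the same array also passes `array_recognizes` for the target "TACA", which illustrates the degeneracy mentioned above.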


TALE proteins are useful in research and biotechnology as targeted chimeric nucleases that can facilitate homologous recombination in genome engineering (e.g., to add or enhance traits useful for biofuels or biorenewables in plants). As non-limiting examples, these proteins are also useful as transcription factors, especially for therapeutic applications requiring a very high level of specificity, such as therapeutics against pathogens (e.g., viruses).


Assembly Methods


An example of the methods described herein of assembling a TALE repeat domain array is shown in FIG. 1 and includes the following steps: (1) provision of a single biotinylated PCR product encoding one single N-terminal TALE repeat domain (a one-mer) with a linker suitable for attachment to a solid support (in the example shown here, a magnetic streptavidin coated bead is used but other solid supports can also be utilized as well as other ways of tethering the initial DNA fragment to the solid support); (2) creation of an overhang at the 3′ end of the one-mer DNA (e.g., using a Type IIS restriction enzyme); (3) ligation of a second fragment containing four TALE repeat domains (i.e., a pre-assembled four-mer), creating a five-mer; (4) attachment of the five-mer to the solid support; (5) ligation of additional pre-assembled TALE repeat domains to create a long array, e.g., a piece or pieces of DNA encoding one, two, three, or four TALE repeat domains depending upon the length of the desired final array, and (6) release of the extended DNA encoding the TALE repeats from the solid support (e.g., by using a Type IIS restriction enzyme whose site is built in at the 5′ end of the initial biotinylated DNA product). The final fragment can then be prepared for ligation to an appropriate expression plasmid.


Alternatively, the method can proceed as follows: (1) attachment of a single biotinylated PCR product encoding one single N-terminal TALE repeat domain to a solid support (in the example shown here, a magnetic streptavidin coated bead is used but other solid supports such as the streptavidin-coated wells of a multi-well plate can also be utilized as well as other ways of tethering the initial DNA fragment to the solid support), (2) creation of an overhang at the 3′ end of the anchored DNA (e.g., using a Type IIS restriction enzyme), (3) ligation of a second fragment containing four TALE repeat domains, (4) additional cycles of steps (2) and (3) to create a long array, (5) in the final cycle performing ligation of a piece of DNA encoding one, two, three, or four TALE repeat domains depending upon the length of the desired final array, and (6) release of the extended DNA encoding the TALE repeats from the solid support (e.g., by using a Type IIS restriction enzyme whose site is built in at the 5′ end of the initial biotinylated DNA product).
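
The cyclic scheme just described (an anchored one-mer, repeated four-mer extensions, and a final piece of one to four repeats) can be sketched as a simple planning routine. The Python function below is a minimal illustration under those assumptions; its name and the default unit size of four are hypothetical conveniences rather than requirements of the method.

```python
# Illustrative sketch only; the function name and defaults are assumptions.
from typing import List


def plan_extension_units(total_repeats: int, unit_size: int = 4) -> List[int]:
    """Split a desired array length into an initial one-mer plus extensions.

    The first entry is the anchored one-mer; each later entry is the number
    of repeat domains added in one digestion/ligation cycle. Only the final
    cycle may use a piece shorter than `unit_size`.
    """
    if total_repeats < 1:
        raise ValueError("an array needs at least one repeat domain")
    units = [1]
    remaining = total_repeats - 1
    while remaining > 0:
        step = min(unit_size, remaining)
        units.append(step)
        remaining -= step
    return units


print(plan_extension_units(17))  # [1, 4, 4, 4, 4]
print(plan_extension_units(16))  # [1, 4, 4, 4, 3]
```

Each entry after the first corresponds to one pass through steps (2) and (3), so the number of cycles grows with roughly one quarter of the array length.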


Another example of a method of assembling a TALE repeat domain array based on the methods described herein is shown in FIG. 22 and includes the following steps: (1) provision of a single biotinylated PCR product encoding a portion of one single N-terminal TALE repeat domain (a partial one-mer) with a linker suitable for attachment to a solid support (in the example shown here, a magnetic streptavidin coated bead is used but other solid supports can also be utilized as well as other ways of tethering the initial DNA fragment to the solid support); (2) creation of an overhang at the 3′ end of the partial one-mer DNA (e.g., using a Type IIS restriction enzyme); (3) ligation of a second fragment consisting of two partial and three full TALE repeats; (4) attachment of the second fragment to the solid support; (5) ligation of additional pre-assembled TALE repeat domains or portions of TALE repeat domains to create a long array, e.g., a piece or pieces of DNA encoding one, two, three, or four TALE repeat domains (or portions of TALE repeat domains) depending upon the length of the desired final array, and (6) release of the extended DNA encoding the TALE repeats from the solid support (e.g., by using a Type IIS restriction enzyme whose site is built in at the 5′ end of the initial biotinylated DNA product). The final fragment can then be prepared for ligation to an appropriate expression plasmid.


The initial nucleic acid encoding one or more TALE repeat domains (or portions) is linked to a solid support. The initial nucleic acid can be prepared by any means (e.g., chemical synthesis, PCR, or cleavage from a plasmid). Additionally, the nucleic acid can be linked to the solid support by any means, e.g., covalently or noncovalently.


In some embodiments, the nucleic acid is linked noncovalently by using a nucleic acid modified with one member of a binding pair and incorporating the other member of the binding pair on the solid support. A member of a binding pair is meant to be one of a first and a second moiety, wherein said first and said second moiety have a specific binding affinity for each other. Suitable binding pairs for use in the invention include, but are not limited to, antigens/antibodies (for example, digoxigenin/anti-digoxigenin, dinitrophenyl (DNP)/anti-DNP, dansyl-X/anti-dansyl, fluorescein/anti-fluorescein, lucifer yellow/anti-lucifer yellow, peptide/anti-peptide, ligand/receptor and rhodamine/anti-rhodamine), biotin/avidin (or biotin/streptavidin) and calmodulin binding protein (CBP)/calmodulin. Other suitable binding pairs include polypeptides such as the FLAG-peptide (Hopp et al., 1988, BioTechnology, 6:1204-10); the KT3 epitope peptide (Martin et al., Science 255:192-194 (1992)); tubulin epitope peptide (Skinner et al., J. Biol. Chem. 266:15163-66 (1991)); and the T7 gene 10 protein peptide tag (Lutz-Freyermuth et al., Proc. Natl. Acad. Sci. USA, 87:6393-97 (1990)) and the antibodies each thereto.


In some embodiments, the individual nucleic acids encoding one or more TALE repeat domains are present in an archive or library of plasmids (see FIG. 2). Although nucleic acids encoding one to four TALE repeat domains are shown, the library of plasmids can contain nucleic acids encoding more than four (e.g., five, six, or more) TALE repeat domains. Alternatively, as shown in FIG. 22, the nucleic acids encoding parts or portions of one or more TALE repeat domains can also be joined together to create final DNA fragments encoding the desired full-length arrays of TALE repeat domains. Numerous TALE repeat domain sequences with binding specificity for specific nucleotides or sets of nucleotides are known in the art, and one of ordinary skill can design and prepare a library of plasmids based on these known sequences and the disclosures herein.


As used herein, a solid support refers to any solid or semisolid or insoluble support to which the nucleic acid can be linked. Such materials include any materials that are used as supports for chemical and biological molecule syntheses and analyses, such as, but not limited to: polystyrene, polycarbonate, polypropylene, nylon, glass, dextran, chitin, sand, pumice, agarose, polysaccharides, dendrimers, buckyballs, polyacrylamide, silicon, rubber, and other materials used as supports for solid phase syntheses, affinity separations and purifications, hybridization reactions, immunoassays and other such applications. The solid support can be particulate or can be in the form of a continuous surface, such as a microtiter dish or well, a glass slide, a silicon chip, a nitrocellulose sheet, nylon mesh, or other such materials. When particulate, typically the particles have at least one dimension in the 5-10 mm range or smaller. Such particles, referred to collectively herein as “beads,” are often, but not necessarily, spherical. Such reference, however, does not constrain the geometry of the matrix, which can be any shape, including random shapes, needles, fibers, and elongated shapes. Roughly spherical “beads,” particularly microspheres that can be used in the liquid phase, also are contemplated. The “beads” can include additional components, such as magnetic or paramagnetic particles (see, e.g., Dynabeads (Dynal, Oslo, Norway)) for separation using magnets, as long as the additional components do not interfere with the methods described herein.


The ligatable ends can be produced by cutting with a restriction endonuclease (e.g., a type II or type IIS restriction endonuclease) or by “chewing back” the end using an enzyme (or enzymes) with exonuclease and polymerase activities in the presence of one or more nucleotides (see, Aslanidis et al., 1990, Nucl. Acids Res., 18:6069-74). Suitable enzymes are known to those of ordinary skill in the art. When restriction endonucleases are used, the nucleic acids can be designed to include restriction sites for the enzymes at suitable locations.
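
To make the Type IIS case concrete, the Python sketch below models digestion with BsaI, which is used here only as a familiar example of a Type IIS enzyme: it recognizes GGTCTC and cuts one nucleotide downstream on the top strand and five nucleotides downstream on the bottom strand, leaving a 4-nucleotide 5′ overhang whose sequence is independent of the recognition site. The function name, the restriction to a forward-oriented site on the given strand, and the example sequence are assumptions made for illustration.

```python
# Illustrative sketch only; handles a single forward-oriented BsaI site.
BSAI_SITE = "GGTCTC"


def bsai_digest(top_strand: str):
    """Return (upstream, downstream, overhang) for the first BsaI site.

    `upstream` and `downstream` are the top-strand portions of the two
    products; `overhang` is the 4-nt single-stranded 5' end carried by the
    downstream product (its complement is carried by the upstream product).
    """
    seq = top_strand.upper()
    site = seq.find(BSAI_SITE)
    if site == -1:
        raise ValueError("no forward-oriented BsaI site found")
    top_cut = site + len(BSAI_SITE) + 1  # cut after N1 on the top strand
    bottom_cut = top_cut + 4             # cut after N5 on the bottom strand
    if bottom_cut > len(seq):
        raise ValueError("site is too close to the end of the sequence")
    return seq[:top_cut], seq[top_cut:], seq[top_cut:bottom_cut]


upstream, downstream, overhang = bsai_digest("AAGGTCTCACTGATTT")
print(upstream, downstream, overhang)  # AAGGTCTCA CTGATTT CTGA
```

Because the cut falls outside the recognition sequence, the overhang can be chosen freely by the designer, which is what allows each ligation junction to lie within the coding sequence without leaving a restriction-site scar.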


Following a ligation reaction, any unligated ends with 5′ or 3′ overhangs can be “blunted” by use of a polymerase, e.g., a DNA polymerase with both 3′→5′ exonuclease activity and 5′→3′ polymerase activity. This blunting step can reduce the appearance of undesired or partial assembly products. Alternatively, these ends can be capped using either a “hairpin” oligo bearing a compatible overhang (Briggs et al., 2012, Nucleic Acids Res, PMID: 22740649) or by short double-stranded DNAs bearing a compatible overhang on one end and a blunt end on the other.


To prepare the ligated nucleic acid for further downstream processing, it can be useful to select nucleic acids of the expected size, to reduce the presence of minor products created by incomplete ligations. Methods of selecting nucleic acids by size are known in the art, and include gel electrophoresis (e.g., slab gel electrophoresis or capillary gel electrophoresis (see, e.g., Caruso et al., 2003, Electrophoresis, 24:1-2:78-85)), liquid chromatography (e.g., size exclusion chromatography or reverse phase chromatography (see, e.g., Huber et al., 1995, Anal. Chem., 67:578-585)), and lab-on-a-chip systems (e.g., LabChip® XT system, Caliper Life Sciences, Hopkinton, Mass.). In some embodiments, a size exclusion step can be performed using an automated system, e.g., an automated gel electrophoresis system (e.g., a Pippin Prep™ automated DNA size selection system, Sage Science, Beverly, Mass.).
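
As a rough illustration of size-based selection, the Python sketch below estimates the expected length of an assembled array and keeps only fragments near that length. It assumes 34 residues per repeat (within the ~33-35 range given above), three nucleotides per residue, and an arbitrary ±10% window; the function names, the tolerance, and the fragment lengths are hypothetical values, not measurements from this disclosure.

```python
# Illustrative sketch only; all numeric defaults are assumptions.
from typing import List


def expected_array_length(n_repeats: int, residues_per_repeat: int = 34,
                          flank_bp: int = 0) -> int:
    """Estimate the length in bp of DNA encoding `n_repeats` repeat domains."""
    return n_repeats * residues_per_repeat * 3 + flank_bp


def select_by_size(fragment_lengths: List[int], expected_bp: int,
                   tolerance: float = 0.10) -> List[int]:
    """Keep fragments whose length is within +/- tolerance of the expectation."""
    low = expected_bp * (1 - tolerance)
    high = expected_bp * (1 + tolerance)
    return [length for length in fragment_lengths if low <= length <= high]


target_bp = expected_array_length(17)  # 17 * 34 * 3 = 1734
print(select_by_size([510, 1326, 1734, 1750], target_bp))  # [1734, 1750]
```

Shorter fragments in such a list would correspond to incomplete ligation products of the kind that the size selection step is meant to remove.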


Automation


The methods disclosed herein can be performed manually or implemented in laboratory automation hardware (e.g., SciClone G3 Liquid Handling Workstation, Caliper Life Sciences, Hopkinton, Mass.) controlled by a compatible software package (e.g., Maestro™ liquid handling software) programmed according to the new methods described herein or a new software package designed and implemented to carry out the specific method steps described herein. When performed by laboratory automation hardware, the methods can be implemented by computer programs using standard programming techniques following the method steps described herein.


Examples of automated laboratory system robots include the Sciclone™ G3 liquid handling workstation (Caliper Life Sciences, Hopkinton, Mass.), Biomek® FX liquid handling system (Beckman-Coulter, Fullerton, Calif.), TekBench™ automated liquid handling platform (TekCel, Hopkinton, Mass.), and Freedom EVO® automation platform (Tecan Trading AG, Switzerland).


The programs can be designed to execute on a programmable computer including at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements, e.g., RAM and ROM), at least one communications port that provides access for devices such as a computer keyboard, telephone, or a wireless, hand-held device, such as a PDA, and optionally at least one output device, such as a monitor, printer, or website. The central computer also includes a clock and a communications port that provides control of the lab automation hardware. These are all implemented using known techniques, software, and devices. The system also includes a database that includes data, e.g., data describing the procedure of one or more method steps described herein.


Program code is applied to data input by a user (e.g., location of samples to be processed, timing and frequency of manipulations, amounts of liquid dispensed or aspirated, transfer of samples from one location in the system to another) and data in the database, to perform the functions described herein. The system can also generate inquiries and provide messages to the user. The output information is applied to instruments, e.g., robots, that manipulate, heat, agitate, etc. the vessels that contain the reactants as described herein. In addition, the system can include one or more output devices such as a telephone, printer, or a monitor, or a web page on a computer monitor with access to a website to provide to the user information regarding the synthesis and/or its progress.
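
As a minimal sketch of the kind of user-input data described above, the following Python fragment models a single liquid-handling instruction and checks it before it would be passed to the automation hardware. All names (`TransferStep`, `validate_steps`, the field names) and the 200 µl volume limit are hypothetical and chosen only for illustration; they do not correspond to the Maestro™ software or to any vendor's programming interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TransferStep:
    """One user-specified liquid transfer (hypothetical record layout)."""
    source_well: str      # location of the sample to aspirate from, e.g. "A1"
    dest_well: str        # location to dispense into
    volume_ul: float      # amount of liquid aspirated and dispensed
    start_after_s: float  # timing: seconds after the start of the run


def validate_steps(steps: List[TransferStep], max_volume_ul: float = 200.0) -> List[str]:
    """Return human-readable problems; an empty list means the plan passed."""
    problems = []
    for i, step in enumerate(steps):
        if not 0 < step.volume_ul <= max_volume_ul:
            problems.append(f"step {i}: volume {step.volume_ul} ul out of range")
        if step.source_well == step.dest_well:
            problems.append(f"step {i}: source and destination are both {step.source_well}")
        if step.start_after_s < 0:
            problems.append(f"step {i}: negative start time")
    return problems


# Hypothetical plan: the second step is deliberately invalid.
plan = [
    TransferStep("A1", "B1", 20.0, 0.0),
    TransferStep("A2", "A2", 250.0, 30.0),
]
messages = validate_steps(plan)
```

In a system of the kind described, messages such as these could be among the inquiries or messages returned to the user before any instruction is applied to the instruments.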


Each program embodying the new methods is preferably implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs can also be implemented in assembly or machine language if desired. In any case, the language can be a compiled or interpreted language.


Each such computer program is preferably stored on a storage medium or device (e.g., RAM, ROM, optical, magnetic) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The system can also be considered to be implemented as a computer- or machine-readable storage medium (electronic apparatus readable medium), configured with a program, whereby the storage medium so configured causes a computer or machine to operate in a specific and predefined manner to perform the functions described herein.


The new methods can be implemented using various means of data storage. The files can be transferred physically on recordable media or electronically, e.g., by email on a dedicated intranet, or on the Internet. The files can be encrypted using standard encryption software from such companies as RSA Security (Bedford, Mass.) and Baltimore®. The files can be stored in various formats, e.g., spreadsheets or databases.


As used herein, the term “electronic apparatus” is intended to include any suitable computing or processing apparatus or other device configured or adapted for storing data or information. Examples of electronic apparatus suitable for use with the present invention include stand-alone computing apparatus; communications networks, including local area networks (LAN), wide area networks (WAN), Internet, Intranet, and Extranet; electronic appliances such as personal digital assistants (PDAs), cellular telephones, “smartphones,” pagers and the like; and local and distributed processing systems.


As used herein, “stored” refers to a process for encoding information on an electronic apparatus readable medium. Those skilled in the art can readily adopt any of the presently known methods for recording information on known media to generate manufactures comprising the sequence information.


A variety of software programs and formats can be used to store method data on an electronic apparatus readable medium. For example, the data and machine instructions can be incorporated in the system of the software provided with the automated system, represented in a word processing text file, formatted in commercially-available software such as WordPerfect® and Microsoft® Word®, or represented in the form of an ASCII file, stored in a database application, such as Microsoft Access®, Microsoft SQL Server®, Sybase®, Oracle®, or the like, as well as in other forms. Any number of data processor structuring formats (e.g., text file or database) can be employed to obtain or create a medium having recorded thereon the relevant data and machine instructions to implement the methods described herein.


By providing information in electronic apparatus readable form, the programmable computer can communicate with and control the lab automation hardware to perform the methods described herein. One skilled in the art can input data in electronic apparatus readable form (or a form that is converted to electronic apparatus readable form) to describe the completion of various method steps by the lab automation hardware.


Polypeptide Expression Systems


In order to use the engineered proteins of the present invention, it is typically necessary to express the engineered proteins from a nucleic acid that encodes them. This can be performed in a variety of ways. For example, the nucleic acid encoding the engineered TALE repeat protein is typically cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the engineered TALE protein or production of protein. The nucleic acid encoding the engineered TALE repeat protein is also typically cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell.


To obtain expression of a cloned gene or nucleic acid, the engineered TALE repeat protein is typically subcloned into an expression vector that contains a promoter to direct transcription. Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010). Bacterial expression systems for expressing the engineered TALE repeat protein are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., 1983, Gene 22:229-235). Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.


The promoter used to direct expression of the engineered TALE repeat protein nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of the engineered TALE repeat protein. In contrast, when the engineered TALE repeat protein is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the engineered TALE repeat protein. In addition, a preferred promoter for administration of the engineered TALE repeat protein can be a weak promoter, such as HSV TK or a promoter having similar activity. The promoter typically can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tet-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).


In addition to the promoter, the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic. A typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the TALE repeat protein, and signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination. Additional elements of the cassette can include, e.g., enhancers and heterologous spliced intronic signals.


The particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the engineered TALE repeat protein, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc. Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available fusion expression systems such as GST and LacZ. A preferred fusion protein is the maltose binding protein, “MBP.” Such fusion proteins can be used for purification of the engineered TALE repeat protein. Epitope tags, e.g., c-myc or FLAG, can also be added to recombinant proteins to provide convenient methods of isolation, for monitoring expression, and for monitoring cellular and subcellular localization.


Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.


Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the engineered TALE repeat protein encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.


The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.


Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol. 132:349-351; Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).


Any of the well-known procedures for introducing foreign nucleotide sequences into host cells can be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the protein of choice.


Characterization of TALE Proteins


Engineered TALE repeat array proteins designed using methods of the present invention can be further characterized to ensure that they have the desired characteristics for their chosen use. For example, a TALE repeat array protein can be assayed using a bacterial two-hybrid, bacterial promoter repression, phage-display, or ribosome display system or using an electrophoretic mobility shift assay or “EMSA” (Buratowski & Chodosh, in Current Protocols in Molecular Biology pp. 12.2.1-12.2.7). Equally, any other DNA binding assay known in the art could be used to verify the DNA binding properties of the selected protein.


In one embodiment, a bacterial “two-hybrid” system is used to express and test a TALE repeat protein of the present invention. The bacterial two-hybrid system has an additional advantage, in that the protein expression and the DNA binding “assay” occur within the same cells, thus there is no separate DNA binding assay to set up.


Methods for the use of the bacterial two-hybrid system to express and assay DNA binding proteins are described in Joung et al., 2000, Proc. Natl. Acad. Sci. USA, 97:7382; Wright et al., 2006, Nat. Protoc., 1:1637-52; Maeder et al., 2008, Mol. Cell, 31:294-301; Maeder et al., 2009, Nat. Protoc., 4:1471-1501; and US Patent Application No. 2002/0119498, the contents of which are incorporated herein by reference. Briefly, in a bacterial two-hybrid system, the DNA binding protein is expressed in a bacterial strain bearing the sequence of interest upstream of a weak promoter controlling expression of a reporter gene (e.g., histidine 3 (HIS3), the beta-lactamase antibiotic resistance gene, or the beta-galactosidase (lacZ) gene). Expression of the reporter gene occurs in cells in which the DNA binding protein expressed by the cell binds to the target site sequence. Thus, bacterial cells expressing DNA binding proteins that bind to their target site are identified by detection of an activity related to the reporter gene (e.g., growth on selective media, expression of beta-galactosidase).


In some embodiments, calculations of binding affinity and specificity are also made. This can be done by a variety of methods. The affinity with which the selected TALE repeat array protein binds to the sequence of interest can be measured and quantified in terms of its KD. Any assay system can be used, as long as it gives an accurate measurement of the actual KD of the TALE repeat array protein. In one embodiment, the KD for the binding of a TALE repeat array protein to its target is measured using an EMSA.


In one embodiment, EMSA is used to determine the KD for binding of the selected TALE repeat array protein both to the sequence of interest (i.e., the specific KD) and to non-specific DNA (i.e., the non-specific KD). Any suitable non-specific or “competitor” double stranded DNA known in the art can be used. In some embodiments, calf thymus DNA or human placental DNA is used. The ratio of the non-specific KD to the specific KD is the specificity ratio. TALE repeat array proteins that bind with high specificity have a high specificity ratio. This measurement is very useful in deciding which of a group of selected TALE repeat array proteins should be used for a given purpose. For example, use of a TALE repeat array protein in vivo requires not only high affinity binding but also high-specificity binding.
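
The specificity ratio defined above can be illustrated with a short calculation. The following Python sketch divides the non-specific KD by the specific KD and ranks candidate proteins by the result. The function name, the protein labels, and all KD values are hypothetical and are not measurements from this disclosure.

```python
def specificity_ratio(nonspecific_kd_nm: float, specific_kd_nm: float) -> float:
    """Ratio of the non-specific KD to the specific KD (both in nM)."""
    if nonspecific_kd_nm <= 0 or specific_kd_nm <= 0:
        raise ValueError("KD values must be positive")
    return nonspecific_kd_nm / specific_kd_nm


# Hypothetical EMSA-derived KD values in nM, for illustration only.
candidates = {
    "TALE_A": {"specific": 5.0, "nonspecific": 5000.0},
    "TALE_B": {"specific": 2.0, "nonspecific": 400.0},
}

ratios = {
    name: specificity_ratio(kd["nonspecific"], kd["specific"])
    for name, kd in candidates.items()
}

# Highest specificity ratio first.
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

In this made-up example, TALE_B binds its target more tightly (2 nM versus 5 nM) but has the lower specificity ratio (200 versus 1000), which illustrates why affinity and specificity are assessed separately.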


Construction of Chimeric TALE Proteins


Often, the aim of producing a custom-designed TALE repeat array DNA binding domain is to obtain a TALE repeat array protein that can be used to perform a function. The TALE repeat array DNA binding domain can be used alone, for example to bind to a specific site on a gene and thus block binding of other DNA-binding domains. However, in some embodiments, the TALE repeat array protein will be used in the construction of a chimeric TALE protein containing a TALE repeat array DNA binding domain and an additional domain having some desired specific function (e.g., gene activation) or enzymatic activity, i.e., a “functional domain.”


Chimeric TALE repeat array proteins designed and produced using the methods described herein can be used to perform any function where it is desired to target, for example, some specific enzymatic activity to a specific DNA sequence, as well as any of the functions already described for other types of synthetic or engineered DNA binding molecules. Engineered TALE repeat array DNA binding domains can be used in the construction of chimeric proteins useful for the treatment of disease (see, for example, U.S. patent application 2002/0160940, and U.S. Pat. Nos. 6,511,808, 6,013,453 and 6,007,988, and International patent application WO 02/057308), or for otherwise altering the structure or function of a given gene in vivo. The engineered TALE repeat array proteins of the present invention are also useful as research tools, for example, in performing either in vivo or in vitro functional genomics studies (see, for example, U.S. Pat. No. 6,503,717 and U.S. patent application 2002/0164575).


To generate a functional recombinant protein, the engineered TALE repeat array DNA binding domain will typically be fused to at least one “functional” domain. Fusing functional domains to synthetic TALE repeat array proteins to form functional transcription factors involves only routine molecular biology techniques which are commonly practiced by those of skill in the art (see, for example, U.S. Pat. Nos. 6,511,808, 6,013,453, 6,007,988, 6,503,717 and U.S. patent application 2002/0160940).


Functional domains can be associated with the engineered TALE repeat array domain at any suitable position, including the C- or N-terminus of the TALE protein. Suitable “functional” domains for addition to the engineered protein made using the methods of the invention are described in U.S. Pat. Nos. 6,511,808, 6,013,453, 6,007,988, and 6,503,717 and U.S. patent application 2002/0160940.


In one embodiment, the functional domain is a nuclear localization domain which provides for the protein to be translocated to the nucleus. Several nuclear localization sequences (NLS) are known, and any suitable NLS can be used. For example, many NLSs have a plurality of basic amino acids, referred to as bipartite basic repeats (reviewed in Garcia-Bustos et al., 1991, Biochim. Biophys. Acta, 1071:83-101). An NLS containing bipartite basic repeats can be placed in any portion of the chimeric protein and results in the chimeric protein being localized inside the nucleus. It is preferred that a nuclear localization domain is routinely incorporated into the final chimeric protein, as the ultimate functions of the chimeric proteins of the present invention will typically require the proteins to be localized in the nucleus. However, it may not be necessary to add a separate nuclear localization domain in cases where the engineered TALE repeat array domain itself, or another functional domain within the final chimeric protein, has intrinsic nuclear translocation function.


In another embodiment, the functional domain is a transcriptional activation domain such that the chimeric protein can be used to activate transcription of the gene of interest. Any transcriptional activation domain known in the art can be used, such as, for example, the VP16 domain from herpes simplex virus (Sadowski et al., 1988, Nature, 335:563-564) or the p65 domain from the cellular transcription factor NF-kappaB (Ruben et al., 1991, Science, 251:1490-93).


In yet another embodiment, the functional domain is a transcriptional repression domain such that the chimeric protein can be used to repress transcription of the gene of interest. Any transcriptional repression domain known in the art can be used, such as for example, the KRAB (Kruppel-associated box) domain found in many naturally occurring KRAB proteins (Thiesen et al., 1991, Nucleic Acids Res., 19:3996).


In a further embodiment, the functional domain is a DNA modification domain such as a methyltransferase (or methylase) domain, a de-methylation domain, a deaminase domain, a hydroxylase domain, an acetylation domain, or a deacetylation domain. Many such domains are known in the art and any such domain can be used, depending on the desired function of the resultant chimeric protein. For example, it has been shown that a DNA methylation domain can be fused to a TALE repeat array DNA binding protein and used for targeted methylation of a specific DNA sequence (Xu et al., 1997, Nat. Genet., 17:376-378). The state of methylation of a gene affects its expression and regulation, and furthermore, there are several diseases associated with defects in DNA methylation.


In a still further embodiment the functional domain is a chromatin modification domain such as a histone acetylase or histone de-acetylase (or HDAC) domain. Many such domains are known in the art and any such domain can be used, depending on the desired function of the resultant chimeric protein. Histone deacetylases (such as HDAC1 and HDAC2) are involved in gene repression. Therefore, by targeting HDAC activity to a specific gene of interest using an engineered TALE protein, the expression of the gene of interest can be repressed.


In an alternative embodiment, the functional domain is a nuclease domain, such as a restriction endonuclease (or restriction enzyme) domain. The DNA cleavage activity of a nuclease enzyme can be targeted to a specific target sequence by fusing it to an appropriate engineered TALE repeat array DNA binding domain. In this way, a sequence-specific chimeric restriction enzyme can be produced. Several nuclease domains are known in the art and any suitable nuclease domain can be used. For example, an endonuclease domain of a type IIS restriction endonuclease (e.g., FokI) can be used, as taught by Kim et al. (1996, Proc. Natl. Acad. Sci. USA, 93:1156-60). In some embodiments, the endonuclease is an engineered FokI variant as described in US 2008/0131962. Such chimeric endonucleases can be used in any situation where cleavage of a specific DNA sequence is desired, such as in laboratory procedures for the construction of recombinant DNA molecules, or in producing double-stranded DNA breaks in genomic DNA in order to promote homologous recombination (Kim et al., 1996, Proc. Natl. Acad. Sci. USA, 93:1156-60; Bibikova et al., 2001, Mol. Cell. Biol., 21:289-297; Porteus & Baltimore, 2003, Science, 300:763; Miller et al., 2011, Nat. Biotechnol., 29:143-148; Cermak et al., 2011, Nucl. Acids Res., 39:e82). Repair of TALE nuclease-induced double-strand breaks (DSB) by error-prone non-homologous end-joining leads to efficient introduction of insertion or deletion mutations at the site of the DSB (Miller et al., 2011, Nat. Biotechnol., 29:143-148; Cermak et al., 2011, Nucl. Acids Res., 39:e82). Alternatively, repair of a DSB by homology-directed repair with an exogenously introduced “donor template” can lead to highly efficient introduction of precise base alterations or insertions at the break site (Bibikova et al., 2003, Science, 300:764; Urnov et al., 2005, Nature, 435:646-651; Porteus et al., 2003, Science, 300:763; Miller et al., 2011, Nat. Biotechnol., 29:143-148).


In some embodiments, the functional domain is an integrase domain, such that the chimeric protein can be used to insert exogenous DNA at a specific location in, for example, the human genome.


Other suitable functional domains include silencer domains, nuclear hormone receptors, resolvase domains, oncogene transcription factors (e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, and mos family members), kinases, phosphatases, and any other proteins that modify the structure of DNA and/or the expression of genes. Suitable kinase domains from kinases involved in transcription regulation are reviewed in Davis, 1995, Mol. Reprod. Dev., 42:459-67. Suitable phosphatase domains are reviewed in, for example, Schonthal, 1995, Semin. Cancer Biol., 6:239-48.


Fusions of TALE repeat arrays to functional domains can be performed by standard recombinant DNA techniques well known to those skilled in the art, and as are described in, for example, basic laboratory texts such as Sambrook et al., Molecular Cloning; A Laboratory Manual 3d ed. (2001), and in U.S. Pat. Nos. 6,511,808, 6,013,453, 6,007,988, and 6,503,717 and U.S. patent application 2002/0160940.


In some embodiments, two or more engineered TALE repeat array proteins are linked together to produce the final DNA binding domain. The linkage of two or more engineered proteins can be performed by covalent or non-covalent means. In the case of covalent linkage, engineered proteins can be covalently linked together using an amino acid linker (see, for example, U.S. patent application 2002/0160940, and International applications WO 02/099084 and WO 01/53480). This linker can be any string of amino acids desired. In one embodiment the linker is a canonical TGEKP linker. Whatever linkers are used, standard recombinant DNA techniques (such as described in, for example, Sambrook et al., Molecular Cloning; A Laboratory Manual 3d ed. (2001)) can be used to produce such linked proteins.


In embodiments where the engineered proteins are used in the generation of a chimeric endonuclease, the chimeric protein can possess a dimerization domain, as such endonucleases are believed to function as dimers. Any suitable dimerization domain can be used. In one embodiment the endonuclease domain itself possesses dimerization activity. For example, the nuclease domain of FokI, which has intrinsic dimerization activity, can be used (Kim et al., 1996, Proc. Natl. Acad. Sci. USA, 93:1156-60).


Assays for Determining Regulation of Gene Expression by Engineered Proteins


A variety of assays can be used to determine the level of gene expression regulation by the engineered TALE repeat proteins (see, for example, U.S. Pat. No. 6,453,242). The activity of a particular engineered TALE repeat protein can be assessed using a variety of in vitro and in vivo assays, by measuring, e.g., protein or mRNA levels, product levels, enzyme activity, tumor growth; transcriptional activation or repression of a reporter gene; second messenger levels (e.g., cGMP, cAMP, IP3, DAG, Ca2+); cytokine and hormone production levels; and neovascularization, using, e.g., immunoassays (e.g., ELISA and immunohistochemical assays with antibodies), hybridization assays (e.g., RNase protection, northerns, in situ hybridization, oligonucleotide array studies), colorimetric assays, amplification assays, enzyme activity assays, tumor growth assays, phenotypic assays, and the like.


TALE proteins can be first tested for activity in vitro using cultured cells, e.g., 293 cells, CHO cells, VERO cells, BHK cells, HeLa cells, COS cells, and the like. In some embodiments, human cells are used. The engineered TALE repeat array protein is often first tested using a transient expression system with a reporter gene, and then regulation of the target endogenous gene is tested in cells and in animals, both in vivo and ex vivo. The engineered TALE repeat array protein can be recombinantly expressed in a cell, recombinantly expressed in cells transplanted into an animal, or recombinantly expressed in a transgenic animal, as well as administered as a protein to an animal or cell using delivery vehicles described below. The cells can be immobilized, be in solution, be injected into an animal, or be naturally occurring in a transgenic or non-transgenic animal.


Modulation of gene expression is tested using one of the in vitro or in vivo assays described herein. Samples or assays are treated with the engineered TALE repeat array protein and compared to un-treated control samples, to examine the extent of modulation. For regulation of endogenous gene expression, the TALE repeat array protein ideally has a KD of 200 nM or less, more preferably 100 nM or less, more preferably 50 nM or less, most preferably 25 nM or less. The effects of the engineered TALE repeat array protein can be measured by examining any of the parameters described above. Any suitable gene expression, phenotypic, or physiological change can be used to assess the influence of the engineered TALE repeat array protein. When the functional consequences are determined using intact cells or animals, one can also measure a variety of effects such as tumor growth, neovascularization, hormone release, transcriptional changes to both known and uncharacterized genetic markers (e.g., northern blots or oligonucleotide array studies), changes in cell metabolism such as cell growth or pH changes, and changes in intracellular second messengers such as cGMP.
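
The KD preferences stated above can be expressed as a simple lookup. The following Python sketch treats each stated value as an upper bound and returns the most stringent tier that a measured KD satisfies. The function name and the tier labels are hypothetical and are introduced only for illustration.

```python
# (upper bound in nM, label), ordered from most to least stringent.
# The labels are illustrative shorthand, not terms defined in this disclosure.
KD_TIERS_NM = [
    (25.0, "most preferred"),
    (50.0, "more preferred (50 nM or less)"),
    (100.0, "more preferred (100 nM or less)"),
    (200.0, "ideal range (200 nM or less)"),
]


def kd_tier(kd_nm: float) -> str:
    """Return the most stringent tier whose upper bound the KD does not exceed."""
    if kd_nm <= 0:
        raise ValueError("KD must be positive")
    for upper_bound, label in KD_TIERS_NM:
        if kd_nm <= upper_bound:
            return label
    return "above 200 nM"


# Hypothetical measured KD values in nM.
tiers = [kd_tier(kd) for kd in (10.0, 75.0, 200.0, 350.0)]
```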


Preferred assays for regulation of endogenous gene expression can be performed in vitro. In one in vitro assay format, the engineered TALE repeat array protein regulation of endogenous gene expression in cultured cells is measured by examining protein production using an ELISA assay. The test sample is compared to control cells treated with an empty vector or an unrelated TALE repeat array protein that is targeted to another gene.


In another embodiment, regulation of endogenous gene expression is determined in vitro by measuring the level of target gene mRNA expression. The level of gene expression is measured using amplification, e.g., using RT-PCR, LCR, or hybridization assays, e.g., northern hybridization, RNase protection, dot blotting. RNase protection is used in one embodiment. The level of protein or mRNA is detected using directly or indirectly labeled detection agents, e.g., fluorescently or radioactively labeled nucleic acids, radioactively or enzymatically labeled antibodies, and the like, as described herein.


Alternatively, a reporter gene system can be devised using the target gene promoter operably linked to a reporter gene such as luciferase, green fluorescent protein, CAT, or beta-galactosidase. The reporter construct is typically co-transfected into a cultured cell. After treatment with the TALE repeat array protein, the amount of reporter gene transcription, translation, or activity is measured according to standard techniques known to those of skill in the art.


Another example of an assay format useful for monitoring regulation of endogenous gene expression is performed in vivo. This assay is particularly useful for examining TALE repeat array proteins that inhibit expression of tumor promoting genes, genes involved in tumor support, such as neovascularization (e.g., VEGF), or that activate tumor suppressor genes such as p53. In this assay, cultured tumor cells expressing the engineered TALE protein are injected subcutaneously into an immune compromised mouse such as an athymic mouse, an irradiated mouse, or a SCID mouse. After a suitable length of time, preferably 4-8 weeks, tumor growth is measured, e.g., by volume or by its two largest dimensions, and compared to the control. Tumors that have statistically significant reduction (using, e.g., Student's T test) are said to have inhibited growth. Alternatively, the extent of tumor neovascularization can also be measured. Immunoassays using endothelial cell specific antibodies are used to stain for vascularization of the tumor and the number of vessels in the tumor. Tumors that have a statistically significant reduction in the number of vessels (using, e.g., Student's T test) are said to have inhibited neovascularization.
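
As one illustration of the statistical comparison mentioned above, the following Python sketch computes a pooled-variance (Student's) two-sample t statistic using only the standard library. The function name and the tumor volumes are hypothetical. The sketch stops at the t statistic and degrees of freedom; deciding significance still requires comparing the statistic against a t distribution (for example, a tabulated critical value), which is not done here.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(control: Sequence[float], treated: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(control), len(treated)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two measurements")
    df = n1 + n2 - 2
    # statistics.variance() returns the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / df
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; t statistic is undefined")
    t = (mean(control) - mean(treated)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical tumor volumes (arbitrary units), for illustration only.
control_volumes = [10.0, 12.0, 14.0]
treated_volumes = [4.0, 6.0, 8.0]
t_stat, dof = student_t(control_volumes, treated_volumes)
```

For these made-up values the statistic is about 3.67 with 4 degrees of freedom, which exceeds the tabulated two-sided 5% critical value of 2.776 for 4 degrees of freedom.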


Transgenic and non-transgenic animals can also be used for examining regulation of endogenous gene expression in vivo. Transgenic animals can express the engineered TALE repeat array protein. Alternatively, animals that transiently express the engineered TALE repeat array protein, or to which the engineered TALE repeat array protein has been administered in a delivery vehicle, can be used. Regulation of endogenous gene expression is tested using any one of the assays described herein.


Use of Engineered TALE Repeat-Containing Proteins in Gene Therapy


The engineered proteins of the present invention can be used to regulate gene expression or alter gene sequence in gene therapy applications. Similar methods have been described for synthetic zinc finger proteins; see, for example, U.S. Pat. Nos. 6,511,808, 6,013,453, 6,007,988, 6,503,717, U.S. patent application 2002/0164575, and U.S. patent application 2002/0160940.


Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids encoding the engineered TALE repeat array protein into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding engineered TALE repeat array proteins to cells in vitro. Preferably, the nucleic acids encoding the engineered TALE repeat array proteins are administered for in vivo or ex vivo gene therapy uses. Non-viral vector delivery systems include DNA plasmids, naked nucleic acid, and nucleic acid complexed with a delivery vehicle such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, 1992, Science, 256:808-813; Nabel & Felgner, 1993, TIBTECH, 11:211-217; Mitani & Caskey, 1993, TIBTECH, 11:162-166; Dillon, 1993, TIBTECH, 11:167-175; Miller, 1992, Nature, 357:455-460; Van Brunt, 1988, Biotechnology, 6:1149-54; Vigne, 1995, Restorat. Neurol. Neurosci., 8:35-36; Kremer & Perricaudet, 1995, Br. Med. Bull., 51:31-44; Haddada et al., in Current Topics in Microbiology and Immunology Doerfler and Bohm (eds) (1995); and Yu et al., 1994, Gene Ther., 1:13-26.


Methods of non-viral delivery of nucleic acids encoding the engineered TALE repeat array proteins include lipofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA or RNA, artificial virions, and agent-enhanced uptake of DNA or RNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424, WO 91/16024. Delivery can be to cells (ex vivo administration) or target tissues (in vivo administration).


The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, 1995, Science, 270:404-410; Blaese et al., 1995, Cancer Gene Ther., 2:291-297; Behr et al., 1994, Bioconjugate Chem. 5:382-389; Remy et al., 1994, Bioconjugate Chem., 5:647-654; Gao et al., Gene Ther., 2:710-722; Ahmad et al., 1992, Cancer Res., 52:4817-20; U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).


The use of RNA or DNA viral based systems for the delivery of nucleic acids encoding the engineered TALE repeat array proteins takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro and the modified cells are administered to patients (ex vivo). Conventional viral based systems for the delivery of TALE repeat array proteins could include retroviral, lentivirus, adenoviral, adeno-associated, Sendai, and herpes simplex virus vectors for gene transfer. Viral vectors are currently the most efficient and versatile method of gene transfer in target cells and tissues. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.


The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., 1992, J. Virol., 66:2731-39; Johann et al., 1992, J. Virol., 66:1635-40; Sommerfelt et al., 1990, Virology, 176:58-59; Wilson et al., 1989, J. Virol., 63:2374-78; Miller et al., 1991, J. Virol., 65:2220-24; WO 94/26877).


In applications where transient expression of the engineered TALE repeat array protein is preferred, adenoviral based systems can be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titers and high levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus (“AAV”) vectors are also used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., 1987, Virology 160:38-47; U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, 1994, Hum. Gene Ther., 5:793-801; Muzyczka, 1994, J. Clin. Invest., 94:1351). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., 1985, Mol. Cell. Biol. 5:3251-60; Tratschin et al., 1984, Mol. Cell. Biol., 4:2072-81; Hermonat & Muzyczka, 1984, Proc. Natl. Acad. Sci. USA, 81:6466-70; and Samulski et al., 1989, J. Virol., 63:3822-28.


In particular, at least six viral vector approaches are currently available for gene transfer in clinical trials, with retroviral vectors by far the most frequently used system. All of these viral vectors utilize approaches that involve complementation of defective vectors by genes inserted into helper cell lines to generate the transducing agent.


pLASN and MFG-S are examples of retroviral vectors that have been used in clinical trials (Dunbar et al., 1995, Blood, 85:3048; Kohn et al., 1995, Nat. Med., 1:1017; Malech et al., 1997, Proc. Natl. Acad. Sci. USA, 94:12133-38). PA317/pLASN was the first therapeutic vector used in a gene therapy trial (Blaese et al., 1995, Science, 270:475-480). Transduction efficiencies of 50% or greater have been observed for MFG-S packaged vectors (Ellem et al., 1997, Immunol Immunother., 44:10-20; Dranoff et al., 1997, Hum. Gene Ther., 1:111-112).


Recombinant adeno-associated virus vectors (rAAV) are a promising alternative gene delivery system based on the defective and nonpathogenic parvovirus adeno-associated type 2 virus. Typically, the vectors are derived from a plasmid that retains only the AAV 145 bp inverted terminal repeats flanking the transgene expression cassette. Efficient gene transfer and stable transgene delivery due to integration into the genomes of the transduced cell are key features of this vector system (Wagner et al., 1998, Lancet, 351:1702-1703; Kearns et al., 1996, Gene Ther., 9:748-55).


Replication-deficient recombinant adenoviral vectors (Ad) are predominantly used for colon cancer gene therapy, because they can be produced at high titer and they readily infect a number of different cell types. Most adenovirus vectors are engineered such that a transgene replaces the Ad E1a, E1b, and E3 genes; subsequently the replication defective vector is propagated in human 293 cells that supply the deleted gene function in trans. Ad vectors can transduce multiple types of tissues in vivo, including nondividing, differentiated cells such as those found in the liver, kidney and muscle system tissues. Conventional Ad vectors have a large carrying capacity. An example of the use of an Ad vector in a clinical trial involved polynucleotide therapy for antitumor immunization with intramuscular injection (Sterman et al., 1998, Hum. Gene Ther. 7:1083-89). Additional examples of the use of adenovirus vectors for gene transfer in clinical trials include Rosenecker et al., 1996, Infection, 24:15-10; Sterman et al., 1998, Hum. Gene Ther., 9:7 1083-89; Welsh et al., 1995, Hum. Gene Ther., 2:205-218; Alvarez et al., 1997, Hum. Gene Ther. 5:597-613; Topf et al., 1998, Gene Ther., 5:507-513; Sterman et al., 1998, Hum. Gene Ther., 7:1083-89.


Packaging cells are used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by a producer cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the protein to be expressed. The missing viral functions are supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome, which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line is also infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment, to which adenovirus is more sensitive than AAV.


In many gene therapy applications, it is desirable that the gene therapy vector be delivered with a high degree of specificity to a particular tissue type. A viral vector is typically modified to have specificity for a given cell type by expressing a ligand as a fusion protein with a viral coat protein on the virus's outer surface. The ligand is chosen to have affinity for a receptor known to be present on the cell type of interest. For example, Han et al., 1995, Proc. Natl. Acad. Sci. USA, 92:9747-51, reported that Moloney murine leukemia virus can be modified to express human heregulin fused to gp70, and the recombinant virus infects certain human breast cancer cells expressing human epidermal growth factor receptor. This principle can be extended to other pairs of virus expressing a ligand fusion protein and target cell expressing a receptor. For example, filamentous phage can be engineered to display antibody fragments (e.g., Fab or Fv) having specific binding affinity for virtually any chosen cellular receptor. Although the above description applies primarily to viral vectors, the same principles can be applied to nonviral vectors. Such vectors can be engineered to contain specific uptake sequences thought to favor uptake by specific target cells.


Gene therapy vectors can be delivered in vivo by administration to an individual patient, typically by systemic administration (e.g., intravenous, intraperitoneal, intramuscular, subdermal, or intracranial infusion) or topical application, as described below. Alternatively, vectors can be delivered to cells ex vivo, such as cells explanted from an individual patient (e.g., lymphocytes, bone marrow aspirates, tissue biopsy) or stem cells (e.g., universal donor hematopoietic stem cells, embryonic stem cells (ES), partially differentiated stem cells, non-pluripotent stem cells, pluripotent stem cells, induced pluripotent stem cells (iPS cells) (see e.g., Sipione et al., Diabetologia, 47:499-508, 2004)), followed by reimplantation of the cells into a patient, usually after selection for cells which have incorporated the vector.


Ex vivo cell transfection for diagnostics, research, or for gene therapy (e.g., via re-infusion of the transfected cells into the host organism) is well known to those of skill in the art. In a preferred embodiment, cells are isolated from the subject organism, transfected with nucleic acid (gene or cDNA) encoding the engineered TALE repeat array protein, and re-infused back into the subject organism (e.g., patient). Various cell types suitable for ex vivo transfection are well known to those of skill in the art (see, e.g., Freshney et al., Culture of Animal Cells, A Manual of Basic Technique (5th ed. 2005), and the references cited therein, for a discussion of how to isolate and culture cells from patients).


In one embodiment, stem cells (e.g., universal donor hematopoietic stem cells, embryonic stem cells (ES), partially differentiated stem cells, non-pluripotent stem cells, pluripotent stem cells, induced pluripotent stem cells (iPS cells) (see e.g., Sipione et al., Diabetologia, 47:499-508, 2004)) are used in ex vivo procedures for cell transfection and gene therapy. The advantage to using stem cells is that they can be differentiated into other cell types in vitro, or can be introduced into a mammal (such as the donor of the cells) where they will engraft in the bone marrow. Methods for differentiating CD34+ cells in vitro into clinically important immune cell types using cytokines such as GM-CSF, IFN-gamma and TNF-alpha are known (see Inaba et al., 1992, J. Exp. Med., 176:1693-1702).


Stem cells can be isolated for transduction and differentiation using known methods. For example, stem cells can be isolated from bone marrow cells by panning the bone marrow cells with antibodies which bind unwanted cells, such as CD4+ and CD8+ (T cells), CD45+ (pan-B cells), GR-1 (granulocytes), and Iad (differentiated antigen presenting cells) (see Inaba et al., 1992, J. Exp. Med., 176:1693-1702).


Vectors (e.g., retroviruses, adenoviruses, liposomes, etc.) containing nucleic acids encoding the engineered TALE repeat array protein can also be administered directly to the organism for transduction of cells in vivo. Alternatively, naked DNA can be administered. Administration is by any of the routes normally used for introducing a molecule into ultimate contact with blood or tissue cells. Suitable methods of administering such nucleic acids are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route. Alternatively, stable formulations of the engineered TALE repeat array protein can also be administered.


Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions available, as described below (see, e.g., Remington: The Science and Practice of Pharmacy, 21st ed., 2005).


Delivery Vehicles


An important factor in the administration of polypeptide compounds, such as the engineered TALE repeat array proteins of the present invention, is ensuring that the polypeptide has the ability to traverse the plasma membrane of a cell, or the membrane of an intra-cellular compartment such as the nucleus. Cellular membranes are composed of lipid-protein bilayers that are freely permeable to small, nonionic lipophilic compounds and are inherently impermeable to polar compounds, macromolecules, and therapeutic or diagnostic agents. However, proteins and other compounds such as liposomes have been described, which have the ability to translocate polypeptides such as engineered TALE repeat array protein across a cell membrane.


For example, “membrane translocation polypeptides” have amphiphilic or hydrophobic amino acid subsequences that have the ability to act as membrane-translocating carriers. In one embodiment, homeodomain proteins have the ability to translocate across cell membranes. The shortest internalizable peptide of a homeodomain protein, Antennapedia, was found to be the third helix of the protein, from amino acid position 43 to 58 (see, e.g., Prochiantz, 1996, Curr. Opin. Neurobiol., 6:629-634). Another subsequence, the h (hydrophobic) domain of signal peptides, was found to have similar cell membrane translocation characteristics (see, e.g., Lin et al., 1995, J. Biol. Chem., 270:14255-58).


Examples of peptide sequences that can be linked to a protein, for facilitating uptake of the protein into cells, include, but are not limited to: peptide fragments of the tat protein of HIV (Endoh et al., 2010, Methods Mol. Biol., 623:271-281; Schmidt et al., 2010, FEBS Lett., 584:1806-13; Futaki, 2006, Biopolymers, 84:241-249); a 20 residue peptide sequence which corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al., 1996, Curr. Biol., 6:84); the third helix of the 60-amino acid long homeodomain of Antennapedia (Derossi et al., 1994, J. Biol. Chem., 269:10444); the h region of a signal peptide, such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et al., supra); or the VP22 translocation domain from HSV (Elliot & O'Hare, 1997, Cell, 88:223-233). See also, e.g., Caron et al., 2001, Mol Ther., 3:310-318; Langel, Cell-Penetrating Peptides: Processes and Applications (CRC Press, Boca Raton Fla. 2002); El-Andaloussi et al., 2005, Curr. Pharm. Des., 11:3597-3611; and Deshayes et al., 2005, Cell. Mol. Life Sci., 62:1839-49. Other suitable chemical moieties that provide enhanced cellular uptake can also be chemically linked to TALE repeat array proteins described herein.


Toxin molecules also have the ability to transport polypeptides across cell membranes. Often, such molecules are composed of at least two parts (called “binary toxins”): a translocation or binding domain or polypeptide and a separate toxin domain or polypeptide. Typically, the translocation domain or polypeptide binds to a cellular receptor, and then the toxin is transported into the cell. Several bacterial toxins, including Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate cyclase (CYA), have been used in attempts to deliver peptides to the cell cytosol as internal or amino-terminal fusions (Arora et al., 1993, J. Biol. Chem., 268:3334-41; Perelle et al., 1993, Infect. Immun., 61:5147-56; Stenmark et al., 1991, J. Cell Biol., 113:1025-32; Donnelly et al., 1993, Proc. Natl. Acad. Sci. USA, 90:3530-34; Carbonetti et al., 1995, Abstr. Annu. Meet. Am. Soc. Microbiol. 95:295; Sebo et al., 1995, Infect. Immun., 63:3851-57; Klimpel et al., 1992, Proc. Natl. Acad. Sci. USA, 89:10277-81; and Novak et al., 1992, J. Biol. Chem., 267:17186-93).


Such subsequences can be used to translocate engineered TALE repeat array proteins across a cell membrane. The engineered TALE repeat array proteins can be conveniently fused to or derivatized with such sequences. Typically, the translocation sequence is provided as part of a fusion protein. Optionally, a linker can be used to link the engineered TALE repeat array protein and the translocation sequence. Any suitable linker can be used, e.g., a peptide linker.


The engineered TALE repeat array protein can also be introduced into an animal cell, preferably a mammalian cell, via liposomes and liposome derivatives such as immunoliposomes. The term “liposome” refers to vesicles comprised of one or more concentrically ordered lipid bilayers, which encapsulate an aqueous phase. The aqueous phase typically contains the compound to be delivered to the cell, i.e., the engineered TALE repeat array protein.


The liposome fuses with the plasma membrane, thereby releasing the compound into the cytosol. Alternatively, the liposome is phagocytosed or taken up by the cell in a transport vesicle. Once in the endosome or phagosome, the liposome either degrades or fuses with the membrane of the transport vesicle and releases its contents.


In current methods of drug delivery via liposomes, the liposome ultimately becomes permeable and releases the encapsulated compound (e.g., the engineered TALE repeat array protein or a nucleic acid encoding the same) at the target tissue or cell. For systemic or tissue specific delivery, this can be accomplished, for example, in a passive manner wherein the liposome bilayer degrades over time through the action of various agents in the body. Alternatively, active compound release involves using an agent to induce a permeability change in the liposome vesicle. Liposome membranes can be constructed so that they become destabilized when the environment becomes acidic near the liposome membrane (see, e.g., Proc. Natl. Acad. Sci. USA, 84:7851 (1987); Biochemistry, 28:908 (1989)). When liposomes are endocytosed by a target cell, for example, they become destabilized and release their contents. This destabilization is termed fusogenesis. Dioleoylphosphatidylethanolamine (DOPE) is the basis of many “fusogenic” systems.


Such liposomes typically comprise the engineered TALE repeat array protein and a lipid component, e.g., a neutral and/or cationic lipid, optionally including a receptor-recognition molecule such as an antibody that binds to a predetermined cell surface receptor or ligand (e.g., an antigen). A variety of methods are available for preparing liposomes as described in, e.g., Szoka et al., 1980, Annu. Rev. Biophys. Bioeng., 9:467; U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787; PCT Publication No. WO 91/17424; Deamer & Bangham, 1976, Biochim. Biophys. Acta, 443:629-634; Fraley et al., 1979, Proc. Natl. Acad. Sci. USA, 76:3348-52; Hope et al., 1985, Biochim. Biophys. Acta, 812:55-65; Mayer et al., 1986, Biochim. Biophys. Acta, 858:161-168; Williams et al., 1988, Proc. Natl. Acad. Sci. USA, 85:242-246; Liposomes (Ostro (ed.), 1983, Chapter 1); Hope et al., 1986, Chem. Phys. Lip., 40:89; Gregoriadis, Liposome Technology (1984); and Lasic, Liposomes: from Physics to Applications (1993). Suitable methods include, for example, sonication, extrusion, high pressure/homogenization, microfluidization, detergent dialysis, calcium-induced fusion of small liposome vesicles and ether-fusion methods, all of which are well known in the art.


In certain embodiments, it is desirable to target liposomes using targeting moieties that are specific to a particular cell type, tissue, and the like. Targeting of liposomes using a variety of targeting moieties (e.g., ligands, receptors, and monoclonal antibodies) has been previously described (see, e.g., U.S. Pat. Nos. 4,957,773 and 4,603,044).


Examples of targeting moieties include monoclonal antibodies specific to antigens associated with neoplasms, such as prostate cancer specific antigen and MAGE. Tumors can also be diagnosed by detecting gene products resulting from the activation or over-expression of oncogenes, such as ras or c-erbB2. In addition, many tumors express antigens normally expressed by fetal tissue, such as alpha-fetoprotein (AFP) and carcinoembryonic antigen (CEA). Sites of viral infection can be diagnosed using various viral antigens such as hepatitis B core and surface antigens (HBVc, HBVs), hepatitis C antigens, Epstein-Barr virus antigens, human immunodeficiency type-1 virus (HIV1) and papilloma virus antigens. Inflammation can be detected using molecules specifically recognized by surface molecules which are expressed at sites of inflammation such as integrins (e.g., VCAM-1), selectin receptors (e.g., ELAM-1) and the like.


Standard methods for coupling targeting agents to liposomes can be used. These methods generally involve incorporation into liposomes of lipid components, e.g., phosphatidylethanolamine, which can be activated for attachment of targeting agents, or of derivatized lipophilic compounds, such as lipid derivatized bleomycin. Antibody targeted liposomes can be constructed using, for instance, liposomes which incorporate protein A (see Renneisen et al., 1990, J. Biol. Chem., 265:16337-42 and Leonetti et al., 1990, Proc. Natl. Acad. Sci. USA, 87:2448-51).


Dosages


For therapeutic applications, the dose of the engineered TALE repeat array protein to be administered to a patient is calculated in a similar way as has been described for zinc finger proteins, see for example U.S. Pat. Nos. 6,511,808, 6,492,117, 6,453,242, U.S. patent application 2002/0164575, and U.S. patent application 2002/0160940. In the context of the present disclosure, the dose should be sufficient to effect a beneficial therapeutic response in the patient over time. In addition, particular dosage regimens can be useful for determining phenotypic changes in an experimental setting, e.g., in functional genomics studies, and in cell or animal models. The dose will be determined by the efficacy, specificity, and KD of the particular engineered TALE repeat array protein employed, the nuclear volume of the target cell, and the condition of the patient, as well as the body weight or surface area of the patient to be treated. The size of the dose also will be determined by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound or vector in a particular patient.
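By way of a non-limiting illustration of why the dissociation constant and the nuclear volume bear on dose, the following sketch assumes a simple single-site equilibrium binding model in which the fraction of target sites occupied is [P] / ([P] + Kd), and converts a nuclear protein concentration into a number of molecules per nucleus. The binding model, the function names, and any input values are assumptions for illustration; they are not dosing guidance and are not taken from this disclosure.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def fractional_occupancy(protein_molar, kd_molar):
    """Fraction of target sites bound under an assumed single-site equilibrium model."""
    if protein_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return protein_molar / (protein_molar + kd_molar)


def molecules_per_nucleus(protein_molar, nuclear_volume_liters):
    """Number of protein molecules giving the stated concentration in the nucleus."""
    return protein_molar * nuclear_volume_liters * AVOGADRO
```

Under this assumed model, a protein present at a concentration equal to its Kd occupies half of its target sites, and a larger nuclear volume requires proportionally more molecules to reach the same concentration.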


Pharmaceutical Compositions and Administration


Appropriate pharmaceutical compositions for administration of the engineered TALE repeat array proteins of the present invention can be determined as described for zinc finger proteins, see for example U.S. Pat. Nos. 6,511,808, 6,492,117, 6,453,242, U.S. patent application 2002/0164575, and U.S. patent application 2002/0160940. Engineered TALE repeat array proteins, and expression vectors encoding engineered TALE repeat array proteins, can be administered directly to the patient for modulation of gene expression and for therapeutic or prophylactic applications, for example, cancer, ischemia, diabetic retinopathy, macular degeneration, rheumatoid arthritis, psoriasis, HIV infection, sickle cell anemia, Alzheimer's disease, muscular dystrophy, neurodegenerative diseases, vascular disease, cystic fibrosis, stroke, and the like. Examples of microorganisms that can be inhibited by TALE repeat array protein-mediated gene therapy include pathogenic bacteria, e.g., chlamydia, rickettsial bacteria, mycobacteria, staphylococci, streptococci, pneumococci, meningococci and gonococci, klebsiella, proteus, serratia, pseudomonas, legionella, diphtheria, salmonella, bacilli, cholera, tetanus, botulism, anthrax, plague, leptospirosis, and Lyme disease bacteria; infectious fungi, e.g., Aspergillus, Candida species; protozoa such as sporozoa (e.g., Plasmodia), rhizopods (e.g., Entamoeba) and flagellates (Trypanosoma, Leishmania, Trichomonas, Giardia, etc.); viral diseases, e.g., hepatitis (A, B, or C), herpes virus (e.g., VZV, HSV-1, HSV-6, HSV-II, CMV, and EBV), HIV, Ebola, adenovirus, influenza virus, flaviviruses, echovirus, rhinovirus, coxsackie virus, comovirus, respiratory syncytial virus, mumps virus, rotavirus, measles virus, rubella virus, parvovirus, vaccinia virus, HTLV virus, dengue virus, papillomavirus, poliovirus, rabies virus, and arboviral encephalitis virus, etc.


Administration of therapeutically effective amounts is by any of the routes normally used for introducing TALE repeat array proteins into ultimate contact with the tissue to be treated. The TALE repeat array proteins are administered in any suitable manner, preferably with pharmaceutically acceptable carriers. Suitable methods of administering such modulators are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.


Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions that are available (see, e.g., Remington: The Science and Practice of Pharmacy, 21st ed., 2005).


The engineered TALE repeat array proteins, alone or in combination with other suitable components, can be made into aerosol formulations (i.e., they can be “nebulized”) to be administered via inhalation. Aerosol formulations can be placed into pressurized acceptable propellants, such as dichlorodifluoromethane, propane, nitrogen, and the like.


Formulations suitable for parenteral administration, such as, for example, by intravenous, intramuscular, intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. The disclosed compositions can be administered, for example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or intrathecally. The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampules and vials. Injection solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described.


Use of TALE Nucleases


TALE nucleases engineered using the methods described herein can be used to induce mutations in a genomic sequence, e.g., by cleaving at two sites and deleting sequences in between, by cleavage at a single site followed by non-homologous end joining, and/or by cleaving at a site so as to remove or replace one or two or a few nucleotides. In some embodiments, the TALE nuclease is used to induce mutation in an animal, plant, fungal, or bacterial genome. Targeted cleavage can also be used to create gene knock-outs (e.g., for functional genomics or target validation) and to facilitate targeted insertion of a sequence into a genome (i.e., gene knock-in); e.g., for purposes of cell engineering or protein overexpression. Insertion can be by means of replacements of chromosomal sequences through homologous recombination or by targeted integration, in which a new sequence (i.e., a sequence not present in the region of interest), flanked by sequences homologous to the region of interest in the chromosome, is used to insert the new sequence at a predetermined target site via homologous recombination. Exogenous DNA can also be inserted into TALE nuclease-induced double stranded breaks without the need for flanking homology sequences (see, Orlando et al., 2010, Nucl. Acids Res., 1-15, doi:10.1093/nar/gkq512).
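The first outcome listed above, cleavage at two sites with deletion of the intervening sequence, can be pictured at the sequence level with the following toy sketch. It assumes blunt cuts at stated 0-based positions between bases and precise rejoining of the flanks; actual cleavage by a dimeric nuclease and cellular repair are more complex, and the function name and inputs are hypothetical.

```python
def delete_between(seq, cut_a, cut_b):
    """Return the sequence left after removing the bases between two cut positions.

    Cut positions are 0-based indices between bases (0 = before the first base,
    len(seq) = after the last base). This models idealized rejoining only.
    """
    left, right = sorted((cut_a, cut_b))
    if not 0 <= left <= right <= len(seq):
        raise ValueError("cut positions must lie within the sequence")
    return seq[:left] + seq[right:]
```

For instance, cuts at positions 3 and 9 in a 12-base sequence would remove the six bases lying between them and join the two three-base flanks.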


As demonstrated in Example 3 below, the TALE nucleases produced by the methods described herein were capable of inducing site-specific mutagenesis in mammalian cells. A skilled practitioner will readily appreciate that TALE nucleases produced by the methods described herein would also function to induce efficient site-specific mutagenesis in other cell types and organisms (see, for example, Cade et al., 2012, Nucleic Acids Res., PMID: 22684503 and Moore et al., 2012, PLoS One, PMID: 22655075).


The same methods can also be used to replace a wild-type sequence with a mutant sequence, or to convert one allele to a different allele.


Targeted cleavage of infecting or integrated viral genomes can be used to treat viral infections in a host. Additionally, targeted cleavage of genes encoding receptors for viruses can be used to block expression of such receptors, thereby preventing viral infection and/or viral spread in a host organism. Targeted mutagenesis of genes encoding viral receptors (e.g., the CCR5 and CXCR4 receptors for HIV) can be used to render the receptors unable to bind to virus, thereby preventing new infection and blocking the spread of existing infections. Non-limiting examples of viruses or viral receptors that can be targeted include herpes simplex virus (HSV), such as HSV-1 and HSV-2, varicella zoster virus (VZV), Epstein-Barr virus (EBV) and cytomegalovirus (CMV), HHV6 and HHV7. The hepatitis family of viruses includes hepatitis A virus (HAV), hepatitis B virus (HBV), hepatitis C virus (HCV), the delta hepatitis virus (HDV), hepatitis E virus (HEV) and hepatitis G virus (HGV). Other viruses or their receptors can be targeted, including, but not limited to, Picornaviridae (e.g., polioviruses, etc.); Caliciviridae; Togaviridae (e.g., rubella virus, dengue virus, etc.); Flaviviridae; Coronaviridae; Reoviridae; Birnaviridae; Rhabdoviridae (e.g., rabies virus, etc.); Filoviridae; Paramyxoviridae (e.g., mumps virus, measles virus, respiratory syncytial virus, etc.); Orthomyxoviridae (e.g., influenza virus types A, B and C, etc.); Bunyaviridae; Arenaviridae; Retroviridae; lentiviruses (e.g., HTLV-I; HTLV-II; HIV-1 (also known as HTLV-III, LAV, ARV, hTLR, etc.) HIV-II); simian immunodeficiency virus (SIV), human papillomavirus (HPV), influenza virus and the tick-borne encephalitis viruses. See, e.g., Virology, 3rd Edition (W. K. Joklik, ed. 1988); Fundamental Virology, 4th Edition (Knipe and Howley, eds. 2001), for a description of these and other viruses. Receptors for HIV, for example, include CCR-5 and CXCR-4.


In similar fashion, the genome of an infecting bacterium can be mutagenized by targeted DNA cleavage followed by non-homologous end joining, to block or ameliorate bacterial infections.


The disclosed methods for targeted recombination can be used to replace any genomic sequence with a homologous, non-identical sequence. For example, a mutant genomic sequence can be replaced by its wild-type counterpart, thereby providing methods for treatment of e.g., genetic disease, inherited disorders, cancer, and autoimmune disease. In like fashion, one allele of a gene can be replaced by a different allele using the methods of targeted recombination disclosed herein.


Exemplary genetic diseases include, but are not limited to, achondroplasia, achromatopsia, acid maltase deficiency, adenosine deaminase deficiency (OMIM No. 102700), adrenoleukodystrophy, Aicardi syndrome, alpha-1 antitrypsin deficiency, alpha-thalassemia, androgen insensitivity syndrome, Apert syndrome, arrhythmogenic right ventricular dysplasia, ataxia telangiectasia, Barth syndrome, beta-thalassemia, blue rubber bleb nevus syndrome, Canavan disease, chronic granulomatous diseases (CGD), cri du chat syndrome, cystic fibrosis, Dercum's disease, ectodermal dysplasia, Fanconi anemia, fibrodysplasia ossificans progressiva, fragile X syndrome, galactosemia, Gaucher's disease, generalized gangliosidoses (e.g., GM1), hemochromatosis, the hemoglobin C mutation in the 6th codon of beta-globin (HbC), hemophilia, Huntington's disease, Hurler syndrome, hypophosphatasia, Klinefelter's syndrome, Krabbe disease, Langer-Giedion syndrome, leukocyte adhesion deficiency (LAD, OMIM No. 116920), leukodystrophy, long QT syndrome, Marfan syndrome, Moebius syndrome, mucopolysaccharidosis (MPS), nail patella syndrome, nephrogenic diabetes insipidus, neurofibromatosis, Niemann-Pick disease, osteogenesis imperfecta, porphyria, Prader-Willi syndrome, progeria, Proteus syndrome, retinoblastoma, Rett syndrome, Rubinstein-Taybi syndrome, Sanfilippo syndrome, severe combined immunodeficiency (SCID), Shwachman syndrome, sickle cell disease (sickle cell anemia), Smith-Magenis syndrome, Stickler syndrome, Tay-Sachs disease, Thrombocytopenia Absent Radius (TAR) syndrome, Treacher Collins syndrome, trisomy, tuberous sclerosis, Turner's syndrome, urea cycle disorder, von Hippel-Lindau disease, Waardenburg syndrome, Williams syndrome, Wilson's disease, Wiskott-Aldrich syndrome, and X-linked lymphoproliferative syndrome (XLP, OMIM No. 308240).


Additional exemplary diseases that can be treated by targeted DNA cleavage and/or homologous recombination include acquired immunodeficiencies, lysosomal storage diseases (e.g., Gaucher's disease, GM1, Fabry disease and Tay-Sachs disease), mucopolysaccharidosis (e.g., Hunter's disease, Hurler's disease), hemoglobinopathies (e.g., sickle cell diseases, HbC, alpha-thalassemia, beta-thalassemia) and hemophilias.


In certain cases, alteration of a genomic sequence in a pluripotent cell (e.g., a hematopoietic stem cell) is desired. Methods for mobilization, enrichment and culture of hematopoietic stem cells are known in the art. See for example, U.S. Pat. Nos. 5,061,620; 5,681,559; 6,335,195; 6,645,489 and 6,667,064. Treated stem cells can be returned to a patient for treatment of various diseases including, but not limited to, SCID and sickle-cell anemia.


In many of these cases, a region of interest comprises a mutation, and the donor polynucleotide comprises the corresponding wild-type sequence. Similarly, a wild-type genomic sequence can be replaced by a mutant sequence, if such is desirable. For example, overexpression of an oncogene can be reversed either by mutating the gene or by replacing its control sequences with sequences that support a lower, non-pathologic level of expression. As another example, the wild-type allele of the ApoAI gene can be replaced by the ApoAI Milano allele, to treat atherosclerosis. Indeed, any pathology dependent upon a particular genomic sequence, in any fashion, can be corrected or alleviated using the methods and compositions disclosed herein.


Targeted cleavage and targeted recombination can also be used to alter non-coding sequences (e.g., sequences encoding microRNAs and long non-coding RNAs, and regulatory sequences such as promoters, enhancers, initiators, terminators, splice sites) to alter the levels of expression of a gene product. Such methods can be used, for example, for therapeutic purposes, functional genomics and/or target validation studies.


The compositions and methods described herein also allow for novel approaches and systems to address immune reactions of a host to allogeneic grafts. In particular, a major problem faced when allogeneic stem cells (or any type of allogeneic cell) are grafted into a host recipient is the high risk of rejection by the host's immune system, primarily mediated through recognition of the Major Histocompatibility Complex (MHC) on the surface of the engrafted cells. The MHC comprises the HLA class I protein(s) that function as heterodimers that are comprised of a common beta subunit and variable alpha subunits. It has been demonstrated that tissue grafts derived from stem cells that are devoid of HLA escape the host's immune response. See, e.g., Coffman et al., 1993, J. Immunol., 151:425-35; Markmann et al., 1992, Transplantation, 54:1085-89; Koller et al., 1990, Science, 248:1227-30. Using the compositions and methods described herein, genes encoding HLA proteins involved in graft rejection can be cleaved, mutagenized or altered by recombination, in either their coding or regulatory sequences, so that their expression is blocked or they express a non-functional product. For example, by inactivating the gene encoding the common beta subunit gene (beta2 microglobulin) using TALE nuclease fusion proteins as described herein, HLA class I can be removed from the cells to rapidly and reliably generate HLA class I null stem cells from any donor, thereby reducing the need for closely matched donor/recipient MHC haplotypes during stem cell grafting.


Inactivation of any gene (e.g., the beta2 microglobulin gene) can be achieved, for example, by a single cleavage event, by cleavage followed by non-homologous end joining, by cleavage at two sites followed by joining so as to delete the sequence between the two cleavage sites, by targeted recombination of a missense or nonsense codon into the coding region, or by targeted recombination of an irrelevant sequence (i.e., a “stuffer” sequence) into the gene or its regulatory region, so as to disrupt the gene or regulatory region.


Targeted modification of chromatin structure, as disclosed in WO 01/83793, can be used to facilitate the binding of fusion proteins to cellular chromatin.


In additional embodiments, one or more fusions between a TALE binding domain and a recombinase (or functional fragment thereof) can be used, in addition to or instead of the TALE-cleavage domain fusions disclosed herein, to facilitate targeted recombination. See, for example, co-owned U.S. Pat. No. 6,534,261 and Akopian et al. (2003) Proc. Natl. Acad. Sci. USA 100:8688-8691.


In additional embodiments, the disclosed methods and compositions are used to provide fusions of TALE repeat DNA-binding domains with transcriptional activation or repression domains that require dimerization (either homodimerization or heterodimerization) for their activity. In these cases, a fusion polypeptide comprises a TALE repeat DNA-binding domain and a functional domain monomer (e.g., a monomer from a dimeric transcriptional activation or repression domain). Binding of two such fusion polypeptides to properly situated target sites allows dimerization so as to reconstitute a functional transcription activation or repression domain.


Regulation of Gene Expression in Plants


Engineered TALE repeat array proteins can be used to engineer plants for traits such as increased disease resistance, modification of structural and storage polysaccharides, flavors, proteins, and fatty acids, fruit ripening, yield, color, nutritional characteristics, improved storage capability, and the like. In particular, the engineering of crop species for enhanced oil production, e.g., the modification of the fatty acids produced in oilseeds, is of interest.


Seed oils are composed primarily of triacylglycerols (TAGs), which are glycerol esters of fatty acids. Commercial production of these vegetable oils is accounted for primarily by six major oil crops (soybean, oil palm, rapeseed, sunflower, cotton seed, and peanut). Vegetable oils are used predominantly (90%) for human consumption as margarine, shortening, salad oils, and frying oil. The remaining 10% is used for non-food applications such as lubricants, oleochemicals, biofuels, detergents, and other industrial applications.


The desired characteristics of the oil used in each of these applications vary widely, particularly in terms of the chain length and number of double bonds present in the fatty acids making up the TAGs. These properties are manipulated by the plant in order to control membrane fluidity and temperature sensitivity. The same properties can be controlled using TALE repeat array proteins to produce oils with improved characteristics for food and industrial uses.


The primary fatty acids in the TAGs of oilseed crops are 16 to 18 carbons in length and contain 0 to 3 double bonds. Palmitic acid (16:0 [16 carbons: 0 double bonds]), oleic acid (18:1), linoleic acid (18:2), and linolenic acid (18:3) predominate. The number of double bonds, or degree of saturation, determines the melting temperature, reactivity, cooking performance, and health attributes of the resulting oil.


The enzyme responsible for the conversion of oleic acid (18:1) into linoleic acid (18:2) (which is then the precursor for 18:3 formation) is delta-12-oleate desaturase, also referred to as omega-6 desaturase. A block at this step in the fatty acid desaturation pathway should result in the accumulation of oleic acid at the expense of polyunsaturates.


In one embodiment, engineered TALE repeat array proteins are used to regulate expression of the FAD2-1 gene in soybeans. Two genes encoding microsomal omega-6 desaturases have been cloned from soybean, and are referred to as FAD2-1 and FAD2-2 (Heppard et al., 1996, Plant Physiol. 110:311-319). FAD2-1 (delta-12 desaturase) appears to control the bulk of oleic acid desaturation in the soybean seed. Engineered TALE repeat array proteins can thus be used to modulate gene expression of FAD2-1 in plants. Specifically, engineered TALE repeat array proteins can be used to inhibit expression of the FAD2-1 gene in soybean in order to increase the accumulation of oleic acid (18:1) in the oil seed. Moreover, engineered TALE proteins can be used to modulate expression of any other plant gene, such as delta-9 desaturase, delta-12 desaturases from other plants, delta-15 desaturase, acetyl-CoA carboxylase, acyl-ACP-thioesterase, ADP-glucose pyrophosphorylase, starch synthase, cellulose synthase, sucrose synthase, senescence-associated genes, heavy metal chelators, fatty acid hydroperoxide lyase, polygalacturonase, EPSP synthase, plant viral genes, plant fungal pathogen genes, and plant bacterial pathogen genes.


Recombinant DNA vectors suitable for transformation of plant cells are also used to deliver protein (e.g., engineered TALE repeat array protein)-encoding nucleic acids to plant cells. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature (see, e.g., Weising et al., 1988, Ann. Rev. Genet., 22:421-477). A DNA sequence coding for the desired TALE repeat array protein is combined with transcriptional and translational initiation regulatory sequences which will direct the transcription of the TALE protein in the intended tissues of the transformed plant.


For example, a plant promoter fragment can be employed which will direct expression of the engineered TALE repeat array protein in all tissues of a regenerated plant. Such promoters are referred to herein as “constitutive” promoters and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1′- or 2′-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill.


Alternatively, the plant promoter can direct expression of the engineered TALE repeat array protein in a specific tissue or can be otherwise under more precise environmental or developmental control. Such promoters are referred to here as “inducible” promoters. Examples of environmental conditions that can affect transcription by inducible promoters include anaerobic conditions or the presence of light.


Examples of promoters under developmental control include promoters that initiate transcription only in certain tissues, such as fruit, seeds, or flowers. For example, a polygalacturonase promoter can direct expression of the TALE repeat array protein in the fruit, and a CHS-A (chalcone synthase A from petunia) promoter can direct expression of the TALE repeat array protein in the flower of a plant.


The vector comprising the TALE repeat array protein sequences will typically comprise a marker gene which confers a selectable phenotype on plant cells. For example, the marker can encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or Basta.


Such DNA constructs can be introduced into the genome of the desired plant host by a variety of conventional techniques. For example, the DNA construct can be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant tissue using biolistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs can be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria.


Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al., 1984, EMBO J., 3:2717-22. Electroporation techniques are described in Fromm et al. 1985, Proc. Natl. Acad. Sci. USA, 82:5824. Biolistic transformation techniques are described in Klein et al., 1987, Nature, 327:70-73.



Agrobacterium tumefaciens-mediated transformation techniques are well described in the scientific literature (see, e.g., Horsch et al., 1984, Science, 233:496-498; and Fraley et al., 1983, Proc. Natl. Acad. Sci. USA, 80:4803).


Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus the desired TALE repeat array protein-controlled phenotype. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the TALE repeat array protein nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176 (1983); and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73 (1985). Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et al., 1987, Ann. Rev. Plant Phys., 38:467-486.


Functional Genomics Assays


Engineered TALE repeat array proteins are also useful in assays to determine the phenotypic consequences and function of gene expression. Recent advances in analytical techniques, coupled with focused mass sequencing efforts, have created the opportunity to identify and characterize many more molecular targets than were previously available. This new information about genes and their functions will improve basic biological understanding and present many new targets for therapeutic intervention. In some cases analytical tools have not kept pace with the generation of new data. An example is provided by recent advances in the measurement of global differential gene expression. These methods, typified by gene expression microarrays, differential cDNA cloning frequencies, subtractive hybridization and differential display methods, can very rapidly identify genes that are up- or down-regulated in different tissues or in response to specific stimuli. Increasingly, such methods are being used to explore biological processes such as transformation, tumor progression, the inflammatory response, and neurological disorders. Many differentially expressed genes correlate with a given physiological phenomenon, but demonstrating a causative relationship between an individual differentially expressed gene and the phenomenon is labor intensive. Until now, simple methods for assigning function to differentially expressed genes have not kept pace with the ability to monitor differential gene expression.


The engineered TALE repeat array proteins described herein can be used to rapidly analyze the function of a differentially expressed gene. Engineered TALE proteins can be readily used to up- or down-regulate or knock out any endogenous target gene, or to knock in an endogenous or exogenous gene. Very little sequence information is required to create a gene-specific DNA binding domain. This makes the engineered TALE repeat array technology ideal for analysis of long lists of poorly characterized differentially expressed genes. One can simply build a TALE repeat array protein-based DNA binding domain for each candidate gene, create chimeric up- and down-regulating artificial transcription factors, and test the consequence of up- or down-regulation on the phenotype under study (e.g., transformation or response to a cytokine) by switching the candidate genes on or off one at a time in a model system.


Additionally, greater experimental control can be imparted by engineered TALE repeat array proteins than can be achieved by more conventional methods. This is because the production and/or function of engineered TALE repeat array proteins can be placed under small molecule control. Examples of this approach are provided by the Tet-On system, the ecdysone-regulated system and a system incorporating a chimeric factor including a mutant progesterone receptor. These systems are all capable of indirectly imparting small molecule control on any endogenous gene of interest or any transgene by placing the function and/or expression of an engineered TALE repeat array protein under small molecule control.


Transgenic Animals


A further application of engineered TALE repeat array proteins is manipulating gene expression in animal models. As with cell lines, the introduction of a heterologous gene into, or knockout of an endogenous gene in, a transgenic animal, such as a transgenic mouse or zebrafish, is a fairly straightforward process. Thus, transgenic or transient expression of an engineered TALE repeat array protein in an animal can be readily performed.


By transgenically or transiently expressing a suitable engineered TALE repeat array protein fused to an activation domain, a target gene of interest can be over-expressed. Similarly, by transgenically or transiently expressing a suitable engineered TALE repeat array protein fused to a repressor or silencer domain, the expression of a target gene of interest can be down-regulated, or even switched off to create a “functional knockout”. Knock-in or knockout mutations by insertion or deletion of a target gene of interest can be prepared using TALE nucleases.


Two common issues often prevent the successful application of standard transgenic and knockout technology: embryonic lethality and developmental compensation. Embryonic lethality results when the gene plays an essential role in development. Developmental compensation is the substitution of a related gene product for the gene product being knocked out, and often results in a lack of a phenotype in a knockout mouse when the ablation of that gene's function would otherwise cause a physiological change.


Expression of transgenic engineered TALE repeat array proteins can be temporally controlled, for example using small molecule regulated systems as described in the previous section. Thus, by switching on expression of an engineered TALE repeat array protein at a desired stage in development, a gene can be over-expressed or “functionally knocked-out” in the adult (or at a late stage in development), thus avoiding the problems of embryonic lethality and developmental compensation.


EXAMPLES
Example 1. Assembly of TALE Repeat Arrays Using Streptavidin Coated Magnetic Beads

An archive of DNA plasmids (˜850 different plasmids) encoding one, two, three, or four TALE repeat domains was created for assembly of nucleic acids encoding multiple TALE arrays of any desired length. The plasmids were created by cloning synthetic arrays of one, two, three or four TALE repeat domains into the pUC57-ΔBsaI backbone (FIG. 3). The TALE repeats were of the arrangement α, βγδε, βγδ, βγ′, βγ, δε′, and β, and included hypervariable triplet residues at each position to bind to the nucleotides as shown in Table 1. Polypeptide and nucleotide sequences of the TALE repeat types are shown in FIGS. 4A and 4B, respectively. The polypeptide and polynucleotide sequences were varied slightly among the four types to reduce the possibility of recombination-mediated mutations due to long sequences of exact repeats.









TABLE 1
Nucleotide binding code of TALE triplets

Triplet    Bound Nucleotide
SNI        A
SHD        C
NNN        G
SNK        G
SNG        T
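The binding code in Table 1 can be read as a simple lookup from nucleotide to triplet. The following is a minimal sketch of that lookup, not part of the disclosed method. The function name is hypothetical, and because Table 1 lists two triplets for G (NNN and SNK), the choice of NNN as the default is an assumption made only for illustration.

```python
# Triplet code taken from Table 1. G has two listed triplets (NNN and SNK);
# picking NNN as the default is an assumption of this sketch.
TRIPLET_FOR_BASE = {"A": "SNI", "C": "SHD", "G": "NNN", "T": "SNG"}


def triplets_for_target(target: str) -> list[str]:
    """Return one hypervariable triplet per nucleotide of the target (5' to 3')."""
    target = target.upper()
    unknown = set(target) - set(TRIPLET_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported nucleotides: {sorted(unknown)}")
    return [TRIPLET_FOR_BASE[base] for base in target]


# Illustrative four-nucleotide input.
print(triplets_for_target("CAGT"))  # ['SHD', 'SNI', 'NNN', 'SNG']
```

Each position is looked up independently, which reflects the largely context-independent specificity of the repeats.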










A 16-mer TALE repeat array targeted to the eGFP gene was created by in vitro assembly of 16 TALE repeats designed to bind the target sequence GCAGTGCTTCAGCCGC (SEQ ID NO: 41). In the first step, a plasmid carrying an α-type TALE repeat with an NNN triplet (G) was amplified by PCR using a biotinylated forward primer Biotin-TCTAGAGAAGACAAGAACCTGACC (SEQ ID NO: 42) and a reverse primer GGATCCGGTCTCTTAAGGCCGTGG (SEQ ID NO: 43). The amplified fragment (50 μl) was purified using a QIAquick PCR purification kit (QIAGEN), eluted in 40 μl 0.1× elution buffer (as provided in the QIAquick PCR purification kit), and digested with BsaI HF (New England Biolabs (NEB)) in NEBuffer 4 for 15 minutes at 50° C. (40 μl elution, 5 μl NEBuffer 4, 5 μl BsaI HF). The digested fragment was purified using a QIAquick PCR purification kit and eluted in 0.1× elution buffer (50 μl).
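The two primers carry recognition sites for several of the restriction enzymes used in this example. The sketch below scans each primer, as written 5′ to 3′, for those sites. The recognition sequences are the commonly cited ones for these enzymes and are an assumption here, since the text does not state them; the function name is hypothetical.

```python
# Commonly cited recognition sequences (5'->3') for enzymes named in this
# example. They are assumptions of this sketch, not taken from the text.
RECOGNITION_SITES = {
    "XbaI": "TCTAGA",
    "BbsI": "GAAGAC",
    "BamHI": "GGATCC",
    "BsaI": "GGTCTC",
    "SalI": "GTCGAC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map enzyme name to 0-based start positions of its site on the given strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


FORWARD_PRIMER = "TCTAGAGAAGACAAGAACCTGACC"  # SEQ ID NO: 42 (biotin omitted)
REVERSE_PRIMER = "GGATCCGGTCTCTTAAGGCCGTGG"  # SEQ ID NO: 43

print(find_sites(FORWARD_PRIMER))  # {'XbaI': [0], 'BbsI': [6]}
print(find_sites(REVERSE_PRIMER))  # {'BamHI': [0], 'BsaI': [6]}
```

Only the primer strand as written is scanned. In the amplified fragment, the sites contributed by the reverse primer appear on the top strand as their reverse complements.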


A plasmid containing a four TALE repeat domain sub-array unit (βγδε) coding for repeats that each harbor one of the following variable amino acids SHD, SNI, NNN, and SNG (designed to bind the sequence 5′-CAGT-3′) was digested with BbsI (NEB) in NEBuffer 2 for 2 hours at 37° C. in 100 μl (50 μl plasmid [˜200 ng/μl], 10 μl NEBuffer 2, 10 μl BbsI, 30 μl water). To the 100 μl digest was added 25 μl NEBuffer 4, 2.5 μl 100× BSA (NEB), 107.5 μl water, and 5 μl XbaI (NEB), and the digest was incubated for 5 minutes at 37° C. To the mixture, 5 μl of BamHI HF was then added for a 5 minute digest at 37° C., and then 5 μl SalI HF (NEB) was added for an additional 5 minute digest at 37° C. The resulting fragment was purified using a QIAquick PCR purification kit (QIAGEN) and eluted in 180 μl 0.1× elution buffer.


For the initial ligation, 2 μl of the alpha unit digest was mixed with 2.5 μl of T4 DNA ligase (400 U/μl; NEB) and 27 μl Quick Ligase Buffer (QLB) (NEB). To this 31.5 μl mixture was added 22.5 μl of the first digested subarray, and the mixture was ligated for 15 minutes at room temperature. Magnetic beads were prepared by washing 5 μl of Dynabeads MyOne Streptavidin C1 (Invitrogen) three times with 50 μl 1× B&W Buffer (5.0 mM Tris-HCl [pH 7.5], 0.5 mM EDTA, 1.0 M NaCl, 0.005% Tween 20) and resuspending in 54 μl B&W Buffer. The ligated mixture was added to the washed beads and incubated for 15 minutes at room temperature (with mixing every five minutes). The mixture was then placed on a SPRIplate 96-well Ring magnet for 3 minutes. The supernatant was then aspirated, and 100 μl 1× B&W Buffer was added to wash, with mixing by moving the beads 31 times from side to side within the tube using a DynaMag-96 Side magnet (Invitrogen). The B&W Buffer was then aspirated, and 100 μl 1×BSA was added, with mixing, then aspirated. The ligated, bead-bound nucleic acids (αβγδε) were resuspended in 50 μl BsaI HF mix (5 μl NEBuffer 4, 2 μl BsaI HF, 43 μl water).
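The volumes stated for the initial ligation and bead-binding steps can be tallied as a quick consistency check. This is a bookkeeping sketch only. The variable names are hypothetical, and the combined binding volume is simply the sum of the stated volumes rather than a figure given in the text.

```python
# Volumes in microliters, as stated for the initial ligation.
ligation_premix = {
    "alpha unit digest": 2.0,
    "T4 DNA ligase": 2.5,
    "Quick Ligase Buffer": 27.0,
}
first_subarray_digest = 22.5   # digested sub-array added to the premix
bead_suspension = 54.0         # washed beads resuspended in B&W Buffer

premix_total = sum(ligation_premix.values())
ligation_total = premix_total + first_subarray_digest
binding_total = ligation_total + bead_suspension  # derived, not stated in the text

print(premix_total)    # 31.5
print(ligation_total)  # 54.0
print(binding_total)   # 108.0
```

The premix total agrees with the 31.5 μl stated in the text, and the completed ligation has the same volume as the bead suspension it is added to.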


The digest was incubated at 50° C. for 10 minutes, and 50 μl 1× B&W buffer was added. The digest was placed on a magnet for 3 minutes, and the supernatant was aspirated. The beads were washed with 100 μl 1× B&W Buffer and 100 μl 1×BSA as above. To the washed beads were added a digested plasmid containing a four TALE repeat domain sub-array unit (βγδε) coding for repeats that each harbor one of the following variable amino acids NNN, SHD, SNG, and SNG (designed to bind the DNA sequence 5′-GCTT-3′) (22.5 μl) and 27.5 μl ligase mix (25 μl Quick Ligase Buffer, 2 μl DNA ligase). The beads were resuspended by pipetting up and down, and the mixture was incubated for 15 minutes at room temperature with mixing every five minutes. To the ligation was added 50 μl 1× B&W Buffer, and the mixture was placed on the magnet for 3 minutes. The supernatant was aspirated, and the beads were washed with 100 μl 1× B&W Buffer and 100 μl 1×BSA as above. The ligated, bead-bound nucleic acids (αβγδεβγδε) were resuspended in 50 μl BsaI HF mix (5 μl NEBuffer 4, 2 μl BsaI HF, 43 μl water). Two more TALE repeat sub-array units were ligated sequentially as above, the first a four TALE repeat sub-array unit (βγδε) coding for repeats that each harbor one of the following variable amino acids SHD, SNI, NNN, and SHD (designed to bind the DNA sequence 5′-CAGC-3′) and the second a three TALE repeat sub-array unit (βγδ) coding for repeats that each harbor one of the following variable amino acids SHD, NNN, and SHD (designed to bind the DNA sequence 5′-CGC-3′). The final TALE repeat array contained subunits of the format αβγδεβγδεβγδεβγδ with individual TALE repeats designed to bind the target DNA sequence 5′-GCAGTGCTTCAGCCGC-3′ (SEQ ID NO: 44).
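The 16-nucleotide target is covered by one α unit followed by sub-array units of four, four, four, and three repeats. The sketch below splits the target accordingly and lists the triplets each unit would need under the Table 1 code. The function name is hypothetical, and using NNN rather than SNK for G is an assumption that matches the units described in this example.

```python
# Table 1 code; NNN is used for G, as in the units described in this example.
TRIPLET_FOR_BASE = {"A": "SNI", "C": "SHD", "G": "NNN", "T": "SNG"}


def split_into_units(target: str, unit_lengths: list[int]) -> list[tuple[str, list[str]]]:
    """Split target (5' to 3') into consecutive units and list each unit's triplets."""
    if sum(unit_lengths) != len(target):
        raise ValueError("unit lengths must add up to the target length")
    units, start = [], 0
    for length in unit_lengths:
        sub_target = target[start:start + length]
        units.append((sub_target, [TRIPLET_FOR_BASE[base] for base in sub_target]))
        start += length
    return units


# alpha (1 repeat), then three four-repeat units and one three-repeat unit
for sub_target, triplets in split_into_units("GCAGTGCTTCAGCCGC", [1, 4, 4, 4, 3]):
    print(sub_target, triplets)
```

The loop prints the sub-targets G, CAGT, GCTT, CAGC, and CGC, in the order in which the corresponding units are ligated onto the bead-bound fragment.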


Following the final ligation step, the construct was digested with BsaI HF for eventual cloning into an expression vector, and the beads were washed with 1× B&W Buffer and 1×BSA. The washed beads were resuspended in 50 μl BbsI mix (5 μl NEBuffer 2, 5 μl BbsI, 40 μl water) and incubated at 37° C. for 2 hours with agitation at 1500 rpm to cleave the biotinylated 5′ end and release the assembled TALE repeat array from the magnetic beads. The digested mixture was purified using a MinElute column (QIAGEN) and ligated into a BsmBI-digested TALE expression vector. The ligated mixture was transformed into chemically competent XL1 Blue cells and plated on LB/Carb100 plates overnight.


The expression vectors each harbor the following elements: a T7 promoter, a nuclear localization signal, a FLAG tag, amino acids 153 to 288 from the TALE13 protein (numbering as defined by Miller et al., 2011, Nat. Biotechnol., 29:143-148), two adjacent BsmBI restriction sites into which a DNA fragment encoding a TALE repeat array can be cloned, a 0.5 TALE repeat, amino acids 715 to 777 from the C-terminal end of the TALE13 protein (numbering as defined by Miller et al., 2011, Nat. Biotechnol., 29:143-148), and the wild-type FokI cleavage domain.


The plasmids differ in the identity of the C-terminal 0.5 TALE repeat. Plasmid pJDS70 encodes a 0.5 TALE repeat with an SNI RVD (for recognition of an A nucleotide), plasmid pJDS71 encodes a 0.5 TALE repeat with an SHD RVD (for recognition of a C nucleotide), plasmid pJDS74 encodes a 0.5 TALE repeat with an NNN RVD (for recognition of a G nucleotide), plasmid pJDS76 encodes a 0.5 TALE repeat with an SNK RVD (for recognition of a G nucleotide), and plasmid pJDS78 encodes a 0.5 TALE repeat with an SNG RVD (for recognition of a T nucleotide). All plasmids share the common sequence shown in FIGS. 5A-5B and differ at just nine nucleotide positions marked as XXXXXXXXX (underlined and bold). The sequence of these 9 bps and plasmid names are also shown below in Table 2.









TABLE 2
DNA sequences of expression vectors

Plasmid name    Sequence of variable 9 bps    SEQ ID NO:    RVD of C-terminal 0.5 TALE repeat
pJDS70          TCTAACATC                     45            SNI (for binding to an A nucleotide)
pJDS71          TCCCACGAC                     46            SHD (for binding to a C nucleotide)
pJDS74          AATAATAAC                     47            NNN (for binding to a G nucleotide)
pJDS76          TCCAATAAA                     48            SNK (for binding to a G nucleotide)
pJDS78          TCTAATGGG                     49            SNG (for binding to a T nucleotide)









This example demonstrates the construction of TALE repeat arrays on an immobilized substrate using preassembled TALE repeat sub-array units. The above method, up to the cloning step, can be performed in one day.
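The relationship in Table 2 between each variable 9 bp sequence and its RVD triplet can be checked by translating the three codons. The following is a minimal sketch, not part of the described method; the names `CODON_TO_AA`, `TABLE_2`, and `translate` are illustrative, and the codon lookup is deliberately limited to the standard-genetic-code codons that occur in Table 2.

```python
# Subset of the standard genetic code: only the codons present in Table 2.
CODON_TO_AA = {
    "TCT": "S", "TCC": "S",
    "AAC": "N", "AAT": "N",
    "ATC": "I",
    "CAC": "H",
    "GAC": "D",
    "AAA": "K",
    "GGG": "G",
}

# Plasmid name -> (variable 9 bp sequence, RVD triplet), copied from Table 2.
TABLE_2 = {
    "pJDS70": ("TCTAACATC", "SNI"),
    "pJDS71": ("TCCCACGAC", "SHD"),
    "pJDS74": ("AATAATAAC", "NNN"),
    "pJDS76": ("TCCAATAAA", "SNK"),
    "pJDS78": ("TCTAATGGG", "SNG"),
}


def translate(dna):
    """Translate a DNA string codon by codon using the lookup above."""
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(CODON_TO_AA[codon] for codon in codons)


for name, (dna, rvd) in TABLE_2.items():
    # Each line reports whether the translated 9 bp match the listed triplet.
    print(name, dna, translate(dna), translate(dna) == rvd)
```

Under these assumptions, every row of Table 2 translates to the triplet listed in its last column.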


Example 2. Assembly of TALE Repeat Arrays Using a Streptavidin Coated Plate

TALE repeats are assembled using the archive of DNA plasmids (~850 different plasmids) as described in Example 1. A 16-mer TALE repeat array was created by in vitro assembly of 16 TALE repeats designed to bind a target sequence. In the first step, a plasmid carrying an α-type TALE repeat with an NNN triplet (G) was amplified by PCR using a biotinylated forward primer Biotin-TCTAGAGAAGACAAGAACCTGACC (SEQ ID NO: 42) and a reverse primer GGATCCGGTCTCTTAAGGCCGTGG (SEQ ID NO: 43). The amplified fragment (50 μl) was purified using a QIAquick PCR purification kit (QIAGEN), eluted in 40 μl 0.1× elution buffer (as provided in the QIAquick PCR purification kit), and digested with BsaI HF (New England Biolabs (NEB)) in NEBuffer 4 for 15 minutes at 50° C. (40 μl elution, 5 μl NEBuffer 4, 5 μl BsaI HF). The digested fragment was purified using a QIAquick PCR purification kit and eluted in 0.1× elution buffer (50 μl).
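As an illustration of how the two primers relate to the α-unit insert, the sketch below scans each primer for the recognition sequences of the enzymes used in this example and checks that the primers match the ends of the α/NN sequence listed in Table 4 (SEQ ID NO: 61). The helper names `revcomp` and `sites_in` are illustrative, and the use of the Table 4 α/NN entry as the PCR template is an assumption based on the "NNN triplet (G)" description above.

```python
# Recognition sequences (5'->3') of the enzymes named in this example.
SITES = {"XbaI": "TCTAGA", "BbsI": "GAAGAC", "BamHI": "GGATCC", "BsaI": "GGTCTC"}

FORWARD = "TCTAGAGAAGACAAGAACCTGACC"  # SEQ ID NO: 42, without the 5' biotin
REVERSE = "GGATCCGGTCTCTTAAGGCCGTGG"  # SEQ ID NO: 43

# Alpha unit with the NN RVD, as listed in Table 4 (SEQ ID NO: 61).
ALPHA_NN = (
    "TCTAGAGAAGACAAGAACCTGACC"
    "CCAGACCAGGTAGTCGCAATCGCG"
    "AACAATAATGGGGGAAAGCAAGCC"
    "CTGGAAACCGTGCAAAGGTTGTTG"
    "CCGGTCCTTTGTCAAGACCACGGC"
    "CTTAAGAGACCGGATCC"
)


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sites_in(seq):
    """List the enzymes whose recognition sequence occurs in seq as written."""
    return [name for name, site in SITES.items() if site in seq]


print("forward primer sites:", sites_in(FORWARD))
print("reverse primer sites:", sites_in(REVERSE))
print("forward primer matches 5' end:", ALPHA_NN.startswith(FORWARD))
print("reverse primer matches 3' end:", ALPHA_NN.endswith(revcomp(REVERSE)))
print("amplicon length (bp):", len(ALPHA_NN))
```

In this sketch the forward primer carries the XbaI and BbsI sites and the reverse primer carries the BamHI and BsaI sites, which is consistent with the 5′ biotin/BbsI release step and the 3′ BsaI digestion described in this example.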


A plasmid containing a four TALE repeat domain sub-array unit (βγδε) coding for repeats that each harbor one of the following variable amino acids SHD, SNI, NNN, and SNG (designed to bind the sequence 5′-CAGT-3′) was digested with BbsI (NEB) in NEBuffer 2 for 2 hours at 37° C. in 100 μl (50 μl plasmid [~200 ng/μl], 10 μl NEBuffer 2, 10 μl BbsI, 30 μl water). To the 100 μl digest was added 25 μl NEBuffer 4, 2.5 μl 100× BSA (NEB), 107.5 μl water, and 5 μl XbaI (NEB), and the digest was incubated for 5 minutes at 37° C. To the mixture, 5 μl of BamHI HF was then added for a 5 minute digest at 37° C., and then 5 μl SalI HF (NEB) was added for an additional 5 minute digest at 37° C. The resulting fragment was purified using a QIAquick PCR purification kit (QIAGEN) and eluted in 180 μl 0.1× elution buffer.
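The correspondence between the variable amino acid triplets of a sub-array and the DNA sequence it is designed to bind can be sketched as a simple lookup. The mapping below uses the triplet-to-nucleotide assignments given for the expression vectors in Example 1; the names `RVD_TO_BASE` and `target_of` are illustrative only.

```python
# Triplet -> recognized nucleotide, as stated for plasmids pJDS70 to pJDS78.
RVD_TO_BASE = {"SNI": "A", "SHD": "C", "NNN": "G", "SNK": "G", "SNG": "T"}


def target_of(triplets):
    """Return the 5'->3' DNA sequence recognized by an ordered list of triplets."""
    return "".join(RVD_TO_BASE[t] for t in triplets)


# The first sub-array unit described above.
print(target_of(["SHD", "SNI", "NNN", "SNG"]))
```

For the sub-array described above, this lookup gives CAGT, in agreement with the stated target 5′-CAGT-3′.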


For the initial ligation, 2 μl of the alpha unit digest was mixed with 2.5 μl of T4 DNA ligase (400 U/μl; NEB) and 27 μl Quick Ligase Buffer (QLB) (NEB). To this 31.5 μl mixture was added 22.5 μl of the first digested subarray, and the mixture was ligated for 15 minutes at room temperature. The ligation mixture was then mixed with 2× B&W buffer (Invitrogen) and added to a well in a 96-well plate coated with streptavidin (Thermo Scientific) and incubated at room temperature for 15 min. The supernatant was aspirated. Each well in the 96-well plate was washed with 200 μl of 1× Bovine Serum Albumin (BSA) by pipetting up and down 10 times before discarding the 1×BSA. This was repeated for a total of two washes with 1×BSA. Then 50 μl BsaI HF mix (5 μl NEBuffer 4, 2 μl BsaI HF, 43 μl water) was added to the ligated nucleic acids (αβγδε) bound to the streptavidin-coated well.


The digest was incubated at 50° C. for 10 minutes and then the supernatant was aspirated. The wells were then washed with 200 μl 1× B&W Buffer and 200 μl 1×BSA twice by pipetting up and down ten times before removal of each supernatant. 22.5 μl of digested plasmid encoding a four TALE repeat domain sub-array unit (βγδε) coding for repeats that each harbor one of the following variable amino acids NNN, SHD, SNG, and SNI and 27.5 μl ligase mix (25 μl Quick Ligase Buffer, 2 μl DNA ligase) were added to the well. The supernatant was mixed by pipetting up and down, and the mixture was incubated for 15 minutes at room temperature. The supernatant was removed and the well was washed with 1× B&W and 1×BSA as above. Then 50 μl BsaI HF mix (5 μl NEBuffer 4, 2 μl BsaI HF, 43 μl water) was added to the ligated nucleic acids (αβγδεβγδε) bound to the well. Two more TALE repeat sub-array units were ligated sequentially as above, the first a four TALE repeat sub-array unit (βγδε) coding for repeats that each harbor one of the following variable amino acids SHD, SNI, NNN, and SNG and the second a three TALE repeat sub-array unit (βγδ) coding for repeats that each harbor one of the following variable amino acids SHD, SNI, NNN, and SHD. The final TALE repeat array contained subunits of the format αβγδεβγδεβγδεβγδ with individual TALE repeats designed to bind a target DNA sequence.


Following the final ligation step, the fragments in the well were digested with BsaI HF for eventual cloning into an expression vector. The well was then washed with 1× B&W Buffer and twice with 1×BSA. Then 50 μl BbsI mix (5 μl NEBuffer 2, 5 μl BbsI, 40 μl water) was added to the well and incubated at 37° C. for 2 hours to cleave the biotinylated 5′ end and release the assembled TALE repeat array from the well. The digested mixture was purified, ligated, and transformed as described in Example 1.


Example 3. Site-Specific Mutagenesis Using TALE Nucleases

To demonstrate the effectiveness of TALE repeat domains created by the methods described herein, TALE repeat arrays were constructed and cloned into TALE nuclease expression vectors (as described in Example 1) to produce plasmids encoding TALE nuclease monomers targeted to the eGFP coding sequences shown in FIG. 6 and Table 3. Nucleic acid and polypeptide sequences of the TALE nuclease monomers are shown in FIGS. 11A-18B.









TABLE 3
TALE nuclease monomer target sequences

TALE Fragment  Target Sequence     Length of target sequence  SEQ ID NO:  Site     Position (half-site)  Plasmid name
DR-TALE-0003   TGCAGTGCTTCAGCCGC   17                         50          eGFP223  left                  SQT70
DR-TALE-0006   TGCAGTGCTTCAGCCGCT  18                         51          eGFP223  left                  SQT114
DR-TALE-0005   TTGAAGAAGTCGTGCTGC  18                         52          eGFP223  right                 SQT72
DR-TALE-0010   TGAAGAAGTCGTGCTGCT  18                         53          eGFP223  right                 SQT56
DR-TALE-0023   TCGAGCTGAAGGGCATC   17                         54          eGFP382  left                  SQT84
DR-TALE-0025   TCGAGCTGAAGGGCATCG  18                         55          eGFP382  left                  SQT120
DR-TALE-0020   TTGTGCCCCAGGATGTTG  18                         56          eGFP382  right                 SQT135
DR-TALE-0022   TGTGCCCCAGGATGTTGC  18                         57          eGFP382  right                 SQT118















U2OS-eGFP cells (4×10^5) were nucleofected with 400 ng plasmid DNA in solution SE with program DN-100 using Nucleofector™ non-viral transfection (Lonza, Walkersville, Md.). The cells were analyzed by flow cytometry at days 2 and 5 (FIG. 7). Non-homologous end joining (NHEJ)-mediated mutagenic repair of TALE nuclease-induced double-stranded breaks led to disruption of eGFP expression (eGFP-negative cells). All eight TALE nuclease pairs tested induced a high percentage of eGFP-negative (eGFP-) cells (y-axis). The percentage of eGFP-negative cells declined only modestly between days 2 and 5, suggesting that the alterations were stably induced.
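The count of eight nuclease pairs is what results if every left monomer in Table 3 is combined with every right monomer at the same site. The sketch below enumerates pairs on that basis; this pairing scheme is an assumption made for illustration (the text names only the SQT70/SQT56 pair explicitly), and `nuclease_pairs` is a hypothetical helper.

```python
from itertools import product

# (plasmid name, site, half-site) rows copied from Table 3.
MONOMERS = [
    ("SQT70", "eGFP223", "left"),
    ("SQT114", "eGFP223", "left"),
    ("SQT72", "eGFP223", "right"),
    ("SQT56", "eGFP223", "right"),
    ("SQT84", "eGFP382", "left"),
    ("SQT120", "eGFP382", "left"),
    ("SQT135", "eGFP382", "right"),
    ("SQT118", "eGFP382", "right"),
]


def nuclease_pairs(monomers):
    """Pair each left monomer with each right monomer targeting the same site."""
    pairs = []
    for site in sorted({site for _, site, _ in monomers}):
        lefts = [n for n, s, h in monomers if s == site and h == "left"]
        rights = [n for n, s, h in monomers if s == site and h == "right"]
        pairs.extend((site, left, right) for left, right in product(lefts, rights))
    return pairs


pairs = nuclease_pairs(MONOMERS)
for site, left, right in pairs:
    print(site, left, right)
print("number of pairs:", len(pairs))
```

With two left and two right monomers at each of the two sites, this enumeration yields four pairs per site.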


A subset of mutated eGFP genes was amplified from cells and sequenced. The resulting mutations are shown in FIG. 8. Sequences targeted by the TALE nucleases encoded by expression plasmids SQT70/SQT56 in human U2OS-eGFP cells are underlined in the wild-type (WT) sequence shown at the top of FIG. 8. Insertion and deletion mutations induced by the TALE nuclease pair are shown below with deleted bases indicated by dashes and inserted bases indicated by double underlining. The net number of bases inserted or deleted is shown to the right. All mutations were isolated once unless otherwise indicated in brackets. The overall frequency of mutagenesis (46%) is also indicated.
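The net number of inserted or deleted bases for an aligned mutant sequence can be computed as inserted bases minus deleted bases. The sketch below assumes a plain-text convention in which deleted bases are dashes and inserted bases are written in lowercase (standing in for the double underlining of the figure). The two example rows are hypothetical and are not taken from FIG. 8.

```python
def net_change(aligned_row):
    """Return inserted minus deleted bases for one aligned mutant row.

    Dashes mark deleted bases; lowercase letters mark inserted bases.
    """
    inserted = sum(1 for ch in aligned_row if ch.islower())
    deleted = aligned_row.count("-")
    return inserted - deleted


# Hypothetical rows, for illustration only.
EXAMPLE_ROWS = [
    "GCTTCAGCC----ACATGAAG",   # four bases deleted
    "GCTTCAGCCGCTAtgCCCGAC",   # two bases inserted
]

for row in EXAMPLE_ROWS:
    print(row, net_change(row))
```

A negative value indicates a net deletion and a positive value a net insertion.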


Example 4. Automated Assembly of TALE Repeat Arrays

The assembly method described in Example 1 has been automated so as to be performed using a Sciclone™ G3 liquid handling workstation (Caliper Life Sciences, Hopkinton, Mass.) in 96-well plates. All of the steps were automated except digestion of the nucleic acids prior to ligation and linking to the beads and the steps following release of the assembled TALE repeat array from the magnetic beads. The automated steps were performed essentially as when done manually with minor variations in the number of resuspension and mixing motions. The results of assembly of two 17-mers are shown in FIG. 9. A major product of the expected size can be seen, corresponding to the 17-mer. Additional minor 13-mer, 9-mer, and 5-mer products can also be seen, likely produced by carry forward of incompletely ligated products. A similar result can be seen in FIG. 10, which shows the results of assembly of 16-mers from an N-terminal 1-mer sub-array (1), three 4-mer subarrays (4A, 4B, 4C), and a C-terminal 3-mer subarray (3D).
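The sizes of the minor products follow from the order in which sub-arrays are added: each ligation round extends the bound array by the size of the added sub-array, so an array that fails to extend in a later round stops at one of the running totals. A minimal sketch follows, assuming the 17-mers were built from a 1-mer followed by four 4-mer sub-arrays (an assumption consistent with the 5-, 9-, and 13-mer products noted above); `intermediate_lengths` is an illustrative name.

```python
from itertools import accumulate


def intermediate_lengths(subarray_sizes):
    """Return the array length after each successive ligation round."""
    return list(accumulate(subarray_sizes))


# Assumed composition of a 17-mer: alpha unit plus four 4-mer sub-arrays.
print(intermediate_lengths([1, 4, 4, 4, 4]))
# Composition of the 16-mers of FIG. 10: 1-mer, three 4-mers, one 3-mer.
print(intermediate_lengths([1, 4, 4, 4, 3]))
```

The last value in each list is the full-length product; the preceding values are the lengths expected for incompletely ligated products carried forward.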


This example demonstrates that the methods described herein can be automated for rapid and reproducible synthesis of nucleic acids encoding TALE repeat arrays.


Example 5. Assembly Methods

TALE repeat arrays were created using an architecture in which four distinct TALE repeat backbones that differ slightly in their amino acid and DNA sequences occur in a repeated pattern. The first, amino-terminal TALE repeat in an array was designated as the α unit. This was followed by β, γ, and δ units and then an ε unit that is essentially identical to the α unit except for the different positioning of a Type IIS restriction site on the 5′ end (required to enable creation of a unique overhang on the α unit needed for cloning). The ε unit was then followed again by repeats of β, γ, δ, and ε units. Due to constraints related to creation of a 3′ end required for cloning, slightly modified DNA sequences were required for TALE repeat arrays that end with a carboxy-terminal γ or ε unit. We designated these variant units as γ* and ε*.
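The repeating unit pattern described above can be sketched as a small function that lists the unit type at each position of an array of a given length. The function name `unit_architecture` is illustrative; the rule that a terminal γ or ε unit is replaced by its starred variant follows the paragraph above.

```python
from itertools import cycle, islice


def unit_architecture(n_repeats):
    """Return the unit types (N- to C-terminal) for an array of n_repeats."""
    if n_repeats < 1:
        raise ValueError("an array needs at least one repeat")
    # The first repeat is always alpha; the rest cycle through beta..epsilon.
    units = ["α"] + list(islice(cycle("βγδε"), n_repeats - 1))
    # Arrays ending in gamma or epsilon use the modified terminal variants.
    if units[-1] in ("γ", "ε"):
        units[-1] += "*"
    return units


for n in (3, 16, 17):
    print(n, "".join(unit_architecture(n)))
```

For a 16-mer this gives αβγδεβγδεβγδεβγδ, the same format as the array assembled in Example 2.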


For each type of TALE repeat unit (i.e., α, β, γ, δ, ε, γ*, and ε*), we commercially synthesized (Genscript) a series of five plasmids, each harboring one of the five repeat variable di-residues (RVDs) that specify one of the four DNA bases (NI=A; HD=C; NN=G; NG=T; NK=G). Full DNA sequences of these plasmids are provided in Table 4 and FIG. 3. For all 35 of these plasmids, the sequence encoding the TALE repeat domain is flanked on the 5′ end by unique XbaI and BbsI restriction sites and on the 3′ end by unique BsaI and BamHI restriction sites. Additionally, the overhangs generated by digestion of any plasmids encoding units designed to be adjacent to one another (e.g., β and γ, or δ and ε) with BsaI and BbsI are complementary. Using these 35 different plasmids and serial ligation via the BsaI and BbsI restriction sites, we assembled an archive of all possible combinations of βγ, βγδε, βγδ, βγ*, and δε* repeats. In total, this archive consisted of 825 different plasmids encoding 25 βγ combinations, 625 βγδε combinations, 125 βγδ combinations, 25 βγ* combinations, and 25 δε* combinations (Table 5). These 825 plasmids plus ten of the original 35 plasmids encoding single TALE repeats (five α and five β plasmids) are required to practice the methods. With this archive of 835 plasmids listed in Table 5, the methods can be used to construct TALE repeat arrays of any desired length and composition.
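Two points in the paragraph above can be illustrated with a short sketch: the compatibility of the overhangs between adjacent units, and the size of the archive. The flanking sequences are excerpted from the NI entries of Table 4; the cut offsets assume the commonly documented Type IIS geometry (BbsI cutting 2 and 6 nt, and BsaI cutting 1 and 5 nt, downstream of the recognition site, each leaving a 4-nt 5′ overhang). All function and variable names are illustrative.

```python
# (first 24 nt, last 37 or 41 nt) of each unit, excerpted from Table 4 (NI rows).
FLANKS = {
    "α": ("TCTAGAGAAGACAAGAACCTGACC", "CCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC"),
    "β": ("TCTAGAGAAGACAACTTACACCGG", "TTCTCTGTCAAGCCCACGGGCTGAAGAGACCGGATCC"),
    "γ": ("TCTAGAGAAGACAACTGACTCCCG", "TGTTGTGTCAAGCCCACGGTTTGAAGAGACCGGATCC"),
    "δ": ("TCTAGAGAAGACAATTGACGCCTG", "TACTGTGCCAGGATCATGGACTGAAGAGACCGGATCC"),
    "ε": ("TCTAGAGAAGACAACTGACCCCAG", "TCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC"),
}


def five_prime_overhang(head):
    """Top-strand 4-nt overhang left by BbsI (GAAGAC, cut 2/6 nt downstream)."""
    i = head.index("GAAGAC")
    return head[i + 8:i + 12]


def three_prime_overhang(tail):
    """Top-strand 4 nt exposed by BsaI; its site appears here as GAGACC."""
    j = tail.rindex("GAGACC")
    return tail[j - 5:j - 1]


# Units designed to be adjacent (upstream, downstream).
ADJACENT = [("α", "β"), ("β", "γ"), ("γ", "δ"), ("δ", "ε"), ("ε", "β")]
compatible = {
    (up, down): three_prime_overhang(FLANKS[up][1]) == five_prime_overhang(FLANKS[down][0])
    for up, down in ADJACENT
}
for (up, down), ok in compatible.items():
    print(up, "->", down, three_prime_overhang(FLANKS[up][1]), ok)

# Archive size: five RVD choices per repeat position in each unit type.
N_RVDS = 5
MULTI_UNIT_TYPES = ["βγ", "βγδε", "βγδ", "βγ*", "δε*"]
counts = {u: N_RVDS ** len(u.replace("*", "")) for u in MULTI_UNIT_TYPES}
multi_unit_total = sum(counts.values())
archive_total = multi_unit_total + 2 * N_RVDS  # plus five α and five β plasmids
print(counts, multi_unit_total, archive_total)
```

Under these assumptions each listed junction has matching 4-nt overhangs, and the counts reproduce the 825 multi-repeat plasmids and the 835-plasmid archive stated above.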









TABLE 4
DNA sequences encoding individual TALE repeats

TAL ID#  Unit Architecture  RVD  Target Base  DNA Sequence (cloned between XbaI/BamHI in pUC57-ΔBsaI)  SEQ ID NO:
6        α                  NI   A            TCTAGAGAAGACAAGAACCTGACCCCAGACCAGGTAGTCGCAATCGCGTCGAACATTGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  58
7        α                  HD   C            TCTAGAGAAGACAAGAACCTGACCCCAGACCAGGTAGTCGCAATCGCGTCACATGACGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  59
8        α                  NK   G            TCTAGAGAAGACAAGAACCTGACCCCAGACCAGGTAGTCGCAATCGCGTCGAACAAAGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  60
9        α                  NN   G            TCTAGAGAAGACAAGAACCTGACCCCAGACCAGGTAGTCGCAATCGCGAACAATAATGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  61
10       α                  NG   T            TCTAGAGAAGACAAGAACCTGACCCCAGACCAGGTAGTCGCAATCGCGTCAAACGGAGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  62
11       β                  NI   A            TCTAGAGAAGACAACTTACACCGGAGCAAGTCGTGGCCATTGCAAGCAACATCGGTGGCAAACAGGCTCTTGAGACGGTTCAGAGACTTCTCCCAGTTCTCTGTCAAGCCCACGGGCTGAAGAGACCGGATCC  63
12       β                  HD   C            TCTAGAGAAGACAACTTACACCGGAGCAAGTCGTGGCCATTGCATCCCACGACGGTGGCAAACAGGCTCTTGAGACGGTTCAGAGACTTCTCCCAGTTCTCTGTCAAGCCCACGGGCTGAAGAGACCGGATCC  64
13       β                  NK   G            TCTAGAGAAGACAACTTACACCGGAGCAAGTCGTGGCCATTGCATCAAATAAAGGTGGCAAACAGGCTCTTGAGACGGTTCAGAGACTTCTCCCAGTTCTCTGTCAAGCCCACGGGCTGAAGAGACCGGATCC  65
14       β                  NN   G            TCTAGAGAAGACAACTTACACCGGAGCAAGTCGTGGCCATTGCAAATAATAACGGTGGCAAACAGGCTCTTGAGACGGTTCAGAGACTTCTCCCAGTTCTCTGTCAAGCCCACGGGCTGAAGAGACCGGATCC  66
15       β                  NG   T            TCTAGAGAAGACAACTTACACCGGAGCAAGTCGTGGCCATTGCAAGCAATGGGGGTGGCAAACAGGCTCTTGAGACGGTTCAGAGACTTCTCCCAGTTCTCTGTCAAGCCCACGGGCTGAAGAGACCGGATCC  67
16       γ                  NI   A            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCGAACATTGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTTTGAAGAGACCGGATCC  68
17       γ                  HD   C            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCGCATGACGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTTTGAAGAGACCGGATCC  69
18       γ                  NK   G            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCCAACAAGGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTTTGAAGAGACCGGATCC  70
19       γ                  NN   G            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGAATAACAATGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTTTGAAGAGACCGGATCC  71
20       γ                  NG   T            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCCAACGGTGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTTTGAAGAGACCGGATCC  72
21       δ                  NI   A            TCTAGAGAAGACAATTGACGCCTGCACAAGTGGTCGCCATCGCCTCCAATATTGGCGGTAAGCAGGCGCTGGAAACAGTACAGCGCCTGCTGCCTGTACTGTGCCAGGATCATGGACTGAAGAGACCGGATCC  73
22       δ                  HD   C            TCTAGAGAAGACAATTGACGCCTGCACAAGTGGTCGCCATCGCCAGCCATGATGGCGGTAAGCAGGCGCTGGAAACAGTACAGCGCCTGCTGCCTGTACTGTGCCAGGATCATGGACTGAAGAGACCGGATCC  74
23       δ                  NK   G            TCTAGAGAAGACAATTGACGCCTGCACAAGTGGTCGCCATCGCCAGCAATAAGGGCGGTAAGCAGGCGCTGGAAACAGTACAGCGCCTGCTGCCTGTACTGTGCCAGGATCATGGACTGAAGAGACCGGATCC  75
24       δ                  NN   G            TCTAGAGAAGACAATTGACGCCTGCACAAGTGGTCGCCATCGCCAACAACAACGGCGGTAAGCAGGCGCTGGAAACAGTACAGCGCCTGCTGCCTGTACTGTGCCAGGATCATGGACTGAAGAGACCGGATCC  76
25       δ                  NG   T            TCTAGAGAAGACAATTGACGCCTGCACAAGTGGTCGCCATCGCCTCGAATGGCGGCGGTAAGCAGGCGCTGGAAACAGTACAGCGCCTGCTGCCTGTACTGTGCCAGGATCATGGACTGAAGAGACCGGATCC  77
26       ε                  NI   A            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCGAACATTGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  78
27       ε                  HD   C            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCACATGACGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  79
28       ε                  NK   G            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCGAACAAAGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  80
29       ε                  NN   G            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGAACAATAATGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  81
30       ε                  NG   T            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCAAACGGAGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTTAAGAGACCGGATCC  82
31       γ′                 NI   A            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCGAACATTGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTCTGAAGAGACCGGATCC  83
32       γ′                 HD   C            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCGCATGACGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTCTGAAGAGACCGGATCC  84
33       γ′                 NK   G            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCCAACAAGGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTCTGAAGAGACCGGATCC  85
34       γ′                 NN   G            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGAATAACAATGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTCTGAAGAGACCGGATCC  86
35       γ′                 NG   T            TCTAGAGAAGACAACTGACTCCCGATCAAGTTGTAGCGATTGCGTCCAACGGTGGAGGGAAACAAGCATTGGAGACTGTCCAACGGCTCCTTCCCGTGTTGTGTCAAGCCCACGGTCTGAAGAGACCGGATCC  87
36       ε′                 NI   A            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCGAACATTGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTGAAGAGACCGGATCC  88
37       ε′                 HD   C            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCACATGACGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTGAAGAGACCGGATCC  89
38       ε′                 NK   G            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCGAACAAAGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTGAAGAGACCGGATCC  90
39       ε′                 NN   G            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGAACAATAATGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTGAAGAGACCGGATCC  91
40       ε′                 NG   T            TCTAGAGAAGACAACTGACCCCAGACCAGGTAGTCGCAATCGCGTCAAACGGAGGGGGAAAGCAAGCCCTGGAAACCGTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGACCACGGCCTGAAGAGACCGGATCC  92
















TABLE 5
Archive of 835 plasmids encoding pre-assembled TALE repeat units

Plasmid ID          DNA Target  RVDs         Unit Architecture
TAL006              A           NI           α
TAL007              C           HD           α
TAL008              G           NK           α
TAL009              G           NN           α
TAL010              T           NG           α
TAL011/016/021/026  AAAA        NI/NI/NI/NI  βγδε
TAL011/016/021/027  AAAC        NI/NI/NI/HD  βγδε
TAL011/016/021/028  AAAG        NI/NI/NI/NK  βγδε
TAL011/016/021/029  AAAG        NI/NI/NI/NN  βγδε
TAL011/016/021/030  AAAT        NI/NI/NI/NG  βγδε
TAL011/016/022/026  AACA        NI/NI/HD/NI  βγδε
TAL011/016/022/027  AACC        NI/NI/HD/HD  βγδε
TAL011/016/022/028  AACG        NI/NI/HD/NK  βγδε
TAL011/016/022/029  AACG        NI/NI/HD/NN  βγδε
TAL011/016/022/030  AACT        NI/NI/HD/NG  βγδε
TAL011/016/023/026  AAGA        NI/NI/NK/NI  βγδε
TAL011/016/023/027  AAGC        NI/NI/NK/HD  βγδε
TAL011/016/023/028  AAGG        NI/NI/NK/NK  βγδε
TAL011/016/023/029  AAGG        NI/NI/NK/NN  βγδε
TAL011/016/023/030  AAGT        NI/NI/NK/NG  βγδε
TAL011/016/024/026  AAGA        NI/NI/NN/NI  βγδε
TAL011/016/024/027  AAGC        NI/NI/NN/HD  βγδε
TAL011/016/024/028  AAGG        NI/NI/NN/NK  βγδε
TAL011/016/024/029  AAGG        NI/NI/NN/NN  βγδε
TAL011/016/024/030  AAGT        NI/NI/NN/NG  βγδε
TAL011/016/025/026  AATA        NI/NI/NG/NI  βγδε
TAL011/016/025/027  AATC        NI/NI/NG/HD  βγδε
TAL011/016/025/028  AATG        NI/NI/NG/NK  βγδε
TAL011/016/025/029  AATG        NI/NI/NG/NN  βγδε
TAL011/016/025/030  AATT        NI/NI/NG/NG  βγδε
TAL011/017/021/026  ACAA        NI/HD/NI/NI  βγδε
TAL011/017/021/027  ACAC        NI/HD/NI/HD  βγδε
TAL011/017/021/028  ACAG        NI/HD/NI/NK  βγδε
TAL011/017/021/029  ACAG        NI/HD/NI/NN  βγδε
TAL011/017/021/030  ACAT        NI/HD/NI/NG  βγδε
TAL011/017/022/026  ACCA        NI/HD/HD/NI  βγδε
TAL011/017/022/027  ACCC        NI/HD/HD/HD  βγδε
TAL011/017/022/028  ACCG        NI/HD/HD/NK  βγδε
TAL011/017/022/029  ACCG        NI/HD/HD/NN  βγδε
TAL011/017/022/030  ACCT        NI/HD/HD/NG  βγδε
TAL011/017/023/026  ACGA        NI/HD/NK/NI  βγδε
TAL011/017/023/027  ACGC        NI/HD/NK/HD  βγδε
TAL011/017/023/028  ACGG        NI/HD/NK/NK  βγδε
TAL011/017/023/029  ACGG        NI/HD/NK/NN  βγδε
TAL011/017/023/030  ACGT        NI/HD/NK/NG  βγδε
TAL011/017/024/026  ACGA        NI/HD/NN/NI  βγδε
TAL011/017/024/027  ACGC        NI/HD/NN/HD  βγδε
TAL011/017/024/028  ACGG        NI/HD/NN/NK  βγδε
TAL011/017/024/029  ACGG        NI/HD/NN/NN  βγδε
TAL011/017/024/030  ACGT        NI/HD/NN/NG  βγδε
TAL011/017/025/026  ACTA        NI/HD/NG/NI  βγδε
TAL011/017/025/027  ACTC        NI/HD/NG/HD  βγδε
TAL011/017/025/028  ACTG        NI/HD/NG/NK  βγδε
TAL011/017/025/029  ACTG        NI/HD/NG/NN  βγδε
TAL011/017/025/030  ACTT        NI/HD/NG/NG  βγδε
TAL011/018/021/026  AGAA        NI/NK/NI/NI  βγδε
TAL011/018/021/027  AGAC        NI/NK/NI/HD  βγδε
TAL011/018/021/028  AGAG        NI/NK/NI/NK  βγδε
TAL011/018/021/029  AGAG        NI/NK/NI/NN  βγδε
TAL011/018/021/030  AGAT        NI/NK/NI/NG  βγδε
TAL011/018/022/026  AGCA        NI/NK/HD/NI  βγδε
TAL011/018/022/027  AGCC        NI/NK/HD/HD  βγδε
TAL011/018/022/028  AGCG        NI/NK/HD/NK  βγδε
TAL011/018/022/029  AGCG        NI/NK/HD/NN  βγδε
TAL011/018/022/030  AGCT        NI/NK/HD/NG  βγδε
TAL011/018/023/026  AGGA        NI/NK/NK/NI  βγδε
TAL011/018/023/027  AGGC        NI/NK/NK/HD  βγδε
TAL011/018/023/028  AGGG        NI/NK/NK/NK  βγδε
TAL011/018/023/029  AGGG        NI/NK/NK/NN  βγδε
TAL011/018/023/030  AGGT        NI/NK/NK/NG  βγδε
TAL011/018/024/026  AGGA        NI/NK/NN/NI  βγδε
TAL011/018/024/027  AGGC        NI/NK/NN/HD  βγδε
TAL011/018/024/028  AGGG        NI/NK/NN/NK  βγδε
TAL011/018/024/029  AGGG        NI/NK/NN/NN  βγδε
TAL011/018/024/030  AGGT        NI/NK/NN/NG  βγδε
TAL011/018/025/026  AGTA        NI/NK/NG/NI  βγδε
TAL011/018/025/027  AGTC        NI/NK/NG/HD  βγδε
TAL011/018/025/028  AGTG        NI/NK/NG/NK  βγδε
TAL011/018/025/029  AGTG        NI/NK/NG/NN  βγδε
TAL011/018/025/030  AGTT        NI/NK/NG/NG  βγδε
TAL011/019/021/026  AGAA        NI/NN/NI/NI  βγδε
TAL011/019/021/027  AGAC        NI/NN/NI/HD  βγδε
TAL011/019/021/028  AGAG        NI/NN/NI/NK  βγδε
TAL011/019/021/029  AGAG        NI/NN/NI/NN  βγδε
TAL011/019/021/030  AGAT        NI/NN/NI/NG  βγδε
TAL011/019/022/026  AGCA        NI/NN/HD/NI  βγδε
TAL011/019/022/027  AGCC        NI/NN/HD/HD  βγδε
TAL011/019/022/028  AGCG        NI/NN/HD/NK  βγδε
TAL011/019/022/029  AGCG        NI/NN/HD/NN  βγδε
TAL011/019/022/030  AGCT        NI/NN/HD/NG  βγδε
TAL011/019/023/026  AGGA        NI/NN/NK/NI  βγδε
TAL011/019/023/027  AGGC        NI/NN/NK/HD  βγδε
TAL011/019/023/028  AGGG        NI/NN/NK/NK  βγδε
TAL011/019/023/029  AGGG        NI/NN/NK/NN  βγδε
TAL011/019/023/030  AGGT        NI/NN/NK/NG  βγδε
TAL011/019/024/026  AGGA        NI/NN/NN/NI  βγδε
TAL011/019/024/027  AGGC        NI/NN/NN/HD  βγδε
TAL011/019/024/028  AGGG        NI/NN/NN/NK  βγδε
TAL011/019/024/029  AGGG        NI/NN/NN/NN  βγδε
TAL011/019/024/030  AGGT        NI/NN/NN/NG  βγδε
TAL011/019/025/026  AGTA        NI/NN/NG/NI  βγδε
TAL011/019/025/027  AGTC        NI/NN/NG/HD  βγδε
TAL011/019/025/028  AGTG        NI/NN/NG/NK  βγδε
TAL011/019/025/029  AGTG        NI/NN/NG/NN  βγδε
TAL011/019/025/030  AGTT        NI/NN/NG/NG  βγδε
TAL011/020/021/026  ATAA        NI/NG/NI/NI  βγδε
TAL011/020/021/027  ATAC        NI/NG/NI/HD  βγδε
TAL011/020/021/028  ATAG        NI/NG/NI/NK  βγδε
TAL011/020/021/029  ATAG        NI/NG/NI/NN  βγδε
TAL011/020/021/030  ATAT        NI/NG/NI/NG  βγδε
TAL011/020/022/026  ATCA        NI/NG/HD/NI  βγδε
TAL011/020/022/027  ATCC        NI/NG/HD/HD  βγδε
TAL011/020/022/028  ATCG        NI/NG/HD/NK  βγδε
TAL011/020/022/029  ATCG        NI/NG/HD/NN  βγδε
TAL011/020/022/030  ATCT        NI/NG/HD/NG  βγδε
TAL011/020/023/026  ATGA        NI/NG/NK/NI  βγδε
TAL011/020/023/027  ATGC        NI/NG/NK/HD  βγδε
TAL011/020/023/028  ATGG        NI/NG/NK/NK  βγδε
TAL011/020/023/029  ATGG        NI/NG/NK/NN  βγδε
TAL011/020/023/030  ATGT        NI/NG/NK/NG  βγδε
TAL011/020/024/026  ATGA        NI/NG/NN/NI  βγδε
TAL011/020/024/027  ATGC        NI/NG/NN/HD  βγδε
TAL011/020/024/028  ATGG        NI/NG/NN/NK  βγδε
TAL011/020/024/029  ATGG        NI/NG/NN/NN  βγδε
TAL011/020/024/030  ATGT        NI/NG/NN/NG  βγδε
TAL011/020/025/026  ATTA        NI/NG/NG/NI  βγδε
TAL011/020/025/027  ATTC        NI/NG/NG/HD  βγδε
TAL011/020/025/028  ATTG        NI/NG/NG/NK  βγδε
TAL011/020/025/029  ATTG        NI/NG/NG/NN  βγδε
TAL011/020/025/030  ATTT        NI/NG/NG/NG  βγδε
TAL012/016/021/026  CAAA        HD/NI/NI/NI  βγδε
TAL012/016/021/027  CAAC        HD/NI/NI/HD  βγδε
TAL012/016/021/028  CAAG        HD/NI/NI/NK  βγδε
TAL012/016/021/029  CAAG        HD/NI/NI/NN  βγδε
TAL012/016/021/030  CAAT        HD/NI/NI/NG  βγδε
TAL012/016/022/026  CACA        HD/NI/HD/NI  βγδε
TAL012/016/022/027  CACC        HD/NI/HD/HD  βγδε
TAL012/016/022/028  CACG        HD/NI/HD/NK  βγδε
TAL012/016/022/029  CACG        HD/NI/HD/NN  βγδε
TAL012/016/022/030  CACT        HD/NI/HD/NG  βγδε
TAL012/016/023/026  CAGA        HD/NI/NK/NI  βγδε
TAL012/016/023/027  CAGC        HD/NI/NK/HD  βγδε
TAL012/016/023/028  CAGG        HD/NI/NK/NK  βγδε
TAL012/016/023/029  CAGG        HD/NI/NK/NN  βγδε
TAL012/016/023/030  CAGT        HD/NI/NK/NG  βγδε
TAL012/016/024/026  CAGA        HD/NI/NN/NI  βγδε
TAL012/016/024/027  CAGC        HD/NI/NN/HD  βγδε
TAL012/016/024/028  CAGG        HD/NI/NN/NK  βγδε
TAL012/016/024/029  CAGG        HD/NI/NN/NN  βγδε
TAL012/016/024/030  CAGT        HD/NI/NN/NG  βγδε
TAL012/016/025/026  CATA        HD/NI/NG/NI  βγδε
TAL012/016/025/027  CATC        HD/NI/NG/HD  βγδε
TAL012/016/025/028  CATG        HD/NI/NG/NK  βγδε
TAL012/016/025/029  CATG        HD/NI/NG/NN  βγδε
TAL012/016/025/030  CATT        HD/NI/NG/NG  βγδε
TAL012/017/021/026  CCAA        HD/HD/NI/NI  βγδε
TAL012/017/021/027  CCAC        HD/HD/NI/HD  βγδε
TAL012/017/021/028  CCAG        HD/HD/NI/NK  βγδε
TAL012/017/021/029  CCAG        HD/HD/NI/NN  βγδε
TAL012/017/021/030  CCAT        HD/HD/NI/NG  βγδε
TAL012/017/022/026  CCCA        HD/HD/HD/NI  βγδε
TAL012/017/022/027  CCCC        HD/HD/HD/HD  βγδε
TAL012/017/022/028  CCCG        HD/HD/HD/NK  βγδε
TAL012/017/022/029  CCCG        HD/HD/HD/NN  βγδε
TAL012/017/022/030  CCCT        HD/HD/HD/NG  βγδε
TAL012/017/023/026  CCGA        HD/HD/NK/NI  βγδε
TAL012/017/023/027  CCGC        HD/HD/NK/HD  βγδε
TAL012/017/023/028  CCGG        HD/HD/NK/NK  βγδε
TAL012/017/023/029  CCGG        HD/HD/NK/NN  βγδε
TAL012/017/023/030  CCGT        HD/HD/NK/NG  βγδε
TAL012/017/024/026  CCGA        HD/HD/NN/NI  βγδε
TAL012/017/024/027  CCGC        HD/HD/NN/HD  βγδε
TAL012/017/024/028  CCGG        HD/HD/NN/NK  βγδε
TAL012/017/024/029  CCGG        HD/HD/NN/NN  βγδε
TAL012/017/024/030  CCGT        HD/HD/NN/NG  βγδε
TAL012/017/025/026  CCTA        HD/HD/NG/NI  βγδε
TAL012/017/025/027  CCTC        HD/HD/NG/HD  βγδε
TAL012/017/025/028  CCTG        HD/HD/NG/NK  βγδε
TAL012/017/025/029  CCTG        HD/HD/NG/NN  βγδε
TAL012/017/025/030  CCTT        HD/HD/NG/NG  βγδε
TAL012/018/021/026  CGAA        HD/NK/NI/NI  βγδε
TAL012/018/021/027  CGAC        HD/NK/NI/HD  βγδε
TAL012/018/021/028  CGAG        HD/NK/NI/NK  βγδε
TAL012/018/021/029  CGAG        HD/NK/NI/NN  βγδε
TAL012/018/021/030  CGAT        HD/NK/NI/NG  βγδε
TAL012/018/022/026  CGCA        HD/NK/HD/NI  βγδε
TAL012/018/022/027  CGCC        HD/NK/HD/HD  βγδε
TAL012/018/022/028  CGCG        HD/NK/HD/NK  βγδε
TAL012/018/022/029  CGCG        HD/NK/HD/NN  βγδε
TAL012/018/022/030  CGCT        HD/NK/HD/NG  βγδε
TAL012/018/023/026  CGGA        HD/NK/NK/NI  βγδε
TAL012/018/023/027  CGGC        HD/NK/NK/HD  βγδε
TAL012/018/023/028  CGGG        HD/NK/NK/NK  βγδε
TAL012/018/023/029  CGGG        HD/NK/NK/NN  βγδε
TAL012/018/023/030  CGGT        HD/NK/NK/NG  βγδε
TAL012/018/024/026  CGGA        HD/NK/NN/NI  βγδε
TAL012/018/024/027  CGGC        HD/NK/NN/HD  βγδε
TAL012/018/024/028  CGGG        HD/NK/NN/NK  βγδε
TAL012/018/024/029  CGGG        HD/NK/NN/NN  βγδε
TAL012/018/024/030  CGGT        HD/NK/NN/NG  βγδε
TAL012/018/025/026  CGTA        HD/NK/NG/NI  βγδε
TAL012/018/025/027  CGTC        HD/NK/NG/HD  βγδε
TAL012/018/025/028  CGTG        HD/NK/NG/NK  βγδε
TAL012/018/025/029  CGTG        HD/NK/NG/NN  βγδε
TAL012/018/025/030  CGTT        HD/NK/NG/NG  βγδε
TAL012/019/021/026  CGAA        HD/NN/NI/NI  βγδε
TAL012/019/021/027  CGAC        HD/NN/NI/HD  βγδε
TAL012/019/021/028  CGAG        HD/NN/NI/NK  βγδε
TAL012/019/021/029  CGAG        HD/NN/NI/NN  βγδε
TAL012/019/021/030  CGAT        HD/NN/NI/NG  βγδε
TAL012/019/022/026  CGCA        HD/NN/HD/NI  βγδε
TAL012/019/022/027  CGCC        HD/NN/HD/HD  βγδε
TAL012/019/022/028  CGCG        HD/NN/HD/NK  βγδε
TAL012/019/022/029  CGCG        HD/NN/HD/NN  βγδε
TAL012/019/022/030  CGCT        HD/NN/HD/NG  βγδε
TAL012/019/023/026  CGGA        HD/NN/NK/NI  βγδε
TAL012/019/023/027  CGGC        HD/NN/NK/HD  βγδε
TAL012/019/023/028  CGGG        HD/NN/NK/NK  βγδε
TAL012/019/023/029  CGGG        HD/NN/NK/NN  βγδε
TAL012/019/023/030  CGGT        HD/NN/NK/NG  βγδε
TAL012/019/024/026  CGGA        HD/NN/NN/NI  βγδε
TAL012/019/024/027  CGGC        HD/NN/NN/HD  βγδε
TAL012/019/024/028  CGGG        HD/NN/NN/NK  βγδε
TAL012/019/024/029  CGGG        HD/NN/NN/NN  βγδε
TAL012/019/024/030  CGGT        HD/NN/NN/NG  βγδε
TAL012/019/025/026  CGTA        HD/NN/NG/NI  βγδε
TAL012/019/025/027  CGTC        HD/NN/NG/HD  βγδε
TAL012/019/025/028  CGTG        HD/NN/NG/NK  βγδε
TAL012/019/025/029  CGTG        HD/NN/NG/NN  βγδε
TAL012/019/025/030  CGTT        HD/NN/NG/NG  βγδε
TAL012/020/021/026  CTAA        HD/NG/NI/NI  βγδε
TAL012/020/021/027  CTAC        HD/NG/NI/HD  βγδε
TAL012/020/021/028  CTAG        HD/NG/NI/NK  βγδε
TAL012/020/021/029  CTAG        HD/NG/NI/NN  βγδε
TAL012/020/021/030  CTAT        HD/NG/NI/NG  βγδε
TAL012/020/022/026  CTCA        HD/NG/HD/NI  βγδε
TAL012/020/022/027  CTCC        HD/NG/HD/HD  βγδε
TAL012/020/022/028  CTCG        HD/NG/HD/NK  βγδε
TAL012/020/022/029  CTCG        HD/NG/HD/NN  βγδε
TAL012/020/022/030  CTCT        HD/NG/HD/NG  βγδε
TAL012/020/023/026  CTGA        HD/NG/NK/NI  βγδε
TAL012/020/023/027  CTGC        HD/NG/NK/HD  βγδε
TAL012/020/023/028  CTGG        HD/NG/NK/NK  βγδε
TAL012/020/023/029  CTGG        HD/NG/NK/NN  βγδε
TAL012/020/023/030  CTGT        HD/NG/NK/NG  βγδε
TAL012/020/024/026  CTGA        HD/NG/NN/NI  βγδε
TAL012/020/024/027  CTGC        HD/NG/NN/HD  βγδε
TAL012/020/024/028  CTGG        HD/NG/NN/NK  βγδε
TAL012/020/024/029  CTGG        HD/NG/NN/NN  βγδε
TAL012/020/024/030  CTGT        HD/NG/NN/NG  βγδε
TAL012/020/025/026  CTTA        HD/NG/NG/NI  βγδε
TAL012/020/025/027  CTTC        HD/NG/NG/HD  βγδε
TAL012/020/025/028  CTTG        HD/NG/NG/NK  βγδε
TAL012/020/025/029  CTTG        HD/NG/NG/NN  βγδε
TAL012/020/025/030  CTTT        HD/NG/NG/NG  βγδε
TAL013/016/021/026  GAAA        NK/NI/NI/NI  βγδε
TAL013/016/021/027  GAAC        NK/NI/NI/HD  βγδε
TAL013/016/021/028  GAAG        NK/NI/NI/NK  βγδε
TAL013/016/021/029  GAAG        NK/NI/NI/NN  βγδε
TAL013/016/021/030  GAAT        NK/NI/NI/NG  βγδε
TAL013/016/022/026  GACA        NK/NI/HD/NI  βγδε
TAL013/016/022/027  GACC        NK/NI/HD/HD  βγδε
TAL013/016/022/028  GACG        NK/NI/HD/NK  βγδε
TAL013/016/022/029  GACG        NK/NI/HD/NN  βγδε
TAL013/016/022/030  GACT        NK/NI/HD/NG  βγδε
TAL013/016/023/026  GAGA        NK/NI/NK/NI  βγδε
TAL013/016/023/027  GAGC        NK/NI/NK/HD  βγδε
TAL013/016/023/028  GAGG        NK/NI/NK/NK  βγδε
TAL013/016/023/029  GAGG        NK/NI/NK/NN  βγδε
TAL013/016/023/030  GAGT        NK/NI/NK/NG  βγδε
TAL013/016/024/026  GAGA        NK/NI/NN/NI  βγδε
TAL013/016/024/027  GAGC        NK/NI/NN/HD  βγδε
TAL013/016/024/028  GAGG        NK/NI/NN/NK  βγδε
TAL013/016/024/029  GAGG        NK/NI/NN/NN  βγδε
TAL013/016/024/030  GAGT        NK/NI/NN/NG  βγδε
TAL013/016/025/026  GATA        NK/NI/NG/NI  βγδε
TAL013/016/025/027  GATC        NK/NI/NG/HD  βγδε
TAL013/016/025/028  GATG        NK/NI/NG/NK  βγδε
TAL013/016/025/029  GATG        NK/NI/NG/NN  βγδε
TAL013/016/025/030  GATT        NK/NI/NG/NG  βγδε
TAL013/017/021/026  GCAA        NK/HD/NI/NI  βγδε
TAL013/017/021/027  GCAC        NK/HD/NI/HD  βγδε
TAL013/017/021/028  GCAG        NK/HD/NI/NK  βγδε
TAL013/017/021/029  GCAG        NK/HD/NI/NN  βγδε
TAL013/017/021/030  GCAT        NK/HD/NI/NG  βγδε
TAL013/017/022/026  GCCA        NK/HD/HD/NI  βγδε
TAL013/017/022/027  GCCC        NK/HD/HD/HD  βγδε
TAL013/017/022/028  GCCG        NK/HD/HD/NK  βγδε
TAL013/017/022/029  GCCG        NK/HD/HD/NN  βγδε
TAL013/017/022/030  GCCT        NK/HD/HD/NG  βγδε
TAL013/017/023/026  GCGA        NK/HD/NK/NI  βγδε
TAL013/017/023/027  GCGC        NK/HD/NK/HD  βγδε
TAL013/017/023/028  GCGG        NK/HD/NK/NK  βγδε
TAL013/017/023/029  GCGG        NK/HD/NK/NN  βγδε
TAL013/017/023/030  GCGT        NK/HD/NK/NG  βγδε
TAL013/017/024/026  GCGA        NK/HD/NN/NI  βγδε
TAL013/017/024/027  GCGC        NK/HD/NN/HD  βγδε
TAL013/017/024/028  GCGG        NK/HD/NN/NK  βγδε
TAL013/017/024/029  GCGG        NK/HD/NN/NN  βγδε
TAL013/017/024/030  GCGT        NK/HD/NN/NG  βγδε
TAL013/017/025/026  GCTA        NK/HD/NG/NI  βγδε
TAL013/017/025/027  GCTC        NK/HD/NG/HD  βγδε
TAL013/017/025/028  GCTG        NK/HD/NG/NK  βγδε
TAL013/017/025/029  GCTG        NK/HD/NG/NN  βγδε
TAL013/017/025/030  GCTT        NK/HD/NG/NG  βγδε
TAL013/018/021/026  GGAA        NK/NK/NI/NI  βγδε
TAL013/018/021/027  GGAC        NK/NK/NI/HD  βγδε
TAL013/018/021/028  GGAG        NK/NK/NI/NK  βγδε
TAL013/018/021/029  GGAG        NK/NK/NI/NN  βγδε
TAL013/018/021/030  GGAT        NK/NK/NI/NG  βγδε
TAL013/018/022/026  GGCA        NK/NK/HD/NI  βγδε
TAL013/018/022/027  GGCC        NK/NK/HD/HD  βγδε
TAL013/018/022/028  GGCG        NK/NK/HD/NK  βγδε
TAL013/018/022/029  GGCG        NK/NK/HD/NN  βγδε
TAL013/018/022/030  GGCT        NK/NK/HD/NG  βγδε
TAL013/018/023/026  GGGA        NK/NK/NK/NI  βγδε
TAL013/018/023/027  GGGC        NK/NK/NK/HD  βγδε
TAL013/018/023/028  GGGG        NK/NK/NK/NK  βγδε
TAL013/018/023/029  GGGG        NK/NK/NK/NN  βγδε
TAL013/018/023/030  GGGT        NK/NK/NK/NG  βγδε
TAL013/018/024/026  GGGA        NK/NK/NN/NI  βγδε
TAL013/018/024/027  GGGC        NK/NK/NN/HD  βγδε
TAL013/018/024/028  GGGG        NK/NK/NN/NK  βγδε
TAL013/018/024/029  GGGG        NK/NK/NN/NN  βγδε
TAL013/018/024/030  GGGT        NK/NK/NN/NG  βγδε
TAL013/018/025/026  GGTA        NK/NK/NG/NI  βγδε
TAL013/018/025/027  GGTC        NK/NK/NG/HD  βγδε
TAL013/018/025/028  GGTG        NK/NK/NG/NK  βγδε
TAL013/018/025/029  GGTG        NK/NK/NG/NN  βγδε
TAL013/018/025/030  GGTT        NK/NK/NG/NG  βγδε
TAL013/019/021/026  GGAA        NK/NN/NI/NI  βγδε
TAL013/019/021/027  GGAC        NK/NN/NI/HD  βγδε
TAL013/019/021/028  GGAG        NK/NN/NI/NK  βγδε
TAL013/019/021/029  GGAG        NK/NN/NI/NN  βγδε
TAL013/019/021/030  GGAT        NK/NN/NI/NG  βγδε
TAL013/019/022/026  GGCA        NK/NN/HD/NI  βγδε
TAL013/019/022/027  GGCC        NK/NN/HD/HD  βγδε
TAL013/019/022/028  GGCG        NK/NN/HD/NK  βγδε
TAL013/019/022/029  GGCG        NK/NN/HD/NN  βγδε
TAL013/019/022/030  GGCT        NK/NN/HD/NG  βγδε
TAL013/019/023/026  GGGA        NK/NN/NK/NI  βγδε
TAL013/019/023/027  GGGC        NK/NN/NK/HD  βγδε
TAL013/019/023/028  GGGG        NK/NN/NK/NK  βγδε
TAL013/019/023/029  GGGG        NK/NN/NK/NN  βγδε
TAL013/019/023/030  GGGT        NK/NN/NK/NG  βγδε
TAL013/019/024/026  GGGA        NK/NN/NN/NI  βγδε
TAL013/019/024/027  GGGC        NK/NN/NN/HD  βγδε
TAL013/019/024/028  GGGG        NK/NN/NN/NK  βγδε
TAL013/019/024/029  GGGG        NK/NN/NN/NN  βγδε





TAL013/019/024/030
GGGT
NK/NN/NN/NG
βγβε





TAL013/019/025/026
GGTA
NK/NN/NG/NI
βγβε





TAL013/019/025/027
GGTC
NK/NN/NG/HD
βγβε





TAL013/019/025/028
GGTG
NK/NN/NG/NK
βγβε





TAL013/019/025/029
GGTG
NK/NN/NG/NN
βγβε





TAL013/019/025/030
GGTT
NK/NN/NG/NG
βγβε





TAL013/020/021/026
GTAA
NK/NG/NI/NI
βγβε





TAL013/020/021/027
GTAC
NK/NG/NI/HD
βγβε





TAL013/020/021/028
GTAG
NK/NG/NI/NK
βγβε





TAL013/020/021/029
GTAG
NK/NG/NI/NN
βγβε





TAL013/020/021/030
GTAT
NK/NG/NI/NG
βγβε





TAL013/020/022/026
GTCA
NK/NG/HD/NI
βγβε





TAL013/020/022/027
GTCC
NK/NG/HD/HD
βγβε





TAL013/020/022/028
GTCG
NK/NG/HD/NK
βγβε





TAL013/020/022/029
GTCG
NK/NG/HD/NN
βγβε





TAL013/020/022/030
GTCT
NK/NG/HD/NG
βγβε





TAL013/020/023/026
GTGA
NK/NG/NK/NI
βγβε





TAL013/020/023/027
GTGC
NK/NG/NK/HD
βγβε





TAL013/020/023/028
GTGG
NK/NG/NK/NK
βγβε





TAL013/020/023/029
GTGG
NK/NG/NK/NN
βγβε





TAL013/020/023/030
GTGT
NK/NG/NK/NG
βγβε





TAL013/020/024/026
GTGA
NK/NG/NN/NI
βγβε





TAL013/020/024/027
GTGC
NK/NG/NN/HD
βγβε





TAL013/020/024/028
GTGG
NK/NG/NN/NK
βγβε





TAL013/020/024/029
GTGG
NK/NG/NN/NN
βγβε





TAL013/020/024/030
GTGT
NK/NG/NN/NG
βγβε





TAL013/020/025/026
GTTA
NK/NG/NG/NI
βγβε





TAL013/020/025/027
GTTC
NK/NG/NG/HD
βγβε





TAL013/020/025/028
GTTG
NK/NG/NG/NK
βγβε





TAL013/020/025/029
GTTG
NK/NG/NG/NN
βγβε





TAL013/020/025/030
GTTT
NK/NG/NG/NG
βγβε





TAL014/016/021/026
GAAA
NN/NI/NI/NI
βγβε





TAL014/016/021/027
GAAC
NN/NI/NI/HD
βγβε





TAL014/016/021/028
GAAG
NN/NI/NI/NK
βγβε





TAL014/016/021/029
GAAG
NN/NI/NI/NN
βγβε





TAL014/016/021/030
GAAT
NN/NI/NI/NG
βγβε





TAL014/016/022/026
GACA
NN/NI/HD/NI
βγβε





TAL014/016/022/027
GACC
NN/NI/HD/HD
βγβε





TAL014/016/022/028
GACG
NN/NI/HD/NK
βγβε





TAL014/016/022/029
GACG
NN/NI/HD/NN
βγβε





TAL014/016/022/030
GACT
NN/NI/HD/NG
βγβε





TAL014/016/023/026
GAGA
NN/NI/NK/NI
βγβε





TAL014/016/023/027
GAGC
NN/NI/NK/HD
βγβε





TAL014/016/023/028
GAGG
NN/NI/NK/NK
βγβε





TAL014/016/023/029
GAGG
NN/NI/NK/NN
βγβε





TAL014/016/023/030
GAGT
NN/NI/NK/NG
βγβε





TAL014/016/024/026
GAGA
NN/NI/NN/NI
βγβε





TAL014/016/024/027
GAGC
NN/NI/NN/HD
βγβε





TAL014/016/024/028
GAGG
NN/NI/NN/NK
βγβε





TAL014/016/024/029
GAGG
NN/NI/NN/NN
βγβε





TAL014/016/024/030
GAGT
NN/NI/NN/NG
βγβε





TAL014/016/025/026
GATA
NN/NI/NG/NI
βγβε





TAL014/016/025/027
GATC
NN/NI/NG/HD
βγβε





TAL014/016/025/028
GATG
NN/NI/NG/NK
βγβε





TAL014/016/025/029
GATG
NN/NI/NG/NN
βγβε





TAL014/016/025/030
GATT
NN/NI/NG/NG
βγβε





TAL014/017/021/026
GCAA
NN/HD/NI/NI
βγβε





TAL014/017/021/027
GCAC
NN/HD/NI/HD
βγβε





TAL014/017/021/028
GCAG
NN/HD/NI/NK
βγβε





TAL014/017/021/029
GCAG
NN/HD/NI/NN
βγβε





TAL014/017/021/030
GCAT
NN/HD/NI/NG
βγβε





TAL014/017/022/026
GCCA
NN/HD/HD/NI
βγβε





TAL014/017/022/027
GCCC
NN/HD/HD/HD
βγβε





TAL014/017/022/028
GCCG
NN/HD/HD/NK
βγβε





TAL014/017/022/029
GCCG
NN/HD/HD/NN
βγβε





TAL014/017/022/030
GCCT
NN/HD/HD/NG
βγβε





TAL014/017/023/026
GCGA
NN/HD/NK/NI
βγβε





TAL014/017/023/027
GCGC
NN/HD/NK/HD
βγβε





TAL014/017/023/028
GCGG
NN/HD/NK/NK
βγβε





TAL014/017/023/029
GCGG
NN/HD/NK/NN
βγβε





TAL014/017/023/030
GCGT
NN/HD/NK/NG
βγβε





TAL014/017/024/026
GCGA
NN/HD/NN/NI
βγβε





TAL014/017/024/027
GCGC
NN/HD/NN/HD
βγβε





TAL014/017/024/028
GCGG
NN/HD/NN/NK
βγβε





TAL014/017/024/029
GCGG
NN/HD/NN/NN
βγβε





TAL014/017/024/030
GCGT
NN/HD/NN/NG
βγβε





TAL014/017/025/026
GCTA
NN/HD/NG/NI
βγβε





TAL014/017/025/027
GCTC
NN/HD/NG/HD
βγβε





TAL014/017/025/028
GCTG
NN/HD/NG/NK
βγβε





TAL014/017/025/029
GCTG
NN/HD/NG/NN
βγβε





TAL014/017/025/030
GCTT
NN/HD/NG/NG
βγβε





TAL014/018/021/026
GGAA
NN/NK/NI/NI
βγβε





TAL014/018/021/027
GGAC
NN/NK/NI/HD
βγβε





TAL014/018/021/028
GGAG
NN/NK/NI/NK
βγβε





TAL014/018/021/029
GGAG
NN/NK/NI/NN
βγβε





TAL014/018/021/030
GGAT
NN/NK/NI/NG
βγβε





TAL014/018/022/026
GGCA
NN/NK/HD/NI
βγβε





TAL014/018/022/027
GGCC
NN/NK/HD/HD
βγβε





TAL014/018/022/028
GGCG
NN/NK/HD/NK
βγβε





TAL014/018/022/029
GGCG
NN/NK/HD/NN
βγβε





TAL014/018/022/030
GGCT
NN/NK/HD/NG
βγβε





TAL014/018/023/026
GGGA
NN/NK/NK/NI
βγβε





TAL014/018/023/027
GGGC
NN/NK/NK/HD
βγβε





TAL014/018/023/028
GGGG
NN/NK/NK/NK
βγβε





TAL014/018/023/029
GGGG
NN/NK/NK/NN
βγβε





TAL014/018/023/030
GGGT
NN/NK/NK/NG
βγβε





TAL014/018/024/026
GGGA
NN/NK/NN/NI
βγβε





TAL014/018/024/027
GGGC
NN/NK/NN/HD
βγβε





TAL014/018/024/028
GGGG
NN/NK/NN/NK
βγβε





TAL014/018/024/029
GGGG
NN/NK/NN/NN
βγβε





TAL014/018/024/030
GGGT
NN/NK/NN/NG
βγβε





TAL014/018/025/026
GGTA
NN/NK/NG/NI
βγβε





TAL014/018/025/027
GGTC
NN/NK/NG/HD
βγβε





TAL014/018/025/028
GGTG
NN/NK/NG/NK
βγβε





TAL014/018/025/029
GGTG
NN/NK/NG/NN
βγβε





TAL014/018/025/030
GGTT
NN/NK/NG/NG
βγβε





TAL014/019/021/026
GGAA
NN/NN/NI/NI
βγβε





TAL014/019/021/027
GGAC
NN/NN/NI/HD
βγβε





TAL014/019/021/028
GGAG
NN/NN/NI/NK
βγβε





TAL014/019/021/029
GGAG
NN/NN/NI/NN
βγβε





TAL014/019/021/030
GGAT
NN/NN/NI/NG
βγβε





TAL014/019/022/026
GGCA
NN/NN/HD/NI
βγβε





TAL014/019/022/027
GGCC
NN/NN/HD/HD
βγβε





TAL014/019/022/028
GGCG
NN/NN/HD/NK
βγβε





TAL014/019/022/029
GGCG
NN/NN/HD/NN
βγβε





TAL014/019/022/030
GGCT
NN/NN/HD/NG
βγβε





TAL014/019/023/026
GGGA
NN/NN/NK/NI
βγβε





TAL014/019/023/027
GGGC
NN/NN/NK/HD
βγβε





TAL014/019/023/028
GGGG
NN/NN/NK/NK
βγβε





TAL014/019/023/029
GGGG
NN/NN/NK/NN
βγβε





TAL014/019/023/030
GGGT
NN/NN/NK/NG
βγβε





TAL014/019/024/026
GGGA
NN/NN/NN/NI
βγβε





TAL014/019/024/027
GGGC
NN/NN/NN/HD
βγβε





TAL014/019/024/028
GGGG
NN/NN/NN/NK
βγβε





TAL014/019/024/029
GGGG
NN/NN/NN/NN
βγβε





TAL014/019/024/030
GGGT
NN/NN/NN/NG
βγβε





TAL014/019/025/026
GGTA
NN/NN/NG/NI
βγβε





TAL014/019/025/027
GGTC
NN/NN/NG/HD
βγβε





TAL014/019/025/028
GGTG
NN/NN/NG/NK
βγβε





TAL014/019/025/029
GGTG
NN/NN/NG/NN
βγβε





TAL014/019/025/030
GGTT
NN/NN/NG/NG
βγβε





TAL014/020/021/026
GTAA
NN/NG/NI/NI
βγβε





TAL014/020/021/027
GTAC
NN/NG/NI/HD
βγβε





TAL014/020/021/028
GTAG
NN/NG/NI/NK
βγβε





TAL014/020/021/029
GTAG
NN/NG/NI/NN
βγβε





TAL014/020/021/030
GTAT
NN/NG/NI/NG
βγβε





TAL014/020/022/026
GTCA
NN/NG/HD/NI
βγβε





TAL014/020/022/027
GTCC
NN/NG/HD/HD
βγβε





TAL014/020/022/028
GTCG
NN/NG/HD/NK
βγβε





TAL014/020/022/029
GTCG
NN/NG/HD/NN
βγβε





TAL014/020/022/030
GTCT
NN/NG/HD/NG
βγβε





TAL014/020/023/026
GTGA
NN/NG/NK/NI
βγβε





TAL014/020/023/027
GTGC
NN/NG/NK/HD
βγβε





TAL014/020/023/028
GTGG
NN/NG/NK/NK
βγβε





TAL014/020/023/029
GTGG
NN/NG/NK/NN
βγβε





TAL014/020/023/030
GTGT
NN/NG/NK/NG
βγβε





TAL014/020/024/026
GTGA
NN/NG/NN/NI
βγβε





TAL014/020/024/027
GTGC
NN/NG/NN/HD
βγβε





TAL014/020/024/028
GTGG
NN/NG/NN/NK
βγβε





TAL014/020/024/029
GTGG
NN/NG/NN/NN
βγβε





TAL014/020/024/030
GTGT
NN/NG/NN/NG
βγβε





TAL014/020/025/026
GTTA
NN/NG/NG/NI
βγβε





TAL014/020/025/027
GTTC
NN/NG/NG/HD
βγβε





TAL014/020/025/028
GTTG
NN/NG/NG/NK
βγβε





TAL014/020/025/029
GTTG
NN/NG/NG/NN
βγβε





TAL014/020/025/030
GTTT
NN/NG/NG/NG
βγβε





TAL015/016/021/026
TAAA
NG/NI/NI/NI
βγβε





TAL015/016/021/027
TAAC
NG/NI/NI/HD
βγβε





TAL015/016/021/028
TAAG
NG/NI/NI/NK
βγβε





TAL015/016/021/029
TAAG
NG/NI/NI/NN
βγβε





TAL015/016/021/030
TAAT
NG/NI/NI/NG
βγβε





TAL015/016/022/026
TACA
NG/NI/HD/NI
βγβε





TAL015/016/022/027
TACC
NG/NI/HD/HD
βγβε





TAL015/016/022/028
TACG
NG/NI/HD/NK
βγβε





TAL015/016/022/029
TACG
NG/NI/HD/NN
βγβε





TAL015/016/022/030
TACT
NG/NI/HD/NG
βγβε





TAL015/016/023/026
TAGA
NG/NI/NK/NI
βγβε





TAL015/016/023/027
TAGC
NG/NI/NK/HD
βγβε





TAL015/016/023/028
TAGG
NG/NI/NK/NK
βγβε





TAL015/016/023/029
TAGG
NG/NI/NK/NN
βγβε





TAL015/016/023/030
TAGT
NG/NI/NK/NG
βγβε





TAL015/016/024/026
TAGA
NG/NI/NN/NI
βγβε





TAL015/016/024/027
TAGC
NG/NI/NN/HD
βγβε





TAL015/016/024/028
TAGG
NG/NI/NN/NK
βγβε





TAL015/016/024/029
TAGG
NG/NI/NN/NN
βγβε





TAL015/016/024/030
TAGT
NG/NI/NN/NG
βγβε





TAL015/016/025/026
TATA
NG/NI/NG/NI
βγβε





TAL015/016/025/027
TATC
NG/NI/NG/HD
βγβε





TAL015/016/025/028
TATG
NG/NI/NG/NK
βγβε





TAL015/016/025/029
TATG
NG/NI/NG/NN
βγβε





TAL015/016/025/030
TATT
NG/NI/NG/NG
βγβε





TAL015/017/021/026
TCAA
NG/HD/NI/NI
βγβε





TAL015/017/021/027
TCAC
NG/HD/NI/HD
βγβε





TAL015/017/021/028
TCAG
NG/HD/NI/NK
βγβε





TAL015/017/021/029
TCAG
NG/HD/NI/NN
βγβε





TAL015/017/021/030
TCAT
NG/HD/NI/NG
βγβε





TAL015/017/022/026
TCCA
NG/HD/HD/NI
βγβε





TAL015/017/022/027
TCCC
NG/HD/HD/HD
βγβε





TAL015/017/022/028
TCCG
NG/HD/HD/NK
βγβε





TAL015/017/022/029
TCCG
NG/HD/HD/NN
βγβε





TAL015/017/022/030
TCCT
NG/HD/HD/NG
βγβε





TAL015/017/023/026
TCGA
NG/HD/NK/NI
βγβε





TAL015/017/023/027
TCGC
NG/HD/NK/HD
βγβε





TAL015/017/023/028
TCGG
NG/HD/NK/NK
βγβε





TAL015/017/023/029
TCGG
NG/HD/NK/NN
βγβε





TAL015/017/023/030
TCGT
NG/HD/NK/NG
βγβε





TAL015/017/024/026
TCGA
NG/HD/NN/NI
βγβε





TAL015/017/024/027
TCGC
NG/HD/NN/HD
βγβε





TAL015/017/024/028
TCGG
NG/HD/NN/NK
βγβε





TAL015/017/024/029
TCGG
NG/HD/NN/NN
βγβε





TAL015/017/024/030
TCGT
NG/HD/NN/NG
βγβε





TAL015/017/025/026
TCTA
NG/HD/NG/NI
βγβε





TAL015/017/025/027
TCTC
NG/HD/NG/HD
βγβε





TAL015/017/025/028
TCTG
NG/HD/NG/NK
βγβε





TAL015/017/025/029
TCTG
NG/HD/NG/NN
βγβε





TAL015/017/025/030
TCTT
NG/HD/NG/NG
βγβε





TAL015/018/021/026
TGAA
NG/NK/NI/NI
βγβε





TAL015/018/021/027
TGAC
NG/NK/NI/HD
βγβε





TAL015/018/021/028
TGAG
NG/NK/NI/NK
βγβε





TAL015/018/021/029
TGAG
NG/NK/NI/NN
βγβε





TAL015/018/021/030
TGAT
NG/NK/NI/NG
βγβε





TAL015/018/022/026
TGCA
NG/NK/HD/NI
βγβε





TAL015/018/022/027
TGCC
NG/NK/HD/HD
βγβε





TAL015/018/022/028
TGCG
NG/NK/HD/NK
βγβε





TAL015/018/022/029
TGCG
NG/NK/HD/NN
βγβε





TAL015/018/022/030
TGCT
NG/NK/HD/NG
βγβε





TAL015/018/023/026
TGGA
NG/NK/NK/NI
βγβε





TAL015/018/023/027
TGGC
NG/NK/NK/HD
βγβε





TAL015/018/023/028
TGGG
NG/NK/NK/NK
βγβε





TAL015/018/023/029
TGGG
NG/NK/NK/NN
βγβε





TAL015/018/023/030
TGGT
NG/NK/NK/NG
βγβε





TAL015/018/024/026
TGGA
NG/NK/NN/NI
βγβε





TAL015/018/024/027
TGGC
NG/NK/NN/HD
βγβε





TAL015/018/024/028
TGGG
NG/NK/NN/NK
βγβε





TAL015/018/024/029
TGGG
NG/NK/NN/NN
βγβε





TAL015/018/024/030
TGGT
NG/NK/NN/NG
βγβε





TAL015/018/025/026
TGTA
NG/NK/NG/NI
βγβε





TAL015/018/025/027
TGTC
NG/NK/NG/HD
βγβε





TAL015/018/025/028
TGTG
NG/NK/NG/NK
βγβε





TAL015/018/025/029
TGTG
NG/NK/NG/NN
βγβε





TAL015/018/025/030
TGTT
NG/NK/NG/NG
βγβε





TAL015/019/021/026
TGAA
NG/NN/NI/NI
βγβε





TAL015/019/021/027
TGAC
NG/NN/NI/HD
βγβε





TAL015/019/021/028
TGAG
NG/NN/NI/NK
βγβε





TAL015/019/021/029
TGAG
NG/NN/NI/NN
βγβε





TAL015/019/021/030
TGAT
NG/NN/NI/NG
βγβε





TAL015/019/022/026
TGCA
NG/NN/HD/NI
βγβε





TAL015/019/022/027
TGCC
NG/NN/HD/HD
βγβε





TAL015/019/022/028
TGCG
NG/NN/HD/NK
βγβε





TAL015/019/022/029
TGCG
NG/NN/HD/NN
βγβε





TAL015/019/022/030
TGCT
NG/NN/HD/NG
βγβε





TAL015/019/023/026
TGGA
NG/NN/NK/NI
βγβε





TAL015/019/023/027
TGGC
NG/NN/NK/HD
βγβε





TAL015/019/023/028
TGGG
NG/NN/NK/NK
βγβε





TAL015/019/023/029
TGGG
NG/NN/NK/NN
βγβε





TAL015/019/023/030
TGGT
NG/NN/NK/NG
βγβε





TAL015/019/024/026
TGGA
NG/NN/NN/NI
βγβε





TAL015/019/024/027
TGGC
NG/NN/NN/HD
βγβε





TAL015/019/024/028
TGGG
NG/NN/NN/NK
βγβε





TAL015/019/024/029
TGGG
NG/NN/NN/NN
βγβε





TAL015/019/024/030
TGGT
NG/NN/NN/NG
βγβε





TAL015/019/025/026
TGTA
NG/NN/NG/NI
βγβε





TAL015/019/025/027
TGTC
NG/NN/NG/HD
βγβε





TAL015/019/025/028
TGTG
NG/NN/NG/NK
βγβε





TAL015/019/025/029
TGTG
NG/NN/NG/NN
βγβε





TAL015/019/025/030
TGTT
NG/NN/NG/NG
βγβε





TAL015/020/021/026
TTAA
NG/NG/NI/NI
βγβε





TAL015/020/021/027
TTAC
NG/NG/NI/HD
βγβε





TAL015/020/021/028
TTAG
NG/NG/NI/NK
βγβε





TAL015/020/021/029
TTAG
NG/NG/NI/NN
βγβε





TAL015/020/021/030
TTAT
NG/NG/NI/NG
βγβε





TAL015/020/022/026
TTCA
NG/NG/HD/NI
βγβε





TAL015/020/022/027
TTCC
NG/NG/HD/HD
βγβε





TAL015/020/022/028
TTCG
NG/NG/HD/NK
βγβε





TAL015/020/022/029
TTCG
NG/NG/HD/NN
βγβε





TAL015/020/022/030
TTCT
NG/NG/HD/NG
βγβε





TAL015/020/023/026
TTGA
NG/NG/NK/NI
βγβε





TAL015/020/023/027
TTGC
NG/NG/NK/HD
βγβε





TAL015/020/023/028
TTGG
NG/NG/NK/NK
βγβε





TAL015/020/023/029
TTGG
NG/NG/NK/NN
βγβε





TAL015/020/023/030
TTGT
NG/NG/NK/NG
βγβε





TAL015/020/024/026
TTGA
NG/NG/NN/NI
βγβε





TAL015/020/024/027
TTGC
NG/NG/NN/HD
βγβε





TAL015/020/024/028
TTGG
NG/NG/NN/NK
βγβε





TAL015/020/024/029
TTGG
NG/NG/NN/NN
βγβε





TAL015/020/024/030
TTGT
NG/NG/NN/NG
βγβε





TAL015/020/025/026
TTTA
NG/NG/NG/NI
βγβε





TAL015/020/025/027
TTTC
NG/NG/NG/HD
βγβε





TAL015/020/025/028
TTTG
NG/NG/NG/NK
βγβε





TAL015/020/025/029
TTTG
NG/NG/NG/NN
βγβε





TAL015/020/025/030
TTTT
NG/NG/NG/NG
βγβε





TAL011/016
AA
NI/NI
βγ





TAL011/017
AC
NI/HD
βγ





TAL011/018
AG
NI/NK
βγ





TAL011/019
AG
NI/NN
βγ





TAL011/020
AT
NI/NG
βγ





TAL012/016
CA
HD/NI
βγ





TAL012/017
CC
HD/HD
βγ





TAL012/018
CG
HD/NK
βγ





TAL012/019
CG
HD/NN
βγ





TAL012/020
CT
HD/NG
βγ





TAL013/016
GA
NK/NI
βγ





TAL013/017
GC
NK/HD
βγ





TAL013/018
GG
NK/NK
βγ





TAL013/019
GG
NK/NN
βγ





TAL013/020
GT
NK/NG
βγ





TAL014/016
GA
NN/NI
βγ





TAL014/017
GC
NN/HD
βγ





TAL014/018
GG
NN/NK
βγ





TAL014/019
GG
NN/NN
βγ





TAL014/020
GT
NN/NG
βγ





TAL015/016
TA
NG/NI
βγ





TAL015/017
TC
NG/HD
βγ





TAL015/018
TG
NG/NK
βγ





TAL015/019
TG
NG/NN
βγ





TAL015/020
TT
NG/NG
βγ





TAL011/016/021
AAA
NI/NI/NI
βγδ





TAL011/016/022
AAC
NI/NI/HD
βγδ





TAL011/016/023
AAG
NI/NI/NK
βγδ





TAL011/016/024
AAG
NI/NI/NN
βγδ





TAL011/016/025
AAT
NI/NI/NG
βγδ





TAL011/017/021
ACA
NI/HD/NI
βγδ





TAL011/017/022
ACC
NI/HD/HD
βγδ





TAL011/017/023
ACG
NI/HD/NK
βγδ





TAL011/017/024
ACG
NI/HD/NN
βγδ





TAL011/017/025
ACT
NI/HD/NG
βγδ





TAL011/018/021
AGA
NI/NK/NI
βγδ





TAL011/018/022
AGC
NI/NK/HD
βγδ





TAL011/018/023
AGG
NI/NK/NK
βγδ





TAL011/018/024
AGG
NI/NK/NN
βγδ





TAL011/018/025
AGT
NI/NK/NG
βγδ





TAL011/019/021
AGA
NI/NN/NI
βγδ





TAL011/019/022
AGC
NI/NN/HD
βγδ





TAL011/019/023
AGG
NI/NN/NK
βγδ





TAL011/019/024
AGG
NI/NN/NN
βγδ





TAL011/019/025
AGT
NI/NN/NG
βγδ





TAL011/020/021
ATA
NI/NG/NI
βγδ





TAL011/020/022
ATC
NI/NG/HD
βγδ





TAL011/020/023
ATG
NI/NG/NK
βγδ





TAL011/020/024
ATG
NI/NG/NN
βγδ





TAL011/020/025
ATT
NI/NG/NG
βγδ





TAL012/016/021
CAA
HD/NI/NI
βγδ





TAL012/016/022
CAC
HD/NI/HD
βγδ





TAL012/016/023
CAG
HD/NI/NK
βγδ





TAL012/016/024
CAG
HD/NI/NN
βγδ





TAL012/016/025
CAT
HD/NI/NG
βγδ





TAL012/017/021
CCA
HD/HD/NI
βγδ





TAL012/017/022
CCC
HD/HD/HD
βγδ





TAL012/017/023
CCG
HD/HD/NK
βγδ





TAL012/017/024
CCG
HD/HD/NN
βγδ





TAL012/017/025
CCT
HD/HD/NG
βγδ





TAL012/018/021
CGA
HD/NK/NI
βγδ





TAL012/018/022
CGC
HD/NK/HD
βγδ





TAL012/018/023
CGG
HD/NK/NK
βγδ





TAL012/018/024
CGG
HD/NK/NN
βγδ





TAL012/018/025
CGT
HD/NK/NG
βγδ





TAL012/019/021
CGA
HD/NN/NI
βγδ





TAL012/019/022
CGC
HD/NN/HD
βγδ





TAL012/019/023
CGG
HD/NN/NK
βγδ





TAL012/019/024
CGG
HD/NN/NN
βγδ





TAL012/019/025
CGT
HD/NN/NG
βγδ





TAL012/020/021
CTA
HD/NG/NI
βγδ





TAL012/020/022
CTC
HD/NG/HD
βγδ





TAL012/020/023
CTG
HD/NG/NK
βγδ





TAL012/020/024
CTG
HD/NG/NN
βγδ





TAL012/020/025
CTT
HD/NG/NG
βγδ





TAL013/016/021
GAA
NK/NI/NI
βγδ





TAL013/016/022
GAC
NK/NI/HD
βγδ





TAL013/016/023
GAG
NK/NI/NK
βγδ





TAL013/016/024
GAG
NK/NI/NN
βγδ





TAL013/016/025
GAT
NK/NI/NG
βγδ





TAL013/017/021
GCA
NK/HD/NI
βγδ





TAL013/017/022
GCC
NK/HD/HD
βγδ





TAL013/017/023
GCG
NK/HD/NK
βγδ





TAL013/017/024
GCG
NK/HD/NN
βγδ





TAL013/017/025
GCT
NK/HD/NG
βγδ





TAL013/018/021
GGA
NK/NK/NI
βγδ





TAL013/018/022
GGC
NK/NK/HD
βγδ





TAL013/018/023
GGG
NK/NK/NK
βγδ





TAL013/018/024
GGG
NK/NK/NN
βγδ





TAL013/018/025
GGT
NK/NK/NG
βγδ





TAL013/019/021
GGA
NK/NN/NI
βγδ





TAL013/019/022
GGC
NK/NN/HD
βγδ





TAL013/019/023
GGG
NK/NN/NK
βγδ





TAL013/019/024
GGG
NK/NN/NN
βγδ





TAL013/019/025
GGT
NK/NN/NG
βγδ





TAL013/020/021
GTA
NK/NG/NI
βγδ





TAL013/020/022
GTC
NK/NG/HD
βγδ





TAL013/020/023
GTG
NK/NG/NK
βγδ





TAL013/020/024
GTG
NK/NG/NN
βγδ





TAL013/020/025
GTT
NK/NG/NG
βγδ





TAL014/016/021
GAA
NN/NI/NI
βγδ





TAL014/016/022
GAC
NN/NI/HD
βγδ





TAL014/016/023
GAG
NN/NI/NK
βγδ





TAL014/016/024
GAG
NN/NI/NN
βγδ





TAL014/016/025
GAT
NN/NI/NG
βγδ





TAL014/017/021
GCA
NN/HD/NI
βγδ





TAL014/017/022
GCC
NN/HD/HD
βγδ





TAL014/017/023
GCG
NN/HD/NK
βγδ





TAL014/017/024
GCG
NN/HD/NN
βγδ





TAL014/017/025
GCT
NN/HD/NG
βγδ





TAL014/018/021
GGA
NN/NK/NI
βγδ





TAL014/018/022
GGC
NN/NK/HD
βγδ





TAL014/018/023
GGG
NN/NK/NK
βγδ





TAL014/018/024
GGG
NN/NK/NN
βγδ





TAL014/018/025
GGT
NN/NK/NG
βγδ





TAL014/019/021
GGA
NN/NN/NI
βγδ





TAL014/019/022
GGC
NN/NN/HD
βγδ





TAL014/019/023
GGG
NN/NN/NK
βγδ





TAL014/019/024
GGG
NN/NN/NN
βγδ





TAL014/019/025
GGT
NN/NN/NG
βγδ





TAL014/020/021
GTA
NN/NG/NI
βγδ





TAL014/020/022
GTC
NN/NG/HD
βγδ





TAL014/020/023
GTG
NN/NG/NK
βγδ





TAL014/020/024
GTG
NN/NG/NN
βγδ





TAL014/020/025
GTT
NN/NG/NG
βγδ





TAL015/016/021
TAA
NG/NI/NI
βγδ





TAL015/016/022
TAC
NG/NI/HD
βγδ





TAL015/016/023
TAG
NG/NI/NK
βγδ





TAL015/016/024
TAG
NG/NI/NN
βγδ





TAL015/016/025
TAT
NG/NI/NG
βγδ





TAL015/017/021
TCA
NG/HD/NI
βγδ





TAL015/017/022
TCC
NG/HD/HD
βγδ





TAL015/017/023
TCG
NG/HD/NK
βγδ





TAL015/017/024
TCG
NG/HD/NN
βγδ





TAL015/017/025
TCT
NG/HD/NG
βγδ





TAL015/018/021
TGA
NG/NK/NI
βγδ





TAL015/018/022
TGC
NG/NK/HD
βγδ





TAL015/018/023
TGG
NG/NK/NK
βγδ





TAL015/018/024
TGG
NG/NK/NN
βγδ





TAL015/018/025
TGT
NG/NK/NG
βγδ





TAL015/019/021
TGA
NG/NN/NI
βγδ





TAL015/019/022
TGC
NG/NN/HD
βγδ





TAL015/019/023
TGG
NG/NN/NK
βγδ





TAL015/019/024
TGG
NG/NN/NN
βγδ





TAL015/019/025
TGT
NG/NN/NG
βγδ





TAL015/020/021
TTA
NG/NG/NI
βγδ





TAL015/020/022
TTC
NG/NG/HD
βγδ





TAL015/020/023
TTG
NG/NG/NK
βγδ





TAL015/020/024
TTG
NG/NG/NN
βγδ





TAL015/020/025
TTT
NG/NG/NG
βγδ





TAL011/031
AA
NI/NI
βγ′





TAL011/032
AC
NI/HD
βγ′





TAL011/033
AG
NI/NK
βγ′





TAL011/034
AG
NI/NN
βγ′





TAL011/035
AT
NI/NG
βγ′





TAL012/031
CA
HD/NI
βγ′





TAL012/032
CC
HD/HD
βγ′





TAL012/033
CG
HD/NK
βγ′





TAL012/034
CG
HD/NN
βγ′





TAL012/035
CT
HD/NG
βγ′





TAL013/031
GA
NK/NI
βγ′





TAL013/032
GC
NK/HD
βγ′





TAL013/033
GG
NK/NK
βγ′





TAL013/034
GG
NK/NN
βγ′





TAL013/035
GT
NK/NG
βγ′





TAL014/031
GA
NN/NI
βγ′





TAL014/032
GC
NN/HD
βγ′





TAL014/033
GG
NN/NK
βγ′





TAL014/034
GG
NN/NN
βγ′





TAL014/035
GT
NN/NG
βγ′





TAL015/031
TA
NG/NI
βγ′





TAL015/032
TC
NG/HD
βγ′





TAL015/033
TG
NG/NK
βγ′





TAL015/034
TG
NG/NN
βγ′





TAL015/035
TT
NG/NG
βγ′





TAL021/036
AA
NI/NI
δε′





TAL021/037
AC
NI/HD
δε′





TAL021/038
AG
NI/NK
δε′





TAL021/039
AG
NI/NN
δε′





TAL021/040
AT
NI/NG
δε′





TAL022/036
CA
HD/NI
δε′





TAL022/037
CC
HD/HD
δε′





TAL022/038
CG
HD/NK
δε′





TAL022/039
CG
HD/NN
δε′





TAL022/040
CT
HD/NG
δε′





TAL023/036
GA
NK/NI
δε′





TAL023/037
GC
NK/HD
δε′





TAL023/038
GG
NK/NK
δε′





TAL023/039
GG
NK/NN
δε′





TAL023/040
GT
NK/NG
δε′





TAL024/036
GA
NN/NI
δε′





TAL024/037
GC
NN/HD
δε′





TAL024/038
GG
NN/NK
δε′





TAL024/039
GG
NN/NN
δε′





TAL024/040
GT
NN/NG
δε′





TAL025/036
TA
NG/NI
δε′





TAL025/037
TC
NG/HD
δε′





TAL025/038
TG
NG/NK
δε′





TAL025/039
TG
NG/NN
δε′





TAL025/040
TT
NG/NG
δε′





TAL011
A
NI
β





TAL012
C
HD
β





TAL013
G
NK
β





TAL014
G
NN
β





TAL015
T
NG
β









To prepare DNA fragments encoding α units for use in assembly, 20 rounds of PCR were performed with each α unit plasmid as a template using primers oJS2581 (5′-Biotin-TCTAGAGAAGACAAGAACCTGACC-3′ (SEQ ID NO:237)) and oJS2582 (5′-GGATCCGGTCTCTTAAGGCCGTGG-3′ (SEQ ID NO:238)). The resulting PCR products were biotinylated on the 5′ end. Each α PCR product was then digested with 40 units of BsaI-HF restriction enzyme to generate 4 bp overhangs and purified using the QIAquick PCR purification kit (QIAGEN) according to the manufacturer's instructions, except that the final product was eluted in 50 μl of 0.1×EB.
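As an illustrative check (not part of the described protocol), the restriction sites carried by the two primers can be located with a short script. The recognition sequences used below (XbaI TCTAGA, BamHI GGATCC, BbsI GAAGAC, BsaI GGTCTC) are the standard sites for these enzymes; the function name `find_sites` is hypothetical, and only the strand as written is searched.

```python
# Standard recognition sequences for the enzymes named in this example.
SITES = {
    "XbaI": "TCTAGA",
    "BamHI": "GGATCC",
    "BbsI": "GAAGAC",
    "BsaI": "GGTCTC",
}

# Primer sequences as given in the text (biotin label omitted).
oJS2581 = "TCTAGAGAAGACAAGAACCTGACC"
oJS2582 = "GGATCCGGTCTCTTAAGGCCGTGG"


def find_sites(seq):
    """Return {enzyme: 0-based start index} for each site found in seq.

    Only the strand as written is searched; the reverse complement
    is not examined in this sketch.
    """
    seq = seq.upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


print(find_sites(oJS2581))
print(find_sites(oJS2582))
```

Under these assumptions, the forward primer carries an XbaI site followed by a BbsI site, and the reverse primer carries a BamHI site followed by a BsaI site, consistent with the BsaI-HF digestion of the α PCR product described above.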


To prepare DNA fragments encoding β, βγδε, βγδ, βγ, βγ*, and δε* repeats, 10 μg of each of these plasmids was digested with 50 units of BbsI restriction enzyme in NEBuffer 2 for 2 hours at 37° C. followed by serial restriction digests performed in NEBuffer 4 at 37° C. using 100 units each of XbaI, BamHI-HF, and SalI-HF enzymes that were added at 5 minute intervals. The latter set of restriction digestions was designed to cleave the plasmid backbone to ensure that this larger DNA fragment does not interfere with subsequent ligations performed during the assembly process. These restriction digest reactions were then purified using the QIAquick PCR purification kit (QIAGEN) according to the manufacturer's instructions, except that the final product was eluted in 180 μl of 0.1×EB.


All assembly steps were performed using a Sciclone G3 liquid handling workstation (Caliper) in 96-well plates and using a SPRIplate 96-ring magnet (Beckman Coulter Genomics) and a DynaMag-96 Side magnet (Life Technologies). In the first assembly step, a biotinylated α unit fragment was ligated to the first βγδε fragment and then the resulting αβγδε fragments were bound to Dynabeads MyOne C1 streptavidin-coated magnetic beads (Life Technologies) in 2× B&W Buffer (Life Technologies). Beads were then drawn to the side of the well by placing the plate on the magnet and then washed with 100 μl B&W buffer with 0.005% Tween 20 (Sigma) and again with 100 μl 0.1 mg/ml bovine serum albumin (BSA) (New England Biolabs). Additional βγδε fragments were ligated by removing the plate from the magnet, resuspending the beads in solution in each well, digesting the bead-bound fragment with BsaI-HF restriction enzyme, placing the plate on the magnet, washing with 100 μl B&W/Tween20 followed by 100 μl of 0.1 mg/ml BSA, and then ligating the next fragment. This process was repeated multiple times with additional βγδε units to extend the bead-bound fragment. The last fragment to be ligated was always a β, βγ*, βγδ, or δε* unit to enable cloning of the full-length fragment into expression vectors (note that fragments that end with a δε* unit are always preceded by ligation of a βγ unit).


The final full-length bead-bound fragment was digested with 40 units of BsaI-HF restriction enzyme followed by 25 units of BbsI restriction enzyme (New England Biolabs). Digestion with BbsI released the fragment from the beads and generated a unique 5′ overhang for cloning of the fragment. Digestion with BsaI-HF resulted in creation of a unique 3′ overhang for cloning.


DNA fragments encoding the assembled TALE repeat arrays were subcloned into one of four TALEN expression vectors. Each of these vectors included a CMV promoter, a translational start codon optimized for mammalian cell expression, a triple FLAG epitope tag, a nuclear localization signal, amino acids 153 to 288 from the TALE 13 protein (Miller et al., 2011, Nat. Biotechnol., 29:143-148), two unique and closely positioned Type IIS BsmBI restriction sites, a 0.5 TALE repeat domain encoding one of four possible RVDs (NI, HD, NN, or NG for recognition of an A, C, G, or T nucleotide, respectively), amino acids 715 to 777 from the TALE 13 protein, and the wild-type FokI cleavage domain. All DNA fragments possessed overhangs that enable directional cloning into any of the four TALEN expression vectors that has been digested with BsmBI.
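The RVD-to-nucleotide correspondence used throughout the unit tables above (NI for A, HD for C, NK or NN for G, NG for T) can be written as a simple lookup. The sketch below is illustrative only: the function names are hypothetical, and the choice of expression vector assumes that the 0.5 repeat contacts the final nucleotide of the half-site.

```python
# RVD -> nucleotide, as listed in the unit tables (NK and NN both target G).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NK": "G", "NN": "G", "NG": "T"}

# Nucleotide -> RVD of the 0.5 repeat carried by each of the four vectors.
HALF_REPEAT_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvds_to_target(rvds):
    """Convert a slash-separated RVD string (e.g. 'NK/NI/HD/NK') to its target."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds.split("/"))


def half_repeat_rvd(half_site):
    """Pick the 0.5-repeat RVD for a half-site.

    Assumption: the 0.5 repeat recognizes the last nucleotide of the half-site.
    """
    return HALF_REPEAT_RVD[half_site[-1].upper()]


print(rvds_to_target("NK/NI/HD/NK"))   # unit TAL013/016/022/028 in the table
print(half_repeat_rvd("TCGCCACCAT"))   # left half-site of TALEN pair 1
```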


To prepare a TALEN expression vector for subcloning, 5 μg of plasmid DNA were digested with 50 units of BsmBI restriction enzyme (New England Biolabs) in NEBuffer 3 for 8 hours at 55° C. Digested DNA was purified using 90 μl of Ampure XP beads (Agencourt) according to the manufacturer's instructions and diluted to a final concentration of 5 ng/μl in 1 mM TrisHCl. The assembled TALE repeat arrays were ligated into TALEN expression vectors using 400 U of T4 DNA Ligase (New England Biolabs). Ligation products were transformed into chemically competent XL-1 Blue cells. Six colonies were picked for each ligation and plasmid DNA was isolated by an alkaline lysis miniprep procedure. Simultaneously, the same six colonies were screened by PCR using primers oSQT34 (5′-GACGGTGGCTGTCAAATACCAAGATATG-3′ (SEQ ID NO:239)) and oSQT35 (5′-TCTCCTCCAGTTCACTTTTGACTAGTTGGG-3′ (SEQ ID NO:240)). PCR products were analyzed on a QIAxcel capillary electrophoresis system (QIAGEN). Miniprep DNA from clones that contained correctly sized PCR products was sent for DNA sequence confirmation with primers oSQT1 (5′-AGTAACAGCGGTAGAGGCAG-3′ (SEQ ID NO:241)), oSQT3 (5′-ATTGGGCTACGATGGACTCC-3′ (SEQ ID NO:242)), and oJS2980 (5′-TTAATTCAATATATTCATGAGGCAC-3′ (SEQ ID NO:243)).


Because the final fragment ligated can encode one, two, or three TALE repeats, the methods disclosed herein can be used to assemble arrays consisting of any desired number of TALE repeats. Assembled DNA fragments encoding the final full-length TALE repeat array are released from the beads by restriction enzyme digestion and can be directly cloned into a desired expression vector.
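One way to read the assembly scheme described above is as a decomposition of an array into an initial α unit, a run of four-repeat βγδε units, and a terminal β, βγ*, βγδ, or βγ followed by δε* unit. The following sketch illustrates that reading; the function `plan_units` is hypothetical, it counts only the full repeats assembled on the beads (the 0.5 repeat supplied by the expression vector is excluded), and it assumes that at least one unit follows α.

```python
def plan_units(n_repeats):
    """Return an ordered list of unit types for an array of n_repeats full repeats.

    Assumptions of this sketch: the array always starts with one alpha unit,
    interior extension uses four-repeat units, and the terminal unit(s) are
    chosen so that the repeat count adds up exactly.
    """
    if n_repeats < 2:
        raise ValueError("this sketch requires at least two repeats")
    units = ["α"]
    remaining = n_repeats - 1          # repeats still needed after the α unit
    while remaining > 4:
        units.append("βγδε")           # four-repeat extension unit
        remaining -= 4
    terminal = {
        1: ["β"],
        2: ["βγ*"],
        3: ["βγδ"],
        4: ["βγ", "δε*"],              # δε* is always preceded by a βγ unit
    }[remaining]
    return units + terminal


# An 8.5-repeat TALEN monomer needs 8 assembled repeats plus the vector's 0.5 repeat.
print(plan_units(8))
print(plan_units(9))
```

Under these assumptions every array length of two or more repeats has exactly one plan, which is consistent with the statement that arrays of any desired number of repeats can be assembled.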


The methods can be efficiently practiced in 96-well format using a robotic liquid handling workstation. With automation, DNA fragments encoding 96 different TALE repeat arrays of variable lengths can be assembled in less than one day. Medium-throughput assembly of fragments can be performed in one to two days using multi-channel pipets and 96-well plates. Fragments assembled using either approach can then be cloned into expression vectors (e.g., for expression as a TALEN) to generate sequence-verified plasmids in less than one week. Using the automated assembly approach, sequence-verified TALE repeat array expression plasmids can be made quickly and inexpensively.


Example 6. Large-Scale Testing of Assembled TALENs Using a Human Cell-Based Reporter Assay

To perform a large-scale test of the robustness of TALENs for genome editing in human cells, the method described in Example 5 was used to construct a series of plasmids encoding 48 TALEN pairs targeted to different sites scattered throughout the EGFP reporter gene. Monomers in each of the TALEN pairs contained the same number of repeats (ranging from 8.5 to 19.5 in number), and these pairs were targeted to sites possessing a fixed length “spacer” sequence (16 bps) between the “half-sites” bound by each TALEN monomer (Table 6).
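The target-site notation used in Table 6 (half-sites in capitals, spacer in lowercase) lends itself to a simple consistency check. The sketch below is illustrative: `parse_target_site` is a hypothetical helper, and the relationship "repeats = half-site length minus 1.5" is inferred from the table entries (for example, a 10 nt half-site paired with 8.5 repeats) rather than stated in the text.

```python
import re


def parse_target_site(site):
    """Split a Table 6 style target site into left half-site, spacer, right half-site."""
    match = re.fullmatch(r"([ACGT]+)([acgt]+)([ACGT]+)", site)
    if match is None:
        raise ValueError("expected CAPS half-site, lowercase spacer, CAPS half-site")
    left, spacer, right = match.groups()
    return {
        "left": left,
        "spacer": spacer,
        "right": right,
        # Inferred relationship: half-site length minus 1.5 gives the repeat count.
        "left_repeats": len(left) - 1.5,
        "right_repeats": len(right) - 1.5,
    }


# TALEN pair 1 from Table 6.
site = parse_target_site("TCGCCACCATggtgagcaagggcgagGAGCTGTTCA")
print(len(site["spacer"]), site["left_repeats"], site["right_repeats"])
```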









TABLE 6

EGFP reporter gene sequences targeted by 48 pairs of TALENs

TALEN pair # | Position within EGFP of the first nucleotide in the binding site | Target site (half-sites in CAPS, spacer in lowercase) | SEQ ID NO: | # of repeat domains in Left TALEN | # of repeat domains in Right TALEN

1 | −8 | TCGCCACCATggtgagcaagggcgagGAGCTGTTCA | 93 | 8.5 | 8.5
2 | 35 | TGGTGCCCATcctggtcgagctggacGGCGACGTAA | 94 | 8.5 | 8.5
3 | 143 | TCTGCACCACcggcaagctgcccgtgCCCTGGCCCA | 95 | 8.5 | 8.5
4 | 425 | TGGAGTACAActacaacagccacaacGTCTATATCA | 96 | 8.5 | 8.5
5 | 82 | TTCAGCGTGTCcggcgagggcgagggcGATGCCACCTA | 97 | 9.5 | 9.5
6 | 111 | TGCCACCTACGgcaagctgaccctgaaGTTCATCTGCA | 98 | 9.5 | 9.5
7 | 172 | TGGCCCACCCTcgtgaccaccctgaccTACGGCGTGCA | 99 | 9.5 | 9.5
8 | 496 | TTCAAGATCCGccacaacatcgaggacGGCAGCGTGCA | 100 | 9.5 | 9.5
9 | −23 | TAGAGGATCCACcggtcgccaccatggtGAGCAAGGGCGA | 101 | 10.5 | 10.5
10 | 91 | TCCGGCGAGGGCgagggcgatgccacctACGGCAAGCTGA | 102 | 10.5 | 10.5
11 | 194 | TGACCTACGGCGtgcagtgcttcagccgCTACCCCGACCA | 103 | 10.5 | 10.5
12 | 503 | TCCGCCACAACAtcgaggacggcagcgtGCAGCTCGCCGA | 104 | 10.5 | 10.5
13 | 44 | TCCTGGTCGAGCTggacggcgacgtaaacGGCCACAAGTTCA | 105 | 11.5 | 11.5
14 | 215 | TCAGCCGCTACCCcgaccacatgaagcagCACGACTTCTTCA | 106 | 11.5 | 11.5
15 | 251 | TCTTCAAGTCCGCcatgcccgaaggctacGTCCAGGAGCGCA | 107 | 11.5 | 11.5
16 | 392 | TCAAGGAGGACGGcaacatcctggggcacAAGCTGGAGTACA | 108 | 11.5 | 11.5
17 | 485 | TCAAGGTGAACTTcaagatccgccacaacATCGAGGACGGCA | 109 | 11.5 | 11.5
18 | −16 | TCCACCGGTCGCCAccatggtgagcaagggCGAGGAGCTGTTCA | 110 | 12.5 | 12.5
19 | 82 | TTCAGCGTGTCCGGcgagggcgagggcgatGCCACCTACGGCAA | 111 | 12.5 | 12.5
20 | 214 | TTCAGCCGCTACCCcgaccacatgaagcagCACGACTTCTTCAA | 112 | 12.5 | 12.5
21 | 436 | TACAACAGCCACAAcgtctatatcatggccGACAAGCAGAAGAA | 113 | 12.5 | 12.5
22 | 35 | TGGTGCCCATCCTGGtcgagctggacggcgaCGTAAACGGCCACAA | 114 | 13.5 | 13.5
23 | 266 | TGCCCGAAGGCTACGtccaggagcgcaccatCTTCTTCAAGGACGA | 115 | 13.5 | 13.5
24 | 362 | TGAACCGCATCGAGCtgaagggcatcgacttCAAGGAGGACGGCAA | 116 | 13.5 | 13.5
25 | 497 | TCAAGATCCGCCACAacatcgaggacggcagCGTGCAGCTCGCCGA | 117 | 13.5 | 13.5
26 | 23 | TGTTCACCGGGGTGGTgcccatcctggtcgagCTGGACGGCGACGTAA | 118 | 14.5 | 14.5
27 | 38 | TGCCCATCCTGGTCGAgctggacggcgacgtaAACGGCCACAAGTTCA | 119 | 14.5 | 14.5
28 | 89 | TGTCCGGCGAGGGCGAgggcgatgccacctacGGCAAGCTGACCCTGA | 120 | 14.5 | 14.5
29 | 140 | TCATCTGCACCACCGGcaagctgcccgtgcccTGGCCCACCCTCGTGA | 121 | 14.5 | 14.5
30 | 452 | TCTATATCATGGCCGAcaagcagaagaacggcATCAAGGTGAACTTCA | 122 | 14.5 | 14.5
31 | 199 | TACGGCGTGCAGTGCTTcagccgctaccccgacCACATGAAGCAGCACGA | 123 | 15.5 | 15.5
32 | 223 | TACCCCGACCACATGAAgcagcacgacttcttcAAGTCCGCCATGCCCGA | 124 | 15.5 | 15.5
33 | 259 | TCCGCCATGCCCGAAGGctacgtccaggagcgcACCATCTTCTTCAAGGA | 125 | 15.5 | 15.5
34 | 391 | TTCAAGGAGGACGGCAAcatcctggggcacaagCTGGAGTACAACTACAA | 126 | 15.5 | 15.5
35 | 430 | TACAACTACAACAGCCAcaacgtctatatcatgGCCGACAAGCAGAAGAA | 127 | 15.5 | 15.5
36 | 26 | TCACCGGGGTGGTGCCCAtcctggtcgagctggaCGGCGACGTAAACGGCCA | 128 | 16.5 | 16.5
37 | 68 | TAAACGGCCACAAGTTCAgcgtgtccggcgagggCGAGGGCGATGCCACCTA | 129 | 16.5 | 16.5
38 | 206 | TGCAGTGCTTCAGCCGCTaccccgaccacatgaaGCAGCACGACTTCTTCAA | 130 | 16.5 | 16.5
39 | 83 | TCAGCGTGTCCGGCGAGGGcgagggcgatgccaccTACGGCAAGCTGACCCTGA | 131 | 17.5 | 17.5
40 | 134 | TGAAGTTCATCTGCACCACcggcaagctgcccgtgCCCTGGCCCACCCTCGTGA | 132 | 17.5 | 17.5
41 | 182 | TCGTGACCACCCTGACCTAcggcgtgcagtgcttcAGCCGCTACCCCGACCACA | 133 | 17.5 | 17.5
42 | 458 | TCATGGCCGACAAGCAGAAgaacggcatcaaggtgAACTTCAAGATCCGCCACA | 134 | 17.5 | 17.5
43 | 25 | TTCACCGGGGTGGTGCCCATcctggtcgagctggacGGCGACGTAAACGGCCACAA | 135 | 18.5 | 18.5
44 | 145 | TGCACCACCGGCAAGCTGCCcgtgccctggcccaccCTCGTGACCACCCTGACCTA | 136 | 18.5 | 18.5
45 | 253 | TTCAAGTCCGCCATGCCCGAaggctacgtccaggagCGCACCATCTTCTTCAAGGA | 137 | 18.5 | 18.5
46 | 454 | TATATCATGGCCGACAAGCAgaagaacggcatcaagGTGAACTTCAAGATCCGCCA | 138 | 18.5 | 18.5
47 | 139 | TTCATCTGCACCACCGGCAAGctgcccgtgccctggcCCACCCTCGTGACCACCCTGA | 139 | 19.5 | 19.5
48 | 338 | TGAAGTTCGAGGGCGACACCCtggtgaaccgcatcgaGCTGAAGGGCATCGACTTCAA | 140 | 19.5 | 19.5









Each of the 48 TALEN pairs was tested in human cells for its ability to disrupt the coding sequence of a chromosomally integrated EGFP reporter gene. In this assay, NHEJ-mediated repair of TALEN-induced breaks within the EGFP coding sequence led to loss of EGFP expression, which was quantitatively assessed using flow cytometry 2 and 5 days following transfection. (To ensure that activities of each active TALEN pair could be detected, we only targeted sites located at or upstream of nucleotide position 503 in the gene, a position we had previously shown would disrupt EGFP function when mutated with a zinc finger nuclease (ZFN) (Maeder et al., 2008, Mol. Cell 31:294-301).) Strikingly, all 48 TALEN pairs showed significant EGFP gene-disruption activities in this assay (FIG. 19A). The net percentage of EGFP-disrupted cells induced by TALENs on day 2 post-transfection ranged from 9.4% to 68.0%, levels comparable to the percentage disruption observed with four EGFP-targeted ZFN pairs originally made by the Oligomerized Pool Engineering (OPEN) method (FIG. 19A). These results demonstrate that TALENs containing as few as 8.5 TALE repeats possess significant nuclease activities and provide a large-scale demonstration of the robustness of TALENs in human cells.


Interestingly, re-quantification of the percentage of EGFP-negative cells at day 5 post-transfection revealed that cells expressing shorter-length TALENs (such as those composed of 8.5 to 10.5 repeats) showed significant reductions in the percentage of EGFP-disrupted cells whereas those expressing longer TALENs did not (FIGS. 19A-B and 20A). One potential explanation for this effect is cellular toxicity associated with expression of shorter-length TALENs. Consistent with this hypothesis, in cells transfected with plasmids encoding shorter-length TALENs, greater reductions in the percentage of tdTomato-positive cells were observed from day 2 to day 5 post-transfection (FIG. 20D) (a tdTomato-encoding plasmid was co-transfected together with the TALEN expression plasmids on day 0). Taken together, our results suggest that although shorter-length TALENs are as active as longer-length TALENs, the former can cause greater cytotoxicity in human cells.


Our EGFP experiments also provided an opportunity to assess four of five computationally-derived design guidelines (Cermak et al., 2011, Nucleic Acids Res., 39:e82). The guidelines proposed by Cermak et al. are as follows:


1. The nucleotide just 5′ to the first nucleotide of the half-site should be a thymine.


2. The first nucleotide of the half-site should not be a thymine.


3. The second nucleotide of the half-site should not be an adenosine.


4. The 3′ most nucleotide in the target half-site should be a thymine.


5. The composition of each nucleotide within the target half-site should not vary from the observed percentage composition of naturally occurring binding sites by more than 2 standard deviations. The percentage composition of all naturally occurring TALE binding sites is: A=31±16%, C=37±13%, G=9±8%, T=22±10%. Hence, the nucleotide composition of potential TALE binding sites should be: A=0% to 63%, C=11% to 63%, G=0% to 25% and T=2% to 42%.


These guidelines have been implemented in the TALE-NT webserver (boglabx.plp.iastate.edu/TALENT/TALENT/) to assist users in identifying potential TALEN target sites. Each of the 48 sequences we targeted in EGFP failed to meet one or more of these guidelines (note, however, that all of our sites did meet the requirement for a 5′ T). The ˜100% success rate observed for these 48 sites demonstrates that TALENs can be readily obtained for target sequences that do not follow these guidelines. In addition, for each of the four design guidelines, we did not find any statistically significant correlation between guideline violation and the level of TALEN-induced mutagenesis on either day 2 or day 5 post-transfection. We also failed to find a significant correlation between the total number of guideline violations and the level of mutagenic TALEN activity. Thus, our results show that failure to meet four of the five previously described design guidelines when identifying potential TALEN target sites does not appear to adversely affect success rates or nuclease efficiencies.
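The five guidelines listed above can be expressed as a small checker. The following is a minimal sketch, not the TALE-NT implementation: the function names are hypothetical, the input is assumed to be a half-site written 5′ to 3′ with the preceding 5′ nucleotide as its first character, and the composition test of guideline 5 is assumed to be computed over the half-site without that 5′ nucleotide. The allowed ranges are derived from the quoted means and standard deviations (mean ± 2 SD, clipped at 0%).

```python
from collections import Counter

# Mean and standard deviation (in percent) of each base in naturally
# occurring TALE binding sites, as quoted in guideline 5.
NATURAL_COMPOSITION = {"A": (31, 16), "C": (37, 13), "G": (9, 8), "T": (22, 10)}


def allowed_ranges(n_sd=2):
    """Return {base: (low, high)} in percent: mean +/- n_sd * SD, floored at 0."""
    return {
        base: (max(0, mean - n_sd * sd), mean + n_sd * sd)
        for base, (mean, sd) in NATURAL_COMPOSITION.items()
    }


def guideline_violations(site):
    """List the guideline numbers (1-5) that a half-site violates.

    `site` is assumed to include the nucleotide immediately 5' of the
    half-site as its first character (an assumption of this sketch).
    """
    site = site.upper()
    upstream, half_site = site[0], site[1:]
    violations = []
    if upstream != "T":            # guideline 1: 5' nucleotide should be T
        violations.append(1)
    if half_site[0] == "T":        # guideline 2: first nucleotide should not be T
        violations.append(2)
    if half_site[1] == "A":        # guideline 3: second nucleotide should not be A
        violations.append(3)
    if half_site[-1] != "T":       # guideline 4: 3'-most nucleotide should be T
        violations.append(4)
    counts = Counter(half_site)
    for base, (low, high) in allowed_ranges().items():
        percent = 100 * counts[base] / len(half_site)
        if not low <= percent <= high:   # guideline 5: composition within 2 SD
            violations.append(5)
            break
    return violations


# Left half-site of one of the EGFP-targeted TALEN pairs (11.5 repeats).
print(guideline_violations("TCAGCCGCTACCC"))
```

For the example site, the second nucleotide is an adenosine and the 3′-most nucleotide is not a thymine, so the sketch reports guidelines 3 and 4 as violated while the 5′ T requirement is met.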


Example 7. High-Throughput Alteration of Endogenous Human Genes Using Assembled TALENs

Having established the robustness of the TALEN platform with a chromosomally integrated reporter gene, it was next determined whether this high success rate would also be observed with endogenous genes in human cells. To test this, the assembly method described in Example 5 was used to engineer TALEN pairs targeted to 96 different human genes: 78 genes implicated in human cancer (Vogelstein and Kinzler, 2004, Nat. Med., 10:789-799) and 18 genes involved in epigenetic regulation of gene expression (Table 7). For each gene, a TALEN pair was designed to cleave near the amino-terminal end of the protein coding sequence, although in a small number of cases the presence of repetitive sequences led us to target alternate sites in neighboring downstream exons or introns (Table 7). Guided by the results with the EGFP TALENs, TALENs composed of 14.5, 15.5, or 16.5 repeats were constructed that cleaved sites with 16, 17, 18, 19 or 21 bp spacer sequences. All of the target sites had a T at the 5′ end of each half-site.









TABLE 7

Endogenous human gene sequences targeted by 96 pairs of TALENs

| Target gene name | % NHEJ | Target site (half-sites in CAPS, spacer in lowercase, ATG underlined) | SEQ ID NO: | Length of LEFT half site (include 5′ T) | Length of spacer | Length of RIGHT half site (include 5′ T) | Gene Type |
|---|---|---|---|---|---|---|---|
| ABL1 | 22.5 ± 7.1 | TACCTATTATTACTTTATggggcagcagcctggaaAAGTACTTGGGGACCAA | 141 | 16.5 | 17 | 15.5 | Cancer |
| AKT2 | 14.1 ± 7.3 | TGTGTCTTGGGATGAGTGggtcagtgttctggtgCTCACAGGATGGCTGGCA | 142 | 16.5 | 16 | 16.5 | Cancer |
| ALK | 12.7 ± 2.9 | TCCTGTGGCTCCTGCCGCtgctgctttccacggcAGCTGTGGGCTCCGGGA | 143 | 16.5 | 16 | 15.5 | Cancer |
| APC | 48.8 ± 9.8 | TATGTACGCCTCCCTGGGctcgggtccggtcgccCCTTTGCCCGCTTCTGTA | 144 | 16.5 | 16 | 16.5 | Cancer |
| ATM | 35.5 ± 15.6 | TGAATTGGGATGCTGTTTttaggtattctattcaaaTTTATTTTACTGTCTTTA | 145 | 16.5 | 18 | 16.5 | Cancer |
| AXIN2 | 2.5 ± 0.6 | TCCCTCACCATGAGTAGCgctatgttggtgacttGCCTCCCGGACCCCAGCA | 146 | 16.5 | 16 | 16.5 | Cancer |
| BAX | 14.7 ± 11.6 | TGTGCGATCTCCAAGCACtgaggggcagaaactcCCGGATCGGGCGCTGCCA | 147 | 16.5 | 16 | 16.5 | Cancer |
| BCL6 | 14.9 ± 5.9 | TTTTCAAGTGAAGACAAAatggcctcgccggctgACAGCTGTATCCAGTTCA | 148 | 16.5 | 16 | 16.5 | Cancer |
| BMPR1A | 50.4 ± 16.4 | TACAATTGAACAATGCCTcagctatacatttacatCAGATTATTGGGAGCCTA | 149 | 16.5 | 17 | 16.5 | Cancer |
| BRCA1 | 44.5 ± 15.5 | TCCGAAGCTGACAGATGGgtattctttgacggggGGTAGGGGCGGAACCTGA | 150 | 16.5 | 16 | 16.5 | Cancer |
| BRCA2 | 41.6 ± 10.5 | TTAGACTTAGGTAAGTAAtgcaatatggtagactGGGGAGAACTACAAACTA | 151 | 16.5 | 16 | 16.5 | Cancer |
| CBX3 | 35.2 ± 22.6 | TCTGCAATAAAAAATGGCctccaacaaaactacaTTGGTAAGTTAATGAAAA | 152 | 16.5 | 16 | 16.5 | Epigenetic |
| CBX8 | 13.5 ± 3.4 | TGGAGCTTTCAGCGGTGGgggagcgggtgttcgcgGCCGAAGCCCTCCTGAA | 153 | 16.5 | 17 | 15.5 | Epigenetic |
| CCND1 | 40.5 ± 2.2 | TGGAACACCAGCTCCTGTgctgcgaagtggaaaccatCCGCCGCGCGTACCCCGA | 154 | 16.5 | 19 | 16.5 | Cancer |
| CDC73 | 36.3 ± 7.7 | TGCTTAGCGTCCTGCGACagtacaacatccagaaGAAGGAGATTGTGGTGAA | 155 | 16.5 | 16 | 16.5 | Cancer |
| CDH1 | none | TGCTGCAGGTACCCCGGAtcccctgacttgcgagGGACGCATTCGGGCCGCA | 156 | 16.5 | 16 | 16.5 | Cancer |
| CDK4 | 21.5 ± 17.4 | TCCCTTGATCTGAGAAtggctacctctcgataTGAGCCAGTGGCTGAAA | 157 | 14.5 | 16 | 15.5 | Cancer |
| CHD4 | 9.6 ± 0.1 | TGGCGTCGGGCCTGGGCtccccgtccccctgctcGGCGGGCAGTGAGGAGGA | 158 | 15.5 | 17 | 16.5 | Epigenetic |
| CHD7 | 11.4 ± 2.7 | TGTGTTGGAAGAAGATGGcagatccaggaatgatGAGTCTTTTTGGCGAGGA | 159 | 16.5 | 16 | 16.5 | Epigenetic |
| CTNNB1 | 26.0 ± 8.1 | TCCAGCGTGGACAATGGctactcaaggtttgtgTCATTAAATCTTTAGTTA | 160 | 15.5 | 16 | 16.5 | Cancer |
| CYLD | 24.7 ± 2.3 | TAATATCACAATGAGTTCaggcttatggagccaagaAAAAGTCACTTCACCCTA | 161 | 16.5 | 18 | 16.5 | Cancer |
| DDB2 | 15.8 ± 7.2 | TCACACGGAGGACGCGatggctcccaagaaacGCCCAGAAACCCAGAAGA | 162 | 14.5 | 16 | 16.5 | Cancer |
| ERCC2 | 55.8 ± 12.7 | TCCGGCCGGCGCCATGAagtgagaagggggctgGGGGTCGCGCTCGCTA | 163 | 15.5 | 16 | 14.5 | Cancer |
| ERCC5 | none | TCCGGGATCGCCATGGGAactcaatagaaaatcctcaTCTTCTCACTTTGTTTCA | 164 | 16.5 | 19 | 16.5 | Cancer |
| EWSR1 | 14.3 ± 8.2 | TGGCGTCCACGGGTGAGTatggtggaactgcggtcGCGCCGGCGGTAGCCGGA | 165 | 16.5 | 17 | 16.5 | Cancer |
| EXT1 | 9.5 ± 3.0 | TGACCCAGGCAGGACACAtgcaggccaaaaaacgcTATTTCATCCTGCTCTCA | 166 | 16.5 | 17 | 16.5 | Cancer |
| EXT2 | none | TTCCTCCCAGGGGGATGTcctgcgcctcagggtcCGGTGGTGGCCTGCGGCA | 167 | 16.5 | 16 | 16.5 | Cancer |
| EZH2 | 41.3 ± 2.6 | TGCTTTTAGAATAATCATgggccagactgggaagAAATCTGAGAAGGGACCA | 168 | 16.5 | 16 | 16.5 | Epigenetic |
| FANCA | 9.7 ± 5.0 | TAGGCGCCAAGGCCATGTccgactcgtgggtcccGAACTCCGCCTCGGGCCA | 169 | 16.5 | 16 | 16.5 | Cancer |
| FANCC | 23.7 ± 17.8 | TGAAGGGACATCACCTTTtcgctttttccaagatgGCTCAAGATTCAGTAGA | 170 | 16.5 | 17 | 15.5 | Cancer |
| FANCE | none | TGCCCCGGCATGGCGACAccggacgcggggctcccTGGGGCTGAGGGCGTGGA | 171 | 16.5 | 17 | 16.5 | Cancer |
| FANCF | 46.0 ± 7.7 | TTCGCGCACCTCATGGaatcccttctgcagcaCCTGGATCGCTTTTCCGA | 172 | 14.5 | 16 | 16.5 | Cancer |
| FANCG | 26.9 ± 16.2 | TCGGCCACCATGTCCCgccagaccacctctgtGGGCTCCAGCTGCCTGGA | 173 | 14.5 | 16 | 16.5 | Cancer |
| FES | 12.6 ± 10.6 | TCCCCAGAACAGCACTATgggcttctcttccgagctGTGCAGCCCCCAGGGCCA | 174 | 16.5 | 18 | 16.5 | Cancer |
| FGFR1 | 17.4 ± 6.2 | TCTGCTCCCCACCGAGGAcctctgcatgcaggcaTGAATCCCAGGAGCCTA | 175 | 16.5 | 16 | 15.5 | Cancer |
| FH | 20.9 ± 11.8 | TGTACCGAGCACTTCGGCtcctcgcgcgctcgcgtCCCCTCGTGCGGGCTCCA | 176 | 16.5 | 17 | 16.5 | Cancer |
| FLCN | 11.1 ± 4.4 | TCTCCAAGGCACCATGAAtgccatcgtggctctctgCCACTTCTGCGAGCTCCA | 177 | 16.5 | 18 | 16.5 | Cancer |
| FLT3 | none | TCCGGAGGCCATGCCGGCgttggcgcgcgacggcggccaGCTGCCGCTGCTCGGTA | 178 | 16.5 | 21 | 15.5 | Cancer |
| FLT4 | 9.9 ± 5.0 | TGCAGCGGGGCGCCGCGCtgtgcctgcgactgtggctCTGCCTGGGACTCCTGGA | 179 | 16.5 | 19 | 16.5 | Cancer |
| FOXO1 | 8.5 ± 1.1 | TCACCATGGCCGAGGCGcctcaggtggtggagaTCGACCCGGACTTCGA | 180 | 15.5 | 16 | 14.5 | Cancer |
| FOXO3 | 7.3 ± 2.3 | TCTCCGCTCGAAGTGGAGctggacccggagttcgagCCCCAGAGCCGTCCGCGA | 181 | 16.5 | 18 | 16.5 | Cancer |
| GLI1 | 21.5 ± 12.4 | TCCTCTGAGACGCCATGTtcaactcgatgaccccACCACCAATCAGTAGCTA | 182 | 16.5 | 16 | 16.5 | Cancer |
| HDAC1 | 10.8 ± 3.0 | TGGCGCAGACGCAGGGCacccggaggaaagtctgTTACTACTACGACGGTGA | 183 | 15.5 | 17 | 16.5 | Epigenetic |
| HDAC2 | 4.2 ± 0.9 | TGCGCTCACCTCCCTGCGgcctcctgaggtggtttgGTGGCCCCCTCCTCGCGA | 184 | 16.5 | 18 | 16.5 | Epigenetic |
| HDAC6 | 21.4 ± 2.1 | TCCTCAACTATGACCTCAaccggccaggattccaCCACAACCAGGCAGCGAA | 185 | 16.5 | 16 | 16.5 | Epigenetic |
| HMGA2 | 3.0 ± 1.5 | TGAGCGCACGCGGTGAGGgcgcggggcagccgtcCACTTCAGCCCAGGGACA | 186 | 16.5 | 16 | 16.5 | Cancer |
| HOXA13 | 7.6 ± 3.1 | TCCGTGCTCCTCCACCCCcgctggatcgagcccacCGTCATGTTTCTCTACGA | 187 | 16.5 | 17 | 16.5 | Cancer |
| HOXA9 | 6.4 ± 2.7 | TGGGCACGGTGATGGCcaccactggggccctgGGCAACTACTACGTGGA | 188 | 14.5 | 16 | 15.5 | Cancer |
| HOXC13 | 10.5 ± 0.3 | TCCAGCAGATCATGTCATgacgacttcgctgctcctGCATCCACGCTGGCCGGA | 189 | 16.5 | 18 | 16.5 | Cancer |
| HOXD11 | none | TTGACGAGTGCGGCCAGagcgcagccagcatgtaCCTGCCGGGCTGCGCCTA | 190 | 15.5 | 17 | 16.5 | Cancer |
| HOXD13 | none | TGCGGGCAGACGGCGGGGgcgccggtggcgccccgGCCTCTTCCTCCTCCTCA | 191 | 16.5 | 17 | 16.5 | Cancer |
| JAK2 | 44.9 ± 16.9 | TCTGAAAAAGACTCTGCAtgggaatggcctgcctTACGATGACAGAAATGGA | 192 | 16.5 | 16 | 16.5 | Cancer |
| KIT | none | TACCGCGATGAGAGGCGCtcgcggcgcctgggattttCTCTGCGTTCTGCTCCTA | 193 | 16.5 | 19 | 16.5 | Cancer |
| KRAS | 9.4 ± 0.9 | TGAAAATGACTGAATATAaacttgtggtagttggaGCTGGTGGCGTAGGCAA | 194 | 16.5 | 17 | 15.5 | Cancer |
| MAP2K4 | 11.9 ± 7.1 | TAGGGTCCCCGGCGCCAGgccacccggccgtcagCAGCATGCAGGGTAAGGA | 195 | 16.5 | 16 | 16.5 | Cancer |
| MDM2 | 33.0 ± 20.2 | TCCAAGCGCGAAAACCCCggatggtgaggagcaggTACTGGCCCGGCAGCGA | 196 | 16.5 | 17 | 15.5 | Cancer |
| MET | 40.4 ± 10.7 | TTATTATTACATGGCTTTgccttactgaggcttcATCTTGTCCTCTGGTCCA | 197 | 16.5 | 16 | 16.5 | Cancer |
| MLH1 | 44.9 ± 6.3 | TCTGGCGCCAAAATGTCGttcgtggcaggggttaTTCGGCGGCTGGACGAGA | 198 | 16.5 | 16 | 16.5 | Cancer |
| MSH2 | 27.5 ± 10.4 | TGAGGAGGTTTCGACATGgcggtgcagccgaaggAGACGCTGCAGTTGGAGA | 199 | 16.5 | 16 | 16.5 | Cancer |
| MUTYH | 24.9 ± 8.4 | TCACTGTCGGCGGCCATGacaccgctcgtctcccgcCTGAGTCGTCTGTGGGTA | 200 | 16.5 | 18 | 16.5 | Cancer |
| MYC | 13.4 ± 4.0 | TGCTTAGACGCTGGATTTttttcgggtagtggaaAACCAGGTAAGCACCGAA | 201 | 16.5 | 16 | 16.5 | Cancer |
| MYCL1 | 17.3 ± 0.6 | TCCCGCAGGGAGCGGACAtggactacgactcgtaCCAGCACTATTTCTACGA | 202 | 16.5 | 16 | 16.5 | Cancer |
| MYCN | 16.3 ± 11.6 | TGCCGAGCTGCTCCACgtccaccatgccgggcATGATCTGCAAGAACCCA | 203 | 14.5 | 16 | 16.5 | Cancer |
| NBN | 46.3 ± 15.5 | TGAGGAGCCGGACCGAtgtggaaactgctgccCGCCGCGGGCCCGGCA | 204 | 14.5 | 16 | 14.5 | Cancer |
| NCOR1 | 29.6 ± 13.1 | TCTTTACTGATAATGTCAagttcatgttaccctcCCAACCAAGGAGCATTCA | 205 | 16.5 | 16 | 16.5 | Epigenetic |
| NCOR2 | 3.3 ± 0.6 | TGGAGGGCCACTGAGCcccgctacccgccccaCAGCCTTTCCTACCCA | 206 | 14.5 | 16 | 14.5 | Epigenetic |
| NTRK1 | none | TCGGCGCATGAAGGAGGTactcctcattttcgttCTCTCTCTCTGTGCCCCA | 207 | 16.5 | 16 | 16.5 | Cancer |
| PDGFRA | 16.0 ± 4.3 | TTGCGCTCGGGGCGGCCAtgtcggccggcgaggtCGAGCGCCTAGTGTCGGA | 208 | 16.5 | 16 | 16.5 | Cancer |
| PDGFRB | 16.0 ± 3.2 | TCTGCAGGACACCATGCGgcttccgggtgcgatgCCAGCTCTGGCCCTCAAA | 209 | 16.5 | 16 | 16.5 | Cancer |
| PHF8 | 22.2 ± 6.1 | TGAGTACTCCGCCTCTACcccggctgaagcccgcCCCCGCCGCCACCTATTA | 210 | 16.5 | 16 | 16.5 | Epigenetic |
| PMS2 | 26.9 ± 9.5 | TCGGGTGTTGCATCCATGgagcgagctgagagctcgAGGTGAGCGGGGCTCGCA | 211 | 16.5 | 18 | 16.5 | Cancer |
| PTCH1 | 27.5 ± 15.9 | TGGAACTGCTTAATAGaaacaggcttgtaattGTGAGTCCGCGCTGCA | 212 | 14.5 | 16 | 14.5 | Cancer |
| PTEN | 31.5 ± 11.7 | TCCCAGACATGACAGCCatcatcaaagagatcgTTAGCAGAAACAAAAGGA | 213 | 15.5 | 16 | 16.5 | Cancer |
| RARA | 13.4 ± 6.1 | TGGCATGGCCAGCAACAGcagctcctgcccgacacCTGGGGGCGGGCACCTCA | 214 | 16.5 | 17 | 16.5 | Cancer |
| RBBP5 | 15.7 ± 9.5 | TGCTGGGTGAGAAGGGCtgtggctgcgttttagaGAAGCGTTGGGTACTGGA | 215 | 15.5 | 17 | 16.5 | Epigenetic |
| RECQL4 | 22.1 ± 16.2 | TGCGGGACGTGCGGGAGCggctgcaggcgtgggaGCGCGCGTTCCGACGGCA | 216 | 16.5 | 16 | 16.5 | Cancer |
| REST | none | TCAGAATACAGTTATGGCcacccaggtaatggggCAGTCTTCTGGAGGAGGA | 217 | 16.5 | 16 | 16.5 | Epigenetic |
| RET | 5.4 ± 1.8 | TGAGTTCTGCCGGCCGCCggctcccgcaggggccaGGGCGAAGTTGGCGCCGA | 218 | 16.5 | 17 | 16.5 | Cancer |
| RNF2 | none | TTCTTTATTTCCAGCAATgtctcaggctgtgcagACAAACGGAACTCAACCA | 219 | 16.5 | 16 | 16.5 | Epigenetic |
| RUNX1 | 25.1 ± 6.9 | TTCAGGAGGAAGCGATGGcttcagacagcatattTGAGTCATTTCCTTCGTA | 220 | 16.5 | 16 | 16.5 | Epigenetic |
| SDHB | 36.4 ± 19.2 | TCTCCTTGAGGCGCCGGTtgccggccacaaccctTGGCGGAGCCTGCCTGCA | 221 | 16.5 | 16 | 16.5 | Cancer |
| SDHC | 13.7 ± 3.4 | TGTTGCTGAGGTGACTTCagtgggactgggagttggtGCCTGCGGCCCTCCGGA | 222 | 16.5 | 19 | 15.5 | Cancer |
| SDHD | 42.0 ± 7.8 | TCAGGAACGAGATGGCGGttctctggaggctgagtGCCGTTTGCGGTGCCCTA | 223 | 16.5 | 17 | 16.5 | Cancer |
| SETDB1 | 33.5 ± 6.1 | TGCAGAGGACAAAAGCATgtcttcccttcctgggTGCATTGGTTTGGATGCA | 224 | 16.5 | 16 | 16.5 | Epigenetic |
| SIRT6 | 43.3 ± 3.1 | TTACGCGGCGGGGCTGTCgccgtacgcggacaagggCAAGTGCGGCCTCCCGGA | 225 | 16.5 | 18 | 16.5 | Epigenetic |
| SMAD2 | 3.9 ± 1.6 | TTTGGTAAGAACATGTCGtccatcttgccattcacGCCGCCAGTTGTGAAGA | 226 | 16.5 | 17 | 15.5 | Cancer |
| SS18 | 31.4 ± 7.9 | TGGTGACGGCGGCAACATgtctgtggctttcgcggCCCCGAGGCAGCGAGGCA | 227 | 16.5 | 17 | 16.5 | Cancer |
| SUZ12 | 13.1 ± 0.4 | TGGCGCCTCAGAAGCAcggcggtgggggagggGGCGGCTCGGGGCCCA | 228 | 14.5 | 16 | 14.5 | Epigenetic |
| TFE3 | 17.3 ± 2.4 | TCATGTCTCATGCGGCCGaaccagctcgggatggCGTAGAGGCCAGCGCGGA | 229 | 16.5 | 16 | 16.5 | Cancer |
| TGFBR2 | none | TCGGGGGCTGCTCAGGGGcctgtggccgctgcacaTCGTCCTGTGGACGCGTA | 230 | 16.5 | 17 | 16.5 | Cancer |
| TLX3 | none | TTCCGCCCGCCCAGGATGgaggcgcccgccagcgcGCAGACCCCGCACCCGCA | 231 | 16.5 | 17 | 16.5 | Cancer |
| TP53 | 19.9 ± 3.6 | TTGCCGTCCCAAGCAATGgatgatttgatgctgtcCCCGGACGATATTGAACA | 232 | 16.5 | 17 | 16.5 | Cancer |
| TSC2 | 30.7 ± 22.7 | TCCTGGTCCACCATGGCcaaaccaacaagcaaagATTCAGGCTTGAAGGAGA | 233 | 15.5 | 17 | 16.5 | Cancer |
| VHL | 19.4 ± 1.1 | TCTGGATCGCGGAGGGAAtgccccggagggcggaGAACTGGGACGAGGCCGA | 234 | 16.5 | 16 | 16.5 | Cancer |
| XPA | 12.9 ± 2.2 | TGGGCCAGAGATGGCGGCggccgacggggctttgCCGGAGGCGGCGGCTTTA | 235 | 16.5 | 16 | 16.5 | Cancer |
| XPC | 31.4 ± 4.2 | TGCCCAGACAAGCAACATggctcggaaacgcgcggccGGCGGGGAGCCGCGGGGA | 236 | 16.5 | 19 | 16.5 | Cancer |









The abilities of the 96 TALEN pairs to introduce NHEJ-mediated insertion or deletion (indel) mutations at their intended endogenous gene targets were tested in cultured human cells using a slightly modified version of a previously described T7 Endonuclease I (T7EI) assay (Mussolino et al., 2011, Nucleic Acids Res., 39:9283-93; Kim et al., 2009, Genome Res., 19:1279-88). With this T7EI assay, 83 of the 96 TALEN pairs showed evidence of NHEJ-mediated mutagenesis at their intended endogenous gene target sites, an overall success rate of ˜86% (Table 7). The efficiencies of TALEN-induced mutagenesis we observed ranged from 2.5% to 55.8% with a mean of 22.5%. To provide molecular confirmation of the mutations we identified by T7EI assay, we sequenced target loci for 11 different TALEN pairs that induced varying efficiencies of mutagenesis (FIGS. 21A-D). As expected, this sequencing revealed indels at the expected target gene sites with frequencies similar to those determined by the T7EI assays.
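The case convention of Table 7 (half-sites in capitals, spacer in lowercase) lends itself to a short parsing sketch. The function names below are hypothetical. The relation "listed length = nucleotides in the half-site minus 1.5" is an inference from the table entries (the 5′ T is not counted as a repeat and the last nucleotide is contacted by the 0.5 repeat), not a formula stated in the text. The right half-site is reverse-complemented so that it reads 5′ to 3′ from the right monomer's point of view.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_target_site(site):
    """Split a Table 7 style target site into left half-site, spacer, right half-site.

    Assumes exactly one uppercase block, one lowercase block, and one
    uppercase block, in that order.
    """
    match = re.fullmatch(r"([ACGT]+)([acgt]+)([ACGT]+)", site)
    if match is None:
        raise ValueError("expected CAPS half-site, lowercase spacer, CAPS half-site")
    left, spacer, right_top_strand = match.groups()
    right = reverse_complement(right_top_strand)  # as read by the right monomer
    return {
        "left": left,
        "spacer": spacer,
        "right": right,
        # Inferred relation: repeats = nucleotides (including 5' T) - 1.5
        "left_repeats": len(left) - 1.5,
        "spacer_bp": len(spacer),
        "right_repeats": len(right) - 1.5,
        "both_start_with_T": left.startswith("T") and right.startswith("T"),
    }


# APC target site from Table 7.
apc = parse_target_site("TATGTACGCCTCCCTGGGctcgggtccggtcgccCCTTTGCCCGCTTCTGTA")
print(apc["left_repeats"], apc["spacer_bp"], apc["right_repeats"], apc["right"])
```

For the APC entry, this reproduces the listed values (16.5, 16, 16.5), and the reverse-complemented right half-site matches the target site given in the APC_Right monomer header.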


The nucleotide and amino acid sequences for 14 of the 96 pairs of TALENs targeted to the endogenous human genes in Table 7 are presented below. Each TALEN monomer is presented as follows:


(1) A header with information presented in the format: Gene target_Left or Right monomer_Target DNA site shown 5′ to 3′_TALE repeat monomers and 0.5 repeat plasmid used with code as shown in Table 4.


(2) DNA sequence encoding the N-terminal part of the TALE required for activity, the TALE repeat array, the C-terminal 0.5 TALE repeat domain, and the C-terminal 63 amino acids required for activity, shown from a NheI site to a BamHI site. This sequence is present in the “Vector Sequence” plasmid shown below, taking the place of the underlined X's flanked by NheI and BamHI sites.


(3) Amino acid sequence of the N-terminal part of the TALE required for activity, the TALE repeat array, the C-terminal 0.5 TALE repeat domain, and the C-terminal 63 amino acids required for activity, shown from the start of translation (located just 3′ to the NheI site and including an N-terminal FLAG epitope tag) to a Gly-Ser sequence (encoded by the BamHI site) that serves as a linker from the TALE repeat array to the FokI cleavage domain.
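As an illustration of how the amino acid sequences in item (3) relate to the target DNA site in the header of item (1), the sketch below reads the repeat variable di-residues out of a TALE repeat array and translates them into nucleotides. It is a minimal sketch under stated assumptions: the anchoring pattern (two residues between "VAIA[S/N]" and "GG") is inferred from the repeat sequences in this listing, the one-to-one code NI=A, HD=C, NN=G, NG=T is the commonly used simplification (NN is also known to recognize A), and the function names are hypothetical. The example fragment is the first three repeats of the APC left monomer.

```python
import re

# Simplified repeat variable di-residue (RVD) code; an assumption of this sketch.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}

# Two residues sitting between "VAIA[S/N]" and "GG" in each repeat.
RVD_PATTERN = re.compile(r"VAIA[SN]([A-Z]{2})GG")


def rvds(protein):
    """Return the RVDs found in a TALE repeat array, N- to C-terminal."""
    return RVD_PATTERN.findall(protein)


def predicted_site(protein):
    """Translate RVDs to the predicted DNA sequence (5' T not included)."""
    return "".join(RVD_TO_BASE.get(rvd, "N") for rvd in rvds(protein))


# First three repeats of the APC left monomer amino acid sequence.
FRAGMENT = (
    "LTPDQVVAIASNIGGKQALETVQRLLPVLCQDHG"
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG"
    "LTPDQVVAIANNNGGKQALETVQRLLPVLCQAHG"
)

print(rvds(FRAGMENT), predicted_site(FRAGMENT))
```

The three repeats decode to ATG, which corresponds to the three nucleotides following the 5′ T of the APC left target site TATGTACGCCTCCCTGGG.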









VECTOR SEQUENCE


SEQ ID NO: 244


GACGGATCGGGAGATCTCCCGATCCCCTATGGTCGACTCTCAGTACAATC





TGCTCTGATGCCGCATAGTTAAGCCAGTATCTGCTCCCTGCTTGTGTGTT





GGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAG





GCTTGACCGACAATTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCG





CTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTATTGAC





TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATA





TGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG





CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGT





AACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGACTATTTACGGT





AAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCC





CCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTA





CATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCA





TCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGA





TAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAA





TGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTA





ACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAG





GTCTATATAAGCAGAGCTCTCTGGCTAACTAGAGAACCCACTGCTTACTG





GCTTATCGAAATTAATACGACTCACTATAGGGAGACCCAAGCTGGCTAGC







XXXXXXXXXXGGATCCCAACTAGTCAAAAGTGAACTGGAGGAGAAGAAAT






CTGAACTTCGTCATAAATTGAAATATGTGCCTCATGAATATATTGAATTA





ATTGAAATTGCCAGAAATTCCACTCAGGATAGAATTCTTGAAATGAAGGT





AATGGAATTTTTTATGAAAGTTTATGGATATAGAGGTAAACATTTGGGTG





GATCAAGGAAACCGGACGGAGCAATTTATACTGTCGGATCTCCTATTGAT





TACGGTGTGATCGTGGATACTAAAGCTTATAGCGGAGGTTATAATCTGCC





AATTGGCCAAGCAGATGAAATGCAACGATATGTCGAAGAAAATCAAACAC





GAAACAAACATATCAACCCTAATGAATGGTGGAAAGTCTATCCATCTTCT





GTAACGGAATTTAAGTTTTTATTTGTGAGTGGTCACTTTAAAGGAAACTA





CAAAGCTCAGCTTACACGATTAAATCATATCACTAATTGTAATGGAGCTG





TTCTTAGTGTAGAAGAGCTTTTAATTGGTGGAGAAATGATTAAAGCCGGC





ACATTAACCTTAGAGGAAGTCAGACGGAAATTTAATAACGGCGAGATAAA





CTTTTAAGGGCCCTTCGAAGGTAAGCCTATCCCTAACCCTCTCCTCGGTC





TCGATTCTACGCGTACCGGTCATCATCACCATCACCATTGAGTTTAAACC





CGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAGCCATCTGTTGTTTG





CCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCACTGTCC





TTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCAT





TCTATTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGA





AGACAATAGCAGGCATGCTGGGGATGCGGTGGGCTCTATGGCTTCTGAGG





CGGAAAGAACCAGCTGGGGCTCTAGGGGGTATCCCCACGCGCCCTGTAGC





GGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTAC





ACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTC





TCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGCATCCCT





TTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGA





TTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTC





GCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAA





ACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGG





GATTTTGGGGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAA





AATTTAACGCGAATTAATTCTGTGGAATGTGTGTCAGTTAGGGTGTGGAA





AGTCCCCAGGCTCCCCAGGCAGGCAGAAGTATGCAAAGCATGCATCTCAA





TTAGTCAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAA





GTATGCAAAGCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTA





ACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCC





CCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCTG





CCTCTGAGCTATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGC





TTTTGCAAAAAGCTCCCGGGAGCTTGTATATCCATTTTCGGATCTGATCA





GCACGTGTTGACAATTAATCATCGGCATAGTATATCGGCATAGTATAATA





CGACAAGGTGAGGAACTAAACCATGGCCAAGCCTTTGTCTCAAGAAGAAT





CCACCCTCATTGAAAGAGCAACGGCTACAATCAACAGCATCCCCATCTCT





GAAGACTACAGCGTCGCCAGCGCAGCTCTCTCTAGCGACGGCCGCATCTT





CACTGGTGTCAATGTATATCATTTTACTGGGGGACCTTGTGCAGAACTCG





TGGTGCTGGGCACTGCTGCTGCTGCGGCAGCTGGCAACCTGACTTGTATC





GTCGCGATCGGAAATGAGAACAGGGGCATCTTGAGCCCCTGCGGACGGTG





TCGACAGGTGCTTCTCGATCTGCATCCTGGGATCAAAGCGATAGTGAAGG





ACAGTGATGGACAGCCGACGGCAGTTGGGATTCGTGAATTGCTGCCCTCT





GGTTATGTGTGGGAGGGCTAAGCACTTCGTGGCCGAGGAGCAGGACTGAC





ACGTGCTACGAGATTTCGATTCCACCGCCGCCTTCTATGAAAGGTTGGGC





TTCGGAATCGTTTTCCGGGACGCCGGCTGGATGATCCTCCAGCGCGGGGA





TCTCATGCTGGAGTTCTTCGCCCACCCCAACTTGTTTATTGCAGCTTATA





ATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTT





TTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTA





TCATGTCTGTATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCATGGT





CATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAAC





ATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAG





CTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAA





ACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGC





GGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCG





CTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAA





TACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCA





AAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTT





TTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAA





GTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCC





CCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGG





ATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCAATGCT





CACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGC





TGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAA





CTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAG





CAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACA





GAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATT





TGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTA





GCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTT





TGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTT





GATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAG





GGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTA





AATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTG





GTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCT





GTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACT





ACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCG





AGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCG





GAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAG





TCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAG





TTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGT





CGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTT





ACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCC





GATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGG





CAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCT





GTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCG





ACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATA





GCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAA





CTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCG





TGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGT





GAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACA





CGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCAT





TTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGA





AAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCT





GACGTC
















TALE REPEAT SEQUENCES













SEQ

SEQ




ID

ID


Target
SEQUENCE
NO:
SEQUENCE
NO:





>APC_Left_
GCTAGCaccATGGACTACAAAGACCATGACGG
245.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
246.


TATGTACGCC
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



TCCCTGGG_T
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



AL/006/015
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



/019/025/0
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



26/012/019
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVAI



/022/027/0
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN



15/017/022
GGGGCATGGCTTCACTCATGCGCATATTGTCG

NGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGGG



/027/015/0
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA



19/024/JDS
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIASHDGGKQALET



74/
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR



(‘TATGTACG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP



CCTCCCTGGG’
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC



disclosed
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH



as SEQ ID
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT



NO: 412)
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCGAACATT

IASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

NNGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GKQALETVQRLLPVLCQDHGLTPEQVVAIANNNGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCAAGC

ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD




AATGGGGGTGGCAAACAGGCTCTTGAGACGGT

AVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCTCGAATGGCGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCGAACATTGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCATCCCACGACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGAATAA






CAATGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCACATGACGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGCATGACGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAATGGGGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGAATAACAATGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAACAACAACGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTAATAATAAC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>APC_Right
GCTAGCaccATGGACTACAAAGACCATGACGG
247.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
248.


_TACAGAAGC
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



GGGCAAAGG_
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



TAL/006/01
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



2/016/024/
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



026/011/01
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVAI



9/022/029/
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN



014/019/02
GGGGCATGGCTTCACTCATGCGCATATTGTCG

IGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG



2/026/011/
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA



016/024/JD
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIASNIGGKQALET



S74/
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR



(‘TACAGAAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP



CGGGCAAAGG’
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC



disclosed
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIANNNGGKQALETVQRLLPVLCQAH



as SEQ ID
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT



NO: 413)
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCGAACATT

IASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

NIGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GKQALETVQRLLPVLCQDHGLTPEQVVAIANNNGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCATCC

ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD




CACGACGGTGGCAAACAGGCTCTTGAGACGGT

AVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGAACATTGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAACAACAACGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCGAACATTGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAACATCGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGAATAA






CAATGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAATAATAACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGAATAACAATGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CGAACATTGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAACATCGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCGAACATTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAACAACAACGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTAATAATAAC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>BRCA1_Lef
GCTAGCaccATGGACTACAAAGACCATGACGG
249.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
250.


t_TCCGAAGC
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



TGACAGATGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



_TAL/007/0
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



12/019/021
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



/026/014/0
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI



17/025/029
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN



/011/017/0
GGGGCATGGCTTCACTCATGCGCATATTGTCG

NGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIGG



21/029/011
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA



/020/024/J
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET



DS74/
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR



(‘TCCGAAGC
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIASNGGGKQALETVQRLLP



TGACAGATGG’
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC



disclosed
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH



as SEQ ID
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT



NO: 414)
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASNIGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCACATGAC

IASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

NGGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GKQALETVQRLLPVLCQDHGLTPEQVVAIANNNGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCATCC

ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD




CACGACGGTGGCAAACAGGCTCTTGAGACGGT

AVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCTCCAATATTGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCGAACATTGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAATAATAACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA






TGACGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCGAATGGCGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAACATCGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGCATGACGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCCAATA






TTGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGA






ACAATAATGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAACATCGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCCAACGGTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAACAACAACGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTAATAATAAC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>BRCA1_Rig
GCTAGCaccATGGACTACAAAGACCATGACGG
251.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
252.


ht_TCAGGTT
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



CCGCCCCTAC
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



C_TAL/007/
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



011/019/02
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



4/030/015/
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI



017/022/02
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN



9/012/017/
GGGGCATGGCTTCACTCATGCGCATATTGTCG

NGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG



022/027/01
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASNGGGKQA



5/016/022/
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIASNGGGKQALET



JDS71/
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR



(‘TCAGGTTC
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP



CGCCCCTACC’
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC



disclosed
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH



as SEQ ID
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT



NO: 415)
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCACATGAC

IASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

NIGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GKQALETVQRLLPVLCQDHGLTPEQVVAIASHDGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCAAGC

ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD




AACATCGGTGGCAAACAGGCTCTTGAGACGGT

AVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAACAACAACGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCAAACGGAGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAATGGGGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA






TGACGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCATCCCACGACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGCATGACGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAATGGGGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCGAACATTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAGCCATGATGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCCCACGAC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>BRCA2_Lef
GCTAGCaccATGGACTACAAAGACCATGACGG
253.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
254.


t_TTAGACTT
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



AGGTAAGTAA
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



_TAL/010/0
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



11/019/021
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



/027/015/0
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVAI



20/021/029
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN



/014/020/0
GGGGCATGGCTTCACTCATGCGCATATTGTCG

NGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIGG



21/026/014
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASHDGGKQA



/020/021/J
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIASNGGGKQALET



DS70/
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIASNGGGKQALETVQR



(‘TTAGACTT
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP



AGGTAAGTAA’
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC



disclosed
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIANNNGGKQALETVQRLLPVLCQAH



as SEQ ID
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIASNGGGKQALETVQRLLPVLCQAHGLT



NO: 416)
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASNIGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCAAACGGA

IANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

NGGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GKQALETVQRLLPVLCQDHGLTPEQVVAIASNIGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCAAGC

ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD




AACATCGGTGGCAAACAGGCTCTTGAGACGGT

AVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCTCCAATATTGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCACATGACGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAATGGGGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCCAA






CGGTGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAATAATAACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCCAACGGTGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCCAATA






TTGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CGAACATTGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAATAATAACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCCAACGGTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCTCCAATATTGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCTAACATC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>BRCA2_Rig
GCTAGCaccATGGACTACAAAGACCATGACGG
255.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
256.


ht_TAGTTTG
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



TAGTTCTCCC
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



C_TAL/006/
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



014/020/02
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



5/030/014/
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVAI



020/021/02
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN



9/015/020/
GGGGCATGGCTTCACTCATGCGCATATTGTCG

GGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGGG



022/030/01
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASNGGGKQA



2/017/022/
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET



JDS71/
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIASNGGGKQALETVQR



(‘TAGTTTGT
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP



AGTTCTCCCC’
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC



disclosed
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH



as SEQ ID
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIASNGGGKQALETVQRLLPVLCQAHGLT



NO: 417)
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCGAACATT

IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

HDGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GKQALETVQRLLPVLCQDHGLTPEQVVAIASHDGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCAAAT

ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD




AATAACGGTGGCAAACAGGCTCTTGAGACGGT

AVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCCAACGGTGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCTCGAATGGCGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCAAACGGAGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAATAATAACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCCAA






CGGTGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCCAACGGTGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CAAACGGAGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCATCCCACGACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCGCATGACGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAGCCATGATGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCCCACGAC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>ERCC2_Lef
GCTAGCaccATGGACTACAAAGACCATGACGG
257.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
258.


t_TCCGGCCG
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



GCGCCATGA_
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



TAL/007/01
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



2/019/024/
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



027/012/01
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI



9/024/027/
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN



014/017/02
GGGGCATGGCTTCACTCATGCGCATATTGTCG

NGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG



2/026/015/
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASHDGGKQA



034/JDS70/
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIASHDGGKQALET



(‘TCCGGCCG
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR



GCGCCATGA’
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIANNNGGKQALETVQRLLP



disclosed
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC



as SEQ ID
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIANNNGGKQALETVQRLLPVLCQAH



NO: 418)
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT




ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCACATGAC

IASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

NNGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNIG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GRPALESIVAQLSRPDPALAALTNDHLVALACLGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCATCC

ALDAVKKGLPHAPALIKRTNRRIPERTSHRVAGS




CACGACGGTGGCAAACAGGCTCTTGAGACGGT






TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAACAACAACGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCACATGACGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCATCCCACGACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGAATAA






CAATGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAACAACAACGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCACATGACGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAATAATAACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGCATGACGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CGAACATTGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAATGGGGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGAATAACAATGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTCTGACACCCGAAC






AGGTGGTCGCCATTGCTTCTAACATCGGAGGA






CGGCCAGCCTTGGAGTCCATCGTAGCCCAATT






GTCCAGGCCCGATCCCGCGTTGGCTGCGTTAA






CGAATGACCATCTGGTGGCGTTGGCATGTCTT






GGTGGACGACCCGCGCTCGATGCAGTCAAAAA






GGGTCTGCCTCATGCTCCCGCATTGATCAAAA






GAACCAACCGGCGGATTCCCGAGAGAACTTCC






CATCGAGTCGCGGGATCC








>ERCC2_Rig
GCTAGCaccATGGACTACAAAGACCATGACGG
259.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
260.


ht_TAGCGAG
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



CGCGACCCC_
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



TAL/006/01
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



4/017/024/
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



026/014/01
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVAI



7/024/027/
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIASH



014/016/02
GGGGCATGGCTTCACTCATGCGCATATTGTCG

DGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG



2/027/012/
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA



JDS71/
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET



(‘TAGCGAGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR



GCGACCCC’
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIANNNGGKQALETVQRLLP



disclosed
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC



as SEQ ID
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIANNNGGKQALETVQRLLPVLCQAH



NO: 419)
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIASNIGGKQALETVQRLLPVLCQAHGLT




ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCGAACATT

IASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIAS




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

HDGGRPALESIVAQLSRPDPALAALTNDHLVALACLG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GRPALDAVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TTACACCGGAGCAAGTCGTGGCCATTGCAAAT






AATAACGGTGGCAAACAGGCTCTTGAGACGGT






TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGCATGACGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAACAACAACGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCGAACATTGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAATAATAACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA






TGACGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAACAACAACGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCACATGACGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAATAATAACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGAACATTGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCATCCCACGACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACACCCGAACAGGTGG






TCGCCATTGCTTCCCACGACGGAGGACGGCCA






GCCTTGGAGTCCATCGTAGCCCAATTGTCCAG






GCCCGATCCCGCGTTGGCTGCGTTAACGAATG






ACCATCTGGTGGCGTTGGCATGTCTTGGTGGA






CGACCCGCGCTCGATGCAGTCAAAAAGGGTCT






GCCTCATGCTCCCGCATTGATCAAAAGAACCA






ACCGGCGGATTCCCGAGAGAACTTCCCATCGA






GTCGCGGGATCC








>FANCA_Lef
GCTAGCaccATGGACTACAAAGACCATGACGG
261.
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
262.


t_TAGGCGCC
TGATTATAAAGATCATGACATCGATTACAAGG

HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL



AAGGCCATGT
ATGACGATGACAAGATGGCCCCCAAGAAGAAG

VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA



_TAL/006/0
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT

THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT



14/019/022
GGTGGACTTGAGGACACTCGGTTATTCGCAAC

GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV



/029/012/0
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG

VAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVAI



17/021/026
AGCACCGTCGCGCAACACCACGAGGCGCTTGT

ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN



/014/019/0
GGGGCATGGCTTCACTCATGCGCATATTGTCG

NGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG



22/027/011
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG

KQALETVQRLLPVLCQDHGLTPDQVVAIANNNGGKQA



/020/024/J
GTGGCTGTCAAATACCAAGATATGATTGCGGC

LETVQRLLPVLCQDHGLTPEQVVAIASHDGGKQALET



DS78/
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG

VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR



(‘TAGGCGCC
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA

LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP



AAGGCCATGT’
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT

VLCQDHGLTPDQVVAIASNIGGKQALETVQRLLPVLC



disclosed
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC

QDHGLTPEQVVAIANNNGGKQALETVQRLLPVLCQAH



as SEQ ID
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA

GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT



NO: 420)
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA

PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ




TGCGCTCACCGGGGCCCCCTTGAACCTGACCC

VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA




CAGACCAGGTAGTCGCAATCGCGTCGAACATT

IASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS




GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG

NGGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNG




GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC

GKQALETVQRLLPVLCQDHGLTPEQVVAIASNGGGRP




TTACACCGGAGCAAGTCGTGGCCATTGCAAAT

ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD




AATAACGGTGGCAAACAGGCTCTTGAGACGGT

AVKKGLPHAPALIKRTNRRIPERTSHRVAGS




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAGCCATGATGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGAACAATAATGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCATCCCACGACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA






TGACGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCGAACATTGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAATAATAACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGAATAACAATGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAACATCGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCCAACGGTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAACAACAACGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCTAATGGG






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>FANCA_Right_TGGCCCGAGGCGGAGTTC_TAL/009/014/017/022/027/014/016/024/029/012/019/024/026/014/020/025/JDS71/ (‘TGGCCCGAGGCGGAGTTC’ disclosed as SEQ ID NO: 421)

Amino acid sequence (SEQ ID NO: 264):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIASH
DGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASHDGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNIGGKQALETVQR
LLPVLCQAHGLTPAQVVAIANNNGGKQALETVQRLLP
VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH
GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT
PAQVVAIANNNGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA
IANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
NGGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASHDGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 263):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGAACAATAAT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAAT
AATAACGGTGGCAAACAGGCTCTTGAGACGGT




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGCATGACGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAGCCATGATGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCACATGACGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAATAATAACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGAA






CATTGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAACAACAACGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCATCCCACGACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGAATAACAATGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAACAACA






ACGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CGAACATTGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAATAATAACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCCAACGGTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCTCGAATGGCGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCCCACGAC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>FANCC_Left_TGAAGGGACATCACCTTT_TAL/009/011/016/024/029/014/016/022/026/015/017/021/027/012/020/025/JDS78/ (‘TGAAGGGACATCACCTTT’ disclosed as SEQ ID NO: 422)

Amino acid sequence (SEQ ID NO: 266):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
IGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG
KQALETVQRLLPVLCQDHGLTPDQVVAIANNNGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNIGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNIGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNIGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
NGGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNGGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 265):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGAACAATAAT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAGC
AACATCGGTGGCAAACAGGCTCTTGAGACGGT




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGAACATTGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAACAACAACGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGAACAATAATGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAATAATAACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGAA






CATTGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCGAACATTGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGCATGACGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCCAATA






TTGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCATCCCACGACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCCAACGGTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCTCGAATGGCGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCTAATGGG






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>FANCC_Right_TCTACTGAATCTTGAGC_TAL/007/015/016/022/030/014/016/021/030/012/020/025/029/011/034/JDS71/ (‘TCTACTGAATCTTGAGC’ disclosed as SEQ ID NO: 423)

Amino acid sequence (SEQ ID NO: 268):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
IGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNGGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNIGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNGGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASNGGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNGGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDG
GRPALESIVAQLSRPDPALAALTNDHLVALACLGGRP
ALDAVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 267):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAGC




AATGGGGGTGGCAAACAGGCTCTTGAGACGGT






TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGAACATTGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAGCCATGATGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCAAACGGAGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAATAATAACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGAA






CATTGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCAAACGGAGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCATCCCACGACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCCAACGGTGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCGAATG






GCGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGA






ACAATAATGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAACATCGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGAATAACAATGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTCTGACACCCGAAC






AGGTGGTCGCCATTGCTTCCCACGACGGAGGA






CGGCCAGCCTTGGAGTCCATCGTAGCCCAATT






GTCCAGGCCCGATCCCGCGTTGGCTGCGTTAA






CGAATGACCATCTGGTGGCGTTGGCATGTCTT






GGTGGACGACCCGCGCTCGATGCAGTCAAAAA






GGGTCTGCCTCATGCTCCCGCATTGATCAAAA






GAACCAACCGGCGGATTCCCGAGAGAACTTCC






CATCGAGTCGCGGGATCC








>FANCG_Left_TCGGCCACCATGTCCC_TAL/007/014/019/022/027/011/017/022/026/015/019/025/027/012/JDS71/ (‘TCGGCCACCATGTCCC’ disclosed as SEQ ID NO: 424)

Amino acid sequence (SEQ ID NO: 270):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN
NGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASHDGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNIGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNIGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH
GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNGGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIAS
HDGGRPALESIVAQLSRPDPALAALTNDHLVALACLG
GRPALDAVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 269):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC




TTACACCGGAGCAAGTCGTGGCCATTGCAAAT






AATAACGGTGGCAAACAGGCTCTTGAGACGGT






TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAGCCATGATGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCACATGACGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAACATCGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA






TGACGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCGAACATTGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGAATAACAATGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCGAATG






GCGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCATCCCACGACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACACCCGAACAGGTGG






TCGCCATTGCTTCCCACGACGGAGGACGGCCA






GCCTTGGAGTCCATCGTAGCCCAATTGTCCAG






GCCCGATCCCGCGTTGGCTGCGTTAACGAATG






ACCATCTGGTGGCGTTGGCATGTCTTGGTGGA






CGACCCGCGCTCGATGCAGTCAAAAAGGGTCT






GCCTCATGCTCCCGCATTGATCAAAAGAACCA






ACCGGCGGATTCCCGAGAGAACTTCCCATCGA






GTCGCGGGATCC








>FANCG_Right_TCCAGGCAGCTGGAGCCC_TAL/007/012/016/024/029/012/016/024/027/015/019/024/026/014/017/022/JDS71/ (‘TCCAGGCAGCTGGAGCCC’ disclosed as SEQ ID NO: 425)

Amino acid sequence (SEQ ID NO: 272):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
IGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG
KQALETVQRLLPVLCQDHGLTPDQVVAIANNNGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASHDGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNIGGKQALETVQR
LLPVLCQAHGLTPAQVVAIANNNGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH
GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT
PAQVVAIANNNGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA
IANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
HDGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASHDGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 271):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGAACATTGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAACAACAACGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGAACAATAATGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCATCCCACGACGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGAA






CATTGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAACAACAACGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCACATGACGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGAATAACAATGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAACAACA






ACGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CGAACATTGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAATAATAACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCGCATGACGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAGCCATGATGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCCCACGAC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>JAK2_Left_TCTGAAAAAGACTCTGCA_TAL/007/015/019/021/026/011/016/021/029/011/017/025/027/015/019/022/JDS70/ (‘TCTGAAAAAGACTCTGCA’ disclosed as SEQ ID NO: 426)

Amino acid sequence (SEQ ID NO: 274):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN
NGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNIGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNIGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP
VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNGGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNIGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 273):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAGC
AATGGGGGTGGCAAACAGGCTCTTGAGACGGT




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGAATAACAATGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCTCCAATATTGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCGAACATTGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAACATCGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGAA






CATTGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAACATCGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGCATGACGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCGAATG






GCGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAATGGGGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGAATAACAATGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCAGCCATGATGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCTAACATC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>JAK2_Right_TCCATTTCTGTCATCGTA_TAL/007/012/016/025/030/015/017/025/029/015/017/021/030/012/019/025/JDS70/ (‘TCCATTTCTGTCATCGTA’ disclosed as SEQ ID NO: 427)

Amino acid sequence (SEQ ID NO: 276):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
IGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNGGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNGGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNGGGKQALETVQRLLP
VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNIGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNIGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 275):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGAACATTGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCTCGAATGGCGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCAAACGGAGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAATGGGGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA






TGACGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCGAATGGCGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGAACAATAATGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGCATGACGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCCAATA






TTGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CAAACGGAGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCATCCCACGACGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGAATAACAATGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCTCGAATGGCGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCTAACATC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>KRAS_Left_TGAAAATGACTGAATATA_TAL/009/011/016/021/026/015/019/021/027/015/019/021/026/015/016/025/JDS70/ (‘TGAAAATGACTGAATATA’ disclosed as SEQ ID NO: 428)

Amino acid sequence (SEQ ID NO: 278):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
IGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNGGGKQALET
VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH
GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNIGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
NIGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNIGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 277):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGAACAATAAT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAGC
AACATCGGTGGCAAACAGGCTCTTGAGACGGT




TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGAACATTGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCTCCAATATTGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCGAACATTGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAATGGGGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGAATAA






CAATGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCACATGACGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGAATAACAATGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCTCCAATA






TTGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CGAACATTGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAATGGGGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGTCGAACATTGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC






AAGTGGTCGCCATCGCCTCGAATGGCGGCGGT






AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT






GCCTGTACTGTGCCAGGATCATGGACTGACAC






CCGAACAGGTGGTCGCCATTGCTTCTAACATC






GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC






CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG






CGTTAACGAATGACCATCTGGTGGCGTTGGCA






TGTCTTGGTGGACGACCCGCGCTCGATGCAGT






CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA






TCAAAAGAACCAACCGGCGGATTCCCGAGAGA






ACTTCCCATCGAGTCGCGGGATCC








>KRAS_Right_TTGCCTACGCCACCAGC_TAL/010/014/017/022/030/011/017/024/027/012/016/022/027/011/034/JDS71/ (‘TTGCCTACGCCACCAGC’ disclosed as SEQ ID NO: 429)

Amino acid sequence (SEQ ID NO: 280):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIASH
DGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNGGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNIGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR
LLPVLCQAHGLTPAQVVAIANNNGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASNIGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDG
GRPALESIVAQLSRPDPALAALTNDHLVALACLGGRP
ALDAVKKGLPHAPALIKRTNRRIPERTSHRVAGS

DNA sequence (SEQ ID NO: 279):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCAAACGGA
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAAT




AATAACGGTGGCAAACAGGCTCTTGAGACGGT






TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC






ACGGGCTGACTCCCGATCAAGTTGTAGCGATT






GCGTCGCATGACGGAGGGAAACAAGCATTGGA






GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC






AAGCCCACGGTTTGACGCCTGCACAAGTGGTC






GCCATCGCCAGCCATGATGGCGGTAAGCAGGC






GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC






TGTGCCAGGATCATGGACTGACCCCAGACCAG






GTAGTCGCAATCGCGTCAAACGGAGGGGGAAA






GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC






CGGTCCTTTGTCAAGACCACGGCCTTACACCG






GAGCAAGTCGTGGCCATTGCAAGCAACATCGG






TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC






TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG






ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA






TGACGGAGGGAAACAAGCATTGGAGACTGTCC






AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC






GGTTTGACGCCTGCACAAGTGGTCGCCATCGC






CAACAACAACGGCGGTAAGCAGGCGCTGGAAA






CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG






GATCATGGACTGACCCCAGACCAGGTAGTCGC






AATCGCGTCACATGACGGGGGAAAGCAAGCCC






TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT






TGTCAAGACCACGGCCTTACACCGGAGCAAGT






CGTGGCCATTGCATCCCACGACGGTGGCAAAC






AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA






GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA






TCAAGTTGTAGCGATTGCGTCGAACATTGGAG






GGAAACAAGCATTGGAGACTGTCCAACGGCTC






CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC






GCCTGCACAAGTGGTCGCCATCGCCAGCCATG






ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG






CGCCTGCTGCCTGTACTGTGCCAGGATCATGG






ACTGACCCCAGACCAGGTAGTCGCAATCGCGT






CACATGACGGGGGAAAGCAAGCCCTGGAAACC






GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA






CCACGGCCTTACACCGGAGCAAGTCGTGGCCA






TTGCAAGCAACATCGGTGGCAAACAGGCTCTT






GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG






TCAAGCCCACGGGCTGACTCCCGATCAAGTTG






TAGCGATTGCGAATAACAATGGAGGGAAACAA






GCATTGGAGACTGTCCAACGGCTCCTTCCCGT






GTTGTGTCAAGCCCACGGTCTGACACCCGAAC






AGGTGGTCGCCATTGCTTCCCACGACGGAGGA






CGGCCAGCCTTGGAGTCCATCGTAGCCCAATT






GTCCAGGCCCGATCCCGCGTTGGCTGCGTTAA






CGAATGACCATCTGGTGGCGTTGGCATGTCTT






GGTGGACGACCCGCGCTCGATGCAGTCAAAAA






GGGTCTGCCTCATGCTCCCGCATTGATCAAAA






GAACCAACCGGCGGATTCCCGAGAGAACTTCC






CATCGAGTCGCGGGATCC








>MYC_Left_TGCTTAGACGCTGGATTT_TAL/009/012/020/025/026/014/016/022/029/012/020/024/029/011/020/025/JDS78/ (‘TGCTTAGACGCTGGATTT’ disclosed as SEQ ID NO: 430)

DNA sequence (SEQ ID NO: 281):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGAACAATAAT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCCAACGGTGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCTCGAATGGCGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGTCGAACATTGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAATAATAACGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGTCGAA
CATTGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGAACAATAATGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCATCCCACGACGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCCAACGGTGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAACAACA
ACGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGA
ACAATAATGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCAAGCAACATCGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGTCCAACGGTGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCTCGAATGGCGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTTCTAATGGG
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 282):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
GGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNIGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP
VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASNGGGKQALETVQRLLPVLCQAHGLT
PAQVVAIANNNGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
NGGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNGGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>MYC_Right_TTCGGTGCTTACCTGGTT_TAL/010/012/019/024/030/014/017/025/030/011/017/022/030/014/019/025/JDS78/ (‘TTCGGTGCTTACCTGGTT’ disclosed as SEQ ID NO: 431)

DNA sequence (SEQ ID NO: 283):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCAAACGGA
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGAATAACAATGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCAACAACAACGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGTCAAACGGAGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAATAATAACGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA
TGACGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CTCGAATGGCGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCAAACGGAGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAGCAACATCGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCGCATGACGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAGCCATG
ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGT
CAAACGGAGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCAAATAATAACGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGAATAACAATGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCTCGAATGGCGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTTCTAATGGG
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 284):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN
NGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNGGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNGGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNGGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVA
IANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNGGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>PTEN_Left_TCCCAGACATGACAGCC_TAL/007/012/017/021/029/011/017/021/030/014/016/022/026/014/032/JDS71/ (‘TCCCAGACATGACAGCC’ disclosed as SEQ ID NO: 432)

DNA sequence (SEQ ID NO: 285):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCGCATGACGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCTCCAATATTGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGAACAATAATGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAGCAACATCGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA
TGACGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCAAACGGAGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAATAATAACGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCGAACATTGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAGCCATG
ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGT
CGAACATTGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCAAATAATAACGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGTCGCATGACGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTCTGACACCCGAAC
AGGTGGTCGCCATTGCTTCCCACGACGGAGGA
CGGCCAGCCTTGGAGTCCATCGTAGCCCAATT
GTCCAGGCCCGATCCCGCGTTGGCTGCGTTAA
CGAATGACCATCTGGTGGCGTTGGCATGTCTT
GGTGGACGACCCGCGCTCGATGCAGTCAAAAA
GGGTCTGCCTCATGCTCCCGCATTGATCAAAA
GAACCAACCGGCGGATTCCCGAGAGAACTTCC
CATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 286):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASH
DGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIGG
KQALETVQRLLPVLCQDHGLTPDQVVAIANNNGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNIGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNGGGKQALETVQRLLPVLC
QDHGLTPEQVVAIANNNGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASNIGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA
IANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
HDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDG
GRPALESIVAQLSRPDPALAALTNDHLVALACLGGRP
ALDAVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>PTEN_Right_TCCTTTTGTTTCTGCTAA_TAL/007/012/020/025/030/015/019/025/030/015/017/025/029/012/020/021/JDS70/ (‘TCCTTTTGTTTCTGCTAA’ disclosed as SEQ ID NO: 433)

DNA sequence (SEQ ID NO: 287):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCCAACGGTGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCTCGAATGGCGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGTCAAACGGAGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAGCAATGGGGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGAATAA
CAATGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CTCGAATGGCGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCAAACGGAGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCGCATGACGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCTCGAATG
GCGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGA
ACAATAATGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCATCCCACGACGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGTCCAACGGTGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCTCCAATATTGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTTCTAACATC
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 288):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
GGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNGGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNGGGKQALET
VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNGGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNGGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNGGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
NGGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNIGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>TP53_Left_TTGCCGTCCCAAGCAATG_TAL/010/014/017/022/029/015/017/022/027/011/016/024/027/011/016/025/JDS74/ (‘TTGCCGTCCCAAGCAATG’ disclosed as SEQ ID NO: 434)

DNA sequence (SEQ ID NO: 289):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCAAACGGA
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAAT
AATAACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCGCATGACGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCAGCCATGATGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGAACAATAATGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAGCAATGGGGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGTCGCA
TGACGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCACATGACGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAGCAACATCGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCGAACATTGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAACAACA
ACGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGT
CACATGACGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCAAGCAACATCGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGTCGAACATTGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCTCGAATGGCGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTAATAATAAC
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 290):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIASH
DGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIANNNGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNGGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASHDGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASNIGGKQALETVQRLLPVLCQAHGLT
PAQVVAIANNNGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
NIGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNGG
GKQALETVQRLLPVLCQDHGLTPEQVVAIANNNGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>TP53_Right_TGTTCAATATCGTCCGGG_TAL/009/015/020/022/026/011/020/021/030/012/019/025/027/012/019/024/JDS74/ (‘TGTTCAATATCGTCCGGG’ disclosed as SEQ ID NO: 435)

DNA sequence (SEQ ID NO: 291):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGAACAATAAT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAGC
AATGGGGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCCAACGGTGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCAGCCATGATGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGTCGAACATTGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAGCAACATCGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGTCCAA
CGGTGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCAAACGGAGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCATCCCACGACGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGAATAACAATGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCTCGAATG
GCGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGT
CACATGACGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCATCCCACGACGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGAATAACAATGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCAACAACAACGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTAATAATAAC
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 292):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASNGGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
GGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNIGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNGGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNGGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAH
GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASNGGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNG
GKQALETVQRLLPVLCQDHGLTPEQVVAIANNNGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>XPA_Left_TGGGCCAGAGATGGCGGC_TAL/009/014/019/022/027/011/019/021/029/011/020/024/029/012/019/024/JDS71/ (‘TGGGCCAGAGATGGCGGC’ disclosed as SEQ ID NO: 436)

DNA sequence (SEQ ID NO: 293):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
AGACCAGGTAGTCGCAATCGCGAACAATAAT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAAT
AATAACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGAATAACAATGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCAGCCATGATGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGTCACATGACGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAGCAACATCGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGAATAA
CAATGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CTCCAATATTGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGAACAATAATGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAGCAACATCGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCCAACGGTGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAACAACA
ACGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGA
ACAATAATGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCATCCCACGACGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGAATAACAATGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCAACAACAACGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTTCCCACGAC
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 294):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ANNNGGKQALETVQRLLPVLCQAHGLTPDQVVAIANN
NGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASHDGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASNIGGKQALET
VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASNIGGKQALETVQRLLP
VLCQDHGLTPDQVVAIANNNGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASNGGGKQALETVQRLLPVLCQAHGLT
PAQVVAIANNNGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASHDGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>XPA_Right_TAAAGCCGCCGCCTCCGG_TAL/006/011/016/024/027/012/019/022/027/014/017/022/030/012/017/024/JDS74/ (‘TTAAAGCCGCCGCCTCCGG’ disclosed as SEQ ID NO: 437)

DNA sequence (SEQ ID NO: 295):
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCGAACATT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCAAGC
AACATCGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCGAACATTGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCAACAACAACGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGTCACATGACGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCATCCCACGACGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGAATAA
CAATGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCACATGACGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAATAATAACGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCGCATGACGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAGCCATG
ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGT
CAAACGGAGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCATCCCACGACGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGTCGCATGACGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCAACAACAACGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTAATAATAAC
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

Amino acid sequence (SEQ ID NO: 296):
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASNIGGKQALETVQRLLPVLCQAHGLTPDQVVAIASN
IGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASHDGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASHDGGKQALET
VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC
QDHGLTPEQVVAIANNNGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNGGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
HDGGKQALETVQRLLPVLCQAHGLTPAQVVAIANNNG
GKQALETVQRLLPVLCQDHGLTPEQVVAIANNNGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS








>XPC_Left_TGCCCAGACAAGCAACAT_TAL/009/012/017/022/026/014/016/022/026/011/019/022/026/011/017/021/JDS78/ (‘TGCCCAGACAAGCAACAT’ disclosed as SEQ ID NO: 438)

297. (DNA sequence)
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGAACAATAAT
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCGCATGACGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCAGCCATGATGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGTCGAACATTGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCAAATAATAACGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGTCGAA
CATTGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CAGCCATGATGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCGAACATTGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAGCAACATCGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGAATAACAATGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAGCCATG
ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGT
CGAACATTGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCAAGCAACATCGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGTCGCATGACGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCTCCAATATTGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTTCTAATGGG
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

298. (amino acid sequence)
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIANNNGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASH
DGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIASNIGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIANNNGGKQALET
VQRLLPVLCQAHGLTPDQVVAIASNIGGKQALETVQR
LLPVLCQAHGLTPAQVVAIASHDGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASNIGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNIGGKQALETVQRLLPVLCQAH
GLTPDQVVAIANNNGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASNIGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHIGGKQALETVQRLLPVLCQAHGLTPDQVVAIAS
HDGGKQALETVQRLLPVLCQAHGLTPAQVVAIASNIG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASNGGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

>XPC_Right_TCCCCGCGGCTCCCCGCC_TAL/007/012/017/022/029/012/019/024/027/015/017/022/027/012/019/022/JDS71/ (‘TCCCCGCGGCTCCCCGCC’ disclosed as SEQ ID NO: 439)

299. (DNA sequence)
GCTAGCaccATGGACTACAAAGACCATGACGG
TGATTATAAAGATCATGACATCGATTACAAGG
ATGACGATGACAAGATGGCCCCCAAGAAGAAG
AGGAAGGTGGGCATTCACCGCGGGGTACCTAT
GGTGGACTTGAGGACACTCGGTTATTCGCAAC
AGCAACAGGAGAAAATCAAGCCTAAGGTCAGG
AGCACCGTCGCGCAACACCACGAGGCGCTTGT
GGGGCATGGCTTCACTCATGCGCATATTGTCG
CGCTTTCACAGCACCCTGCGGCGCTTGGGACG
GTGGCTGTCAAATACCAAGATATGATTGCGGC
CCTGCCCGAAGCCACGCACGAGGCAATTGTAG
GGGTCGGTAAACAGTGGTCGGGAGCGCGAGCA
CTTGAGGCGCTGCTGACTGTGGCGGGTGAGCT
TAGGGGGCCTCCGCTCCAGCTCGACACCGGGC
AGCTGCTGAAGATCGCGAAGAGAGGGGGAGTA
ACAGCGGTAGAGGCAGTGCACGCCTGGCGCAA
TGCGCTCACCGGGGCCCCCTTGAACCTGACCC
CAGACCAGGTAGTCGCAATCGCGTCACATGAC
GGGGGAAAGCAAGCCCTGGAAACCGTGCAAAG
GTTGTTGCCGGTCCTTTGTCAAGACCACGGCC
TTACACCGGAGCAAGTCGTGGCCATTGCATCC
CACGACGGTGGCAAACAGGCTCTTGAGACGGT
TCAGAGACTTCTCCCAGTTCTCTGTCAAGCCC
ACGGGCTGACTCCCGATCAAGTTGTAGCGATT
GCGTCGCATGACGGAGGGAAACAAGCATTGGA
GACTGTCCAACGGCTCCTTCCCGTGTTGTGTC
AAGCCCACGGTTTGACGCCTGCACAAGTGGTC
GCCATCGCCAGCCATGATGGCGGTAAGCAGGC
GCTGGAAACAGTACAGCGCCTGCTGCCTGTAC
TGTGCCAGGATCATGGACTGACCCCAGACCAG
GTAGTCGCAATCGCGAACAATAATGGGGGAAA
GCAAGCCCTGGAAACCGTGCAAAGGTTGTTGC
CGGTCCTTTGTCAAGACCACGGCCTTACACCG
GAGCAAGTCGTGGCCATTGCATCCCACGACGG
TGGCAAACAGGCTCTTGAGACGGTTCAGAGAC
TTCTCCCAGTTCTCTGTCAAGCCCACGGGCTG
ACTCCCGATCAAGTTGTAGCGATTGCGAATAA
CAATGGAGGGAAACAAGCATTGGAGACTGTCC
AACGGCTCCTTCCCGTGTTGTGTCAAGCCCAC
GGTTTGACGCCTGCACAAGTGGTCGCCATCGC
CAACAACAACGGCGGTAAGCAGGCGCTGGAAA
CAGTACAGCGCCTGCTGCCTGTACTGTGCCAG
GATCATGGACTGACCCCAGACCAGGTAGTCGC
AATCGCGTCACATGACGGGGGAAAGCAAGCCC
TGGAAACCGTGCAAAGGTTGTTGCCGGTCCTT
TGTCAAGACCACGGCCTTACACCGGAGCAAGT
CGTGGCCATTGCAAGCAATGGGGGTGGCAAAC
AGGCTCTTGAGACGGTTCAGAGACTTCTCCCA
GTTCTCTGTCAAGCCCACGGGCTGACTCCCGA
TCAAGTTGTAGCGATTGCGTCGCATGACGGAG
GGAAACAAGCATTGGAGACTGTCCAACGGCTC
CTTCCCGTGTTGTGTCAAGCCCACGGTTTGAC
GCCTGCACAAGTGGTCGCCATCGCCAGCCATG
ATGGCGGTAAGCAGGCGCTGGAAACAGTACAG
CGCCTGCTGCCTGTACTGTGCCAGGATCATGG
ACTGACCCCAGACCAGGTAGTCGCAATCGCGT
CACATGACGGGGGAAAGCAAGCCCTGGAAACC
GTGCAAAGGTTGTTGCCGGTCCTTTGTCAAGA
CCACGGCCTTACACCGGAGCAAGTCGTGGCCA
TTGCATCCCACGACGGTGGCAAACAGGCTCTT
GAGACGGTTCAGAGACTTCTCCCAGTTCTCTG
TCAAGCCCACGGGCTGACTCCCGATCAAGTTG
TAGCGATTGCGAATAACAATGGAGGGAAACAA
GCATTGGAGACTGTCCAACGGCTCCTTCCCGT
GTTGTGTCAAGCCCACGGTTTGACGCCTGCAC
AAGTGGTCGCCATCGCCAGCCATGATGGCGGT
AAGCAGGCGCTGGAAACAGTACAGCGCCTGCT
GCCTGTACTGTGCCAGGATCATGGACTGACAC
CCGAACAGGTGGTCGCCATTGCTTCCCACGAC
GGAGGACGGCCAGCCTTGGAGTCCATCGTAGC
CCAATTGTCCAGGCCCGATCCCGCGTTGGCTG
CGTTAACGAATGACCATCTGGTGGCGTTGGCA
TGTCTTGGTGGACGACCCGCGCTCGATGCAGT
CAAAAAGGGTCTGCCTCATGCTCCCGCATTGA
TCAAAAGAACCAACCGGCGGATTCCCGAGAGA
ACTTCCCATCGAGTCGCGGGATCC

300. (amino acid sequence)
ASTMDYKDHDGDYKDHDIDYKDDDDKMAPKKKRKVGI
HRGVPMVDLRTLGYSQQQQEKIKPKVRSTVAQHHEAL
VGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEA
THEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDT
GQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPDQV
VAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVAI
ASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIASH
DGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDGG
KQALETVQRLLPVLCQDHGLTPDQVVAIANNNGGKQA
LETVQRLLPVLCQDHGLTPEQVVAIASHDGGKQALET
VQRLLPVLCQAHGLTPDQVVAIANNNGGKQALETVQR
LLPVLCQAHGLTPAQVVAIANNNGGKQALETVQRLLP
VLCQDHGLTPDQVVAIASHDGGKQALETVQRLLPVLC
QDHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAH
GLTPDQVVAIASHDGGKQALETVQRLLPVLCQAHGLT
PAQVVAIASHDGGKQALETVQRLLPVLCQDHGLTPDQ
VVAIASHDGGKQALETVQRLLPVLCQDHGLTPEQVVA
IASHDGGKQALETVQRLLPVLCQAHGLTPDQVVAIAN
NNGGKQALETVQRLLPVLCQAHGLTPAQVVAIASHDG
GKQALETVQRLLPVLCQDHGLTPEQVVAIASHDGGRP
ALESIVAQLSRPDPALAALTNDHLVALACLGGRPALD
AVKKGLPHAPALIKRTNRRIPERTSHRVAGS

OTHER EMBODIMENTS

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.

Claims
  • 1. A composition comprising a plurality of nucleic acids encoding one or more transcription activator-like effector (TALE) repeat domains, wherein at least one nucleic acid is selected from the group consisting of SEQ ID NOs: 58-62.
  • 2. The composition of claim 1, further comprising one or more of: at least one nucleic acid selected from the group consisting of SEQ ID NOs: 63-67; at least one nucleic acid selected from the group consisting of SEQ ID NOs: 68-72; at least one nucleic acid selected from the group consisting of SEQ ID NOs: 73-77; at least one nucleic acid selected from the group consisting of SEQ ID NOs: 78-82; at least one nucleic acid selected from the group consisting of SEQ ID NOs: 83-87; or at least one nucleic acid selected from the group consisting of SEQ ID NOs: 88-92.
  • 3. The composition of claim 2, wherein the plurality of nucleic acids comprises each of SEQ ID NOs: 58-92.
  • 4. The composition of claim 1, wherein the nucleic acid encodes a functional TALE protein that is 33-35 amino acids.
  • 5. The composition of claim 1, wherein each nucleic acid of the composition is in a plasmid.
  • 6. The composition of claim 5, wherein each plasmid of the composition is in a cell.
  • 7. The composition of claim 1, wherein the at least one nucleic acid is linked to a solid support.
  • 8. A plasmid comprising at least one nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs: 58-62.
  • 9. The plasmid of claim 8, wherein the plasmid is in a cell.
CLAIM OF PRIORITY

This application is a continuation of U.S. patent application Ser. No. 15/156,574, filed May 17, 2016, which is a continuation of U.S. patent application Ser. No. 14/232,067, filed on Jun. 5, 2014, which claims priority to International Patent Application No. PCT/US2012/046451, filed on Jul. 12, 2012, which claims the benefit of U.S. Provisional Patent Application Ser. No. 61/610,212, filed on Mar. 13, 2012, 61/601,409, filed on Feb. 21, 2012, and 61/508,366, filed on Jul. 15, 2011. The entire contents of the foregoing applications are hereby incorporated by reference.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

This invention was made with government support under grant number DP1 OD006862 awarded by the National Institutes of Health. The government has certain rights in the invention.

US Referenced Citations (41)
Number Name Date Kind
4603044 Geho et al. Jul 1986 A
4797368 Carter et al. Jan 1989 A
4957773 Spencer et al. Sep 1990 A
6007988 Choo et al. Dec 1999 A
6013453 Choo et al. Jan 2000 A
6453242 Eisenberg et al. Sep 2002 B1
6492117 Choo et al. Dec 2002 B1
6503717 Case et al. Jan 2003 B2
6511808 Wolffe et al. Jan 2003 B2
6534261 Cox, III et al. Mar 2003 B1
7001768 Wolfe et al. Feb 2006 B2
7220719 Case May 2007 B2
7741086 Shi Jun 2010 B2
7914796 Miller Mar 2011 B2
8034598 Miller Oct 2011 B2
8071370 Wolffe Dec 2011 B2
8771986 Miller Jul 2014 B2
8962281 Doyon Feb 2015 B2
10273271 Joung et al. Apr 2019 B2
10676749 Joung et al. Jun 2020 B2
10894950 Joung et al. Jan 2021 B2
20020160940 Case et al. Jan 2002 A1
20020106680 Shinmyo Aug 2002 A1
20020119498 Joung et al. Aug 2002 A1
20020164575 Case et al. Nov 2002 A1
20030083283 Bennett et al. May 2003 A1
20060115850 Schatz Jun 2006 A1
20080131962 Miller Jun 2008 A1
20090133158 Lahaye et al. May 2009 A1
20100132069 Lahaye et al. May 2010 A1
20110059502 Chalasani Mar 2011 A1
20110112040 Liu et al. May 2011 A1
20110301073 Gregory et al. Dec 2011 A1
20120064620 Bonas Mar 2012 A1
20120100569 Liu et al. Apr 2012 A1
20130323220 Joung et al. Dec 2013 A1
20140274812 Joung et al. Sep 2014 A1
20150267176 Joung et al. Sep 2015 A1
20150376626 Joung et al. Dec 2015 A1
20160010076 Joung et al. Jan 2016 A1
20160024523 Joung et al. Jan 2016 A1
Foreign Referenced Citations (32)
Number Date Country
1941060 Jul 2008 EP
2206723 Jul 2010 EP
2003-501069 Jan 2003 JP
2003-531616 Oct 2003 JP
2013-529083 Jul 2013 JP
2015-527889 Sep 2015 JP
WO 1991016024 Oct 1991 WO
WO 1991017424 Nov 1991 WO
WO 9319202 Sep 1993 WO
WO 1993024641 Dec 1993 WO
WO 9517413 Jun 1995 WO
WO 9810095 Mar 1998 WO
WO 9947536 Sep 1999 WO
WO 0075368 Dec 2000 WO
WO 2001019981 Mar 2001 WO
WO 2001053480 Jul 2001 WO
WO 0183732 Nov 2001 WO
WO 2002057308 Jul 2002 WO
WO 2002099084 Dec 2002 WO
WO 2004099366 Nov 2004 WO
WO 2006071608 Jul 2006 WO
WO 2007128982 Nov 2007 WO
WO 2009134409 Nov 2009 WO
WO 2010037001 Apr 2010 WO
WO 2010079430 Jul 2010 WO
WO 2011017293 Feb 2011 WO
WO 2011019385 Feb 2011 WO
WO 2011072246 Jun 2011 WO
WO 2011146121 Nov 2011 WO
WO 2012138939 Oct 2012 WO
WO 2013012674 Jan 2013 WO
WO 2013017950 Feb 2013 WO
Non-Patent Literature Citations (387)
Entry
Briggs et al., Nucleic Acids Research, 2012, 40(15), pp. 1-10.
EP Office Action in European Appln. No. 10191041, dated Apr. 17, 2020, 5 pages.
Akopian et al., “Chimeric recombinases with designed DNA sequence recognition,” Proc Natl Acad Sci USA, Jul. 22, 2003;100(15):8688-91.
Alvarez and Curiel, “A phase I study of recombinant adenovirus vector-mediated intraperitoneal delivery of herpes simplex virus thymidine kinase (HSV-TK) gene and intravenous ganciclovir for previously treated ovarian and extraovarian cancer patients,” Hum. Gene Ther., Mar. 1997, 5:597-613.
Anders and Huber, “Differential expression analysis for sequence count data,” Genome Biol., 11(10):R106, Epub Oct. 27, 2010.
Arimondo et al., “Exploring the Cellular Activity of Camptothecin—Triple-Helix-Forming Oligonucleotide Conjugates,” Mol. Cell. Biol., 26(1):324-33 (2006).
Arnould et al., “Engineering of large numbers of highly specific homing endonucleases that induce recombination on novel DNA targets,” J Mol Biol., 355(3):443-458, Epub Nov. 15, 2005.
Arnould et al., “The I-CreI meganuclease and its engineered derivatives: applications from cell modification to gene therapy,” Protein Eng Des Sel., 24(1-2):27-31, Epub Nov. 3, 2010.
Arora et al., “Residues 1-254 of anthrax toxin lethal factor are sufficient to cause cellular uptake of fused polypeptides,” J. Biol. Chem., Feb. 1993, 268:3334-41.
Aslanidis et al., “Ligation-independent cloning of PCR products (LIC-PCR),” Nucleic Acids Res., Oct. 25, 1990;18(20):6069-74.
Australian Office Action in Australian Application No. 2012284365, dated Jul. 29, 2016, 5 pages.
Bae et al., “Human zinc fingers as building blocks in the construction of artificial transcription factors,” Nat Biotechnol., 21(3):275-280, Epub Feb. 18, 2003.
Bannister et al., “Histone methylation: Dynamic or static?,” Cell, Jun. 28, 2002, 109(7): 801-806.
Batt, C.A., Chapter 14. Genetic Engineering of Food Proteins in Food Proteins and Their Applications, Damodaran, S., Ed. CRC Press, Mar. 12, 1997, p. 425.
Beerli and Barbas, “Engineering polydactyl zinc-finger transcription factors,” Nat Biotechnol., 20(2):135-141, Feb. 2002.
Beerli et al., “Toward controlling gene expression at will: Specific regulation of the erbB-2/HER-2 promoter using polydactyl zinc finger proteins constructed from modular building blocks,” PNAS, Dec. 1998, 95: 14628-14633.
Bello et al., “Hypermethylation of the DNA repair gene MGMT: association with TP53 G:C to A:T transitions in a series of 469 nervous system tumors,” Mutat. Res., Oct. 2004, 554:23-32.
Berg, “Proposed structure for the zinc-binding domains from transcription factor IIIA and related proteins,” Proc Natl Acad Sci U S A., 85(1):99-102, Jan. 1988.
Bergmann et al. Epigenetic engineering shows H3K4me2 is required for HJURP targeting and CENP-A assembly on a synthetic human kinetochore. The EMBO Journal, vol. 30, pp. 328-340, Jan. 2011, published online Dec. 14, 2010, including pp. 1/14-14/14 of Supplementary Data.
Biancotto et al., “Histone modification therapy of cancer,” Adv Genet., 70:341-386, 2010.
Bibikova et al., “Enhancing gene targeting with designed zinc finger nucleases,” Science, May 2, 2003;300(5620):764.
Bibikova et al., “Stimulation of homologous recombination through targeted cleavage by chimeric nucleases,” Mol Cell Biol., Jan. 2001;21(1):289-97.
Blaese et al., “T lymphocyte-directed gene therapy for ADA-SCID: initial trial results after 4 years,” Science, Oct. 1995, 270(5235):475-480.
Blancafort et al., “Designing transcription factor architectures for drug discovery,” Mol Pharmacol., 66(6):1361-1371, Epub Aug. 31, 2004.
Boch et al., “Breaking the code of DNA binding specificity of TAL-type III effectors,” Science, 326(5959):1509-1512, Dec. 11, 2009.
Boch et al., “Xanthomonas AvrBs3 family-type III effectors: discovery and function,” Annu Rev Phytopathol., 48:419-436, 2010.
Boch, “TALEs of genome targeting,” Nat Biotechnol., 29(2):135-136, Feb. 2011.
Bogdanove & Voytas, “TAL Effectors: Customizable Proteins for DNA Targeting,” Science, 333:1843-1846 (2011).
Bogdanove et al., “TAL effectors: finding plant genes for disease and defense,” Curr. Opin. Plant Biol., 13:394-401 (2010).
Bonas et al., “Genetic and Structural Characterization of the Avirulence Gene AVR-BS3 From Xanthomonas-Campestris Pathovar Vesicatoria,” Molecular and General Genetics, Jul. 1989, 218(1): 127-136.
Boyle et al., “High-resolution mapping and characterization of open chromatin across the genome,” Cell., 132(2):311-322, Jan. 25, 2008.
Briggs et al., “Iterative capped assembly: rapid and scalable synthesis of repeat-module DNA such as TAL effectors from individual monomers,” Nucleic Acids Res., Aug. 2012;40(15):e117.
Bulger and Groudine, “Functional and mechanistic diversity of distal transcription enhancers,” Cell., 144(3):327-339, Feb. 4, 2011.
Bultmann et al., “Targeted transcriptional activation of silent oct4 pluripotency gene by combining designer TALEs and inhibition of epigenetic modifiers,” Nucleic Acids Res., 40(12):5368-77. Epub Mar. 2, 2012.
Burnett et al., “Conditional macrophage ablation in transgenic mice expressing a Fas-based suicide gene,” J. Leukoc. Biol., Apr. 2004, 75(4):612-623.
Cade et al., “Highly efficient generation of heritable zebrafish gene mutations using homo- and heterodimeric TALENs,” Nucleic Acids Res., Sep. 2012, 40(16):8001-10.
Calo and Wysocka, “Modification of enhancer chromatin: what, how, and why?” Mol Cell., Mar. 2013, 49(5):825-837.
Carbonetti et al., “Use of pertussis toxin vaccine molecule PT19K/129G to deliver peptide epitopes for stimulation of a cytotoxic T lymphocyte response,” Abstr. Annu. Meet. Am. Soc. Microbiol., 1995, 95:295.
Carey et al., “A mechanism for synergistic activation of a mammalian gene by GAL4 derivatives,” Nature, 345(6273):361-364, May 24, 1990.
Caron et al., “Intracellular Delivery of a Tat-eGFP Fusion Protein into Muscle Cells,” Mol Ther., Mar. 2001, 3:310-318.
Carroll et al., “Design, construction and in vitro testing of zinc finger nucleases,” Nat Protoc., 1(3):1329-1341, 2006.
Carroll, “Progress and prospects: zinc-finger nucleases as gene therapy agents,” Gene Ther., 15(22):1463-1468, Epub Sep. 11, 2008.
Castellano et al., “Inducible recruitment of Cdc42 or WASP to a cell-surface receptor triggers actin polymerization and filopodium formation,” Curr. Biol., 1999, 9(7): 351-360.
Cathomen and Joung, “Zinc-finger nucleases: the next generation emerges,” Mol Ther., 16(7):1200-1207, Epub Jun. 10, 2008.
Cermak et al., “Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting,” Nucleic Acids Res., 39:e82, p. 1-11 (2011).
Chaikind et al., “Targeted DNA Methylation Using an Artificially Bisected M.HhaI Fused to Zinc Fingers,” PLoS One, 7(9):E44852 pp. 1-11 (2012).
Chase et al., “Histone methylation at H3K9: evidence for a restrictive epigenome in schizophrenia,” Schizophr Res., 149(1-3):15-20, Epub Jun. 28, 2013.
Chen et al., “Crystal structure of human histone lysine-specific demethylase 1 (LSD1),” Proc Natl Acad Sci U S A., 103(38):13956-13961, Epub Sep. 6, 2006.
Chen et al., “Fusion protein linkers: property, design and functionality,” Adv Drug Deliv Rev., 65(10):1357-1369, [author manuscript] Epub Sep. 29, 2012.
Chen et al., “Induced DNA demethylation by targeting Ten-Eleven Translocation 2 to the human ICAM-1 promoter,” Nucleic Acids Res., 42(3):1563-1574, Epub Nov. 4, 2013.
Chim et al., “Methylation profiling in multiple myeloma,” Leuk. Res., Apr. 2004, 28:379-85.
Choo and Klug, “Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage,” Proc Natl Acad Sci U S A., 91(23):11163-11167, Nov. 8, 1994.
Christian et al., “Targeting DNA Double-Strand Breaks with TAL Effector Nucleases,” Genetics, 186:757-761 (2010).
Chylinski et al., “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems,” RNA Biol., 10(5):726-737, Epub Apr. 5, 2013.
Coffman et al., “Improved renal function in mouse kidney allografts lacking MHC class I antigens,” J. Immunol., Jul. 1993, 151:425-35.
Cong et al., “Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains,” Nat Commun., 3:968, [author manuscript] Jul. 24, 2012.
Cong et al., “Multiplex genome engineering using CRISPR/Cas systems,” Science, 339(6121):819-823, Epub Jan. 3, 2013.
Consortium, The ENCODE Project, “An integrated encyclopedia of DNA elements in the human genome,” Nature, Sep. 2012, 488:57-74.
Copeland et al., “Targeting genetic alterations in protein methyltransferases for personalized cancer therapeutics,” Oncogene., 32(8):939-946, Epub Nov. 19, 2012.
Costa et al., “REELIN and schizophrenia: a disease at the interface of the genome and the epigenome,” Mol. Interv., Feb. 2002, 2:47-57.
Crabtree and Schreiber, “Three-part inventions: intracellular signaling and induced proximity,” Trends Biochem. Sci., Nov. 1996, 21(11):418-422.
Creyghton et al., “Histone H3K27ac separates active from poised enhancers and predicts developmental state,” Proc Natl Acad Sci U S A., 107(50):21931-21936, Epub Nov. 24, 2010.
Cronican et al., “A Class of Human Proteins that Deliver Functional Proteins into Mammalian Cells In Vitro and In Vivo,” Chem Biol., Jul. 2011, 18:833-838.
Cronican et al., “Potent Delivery of Functional Proteins into Mammalian Cells in Vitro and in Vivo Using a Supercharged Protein,” ACS Chem. Biol., 2010, 5:747.
D'Avignon et al., “Site-specific experiments on folding/unfolding of Jun coiled coils: thermodynamic and kinetic parameters from spin inversion transfer nuclear magnetic resonance at leucine-18,” Biopolymers, 83(3):255-267, Oct. 15, 2006.
Davis, “Transcriptional regulation by MAP kinases,” Mol Reprod Dev., Dec. 1995;42(4):459-67.
De Zhu, “The altered DNA methylation pattern and its implications in liver cancer,” Cell. Res., 2005, 15:272-80.
Derossi et al., “The Third Helix of the Antennapedia Homeodomain Translocates through Biological Membranes,” J. Biol. Chem., Apr. 1994, 269:10444.
Deshayes et al., “Cell-penetrating peptides: tools for intracellular delivery of therapeutics,” Cell. Mol. Life Sci., Aug. 2005, 62:1839-49.
Dhami et al., “Genomic approaches uncover increasing complexities in the regulatory landscape at the human SCL (TAL1) locus,” PLoS One, 5(2):e9059, Feb. 5, 2010.
Donnelly et al., “Targeted delivery of peptide epitopes to class I major histocompatibility molecules by a modified Pseudomonas exotoxin,” PNAS, Apr. 1993, 90:3530-34.
Doyle et al., “TAL Effector-Nucleotide Targeter (TALE-NT) 2.0: tools for TAL effector design and target prediction,” Nucleic Acids Res., 40(Web Server issue):W117-W122, Epub Jun. 12, 2012.
Doyle, Computational and experimental analysis of TAL effector-DNA binding [dissertation], Jan. 2013, Iowa State University, Ames, Iowa, 162 pages.
Doyon et al., “Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases,” Nat Biotechnol., Jun. 2008, 26:702-708.
Dranoff et al., “A phase I study of vaccination with autologous, irradiated melanoma cells engineered to secrete human granulocyte-macrophage colony stimulating factor,” Hum. Gene Ther., Jan. 1997, 8(1):111-23.
Dreidax et al., “Low p14ARF expression in neuroblastoma cells is associated with repressed histone mark status, and enforced expression induces growth arrest and apoptosis,” Hum Mol Genet., 22(9):1735-1745, May 1, 2013.
Dunbar et al., “Retrovirally Marked CD34-Enriched Peripheral Blood and Bone Marrow Cells Contribute to Long-Term Engraftment After Autologous Transplantation,” Blood, Jun. 1995, 85:3048-3057.
Eisenschmidt et al., “Developing a programmed restriction endonuclease for highly specific DNA cleavage,” Nucleic Acids Res., 33(22):7039-47 (2005).
El-Andaloussi et al., “Cell-penetrating peptides: mechanisms and applications,” Curr. Pharm. Des., 2005, 11:3597-3611.
Ellem et al., “A case report: immune responses and clinical course of the first human use of granulocyte/macrophage-colony-stimulating-factor-transduced autologous melanoma cells for immunotherapy,” Cancer Immunol Immunother., Mar. 1997, 44:10-20.
Elliot and O'Hare, “Intercellular trafficking and protein delivery by a herpesvirus structural protein,” Cell, 88(2):223-233, Jan. 24, 1997.
Elrod-Erickson et al., “High-resolution structures of variant Zif268-DNA complexes: implications for understanding zinc finger-DNA recognition,” Structure, 6(4):451-464, Apr. 15, 1998.
Endoh et al., “Cellular siRNA delivery using TatU1A and photo-induced RNA interference,” Methods Mol. Biol., 2010, 623:271-281.
Entry for CDKN2A, cyclin-dependent kinase inhibitor 2A [Homo sapiens (human)], Gene ID: 1029, updated on Oct. 31, 2016, and printed from http://www.ncbi.nlm.nih.gov/gene/1029 as pp. 1/9 on Nov. 1, 2016.
Ernst, J. et al., “Mapping and analysis of chromatin state dynamics in nine human cell types,” Nature, 2011, 473:43-49.
Esteller et al., “A Gene Hypermethylation Profile of Human Cancer,” Cancer Res., Apr. 2001, 61:3225-9.
Esteller et al., “Promoter Hypermethylation and BRCA1 Inactivation in Sporadic Breast and Ovarian Tumors,” J. Natl. Cancer Inst., Apr. 2000, 92:564-9.
European Office Action in Application No. 13797024.0, dated Mar. 16, 2018, 8 pages.
European Office Action in European Application No. 13797024.0, dated Jul. 18, 2017, 9 pages.
European Office Action in European Application No. 13845212, dated May 18, 2016, 1 page.
Evans et al., Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73 (1985).
Evans et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176 (1983).
Extended European Search Report in Application No. 17205413.2, dated Mar. 23, 2018, 7 pages.
Extended European Search Report in Application No. 18191841.8, dated May 5, 2019, 9 pages.
Extended European Search Report in Application No. 18214166.3, dated Feb. 4, 2019, 9 pages.
Extended European Search Report in European Application No. 12814750.1, dated Jun. 30, 2015, 13 pages.
Extended European Search Report in European Application No. 13797024, dated Mar. 15, 2016, 9 pages.
Extended European Search Report in European Application No. 13845212, dated Apr. 29, 2016, 6 pages.
Extended European Search Report in European Application No. 14749683, dated Sep. 9, 2016, 7 pages.
Fahraeus et al., “Inhibition of pRb phosphorylation and cell-cycle progression by a 20-residue peptide derived from p16CDKN2/INK4A,” Curr Biol., 6(1):84-91, Jan. 1, 1996.
Foley et al., “Targeted mutagenesis in zebrafish using customized zinc-finger nucleases,” Nature Protocols, Nature Publishing Group, Jan. 2009, 4(12):1855-1868.
Fonfara et al., “Creating highly specific nucleases by fusion of active restriction endonucleases and catalytically inactive homing endonucleases,” Nucleic Acids Res., 40(2):847-860, Epub Sep. 29, 2011.
Frauer et al., “Different Binding Properties and Function of CXXC Zinc Finger Domains in Dnmt1 and Tet1,” PLoS One, Feb. 2011, 6: e16627.
Freeman et al., “Inducible Prostate Intraepithelial Neoplasia with Reversible Hyperplasia in Conditional FGFR1-Expressing Mice,” Cancer Res., Dec. 2003, 63(23):8256-8263.
Futaki, “Oligoarginine vectors for intracellular delivery: design and cellular-uptake mechanisms,” Biopolymers, 2006, 84:241-249.
Gao et al., “Hypermethylation of the RASSF1A gene in gliomas,” Clin. Chim. Acta., Nov. 2004, 349:173-9.
Garcia-Bustos et al., “Nuclear protein localization,” Biochim Biophys Acta., 1071(1):83-101, Mar. 7, 1991.
Garg et al., “Engineering synthetic TAL effectors with orthogonal target sites,” Nucleic Acids Res., 40(15):7584-7595, Epub May 11, 2012.
Gavin et al., “Dimethylated lysine 9 of histone 3 is elevated in schizophrenia and exhibits a divergent response to histone deacetylase inhibitors in lymphocyte cultures,” J. Psychiatry Neurosci., May 2009, 34(3):232-7.
Geißler et al., “Transcriptional Activators of Human Genes with Programmable DNA-Specificity,” PLoS ONE, 6:e19509 (2011).
GenBank Accession No. FJ176909.1, “Xanthomonas oryzae pv. oryzae clone 041 avirulence/virulence factor repeat domain protein-like gene, complete sequence,” dated Sep. 30, 2008 [retrieved on Aug. 30, 2018]. Retrieved from the Internet: URL <https://www.ncbi.nlm.nih.gov/nuccore/FJ176909.1/> 2 pages.
GenBank Accession No. NM_001009999.2, “Homo sapiens lysine (K)-specific demethylase 1A (KDM1A), transcript variant 1, mRNA,” Apr. 6, 2014, 6 pages.
GenBank Accession No. NP_055828.2, “lysine-specific histone demethylase 1A isoform b [Homo sapiens],” Apr. 6, 2014, 4 pages.
GenBank Accession No. NM_015013.3, “Homo sapiens lysine (K)-specific demethylase 1A (KDM1A), transcript variant 2, mRNA,” Apr. 6, 2014, 6 pages.
GEO Sample GSM1008573, Duke DnaseSeq HEK293T, Sep. 25, 2012, printed as pp. 1/2-282 from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1008573. 2 pages.
Gillies et al., “A tissue-specific transcription enhancer element is located in the major intron of a rearranged immunoglobulin heavy chain gene,” Cell, 33(3):717-728, Jul. 1983.
Gong and Zhu, “Active DNA demethylation by oxidation and repair,” Cell Research, 2011, 21:1649-1651.
Gossen and Bujard, “Tight control of gene expression in mammalian cells by tetracycline-responsive promoters,” Proc Natl Acad Sci U S A., 89(12):5547-5551, Jun. 15, 1992.
Graef et al., “Proximity and orientation underlie signaling by the non-receptor tyrosine kinase ZAP70,” Embo. J., 1997, 16(18):5618-5628.
Greer et al. Histone methylation: a dynamic mark in health, disease and inheritance. Nature Reviews Genetics, vol. 13, pp. 343-357, published online Apr. 3, 2012. (Year: 2012).
Gregory et al., “Selective DNA demethylation by fusion of TDG with a sequence-specific DNA-binding domain,” Epigenetics, Apr. 2012, 7(4):344-349.
Grizot et al., “Generation of redesigned homing endonucleases comprising DNA-binding domains derived from two different scaffolds,” Nucleic Acids Res., 38(6):2006-2018, Epub Dec. 21, 2009.
Gross and Garrard, “Nuclease Hypersensitive Sites in Chromatin,” Annu. Rev. Biochem., Jul. 1988, 57:159-97.
Gruen et al., “An in vivo selection system for homing endonuclease activity,” Nucleic Acids Res., 30(7):e29, Apr. 1, 2002.
Gu et al., “R gene expression induced by a type-III effector triggers disease resistance in rice,” Nature, Jun. 23, 2005;435(7045):1122-5.
Guo et al., “Hydroxylation of 5-Methylcytosine by TET1 Promotes Active DNA Demethylation in the Adult Brain,” Cell, 145:423-434 (2011).
Hakimi et al., “A core-BRAF35 complex containing histone deacetylase mediates repression of neuronal-specific genes,” Proceedings of the National Academy of Sciences of the United States of America, May 28, 2002, 99(11): 7420-7425.
Han et al., “CTCF is the Master Organizer of Domain-Wide Allele-Specific Chromatin at the H19/Igf2 Imprinted Region,” Mol Cell Biol., Feb. 2008, 28(3):1124-35.
Han et al., “Ligand-directed retroviral targeting of human breast cancer cells,” PNAS, Oct. 1995, 92:9747-51.
Harikrishna et al., “Construction and function of fusion enzymes of the human cytochrome P450scc system,” DNA Cell Biol., 12(5):371-379, Jun. 1993.
Harrison, “A structural taxonomy of DNA-binding domains,” Nature, 353(6346): 715-719, Oct. 24, 1991.
He et al., “Tet-Mediated Formation of 5-Carboxylcytosine and its Excision by TDG in Mammalian DNA,” Science, 333:1303-1307 (2011).
Heintzman et al., “Histone modifications at human enhancers reflect global cell-type-specific gene expression,” Nature, 459(7243):108-112, Epub Mar. 18, 2009.
Heppard et al., “Developmental and Growth Temperature Regulation of Two Different Microsomal [omega]-6 Desaturase Genes in Soybeans,” Plant Physiol., 1996, 110:311-319.
Hermonat & Muzyczka, “Use of adeno-associated virus as a mammalian DNA cloning vector: Transduction of neomycin resistance into mammalian tissue culture cells,” PNAS, Oct. 1984, 81:6466-70.
Hockemeyer et al., “Genetic engineering of human pluripotent cells using TALE nucleases,” Nat. Biotechnol., 29:731-734 (2011).
Hoivik et al., “DNA methylation of intronic enhancers directs tissue-specific expression of steroidogenic factor 1/adrenal 4 binding protein (SF-1/Ad4BP),” Endocrinology, 152(5):2100-2112, Epub Feb. 22, 2011.
Hopp et al., “A Short Polypeptide Marker Sequence Useful for Recombinant Protein Identification and Purification,” BioTechnology, Oct. 1988, 6:1204-10.
Hsu and Zhang, “Dissecting neural function using targeted genome engineering technologies,” ACS Chem Neurosci., 3(8):603-610, Epub Jul. 19, 2012.
Huang et al., “Heritable gene targeting in zebrafish using customized TALENs,” Nat. Biotechnol., 29:699-700 (2011).
Huang Shi, “Histone methyltransferases, diet nutrients and tumour suppressors,” Nature Reviews. Cancer, Jun. 2002, 2(6): 469-476.
Humphrey et al., “Stable histone deacetylase complexes distinguished by the presence of SANT domain proteins CoREST/kiaa0071 and Mta-L1,” Journal of Biological Chemistry, Mar. 2, 2001, 276(9): 6817-6824.
Inaba et al., “Generation of large numbers of dendritic cells from mouse bone marrow cultures supplemented with granulocyte/macrophage colony-stimulating factor,” J Exp Med., 176(6):1693-1702, Dec. 1, 1992.
International Preliminary Report on Patentability in International Application No. PCT/US2012/046451, dated Jan. 21, 2014, 6 pages.
International Preliminary Report on Patentability in International Application No. PCT/US2013/043075, dated Dec. 2, 2014, 7 pages.
International Preliminary Report on Patentability in International Application No. PCT/US2013/064511, dated Apr. 23, 2015, 12 pages.
International Preliminary Report on Patentability in International Application No. PCT/US2014/015343, dated Aug. 20, 2015, 7 pages.
International Search Report and Written Opinion in International Application No. PCT/US2012/046451, dated Nov. 15, 2012, 8 pages.
International Search Report and Written Opinion in International Application No. PCT/US2013/043075, dated Sep. 26, 2013, 10 pages.
International Search Report and Written Opinion in International Application No. PCT/US2013/064511, dated Jan. 30, 2014, 8 pages.
International Search Report and Written Opinion in International Application No. PCT/US2014/015343, dated Jun. 3, 2014, 17 pages.
Isalan et al., “A rapid, generally applicable method to engineer zinc fingers illustrated by targeting the HIV-1 promoter,” Nat. Biotechnol., 19(7):656-660, Jul. 2001.
Ito et al., “Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine,” Science, 333(6047):1300-1303, Sep. 2, 2011.
Iyer et al., “Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids,” Cell Cycle, Jun. 2009, 8(11):1698-1710.
Jamieson et al., “In vitro selection of zinc fingers with altered DNA-binding specificity,” Biochemistry, 33(19):5689-5695, May 17, 1994.
Japanese Office Action in Japanese Application No. 2014-520317, dated Apr. 5, 2016, 8 pages (with English translation).
Jia et al., “Cancer gene therapy targeting cellular apoptosis machinery,” Cancer Treatment Reviews, 2012, 38: 868-879.
Jinek et al., “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity,” Science, 337(6096):816-821, Epub Jun. 28, 2012.
Joung and Sander, “TALENs: a widely applicable technology for targeted genome editing,” Nat Rev Mol Cell Biol., 14(1):49-55, Epub Nov. 21, 2012.
Joung et al., “Reply to “Successful genome editing with modularly assembled zinc finger nucleases”,” Nat. Methods, Jan. 2010, 7:91-92.
Joung et al., “A bacterial two-hybrid selection system for studying protein-DNA and protein-protein interactions,” Proc Natl Acad Sci USA, Jun. 20, 2000;97(13):7382-7.
Juillerat et al., “Comprehensive analysis of the specificity of transcription activator-like effector nucleases,” Nucleic Acids Res., 42(8):5390-5402, Epub Feb. 24, 2014.
Jumlongras et al., “An evolutionarily conserved enhancer regulates Bmp4 expression in developing incisor and limb bud,” PLoS One, 7(6):e38568, Epub Jun. 12, 2012.
Kamijo et al. Tumor spectrum in ARF-deficient mice. Cancer Research, vol. 59, pp. 2217-2222, May 1999, (Year: 1999).
Karmirantzou and Hamodrakas, “A Web-based classification system of DNA-binding protein families,” Protein Eng. 14(7):465-472, Jul. 2001.
Kay et al., “A bacterial effector acts as a plant transcription factor and induces a cell size regulator,” Science, Oct. 26, 2007;318(5850):648-51.
Kearns et al., “Recombinant adeno-associated virus (AAV-CFTR) vectors do not integrate in a site-specific fashion in an immortalized epithelial cell line,” Gene Ther., Sep. 1996, 9:748-55.
Kim et al., “Targeted genome editing in human cells with zinc finger nucleases constructed via modular assembly,” Genome Res., 19(7):1279-1288, Epub May 21, 2009.
Kim et al., “Genome editing with modularly assembled zinc-finger nucleases,” Nat. Methods, 7(2):91-92, Feb. 2010.
Kim et al., “Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain,” Proc Natl Acad Sci USA, Feb. 6, 1996;93(3):1156-60.
Klee et al., “Agrobacterium-Mediated Plant Transformation and its Further Applications to Plant Biology,” Ann. Rev. Plant Phys., Jun. 1987, 38:467-486.
Klimpel et al., “Anthrax toxin protective antigen is activated by a cell surface protease with the sequence specificity and catalytic properties of furin,” PNAS, Nov. 1992, 89:10277-81.
Klug, “Co-chairman's remarks: protein designs for the specific recognition of DNA,” Gene, 135(1-2):83-92, Dec. 15, 1993.
Ko et al., “Impaired hydroxylation of 5-methylcytosine in myeloid cancers with mutant TET2,” Nature, Dec. 2010, 468(7325):839-843.
Kohn et al., “Engraftment of gene-modified umbilical cord blood cells in neonates with adenosine deaminase deficiency,” Nat. Med., 1995, 1:1017-1023.
Koller et al., “Normal development of mice deficient in beta 2M, MHC class I proteins, and CD8+ T cells,” Science, Jun. 1990, 248:1227-30.
Kondo et al., “Epigenetic changes in colorectal cancer,” Cancer Metastasis Reviews, Jan. 2004, 23(1-2): 29-39.
Ku et al., “Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains,” PLoS Genet., 4(10):e1000242, Epub Oct. 31, 2008.
Kumar et al., “DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest,” J Biomol Struct Dyn., 26(6):679-686, Jun. 2009.
Kumar et al., “Identification of DNA-binding proteins using support vector machines and evolutionary profiles,” BMC Bioinformatics, 8:463, Nov. 27, 2007.
Kummerfeld and Teichmann, “DBD: a transcription factor prediction database,” Nucleic Acids Res., 34 (Database issue): D74-D81, Jan. 1, 2006.
Kurmasheva et al., “Upstream CpG island methylation of the PAX3 gene in human rhabdomyosarcomas,” Pediatr. Blood Cancer, Apr. 2005, 44:328-37.
Lawrence et al., “Supercharging Proteins Can Impart Unusual Resilience,” J. Am. Chem. Soc., 2007, 129:10110-10112.
Lea et al., “Aberrant p16 methylation is a biomarker for tobacco exposure in cervical squamous cell carcinogenesis,” Am. J. Obstet. Gynecol., 2004, 190:674-9.
Lee et al., “An essential role for CoREST in nucleosomal histone 3 lysine 4 demethylation,” Nature, 437(7057):432-435, Epub Aug. 3, 2005.
Lee et al., “Three-dimensional solution structure of a single zinc finger DNA-binding domain,” Science., 245(4918):635-637, Aug. 11, 1989.
Li et al. Regulatory mechanisms of tumor suppressor p16INK4A and their relevance to cancer. Biochemistry, vol. 50, pp. 5566-5582, May 27, 2011.
Li et al., “DNA methylation in prostate cancer,” Biochim Biophys. Acta., Sep. 2004, 1704:87-102.
Li et al., “Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes,” Nucleic Acids Res., 39(14):6315-6325, Epub Mar. 31, 2011.
Li et al., “Protein trans-splicing as a means for viral vector-mediated in vivo gene therapy,” Hum Gene Ther., 19(9):958-964, Sep. 2008.
Li et al., “Transcription activator-like effector hybrids for conditional control and rewiring of chromosomal transgene expression,” Sci Rep., 2:897, Epub Nov. 28, 2012.
Li et al., “TAL nucleases (TALNs): hybrid proteins composed of TAL effectors and FokI DNA-cleavage domain,” Nucl Acids Res, 39:359-372 (2011).
Lin et al., “iDNA-Prot: identification of DNA binding proteins using random forest with grey model,” PLoS One., 6(9):e24756, Epub Sep. 15, 2011.
Lin et al., “Inhibition of Nuclear Translocation of Transcription Factor NF-κB by a Synthetic Peptide Containing a Cell Membrane-permeable Motif and Nuclear Localization Sequence,” J. Biol. Chem., 1995, 270:14255-58.
Lippow et al., “Creation of a type IIS restriction endonuclease with a long recognition sequence,” Nucleic Acids Res., 37(9):3061-3073, May 2009.
Liu et al., “Regulation of an endogenous locus using a panel of designed zinc finger proteins targeted to accessible chromatin regions. Activation of vascular endothelial growth factor A,” J Biol Chem., 276(14):11323-11334, Epub Jan. 5, 2001.
Liu et al., “Validated zinc finger protein designs for all 16 GNN DNA triplet targets,” J. Biol. Chem., 277(6):3850-3856, Epub Nov. 28, 2001.
Loenarz and Schofield, “Oxygenase Catalyzed 5-Methylcytosine Hydroxylation,” Chemistry & Biology, Jun. 2009, 16:580-583.
Lund et al., “DNA Methylation Polymorphisms Precede Any Histological Sign of Atherosclerosis in Mice Lacking Apolipoprotein E,” J. Biol. Chem., Jul. 2004, 279:29147-54.
Lutz-Freyermuth et al., “Quantitative determination that one of two potential RNA-binding domains of the A protein component of the U1 small nuclear ribonucleoprotein complex binds with high affinity to stem-loop II of U1 RNA,” PNAS, Aug. 1990, 87:6393-97.
Mabaera et al., “Developmental- and differentiation-specific patterns of human γ- and β-globin promoter DNA methylation,” Blood, 110(4):1343-52 (2007).
Madrigal and Krajewski, “Current bioinformatic approaches to identify DNase I hypersensitive sites and genomic footprints from DNase-seq data,” Front Genet., 3:230, eCollection 2012, Oct. 31, 2012.
Maeder et al., “Upregulation of the Pluripotency-Associated miRNA 302-367 Cluster 1 Using Engineered Transcription Activator-Like Effector (TALE) Activators,” Molecular Therapy, 2012, 20:S193 499.
Maeder et al., “Rapid “open-source” engineering of customized zinc-finger nucleases for highly efficient gene modification,” Mol Cell., 31(2):294-301, Jul. 25, 2008.
Maeder et al., “Robust, synergistic regulation of human gene expression using TALE activators,” Nat Methods., 10(3):243-245, Epub Feb. 10, 2013.
Maeder et al., “Targeted DNA demethylation and activation of endogenous genes using programmable TALE-TET1 fusion proteins,” Nat Biotechnol., 31(12):1137-1142, [author manuscript] Epub Oct. 9, 2013.
Maeder et al., “Oligomerized pool engineering (OPEN): an ‘open-source’ protocol for making customized zinc-finger arrays,” Nat Protoc., 2009;4(10):1471-501.
Mahfouz et al., “Targeted transcriptional repression using a chimeric TALE-SRDX repressor protein,” Plant Mol Biol., 78(3):311-321, Epub Dec. 14, 2011.
Mahfouz et al., “De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks,” Proc Natl Acad Sci U S A, 108:2623-2628 (2011).
Maiti and Drohat, “Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites,” J Biol Chem., 286(41):35334-35338, Epub Aug. 23, 2011.
Majumdar et al., “Targeted Gene Knock in and Sequence Modulation Mediated by a Psoralen-linked Triplex-forming Oligonucleotide,” J Biol Chem., 283(17):11244-52 (2008).
Malech et al., “Prolonged production of NADPH oxidase-corrected granulocytes after gene therapy of chronic granulomatous disease,” PNAS, Oct. 1997, 94:12133-38.
Mancini et al., “CpG methylation within the 5′ regulatory region of the BRCA1 gene is tumor specific and includes a putative CREB binding site,” Oncogene, 1998, 16:1161-9.
Mandecki et al., “A totally synthetic plasmid for general cloning, gene expression and mutagenesis in Escherichia coli,” GENE, Sep. 28, 1990, 94(1):103-107.
Mandell and Barbas et al., “Zinc Finger Tools: custom DNA-binding domains for transcription factors and nucleases,” Nucleic Acids Res., 34(Web Server issue):W516-W523, Jul. 1, 2006.
Markmann et al., “Indefinite survival of MHC class I-deficient murine pancreatic islet allografts,” Transplantation, Dec. 1992, 54:1085-89.
Martin et al., “GAP domains responsible for ras p21-dependent inhibition of muscarinic atrial K+ channel currents,” Science, Jan. 1992, 255:192-194.
Maurano et al., “Systematic localization of common disease-associated variation in regulatory DNA,” Science, 337(6099):1190-1195, Epub Sep. 5, 2012.
McDaniell et al., “Heritable individual-specific and allele-specific chromatin signatures in humans,” Science, 328(5975):235-239, [author manuscript] Epub Mar. 18, 2010.
McNaughton et al., “Mammalian cell penetration, siRNA transfection, and DNA transfection by supercharged proteins,” PNAS, Apr. 2009, 106:6111.
Mendenhall et al., “Identification of promoter targets of enhancers by epigenetic knockdown using TAL DNA binding proteins,” Epigenetics & Chromatin, 2013, 6(Suppl): 1-2.
Mendenhall et al., “Locus-specific editing of histone modifications at endogenous enhancers,” Nat Biotechnol., 31(12):1133-1136, Epub Sep. 8, 2013.
Metzger et al., “LSD1 demethylates repressive histone marks to promote androgen-receptor-dependent transcription,” Nature, 437(7057):436-439, 2005.
Miller et al., “A TALE nuclease architecture for efficient genome editing,” Nat. Biotechnol., 29(2): 143-148, Epub Dec. 22, 2010.
Miller et al., “Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes,” EMBO J., 4(6):1609-1614, Jun. 1985.
Moore et al., “Design of polyzinc finger peptides with structured linkers,” Proc Natl Acad Sci USA, Feb. 2001, 98:1432-1436.
Moore et al., “Improved somatic mutagenesis in zebrafish using transcription activator-like effector nucleases (TALENs),” PLoS One, May 2012, 7(5):e37877.
Morbitzer et al., “Regulation of selected genome loci using de novo-engineered transcription activator-like effector (TALE)-type transcription factors,” Proc Natl Acad Sci U S A., 107(50):21617-21622, Epub Nov. 24, 2010.
Morbitzer et al., “Assembly of custom TALE-type DNA binding domains by modular cloning,” Nucl Acids Res., 39:5790-5799 (2011).
Moscou and Bogdanove, “A simple cipher governs DNA recognition by TAL effectors,” Science, 326(5959):1501, Dec. 11, 2009.
Mussolino and Cathomen, “TALE nucleases: tailored genome engineering made easy,” Curr Opin Biotechnol., 23(5):644-650, Epub Feb. 17, 2012.
Mussolino et al., “A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity,” Nucleic Acids Res., 2011, 39:9283-93.
Muthuswamy et al., “Controlled Dimerization of ErbB Receptors Provides Evidence for Differential Signaling by Homo- and Heterodimers,” Mol. Cell. Biol., Oct. 1999, 19(10):6845-6857.
Neering et al., “Transduction of primitive human hematopoietic cells with recombinant adenovirus vectors,” Blood, 88(4):1147-1155, Aug. 15, 1996.
Ng et al., “In vivo epigenomic profiling of germ cells reveals germ cell molecular signatures,” Dev Cell., 24(3):324-333, Epub Jan. 24, 2013.
Noonan and McCallion, “Genomics of long-range regulatory elements,” Annu Rev Genomics Hum Genet, 11:1-23, 2010.
Novak et al., “Functional Characterization of Protease-treated Bacillus anthracis Protective Antigen,” J. Biol. Chem., Aug. 1992, 267:17186-93.
Office Action in Australian Application No. 2017204819, dated Sep. 7, 2018, 7 pages.
Office Action in Australian Application No. 2014214719, dated Feb. 14, 2019, 3 pages.
Office Action in Canadian Application No. 2,841,710, dated Apr. 15, 2019, 4 pages.
Office Action in Canadian Application No. 2,841,710, dated May 11, 2018, 4 pages.
Office Action in European Application No. 12814750.1, dated Mar. 8, 2017, 7 pages.
Office Action in European Application No. 13845212.3, dated Feb. 15, 2018, 4 pages.
Office Action in Japanese Application No. 2014-520317, dated Jan. 17, 2017, 6 pages (with English translation).
Office Action in Japanese Application No. 2015-557129, dated Dec. 19, 2017, 8 pages (with English translation).
Office Action in Japanese Application No. 2017-136828, dated Sep. 11, 2018, 7 pages (with English translation).
Office Action in U.S. Appl. No. 13/838,520, dated Feb. 24, 2017, 49 pages.
Office Action in U.S. Appl. No. 14/435,065, dated Jan. 26, 2017, 22 pages.
Office Action in U.S. Appl. No. 14/435,065, dated Jul. 27, 2017, 25 pages.
Office Action in U.S. Appl. No. 14/766,713, dated Jan. 26, 2017, 39 pages.
Office Action in U.S. Appl. No. 14/766,713, dated Jul. 25, 2017.
Oligino et al., “Drug inducible transgene expression in brain using a herpes simplex virus vector,” Gene Ther., 5(4):491-496, Apr. 1998.
Ong and Corces, “Enhancer function: new insights into the regulation of tissue-specific gene expression,” Nat Rev Genet., 12(4):283-293, Epub Mar. 1, 2011.
Orlando et al., “Zinc-finger nuclease-driven targeted integration into mammalian genomes using donors with limited chromosomal homology,” Nucleic Acids Res., Aug. 2010;38(15):e152, 15 pages.
Ovchinnikov et al., “PspXI, a novel restriction endonuclease that recognizes the unusual DNA sequence 5-VCTCGAGB-3,” Bulletin of Biotechnology and Physico-chemical Biology, 2005, 1(1):18-24.
Paik W K et al., “Enzymatic Demethylation of Calf Thymus Histones,” Biochemical and Biophysical Research Communications, 1973, 51(3): 781-788.
Paiva et al., “Secretion of interferon by Bacillus subtilis,” Gene, 22(2-3):229-235, May-Jun. 1983.
Paques et al., “Meganucleases and DNA double-strand break-induced recombination: perspectives for gene therapy,” Current Gene Therapy, Bentham Science Publishers LTD, Feb. 1, 2007, 7(1):49-66.
Partial European Search Report in Application No. 18191841.8, dated Jan. 30, 2019, 17 pages.
Pavletich and Pabo, “Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 A,” Science, 252(5007):809-817, May 10, 1991.
Pekowska et al. H3K4 tri-methylation provides an epigenetic signature of active enhancers. The EMBO Journal, vol. 30, pp. 4198-4210, Aug. 16, 2011, including supplementary figures S1-S11, printed as pp. 1/13-13/13.
Perelle et al., “Characterization of Clostridium perfringens Iota-Toxin Genes and Expression in Escherichia coli,” Infect. Immun., Dec. 1993, 61:5147-56.
Perez-Pinera et al., “Synergistic and tunable human gene activation by combinations of synthetic transcription factors,” Nat Methods., 10(3):239-242, Epub Feb. 3, 2013.
Perez-Quintero et al., “An Improved Method for TAL Effectors DNA-Binding Sites Prediction Reveals Functional Convergence in TAL Repertoires of Xanthomonas oryzae Strains,” Jul. 2013, PLoS One, 8.
Pingoud and Silva, “Precision genome surgery,” Nat Biotechnol., 25(7):743-744, Jul. 2007.
Porteus & Baltimore, “Chimeric nucleases stimulate gene targeting in human cells,” Science. May 2, 2003;300(5620):763.
Prochiantz, “Getting hydrophilic compounds into cells: lessons from homeopeptides,” Curr. Opin. Neurobiol., Oct. 1996, 6:629-634.
Qi et al., “Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression,” Cell, 152(5):1173-1183, Feb. 28, 2013.
Rada-Iglesias et al., “A unique chromatin signature uncovers early developmental enhancers in humans,” Nature, 470(7333):279-283, Epub Dec. 15, 2010.
Ram et al., “Combinatorial patterning of chromatin regulators uncovered by genome-wide location analysis in human cells,” Cell, 147(7):1628-1639, Dec. 23, 2011.
Ramirez et al., “Unexpected failure rates for modular assembly of engineered zinc fingers,” Nat Methods., 5(5):374-375, May 2008.
Rebar and Pabo, “Zinc finger phage: affinity selection of fingers with new DNA-binding specificities,” Science, 263(5147):671-673, Feb. 4, 1994.
Rendahl et al., “Regulation of gene expression in vivo following transduction by two separate rAAV vectors,” Nat. Biotechnol., 16(8):757-761, Aug. 1998.
Reyon et al., “Engineering designer transcription activator-like effector nucleases (TALENs) by REAL or REAL-Fast assembly,” Curr Protoc Mol Biol., Chapter 12:Unit 12.15, [author manuscript] Oct. 2012.
Reyon et al., “FLASH assembly of TALENs for high-throughput genome editing,” Nat Biotechnol., 30(5):460-465, May 2012.
Rivenbark et al., “Epigenetic reprogramming of cancer cells via targeted DNA methylation,” Epigenetics, Apr. 2012, 7(4): 350-360.
Rodenhiser and Mann, “Epigenetics and human disease: translating basic biology into clinical applications,” CMAJ, 174(3):341-348 (2006).
Rohde et al., “BISMA—Fast and accurate bisulfite sequencing data analysis of individual clones from unique and repetitive sequences,” BMC Bioinformatics, 11:230, 12 pages (2010).
Romer et al., “Plant pathogen recognition mediated by promoter activation of the pepper Bs3 resistance gene,” Science, Oct. 26, 2007;318(5850):645-8.
Rosenbloom et al., “ENCODE whole-genome data in the UCSC Genome Browser: update 2012,” Nucleic Acids Res., 40(Database issue):D912-D917, Epub Nov. 9, 2011.
Rosenecker et al., “Adenovirus infection in cystic fibrosis patients: implications for the use of adenoviral vectors for gene transfer,” Infection, 1996, 24(1):5-8.
Rothman, “Mechanisms of intracellular protein transport,” Nature, 372(6501):55-63, Nov. 3, 1994.
Ruben et al., “Isolation of a rel-related human cDNA that potentially encodes the 65-kD subunit of NF-kappaB,” Science, Mar. 1991, 251:1490-93.
Sabo et al., “Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays,” Nat Methods., 3(7):511-518, Jul. 2006.
Sabo et al., “Discovery of functional noncoding elements by digital analysis of chromatin structure,” Proc Natl Acad Sci U S A., 101(48):16837-16842, Epub Nov. 18, 2004.
Sadowski et al., “GAL4-VP16 is an unusually potent transcriptional activator,” Nature, Oct. 1988, 335:563-564.
Samulski et al., “Helper-Free Stocks of Recombinant Adeno-Associated Viruses: Normal Integration Does Not Require Viral Gene Expression,” J. Virol., Sep. 1989, 63:3822-28.
Sander et al., “Targeted gene disruption in somatic zebrafish cells using engineered TALENs,” Nat. Biotechnol., 29:697-698 (2011).
Sanjana et al., “A transcription activator-like effector toolbox for genome engineering,” Nat Protoc., 7(1):171-192, Jan. 5, 2012.
Schleifman et al., “Triplex-mediated gene modification,” Methods Mol. Biol., 435:175-190, 2008.
Schmidt et al., “Arginine-rich cell-penetrating peptides,” FEBS Lett., May 2010, 584:1806-13.
Scholze & Boch, “TAL effectors are remote controls for gene activation,” Curr. Opin. Microbiol., 14:47-53 (2011).
Schonthal, “Regulation of gene expression by serine/threonine protein phosphatases,” Semin Cancer Biol., Aug. 1995;6(4):239-48.
Schornack et al., “Gene-for-gene-mediated recognition of nuclear-targeted AvrBs3-like bacterial effector proteins,” J Plant Physiol., Feb. 2006;163(3):256-72.
Sebo et al., “Cell-invasive activity of epitope-tagged adenylate cyclase of Bordetella pertussis allows in vitro presentation of a foreign epitope to CD8+ cytotoxic T cells,” Infect. Immun., Oct. 1995, 63:3851-57.
Segal et al., “Evaluation of a modular strategy for the construction of novel polydactyl zinc finger DNA-binding proteins,” Biochemistry, 42(7):2137-2148, Feb. 25, 2003.
Sera et al. Zinc-finger-based artificial transcription factors and their applications. Advanced Drug Delivery Reviews, vol. 61, pp. 513-526, Apr. 2009.
Sharma, “Schizophrenia, epigenetics and ligand-activated nuclear receptors: a framework for chromatin therapeutics,” Schizophr. Res., Jan. 2005, 72:79-90.
Shi et al., “Histone demethylation mediated by the nuclear amine oxidase homolog LSD1,” Cell, 119(7):941-953, Dec. 29, 2004.
Shi et al., “Metabolic enzymes and coenzymes in transcription—a direct link between metabolism and transcription?,” Trends in Genetics: TIG, Sep. 2004, 20(9): 445-452.
Silva et al., “Meganucleases and other tools for targeted genome engineering: perspectives and challenges for gene therapy,” Curr Gene Ther., 11(1):11-27, Feb. 2011.
Silver, “How Proteins Enter the Nucleus,” Cell, 64(3):489-497, Feb. 8, 1991.
Simon et al., “Sequence-specific DNA cleavage mediated by bipyridine polyamide conjugates,” Nucl. Acids Res., 36(11):3531-8 (2008).
Sipione et al., “Insulin expressing cells from differentiated embryonic stem cells are not beta cells,” Diabetologia, 47(3):499-508, Epub Feb. 14, 2004.
Skinner et al., “Use of the Glu-Glu-Phe C-terminal epitope for rapid purification of the catalytic domain of normal and mutant ras GTPase-activating proteins,” J. Biol. Chem., 1991, 266:14163-14166.
Stadler et al., “DNA-binding factors shape the mouse methylome at distal regulatory regions,” Nature, 480(7378):490-495, Dec. 14, 2011.
Stenmark et al., “Peptides fused to the amino-terminal end of diphtheria toxin are translocated to the cytosol,” J. Cell Biol., Jun. 1991, 113:1025-32.
Sterman et al., “Adenovirus-mediated herpes simplex virus thymidine kinase/ganciclovir gene therapy in patients with localized malignancy: results of a phase I clinical trial in malignant mesothelioma,” Hum. Gene Ther., May 1998, 7:1083-89.
Stoddard, “Homing endonuclease structure and function,” Q. Rev. Biophys., 38(1): 49-95, Epub Dec. 9, 2005.
Stott et al., “The alternative product from the human CDKN2A locus, p14(ARF), participates in a regulatory feedback loop with p53 and MDM2,” EMBO J., 17(17):5001-5014, Sep. 1, 1998.
Streubel et al., “TAL effector RVD specificities and efficiencies,” Nat Biotechnol., 30(7):593-595, Jul. 10, 2012.
Sugio et al., “Two type III effector genes of Xanthomonas oryzae pv. oryzae control the induction of the host genes OsTFIIAgamma1 and OsTFX1 during bacterial blight of rice,” Proc Natl Acad Sci USA, Jun. 19, 2007;104(25):10720-5.
Szyf et al., “DNA methylation and breast cancer,” Biochem. Pharmacol., Sep. 2004, 68:1187-97.
Tahiliani et al., “Conversion of 5-Methylcytosine to 5-Hydroxymethylcytosine in Mammalian DNA by MLL Partner TET1,” Science, 324:930-935 (2009).
Tan et al., “Zinc-finger protein-targeted gene regulation: genomewide single-gene specificity,” Proc Natl Acad Sci U S A., 100(21):11997-2002, Epub Sep. 26, 2003.
Tani et al., “Updates on current advances in gene therapy,” The West Indian Medical Journal, Mar. 2011, 60: 188-194.
Tesson et al., “Knockout rats generated by embryo microinjection of TALENs,” Nat. Biotechnol., 29:695-696 (2011).
Thiesen et al., “Conserved KRAB protein domain identified upstream from the zinc finger region of Kox 8,” Nucleic Acids Res., 1991, 19:3996.
Thompson et al., “Engineering and Identifying Supercharged Proteins for Macromolecule Delivery into Mammalian Cells,” Methods in Enzymology, 2012, 503:293-319.
Thurman et al., “The accessible chromatin landscape of the human genome,” Nature, 489(7414):75-82, Sep. 6, 2012.
Tjong and Zhou, “DISPLAR: an accurate method for predicting DNA-binding sites on protein surfaces,” Nucleic Acids Res., 35(5):1465-1477, Epub Feb. 6, 2007.
Topf et al., “Regional ‘pro-drug’ gene therapy: intravenous administration of an adenoviral vector expressing the E. coli cytosine deaminase gene and systemic administration of 5-fluorocytosine suppresses growth of hepatic metastasis of colon carcinoma,” Gene Ther., Apr. 1998, 5:507-513.
Townsend et al., “High-frequency modification of plant genes using engineered zinc-finger nucleases,” Nature: International Weekly Journal of Science, Nature Publishing Group, May 21, 2009, pp. 442-445.
Tratschin et al., “A human parvovirus, adeno-associated virus, as a eucaryotic vector: transient expression and encapsidation of the procaryotic gene for chloramphenicol acetyltransferase,” Mol. Cell. Biol., Oct. 1984, 4:2072-81.
Tratschin et al., “Adeno-Associated Virus Vector for High-Frequency Integration, Expression, and Rescue of Genes in Mammalian Cells,” Mol. Cell. Biol., Nov. 1985, 5:3251-60.
Tremblay et al., “Transcription activator-like effector proteins induce the expression of the frataxin gene,” Hum Gene Ther., 23(8):883-890, Epub Jul. 20, 2012.
UCSC Genome Browser (Human Feb. 2009 (GRCh37/hg19) Assembly, chr12:2,162,284-2,162,418 with HEK293T DNase 1 HS track from ENCODE/DUKE, printed from https://genome.ucsc.edu as p. 1/1 on Jun. 28, 2019. (Year: 2019).
UCSC Genome Browser (Human Feb. 2009 (GRCh37/hg19) Assembly, chr9:21,440,329-21,440,478 with HEK293T DNase 1 HS track from ENCODE/DUKE, printed from https://genome.ucsc.edu as p. 1/1 on Jun. 25, 2019. (Year: 2019).
Uhlman, “An alternative approach in gene synthesis: use of long selfpriming oligodeoxynucleotides for the construction of double-stranded DNA,” GENE, Nov. 15, 1988, 71(15): 29-40.
Uhlmann et al., “Distinct methylation profiles of glioma subtypes,” Int. J. Cancer, Aug. 2003, 106:52-9.
Urnov et al., “Highly efficient endogenous human gene correction using designed zinc-finger nucleases,” Nature, Jun. 2, 2005;435(7042):646-51.
U.S. Final Office Action in U.S. Appl. No. 13/838,520, dated Jul. 15, 2015, 35 pages.
U.S. Non-Final Office Action in U.S. Appl. No. 13/838,520, dated Oct. 6, 2014, 38 pages.
U.S. Non-Final Office Action in U.S. Appl. No. 14/232,067, dated Nov. 17, 2015, 10 pages.
Valton et al., “Overcoming transcription activator-like effector (TALE) DNA binding domain sensitivity to cytosine methylation,” J Biol Chem., 287(46):38427-38432, Epub Sep. 26, 2012.
Van den Brulle et al., “A novel solid phase technology for high-throughput gene synthesis,” BioTechniques, 45(3):340-343 (2008).
Verma and Weitzman, “Gene Therapy: Twenty-first century medicine,” Annual Review of Biochemistry, 2005, 74: 711-738.
Visel et al., “Genomic views of distant-acting enhancers,” Nature, 461(7261):199-205, Sep. 10, 2009.
Vogelstein and Kinzler, “Cancer genes and the pathways they control,” Nat. Med., Aug. 2004, 10:789-799.
Voytas and Joung, “Plant Science. DNA binding made easy,” Science, Dec. 11, 2009, 326:1491-1492.
Wagner et al., “Efficient and persistent gene transfer of AAV-CFTR in maxillary sinus,” Lancet, Jun. 1998, 351:1702-1703.
Wang et al., “An integrated chip for the high-throughput synthesis of transcription activator-like effectors,” Angew Chem Int Ed Engl., 51(34):8505-8508, Epub Jul. 23, 2012.
Wang et al., “Human PAD4 regulates histone arginine methylation levels via demethylimination,” Science, Oct. 8, 2004, 306(5694): 279-283.
Wang et al., “pH-sensitive immunoliposomes mediate target-cell-specific delivery and controlled expression of a foreign gene in mouse,” PNAS, Nov. 1987, 84:7851-7855.
Wang et al., “Positive and negative regulation of gene expression in eukaryotic cells with an inducible transcriptional regulator,” Gene Ther., 4(5):432-441, May 1997.
Weber et al., “Assembly of Designer TAL Effectors by Golden Gate Cloning,” PLoS One, 6:e19722 (2011).
Weising et al., “Foreign Genes in Plants: Transfer, Structure, Expression, and Applications,” Ann. Rev. Genet., 1988, 22:421-477.
Welsh et al., “Adenovirus-mediated gene transfer for cystic fibrosis: Part A. Safety of dose and repeat administration in the nasal epithelium. Part B. Clinical efficacy in the maxillary sinus,” Hum. Gene Ther., Feb. 1995, 6(2):205-218.
Whyte et al., “Enhancer decommissioning by LSD1 during embryonic stem cell differentiation,” Nature, 482(7384):221-225, Feb. 1, 2012.
Widschwendter and Jones, “DNA methylation and breast carcinogenesis,” Oncogene, Aug. 2002, 21:5462-82.
Wong et al., “Detection of aberrant p16 methylation in the plasma and serum of liver cancer patients,” Cancer Res., 59(1):71-73, Jan. 1, 1999.
Wood et al., “Targeted Genome Editing Across Species Using ZFNs and TALENs,” Science, 333:307 (2011).
Wright et al., “Standardized reagents and protocols for engineering zinc finger nucleases by modular assembly,” Nat Protoc., 2006, 1(3):1637-1652.
Wu et al., “Building zinc fingers by selection: toward a therapeutic application,” Proc Natl Acad Sci U S A., 92(2):344-348, Jan. 17, 1995.
Wu et al., “Custom-designed zinc finger nucleases: what is next?” Cell Mol Life Sci., 64(22):2933-2944, Nov. 2007.
Wu, “The 5′ ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I,” Nature, 286(5776):854-860, Aug. 28, 1980.
Xie et al., “DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape,” Nat Genet., 45(7):836-841, Epub May 26, 2013.
Xu et al., “Pioneer factor interactions and unmethylated CpG dinucleotides mark silent tissue-specific enhancers in embryonic stem cells,” Proc Natl Acad Sci U S A., 104(30):12377-12382, Epub Jul. 18, 2007.
Xu et al., “Cytosine methylation targeted to pre-determined sequences,” Nat Genet., Dec. 1997;17(4):376-8.
Xu et al., “Genome-wide regulation of 5hmC, 5mC, and gene expression by Tet1 hydroxylase in mouse embryonic stem cells,” Mol Cell., 42(4):451-464, Epub Apr. 21, 2011.
Yan et al., “Drugging the undruggable: Transcription therapy for cancer,” Biochimica et Biophysica Acta, 2013, 1835: 76-85.
Yang et al., “Os8N3 is a host disease-susceptibility gene for bacterial blight of rice,” Proc Natl Acad Sci USA, Jul. 5, 2006;103(27):10503-8.
Yeager, “Genome Editing in a FLASH,” BioTechniques, Apr. 4, 2012, 2 pages, http://www.biotechniques.com/news/Genome-Editing-in-a-FLASH/biotechniques-329367.html.
Yoon and Brem, “Noncanonical transcript forms in yeast and their regulation during environmental stress,” RNA, 16(6):1256-1267, Epub Apr. 26, 2010.
Yost et al., “Targets in epigenetics: inhibiting the methyl writers of the histone code,” Curr Chem Genomics, 5(Suppl 1):72-84, Epub Aug. 22, 2011.
Zhang et al., “Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription,” Nat Biotechnol., 29(2):149-153, Epub Jan. 19, 2011.
Zhang et al., “Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis,” Plant Cell., 24(7):2719-2731. Epub Jul. 5, 2012.
Zhang et al., “Programmable Sequence-Specific Transcriptional Regulation of mammalian Genome Using Designer TAL Effectors,” Nature Biotechnology, Feb. 2011, 29(2): 149-153.
Zhang et al., “Supplementary Information, Data S1, TET1 is a 5mC hydroxylase in vitro” from, “TET1 is a DNA-binding protein that modulates DNA methylation and gene transcription via hydroxylation of 5-methylcytosine,” Cell Res., 6 pages, 2010.
Zhang et al., “Transcription activator-like effector nucleases enable efficient plant genome engineering,” Plant Physiol., 161(1):20-27, Epub Nov. 2, 2012.
Zhang et al., “TET1 is a DNA-binding protein that modulates DNA methylation and gene transcription via hydroxylation of 5-methylcytosine,” Cell Res., 20(12):1390-1393, Epub Nov. 16, 2010.
Zheng S et al., “Correlations of partial and extensive methylation at the P14ARF locus with reduced mRNA expression in colorectal cancer cell lines and clinicopathological features in primary tumors,” Carcinogenesis, Nov. 1, 2000, 21(11): 2057-2064.
Zitzewitz et al., “Probing the folding mechanism of a leucine zipper peptide by stopped-flow circular dichroism spectroscopy,” Biochemistry, 34(39):12812-12819, Oct. 3, 1995.
Branco et al., “Uncovering the role of 5-hydroxymethylcytosine in the epigenome,” Nature Reviews Genetics, Nov. 15, 2011, 13:7-13.
CA Office Action in Canadian Appln. No. 2,900,338, dated Feb. 5, 2021, 5 pages.
EP Extended European Search Report in European Appln. No. 20183740.8, dated Nov. 4, 2020, 15 pages.
EP Extended European Search Report in European Appln. No. 20184257.2, dated Nov. 5, 2020, 13 pages.
EP Extended European Search Report in European Appln. No. 20194689.4, dated Feb. 10, 2021, 8 pages.
JP Office Action in Japanese Appln. No. 2019-043522, dated Mar. 2, 2021, 7 pages (with English translation).
CA Office Action in Canadian Appln. No. 2,900,338, dated Dec. 16, 2019, 6 pages.
EP Extended European Search Report in EP Appln. No. 19191923.2, dated Feb. 14, 2020, 6 pages.
GenBank Accession No. FJ176909.1, “Xanthomonas oryzae pv. oryzae clone D41 avirulence/virulence factor repeat domain protein-like gene, complete sequence,” Sep. 30, 2008, 2 pages.
JP Office Action in Japanese Application No. 2017-136828, dated Aug. 27, 2019, 7 pages (with English Translation).
JP Office Action in Japanese Application No. 2018-223519, dated Jan. 7, 2020, 7 pages (with English Translation).
JP Office Action in Japanese Appln. No. 2019-043522, dated Mar. 3, 2020, 8 pages (with English translation).
EP Office Action in European Appln. No. 18191841.8, dated May 10, 2021, 4 pages.
UCSC Genome Browser on Human Dec. 2013 (GRCh38/hg38) Assembly, chr12:5,432,047-5,432,118, retrieved on Nov. 11, 2021, retrieved from URL <http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr12%3A5432047%2D5432118&hgsid=1213134839_eQIFaKub3iflWhqtQsuImzsnVEpm>, 2 pages.
Related Publications (1)
Number Date Country
20190177374 A1 Jun 2019 US
Provisional Applications (3)
Number Date Country
61610212 Mar 2012 US
61601409 Feb 2012 US
61508366 Jul 2011 US
Continuations (2)
Number Date Country
Parent 15156574 May 2016 US
Child 16283380 US
Parent 14232067 US
Child 15156574 US