AAV DELIVERY OF NUCLEOBASE EDITORS

Abstract
Provided herein are methods of delivering “split” Cas9 protein or nucleobase editors into a cell, e.g., via a recombinant adeno-associated virus (rAAV), to form a complete and functional Cas9 protein or nucleobase editor. The Cas9 protein or the nucleobase editor is split into two sections, each fused with one part of an intein system (e.g., intein-N and intein-C encoded by the dnaE-n and dnaE-c genes, respectively). Upon co-expression, the two sections of the Cas9 protein or nucleobase editor are ligated together via intein-mediated protein splicing. Nucleic acid molecules encoding the N-terminal portion of a Cas9 protein or a nucleobase editor fused to an intein, and nucleic acid molecules encoding the C-terminal portion of a Cas9 protein or nucleobase editor, are provided. Recombinant AAV vectors (e.g., vectors comprising one or more of these nucleic acid molecules each comprising an intein) and particles for the delivery of the split Cas9 protein or nucleobase editor, compositions comprising such AAV vectors and particles, and methods of using such rAAV vectors and particles are also provided. Methods of administering such compositions and AAV particles to a subject are further provided. Cells and compositions comprising these nucleic acid molecules, rAAV vectors, and rAAV particles are also provided.
Description
BACKGROUND

Precise genome targeting technologies using the CRISPR/Cas9 system have recently been explored in a wide range of applications, including gene therapy. A major limitation to the application of Cas9 and Cas9-based genome-editing agents in gene therapy is the size of Cas9 (>4 kb), impeding its efficient delivery via recombinant adeno-associated virus (rAAV).


SUMMARY

Point mutations represent the majority of known pathogenic human genetic variants1. To enable the direct installation or correction of point mutations in living cells, base editors (or “nucleobase editors”) were developed, which are engineered proteins that directly convert a target base pair to a different base pair without creating double-stranded DNA breaks2-4. Cytidine base editors (CBEs) such as BE4max3,5-7 catalyze the conversion of target C.G base pairs to T.A, while adenine base editors (ABEs) such as ABEmax4,6 convert target A.T base pairs to G.C. While CBEs and ABEs are both widely used and work robustly in many cultured mammalian cell systems2, the efficient delivery of base editors into live animals remains a challenge, despite promising initial studies8-10. A major impediment to the delivery of base editors in animals has been an inability to package base editors in adeno-associated virus (AAV), an efficient and widely used delivery agent that remains the only FDA-approved in vivo gene therapy vector11. The large size of the DNA encoding base editors (5.2 kb for base editors containing S. pyogenes Cas9, not including any guide RNA or regulatory sequences) precludes packaging in AAV, which has a genome packaging size limit of ≤5 kb12,13.


To bypass this packaging size limit and deliver base editors (or “nucleobase editors”) using AAVs, a split-base editor dual AAV strategy14,15 was devised, in which the CBE or ABE is divided into an N-terminal and C-terminal half. Each nucleobase editor half is fused to half of a fast-splicing split-intein. Following co-infection by AAV particles expressing each nucleobase editor-split intein half, protein splicing in trans reconstitutes full-length nucleobase editor. Unlike other approaches utilizing small molecules16 or sgRNA17 to bridge split Cas9, intein splicing removes all exogenous sequences and regenerates a native peptide bond at the split site, resulting in a single reconstituted protein identical in sequence to the unmodified nucleobase editor.
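As a purely illustrative sketch of the strategy described above, the following Python toy model treats proteins as strings. It divides an editor sequence at a chosen split site, attaches placeholder intein halves, and models splicing in trans as removal of both intein halves with direct joining of the flanking portions. The sequences, the split position, and all function names are hypothetical placeholders; no real split site or intein sequence is reproduced, and the chemistry of splicing is not modeled.

```python
def split_editor(editor: str, n_len: int) -> tuple[str, str]:
    """Divide a sequence into N- and C-terminal portions.

    n_len is the number of residues kept in the N-terminal portion
    (a hypothetical split site chosen by the caller).
    """
    if not 0 < n_len < len(editor):
        raise ValueError("split site must fall inside the sequence")
    return editor[:n_len], editor[n_len:]


def fuse_to_inteins(n_part: str, c_part: str,
                    intein_n: str, intein_c: str) -> tuple[str, str]:
    """Fuse intein-N to the C-terminus of the N-terminal portion and
    intein-C to the N-terminus of the C-terminal portion."""
    return n_part + intein_n, intein_c + c_part


def trans_splice(n_fusion: str, c_fusion: str,
                 intein_n: str, intein_c: str) -> str:
    """Toy model of splicing in trans: drop both intein halves and
    join the two flanking portions directly."""
    if not n_fusion.endswith(intein_n):
        raise ValueError("N-terminal fusion does not end with intein-N")
    if not c_fusion.startswith(intein_c):
        raise ValueError("C-terminal fusion does not start with intein-C")
    n_extein = n_fusion[: len(n_fusion) - len(intein_n)]
    c_extein = c_fusion[len(intein_c):]
    return n_extein + c_extein


# Placeholder strings only; these are not real protein sequences.
EDITOR = "ABCDEFGHIJ"
INTEIN_N, INTEIN_C = "<IntN>", "<IntC>"

n_half, c_half = split_editor(EDITOR, 4)
n_fusion, c_fusion = fuse_to_inteins(n_half, c_half, INTEIN_N, INTEIN_C)
reconstituted = trans_splice(n_fusion, c_fusion, INTEIN_N, INTEIN_C)
```

In this toy model the reconstituted string equals the starting string, mirroring the statement that splicing leaves a protein identical in sequence to the unmodified nucleobase editor.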


Split-intein CBEs and split-intein ABEs were developed and integrated into optimized dual AAV genomes to enable efficient base editing in somatic tissues of therapeutic relevance, including liver, heart, muscle, retina, and brain. The resulting AAVs were used to achieve base editing efficiencies at test loci for both CBEs and ABEs that, in each of these tissues, meet or exceed therapeutically relevant editing thresholds for the treatment of some human genetic diseases at AAV dosages that are known to be well-tolerated in humans. By integrating these developments, dual AAV split-intein nucleobase editors were used to treat a mouse model of Niemann-Pick disease type C (e.g., type C1), a debilitating disease that affects the central nervous system (CNS), resulting in correction of the causal mutation in CNS tissue, and an increase in the animal's lifespan. In addition, dual AAV split-intein nucleobase editors were used to treat a mouse model of congenital deafness, resulting in correction of the causal mutation in vivo.


Accordingly, in some aspects, described herein are nucleic acid molecules, compositions, recombinant AAV (rAAV) particles, kits, and methods for delivering a Cas9 protein or a base editor (or “nucleobase editor”) to cells, e.g., via rAAV vectors. Typically, a Cas9 protein or a nucleobase editor is “split” into an N-terminal portion and a C-terminal portion. The N-terminal portion or C-terminal portion of a Cas9 protein or a nucleobase editor may be fused to one member of the intein system, respectively. The resulting fusion proteins, when delivered on separate vectors (e.g., separate rAAV vectors) into one cell and co-expressed, may be joined to form a complete and functional Cas9 protein or nucleobase editor (e.g., via intein-mediated protein splicing). Further provided herein is empirical testing of regulatory elements in the delivery vectors for high expression levels of the split Cas9 protein or the nucleobase editor.


Some aspects of the present disclosure provide nucleic acid molecules encoding a N-terminal portion of a nucleobase editor fused at its C-terminus to a first intein sequence, wherein the nucleic acid molecule is operably linked to a first promoter, further comprising a nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, wherein the direction of transcription of the nucleic acid segment is reversed relative to the direction of transcription of the nucleic acid molecule. Further provided are nucleic acid molecules encoding a C-terminal portion of a nucleobase editor fused at its N-terminus to a second intein sequence, wherein the nucleic acid molecule is operably linked to a third promoter, and further comprising a nucleic acid segment encoding a guide RNA (gRNA) operably linked to a fourth promoter, wherein the direction of transcription of the nucleic acid segment is reversed relative to the direction of transcription of the nucleic acid molecule.


In some embodiments, the disclosed nucleic acid molecules further comprise i) a transcriptional terminator, optionally wherein the transcriptional terminator is the transcriptional terminator from a bGH gene, hGH gene, or SV40 gene, and ii) a woodchuck hepatitis posttranscriptional regulatory element (WPRE) inserted 5′ of the transcriptional terminator. In certain embodiments, the WPRE is a truncated WPRE sequence. In certain embodiments, the truncated WPRE sequence comprises W3, as first reported in Choi, J. H., et al. (2014), Mol. Brain 7: 17, incorporated by reference herein. In certain embodiments, the WPRE is a full-length WPRE. In certain embodiments, the first and/or third promoters comprise a Cbh promoter. In certain embodiments, the second and/or fourth promoters comprise a U6 promoter.


Other aspects of the present disclosure provide compositions comprising: (i) a first nucleotide sequence encoding a N-terminal portion of a Cas9 protein fused at its C-terminus to an intein-N; and (ii) a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to a first promoter, wherein at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and wherein the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence.


In some embodiments, the Cas9 protein is a catalytically inactive Cas9 (dCas9) or a Cas9 nickase (nCas9), and wherein the first nucleotide sequence of (i) and/or the second nucleotide sequence of (ii) further comprises a nucleotide sequence encoding a nucleobase modifying enzyme fused to the N-terminus of the N-terminal portion of the Cas9 protein.


In some embodiments, the nucleobase modifying enzyme (or nucleobase modification domain) is a deaminase. In some embodiments, the deaminase is a cytosine deaminase. In some embodiments, the deaminase is an adenosine deaminase. In some embodiments, the second nucleotide sequence of (ii) further comprises a nucleotide sequence encoding a uracil glycosylase inhibitor (UGI) fused at the 3′ end of the second nucleotide sequence. In some embodiments, the first nucleotide sequence of (i) further comprises a nucleotide sequence encoding a uracil glycosylase inhibitor (UGI) at the 5′ end of the first nucleotide sequence. In some embodiments, the UGI comprises the amino acid sequence of any one of SEQ ID NOs: 299-302.


In some embodiments, the first nucleotide sequence and the second nucleotide sequence are on different vectors. In some embodiments, each of the different vectors is a genome of a recombinant adeno-associated virus (rAAV). In some embodiments, each vector is packaged in a rAAV particle. In some aspects, the present disclosure provides rAAV particles comprising a first nucleic acid molecule (e.g., encoding a N-terminal portion of a nucleobase editor or Cas9 protein fused at its C-terminus to an intein-N) as described herein. rAAV particles comprising a second nucleic acid molecule (e.g., encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein or nucleobase editor) as described herein are also provided. In some embodiments, the N-terminal portion of the Cas9 protein and the C-terminal portion of the Cas9 protein are joined together to form the Cas9 protein. The disclosed rAAV particles may comprise both a first nucleic acid molecule and a second nucleic acid molecule as described herein.


In another aspect, host cells comprising the compositions described herein are provided. The disclosed cells may comprise any of the disclosed nucleic acid molecules, rAAV vectors, or rAAV particles described herein.


Some aspects of the present disclosure provide compositions comprising: (i) a first nucleotide sequence encoding a N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N; and (ii) a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the nucleobase editor. Further provided herein are kits comprising any of the compositions described herein.


In some embodiments, any of the nucleobase editors of the disclosure comprises a cytosine deaminase fused to the N-terminus of a catalytically inactive Cas9 or a Cas9 nickase. In some embodiments, the cytosine deaminase is selected from the group consisting of: APOBEC1, APOBEC3, AID, and pmCDA1. In some embodiments, the nucleobase editor further comprises a uracil glycosylase inhibitor (UGI).


Still other aspects of the present disclosure provide methods comprising contacting a cell with any of the compositions described herein, wherein the contacting results in the delivery of the first nucleotide sequence and the second nucleotide sequence into the cell, and wherein the N-terminal portion of the nucleobase editor and the C-terminal portion of the nucleobase editor are joined to form a nucleobase editor.


Still other aspects of the present disclosure provide methods comprising administering to a subject in need thereof a therapeutically effective amount of any of the compositions described herein. In some embodiments, the subject has a disease or disorder (e.g., a genetic disease). In particular embodiments, the disease or condition is Niemann-Pick disease type C (NPC). In other embodiments, the disease or condition is congenital deafness. In some embodiments, the disease or disorder is selected from the group consisting of: cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), chronic obstructive pulmonary disease (COPD), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy, hereditary lymphedema, familial Alzheimer's disease, prion disease, chronic infantile neurologic cutaneous articular syndrome (CINCA), and desmin-related myopathy (DRM).


The details of certain embodiments of the invention are set forth in the Detailed Description of Certain Embodiments, as described below. Other features, objects, and advantages of the invention will be apparent from the Definitions, Examples, Figures, and Claims.





BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which constitute a part of this Application, illustrate several embodiments of the invention and together with the description, serve to explain the principles of the invention.



FIGS. 1A-1C are graphs showing a “split nucleobase editor” for delivery into cells using recombinant adeno-associated virus (rAAV) vectors. FIG. 1A is a schematic representation of how the nucleobase editor is split into two portions. FIG. 1B shows that AAV-delivered split nucleobase editor can undergo protein splicing upon expression of the two halves in cells to form a complete nucleobase editor that has comparable activity to a nucleobase editor expressed as a whole. FIG. 1C shows the formation of a complete nucleobase editor from the two halves via protein splicing mediated by DnaE intein.



FIG. 2 shows that U118 cells were efficiently transfected by AAV2 containing nucleic acids encoding mCherry. Different viral titers were tested (2.5-10 μl at 4.5×10¹¹ vg/ml*) and all resulted in efficient transfection of U118 cells. *vg/ml means viral genome-containing particles per milliliter.
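The doses implied by the volumes and titer above can be worked out with a short calculation. The sketch below is illustrative only: it assumes the titer is 4.5×10¹¹ viral genomes per milliliter (vg/ml) and that the applied volumes are in microliters, and the function name is a hypothetical helper rather than anything defined in this disclosure.

```python
def dose_vg(volume_ul: float, titer_vg_per_ml: float) -> float:
    """Viral genomes delivered by `volume_ul` microliters of a stock
    whose titer is given in viral genomes per milliliter."""
    if volume_ul < 0 or titer_vg_per_ml < 0:
        raise ValueError("volume and titer must be non-negative")
    return volume_ul / 1000.0 * titer_vg_per_ml  # convert microliters to milliliters


# Assumed reading of the titer in FIG. 2: 4.5e11 vg/ml.
TITER = 4.5e11
low_dose = dose_vg(2.5, TITER)    # smallest volume tested
high_dose = dose_vg(10.0, TITER)  # largest volume tested
```

Under these assumptions the tested range corresponds to roughly 1.1×10⁹ to 4.5×10⁹ vg per sample.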



FIGS. 3A-3B are graphs showing high-throughput sequencing (HTS) results of nucleobase editing by rAAV-delivered split nucleobase editor in U118 and HEK cells. Lipid-transfected nucleobase editor was used as a control. An sgRNA targeting R37 in the PRNP gene was used, and the PRNP gene locus was sequenced. FIG. 3A shows the HTS reads, and FIG. 3B summarizes the base editing results.



FIG. 4 is a graph showing the optimization of the transcriptional terminator used in the AAV constructs encoding the split nucleobase editor. Transcriptional terminators of different sizes and origins were tested. bGH transcriptional terminator is relatively short and efficiently terminates transcription comparably to longer terminator sequences. It was therefore chosen to be used in the downstream experiments.



FIGS. 5A-5B are graphs showing the results of nucleobase editing with long term (up to 15 days) transduction of AAV encoding the split nucleobase editor in mouse astrocytes expressing human ApoE4 cDNA. The target base is in the codon for arginine 112 and arginine 158 in ApoE4, which is converted to a cysteine upon base editing. FIG. 5A shows that the editing of arginine 158 increases over time when the mouse astrocytes were transduced at 10¹⁰ vg, while editing of arginine 112 remained minimal. The nucleotide sequence 3′ of the codon for arginine 158 features a flanking NGG PAM allowing for high activity by SpCas9 (with guide sequence GAAGCGCCTGGCAGTGTACC, SEQ ID NO: 348), while the nucleotide sequence 3′ of the codon for arginine 112 contains a flanking NAG PAM which does not allow for high activity (with guide sequence GACGTGCGCGGCCGCCTGGTG, SEQ ID NO: 349). FIG. 5B shows cells transduced with rAAV encoding mCherry at 10¹⁰ vg (control).



FIG. 6 is a schematic representation of the optimization of the nuclear localization signal in AAV constructs encoding the split nucleobase editor. The nuclear localization signal controls nuclear import, which must occur for reconstituted nucleobase editor to associate with genomic DNA as a prerequisite for editing, and is a potential rate-limiting step in the process. This schematic shows that the NLS (and NLS optimization) is critical for the nucleobase editor to be imported into the nucleus.



FIG. 7 is a graph showing the results of base editing using different rAAV split nucleobase editor constructs containing different nuclear localization signals (NLS).



FIGS. 8A-8B are graphs showing the editing of DNMT1 gene in dissociated mouse cortical neurons using an AAV encoded split nucleobase editor.



FIGS. 9A-9B are graphs showing the editing of DNMT1 gene in mouse Neuro-2a cell line using either an AAV encoded split nucleobase editor, or a lipid transfected DNA encoded nucleobase editor.



FIGS. 10A-10F show the development of split-intein cytosine and adenine base editors (or nucleobase editors). FIG. 10A is a schematic representation of the intein reconstitution strategy. Two separately encoded protein fragments fused to split-intein halves splice to reconstitute full-length protein following co-expression. FIG. 10B is a graph showing lipofection of intact BE3, split BE3 with the Npu split-intein site between E573/C574 or K637/T638, or split BE3 with the Cfa split-intein site between E573/C574 into HEK293T cells followed by high-throughput sequencing of six test loci to determine base editing efficiency. FIG. 10C is a graph comparing average editing data in FIG. 10B, normalized to BE3 levels (dotted line). BE3-normalized editing at each locus (black dots) was averaged. FIG. 10D is a graph showing “BEmax” optimization of nuclear localization signals and codon usage increases editing efficiency at six standard loci. BE3.9max and BE4max show comparable editing efficiencies. FIG. 10E is a graph comparing average editing data in FIG. 10D, normalized to BE4 levels (dotted line). FIG. 10F is a graph showing lipofection of ABEmax (left bar) or Npu-split E573/C574 ABEmax (right bar) into NIH 3T3 cells for generation of a split-intein adenosine nucleobase editor. In FIG. 10B and FIG. 10D, dots represent values and bars represent mean+SD of n=3 independent biological replicates. Dots in FIG. 10C and FIG. 10E represent locus averages.



FIGS. 11A-11E show the optimization of split-intein nucleobase editor AAVs. FIG. 11A contains images showing GFP expression three weeks after injection of 1×10¹¹ vg of GFP-NLS-bGH, GFP-NLS-W3-bGH, or GFP-NLS-WPRE-bGH into six-week-old C57BL/6 mice. Representative images of horizontal brain slices show hippocampus and neocortex. Top panels show DAPI and EGFP signals overlaid; bottom panels show EGFP signal only. The scale bar represents 500 μm. FIG. 11B is a graph showing transcriptional regulatory element optimization. Total GFP signal measured by ImageJ from mice injected as described in FIG. 11A. See methods for a detailed description of imaging and analysis procedures. FIG. 11C is a graph showing the number of GFP-positive cells per horizontal brain slice from the mice described in FIG. 11A. GFP-positive cells were identified by ilastik/CellProfiler as described in the image analysis section of the Methods of Example 3. FIG. 11D is a schematic of v3, v4, and v5 AAV variants. Arrows indicate direction of U6 promoter transcription. The CBE3.9 coding sequence consists of rAPOBEC1, spCas9 D10A nickase, and UGI. Small white boxes in v3 are non-essential backbone sequences removed in v4 and v5 AAV. See FIG. 17 for the schematic of v5 AAV-ABEmax. FIG. 11E is a graph showing cytosine base editing efficiencies in NIH 3T3 cells following a 14-day incubation with v3 AAV, v4 AAV, and v5 AAV. Dots and bars in FIG. 11B and FIG. 11C represent individual replicates and mean+SD of n=2-3 animals, 3-6 slices per animal. Darkened circles and error bars in FIG. 11E represent mean±SD. Dots in FIG. 11E represent values for independent biological replicates (n=3-4).



FIGS. 12A-12D show that systemic injection of v5 AAV9 editors results in cytosine and adenine base editing in heart, muscle, and liver. FIG. 12A is a schematic showing that six-week-old C57BL/6 mice were treated by retro-orbital injection of 2×10¹² vg total of v5 AAV9. After 4 weeks, organs were harvested and genomic DNA of unsorted cells was sequenced. FIG. 12B is a graph showing cytosine base editing by v5 AAV CBE3.9max in the indicated organs. FIG. 12C is a graph showing adenine base editing by v5 AAV ABEmax in the indicated organs. FIG. 12D is a graph comparing adenine base editing from v5 AAV-mediated ABEmax (grey bars) and from trans-mRNA splicing (white bars). Bars represent mean+SD of n=3 animals.



FIGS. 13A-13F show AAV-mediated cytosine and adenine base editing in the central nervous system by two delivery routes. FIG. 13A is a schematic of P0 intraventricular injections. P0 C57BL/6 mice were co-injected with 4×10¹⁰ vg total of v5 CBE3.9max or ABEmax AAV targeting DNMT1 and 1×10¹⁰ vg Cbh-KASH-GFP. Sorting for GFP-positive cells enriches for triply transduced cells. Tissue was harvested 3-4 weeks after injection, and cortex and cerebellum were separated. Cortical tissue comprises neocortex and hippocampus. For each tissue, nuclei were dissociated and analyzed as unsorted (all nuclei) or GFP-positive populations for DNA sequencing. FIG. 13B is a graph showing percent GFP-positive nuclei measured by flow cytometry following P0 injection. FIG. 13C is a graph showing cytosine base editing efficiency following P0 v5 CBE3.9max AAV injection in cortex and cerebellum at DNMT1 for unsorted nuclei (left bars) and GFP-positive nuclei (right bars). FIG. 13D is a graph showing adenine base editing efficiency following P0 v5 ABEmax AAV9 injection in cortex and cerebellum at DNMT1 for unsorted nuclei (left bar) and GFP-positive nuclei (right bar). FIG. 13E is a schematic of retro-orbital injections. Brains from 9-week-old C57BL/6 mice were harvested 4 weeks after injection with 4×10¹² vg total v5 CBE3.9max or ABEmax AAV targeting DNMT1 and 2×10¹¹ vg KASH-GFP AAV, then processed and analyzed as described in FIG. 13A. FIG. 13F is a graph showing cytosine base editing in unsorted (left bar) and GFP-positive (right bar) cortical and cerebellar cells following the procedure described in FIG. 13A. Bars represent mean+SD. Black dots represent individual animals (n=3-4).



FIGS. 14A-14F show AAV-mediated cytosine and adenine base editing in the retina following sub-retinal injections of 2-week-old Rho-Cre; Ai9 mice. FIG. 14A is a schematic of sub-retinal injections. Two-week-old Rho-Cre; Ai9 mice were treated by sub-retinal injection of 1×10⁹ to 1×10¹⁰ vg total of v5 CBE3.9max or v5 ABEmax AAV targeting DNMT1. For each group, at least three eyes were injected. Three weeks after injection, injected retinas were sorted into GFP-negative/tdTomato-positive (rod photoreceptors not transduced with GFP), tdTomato-positive/GFP-positive (transduced rods), GFP-positive/tdTomato-negative (marker transduced non-rod), and double-negative populations (unmarked non-rods, not shown). FIG. 14B is a graph showing the percentage of GFP transduced rod photoreceptors or non-rod retinal cells following subretinal injection of an AAV mix of PHP.B-CBE, Anc80-CBE and Anc80-ABE AAV, respectively. The dose of AAV-GFP is 2×10⁹ vg for PHP.B-CBE mix, 3.3×10⁸ vg for Anc80-CBE mix and 4.5×10⁸ vg for Anc80-ABE mix. FIG. 14C contains images showing the expression of tdTomato in the rod photoreceptor cells of Rho-Cre; Ai9 mice (left panel). Retinal transduction of PHP.B-GFP (middle panel) or Anc80-GFP (right panel) at 5×10⁹ vg. Scale bar=20 μm. FIG. 14D is a graph showing cytosine base editing by v5 CBE3.9max PHP.B AAV in injected retinas. Editing percentage in all rods was inferred as ((editing % in GFP transduced rods)*(number of transduced rods)+(editing % in unmarked rods)*(number of unmarked rods))/total rods. This calculation was repeated for non-rods. FIG. 14E is a graph showing cytosine base editing by v5 CBE3.9max Anc80 AAV in photoreceptors and other retinal cells. Editing efficiencies in all rods and all non-rods were inferred as described for FIG. 14D. FIG. 14F is a graph showing adenine base editing by v5 ABEmax Anc80 AAV in photoreceptors. All GFP-positive cells were pooled in this experiment, resulting in a single GFP-positive population containing tdTomato-positive and tdTomato-negative cells (hashed bar). Bars represent mean+SD. Black dots represent individual eyes (n=3-4).
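The inferred editing percentage described for FIG. 14D is a cell-count-weighted average of the editing measured in the transduced and unmarked populations. A minimal Python sketch of that calculation follows. The function and parameter names are hypothetical, and the numbers in the example are made up for illustration and are not data from this disclosure.

```python
def inferred_editing_pct(edit_pct_transduced: float, n_transduced: int,
                         edit_pct_unmarked: float, n_unmarked: int) -> float:
    """Weighted average of editing percentages across two sorted
    populations, weighted by the number of cells in each population."""
    total = n_transduced + n_unmarked
    if total <= 0:
        raise ValueError("at least one cell is required")
    return (edit_pct_transduced * n_transduced
            + edit_pct_unmarked * n_unmarked) / total


# Hypothetical example: 100 transduced rods at 40% editing and
# 300 unmarked rods at 10% editing.
all_rods_pct = inferred_editing_pct(40.0, 100, 10.0, 300)
```

The same function applies unchanged to the non-rod populations, since the calculation is simply repeated with the corresponding counts and editing percentages.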



FIGS. 15A-15H show base editing of NPC1I1061T in the mouse CNS. FIG. 15A is a schematic of the NPC1 locus highlighting the mutation in exon 21, the protospacer and PAM sequence targeted, and the desired CBE-mediated reversion of I1061T. The scale bar represents 5 kilobases. FIG. 15B is a Kaplan-Meier plot of homozygous NPC1I1061T mice injected with 4×10¹⁰ vg total of v5 CBE3.9max AAV9 targeting NPC1I1061T (blue; n=7), untreated homozygous NPC1I1061T mice (red; n=12), and NPC1I1061T heterozygous animals (black; n=14). FIG. 15C is a Kaplan-Meier plot of NPC1I1061T mice injected with 1×10¹¹ vg total v5 CBE3.9max AAV9 targeting NPC1I1061T (blue; n=5), with data from the other two cohorts replotted from FIG. 15B. FIG. 15D is a graph showing cortical and cerebellar base editing in P0 animals injected with v5 AAV9 targeting NPC1I1061T. Lighter bars report editing in unsorted or GFP-positive cells following injection of n=3 mice with 4×10¹⁰ vg (2×10¹⁰ vg of each split nucleobase editor half); darker bars correspond to editing following injection of 1×10¹¹ vg (5×10¹⁰ vg of each split nucleobase editor half). FIG. 15E is a graph showing base editing to the precisely corrected wild-type allele shown in FIG. 15A. Lighter bars indicate the frequency of alleles that are corrected to the wild-type sequence; darker bars replotted from FIG. 15D indicate total C.G-to-T.A editing in the T1061 codon (“ACA”) in FIG. 15A. FIG. 15F is a graph showing precisely corrected (wild-type) alleles as a percentage of all edited alleles. In FIG. 15B and FIG. 15C, tick marks indicate animal deaths. Bars represent mean+SD. Dots represent individual animals (n=3-5). FIG. 15G shows immunofluorescent measurements of calbindin and DAPI staining in midline sagittal cerebellar slices from P98-P105 mice. Calbindin is indicated as the darker stain, and DAPI is indicated as the lighter stain. Images were taken using an Eclipse Ti microscope (Nikon). Wild-type, n=3 mice, 15 images; NPC1I1061T untreated, n=2 mice, 6 images; NPC1I1061T AAV-CBE, n=2 mice, 10 images. Untreated vs. treated, two-sided t-test, p=0.0005. FIG. 15H shows immunofluorescent measurements of CD68+ tissue area. Images are representative CD68-stained midline sagittal cerebellar slices from P98-P105 mice. EGFP-KASH labeled cells are indicated with the (^) symbol, CD68+ labeled cells are indicated with the (>) symbol, and DRAQ5 signal is indicated with the (*) symbol. The untreated mice were uninjected and did not express GFP. In the quantification of CD68+ tissue area, each point represents the average per mouse. Wild-type, n=3 mice, 15 images; Npc1I1061T untreated, n=2 mice, 6 images; NPC1I1061T AAV-CBE, n=2 mice, 10 images. Untreated vs. treated, two-sided t-test, p=0.0005. The middle subpanel reports base editing to the precisely corrected wild-type allele shown in FIG. 15A from the 1×10¹¹ vg injections. Lighter bars indicate the frequency of alleles that are corrected to the wild-type sequence; replotted darker bars indicate total C.G-to-T.A editing of the T1061 codon (“ACA”) in FIG. 15A. The right subpanel shows precisely corrected (wild-type) alleles as a percentage of all edited alleles in mice injected with 1×10¹¹ vg. In FIG. 15B, tick marks indicate animal deaths. In all other panels, bars represent mean+SD. Dots represent individual mice. Scale bars represent 200 μm. Statistical tests for immunofluorescence are two-sided t-tests without multiple comparison corrections.



FIGS. 16A-16F show the development of split-intein S. aureus CBEs. FIG. 16A contains graphs showing editing performance in HEK293T cells of seven split S. aureus nucleobase editors with intein insertions between K534/C535, Y537/S538, Q501/T502, N484/S485, L431/S432, R453/S454, or Q457/S458. For each of the six endogenous genomic test sites, 16 bases of the protospacer, numbered with the PAM starting at position 21, are shown on the X axis. Unsplit S. aureus BE3 (saBE3) data are shown as black stars; seven split-intein CBEs are shown as shaded circles. Note that APOBEC1 exhibits an anti-GpC preference. FIG. 16B contains bar graphs of editing efficiency at the most highly edited C for each site. Shading patterns correspond to the shading patterns of the circles shown in FIG. 16A. FIG. 16C is a graph showing the average editing across the six genomic sites, normalized to unsplit saBE3 editing (dotted line). FIG. 16D shows a sample Western blot of S. pyogenes nucleobase editor expression (BE3.9max and Npu-BE3.9max) in HEK293T cells. The lanes to the left of the ladder have been stained against FLAG. The lanes to the right are the same samples stained against HA. The FLAG-stained lanes are co-stained against GAPDH loading control. Untagged BE3.9max is shown in the first lane; other samples are tagged as indicated. This representative blot is one of three biological replicates. FIGS. 16E-16F show editing at the HEK3 locus by the tagged editor constructs. The bars in FIG. 16E correspond to the lanes shown on the Western blot; the bars in FIG. 16F show additional conditions measuring the effect of tagging on editing efficiency. NpuC1A constructs are split-intein constructs containing the inactivating Npu N-terminal C1A mutation. In FIG. 16A and FIGS. 16E-16F, dots are mean+SD of n=3 independent biological replicates. In FIG. 16B and FIG. 16C, bars represent mean+SD. In FIG. 16B, dots represent values from independent biological replicates (n=3). Dots in FIG. 16C represent average editing at each of n=6 tested sites.



FIG. 17 is a schematic of v5 AAV ABEmax constructs. Arrows indicate direction of U6 promoter transcription. The ABEmax coding sequence consists of wild-type and evolved tadA monomers followed by spCas9 D10A nickase. The U6-sgRNA cassette was omitted from the N-terminal construct to avoid exceeding the AAV packaging limit.



FIGS. 18A-18C show CBE- and ABE-mediated editing in six organs following systemic injection of v5 AAV9 nucleobase editors. FIG. 18A is a graph showing cytosine base editing by v5 AAV CBE3.9max in organs poorly transduced by AAV9. The dotted line indicates the detection threshold of 0.1% editing. FIG. 18B is a graph comparing adenine base editing from v5 AAV-mediated ABEmax (grey bars, right) and from trans-mRNA splicing (white bars, left). Bars represent mean+SD of n=3 animals. FIG. 18C shows a comparison of cytosine base editing mediated by v5 AAV-SaBE3.9max compared to previously-reported constructs, which were modified to replace the liver-specific P3 promoter with Cbh and to replace the Pah sgRNA with PCSK9-targeting sgRNA. Bars to the left of the dotted line report editing in livers of mice injected retro-orbitally with 1×10¹¹ vg total; bars to the right report a dose of 1×10¹² vg total. Bars represent mean+SD of n=3 mice.



FIGS. 19A-19B show the transduction of cerebellar Purkinje cells by P0 intracerebroventricular injections. FIG. 19A is a schematic of P0 intraventricular injections. P0 L7-GFP mice were injected with 5×10¹⁰ vg of PHP.B Cbh-mCherry-NLS. Brains were prepared for imaging following a three-week incubation. Visible cerebellar cells fall into three categories: GFP-positive, mCherry-negative=untransduced Purkinje cells; GFP-negative, mCherry-positive=transduced non-Purkinje cells; and GFP-positive, mCherry-positive=transduced Purkinje cells. The overlap of EGFP and mCherry, which are shaded in light grey and dark grey, respectively, produces white nuclei in transduced Purkinje cells. FIG. 19B contains sample cerebellar images from horizontally sliced hemispheres of injected L7-GFP mice. Left panel shows EGFP and mCherry signals overlaid; center and right panels respectively show EGFP and mCherry only. The scale bar represents 500 μm.



FIGS. 20A-20B show indel-subtracted AAV-mediated cytosine and adenine base editing in the retina following sub-retinal injections of 2-week-old C57BL/6 mice. Indel-containing datasets (solid bars) are reproduced from FIGS. 14D-14E for clarity. FIG. 20A is a graph showing cytosine base editing by v5 CBE3.9max PHP.B AAV in photoreceptors and other retinal cells. Diagonal-striped bars represent data re-analyzed after discarding indel-containing reads. Editing percentage was then calculated by dividing the number of T.A-containing reads by the original total read number. Removal of indel-containing reads was manually verified. The inferred editing percentages were calculated as in FIGS. 14A-14F: the editing percentage in all rods was inferred as ((editing % in transduced rods)*(number of transduced rods)+(editing % in unmarked rods)*(number of unmarked rods))/total rods. This calculation was repeated for non-rods. FIG. 20B is a graph showing cytosine base editing by v5 CBE3.9max Anc80 AAV in photoreceptors and other retinal cells. Indel removal was performed and editing efficiencies in all rods and all non-rods were inferred as described for FIG. 20A. Bars represent mean+SD. Black dots represent individual eyes (n=3).
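The inference described for FIG. 20A is a count-weighted average of two sub-populations. The following is a minimal Python sketch of that arithmetic; the function name and the example numbers are illustrative assumptions and are not data from the figures.

```python
def inferred_editing_percent(edit_pct_transduced, n_transduced,
                             edit_pct_unmarked, n_unmarked):
    """Count-weighted average editing percentage over a whole cell population.

    Mirrors the stated formula:
    ((editing % in transduced cells) * (number of transduced cells)
     + (editing % in unmarked cells) * (number of unmarked cells)) / total cells
    """
    total = n_transduced + n_unmarked
    if total == 0:
        raise ValueError("total cell count must be positive")
    return (edit_pct_transduced * n_transduced
            + edit_pct_unmarked * n_unmarked) / total


# Hypothetical inputs chosen only to illustrate the calculation.
print(inferred_editing_percent(40.0, 2500, 10.0, 7500))
```

The same function would apply unchanged to the non-rod population, since the formula depends only on the two editing percentages and the two cell counts.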



FIGS. 21A-21D show the prolonged expression of a nucleobase editor. FIG. 21A is a graph showing editing in NPC1I1061T/+ mice injected at P0 with 1×10^11 vg v5 CBE3.9max AAV9. The shaded area and dotted line indicate that in unedited heterozygous animals, 50% of HTS reads are expected to contain a T.A. Brains were harvested and sequenced at P29 after sorting into unsorted (left bar) or GFP-positive (right bar) cells. The darker bars represent unsorted and GFP-positive cells harvested at P110. FIG. 21B is a graph showing the percent of edited cells inferred from the percent of T.A-containing reads. The percent of edited cells was calculated as 2*(% T.A−50). Bars represent mean+SD. Dots represent individual animals (n=3). FIG. 21C shows the cerebellar Cas9/EGFP staining in a P110 mouse injected at P0 with v5 AAV-CBE and GFP-KASH. Merged images show EGFP in darker shading and Cas9 in lighter shading. The Cas9 antibody is a mouse monoclonal antibody which binds a motif in the C-terminal half of the split editor. The dashed white rectangle indicates the zoomed-in area depicted in the single-channel images. Greyscale images are as labeled. FIG. 21D shows cortical Cas9/EGFP staining in a P110 mouse injected at P0 with v5 AAV-CBE and GFP-KASH. Merged images show EGFP as the darker label and Cas9 as the lighter label. Images in FIG. 21C and FIG. 21D are representative of n=2 mice. The dashed white rectangle indicates the zoomed-in area depicted in the single-channel images. In FIG. 21A and FIG. 21B, bars represent mean+SD. Black dots represent individual mice.
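The conversion used for FIG. 21B follows from the heterozygous baseline: one allele per cell already reads T.A, so each edited cell raises the T.A read fraction by only half of its share. A short Python sketch of the stated formula is given below; the function name and the sample value are hypothetical.

```python
def percent_edited_cells(pct_ta_reads):
    """Infer the percent of edited cells in a heterozygous animal.

    Implements the stated formula 2 * (% T.A - 50): unedited heterozygous
    cells contribute 50% T.A reads, and every edited cell converts its one
    remaining target allele.
    """
    return 2 * (pct_ta_reads - 50.0)


# Hypothetical read percentage, for illustration only.
print(percent_edited_cells(62.5))
```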



FIGS. 22A-22C are tables showing base editing efficiency, indel frequency, and base editing:indel ratio for all in vivo experiments at the DNMT1 locus. All in vivo intein-split experiments were performed with v5 AAV and are listed according to the figure in which they appear. The percentage of reads with C.G to T.A editing (CBE3.9max) or A.T to G.C editing (ABEmax) was divided by the percentage of reads containing indels to generate the base editing:indel ratio. All analyses of HTS data were performed by CRISPResso2 as described in the Methods section of Example 3. CRISPResso2 is publicly available software that provides analyses of genome editing outcomes from deep sequencing data. See Clement et al., Nat Biotechnol. 2019 March; 37(3):224-226, herein incorporated by reference. All values represent mean±SD.
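The base editing:indel ratio described above is a simple quotient of two read percentages. The Python sketch below illustrates it; the function name, the handling of a zero indel percentage, and the example values are assumptions made for illustration and are not taken from the tables.

```python
def base_edit_to_indel_ratio(pct_edited_reads, pct_indel_reads):
    """Divide the percent of base-edited reads by the percent of indel reads.

    Assumption: when no indel-containing reads are observed, the ratio is
    reported as infinity rather than raising an error.
    """
    if pct_indel_reads == 0:
        return float("inf")
    return pct_edited_reads / pct_indel_reads


# Hypothetical percentages, for illustration only.
print(base_edit_to_indel_ratio(30.0, 0.5))
```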



FIG. 23 contains flow cytometry plots exemplifying brain nuclei sorting. Plots show 500,000 events. Nuclei were sequentially gated on the basis of DyeCycle Ruby signal, FSC/SSC ratio, SSC-Width/SSC-height ratio, and GFP/DyeCycle ratio, as shown above. The first column demonstrates the gating strategy on a GFP-negative control sample. The middle column demonstrates the gating strategy on a sample with low transduction (P0 injection, cerebellar tissue), and the right column demonstrates high transduction efficiency (P0 injection, cortical tissue). In all cases, unsorted nuclei correspond to events that pass gates R1, R2, and R3, without sorting on R4.



FIG. 24 contains flow cytometry plots exemplifying retinal cell sorting. Plots show 250,000 events. Cells were sequentially gated on the basis of FSC/SSC ratio, FSC-W/FSC-A, SSC-W/FSC-A, and fluorescence. Cells were sorted four ways on the basis of signal intensity in the PE-Texas Red and GFP channels. The left column illustrates the gating strategy on an untransduced Rho-Cre; Ai9 mouse with tdTomato-positive rod photoreceptors. The right column illustrates the gating strategy on a Rho-Cre; Ai9 mouse co-injected with PHP.B GFP and v5 CBE3.9max.



FIGS. 25A-25B are tables containing primers used to generate sgRNA sequences and amplify genomic DNA. All sgRNA forward primers have 5′-CACC overhangs, and all reverse primers have 5′-AAAC overhangs to generate overhangs for efficient ligation. Primers for gDNA amplification contain bolded 5′ Illumina adapter sequences and 3′ gene-specific sequences (no special formatting).
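The overhang scheme described for the sgRNA primers can be sketched in Python as follows. The source states only that forward primers carry a 5′-CACC overhang and reverse primers a 5′-AAAC overhang; the assumption here is that the reverse primer is the reverse complement of the spacer, as is conventional for annealed-oligo cloning. The function names and the 20-nt spacer are hypothetical and do not correspond to any sequence in FIGS. 25A-25B.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sgrna_cloning_oligos(spacer):
    """Build a forward/reverse oligo pair with CACC and AAAC 5' overhangs.

    Assumption: the reverse oligo is the reverse complement of the spacer,
    so that the annealed duplex leaves 4-nt overhangs for ligation.
    """
    spacer = spacer.upper()
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer must contain only A, C, G, T")
    forward = "CACC" + spacer
    reverse = "AAAC" + reverse_complement(spacer)
    return forward, reverse


# Hypothetical spacer used purely for illustration.
fwd, rev = sgrna_cloning_oligos("GACGTTACCGGATCAATGCA")
print(fwd)
print(rev)
```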



FIGS. 26A-26U show the recombinant AAV vector construct nucleotide sequences encoding the CBE3.9max, ABEmax, and AID-BE3.9max nucleobase editors evaluated in the Examples. All constructs were cloned in the px601 backbone (F. Zhang) modified to correct an 11-bp deletion in the left ITR. Pseudospacer-containing backbones were cut with Esp3I or BsmBI endonucleases. Primers listed in FIGS. 25A-25B were annealed and ligated with standard molecular biology techniques. Annotations are coded as described in the figure. The U6-sgRNA cassette was omitted from the ABEmax N-terminal constructs to keep the total construct size under the packaging limit.



FIG. 27 shows a Kaplan-Meier plot of homozygous NPC1I1061T mice injected with 4×10^12 vg total of v5 CBE3.9max. Treated mice were injected with 3×10^12 vg PHP.eB and 1×10^12 vg AAV9 targeting NPC1I1061T (blue; n=5) and were compared with untreated homozygous NPC1I1061T mice (red; n=9). Tick marks indicate animal deaths. Median survival increased from 109 to 120 days, p=0.015 by Mantel-Cox.



FIGS. 28A-28B show cerebellar CD68 staining. FIG. 28A shows representative single-channel images of cerebellar slices stained against EGFP, CD68, and DNA in greyscale. EGFP labels cells transduced with GFP-KASH AAV transduction marker. CD68 labels reactive microglia, and DRAQ5 labels DNA. The NPC1I1061T animal in this case was not transduced. Multi-channel images from FIGS. 15A-15H are reproduced for clarity. The dotted white rectangle in the rightmost (treated) column highlights one area that is GFP+/CD68. Scale bar is 200 μm. FIG. 28B shows CD68+ cells per mm^2 in wild-type, treated, and untreated mice. Bars represent mean+SD. Black dots represent individual mice. For FIGS. 28A and 28B, n=3 wild-type, n=2 treated, and n=2 untreated mice.



FIGS. 29A-29D show an off-target analysis of NPC1-targeting sgRNA. FIG. 29A shows the results of CIRCLE-seq using the NPC1-targeting sgRNA and Cas9 to cut gDNA harvested from untreated NPC1I1061T mouse liver. Note that off-target candidate sequences are aligned to the wild-type C57BL/6 genome; the wild-type NPC1 allele on line 2 is not present in the assay. FIG. 29B shows a CRISPOR off-target analysis of the six sites with the highest predicted Cas9 activity as determined by CFD score, including the on-target site, in descending order. Off-target guide sequences are shown in the left-most column. FIG. 29C shows amplicon sequencing of the three CIRCLE-seq candidate loci from treated, sorted mouse cortical and cerebellar samples shown in FIG. 15F. FIG. 29D shows amplicon sequencing of the top five CRISPOR predicted Cas9 off-target sites from treated, sorted mouse cortical and cerebellar samples shown in FIG. 15F. In FIGS. 29C-29D, individual cytosines in the protospacer are arrayed on the x-axis, with base 1 the farthest from the PAM and base 20 PAM adjacent, as depicted in FIG. 29A. Light grey bars indicate cerebellar samples; dark grey bars indicate cortical samples. The dotted line indicates the detection threshold of 0.1% editing. Bars represent mean+SD. Black dots represent individual mice (n=4 mice for cerebellar samples; n=5 mice for cortical samples).



FIGS. 30A-30D show the evaluation of different nucleobase editors and guide RNAs for correcting the Tmc1Y182C/Y182C allele in Baringo MEF cells. FIG. 30A is a schematic of the Tmc1 locus highlighting the c.A545G mutation (red), silent bystander bases, and three candidate guide RNAs that position the target C (directly below “Y/C”) at different protospacer positions (C8, C7, C10) and use different PAMs (AGG, GGA and TGA). FIG. 30B shows base editing efficiencies for the four CBE-P2A-GFP variants tested with sgRNA1 (where the four CBEs are APOBEC1-BE4max, CDA1-BE4max, evoCDA1-BE4max, or AID-BE4max). Base editing values (blue bars) reflect the correction of the Baringo mutation to the wild-type TMC1 protein coding sequence, with no other non-silent changes or indels. Three days following nucleofection into Baringo MEF cells, GFP positive (GFP+) cells were sorted and genomic DNA was characterized by high-throughput sequencing. FIG. 30C shows base editing efficiencies for three different guide RNAs tested with AID-BE4max variants: AID-BE4max+sgRNA1, AID-VRQR-BE4max+sgRNA2, or AID-VRQR-BE4max+sgRNA3. Three days following nucleofection of these plasmids into Baringo MEF cells, GFP-positive cells were sorted and sequenced by HTS. FIG. 30D shows base editing efficiencies in Baringo MEF cells following a 14-day incubation with dual AAV encoding AID-BE3.9max+sgRNA1 at high (N terminal: 6.1×10^8 vg, C terminal: 8.3×10^8 vg) and low (N terminal: 3.1×10^7 vg, C terminal: 4.2×10^7 vg) doses. Dots, shaded bars, and error bars represent individual biological replicates, mean values, and SEM, respectively (n=3-5).



FIGS. 31A-31F show in vivo base editing of Tmc1Y182C/Y182C in Baringo mice, in vitro off-target analysis for sgRNA1, and in vivo analysis of hair-cell stereocilia bundle morphology. FIG. 31A shows the ten most abundant genomic DNA cleavage products (which include the on-target site and nine potential off-target sequences) from Cas9 nuclease+sgRNA1 as identified in vitro by CIRCLE-seq, aligned to the on-target Tmc1 sequence. FIG. 31B shows an editing analysis of the nine candidate off-target sites identified by CIRCLE-seq in MEF cells treated with dual AAV encoding AID-BE3.9max+sgRNA1. The on-target locus, plus the top nine off-target sites identified by CIRCLE-seq, were sequenced by HTS. Dots and bars represent biological replicates and mean±SEM (n=3). FIG. 31C shows the efficiency of AID-BE3.9max+sgRNA1-mediated editing in treated Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mice. Mouse inner ears were injected at P1 with 1 μL (3.1×10^9 vg of each AAV) dual AAV encoding AID-BE3.9max+sgRNA1. After 14 days, cochleas were microdissected into base, mid, and apex samples. Genomic DNA was extracted from each sample and sequenced by HTS. Each dot represents the efficiency of generating Tmc1 alleles with wild-type TMC1 protein sequence and no other non-silent mutations or indels, averaging all samples sequenced from one injected cochlea. To obtain Tmc1 mRNA from the cochlea, the cochlea was extracted at P30, and RNA was isolated, reverse transcribed into cDNA, and analyzed by HTS. Each dot represents the mRNA from one injected cochlea. FIGS. 31D-31F show representative scanning electron microscopy (SEM) images at the apical turn of OHCs and IHCs of wild-type (Tmc1+/+; Tmc2+/+) mice (FIG. 31D), untreated Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mice (FIG. 31E), and Baringo mice treated with dual AAV encoding AID-BE3.9max+sgRNA1 (FIG. 31F). The organ of Corti samples were imaged by SEM at 4 weeks. Scale bar, 10 μm.



FIGS. 32A-32C show that the inner ear injection of dual AAV encoding AID-BE3.9max+sgRNA1 restores sensory transduction in Tmc1Y182C/Y182C; Tmc2Δ/Δ inner hair cells. FIG. 32A shows confocal images of mid-turn cochlear sections excised from P5 Tmc1Y182C/Y182C; Tmc2Δ/Δ mouse cochleas. A representative untreated mouse (top panel) or a representative mouse treated with 1 μL (3.1×10^9 vg of each AAV) of dual AAV encoding AID-BE3.9max+sgRNA1 (bottom panel) are shown. The tissue was cultured for 9-13 days and treated with 5 μM FM1-43 for 10 seconds followed by three full bath exchanges to wash out excess dye. The tissue was mounted and imaged for FM1-43 uptake (light shading) in IHCs and OHCs. All images are 500×150 μm. Scale bar, 50 μm. FIG. 32B is a graph showing the quantification of FM1-43-positive IHCs from untreated and treated mice represented as mean±SD (n=3-4 different mice in each group). FIG. 32C is a graph showing representative families of sensory transduction currents evoked by mechanical displacement of hair bundles recorded from apical IHCs of untreated Tmc1Y182C/Y182C; Tmc2Δ/Δ mice at P8 (untreated), from Tmc1Y182C/Y182C; Tmc2Δ/Δ mice treated with dual AAV encoding AID-BE3.9max+sgRNA1 at P14 and P18 and from wild-type Tmc1+/+; Tmc2+/+ mice at P14-16. Horizontal lines and error bars reflect mean values and SD of 3-4 independent mice and 4-8 hair cells (indicated on top of x-axis), with each dot representing one IHC.



FIGS. 33A-33D show that dual AAV nucleobase editor treatment partially restores auditory function in Baringo (Tmc1Y182C/Y182C; Tmc2Δ/Δ) mice. FIG. 33A shows representative sets of ABR waveforms recorded in response to 5.6-kHz tone bursts of varying sound intensity for untreated wild-type mice (left) and wild-type mice treated with dual AAV encoding AID-BE3.9max+sgRNA1 (right). FIG. 33B shows the same as FIG. 33A, but with untreated Baringo mice (left) and Baringo mice treated with 1 μL (3.1×10^9 vg of each AAV) dual AAV encoding AID-BE3.9max+sgRNA1 (right). FIG. 33C shows the mean ABR responses for all four groups (untreated and treated, Baringo and wild-type mice) across all tested frequencies. Untreated Baringo mice (black, n=10) are profoundly deaf, with no detectable ABR threshold (>110 dB, indicated by the upward arrows). Among the treated Baringo mice (n=15) injected with dual AAV encoding AID-BE3.9max+sgRNA1, nine showed ABR response improvements of up to >50 dB (series of overlapping lines associated with “n=9”), while six did not show any rescue (grey line, n=6). Untreated wild-type mice (darker line, n=6) and wild-type mice injected with dual AAV encoding AID-BE3.9max+sgRNA1 (lighter line, n=4) show similar ABR thresholds. FIG. 33D shows that the same mice in FIG. 33C were subjected to DPOAE testing. Untreated (black line, n=10) and treated Baringo mice both showed no DPOAE responses under the tested conditions (up to 80 dB). Untreated wild-type mice (darker line, n=6) and wild-type mice injected with dual AAV encoding AID-BE3.9max+sgRNA1 (lighter line, n=4) exhibited normal DPOAE thresholds. All recordings were done at P30. Values and error bars reflect mean±SD for the numbers of mice specified above.



FIG. 34 shows the base editing outcomes from different CBE and sgRNA combinations. The heat map shows an average base editing efficiency by BE4max variants at cytosines surrounding the target nucleotide. The target Tmc1Y182C/Y182C mutation is at protospacer position 8. Silent bystander cytosines are at positions 1, 10, 15, and 16. Non-silent bystander cytosines are at positions −12, −11, −9, −8, 18, and 23.



FIGS. 35A-35C show Anc80-Cbh-GFP AAV transduction in IHCs and OHCs in wild-type mice. FIG. 35A shows low magnification, and FIG. 35B shows high magnification images of the entire apical and basal portions of the cochlea of a wild-type mouse injected at P1 with 1 μL of Anc80-Cbh-GFP AAV. The cochlea was harvested at P10, stained with Alexa555-phalloidin, and imaged for Alexa555 and GFP. Scale bar, 50 μm. FIG. 35C shows the quantification of transduction, in which the total number of hair cells was counted as phalloidin-positive HCs and the number of transduced hair cells was counted as GFP+ HCs. Values and error bars reflect individual data points and mean±SD from three samples from n=3 different mice in each group.



FIG. 36 shows base editing at on-target and off-target genomic DNA sites identified by CIRCLE-seq using Cas9+sgRNA1. Off-target editing analysis in MEF cells treated with dual AAV encoding AID-BE3.9max+sgRNA1. The top ten sites identified by CIRCLE-seq (the on-target locus and the top nine off-target loci) were sequenced by HTS. The maximum % C.G-to-T.A conversion at any position in the protospacer is shown. No off-target site showed editing levels (red) that were significantly (p<0.1) different than the maximum % C.G-to-T.A of the untreated control (blue). Dots and bars represent biological replicates and mean±SEM (n=3 for AAV-treated samples and n=1 for the untreated samples).



FIGS. 37A-37B show the transduction currents from IHCs and OHCs of Tmc1Y182C/Y182C; Tmc2+/+ and Tmc1Y182C/Y182C; Tmc2Δ/Δ mice at different time points. FIG. 37A shows representative current traces from IHCs of a Tmc1Y182C/Y182C; Tmc2+/+ mouse (P7) and a Tmc1Y182C/Y182C; Tmc2Δ/Δ mouse (P6). FIG. 37B shows that cellular recordings were obtained from the basal and mid-apical regions of IHCs or OHCs at different time points (P6-P27). Horizontal lines and error bars reflect mean values and SD of 3-4 independent mice and 2-8 hair cells (indicated on top of x-axis), with each dot representing one OHC or IHC.



FIGS. 38A-38C show the hair cell morphology in the organ of Corti from Tmc1Y182C/Y182C; Tmc2+/+ mice with and without treatment with dual AAV-AID-BE3.9max+sgRNA1. FIG. 38A shows representative, low-magnification images of whole-mount apical and basal turns from Tmc1Y182C/Y182C; Tmc2+/+ mice treated with AAV-AID-BE3.9max+sgRNA1 and Tmc1Y182C/Y182C; Tmc2+/+ mice without treatment. Samples were stained with Myo7A (lighter shading) to label hair cells. FIG. 38B shows high-magnification images of the same cochleas boxed in FIG. 38A. FIG. 38C is a graph showing the quantification of the number of Myo7A-positive IHCs and OHCs from entire cochleas of three untreated Tmc1Y182C/Y182C; Tmc2+/+ mice and four Tmc1Y182C/Y182C; Tmc2+/+ mice treated with dual AAV-AID-BE3.9max+sgRNA1 at P1. Dots and bars represent biological replicates and mean±SD.



FIGS. 39A-39C show the hair bundle morphology in the basal turn of the organ of Corti from Tmc1Y182C/Y182C; Tmc2+/+ mice with and without treatment with dual AAV-AID-BE3.9max+sgRNA1. Representative scanning electron microscopy images (basal part) of the organ of Corti are shown from wild-type Tmc1+/+; Tmc2+/+ mice (FIG. 39A), Tmc1Y182C/Y182C; Tmc2+/+ untreated mice (FIG. 39B), and Tmc1Y182C/Y182C; Tmc2+/+ mice treated with dual AAV-AID-BE3.9max+sgRNA1 (FIG. 39C). The apical and basal regions of the organ of Corti were imaged at 4 weeks. Scale bar, 10 μm.





DEFINITIONS

As used herein and in the claims, the singular forms “a,” “an,” and “the” include the singular and the plural reference unless the context clearly indicates otherwise. Thus, for example, a reference to “an agent” includes a single agent and a plurality of such agents.


An “adeno-associated virus” or “AAV” is a virus which infects humans and some other primate species. The wild-type AAV genome is a single-stranded deoxyribonucleic acid (ssDNA), either positive- or negative-sensed. The genome comprises two inverted terminal repeats (ITRs), one at each end of the DNA strand, and two open reading frames (ORFs): rep and cap between the ITRs. The rep ORF comprises four overlapping genes encoding Rep proteins required for the AAV life cycle. The cap ORF comprises overlapping genes encoding capsid proteins: VP1, VP2 and VP3, which interact together to form the viral capsid. VP1, VP2 and VP3 are translated from one mRNA transcript, which can be spliced in two different manners: either a longer or shorter intron can be excised, resulting in the formation of two mRNA isoforms: a ~2.3 kb- and a ~2.6 kb-long mRNA isoform. The capsid forms a supramolecular assembly of approximately 60 individual capsid protein subunits into a non-enveloped, T=1 icosahedral lattice capable of protecting the AAV genome. The mature capsid is composed of VP1, VP2, and VP3 (molecular masses of approximately 87, 73, and 62 kDa respectively) in a ratio of about 1:1:10.
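As a worked illustration of the approximate figures above, the Python sketch below converts the stated subunit total, ratio, and molecular masses into per-protein subunit counts and an approximate total capsid protein mass. The function name is hypothetical, and the result is only as precise as the approximate inputs.

```python
def capsid_composition(total_subunits=60, ratio=(1, 1, 10),
                       masses_kda=(87, 73, 62)):
    """Split a subunit total across VP1:VP2:VP3 and sum the protein mass.

    Defaults are the approximate values stated in the text: about 60
    subunits, a ratio of about 1:1:10, and masses of about 87, 73, 62 kDa.
    """
    parts = sum(ratio)
    counts = [total_subunits * r / parts for r in ratio]
    total_mass_kda = sum(c * m for c, m in zip(counts, masses_kda))
    return counts, total_mass_kda


counts, mass_kda = capsid_composition()
print(counts)     # approximate copies of VP1, VP2, VP3 per capsid
print(mass_kda)   # approximate total capsid protein mass in kDa
```

With the default inputs, the ratio corresponds to roughly 5 copies each of VP1 and VP2 and roughly 50 copies of VP3 per capsid.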


rAAV particles may comprise a nucleic acid vector (e.g., a recombinant genome), which may comprise at a minimum: (a) one or more heterologous nucleic acid regions comprising a sequence encoding a protein or polypeptide of interest (e.g., a split Cas9 or split nucleobase editor) or an RNA of interest (e.g., a gRNA), or one or more nucleic acid regions comprising a sequence encoding a Rep protein; and (b) one or more regions comprising inverted terminal repeat (ITR) sequences (e.g., wild-type ITR sequences or engineered ITR sequences) flanking the one or more nucleic acid regions (e.g., heterologous nucleic acid regions). In some embodiments, the nucleic acid vector is between 4 kb and 5 kb in size (e.g., 4.2 to 4.7 kb in size). In some embodiments, the nucleic acid vector further comprises a region encoding a Rep protein. In some embodiments, the nucleic acid vector is circular. In some embodiments, the nucleic acid vector is single-stranded. In some embodiments, the nucleic acid vector is double-stranded. In some embodiments, a double-stranded nucleic acid vector may be, for example, a self-complementary vector that contains a region of the nucleic acid vector that is complementary to another region of the nucleic acid vector, initiating the formation of the double-strandedness of the nucleic acid vector.


As used herein, the term “adenosine deaminase” or “adenosine deaminase domain” refers to a protein or enzyme that catalyzes a deamination reaction of an adenosine (or adenine). The terms are used interchangeably. In certain embodiments, the disclosure provides nucleobase editor fusion proteins comprising one or more adenosine deaminase domains. For instance, an adenosine deaminase domain may comprise a heterodimer of a first adenosine deaminase and a second deaminase domain, connected by a linker. Adenosine deaminases (e.g., engineered adenosine deaminases or evolved adenosine deaminases) provided herein may be enzymes that convert adenine (A) to inosine (I) in DNA or RNA. Such adenosine deaminases can lead to an A:T to G:C base pair conversion. In some embodiments, the deaminase is a variant of a naturally-occurring deaminase from an organism. In some embodiments, the deaminase does not occur in nature. For example, in some embodiments, the deaminase is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring deaminase.


In some embodiments, the adenosine deaminase is derived from a bacterium, such as E. coli, S. aureus, S. typhi, S. putrefaciens, H. influenzae, or C. crescentus. In some embodiments, the adenosine deaminase is a TadA deaminase. In some embodiments, the TadA deaminase is an E. coli TadA deaminase (ecTadA). In some embodiments, the TadA deaminase is a truncated E. coli TadA deaminase. For example, the truncated ecTadA may be missing one or more N-terminal amino acids relative to a full-length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 N-terminal amino acid residues relative to the full length ecTadA. In some embodiments, the truncated ecTadA may be missing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 C-terminal amino acid residues relative to the full length ecTadA. In some embodiments, the ecTadA deaminase does not comprise an N-terminal methionine. Reference is made to U.S. Patent Publication No. 2018/0073012, published Mar. 15, 2018, which is incorporated herein by reference.


In genetics, the “antisense” strand of a segment within double-stranded DNA is the template strand, and which is considered to run in the 3′ to 5′ orientation. By contrast, the “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. In the case of a DNA segment that encodes a protein, the sense strand is the strand of DNA that has the same sequence as the mRNA, which takes the antisense strand as its template during transcription, and eventually undergoes (typically, not always) translation into a protein. The antisense strand is thus responsible for the RNA that is later translated to protein, while the sense strand possesses a nearly identical makeup to that of the mRNA. Note that for each segment of dsDNA, there will possibly be two sets of sense and antisense, depending on which direction one reads (since sense and antisense is relative to perspective). It is ultimately the gene product, or mRNA, that dictates which strand of one segment of dsDNA is referred to as sense or antisense.
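The relationship among the sense strand, the antisense (template) strand, and the mRNA can be illustrated with a short Python sketch. The function names and the nine-nucleotide example sequence are hypothetical and chosen only to make the strand relationships easy to check by eye.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_strand(sense_5to3):
    """Return the antisense (template) strand, written 5'->3'."""
    return sense_5to3.upper().translate(COMPLEMENT)[::-1]


def transcribe(template_5to3):
    """Return the mRNA made from a template strand, written 5'->3'.

    The mRNA is complementary to the template, so it matches the sense
    strand with U in place of T.
    """
    coding = template_5to3.upper().translate(COMPLEMENT)[::-1]
    return coding.replace("T", "U")


sense = "ATGGCCTAA"                 # hypothetical sense-strand segment
template = antisense_strand(sense)  # antisense strand, 5'->3'
mrna = transcribe(template)
print(template)
print(mrna)
```

Reading the same duplex from the other strand swaps the roles: applying `antisense_strand` to `template` returns the original `sense` sequence, which reflects the point that the sense and antisense labels depend on which strand serves as the reference.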


“Base editing” refers to genome editing technology that involves the conversion of a specific nucleic acid base into another at a targeted genomic locus. In certain embodiments, this can be achieved without requiring double-stranded DNA breaks (DSB), or single stranded breaks (i.e., nicking). To date, other genome editing techniques, including CRISPR-based systems, begin with the introduction of a DSB at a locus of interest. Subsequently, cellular DNA repair enzymes mend the break, commonly resulting in random insertions or deletions (indels) of bases at the site of the DSB. However, when the introduction or correction of a point mutation at a target locus is desired rather than stochastic disruption of the entire gene, these genome editing techniques are unsuitable, as correction rates are low (e.g., typically 0.1% to 5%), with the major genome editing products being indels. In order to increase the efficiency of gene correction without simultaneously introducing random indels, the present inventors previously modified the CRISPR/Cas9 system to directly convert one DNA base into another without DSB formation. See Komor, A. C., et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016), the entire contents of which are incorporated by reference herein.


The terms “base editor (BE)” and “nucleobase editor,” which are used interchangeably herein, refer to an agent comprising a polypeptide that is capable of making a modification to a base (e.g., A, T, C, G, or U) within a nucleic acid sequence (e.g., DNA or RNA) that converts one base to another (e.g., A to G, A to C, A to T, C to T, C to G, C to A, G to A, G to C, G to T, T to A, T to C, T to G). In some embodiments, the nucleobase editor is capable of deaminating a base within a nucleic acid such as a base within a DNA molecule. In the case of an adenine nucleobase editor, the nucleobase editor is capable of deaminating an adenine (A) in DNA. Such nucleobase editors may include a nucleic acid programmable DNA binding protein (napDNAbp) fused to an adenosine deaminase. Some nucleobase editors include CRISPR-mediated fusion proteins that are utilized in the base editing methods described herein. In some embodiments, the nucleobase editor comprises a nuclease-inactive Cas9 (dCas9) fused to a deaminase which binds a nucleic acid in a guide RNA-programmed manner via the formation of an R-loop, but does not cleave the nucleic acid. For example, the dCas9 domain of the fusion protein may include a D10A and an H840A mutation (each of which, on its own, renders Cas9 capable of cleaving only one strand of a nucleic acid duplex, and which together render Cas9 nuclease-inactive), as described in PCT/US2016/058344, which published as WO 2017/070632 on Apr. 27, 2017 and is incorporated herein by reference in its entirety. The DNA cleavage domain of S. pyogenes Cas9 includes two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA (the “targeted strand”, or the strand in which editing or deamination occurs), whereas the RuvC1 subdomain cleaves the non-complementary strand containing the PAM sequence (the “non-edited strand”). The RuvC1 mutant D10A generates a nick in the targeted strand, while the HNH mutant H840A generates a nick on the non-edited strand (see Jinek et al., Science, 337:816-821 (2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)).


In some embodiments, a nucleobase editor is a macromolecule or macromolecular complex that results primarily (e.g., more than 80%, more than 85%, more than 90%, more than 95%, more than 99%, more than 99.9%, or 100%) in the conversion of a nucleobase in a polynucleic acid sequence into another nucleobase (i.e., a transition or transversion) using a combination of 1) a nucleotide-, nucleoside-, or nucleobase-modifying enzyme and 2) a nucleic acid binding protein that can be programmed to bind to a specific nucleic acid sequence.


In some embodiments, the nucleobase editor comprises a DNA binding domain (e.g., a programmable DNA binding domain such as a dCas9 or nCas9) that directs it to a target sequence. In some embodiments, the nucleobase editor comprises a nucleobase modification domain fused to a programmable DNA binding domain (e.g., a dCas9 or nCas9). The terms “nucleobase modifying enzyme” and “nucleobase modification domain,” which are used interchangeably herein, refer to an enzyme that can modify a nucleobase and convert one nucleobase to another (e.g., a deaminase such as a cytidine deaminase or an adenosine deaminase). The nucleobase modifying enzyme of the nucleobase editor may target cytosine (C) bases in a nucleic acid sequence and convert the C to a thymine (T) base. In some embodiments, C to T editing is carried out by a deaminase, e.g., a cytidine deaminase. In some embodiments, A to G editing is carried out by a deaminase, e.g., an adenosine deaminase. Nucleobase editors that can carry out other types of base conversions (e.g., C to G) are also contemplated.


A “split nucleobase editor” refers to a nucleobase editor that is provided as an N-terminal portion (also referred to as a N-terminal half) and a C-terminal portion (also referred to as a C-terminal half) encoded by two separate nucleic acids. The polypeptides corresponding to the N-terminal portion and the C-terminal portion of the nucleobase editor may be combined to form a complete nucleobase editor. In some embodiments, for a nucleobase editor that comprises a dCas9 or nCas9, the “split” is located in the dCas9 or nCas9 domain, at positions as described herein in the split Cas9. Accordingly, in some embodiments, the N-terminal portion of the nucleobase editor contains the N-terminal portion of the split Cas9, and the C-terminal portion of the nucleobase editor contains the C-terminal portion of the split Cas9. Similarly, intein-N or intein-C may be fused to the N-terminal portion or the C-terminal portion of the nucleobase editor, respectively, for the joining of the N- and C-terminal portions of the nucleobase editor to form a complete nucleobase editor.


In some embodiments, a nucleobase editor converts a C to a T. In some embodiments, the nucleobase editor comprises a cytosine deaminase. A “cytosine deaminase”, or “cytidine deaminase,” refers to an enzyme that catalyzes the chemical reaction “cytosine+H2O→uracil+NH3” or “5-methyl-cytosine+H2O→thymine+NH3.” As may be apparent from the reaction formula, such chemical reactions result in a C to U/T nucleobase change. In the context of a gene, such a nucleotide change, or mutation, may in turn lead to an amino acid change in the protein, which may affect the protein's function, e.g., loss-of-function or gain-of-function. In some embodiments, the C to T nucleobase editor comprises a dCas9 or nCas9 fused to a cytidine deaminase. In some embodiments, the cytidine deaminase domain is fused to the N-terminus of the dCas9 or nCas9. In some embodiments, the nucleobase editor further comprises a domain that inhibits uracil glycosylase, and/or a nuclear localization signal. Such nucleobase editors have been described in the art, e.g., in Rees & Liu, Nat Rev Genet. 2018; 19(12):770-788 and Koblan et al., Nat Biotechnol. 2018; 36(9):843-846; as well as U.S. Patent Publication No. 2018/0073012, published Mar. 15, 2018, which issued as U.S. Pat. No. 10,113,163 on Oct. 30, 2018; U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued as U.S. Pat. No. 10,167,457 on Jan. 1, 2019; PCT Publication No. WO 2017/070633, published Apr. 27, 2017; U.S. Patent Publication No. 2015/0166980, published Jun. 18, 2015; U.S. Pat. No. 9,840,699, issued Dec. 12, 2017; U.S. Pat. No. 10,077,453, issued Sep. 18, 2018; PCT Publication No. WO 2019/023680, published Jan. 31, 2019; PCT Publication No. WO 2018/0176009, published Sep. 27, 2018; PCT Application No. PCT/US2019/033848, filed May 23, 2019; PCT Application No. PCT/US2019/47996, filed Aug. 23, 2019; PCT Application No. PCT/US2019/049793, filed Sep. 5, 2019; International Patent Application No. PCT/US2020/028568, filed Apr. 17, 2020; PCT Application No. PCT/US2019/61685, filed Nov. 15, 2019; PCT Application No. PCT/US2019/57956, filed Oct. 24, 2019; PCT Publication No. PCT/US2019/58678, filed Oct. 29, 2019, the contents of each of which are incorporated herein by reference in their entireties.


In some embodiments, a nucleobase editor converts an A to a G. In some embodiments, the nucleobase editor comprises an adenosine deaminase. An “adenosine deaminase” is an enzyme involved in purine metabolism. It is needed for the breakdown of adenosine from food and for the turnover of nucleic acids in tissues. Its primary function in humans is the development and maintenance of the immune system. An adenosine deaminase catalyzes hydrolytic deamination of adenosine (forming inosine, which base pairs as G) in the context of DNA. There are no known natural adenosine deaminases that act on DNA. Instead, known adenosine deaminase enzymes only act on RNA (tRNA or mRNA). Evolved deoxyadenosine deaminase enzymes that accept DNA substrates and deaminate dA to deoxyinosine have been described, e.g., in PCT Application No. PCT/US2017/045381, filed Aug. 3, 2017, which published as WO 2018/027078; PCT Application No. PCT/US2019/033848, filed May 23, 2019, which published as WO 2019/226953; and PCT Patent Application No. PCT/US2020/028568, filed Apr. 17, 2020; each of which is herein incorporated by reference.


Exemplary adenosine and cytidine nucleobase editors are also described in Rees & Liu, Base editing: precision chemistry on the genome and transcriptome of living cells, Nat. Rev. Genet. 2018; 19(12):770-788; as well as U.S. Patent Publication No. 2018/0073012, published Mar. 15, 2018, which issued as U.S. Pat. No. 10,113,163, on Oct. 30, 2018; U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued as U.S. Pat. No. 10,167,457 on Jan. 1, 2019; PCT Publication No. WO 2017/070633, published Apr. 27, 2017; U.S. Patent Publication No. 2015/0166980, published Jun. 18, 2015; U.S. Pat. No. 9,840,699, issued Dec. 12, 2017; and U.S. Pat. No. 10,077,453, issued Sep. 18, 2018, the contents of each of which are incorporated herein by reference in their entireties.


The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A “Cas9 domain,” as used herein, is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full length Cas9 protein. A Cas9 nuclease is also referred to sometimes as a casn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti et al., J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.


A “split Cas9 protein” or “split Cas9” refers to a Cas9 protein that is provided as an N-terminal portion (which is referred to herein interchangeably as an N-terminal half) and a C-terminal portion (which is referred to herein interchangeably as a C-terminal half) encoded by two separate nucleotide sequences. The polypeptides corresponding to the N-terminal portion and the C-terminal portion of the Cas9 protein may be combined (joined) to form a complete Cas9 protein. A Cas9 protein is known to consist of a bi-lobed structure linked by a disordered linker (e.g., as described in Nishimasu et al., Cell, Volume 156, Issue 5, pp. 935-949, 2014, incorporated herein by reference). In some embodiments, the “split” occurs between the two lobes, generating two portions of a Cas9 protein, each containing one lobe.


A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell. 28; 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science. 337:816-821(2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1). 
In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1). In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1).


As used herein, the term “nCas9” or “Cas9 nickase” refers to a Cas9 or a variant thereof, which cleaves or nicks only one of the strands of a target cut site, thereby introducing a nick in a double strand DNA molecule rather than creating a double strand break. This can be achieved by introducing appropriate mutations in a wild-type Cas9 that inactivate one of the two endonuclease activities of the Cas9. Any suitable mutation that inactivates one Cas9 endonuclease activity but leaves the other intact is contemplated. For example, one of the D10A or H840A mutations in the wild-type S. pyogenes Cas9 amino acid sequence, or a D10A mutation in the wild-type S. aureus Cas9 amino acid sequence, may be used to form the nCas9.


The term “cDNA” refers to a strand of DNA copied from an RNA template. cDNA is complementary to the RNA template.


As used herein, the term “circular permutant” refers to a protein or polypeptide (e.g., a Cas9) comprising a circular permutation, which is a change in the protein's structural configuration involving a change in the order of amino acids appearing in the protein's amino acid sequence. In other words, circular permutants are proteins that have altered N- and C-termini as compared to a wild-type counterpart, e.g., the wild-type C-terminal half of a protein becomes the new N-terminal half. Circular permutation (or CP) is essentially the topological rearrangement of a protein's primary sequence, connecting its N- and C-terminus, often with a peptide linker, while concurrently splitting its sequence at a different position to create new, adjacent N- and C-termini. The result is a protein structure with different connectivity, but which often can have a similar overall three-dimensional (3D) shape, and possibly include improved or altered characteristics, including reduced proteolytic susceptibility, improved catalytic activity, altered substrate or ligand binding, and/or improved thermostability. Circular permutant proteins can occur in nature (e.g., concanavalin A and lectin). In addition, circular permutation can occur as a result of posttranslational modifications or may be engineered using recombinant techniques. Such circularly permuted proteins (“CP-napDNAbp”, such as “CP-Cas9” in the case of Cas9), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See, Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491-511 and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, Jan. 10, 2019, 176: 254-267, each of which is incorporated herein by reference.
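The rearrangement described above can be illustrated at the level of primary sequence. The following is a minimal sketch only: the function name, the toy eight-letter sequence, and the "GGS" linker are hypothetical choices made for illustration and are not sequences or linkers taken from this disclosure.

```python
def circular_permutant(sequence: str, new_start: int, linker: str = "GGS") -> str:
    """Return a circularly permuted version of `sequence`.

    The original C-terminus is joined to the original N-terminus through
    `linker` (a hypothetical linker), and the chain is reopened so that the
    residue at 0-based index `new_start` becomes the new N-terminus.
    """
    if not 0 < new_start < len(sequence):
        raise ValueError("new_start must fall strictly inside the sequence")
    # New order: original C-terminal part, linker, original N-terminal part.
    return sequence[new_start:] + linker + sequence[:new_start]


# Toy example with placeholder letters rather than a real protein sequence.
toy = "ABCDEFGH"
cp = circular_permutant(toy, 5)
print(cp)
```

In this sketch the original C-terminal segment ("FGH") becomes the new N-terminal segment, mirroring the statement that the wild-type C-terminal portion of a protein can become the new N-terminal portion.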


The term “circularly permuted Cas9” refers to a Cas9 protein, or variant thereof (e.g., SpCas9), that occurs as or is engineered as a circular permutant, whereby its N- and C-termini have been topologically rearranged. The instant disclosure contemplates the use of any previously known CP-Cas9 or a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).


As used herein, a “cytosine deaminase” encoded by the CDA gene is an enzyme that catalyzes the removal of an amine group from cytidine (i.e., the base cytosine when attached to a ribose ring) to form uridine (C to U), and from deoxycytidine to form deoxyuridine (C to U). A non-limiting example of a cytosine deaminase is APOBEC1 (“apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1”). Another example is AID (“activation-induced cytosine deaminase”). Under standard Watson-Crick hydrogen bond pairing, a cytosine base hydrogen bonds to a guanine base. When cytidine is converted to uridine (or deoxycytidine is converted to deoxyuridine), the uridine (or the uracil base of uridine) undergoes hydrogen bond pairing with the base adenine. Thus, a conversion of “C” to uridine (“U”) by cytosine deaminase will cause the insertion of “A” instead of a “G” during cellular repair and/or replication processes. Since the adenine “A” pairs with thymine “T”, the cytosine deaminase in coordination with DNA replication causes the conversion of a C.G pairing to a T.A pairing in the double-stranded DNA molecule.
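The pairing logic in the preceding paragraph can be traced step by step. The sketch below is illustrative only: the five-base strand is a made-up example, the function names are hypothetical, and replication is reduced to simple complement copying without modeling any repair pathway.

```python
# Base pairing used when a strand is copied; U (from deaminated C) pairs with A.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "U": "A"}


def deaminate(strand: str, position: int) -> str:
    """Convert the C at `position` (0-based) to U, as a cytosine deaminase would."""
    if strand[position] != "C":
        raise ValueError("only a C can be deaminated to U in this sketch")
    return strand[:position] + "U" + strand[position + 1:]


def replicate(template: str) -> str:
    """Return the newly synthesized partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


original = "GACTG"                      # hypothetical strand, C at index 2
edited = deaminate(original, 2)         # C -> U on the edited strand
new_partner = replicate(edited)         # A is inserted opposite U instead of G
resolved = replicate(new_partner)       # T is inserted opposite that A
print(original, "->", resolved)
```

After two rounds of copying, the position that held C on the edited strand holds T, and its partner holds A, which is the C.G to T.A outcome described above.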


“CRISPR” is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by a virus that have invaded the prokaryote. The snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively compose, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species—the guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which is hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. CRISPR biology, as well as Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti et al., J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. 
S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.


The term “deaminase” or “deaminase domain” refers to a protein or enzyme that catalyzes a deamination reaction. In some embodiments, the deaminase is an adenosine (or adenine) deaminase, which catalyzes the hydrolytic deamination of adenine or adenosine. In some embodiments, the adenosine deaminase catalyzes the hydrolytic deamination of adenine or adenosine in deoxyribonucleic acid (DNA) to inosine. In other embodiments, the deaminase is a cytidine (or cytosine) deaminase, which catalyzes the hydrolytic deamination of cytidine or cytosine.


The deaminases provided herein may be from any organism, such as a bacterium. In some embodiments, the deaminase or deaminase domain is a variant of a naturally-occurring deaminase from an organism. In some embodiments, the deaminase or deaminase domain does not occur in nature. For example, in some embodiments, the deaminase or deaminase domain is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring deaminase.


As used herein, the term “DNA binding protein” or “DNA binding protein domain” refers to any protein that localizes to and binds a specific target DNA nucleotide sequence (e.g., a gene locus of a genome). This term embraces RNA-programmable proteins, which associate (e.g., form a complex) with one or more nucleic acid molecules (i.e., which includes, for example, guide RNA in the case of Cas systems) that direct or otherwise program the protein to localize to a specific target nucleotide sequence (e.g., DNA sequence) that is complementary to the one or more nucleic acid molecules (or a portion or region thereof) associated with the protein. Exemplary RNA-programmable proteins are CRISPR-Cas9 proteins, as well as Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or modified), and may include a Cas9 equivalent from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system) (now known as Cas12a), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), C2c3 (a type V CRISPR-Cas system), dCas9, GeoCas9, CjCas9, Cas12b, Cas12c, Cas12d, Cas12g, Cas12h, Cas12i, Cas13d, Cas14, Argonaute, and nCas9. Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference.


The term “DNA editing efficiency,” as used herein, refers to the number or proportion of intended base pairs that are edited. For example, if a nucleobase editor edits 10% of the base pairs that it is intended to target (e.g., within a cell or within a population of cells), then the nucleobase editor can be described as being 10% efficient. Some aspects of editing efficiency embrace the modification (e.g. deamination) of a specific nucleotide within DNA, without generating a large number or percentage of insertions or deletions (i.e., indels). It is generally accepted that editing while generating less than 5% indels (as measured over total target nucleotide substrates) is high editing efficiency. The generation of more than 20% indels is generally accepted as poor or low editing efficiency. Indel formation may be measured by techniques known in the art, including high-throughput screening of sequencing reads.
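As a worked illustration of the definition above, editing efficiency and indel frequency can be computed from sequencing read counts. This is a minimal sketch under stated assumptions: the function name, the read-count inputs, and the "intermediate" label for the range between the two thresholds are hypothetical; only the 5% and 20% indel thresholds come from the text.

```python
def editing_summary(total_reads: int, edited_reads: int, indel_reads: int) -> dict:
    """Summarize base editing outcomes at one target site.

    `edited_reads` counts reads carrying the intended base change and
    `indel_reads` counts reads carrying an insertion or deletion.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    efficiency = edited_reads / total_reads
    indel_fraction = indel_reads / total_reads
    # Thresholds from the definition: <5% indels is regarded as high
    # efficiency editing, >20% indels as poor or low efficiency editing.
    if indel_fraction < 0.05:
        indel_class = "high"
    elif indel_fraction > 0.20:
        indel_class = "low"
    else:
        indel_class = "intermediate"  # label assumed for this sketch
    return {
        "efficiency": efficiency,
        "indel_fraction": indel_fraction,
        "indel_class": indel_class,
    }


# Hypothetical counts: 100 of 1000 reads edited (10% efficient), 20 with indels.
summary = editing_summary(total_reads=1000, edited_reads=100, indel_reads=20)
print(summary)
```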


The term “off-target editing frequency,” as used herein, refers to the number or proportion of unintended base pairs, e.g., DNA base pairs, that are edited. On-target and off-target editing frequencies may be measured by the methods and assays described herein, further in view of techniques known in the art, including high-throughput sequencing reads. As used herein, high-throughput sequencing involves the hybridization of nucleic acid primers (e.g., DNA primers) with complementarity to nucleic acid (e.g., DNA) regions just upstream or downstream of the target sequence or off-target sequence of interest. Because the DNA target sequence and the Cas9-independent off-target sequences are known a priori in the methods disclosed herein, nucleic acid primers with sufficient complementarity to regions upstream or downstream of the target sequence and Cas9-independent off-target sequences of interest may be designed using techniques known in the art, such as the PhusionU PCR kit (Life Technologies), Phusion HS II kit (Life Technologies), and Illumina MiSeq kit. Since many of the Cas9-dependent off-target sites have high sequence identity to the target site of interest, nucleic acid primers with sufficient complementarity to regions upstream or downstream of the Cas9-dependent off-target site may likewise be designed using techniques and kits known in the art. These kits make use of polymerase chain reaction (PCR) amplification, which produces amplicons as intermediate products. The target and off-target sequences may comprise genomic loci that further comprise protospacers and PAMs. Accordingly, the term “amplicons,” as used herein, may refer to nucleic acid molecules that constitute the aggregates of genomic loci, protospacers and PAMs. High-throughput sequencing techniques used herein may further include Sanger sequencing and Illumina-based next-generation genome sequencing (NGS).


The term “on-target editing,” as used herein, refers to the introduction of intended modifications (e.g., deaminations) to nucleotides (e.g., adenine) in a target sequence, such as using the nucleobase editors described herein. The term “off-target DNA editing,” as used herein, refers to the introduction of unintended modifications (e.g. deaminations) to nucleotides (e.g. adenine) in a sequence outside the canonical nucleobase editor binding window (i.e., from one protospacer position to another, typically 2 to 8 nucleotides long). Off-target DNA editing can result from weak or non-specific binding of the gRNA sequence to the target sequence.


As used herein, the terms “upstream” and “downstream” are terms of relativity that define the linear position of at least two elements located in a nucleic acid molecule (whether single or double-stranded) that is orientated in a 5′-to-3′ direction. In particular, a first element is upstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 5′ to the second element. For example, a SNP is upstream of a Cas9-induced nick site if the SNP is on the 5′ side of the nick site. Conversely, a first element is downstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 3′ to the second element. For example, a SNP is downstream of a Cas9-induced nick site if the SNP is on the 3′ side of the nick site. The nucleic acid molecule can be a DNA (double or single stranded), RNA (double or single stranded), or a hybrid of DNA and RNA. The analysis is the same for a single strand nucleic acid molecule and a double strand molecule, since the terms upstream and downstream are in reference to only a single strand of a nucleic acid molecule, except that one needs to select which strand of the double stranded molecule is being considered. Often, the strand of a double stranded DNA which can be used to determine the positional relativity of at least two elements is the “sense” or “coding” strand. In genetics, a “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. Thus, as an example, a SNP nucleobase is “downstream” of a promoter sequence in a genomic DNA (which is double-stranded) if the SNP nucleobase is on the 3′ side of the promoter on the sense or coding strand.


The term “base edit:indel ratio,” as used herein, refers to the ratio of intended DNA nucleobase modifications (e.g., point mutations or deaminations) to formation of indels.
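As a simple arithmetic illustration of this ratio, the sketch below divides a count of intended base edits by a count of indels. The function name and the example counts are hypothetical, and returning infinity when no indels are observed is a convention assumed here, not one stated in this disclosure.

```python
import math


def base_edit_to_indel_ratio(edited_reads: int, indel_reads: int) -> float:
    """Return the ratio of intended base edits to indels at a target site."""
    if edited_reads < 0 or indel_reads < 0:
        raise ValueError("read counts cannot be negative")
    if indel_reads == 0:
        # Assumed convention: no observed indels is reported as an infinite ratio.
        return math.inf
    return edited_reads / indel_reads


# Hypothetical counts: 100 reads with the intended edit, 20 reads with indels.
ratio = base_edit_to_indel_ratio(100, 20)
print(ratio)
```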


The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a nucleobase editor may refer to the amount of the editor that is sufficient to edit a target site nucleotide sequence, e.g., a genome. In some embodiments, an effective amount of a nucleobase editor provided herein, e.g., of a fusion protein comprising a nickase Cas9 domain and a guide RNA may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nuclease, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors as, for example, on the desired biological response, e.g., on the specific allele, genome, or target site to be edited, on the cell or tissue being targeted, and on the agent being used.


The term “functional equivalent” refers to a second biomolecule that is equivalent in function, but not necessarily equivalent in structure to a first biomolecule. For example, a “Cas9 equivalent” refers to a protein that has the same or substantially the same functions as Cas9, but not necessarily the same amino acid sequence. In the context of the disclosure, the specification refers throughout to “a protein X, or a functional equivalent thereof.” In this context, a “functional equivalent” of protein X embraces any homolog, paralog, fragment, naturally occurring, engineered, circular permutant, mutated, or synthetic version of protein X which bears an equivalent function.


The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. Another example includes a Cas9 or equivalent thereof fused to an adenosine deaminase. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.


Two proteins or protein domains are considered to be “fused” when a peptide bond is formed linking the two proteins or two protein domains. In some embodiments, a linker (e.g., a peptide linker) is present between the two proteins or two protein domains. The term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., two domains of a fusion protein, such as, for example, a nuclease-inactive Cas9 domain and a nucleic acid editing domain (e.g., a deaminase domain). Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.


The term “guide nucleic acid” or “napDNAbp-programming nucleic acid molecule” or equivalently “guide sequence” refers to the one or more nucleic acid molecules which associate with and direct or otherwise program a napDNAbp protein to localize to a specific target nucleotide sequence (e.g., a gene locus of a genome) that is complementary to the one or more nucleic acid molecules (or a portion or region thereof) associated with the protein, thereby causing the napDNAbp protein to bind to the nucleotide sequence at the specific target site. A non-limiting example is a guide RNA of a Cas protein of a CRISPR-Cas genome editing system. Chemically, guide nucleic acids can be all RNA, all DNA, or a chimera of RNA and DNA. The guide nucleic acids may also include nucleotide analogs. Guide nucleic acids can be expressed as transcription products or can be synthesized.


As used herein, a “guide RNA” can refer to a synthetic fusion of the endogenous bacterial crRNA and tracrRNA that provides both targeting specificity and a scaffold and/or binding ability for Cas9 nuclease to a target DNA. This synthetic fusion does not exist in nature and is also commonly referred to as an sgRNA. However, the term guide RNA also embraces equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas9 equivalent to localize to a specific target nucleotide sequence. The Cas9 equivalents may include other napDNAbps from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system) (now known as Cas12a), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system) and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing appropriate guide RNA sequences are provided herein.


A guide RNA is a particular type of guide nucleic acid which is most commonly associated with a Cas protein of a CRISPR-Cas9 system and which associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence for the guide RNA. Functionally, guide RNAs associate with Cas9, directing (or programming) the Cas9 protein to a specific sequence in a DNA molecule that includes a sequence complementary to the protospacer sequence for the guide RNA. A gRNA is a component of the CRISPR/Cas system. Typically, a guide RNA comprises a fusion of a CRISPR-targeting RNA (crRNA) and a trans-activation crRNA (tracrRNA), providing both targeting specificity and scaffolding/binding ability for Cas9 nuclease. A “crRNA” is a bacterial RNA that confers target specificity and requires tracrRNA to bind to Cas9. A “tracrRNA” is a bacterial RNA that links the crRNA to the Cas9 nuclease and typically can bind any crRNA. The sequence specificity of a Cas DNA-binding protein is determined by gRNAs, which have nucleotide base-pairing complementarity to target DNA sequences. The native gRNA comprises a 20 nucleotide (nt) Specificity Determining Sequence (SDS), or spacer, which specifies the DNA sequence to be targeted, and is immediately followed by an 80 nt scaffold sequence, which associates the gRNA with Cas9. In some embodiments, an SDS of the present disclosure has a length of 15 to 100 nucleotides, or more. For example, an SDS may have a length of 15 to 90, 15 to 85, 15 to 80, 15 to 75, 15 to 70, 15 to 65, 15 to 60, 15 to 55, 15 to 50, 15 to 45, 15 to 40, 15 to 35, 15 to 30, or 15 to 20 nucleotides. In some embodiments, the SDS is 20 nucleotides long. For example, the SDS may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides long. At least a portion of the target DNA sequence is complementary to the SDS of the gRNA.
For Cas9 to successfully bind to the DNA target sequence, a region of the target sequence is complementary to the SDS of the gRNA sequence and is immediately followed by the correct protospacer adjacent motif (PAM) sequence (e.g., NGG for Cas9 and TTN, TTTN, or YTN for Cpf1). In some embodiments, an SDS is 100% complementary to its target sequence. In some embodiments, the SDS sequence is less than 100% complementary to its target sequence and is, thus, considered to be partially complementary to its target sequence. For example, a targeting sequence may be 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, or 90% complementary to its target sequence. In some embodiments, the SDS of template DNA or target DNA may differ from a complementary region of a gRNA by 1, 2, 3, 4 or 5 nucleotides.
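The percent-complementarity and mismatch-count language above can be illustrated with a minimal Python sketch. It assumes an ungapped, equal-length comparison and relies on the fact that a spacer that is complementary to the target strand has the same sequence as the protospacer on the PAM strand (with U in place of T), so mismatches can be counted against the protospacer. The function names and the example sequences are hypothetical and are not part of the disclosure.

```python
def count_mismatches(sds_rna: str, protospacer_dna: str) -> int:
    """Count positions at which the SDS (RNA) differs from the protospacer (DNA).

    A mismatch against the PAM-strand protospacer corresponds to a
    non-complementary position against the target strand.
    """
    sds_as_dna = sds_rna.upper().replace("U", "T")
    protospacer = protospacer_dna.upper()
    if not sds_as_dna or len(sds_as_dna) != len(protospacer):
        raise ValueError("SDS and protospacer must be non-empty and equal in length")
    return sum(a != b for a, b in zip(sds_as_dna, protospacer))


def percent_complementarity(sds_rna: str, protospacer_dna: str) -> float:
    """Percentage of SDS positions that pair with the target strand."""
    mismatches = count_mismatches(sds_rna, protospacer_dna)
    return 100.0 * (len(sds_rna) - mismatches) / len(sds_rna)


# Hypothetical 20-nt SDS and a protospacer differing at the last position.
sds = "GAUUACAGAUUACAGAUUAC"
print(count_mismatches(sds, "GATTACAGATTACAGATTAG"))
print(percent_complementarity(sds, "GATTACAGATTACAGATTAG"))
```

Under these assumptions, a 20-nucleotide SDS with one mismatch falls in the 95% category and one with two mismatches in the 90% category.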


In some embodiments, the guide RNA is about 15-120 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. In some embodiments, the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, or 120 nucleotides long. In some embodiments, the guide RNA comprises a sequence of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more contiguous nucleotides that is complementary to a target sequence. Sequence complementarity refers to distinct interactions between adenine and thymine (DNA) or uracil (RNA), and between guanine and cytosine.


As used herein, a “spacer sequence” is the sequence of the guide RNA (˜20 nts in length) which has the same sequence (with the exception of uridine bases in place of thymine bases) as the protospacer of the PAM strand of the target (DNA) sequence, and which is complementary to the target strand (or non-PAM strand) of the target sequence.


As used herein, the “target sequence” refers to the ˜20 nucleotides in the target DNA sequence that have complementarity to the protospacer sequence in the PAM strand. The target sequence is the sequence that anneals to or is targeted by the spacer sequence of the guide RNA. The spacer sequence of the guide RNA and the protospacer have the same sequence (except the spacer sequence is RNA, and the protospacer is DNA).
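The relationship among the protospacer (PAM strand, DNA), the spacer (guide RNA), and the target sequence (non-PAM strand, DNA) can be sketched in Python as follows. This is an illustrative sketch only; the helper names and the 20-nt example protospacer are hypothetical, and all sequences are written 5′ to 3′.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, written 5' to 3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def spacer_from_protospacer(protospacer: str) -> str:
    """The spacer has the protospacer's sequence with uridine in place of thymine."""
    return protospacer.upper().replace("T", "U")


def anneals(spacer_rna: str, target_strand_dna: str) -> bool:
    """True if the RNA spacer base-pairs (antiparallel) with every target-strand base."""
    if len(spacer_rna) != len(target_strand_dna):
        return False
    return all(
        _RNA_TO_DNA_PAIR.get(r) == d
        for r, d in zip(spacer_rna.upper(), reversed(target_strand_dna.upper()))
    )


protospacer = "GATTACAGATTACAGATTAC"          # hypothetical PAM-strand sequence
target = reverse_complement(protospacer)      # target (non-PAM) strand
spacer = spacer_from_protospacer(protospacer)
print(spacer, anneals(spacer, target))
```

The sketch makes the definitional point explicit: the spacer matches the protospacer base for base (apart from U for T) and is complementary to the target sequence on the opposite strand.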


As used herein, the terms “guide RNA core,” “guide RNA scaffold sequence” and “backbone sequence,” which are used interchangeably, refer to the region (or sequence) within the gRNA that is responsible for Cas9 binding. It does not include the 20 bp spacer sequence that is used to guide Cas9 to target DNA. This region is also known as the crRNA/tracrRNA. The guide RNA backbone sequence is separate from the guide sequence, or spacer, region of the guide RNA, which has complementarity to a protospacer of a nucleic acid molecule.


As used herein, the term “protospacer” refers to the sequence (e.g., a ˜20 bp sequence) in DNA adjacent to the PAM (protospacer adjacent motif) sequence which shares the same sequence as the spacer sequence of the guide RNA, and which is complementary to the target sequence of the non-PAM strand. The spacer sequence of the guide RNA anneals to the target sequence located on the non-PAM strand. In order for Cas9 to function it also requires a specific protospacer adjacent motif (PAM) that varies depending on the bacterial species of the Cas9 gene. The most commonly used Cas9 nuclease, derived from S. pyogenes, recognizes a PAM sequence of NGG that is found directly downstream of the protospacer sequence in the genomic DNA, on the non-target strand. The skilled person will appreciate that the literature in the state of the art sometimes refers to the “protospacer” as the ˜20-nt target-specific guide sequence on the guide RNA itself, rather than referring to it as a “spacer” (and that the protospacer (DNA) and the spacer (RNA) have the same sequence). Thus, the term “protospacer” as used herein may be used interchangeably with the term “spacer.” The context of the description surrounding the appearance of either “protospacer” or “spacer” will help inform the reader as to whether the term refers to the gRNA or the DNA sequence. Both usages of these terms are acceptable since the state of the art uses both terms in each of these ways.


A “protospacer adjacent motif” (PAM) is typically a sequence of nucleotides located adjacent to a target sequence (e.g., within 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotide(s) of the target sequence). A PAM sequence is “immediately adjacent to” a target sequence if the PAM sequence is contiguous with the target sequence (that is, if there are no nucleotides located between the PAM sequence and the target sequence). In some embodiments, a PAM sequence is a wild-type PAM sequence. Examples of PAM sequences include, without limitation, NGG, NGR, NNGRR(T/N), NNNNGATT, NNAGAAW, NGGAG, NAAAAC, AWG, and CC. In some embodiments, a PAM sequence is obtained from Streptococcus pyogenes (e.g., NGG or NGR). In some embodiments, a PAM sequence is obtained from Staphylococcus aureus (e.g., NNGRR(T/N)). In some embodiments, a PAM sequence is obtained from Neisseria meningitidis (e.g., NNNNGATT). In some embodiments, a PAM sequence is obtained from Streptococcus thermophilus (e.g., NNAGAAW or NGGAG). In some embodiments, a PAM sequence is obtained from Treponema denticola (e.g., NAAAAC). In some embodiments, a PAM sequence is obtained from Escherichia coli (e.g., AWG). In some embodiments, a PAM sequence is obtained from Pseudomonas aeruginosa (e.g., CC). Other PAM sequences are contemplated. A PAM sequence is typically located downstream (i.e., 3′) from the target sequence, although in some embodiments a PAM sequence may be located upstream (i.e., 5′) from the target sequence.
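By way of illustration only, the degenerate PAM notation above can be expanded into a regular expression and used to locate protospacers that are immediately followed (3′) by a PAM on the PAM strand. The sketch below is written in Python and handles only the plain IUPAC letters A, C, G, T, R, Y, W, and N; the function names, the default 20-nt protospacer length, and the example sequence are assumptions made for the example.

```python
import re

# IUPAC nucleotide codes used by the PAM examples above (subset).
_IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> str:
    """Translate a degenerate PAM such as 'NGG' into a regex fragment."""
    try:
        return "".join(_IUPAC[base] for base in pam.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported PAM symbol: {exc.args[0]}") from None


def find_sites(pam_strand: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (start, protospacer, pam) for each protospacer immediately 5' of a PAM.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_to_regex(pam)))
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(pam_strand.upper())]


# Hypothetical PAM-strand sequence containing one NGG site.
sequence = "TT" + "GATTACAGATTACAGATTAC" + "TGG" + "AA"
print(find_sites(sequence))
```

Only the PAM strand is scanned here; a fuller search would also scan the reverse complement, and PAMs written with alternatives such as NNGRR(T/N) would need to be expanded before being passed to `pam_to_regex`.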


The term “host cell,” as used herein, refers to a cell that can host, replicate, and transfer a phage vector useful for a continuous evolution process as provided herein. In embodiments where the vector is a viral vector, a suitable host cell is a cell that may be infected by the viral vector, can replicate it, and can package it into viral particles that can infect fresh host cells. A cell can host a viral vector if it supports expression of genes of viral vector, replication of the viral genome, and/or the generation of viral particles. One criterion to determine whether a cell is a suitable host cell for a given viral vector is to determine whether the cell can support the viral life cycle of a wild-type viral genome that the viral vector is derived from. For example, if the viral vector is a modified M13 phage genome, as provided in some embodiments described herein, then a suitable host cell would be any cell that can support the wild-type M13 phage life cycle. Suitable host cells for viral vectors useful in continuous evolution processes are well known to those of skill in the art, and the disclosure is not limited in this respect. In some embodiments, the viral vector is a phage and the host cell is a bacterial cell. In some embodiments, the host cell is an E. coli cell. Suitable E. coli host strains will be apparent to those of skill in the art, and include, but are not limited to, New England Biolabs (NEB) Turbo, Top10F′, DH12S, ER2738, ER2267, and XL1-Blue MRF′. These strain names are art recognized and the genotype of these strains has been well characterized. It should be understood that the above strains are exemplary only and that the invention is not limited in this respect. The term “fresh,” as used herein interchangeably with the terms “non-infected” or “uninfected” in the context of host cells, refers to a host cell that has not been infected by a viral vector comprising a gene of interest as used in a continuous evolution process provided herein. 
A fresh host cell can, however, have been infected by a viral vector unrelated to the vector to be evolved or by a vector of the same or a similar type but not carrying the gene of interest.


In some embodiments, the host cell is a prokaryotic cell, for example, a bacterial cell. In some embodiments, the host cell is an E. coli cell. In some embodiments, the host cell is a eukaryotic cell, for example, a yeast cell, a plant cell, an insect cell, or a mammalian cell. In some embodiments, the cell is a human cell. The type of host cell will, of course, depend on the viral vector employed, and suitable host cell/viral vector combinations will be readily apparent to those of skill in the art.


An “intein” is a segment of a protein that is able to excise itself and join the remaining portions (the exteins) with a peptide bond in a process known as protein splicing. Inteins are also referred to as “protein introns.” The process of an intein excising itself and joining the remaining portions of the protein is herein termed “protein splicing” or “intein-mediated protein splicing.” In some embodiments, an intein of a precursor protein (an intein-containing protein prior to intein-mediated protein splicing) comes from two genes. Such an intein is referred to herein as a split intein. For example, in cyanobacteria, DnaE, the catalytic subunit α of DNA polymerase III, is encoded by two separate genes, dnaE-n and dnaE-c. The intein encoded by the dnaE-n gene is herein referred to as “intein-N.” The intein encoded by the dnaE-c gene is herein referred to as “intein-C.”


Other intein systems may also be used. For example, a synthetic intein based on the dnaE intein, the Cfa-N and Cfa-C intein pair, has been described (e.g., in Stevens et al., J Am Chem Soc. 2016 Feb. 24; 138(7):2162-5, incorporated herein by reference). As another example, the naturally split Nostoc punctiforme (Npu) DnaE intein pair has been described (see Zettler, J., Schutz, V. & Mootz, H. D., The naturally split Npu DnaE intein exhibits an extraordinarily high rate in the protein trans-splicing reaction. FEBS letters 583, 909-914 (2009), incorporated herein by reference). Non-limiting examples of intein pairs that may be used in accordance with the present disclosure include: Cfa DnaE intein, Npu DnaE intein, Ssp GyrB intein, Ssp DnaX intein, Ter DnaE3 intein, Ter ThyX intein, Rma DnaB intein and Cne Prp8 intein (e.g., as described in U.S. Pat. No. 8,394,604, incorporated herein by reference).


Exemplary nucleotide and amino acid sequences of inteins are provided below, as SEQ ID NOs: 350-357. In some embodiments, the inteins used in accordance with the disclosed napDNAbp domains (e.g., Cas9 domains) comprise the Npu intein-N comprising the amino acid sequence of SEQ ID NO: 351 and the Npu intein-C comprising the amino acid sequence of SEQ ID NO: 353. In some embodiments, the inteins used in accordance with the disclosed nucleobase editors comprise the Npu intein-N comprising the amino acid sequence of SEQ ID NO: 351 and the Npu intein-C comprising the amino acid sequence of SEQ ID NO: 353. In some embodiments, the inteins used in accordance with the disclosed constructs encoding any of the disclosed napDNAbp domains (e.g., a Cas9 domain) comprise the Npu intein-N DNA comprising the nucleotide sequence of SEQ ID NO: 350 and the Npu intein-C DNA comprising the nucleotide sequence of SEQ ID NO: 352. In some embodiments, the inteins used in accordance with the disclosed constructs encoding any of the disclosed nucleobase editors comprise the Npu intein-N DNA comprising the nucleotide sequence of SEQ ID NO: 350 and the Npu intein-C DNA comprising the nucleotide sequence of SEQ ID NO: 352.


In some embodiments, the intein-N comprises an amino acid sequence that is at least 90%, 95%, 98%, or 99% identical to the amino acid sequence of SEQ ID NOs: 351 or 355. In some embodiments, the intein-N comprises an amino acid sequence that differs from the amino acid sequence of SEQ ID NOs: 351 or 355 by 1, 2, 3, 4, 5, 6, or 7 amino acids. In some embodiments, the intein-N comprises the amino acid sequence of SEQ ID NOs: 351 or 355. In some embodiments, the intein-N used in accordance with the disclosed constructs comprises a nucleotide sequence that is at least 90%, 95%, 98%, or 99% identical to the nucleotide sequence of SEQ ID NOs: 350 or 354. In some embodiments, the intein-N used in accordance with the disclosed constructs comprises a nucleotide sequence that differs by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 10-15 nucleotides from the nucleotide sequence of SEQ ID NOs: 350 or 354.


In some embodiments, the intein-C comprises an amino acid sequence that is at least 90%, 95%, 98%, or 99% identical to the amino acid sequence of SEQ ID NOs: 353 or 357. In some embodiments, the intein-C comprises an amino acid sequence that differs from the amino acid sequence of SEQ ID NOs: 353 or 357 by 1, 2, 3, 4, or 5 amino acids. In some embodiments, the intein-C comprises the amino acid sequence of SEQ ID NOs: 353 or 357. In some embodiments, the intein-C used in accordance with the disclosed constructs comprises a nucleotide sequence that is at least 90%, 95%, 98%, or 99% identical to the nucleotide sequence of SEQ ID NOs: 352 or 356. In some embodiments, the intein-C used in accordance with the disclosed constructs comprises a nucleotide sequence that differs by 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the nucleotide sequence of SEQ ID NOs: 352 or 356.


In particular embodiments, the intein-N comprises the amino acid sequence as set forth in SEQ ID NO: 355. In some embodiments, the intein-C comprises the amino acid sequence as set forth in SEQ ID NO: 357.
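As a hedged illustration of the percent-identity and residue-difference language used for the intein variants, the following Python sketch compares two equal-length sequences position by position. It assumes substitutions only; real identity calculations generally rely on an alignment that tolerates insertions and deletions, which this sketch does not attempt. The function names and the single-substitution variant are hypothetical, and the reference string is the Npu intein-C sequence as listed in this section (SEQ ID NO: 353).

```python
NPU_INTEIN_C = "MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN"  # SEQ ID NO: 353 as listed herein


def count_differences(candidate: str, reference: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares sequences of equal length")
    return sum(a != b for a, b in zip(candidate, reference))


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    differences = count_differences(candidate, reference)
    return 100.0 * (len(reference) - differences) / len(reference)


# Hypothetical variant with the final residue substituted (N to A).
variant = NPU_INTEIN_C[:-1] + "A"
print(count_differences(variant, NPU_INTEIN_C))
print(round(percent_identity(variant, NPU_INTEIN_C), 1))
```

For a 36-residue reference, a single substitution leaves 35 of 36 positions identical, so such a variant would meet the 90% and 95% thresholds but not the 98% threshold.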









DnaE Intein-N DNA:
(SEQ ID NO: 350)
TGCCTGTCATACGAAACCGAGATACTGACAGTAGAATATGGCCTTCTGCC
AATCGGGAAGATTGTGGAGAAACGGATAGAATGCACAGTTTACTCTGTCG
ATAACAATGGTAAATTTATACTCAGCCAGTTGCCCAGTGGCACGACCGGG
GAGAGCAGGAAGTATTCGAATACTGTCTGGAGGATGGAAGTCTCATTAGG
GCCACTAAGGACCACAAATTTATGACAGTCGATGGCCAGATGCTGCCTAT
AGACGAAATCTTTGAGCGAGAGTTGGACCTCATGCGAGTTGACAACCTTC
CTAAT

Npu DnaE N-terminal Protein:
(SEQ ID NO: 351)
CLSYETEILTVEYGLLPIGKIVEKRIECTVYSVDNNGNIYTQPVAQWHDR
GEQEVFEYCLEDGSLIRATKDHKFMTVDGQMLPIDEIFERELDLMRVDNL
PN

DnaE Intein-C DNA:
(SEQ ID NO: 352)
ATGATCAAGATAGCTACAAGGAAGTATCTTGGCAAACAAAACGTTTATGA
TATTGGAGTCGAAAGAGATCACAACTTTGCTCTGAAGAACGGATTCATAG
CTTCTAAT

Npu DnaE C-terminal Protein:
(SEQ ID NO: 353)
MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN

Cfa-N DNA:
(SEQ ID NO: 354)
TGCCTGTCTTATGATACCGAGATACTTACCGTTGAATATGGCTTCTTGCC
TATTGGAAAGATTGTCGAAGAGAGAATTGAATGCACAGTATATACTGTAG
ACAAGAATGGTTTCGTTTACACACAGCCCATTGCTCAATGGCACAATCGC
GGCGAACAAGAAGTATTTGAGTACTGTCTCGAGGATGGAAGCATCATACG
AGCAACTAAAGATCATAAATTCATGACCACTGACGGGCAGATGTTGCCAA
TAGATGAGATATTCGAGCGGGGCTTGGATCTCAAACAAGTGGATGGATTG
CCA

Cfa-N Protein:
(SEQ ID NO: 355)
CLSYDTEILTVEYGFLPIGKIVEERIECTVYTVDKNGFVYTQPIAQWHNR
GEQEVFEYCLEDGSIIRATKDHKFMTTDGQMLPIDEIFERGLDLKQVDGL
P

Cfa-C DNA:
(SEQ ID NO: 356)
ATGAAGAGGACTGCCGATGGATCAGAGTTTGAATCTCCCAAGAAGAAGAG
GAAAGTAAAGATAATATCTCGAAAAAGTCTTGGTACCCAAAATGTCTATG
ATATTGGAGTGGAGAAAGATCACAACTTCCTTCTCAAGAACGGTCTCGTA
GCCAGCAAC

Cfa-C Protein:
(SEQ ID NO: 357)
MKRTADGSEFESPKKKRKVKIISRKSLGTQNVYDIGVEKDHNFLLKNGLV
ASN






Intein-N and intein-C may be fused to the N-terminal portion of the split Cas9 and the C-terminal portion of the split Cas9, respectively, for the joining of the N-terminal portion of the split Cas9 and the C-terminal portion of the split Cas9. For example, in some embodiments, an intein-N is fused to the C-terminus of the N-terminal portion of the split Cas9, i.e., to form a structure of N-[N-terminal portion of the split Cas9]-[intein-N]-C. In some embodiments, an intein-C is fused to the N-terminus of the C-terminal portion of the split Cas9, i.e., to form a structure of N-[intein-C]-[C-terminal portion of the split Cas9]-C. The mechanism of intein-mediated protein splicing for joining the proteins the inteins are fused to (e.g., split Cas9) is known in the art, e.g., as described in Shah et al., Chem Sci. 2014; 5(1):446-461, incorporated herein by reference.
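The two fusion architectures described above, N-[N-terminal portion of the split Cas9]-[intein-N]-C and N-[intein-C]-[C-terminal portion of the split Cas9]-C, and the net outcome of trans-splicing can be modeled at the sequence level with the Python sketch below. This is a string-level illustration only: it does not model the splicing chemistry or any sequence requirements at the extein junctions, the function names are hypothetical, and the short extein strings are placeholders rather than Cas9 sequences. The intein strings are the Npu sequences as listed in this section.

```python
NPU_INTEIN_N = (  # SEQ ID NO: 351 as listed herein
    "CLSYETEILTVEYGLLPIGKIVEKRIECTVYSVDNNGNIYTQPVAQWHDR"
    "GEQEVFEYCLEDGSLIRATKDHKFMTVDGQMLPIDEIFERELDLMRVDNL"
    "PN"
)
NPU_INTEIN_C = "MIKIATRKYLGKQNVYDIGVERDHNFALKNGFIASN"  # SEQ ID NO: 353


def make_n_half(n_extein: str, intein_n: str) -> str:
    """N-[N-terminal portion]-[intein-N]-C."""
    return n_extein + intein_n


def make_c_half(intein_c: str, c_extein: str) -> str:
    """N-[intein-C]-[C-terminal portion]-C."""
    return intein_c + c_extein


def trans_splice(n_half: str, c_half: str, intein_n: str, intein_c: str) -> str:
    """Remove both intein segments and join the flanking exteins."""
    if not n_half.endswith(intein_n) or not c_half.startswith(intein_c):
        raise ValueError("halves do not carry the expected intein segments")
    n_extein = n_half[: len(n_half) - len(intein_n)]
    c_extein = c_half[len(intein_c):]
    return n_extein + c_extein


# Placeholder exteins standing in for the two portions of a split protein.
n_half = make_n_half("MAAAAK", NPU_INTEIN_N)
c_half = make_c_half(NPU_INTEIN_C, "CGGGGS")
print(trans_splice(n_half, c_half, NPU_INTEIN_N, NPU_INTEIN_C))
```

In this model the spliced product contains neither intein segment, which mirrors the statement that the two portions of the split Cas9 are joined while the inteins are excised.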


The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue; a deletion or insertion of one or more residues within a sequence; or a substitution of a residue within a sequence of a genome in a subject to be corrected. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are mutations that reduce or abolish a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. There are some exceptions where a loss-of-function mutation is dominant, one example being haploinsufficiency, where the organism is unable to tolerate the approximately 50% reduction in protein activity suffered by the heterozygote. This is the explanation for a few genetic diseases in humans, including Marfan syndrome, which results from a mutation in the gene for the connective tissue protein called fibrillin. Mutations also embrace “gain-of-function” mutations, which are mutations that confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. 
Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. Because of their nature, gain-of-function mutations are usually dominant. Many loss-of-function mutations are recessive, such as autosomal recessive.
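The naming convention described above (original residue, then position, then newly substituted residue) can be illustrated with a short Python sketch that parses such a label and applies it to a sequence. The label “D10A” and the short peptide are used purely as illustrative inputs and are not drawn from this section; the class and function names are hypothetical, and only single-residue substitutions with 1-based positions are handled.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str
    position: int      # 1-based position within the sequence
    replacement: str


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'D10A' into (original, position, replacement)."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return the sequence with the substitution applied, checking the original residue."""
    index = sub.position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[index] != sub.original:
        raise ValueError(
            f"expected {sub.original} at position {sub.position}, found {sequence[index]}"
        )
    return sequence[:index] + sub.replacement + sequence[index + 1:]


sub = parse_substitution("D10A")
print(sub)
print(apply_substitution("MDKKYSIGLDIGTNS", sub))
```

Checking the original residue before substituting guards against applying a label to a sequence with different numbering.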


The term “napDNAbp,” which stands for “nucleic acid programmable DNA binding protein,” refers to any protein that may associate (e.g., form a complex) with one or more nucleic acid molecules (i.e., which may broadly be referred to as a “napDNAbp-programming nucleic acid molecule” and includes, for example, guide RNA in the case of Cas systems) which direct or otherwise program the protein to localize to a specific target nucleotide sequence (e.g., a gene locus of a genome) that is complementary to the one or more nucleic acid molecules (or a portion or region thereof) associated with the protein, thereby causing the protein to bind to the nucleotide sequence at the specific target site. This term napDNAbp embraces CRISPR-Cas9 proteins, as well as Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or modified), and may include a Cas9 equivalent from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system, now known as Cas12a), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), C2c3 (a type V CRISPR-Cas system), dCas9, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12g, Cas12h, Cas12i, Cas13d, Cas14, Argonaute, and nCas9. Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353 (6299), the contents of which are incorporated herein by reference. However, the nucleic acid programmable DNA binding proteins (napDNAbps) that may be used in connection with this invention are not limited to CRISPR-Cas systems. The invention embraces any such programmable protein, such as the Argonaute protein from Natronobacterium gregoryi (NgAgo) which may also be used for DNA-guided genome editing. 
The NgAgo-guide DNA system does not require a PAM sequence or guide RNA molecules, which means genome editing can be performed simply by the expression of generic NgAgo protein and introduction of synthetic oligonucleotides on any genomic sequence. See Gao et al., DNA-guided genome editing using the Natronobacterium gregoryi Argonaute. Nature Biotechnology 2016; 34(7):768-73, which is incorporated herein by reference.


In some embodiments, the napDNAbp is an RNA-programmable nuclease which, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) is referred to as a guide RNA (gRNA). gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 (or equivalent) complex to the target); and (2) a domain that binds a Cas9 protein. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure. For example, in some embodiments, domain (2) is homologous to a tracrRNA as depicted in FIG. 1E of Jinek et al., Science 337:816-821(2012), the entire contents of which is incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in U.S. Pat. No. 9,340,799, entitled “mRNA-Sensing Switchable gRNAs,” and International Patent Application No. PCT/US2014/054247, filed Sep. 6, 2013, published as WO 2015/035136 and entitled “Delivery System For Functional Nucleases,” the entire contents of each are herein incorporated by reference. In some embodiments, a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.” For example, an extended gRNA will, e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease:RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. 
In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J. et al., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E. et al., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M. et al., Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference).


Because the napDNAbp nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to be targeted, in principle, to any sequence specified by the guide RNA. Methods of using napDNAbp nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotechnology 31, 227-229 (2013); Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); Dicarlo, J. E. et al., Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acid Res. (2013); Jiang, W. et al. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).


The term “nickase” refers to a napDNAbp (e.g., a Cas9) having only a single nuclease activity that cuts only one strand of a target DNA, rather than both strands. Thus, a nickase type napDNAbp does not leave a double-strand break. Exemplary nickases include SpCas9 and SaCas9 nickases. An exemplary nickase comprises a sequence having at least 99%, or 100%, identity to the amino acid sequence of SEQ ID NO: 3 or 11.


A “uracil glycosylase inhibitor (UGI)” refers to a protein that inhibits the activity of uracil-DNA glycosylase. Suitable UGI proteins for use in accordance with the present disclosure include, for example, those published in Wang et al., J. Biol. Chem. 264:1163-1171(1989); Lundquist et al., J. Biol. Chem. 272:21408-21419(1997); Ravishankar et al., Nucleic Acids Res. 26:4880-4887(1998); and Putnam et al., J. Mol. Biol. 287:331-346 (1999), each of which is incorporated herein by reference. Non-limiting, exemplary proteins that may be used as a UGI of the present disclosure and their respective sequences are provided below. In some embodiments, the UGI is a variant of a naturally-occurring UGI from an organism, and the variants do not occur in nature. For example, in some embodiments, the UGI is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring UGI from an organism or any UGIs provided herein (e.g., a UGI comprising the amino acid sequence of any one of SEQ ID NOs: 299-302). In some embodiments, the UGI comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 30%, no more than 25%, no more than 20%, no more than 15%, no more than 10%, no more than 5%, no more than 1% longer or shorter) than any of the UGIs provided herein. In some embodiments, the UGI comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 20 amino acids, no more than 15 amino acids, no more than 10 amino acids, no more than 5 amino acids, no more than 2 amino acids longer or shorter) than any of the UGIs provided herein.


A “nuclear localization signal” or “NLS” refers to an amino acid sequence that “tags” a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. One or more NLS may be added to the N- or C-terminus of a protein, or internally (e.g., between two protein domains). For example, one or more NLS may be added to the N- or C-terminus of a nucleobase editor, or between the Cas9 and the deaminase in a nucleobase editor. In some embodiments, 1, 2, 3, 4, 5, or more NLS may be added. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., PCT/EP2000/011690, filed Nov. 23, 2000, the contents of which are incorporated herein by reference for its disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises a bipartite nuclear localization signal comprising an amino acid sequence selected from the group consisting of KRTADGSEFEPKKKRKV (SEQ ID NO: 398), KRPAATKKAGQAKKKK (SEQ ID NO: 344), KKTELQTTNAENKTKKL (SEQ ID NO: 345), KRGINDRNFWRGENGRKTR (SEQ ID NO: 346), RKSGKIAAIVVKRPRK (SEQ ID NO: 347), PKKKRKV (SEQ ID NO: 373) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 374). In some embodiments, a linker is inserted between the Cas9 and the deaminase. In certain embodiments, the NLS comprises the amino acid sequence of SEQ ID NO: 398. In some embodiments, the NLS comprises the amino acid sequence of SEQ ID NO: 344.


An NLS can be classified as monopartite or bipartite. A non-limiting example of a monopartite NLS is the sequence PKKKRKV (SEQ ID NO: 373) in the SV40 Large T-antigen. A “bipartite” NLS typically contains two clusters of basic amino acids, separated by a spacer of about 10 amino acids. One non-limiting example of a bipartite NLS is the NLS of nucleoplasmin, KRPAATKKAGQAKKKK (spacer underlined) (SEQ ID NO: 344). In some embodiments, the NLS used in accordance with the present disclosure is the NLS of nucleoplasmin comprising the amino acid sequence of KRPAATKKAGQAKKKK (SEQ ID NO: 344). Other bipartite NLSs that may be used in accordance with the present disclosure include, without limitation: SV40 bipartite NLS (KRTADGSEFESPKKKRKV (SEQ ID NO: 375), e.g., as described in Hodel et al., J Biol Chem. 2001 Jan. 12; 276(2):1317-25, incorporated herein by reference); Kanadaptin bipartite NLS (KKTELQTTNAENKTKKL (SEQ ID NO: 345), e.g., as described in Hubner et al., Biochem J. 2002 Jan. 15; 361 (Pt 2):287-96, incorporated herein by reference); influenza A nucleoprotein bipartite NLS (KRGINDRNFWRGENGRKTR (SEQ ID NO: 346), e.g., as described in Ketha et al., BMC Cell Biology. 2008; 9:22, incorporated herein by reference); and ZO-2 bipartite NLS (RKSGKIAAIVVKRPRK (SEQ ID NO: 347), e.g., as described in Quiros et al., Nusrat A, ed. Molecular Biology of the Cell. 2013; 24(16):2528-2543, incorporated herein by reference).
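As a simple illustration of how the listed NLS sequences might be located in a protein, the Python sketch below searches a sequence for exact occurrences of the monopartite and bipartite examples named above. It is a plain substring search, not a predictor of nuclear import; the dictionary labels and function name are hypothetical. The query used here is the Cfa-C protein as listed in this section (SEQ ID NO: 357), in which, by string matching alone, the SV40 bipartite sequence appears immediately after the initial methionine.

```python
KNOWN_NLS = {
    "SV40 monopartite": "PKKKRKV",                 # SEQ ID NO: 373
    "nucleoplasmin bipartite": "KRPAATKKAGQAKKKK",  # SEQ ID NO: 344
    "SV40 bipartite": "KRTADGSEFESPKKKRKV",        # SEQ ID NO: 375
    "Kanadaptin bipartite": "KKTELQTTNAENKTKKL",    # SEQ ID NO: 345
    "influenza A NP bipartite": "KRGINDRNFWRGENGRKTR",  # SEQ ID NO: 346
    "ZO-2 bipartite": "RKSGKIAAIVVKRPRK",           # SEQ ID NO: 347
}

CFA_C_PROTEIN = (  # SEQ ID NO: 357 as listed herein
    "MKRTADGSEFESPKKKRKVKIISRKSLGTQNVYDIGVEKDHNFLLKNGLV"
    "ASN"
)


def find_known_nls(protein: str):
    """Return (name, 0-based start) for every exact occurrence of a listed NLS."""
    hits = []
    for name, motif in KNOWN_NLS.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((name, start))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


print(find_known_nls(CFA_C_PROTEIN))
```

Because the monopartite SV40 sequence is the second basic cluster of the SV40 bipartite sequence, both labels are reported for the same region.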


The nucleotide sequence encoding an NLS is “operably linked” to the nucleotide sequence encoding a protein to which the NLS is fused (e.g., a Cas9 or a nucleobase editor) when two coding sequences are “in-frame with each other” and are translated as a single polypeptide fusing two sequences.
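The “in-frame” requirement can be sketched in Python as a check that two coding sequences, when concatenated, keep a single reading frame with no stop codon before the end of the fusion. This is a simplified illustration under stated assumptions: both inputs are DNA coding sequences that begin in frame, only the standard stop codons TAA, TAG, and TGA are considered, and the function names, the NLS codon choice, and the short downstream sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds: str):
    """Split a coding sequence into complete triplets, ignoring any trailing remainder."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def is_in_frame_fusion(upstream_cds: str, downstream_cds: str) -> bool:
    """True if the concatenation would be translated as one polypeptide."""
    upstream, downstream = upstream_cds.upper(), downstream_cds.upper()
    # Each part must be a whole number of codons to preserve the frame.
    if len(upstream) % 3 or len(downstream) % 3:
        return False
    # A stop codon before the final codon would end translation early.
    body = codons(upstream + downstream)[:-1]
    return not any(codon in STOP_CODONS for codon in body)


nls_cds = "CCCAAGAAGAAGAGGAAAGTA"   # one possible coding of PKKKRKV (assumed codon choice)
protein_cds = "ATGGACAAGTAA"        # hypothetical downstream coding sequence
print(is_in_frame_fusion(nls_cds, protein_cds))
print(is_in_frame_fusion(nls_cds + "A", protein_cds))
```

A single extra nucleotide between the two coding sequences shifts the frame, which is why the second call reports that the fusion is not in frame.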


Nucleic acids of the present disclosure may include one or more genetic elements. A “genetic element” refers to a particular nucleotide sequence that has a role in nucleic acid expression (e.g., promoter, enhancer, terminator) or encodes a discrete product of an engineered nucleic acid (e.g., a nucleotide sequence encoding a guide RNA, a protein and/or an RNA interference molecule).


A “promoter” refers to a control region of a nucleic acid sequence at which initiation and rate of transcription of the remainder of a nucleic acid sequence are controlled. A promoter may also contain sub-regions at which regulatory proteins and molecules may bind, such as RNA polymerase and other transcription factors. Promoters may be constitutive, inducible, activatable, repressible, tissue-specific, or any combination thereof. A promoter drives expression or drives transcription of the nucleic acid sequence that it regulates. Herein, a promoter is considered to be “operably linked” when it is in a correct functional location and orientation in relation to a nucleic acid sequence it regulates to control (“drive”) transcriptional initiation and/or expression of that sequence.


A promoter may be one naturally associated with a gene or sequence, as may be obtained by isolating the 5′ non-coding sequences located upstream of the coding segment of a given gene or sequence. Such a promoter is referred to as an “endogenous promoter.” In some embodiments, a coding nucleic acid sequence may be positioned under the control of a recombinant or heterologous promoter, which refers to a promoter that is not normally associated with the encoded sequence in its natural environment. Such promoters may include promoters of other genes; promoters isolated from any other cell; and synthetic promoters or enhancers that are not “naturally occurring” such as, for example, those that contain different elements of different transcriptional regulatory regions and/or mutations that alter expression through methods of genetic engineering that are known in the art. In addition to producing nucleic acid sequences of promoters and enhancers synthetically, sequences may be produced using recombinant cloning and/or nucleic acid amplification technology, including polymerase chain reaction (PCR).


In some embodiments, promoters used in accordance with the present disclosure are “inducible promoters,” which are promoters that are characterized by regulating (e.g., initiating or activating) transcriptional activity when in the presence of, influenced by or contacted by an inducer signal. An inducer signal may be endogenous or a normally exogenous condition (e.g., light), compound (e.g., chemical or non-chemical compound) or protein that contacts an inducible promoter in such a way as to be active in regulating transcriptional activity from the inducible promoter. Thus, a “signal that regulates transcription” of a nucleic acid refers to an inducer signal that acts on an inducible promoter. A signal that regulates transcription may activate or inactivate transcription, depending on the regulatory system used. Activation of transcription may involve directly acting on a promoter to drive transcription or indirectly acting on a promoter by inactivating a repressor that is preventing the promoter from driving transcription. Conversely, deactivation of transcription may involve directly acting on a promoter to prevent transcription or indirectly acting on a promoter by activating a repressor that then acts on the promoter.


In genetics, a “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. In the case of a DNA segment that encodes a protein, the sense strand is the strand of DNA that has the same sequence as the mRNA, which takes the antisense strand as its template during transcription, and eventually undergoes (typically, not always) translation into a protein. The antisense strand is thus responsible for the RNA that is later translated to protein, while the sense strand possesses a nearly identical makeup to that of the mRNA. Note that for each segment of dsDNA, there will possibly be two sets of sense and antisense, depending on which direction one reads (since sense and antisense is relative to perspective). It is ultimately the gene product, or mRNA, that dictates which strand of one segment of dsDNA is referred to as sense or antisense.
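
By way of non-limiting illustration, the relationship between the sense strand, the antisense (template) strand, and the mRNA may be sketched in Python as follows. The example sequence and the function names are hypothetical and are chosen only to show that the mRNA read off the template strand matches the sense strand with U in place of T.

```python
# Illustrative sketch only; the sequence and helper names are hypothetical.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def antisense_strand(sense: str) -> str:
    """Return the antisense (template) strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(sense.upper()))


def transcribe(template: str) -> str:
    """Return the mRNA (5' to 3') made from a template strand given 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(template.upper()))


sense = "ATGGCCTAA"                  # hypothetical coding (sense) strand
template = antisense_strand(sense)   # antisense strand used as the template
mrna = transcribe(template)          # same sequence as the sense strand, with U for T
```

In this sketch both strands are written 5′ to 3′, so the antisense strand is the reverse complement of the sense strand.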


The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.


“A subject in need thereof” refers to an individual who has a disease, a sign and/or symptom of a disease, or a predisposition toward a disease, with the purpose to cure, heal, alleviate, relieve, alter, remedy, ameliorate, improve, or affect the disease, the symptom of the disease, or the predisposition toward the disease. In some embodiments, the subject is a mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is human. In some embodiments, the mammal is a rodent. In some embodiments, the rodent is a mouse. In some embodiments, the rodent is a rat. In some embodiments, the mammal is a companion animal. A “companion animal” refers to pets and other domestic animals. Non-limiting examples of companion animals include dogs and cats; livestock, such as horses, cattle, pigs, sheep, goats, and chickens; and other animals, such as mice, rats, guinea pigs, and hamsters.


The term “target site” refers to a sequence within a nucleic acid molecule that is edited by a base editor (BE) or nucleobase editor disclosed herein. The term “target site,” in the context of a single strand, also can refer to the “target strand” which anneals or binds to the spacer sequence of the guide RNA. The target site can refer, in certain embodiments, to a segment of double-stranded DNA that includes the protospacer (i.e., the strand of the target site that has the same nucleotide sequence as the spacer sequence of the guide RNA) on the PAM-strand (or non-target strand) and target strand, which is complementary to the protospacer and the spacer alike, and which anneals to the spacer of the guide RNA, thereby targeting or programming a Cas9 nucleobase editor to target the target site.
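
By way of non-limiting illustration, the relationship among the spacer of the guide RNA, the protospacer on the PAM-strand, and the target strand may be sketched in Python as follows. The eight-nucleotide spacer is a hypothetical placeholder kept short for readability (it is not a spacer from the present disclosure), and the function names are assumptions made for this sketch.

```python
# Illustrative sketch only; the spacer below is a hypothetical placeholder.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def protospacer_from_spacer(spacer_rna: str) -> str:
    """Protospacer (PAM-strand DNA): same sequence as the spacer, with T for U."""
    return spacer_rna.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


spacer = "GACUUCGA"                                # hypothetical gRNA spacer (RNA)
protospacer = protospacer_from_spacer(spacer)      # on the PAM-strand (non-target strand)
target_strand = reverse_complement(protospacer)    # strand that anneals to the spacer
```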


A “transcriptional terminator” is a nucleic acid sequence that causes transcription to stop. A transcriptional terminator may be unidirectional or bidirectional. It is comprised of a DNA sequence involved in specific termination of an RNA transcript by an RNA polymerase. A transcriptional terminator sequence prevents transcriptional activation of downstream nucleic acid sequences by upstream promoters. A transcriptional terminator may be necessary in vivo to achieve desirable expression levels or to avoid transcription of certain sequences. A transcriptional terminator is considered to be “operably linked to” a nucleotide sequence when it is able to terminate the transcription of the sequence it is linked to.


The most commonly used type of terminator is a forward terminator. When placed downstream of a nucleic acid sequence that is usually transcribed, a forward transcriptional terminator will cause transcription to abort. In some embodiments, bidirectional transcriptional terminators are provided, which usually cause transcription to terminate on both the forward and reverse strand. In some embodiments, reverse transcriptional terminators are provided, which usually terminate transcription on the reverse strand only.


In prokaryotic systems, terminators usually fall into two categories: (1) rho-independent terminators and (2) rho-dependent terminators. Rho-independent terminators are generally composed of a palindromic sequence that forms a stem loop rich in G-C base pairs followed by several T bases. Without wishing to be bound by theory, the conventional model of transcriptional termination is that the stem loop causes RNA polymerase to pause, and transcription of the poly-A tract on the template strand causes the RNA:DNA duplex to unwind and dissociate from RNA polymerase.


In eukaryotic systems, the terminator region may comprise specific DNA sequences that permit site-specific cleavage of the new transcript so as to expose a polyadenylation site. This signals a specialized endogenous polymerase to add a stretch of about 200 A residues (polyA) to the 3′ end of the transcript. RNA molecules modified with this polyA tail appear to be more stable and are translated more efficiently. Thus, in some embodiments involving eukaryotes, a terminator may comprise a signal for the cleavage of the RNA. In some embodiments, the terminator signal promotes polyadenylation of the message. The terminator and/or polyadenylation site elements may serve to enhance output nucleic acid levels and/or to minimize read through between nucleic acids.


Terminators for use in accordance with the present disclosure include any terminator of transcription described herein or known to one of ordinary skill in the art. Examples of terminators include, without limitation, the termination sequences of genes such as, for example, the bovine growth hormone terminator, and viral termination sequences such as, for example, the SV40 terminator, spy, yejM, secG-leuU, thrLABC, rrnB T1, hisLGDCBHAFI, metZWV, rrnC, xapR, aspA and arcA terminator. In some embodiments, the termination signal may be a sequence that cannot be transcribed or translated, such as those resulting from a sequence truncation.


A “Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE)” is a DNA sequence that, when transcribed, creates a tertiary structure that enhances expression. It is commonly used in molecular biology to increase expression of genes delivered by viral vectors. WPRE is a tripartite regulatory element with gamma, alpha, and beta components.


The full WPRE sequence is 609 bp long:

(SEQ ID NO: 376)
GCTTATCGATAATCAACCTCTGGATTACAAAATTTGTGAAAGATTGACTG
GTATTCTTAACTATGTTGCTCCTTTTACGCTATGTGGATACGCTGCTTTA
ATGCCTTTGTATCATGCTATTGCTTCCCGTATGGCTTTCATTTTCTCCTC
CTTGTATAAATCCTGGTTGCTGTCTCTTTATGAGGAGTTGTGGCCCGTTG
TCAGGCAACGTGGCGTGGTGTGCACTGTGTTTGCTGACGCAACCCCCACT
GGTTGGGGCATTGCCACCACCTGTCAGCTCCTTTCCGGGACTTTCGCTTT
CCCCCTCCCTATTGCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCT
GCTGGACAGGGGCTCGGCTGTTGGGCACTGACAATTCCGTGGTGTTGTCG
GGGAAATCATCGTCCTTTCCTTGGCTGCTCGCCTATGTTGCCACCTGGAT
TCTGCGCGGGACGTCCTTCTGCTACGTCCCTTCGGCCCTCAATCCAGCGG
ACCTTCCTTCCCGCGGCCTGCTGCCGGCTCTGCGGCCTCTTCCGCGTCTT
CGCCTTCGCCCTCAGACGAGTCGGATCTCCCTTTGGGCCGCCTCCCCGCA
TCGATACCG.


The terms “nucleic acid,” and “polynucleotide,” as used herein, refer to a compound comprising a nucleobase and an acidic moiety, e.g., a nucleotide, or a polymer of nucleotides. Typically, polymeric nucleic acids, e.g., nucleic acid molecules comprising three or more nucleotides are linear molecules, in which adjacent nucleotides are linked to each other via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g. nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to an oligonucleotide chain comprising three or more individual nucleotide residues. As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides). In some embodiments, “nucleic acid” encompasses RNA as well as single and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome (e.g., an engineered viral vector), an engineered vector, or fragment thereof, or a synthetic DNA, RNA, or DNA/RNA hybrid, optionally including non-naturally occurring nucleotides or nucleosides. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. 
Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5′ to 3′ direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g. adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5′-N-phosphoramidite linkages).


The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein, and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. In some embodiments, a protein is in a complex with, or is in association with, a nucleic acid, e.g., RNA or DNA. Any of the proteins provided herein may be produced by any method known in the art. 
For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), which are incorporated herein by reference.


The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent (e.g., mouse, rat). In some embodiments, the subject is a domesticated animal. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.


The term “recombinant” as used herein in the context of proteins or nucleic acids refers to proteins or nucleic acids that do not occur in nature, but are the product of human engineering. For example, in some embodiments, a recombinant protein or nucleic acid molecule comprises an amino acid or nucleotide sequence that comprises at least one, at least two, at least three, at least four, at least five, at least six, or at least seven mutations as compared to any naturally occurring sequence. The fusion proteins (e.g., nucleobase editors) described herein are made by recombinant technology. Recombinant technology is familiar to those skilled in the art.


The term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.).


“A therapeutically effective amount” as used herein refers to the amount of each therapeutic agent (e.g., nucleobase editor, rAAV) described in the present disclosure required to confer therapeutic effect on the subject, either alone or in combination with one or more other therapeutic agents. Effective amounts vary, as recognized by those skilled in the art, depending on the particular condition being treated, the severity of the condition, the individual subject parameters including age, physical condition, size, gender, and weight, the duration of the treatment, the nature of concurrent therapy (if any), the specific route of administration and like factors within the knowledge and expertise of the health practitioner. These factors are well known to those of ordinary skill in the art and can be addressed with no more than routine experimentation. It is generally preferred that a maximum dose of the individual components or combinations thereof be used, that is, the highest safe dose according to sound medical judgment. It will be understood by those of ordinary skill in the art, however, that a subject may insist upon a lower dose or tolerable dose for medical reasons, psychological reasons or for virtually any other reasons. Empirical considerations, such as the half-life, generally will contribute to the determination of the dosage. For example, therapeutic agents that are compatible with the human immune system, such as polypeptides comprising regions from humanized antibodies or fully human antibodies, may be used to prolong half-life of the polypeptide and to prevent the polypeptide being attacked by the host's immune system.


As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.


As used herein, the term “variant” refers to a protein having characteristics that deviate from what occurs in nature and that retains at least one functional (i.e., binding, interaction, or enzymatic) ability and/or therapeutic property thereof. A “variant” is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the wild type protein. For instance, a variant of Cas9 may comprise a Cas9 that has one or more changes in amino acid residues as compared to a wild type Cas9 amino acid sequence. As another example, a variant of a deaminase may comprise a deaminase that has one or more changes in amino acid residues as compared to a wild type deaminase amino acid sequence, e.g. following ancestral sequence reconstruction of the deaminase. These changes include chemical modifications, including substitutions of different amino acid residues, truncations, covalent additions (e.g. of a tag), and any other mutations. The term also encompasses circular permutants, mutants, truncations, or domains of a reference sequence that display the same or substantially the same functional activity or activities as the reference sequence. This term also embraces fragments of a wild type protein.


The level or degree to which the property is retained may be reduced relative to the wild type protein but is typically the same or similar in kind. Generally, variants are overall very similar, and in many regions, identical to the amino acid sequence of the protein described herein. A skilled artisan will appreciate how to make and use variants that maintain all, or at least some, of a functional ability or property.


The variant proteins may comprise, or alternatively consist of, an amino acid sequence which is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%, identical to, for example, the amino acid sequence of a wild-type protein, or any protein provided herein.


By a polypeptide having an amino acid sequence at least, for example, 95% “identical” to a query amino acid sequence, it is intended that the amino acid sequence of the subject polypeptide is identical to the query sequence except that the subject polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the query amino acid sequence. In other words, to obtain a polypeptide having an amino acid sequence at least 95% identical to a query amino acid sequence, up to 5% of the amino acid residues in the subject sequence may be inserted, deleted, or substituted with another amino acid. These alterations of the reference sequence may occur at the amino- or carboxy-terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
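
By way of non-limiting illustration, the percent identity described above may be sketched in Python as follows, assuming that the query and subject sequences have already been aligned (with gaps written as "-"). The function name, the example sequences, and the convention of counting each mismatched or gapped alignment column as one alteration are assumptions made for this sketch; it does not reproduce any particular alignment program.

```python
# Illustrative sketch only; sequences and the counting convention are assumptions.
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity relative to the number of residues in the query."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for residue in aligned_query if residue != "-")
    if query_length == 0:
        raise ValueError("query sequence is empty")
    # An alteration is any column where the subject differs from the query:
    # a substitution, an insertion (gap in query), or a deletion (gap in subject).
    alterations = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q != s
    )
    return 100.0 * (query_length - alterations) / query_length


query = "MKTAYIAKQRQISFVKSHFS"      # hypothetical 20-residue query
subject = "MKTAYIAKQRQISFVKSHFA"    # one substitution at the last position
identity = percent_identity(query, subject)  # one alteration in 20 residues
```

Under this convention, one alteration in a 20-residue query corresponds to five alterations per 100 residues, i.e., 95% identity.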


As a practical matter, whether any particular polypeptide is at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to, for instance, the amino acid sequence of a protein such as a Niemann-Pick C1 (NPC1) protein, can be determined conventionally using known computer programs. A preferred method for determining the best overall match between a query sequence (a sequence of the present invention) and a subject sequence, also referred to as a global sequence alignment, can be determined using the FASTDB computer program based on the algorithm of Brutlag et al. (Comp. App. Biosci. 6:237-245 (1990)). In a sequence alignment the query and subject sequences are either both nucleotide sequences or both amino acid sequences. The result of said global sequence alignment is expressed as percent identity. Preferred parameters used in a FASTDB amino acid alignment are: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization Group Length=0, Cutoff Score=1, Window Size=sequence length, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the subject amino acid sequence, whichever is shorter.


If the subject sequence is shorter than the query sequence due to N- or C-terminal deletions, not because of internal deletions, a manual correction must be made to the results. This is because the FASTDB program does not account for N- and C-terminal truncations of the subject sequence when calculating global percent identity. For subject sequences truncated at the N- and C-termini, relative to the query sequence, the percent identity is corrected by calculating the number of residues of the query sequence that are N- and C-terminal of the subject sequence, which are not matched/aligned with a corresponding subject residue, as a percent of the total residues of the query sequence. Whether a residue is matched/aligned is determined by results of the FASTDB sequence alignment. This percentage is then subtracted from the percent identity, calculated by the above FASTDB program using the specified parameters, to arrive at a final percent identity score. This final percent identity score is what is used for the purposes of the present invention. Only residues to the N- and C-termini of the subject sequence, which are not matched/aligned with the query sequence, are considered for the purposes of manually adjusting the percent identity score. That is, only query residue positions outside the farthest N- and C-terminal residues of the subject sequence are considered.
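
By way of non-limiting illustration, the manual correction described above may be sketched in Python as follows. The function name, parameter names, and example values are hypothetical; the sketch takes the percent identity reported by the alignment program as an input and does not run FASTDB itself.

```python
# Illustrative sketch only; names and example values are hypothetical.
def corrected_percent_identity(
    reported_percent: float,
    query_length: int,
    unmatched_n_terminal: int,
    unmatched_c_terminal: int,
) -> float:
    """Subtract the share of unmatched terminal query residues from the score."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if unmatched_n_terminal < 0 or unmatched_c_terminal < 0:
        raise ValueError("unmatched residue counts must be non-negative")
    unmatched = unmatched_n_terminal + unmatched_c_terminal
    if unmatched > query_length:
        raise ValueError("unmatched residues cannot exceed query length")
    # Unmatched terminal residues expressed as a percent of the query length.
    penalty = 100.0 * unmatched / query_length
    return reported_percent - penalty


# Hypothetical case: a 100-residue query, with the first 10 query residues
# having no matched/aligned residue in an N-terminally truncated subject.
final_score = corrected_percent_identity(100.0, 100, 10, 0)
```

In this hypothetical case, 10 unmatched N-terminal residues out of 100 query residues give a 10% deduction, so a reported score of 100% becomes a final score of 90%.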


The term “vector,” as used herein, refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter into a host cell and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as AAV vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.


As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.


DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

Provided herein are nucleic acid molecules (e.g., vector genomes), compositions (containing, e.g., vectors, recombinant viruses), rAAV particles, and kits comprising nucleic acids encoding split napDNAbp domains (e.g., Cas9 proteins) or nucleobase editors, and methods of delivering a nucleobase editor or a napDNAbp domain into a cell using such nucleic acids. The N-terminal portion and C-terminal portion of a nucleobase editor or a napDNAbp domain are encoded on separate nucleic acids and delivered into a cell, e.g., via recombinant adeno-associated virus (rAAV) particle delivery. In particular embodiments, the N-terminal portion of a nucleobase editor is fused to a first intein, and the C-terminal portion of a nucleobase editor is fused to a second intein. The N-terminal and C-terminal portions may each be encoded on separate nucleic acids and delivered into a cell, e.g., via rAAV particle delivery. The polypeptides corresponding to the N-terminal portion and C-terminal portion of the base editor (or nucleobase editor) may be joined to form a complete nucleobase editor or Cas9 protein, e.g., via intein-mediated protein splicing.


To overcome the packaging size limit and deliver base editors using AAVs, a split-base editor dual AAV strategy was devised, in which the CBE or ABE is divided into an N-terminal portion (or “half”) and a C-terminal half. Each base editor half is fused to half of a fast-splicing split-intein. Following co-infection by AAV particles expressing each base editor-split intein half, protein splicing in trans reconstitutes the full-length base editor. Unlike other approaches utilizing small molecules or sgRNA to bridge split Cas9, intein splicing removes all exogenous sequences and regenerates a native peptide bond at the split site, resulting in a single reconstituted protein (e.g., a protein that is identical in sequence to the unmodified nucleobase editor).
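
By way of non-limiting illustration, the split-and-reconstitute logic described above may be modeled in Python as follows. This is a toy string model only: the intein markers, the example editor sequence, the split site, and all function names are hypothetical placeholders and do not represent actual intein or base editor sequences. The sketch shows only that removing both intein halves and joining the flanking portions regenerates the original, unmodified sequence.

```python
# Toy model only; markers, sequence, and split site are hypothetical placeholders.
from dataclasses import dataclass

INTEIN_N = "<intein-N>"  # placeholder for the N-terminal split-intein half
INTEIN_C = "<intein-C>"  # placeholder for the C-terminal split-intein half


@dataclass(frozen=True)
class SplitHalves:
    n_terminal_fusion: str  # N-terminal editor portion fused to intein-N
    c_terminal_fusion: str  # intein-C fused to C-terminal editor portion


def split_editor(editor: str, split_site: int) -> SplitHalves:
    """Divide the editor at split_site and fuse each half to an intein half."""
    if not 0 < split_site < len(editor):
        raise ValueError("split_site must fall inside the sequence")
    return SplitHalves(
        n_terminal_fusion=editor[:split_site] + INTEIN_N,
        c_terminal_fusion=INTEIN_C + editor[split_site:],
    )


def trans_splice(halves: SplitHalves) -> str:
    """Excise both intein halves and join the two editor portions."""
    if not halves.n_terminal_fusion.endswith(INTEIN_N):
        raise ValueError("N-terminal half lacks intein-N")
    if not halves.c_terminal_fusion.startswith(INTEIN_C):
        raise ValueError("C-terminal half lacks intein-C")
    n_portion = halves.n_terminal_fusion[: -len(INTEIN_N)]
    c_portion = halves.c_terminal_fusion[len(INTEIN_C):]
    return n_portion + c_portion


editor = "MSEVEFSHEYWMRHALTLAKRA"          # hypothetical stand-in sequence
halves = split_editor(editor, split_site=10)
reconstituted = trans_splice(halves)      # no intein residues remain
```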


Split-intein CBEs and split-intein ABEs are disclosed that are integrated into dual AAV genomes to enable efficient base editing in somatic tissues of therapeutic relevance, including liver, heart, muscle, retina, and brain. The resulting AAVs were used to achieve base editing efficiencies at test loci for both CBEs and ABEs that, in each of these tissues, meet or exceed therapeutically relevant editing thresholds for the treatment of human genetic diseases at AAV dosages that are known to be well-tolerated in humans. In particular, the disclosed AAV-nucleobase editor vectors achieved editing efficiencies of 59% (A.T-to-G.C) among unsorted cells in the cortex, and 48-50% (C.G-to-T.A) in photoreceptor cells and mouse embryonic fibroblasts (MEFs). The highest in vivo genome editing efficiencies were observed following injection of approximately 10^13-10^14 vector genomes per kilogram weight of subject (vgs/kg), which is a dosage comparable to those currently used in human gene therapy trials. Accordingly, the invention provides split napDNAbp domains (e.g., Cas9 proteins), split nucleobase editors, and nucleic acids and vectors encoding same; as well as cells, compositions, methods, kits, and systems that utilize the disclosed split napDNAbp domains, split nucleobase editors, and vectors.
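
By way of non-limiting illustration, a dosage expressed in vector genomes per kilogram may be converted to a total number of vector genomes for an individual subject as sketched below in Python. The 25 g body weight and the function name are hypothetical and are used only to show the arithmetic.

```python
# Illustrative arithmetic only; the body weight is a hypothetical example value.
def total_vector_genomes(dose_vg_per_kg: float, body_weight_g: float) -> float:
    """Total vector genomes for a subject, given a per-kilogram dose."""
    if dose_vg_per_kg < 0:
        raise ValueError("dose must be non-negative")
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return dose_vg_per_kg * body_weight_g / 1000.0  # grams to kilograms


# Bounds of a 10^13 to 10^14 vg/kg range applied to a hypothetical 25 g animal.
low_total = total_vector_genomes(1e13, 25.0)
high_total = total_vector_genomes(1e14, 25.0)
```

For the hypothetical 25 g animal, these bounds correspond to about 2.5×10^11 and 2.5×10^12 total vector genomes, respectively.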


Aspects of the present disclosure relate to nucleic acid molecules encoding an N-terminal portion of a base editor or nucleobase editor fused at its C-terminus to a first intein sequence, wherein the nucleic acid molecule is operably linked to a first promoter, further comprising a nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, wherein the direction of transcription of the nucleic acid segment is reversed relative to the direction of transcription of the nucleic acid molecule. These nucleic acid molecules may be comprised within a viral genome, such as an rAAV genome or rAAV vector.


Further provided are nucleic acid molecules encoding a C-terminal portion of a nucleobase editor fused at its N-terminus to an intein sequence, wherein the nucleic acid molecule is operably linked to a first promoter, and further comprising a nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, wherein the direction of transcription of the nucleic acid segment is reversed relative to the direction of transcription of the nucleic acid molecule. In some embodiments, the first promoter of the nucleic acid molecule encoding the N-terminal portion of the nucleobase editor and the first promoter of the nucleic acid molecule encoding the C-terminal portion of the nucleobase editor comprise the same promoter (i.e., are the same). In other embodiments, these first promoters are different. In some embodiments, the second promoter of the nucleic acid molecule encoding the N-terminal portion of the nucleobase editor and the second promoter of the nucleic acid molecule encoding the C-terminal portion of the nucleobase editor are the same. In other embodiments, these second promoters are different.


Some aspects of the present disclosure relate to compositions comprising (i) a first nucleotide sequence encoding an N-terminal portion of a Cas9 protein fused at its C-terminus to an intein-N; and (ii) a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein, wherein at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and wherein the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence. In some embodiments, the first nucleotide sequence and/or second nucleotide sequence is operably linked to a nucleotide sequence encoding at least one bipartite nuclear localization signal (NLS).


Additional aspects of the present disclosure relate to methods of editing using the split nucleobase editors and/or the split Cas9 proteins disclosed herein. In particular embodiments, provided herein are methods of base editing at therapeutically-relevant efficiencies in vivo, such as in murine retina. The methods disclosed herein improve the rate and throughput with which promising base editor targets can be identified in cultured cells and in vivo.


This disclosure describes methods of base editing that may be used for targeted editing of DNA in vitro, e.g., for the generation of mutant cells or animals; for the introduction of targeted mutations, e.g., for the correction of genetic defects in cells ex vivo, e.g., in cells obtained from a subject that are subsequently re-introduced into the same or another subject; and for the introduction of targeted mutations in vivo, e.g., the correction of genetic defects or the introduction of deactivating mutations in disease-associated genes in a subject. As an example, diseases and conditions that can be treated by making an A to G, or a C to T, mutation may be treated using the base editors provided herein. The base editors described herein may be utilized for the targeted editing of C to T and G to A mutations so as to correct a mutation or restore a normal reading frame in a gene to generate a functional protein. In certain embodiments, the subject has been diagnosed with a disease, disorder, or condition, such as, but not limited to, a disease, disorder, or condition associated with a point mutation in the Tmc1 gene or the NPC1 gene. The methods described herein involve contacting a base editor with a target nucleotide sequence in the genome of an organism, e.g., a human.


In certain embodiments, the methods described above result in cutting (or nicking) one strand of the double-stranded DNA, for example, the strand that includes the thymine (T) of a target A:T nucleobase pair opposite the strand containing the target adenine (A) that is being deaminated. This nicking result serves to direct mismatch repair machinery to the non-edited strand, ensuring that the chemically modified nucleobase is not interpreted as a lesion by the machinery. This nick may be created by the use of an nCas9.


Still further, the present disclosure provides for methods of making the disclosed split nucleobase editors, as well as methods of using the split nucleobase editors or nucleic acid molecules encoding the nucleobase editors in applications including editing a nucleic acid molecule, e.g., a genome. Such methods involve transducing (e.g., via transfection) cells with a plurality of complexes each comprising a portion of a split nucleobase editor (e.g., a nucleobase editor comprising a napDNAbp (e.g., nCas9) domain and a deaminase domain) and/or a gRNA molecule. In some embodiments, the nucleic acid constructs encoding the N-terminal and C-terminal portions of the split nucleobase editor are transfected separately from one another. In certain embodiments, the methods involve the transfection of nucleic acid constructs (e.g., plasmids) that each (or together) encode the components of a complex of split nucleobase editor and a gRNA molecule.


In certain embodiments of the disclosed methods of making the disclosed split nucleobase editors, one or more nucleic acid constructs that encode the split nucleobase editor are transfected into the cell separately from the plasmid that encodes the gRNA molecule. In certain embodiments, these components are encoded on a single construct and transfected together. In other embodiments, the methods disclosed herein involve the introduction into cells of one or more nucleic acid vectors encoding a split nucleobase editor and gRNA molecule that has been expressed and cloned outside of these cells. In some embodiments, these vectors are delivered as part of an rAAV vector.


It should be appreciated that any nucleobase editor, e.g., any of the nucleobase editors provided herein, may be introduced into the cell in any suitable way, either stably or transiently. In some embodiments, a nucleobase editor may be transfected into the cell. In some embodiments, the cell may be transduced or transfected with a nucleic acid construct that encodes a nucleobase editor. For example, a cell may be transduced (e.g., with a virus encoding a nucleobase editor), or transfected (e.g., with a plasmid encoding a nucleobase editor) with a nucleic acid that encodes a nucleobase editor, or the translated nucleobase editor. Such transduction may be a stable or transient transduction. In some embodiments, cells expressing a nucleobase editor or containing a nucleobase editor may be transduced or transfected with one or more gRNA molecules, for example, when the nucleobase editor comprises a Cas9 (e.g., nCas9) domain. In some embodiments, a plasmid expressing one or more portions of a nucleobase editor may be introduced into cells through electroporation, transient (e.g., lipofection) and stable genome integration (e.g., nucleofection and piggybac), viral transduction, or other methods known to those of skill in the art. In particular embodiments, plasmids expressing one or more portions of any of the disclosed nucleobase editors may be delivered to cells through nucleofection.


In some aspects, the disclosed split nucleobase editors are delivered to the cell (or the subject) by use of recombinant AAV (rAAV) particles. In some embodiments, any of the disclosed split nucleobase editors is fused to split intein pairs that are packaged into two separate rAAV particles that, when co-delivered to a cell, reconstitute the functional editor protein. Several other considerations to account for the unique features of base editing are described, including the optimization of second-site nicking targets and properly packaging nucleobase editors into virus vectors, including lentiviruses and rAAV. Accordingly, the disclosure provides dual rAAV vectors and dual rAAV vector particles that comprise expression constructs that encode two portions (or “two halves”) of any of the disclosed nucleobase editors, wherein the encoded nucleobase editor is divided between the two halves at a split site. In some embodiments, the disclosed rAAV vectors encoding the split nucleobase editors may comprise a nucleotide sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the sequences depicted in FIGS. 26A-26U.


Accordingly, the present disclosure provides compositions comprising: (i) a first recombinant adeno-associated virus (rAAV) particle comprising a first nucleotide sequence encoding an N-terminal portion of a Cas9 protein fused at its C-terminus to an intein-N; and (ii) a second recombinant adeno-associated virus (rAAV) particle comprising a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein. In some embodiments, at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence.


In some aspects, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed complexes of nucleobase editors and gRNA. In other aspects, the present disclosure provides a pharmaceutical composition comprising one or more polynucleotides encoding the nucleobase editors disclosed herein and one or more polynucleotides encoding a gRNA, or polynucleotides encoding both. The one or more polynucleotides encoding the nucleobase editors and one or more polynucleotides encoding a gRNA may be provided on the same vector, or different vectors (e.g., different rAAV vectors).


napDNAbp Domains


In some aspects, the base editing methods and nucleobase editors described herein involve a nucleic acid programmable DNA binding protein (napDNAbp). Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA). In other words, the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence. In various embodiments, the napDNAbp can be fused to an adenosine deaminase or a cytosine deaminase disclosed herein. In other aspects, the napDNAbp can be fused to a non-deaminase nucleobase modifying enzyme (or nucleobase modification domain) disclosed herein.


Without being bound by theory, the binding mechanism of a napDNAbp-guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA spacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase” (“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”).


The below description of various napDNAbps which can be used in connection with the presently disclosed nucleobase editors is not meant to be limiting in any way. The nucleobase editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process. In various embodiments, the Cas9 or Cas9 variants have a nickase activity, i.e., only cleave one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats). The nucleobase editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins which are the result of convergent evolution. The napDNAbps used herein (e.g., SpCas9, Cas9 variant, or Cas9 equivalents) may also contain various modifications that alter/enhance their PAM specificities. Lastly, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).


The napDNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. As outlined above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M. et al., Science 337:816-821(2012), the entire contents of which is hereby incorporated by reference.


In some embodiments, the napDNAbp directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the napDNAbp directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, a vector encodes a napDNAbp that is mutated with respect to a corresponding wild-type enzyme such that the mutated napDNAbp lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A in reference to the canonical SpCas9 sequence, or to equivalent amino acid positions in other Cas9 variants or Cas9 equivalents.
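The substitution notation used above (e.g., D10A, H840A) can be read as "wild-type residue, 1-based position, replacement residue." The following is a minimal sketch of that reading, assuming a protein given as a one-letter amino acid string. The function name `apply_substitution` and the 15-residue toy string are illustrative assumptions and are not taken from the sequence listing of this disclosure.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution written as <wild-type><position><new>, e.g. 'D10A'.

    Positions are 1-based, as in the residue numbering used in the text.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    # Refuse to edit if the sequence does not carry the expected wild-type residue.
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[: position - 1] + replacement + protein[position:]


# Hypothetical toy sequence with an aspartate (D) at position 10.
toy_protein = "MDKKYSIGLDIGTNS"
toy_nickase = apply_substitution(toy_protein, "D10A")
```

Checking the wild-type residue before substituting is a deliberate choice: it catches numbering mismatches when a mutation defined against one reference sequence is applied to a variant whose equivalent position differs.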


As used herein, the term “Cas protein” refers to a full-length Cas protein obtained from nature, a recombinant Cas protein having a sequence that differs from a naturally occurring Cas protein, or any fragment of a Cas protein that nevertheless retains all or a significant amount of the requisite basic functions needed for the disclosed methods, i.e., (i) possession of nucleic-acid programmable binding of the Cas protein to a target DNA, and (ii) ability to nick the target DNA sequence on one strand. The Cas proteins contemplated herein embrace CRISPR Cas9 proteins, as well as Cas9 equivalents, variants (e.g., Cas9 nickase (nCas9) or nuclease inactive Cas9 (dCas9)), homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and may include a Cas9 equivalent from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system) and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference.


The terms “Cas9” or “Cas9 nuclease” or “Cas9 moiety” or “Cas9 domain” embrace any naturally occurring Cas9 from any organism, any naturally-occurring Cas9 equivalent or functional fragment thereof, any Cas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a Cas9, naturally-occurring or engineered. The term Cas9 is not meant to be particularly limiting and may be referred to as a “Cas9 or equivalent.” Exemplary Cas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference. The present disclosure is unlimited with regard to the particular Cas9 that is employed in the nucleobase editor (BE) of the invention.


As noted herein, Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference).


The Cas9 protein encoded by the first and second nucleotide sequence is referred to herein as a “split Cas9.” The Cas9 protein is known to have an N-terminal lobe and a C-terminal lobe linked by a disordered linker (e.g., as described in Nishimasu et al., Cell, Volume 156, Issue 5, pp. 935-949, 2014, incorporated herein by reference). In some embodiments, the N-terminal portion of the split Cas9 protein comprises the N-terminal lobe of a Cas9 protein. In some embodiments, the C-terminal portion of the split Cas9 comprises the C-terminal lobe of a Cas9 protein.


In some embodiments, the N-terminal portion of the split Cas9 comprises a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-(550-650) in SEQ ID NO: 1. “1-(550-650)” means starting from amino acid 1 and ending anywhere between amino acid 550-650 (inclusive). For example, the N-terminal portion of the split Cas9 may comprise a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-550, 1-551, 1-552, 1-553, 1-554, 1-555, 1-556, 1-557, 1-558, 1-559, 1-560, 1-561, 1-562, 1-563, 1-564, 1-565, 1-566, 1-567, 1-568, 1-569, 1-570, 1-571, 1-572, 1-573, 1-574, 1-575, 1-576, 1-577, 1-578, 1-579, 1-580, 1-581, 1-582, 1-583, 1-584, 1-585, 1-586, 1-587, 1-588, 1-589, 1-590, 1-591, 1-592, 1-593, 1-594, 1-595, 1-596, 1-597, 1-598, 1-599, 1-600, 1-601, 1-602, 1-603, 1-604, 1-605, 1-606, 1-607, 1-608, 1-609, 1-610, 1-611, 1-612, 1-613, 1-614, 1-615, 1-616, 1-617, 1-618, 1-619, 1-620, 1-621, 1-622, 1-623, 1-624, 1-625, 1-626, 1-627, 1-628, 1-629, 1-630, 1-631, 1-632, 1-633, 1-634, 1-635, 1-636, 1-637, 1-638, 1-639, 1-640, 1-641, 1-642, 1-643, 1-644, 1-645, 1-646, 1-647, 1-648, 1-649, or 1-650 of SEQ ID NO: 1. In some embodiments, the N-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-573 or 1-637 of SEQ ID NO: 1.


In some embodiments, the N-terminal portion of the split Cas9 may comprise a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-430, 1-431, 1-432, 1-433, 1-434, 1-435, 1-436, 1-437, 1-438, 1-439, 1-440, 1-441, 1-442, 1-443, 1-444, 1-445, 1-446, 1-447, 1-448, 1-449, 1-450, 1-451, 1-452, 1-453, 1-454, 1-455, 1-456, 1-457, 1-458, 1-459, 1-460, 1-461, 1-462, 1-463, 1-464, 1-465, 1-466, 1-467, 1-468, 1-469, 1-470, 1-471, 1-472, 1-473, 1-474, 1-475, 1-476, 1-477, 1-478, 1-479, 1-480, 1-481, 1-482, 1-483, 1-484, 1-485, 1-486, 1-487, 1-488, 1-489, 1-490, 1-491, 1-492, 1-493, 1-494, 1-495, 1-496, 1-497, 1-498, 1-499, 1-500, 1-501, 1-502, 1-503, 1-504, 1-505, 1-506, 1-507, 1-508, 1-509, 1-510, 1-511, 1-512, 1-513, 1-514, 1-515, 1-516, 1-517, 1-518, 1-519, 1-520, 1-521, 1-522, 1-523, 1-524, 1-525, 1-526, 1-527, 1-528, 1-529, 1-530, 1-531, 1-532, 1-533, 1-534, 1-535, 1-536, 1-537, 1-538, or 1-539 of SEQ ID NO: 11. In some embodiments, the N-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-431, 1-453, 1-457, 1-484, 1-501, 1-534, or 1-537 of SEQ ID NO: 11. In certain embodiments, the N-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-534 of SEQ ID NO: 11.


The C-terminal portion of the split Cas9 can be joined with the N-terminal portion of the split Cas9 to form a complete Cas9 protein. In some embodiments, the C-terminal portion of the Cas9 protein starts from where the N-terminal portion of the Cas9 protein ends. As such, in some embodiments, the C-terminal portion of the split Cas9 comprises a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids (551-651)-1368 of SEQ ID NO: 1. “(551-651)-1368” means starting at an amino acid between amino acids 551-651 (inclusive) and ending at amino acid 1368.


For example, the C-terminal portion of the split Cas9 may comprise a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acid 551-1368, 552-1368, 553-1368, 554-1368, 555-1368, 556-1368, 557-1368, 558-1368, 559-1368, 560-1368, 561-1368, 562-1368, 563-1368, 564-1368, 565-1368, 566-1368, 567-1368, 568-1368, 569-1368, 570-1368, 571-1368, 572-1368, 573-1368, 574-1368, 575-1368, 576-1368, 577-1368, 578-1368, 579-1368, 580-1368, 581-1368, 582-1368, 583-1368, 584-1368, 585-1368, 586-1368, 587-1368, 588-1368, 589-1368, 590-1368, 591-1368, 592-1368, 593-1368, 594-1368, 595-1368, 596-1368, 597-1368, 598-1368, 599-1368, 600-1368, 601-1368, 602-1368, 603-1368, 604-1368, 605-1368, 606-1368, 607-1368, 608-1368, 609-1368, 610-1368, 611-1368, 612-1368, 613-1368, 614-1368, 615-1368, 616-1368, 617-1368, 618-1368, 619-1368, 620-1368, 621-1368, 622-1368, 623-1368, 624-1368, 625-1368, 626-1368, 627-1368, 628-1368, 629-1368, 630-1368, 631-1368, 632-1368, 633-1368, 634-1368, 635-1368, 636-1368, 637-1368, 638-1368, 639-1368, 640-1368, 641-1368, 642-1368, 643-1368, 644-1368, 645-1368, 646-1368, 647-1368, 648-1368, 649-1368, 650-1368, or 651-1368 of SEQ ID NO: 1. In some embodiments, the C-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 574-1368 or 638-1368 of SEQ ID NO: 1.
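The N-terminal ranges "1-(550-650)" and the C-terminal ranges "(551-651)-1368" listed above pair up so that the C-terminal portion begins at the residue immediately after the one at which the N-terminal portion ends. The sketch below enumerates those pairs and splits a sequence string at a chosen site. The function names `split_ranges` and `split_sequence` are illustrative assumptions; the residue numbers (550-650 and 1368, and 573 as one of the named split sites) are taken from the text.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # 1-based, inclusive residue range


def split_ranges(first_end: int, last_end: int, full_length: int) -> List[Tuple[Range, Range]]:
    """Enumerate (N-terminal range, C-terminal range) pairs for every split site.

    For a split after residue k, the N-terminal portion is 1..k and the
    C-terminal portion is k+1..full_length.
    """
    if not 1 <= first_end <= last_end < full_length:
        raise ValueError("split sites must lie inside the protein")
    return [((1, k), (k + 1, full_length)) for k in range(first_end, last_end + 1)]


def split_sequence(protein: str, n_end: int) -> Tuple[str, str]:
    """Split a one-letter protein string after 1-based residue n_end."""
    if not 1 <= n_end < len(protein):
        raise ValueError("split site must lie inside the protein")
    return protein[:n_end], protein[n_end:]


# Ranges stated in the text for a 1368-residue reference sequence.
sp_pairs = split_ranges(550, 650, 1368)
# One of the split sites named in the text: 1-573 paired with 574-1368.
preferred = ((1, 573), (574, 1368))
```

Because every pair covers residues 1 through 1368 exactly once, joining the two portions in order regenerates the full-length sequence with no residue duplicated or lost.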


In other embodiments, the C-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 432-1054, 454-1054, 458-1054, 485-1054, 502-1054, 535-1054, or 538-1054 of SEQ ID NO: 11. In certain embodiments, the C-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 535-1054 of SEQ ID NO: 11.


In other embodiments, the C-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 432-1054, 454-1054, 458-1054, 485-1054, 502-1054, 535-1054, or 538-1054 of SEQ ID NO: 10. In certain embodiments, the C-terminal portion of the split Cas9 protein comprises a portion of any one of SEQ ID NO: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 535-1054 of SEQ ID NO: 10.


Further aspects of the present disclosure provide rAAV particles comprising a first nucleic acid molecule (e.g., encoding an N-terminal portion of a nucleobase editor or Cas9 protein fused at its C-terminus to an intein-N) as described herein. rAAV particles comprising a second nucleic acid molecule (e.g., encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein or nucleobase editor) as described herein are also provided. The disclosed rAAV particles may comprise both a first nucleic acid molecule and a second nucleic acid molecule as described herein.


Cas9 variants may also be delivered to cells using the methods described herein. For example, a Cas9 variant may also be “split” as described herein. A Cas9 variant may comprise an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the Cas9 sequences provided herein. In some embodiments, the Cas9 variant comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 30%, no more than 25%, no more than 20%, no more than 15%, no more than 10%, no more than 5%, no more than 1% longer or shorter) than any of the Cas9 proteins provided herein (e.g., a S. pyogenes Cas9 (SpCas9) (SEQ ID NO: 1), S. pyogenes Cas9 nickase (SpCas9n) (SEQ ID NO: 3), S. aureus Cas9 (SaCas9) (SEQ ID NO: 10), and S. aureus Cas9 nickase (SaCas9n) (SEQ ID NO: 11)). In some embodiments, the Cas9 variant comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 200 amino acids, no more than 150 amino acids, no more than 100 amino acids, no more than 50 amino acids, no more than 10 amino acids, no more than 5 amino acids, or no more than 2 amino acids longer or shorter) than any of the Cas9 proteins provided herein.


In some embodiments, the N-terminal portion of a split Cas9 comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the corresponding portion of any one of the Cas9 sequences provided herein (e.g., a SpCas9, SpCas9n, SaCas9, or SaCas9n). In some embodiments, the N-terminal portion of the split Cas9 comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 30%, no more than 25%, no more than 20%, no more than 15%, no more than 10%, no more than 5%, no more than 1% longer or shorter) than the corresponding portion of any of the Cas9 proteins provided herein. In some embodiments, the N-terminal portion of the split Cas9 comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 200 amino acids, no more than 150 amino acids, no more than 100 amino acids, no more than 50 amino acids, no more than 10 amino acids, no more than 5 amino acids, or no more than 2 amino acids longer or shorter) than the corresponding portion of any of the Cas9 proteins provided herein.


In some embodiments, the C-terminal portion of a split Cas9 comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the corresponding portion of any one of the Cas9 sequences provided herein (e.g., the Cas9 sequences of any of SEQ ID NOs: 1, 3, 10, and 11). In some embodiments, the C-terminal portion of the split Cas9 comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 30%, no more than 25%, no more than 20%, no more than 15%, no more than 10%, no more than 5%, no more than 1% longer or shorter) than the corresponding portion of any of the Cas9 proteins provided herein. In some embodiments, the C-terminal portion of the split Cas9 comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 200 amino acids, no more than 150 amino acids, no more than 100 amino acids, no more than 50 amino acids, no more than 10 amino acids, no more than 5 amino acids, or no more than 2 amino acids longer or shorter) than the corresponding portion of any of the Cas9 proteins provided herein.
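The identity and length thresholds recited above can be checked with simple arithmetic once two sequences have been aligned. The sketch below is illustrative only and rests on stated assumptions: the two sequences are already aligned to equal length with "-" marking gaps, and percent identity is taken as identical non-gap columns divided by the total number of alignment columns (other conventions for the denominator exist). The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: identity = identical non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def length_within_fraction(candidate_len: int, reference_len: int, max_fraction: float) -> bool:
    """True if the candidate is longer or shorter by no more than max_fraction of the reference."""
    return abs(candidate_len - reference_len) <= max_fraction * reference_len


def length_within_residues(candidate_len: int, reference_len: int, max_residues: int) -> bool:
    """True if the candidate is longer or shorter by no more than max_residues amino acids."""
    return abs(candidate_len - reference_len) <= max_residues


# Toy aligned pair differing at one of ten columns.
identity = percent_identity("MDKKYSIGLD", "MDKKYSIGLA")
```

For instance, against a 1368-residue reference, a 1300-residue variant differs by 68 residues, which is within 5% of the reference length (68.4 residues) and within 100 amino acids, but not within 50 amino acids.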


In some embodiments, the Cas9 variant is a dCas9 or nCas9. In some embodiments, the Cas9 protein is selected from S. pyogenes Cas9 (SpCas9) (SEQ ID NO: 1), S. pyogenes Cas9 nickase (SEQ ID NO: 3), S. aureus Cas9 (SaCas9) (SEQ ID NO: 10), and S. aureus Cas9 nickase (SEQ ID NO: 11). In certain embodiments, the Cas9 variant is a VRQR variant of SpCas9 that is compatible with NGA PAM sites.
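As a rough illustration of what compatibility with NGA PAM sites means when choosing targets, the sketch below scans one strand of a DNA string for NGA motifs and reports the protospacer immediately 5′ of each. It assumes a 20-nucleotide protospacer and searches only the strand as given; both are simplifying assumptions for illustration, and the function name `find_nga_sites` is hypothetical.

```python
import re
from typing import List, Tuple


def find_nga_sites(dna: str, protospacer_len: int = 20) -> List[Tuple[str, str, int]]:
    """Return (protospacer, PAM, 0-based PAM start) for each NGA motif on the given strand.

    Only motifs with a full-length protospacer upstream are reported.
    """
    dna = dna.upper()
    sites = []
    # A lookahead is used so that overlapping motifs are all found.
    for match in re.finditer(r"(?=([ACGT]GA))", dna):
        pam_start = match.start()
        if pam_start >= protospacer_len:
            protospacer = dna[pam_start - protospacer_len : pam_start]
            sites.append((protospacer, match.group(1), pam_start))
    return sites


# Hypothetical toy target: a 20-nt protospacer followed by an AGA PAM.
toy_target = "ACGTACGTACGTACGTACGT" + "AGA" + "TT"
toy_sites = find_nga_sites(toy_target)
```

A fuller search would also scan the reverse complement so that sites on the opposite strand are reported.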


Accordingly, in some embodiments, the N-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-573 or 1-637 of SEQ ID NO: 1. In some embodiments, the C-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 574-1368 or 638-1368 of SEQ ID NO: 1. In other embodiments, the N-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-573 or 1-637 of SEQ ID NO: 3. In some embodiments, the C-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 574-1368 or 638-1368 of SEQ ID NO: 3.


In some embodiments, the N-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-534 of SEQ ID NO: 11. In some embodiments, the C-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 535-1054 of SEQ ID NO: 11.


In some embodiments, the N-terminal portion of the split Cas9 comprises a mutation corresponding to a D10A mutation in SEQ ID NO: 1. In some embodiments, the N-terminal portion of the split Cas9 comprises a mutation corresponding to a D10A mutation in SEQ ID NO: 1 and the C-terminal portion of the split Cas9 comprises a mutation corresponding to a H840A mutation in SEQ ID NO: 1. In some embodiments, the N-terminal portion of the split Cas9 comprises a mutation corresponding to a D10A mutation in SEQ ID NO: 1, and the C-terminal portion of the split Cas9 comprises a histidine at the position corresponding to position 840 in SEQ ID NO: 1.
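The combinations of residues at positions 10 and 840 described above map onto the nuclease, nickase, and nuclease-inactive forms of Cas9. The sketch below encodes that mapping in a simplified way, assuming that any residue other than aspartate at position 10, or other than histidine at position 840, inactivates the corresponding catalytic residue. The function name `classify_cas9` is hypothetical, and the mapping is an illustration rather than a functional prediction for any particular substitution.

```python
def classify_cas9(residue_10: str, residue_840: str) -> str:
    """Classify a Cas9 by the residues at positions 10 and 840 (SpCas9 numbering).

    Simplifying assumption: D at position 10 and H at position 840 are treated as
    active, and any other residue at either position is treated as inactivating.
    """
    d10_intact = residue_10 == "D"
    h840_intact = residue_840 == "H"
    if d10_intact and h840_intact:
        return "nuclease (cleaves both strands)"
    if d10_intact or h840_intact:
        return "nickase (nCas9, cleaves one strand)"
    return "nuclease-inactive (dCas9)"


# D10A in the N-terminal portion with histidine retained at 840 in the C-terminal portion.
split_nickase = classify_cas9("A", "H")
# D10A in the N-terminal portion with H840A in the C-terminal portion.
split_dead = classify_cas9("A", "A")
```

In a split Cas9, position 10 lies in the N-terminal portion and position 840 in the C-terminal portion for the split sites described herein, so the two residues are carried on separate constructs and the classification applies to the reconstituted protein.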


In other embodiments, the N-terminal portion of the split Cas9 comprises a mutation corresponding to a D10A mutation in SEQ ID NO: 10.


In some embodiments, to join the N-terminal portion of the Cas9 protein and the C-terminal portion of the Cas9 protein, an intein system may be used. In some embodiments, the N-terminal portion of the Cas9 is fused to an intein-N. In some embodiments, the intein-N is fused to the C-terminus of the N-terminal portion of the Cas9 to form a structure of NH2-[N-terminal portion of Cas9]-[intein-N]-COOH. In some embodiments, the intein-N is encoded by the dnaE-n gene. In some embodiments, the intein-N comprises the amino acid sequence as set forth in SEQ ID NO: 351 or 355. In some embodiments, the C-terminal portion of the Cas9 is fused to an intein-C, and the intein-C is fused to the N-terminus of the C-terminal portion of the Cas9 to form a structure of NH2-[intein-C]-[C-terminal portion of Cas9]-COOH. In some embodiments, the intein-C is encoded by the dnaE-c gene. In some embodiments, the intein-C comprises the amino acid sequence as set forth in SEQ ID NO: 353 or 357.


Other split intein systems may also be used in the present disclosure and are known in the art. For example, in some embodiments, the intein pair comprises an Npu split intein. In certain such embodiments, the intein-N comprises the amino acid sequence of SEQ ID NO: 351. In some embodiments, the intein-C comprises the amino acid sequence of SEQ ID NO: 353.
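The fusion architectures described above, NH2-[N-terminal portion of Cas9]-[intein-N]-COOH and NH2-[intein-C]-[C-terminal portion of Cas9]-COOH, and the joining of the two Cas9 portions can be modeled at the level of sequence strings. The sketch below is a bookkeeping illustration only: the lowercase intein strings are placeholders and not the sequences of SEQ ID NOs: 351, 353, 355, or 357, the function names are hypothetical, and no attempt is made to model the chemistry of protein splicing.

```python
# Placeholder intein sequences (lowercase to distinguish them from Cas9 residues).
INTEIN_N = "nnnnnn"
INTEIN_C = "cccc"


def fuse_n_half(n_terminal_portion: str) -> str:
    """NH2-[N-terminal portion of Cas9]-[intein-N]-COOH."""
    return n_terminal_portion + INTEIN_N


def fuse_c_half(c_terminal_portion: str) -> str:
    """NH2-[intein-C]-[C-terminal portion of Cas9]-COOH."""
    return INTEIN_C + c_terminal_portion


def trans_splice(n_half: str, c_half: str) -> str:
    """Remove both intein segments and join the flanking Cas9 portions."""
    if not n_half.endswith(INTEIN_N):
        raise ValueError("N-terminal half does not end with intein-N")
    if not c_half.startswith(INTEIN_C):
        raise ValueError("C-terminal half does not start with intein-C")
    return n_half[: -len(INTEIN_N)] + c_half[len(INTEIN_C):]


# Toy full-length protein split after residue 6.
toy_full = "MDKKYSIGLDIG"
n_half = fuse_n_half(toy_full[:6])
c_half = fuse_c_half(toy_full[6:])
reconstituted = trans_splice(n_half, c_half)
```

The round trip returns the original toy sequence, reflecting the statement that the C-terminal portion can be joined with the N-terminal portion to form a complete Cas9 protein.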


As described herein, the N-terminal portion of a nucleobase editor comprises the N-terminal portion of a nuclease-inactive Cas9 protein (dCas9) or a Cas9 nickase (nCas9). In some embodiments, the N-terminal portion of a nucleobase editor further comprises a nucleobase modifying enzyme (e.g., nucleases, nickases, recombinases, deaminases, DNA repair enzymes, DNA damage enzymes, dismutases, alkylation enzymes, depurination enzymes, oxidation enzymes, pyrimidine dimer forming enzymes, integrases, transposases, polymerases, ligases, helicases, photolyases, glycosylases, epigenetic modifiers such as methylases, acetylases, methyltransferases, demethylases, etc.). In some embodiments, the nucleobase modifying enzyme is a deaminase (e.g., a cytosine deaminase or an adenosine deaminase, or functional variants thereof). In some embodiments, the nucleobase modifying enzyme is fused to the N-terminus of the N-terminal portion of the split dCas9 or split nCas9. In some embodiments, the N-terminal portion of the nucleobase editor has the structure: NH2-[nucleobase modifying enzyme]-[N-terminal portion of dCas9 or nCas9]-COOH. In some embodiments, the N-terminal portion of the nucleobase editor is fused to an intein-N. In some embodiments, the intein-N is fused to the C-terminus of the N-terminal portion of the nucleobase editor.


In some embodiments, the first nucleotide sequence encodes a polypeptide comprising the structure NH2-[nucleobase modifying enzyme]-[N-terminal portion of dCas9 or nCas9]-[intein-N]-COOH.


In some embodiments, the C-terminal portion of the nucleobase editor comprises the C-terminal portion of a nuclease-inactive Cas9 protein (dCas9) or a Cas9 nickase (nCas9). In some embodiments, the nucleobase modifying enzyme is fused to the C-terminus of the C-terminal portion of the split dCas9 or split nCas9. In some embodiments, the C-terminal portion of the nucleobase editor is of the structure: NH2-[C-terminal portion of dCas9 or nCas9]-[nucleobase modifying enzyme]-COOH. In some embodiments, the C-terminal portion of the nucleobase editor comprises an intein-C fused to the C-terminal portion of the Cas9 protein. In some embodiments, the intein-C is fused to the N-terminus of the C-terminal portion of the nucleobase editor. In some embodiments, the second nucleotide sequence encodes a polypeptide of the structure: NH2-[intein-C]-[C-terminal portion of the Cas9 protein]-COOH.
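The two fusion layouts described above, and the idealized outcome of intein-mediated splicing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Part`, `fuse`, and `splice` names are hypothetical, the short sequences are toy placeholders rather than any SEQ ID NO of this disclosure, and splicing is modeled as simple excision of the intein parts followed by joining of the flanking portions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    label: str      # name shown in the bracketed layout, e.g. "intein-N"
    sequence: str   # amino acid sequence written N- to C-terminus


def fuse(*parts):
    """Join parts in N->C order; return (bracketed layout, fused sequence)."""
    layout = "NH2-" + "-".join(f"[{p.label}]" for p in parts) + "-COOH"
    return layout, "".join(p.sequence for p in parts)


def splice(n_half, c_half, intein_n, intein_c):
    """Idealized trans-splicing: remove both intein parts, join what remains."""
    if not n_half.endswith(intein_n) or not c_half.startswith(intein_c):
        raise ValueError("halves do not carry the expected intein parts")
    return n_half[: len(n_half) - len(intein_n)] + c_half[len(intein_c):]


# Toy placeholder sequences (hypothetical, for illustration only).
enzyme = Part("nucleobase modifying enzyme", "MSE")
cas9_n = Part("N-terminal portion of nCas9", "MDKK")
cas9_c = Part("C-terminal portion of nCas9", "LGGD")
int_n = Part("intein-N", "CLSY")
int_c = Part("intein-C", "MIKI")

n_layout, n_seq = fuse(enzyme, cas9_n, int_n)
c_layout, c_seq = fuse(int_c, cas9_c)
full_length = splice(n_seq, c_seq, int_n.sequence, int_c.sequence)

print(n_layout)
print(c_layout)
print(full_length)
```

In this sketch the first polypeptide carries the intein-N at its C-terminus and the second carries the intein-C at its N-terminus, so removing both leaves the enzyme and the two Cas9 portions joined in their original order.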


Non-limiting examples of suitable Cas9 proteins and variants, and nucleobase editors and variants are provided. The disclosure provides Cas9 variants, for example, Cas9 proteins from one or more organisms, which may comprise one or more mutations (e.g., to generate dCas9 or Cas9 nickase). In some embodiments, one or more of the amino acid residues, identified below by an asterisk, of a Cas9 protein may be mutated. In some embodiments, the D10 and/or H840 residues of the amino acid sequence provided in SEQ ID NO: 1, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 2-275, 394-397 and 488, are mutated. In some embodiments, the D10 residue of the amino acid sequence provided in SEQ ID NO: 1, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 2-275, 394-397 and 488, is mutated to any amino acid residue, except for D. In some embodiments, the D10 residue of the amino acid sequence provided in SEQ ID NO: 1, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 2-275, 394-397 and 488, is mutated to an A. In some embodiments, the H840 residue of the amino acid sequence provided in SEQ ID NO: 1, or a corresponding residue in any of the amino acid sequences provided in SEQ ID NOs: 2-275, 394-397 and 488, is an H. In some embodiments, the H840 residue of the amino acid sequence provided in SEQ ID NO: 1, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 2-275, 394-397 and 488, is mutated to any amino acid residue, except for H. In some embodiments, the H840 residue of the amino acid sequence provided in SEQ ID NO: 1, or a corresponding mutation in any of the amino acid sequences provided in SEQ ID NOs: 2-275, 394-397 and 488, is mutated to an A. 
In some embodiments, the D10 residue of the amino acid sequence provided in SEQ ID NO: 1, or a corresponding residue in any of the amino acid sequences provided in SEQ ID NOs: 2-275, 394-397 and 488, is a D.
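As an illustration of the substitutions described above, the following Python sketch applies a point mutation written in the usual "D10A" form and checks that the expected wild-type residue is actually present at that position. The `mutate` helper is a hypothetical name, and only the first 20 residues of SEQ ID NO: 1 are used to keep the example short.

```python
def mutate(sequence, mutation):
    """Apply a substitution such as 'D10A' (1-based residue numbering)."""
    wild_type = mutation[0]
    position = int(mutation[1:-1])
    replacement = mutation[-1]
    index = position - 1
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# First 20 residues of SEQ ID NO: 1 (S. pyogenes Cas9 wild type).
spcas9_start = "MDKKYSIGLDIGTNSVGWAV"
nickase_start = mutate(spcas9_start, "D10A")
print(nickase_start)
```

The wild-type check guards against off-by-one numbering errors, which matter when the same mutation is transferred to a homologous residue in another Cas9 sequence.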


A number of Cas9 sequences from various species were aligned to determine whether corresponding homologous amino acid residues of D10 and H840 of SEQ ID NO: 1 can be identified in other Cas9 proteins, allowing the generation of Cas9 variants with corresponding mutations of the homologous amino acid residues. The alignment was carried out using the NCBI Constraint-based Multiple Alignment Tool (COBALT (accessible at st-va.ncbi.nlm.nih.gov/tools/cobalt)), with the following parameters. Alignment parameters: Gap penalties −11, −1; End-Gap penalties −5, −1. CDD Parameters: Use RPS BLAST on; Blast E-value 0.003; Find Conserved columns and Recompute on. Query Clustering Parameters: Use query clusters on; Word Size 4; Max cluster distance 0.8; Alphabet Regular.
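Once two sequences have been aligned, the homologous residue can be read off by counting non-gap characters in each row. The Python sketch below shows that bookkeeping only; `corresponding_position` is a hypothetical helper, and the short gapped rows are a hand-made illustration built from the first residues of SEQ ID NO: 1 and SEQ ID NO: 13, not actual COBALT output.

```python
def corresponding_position(ref_aligned, other_aligned, ref_position):
    """Map a 1-based reference residue number onto the other aligned row.

    Both arguments are rows of one alignment ('-' marks a gap).
    Returns (position, residue) in the other sequence, or None if the
    reference residue is aligned to a gap.
    """
    if len(ref_aligned) != len(other_aligned):
        raise ValueError("aligned rows must have equal length")
    ref_count = other_count = 0
    for ref_char, other_char in zip(ref_aligned, other_aligned):
        if ref_char != "-":
            ref_count += 1
        if other_char != "-":
            other_count += 1
        if ref_char != "-" and ref_count == ref_position:
            return None if other_char == "-" else (other_count, other_char)
    raise ValueError("ref_position lies beyond the reference sequence")


# Hand-made toy alignment (an assumption for illustration).
ref_row = "MDKKYSIGLDIGTNSVGWAV"    # start of SEQ ID NO: 1
other_row = "MSDLV-LGLDIGIGSVGVGI"  # start of SEQ ID NO: 13 with one gap

hit = corresponding_position(ref_row, other_row, 10)
print(hit)
```

With this toy alignment, D10 of the reference maps to a D at position 9 of the other sequence, which agrees with the D9A nickase listed for St1Cas9 in this disclosure.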


Examples of Cas9 and Cas9 equivalents are provided as follows; however, these specific examples are not meant to be limiting. The nucleobase editor fusions of the present disclosure may use any suitable napDNAbp, including any suitable Cas9 or Cas9 equivalent.











S. pyogenes Cas9 wild type 




(NCBI Reference Sequence: NC_002737.2, UniProt Reference Sequence: Q99ZW2)


(SEQ ID NO: 1) 



MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR






RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR





KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN





LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEK





YKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL





GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA





SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK





TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLT





LFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN





FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI





EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL





DINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD





FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ





TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME





RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA





SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD 






S. pyogenes dCas9 (D10A and H840A)



(SEQ ID NO: 2) 



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR






RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR





KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN





LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEK





YKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL





GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA





SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK





TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLT





LFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN





FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI





EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL





DINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD





FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ





TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME





RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA





SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD 






S. pyogenes Cas9 Nickase (D10A)



(SEQ ID NO: 3) 



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR






RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR





KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN





LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEK





YKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL





GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA





SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK





TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLT





LFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN





FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI





EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL





DINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD





FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ





TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME





RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA





SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD 





VRER-nCas9 (D10A/D1135V/G1218R/R1335E/T1337R) S. pyogenes Cas9 Nickase


(SEQ ID NO: 4)



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR






RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR





KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN





LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEK





YKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL





GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA





SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK





TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLT





LFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN





FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI





EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL





DINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD





FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ





TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME





RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLA





SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD





VQR-nCas9 (D10A/D1135V/R1335Q/T1337R) S. pyogenes Cas9 Nickase


(SEQ ID NO: 5)



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR






RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR





KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN





LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEK





YKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL





GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA





SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK





TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLT





LFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN





FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI





EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL





DINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD





FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ





TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME





RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA





SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD 





EQR-nCas9 (D10A/D1135E/R1335Q/T1337R) S. pyogenes Cas9 Nickase


(SEQ ID NO: 6)



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR






RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR





KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN





LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEK





YKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL





GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA





SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK





TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLT





LFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN





FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI





EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL





DINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD





FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ





TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFESPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME





RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA





SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD 





VRQR-nCas9 (D10A/D1135V/G1218R/R1335Q/T1337R) S. pyogenes Cas9 Nickase


(SEQ ID NO: 488)



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR






RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLR





KKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDN





LLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEK





YKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL





GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA





SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK





TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLT





LFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRN





FMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVI





EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL





DINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK





FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD





FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQ





TGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME





RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLA





SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD 





SaKKH-nCas9 (D10A/E782K/N968K/R1015H) S. aureus Cas9 Nickase


(SEQ ID NO: 7)



MKRNYILGLAIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKK






LLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQI





SRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLE





TRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENE





KLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIEN





AELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIA





IFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQK





MINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIP





RSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEER





DINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGY





KHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFK





DYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDP





QTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSR





NKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKN





DLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYE





VKSKKHPQIIKKG






Streptococcus thermophilus CRISPR1 Cas9 (St1Cas9) Nickase (D9A)



(SEQ ID NO: 8)



MSDLVLGLAIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQGRRLTRRKKHRRVRLNRL






FEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFIALKNMVKHRGISYLDDASDDGNSSIGDYAQIVK





ENSKQLETKTPGQIQLERYQTYGQLRGDFTVEKDGKKHRLINVFPTSAYRSEALRILQTQQEFNPQITDE





FINRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEFRAAKASYTAQEFNL





LNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAKLFKYIAKLLSCDVADIKGYRIDKSGKAEIHT





FEAYRKMKTLETLDIEQMDRETLDKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANS





SIFGKGWHNFSVKLMMELIPELYETSEEQMTILTRLGKQKTTSSSNKTKYIDEKLLTEEIYNPVVAKSVR





QAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAIQKIQKANKDEKDAAMLKAANQYNGKAELPHSV





FHGHKQLATKIRLWHQQGERCLYTGKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQE





KGQRTPYQALDSMDDAWSFRELKAFVRESKTLSNKKKEYLLTEEDISKFDVRKKFIERNLVDTRYASR





VVLNALQEHFRAHKIDTKVSVVRGQFTSQLRRHWGIEKTRDTYHHHAVDALIIAASSQLNLWKKQKN





TLVSYSEDQLLDIETGELISDDEYKESVFKAPYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYAT





RQAKVGKDKADETYVLGKIKDIYTQDGYDAFMKIYKKDKSKFLMYRHDPQTPEKVIEPILENYPNKQI





NEKGKEVPCNPFLKYKEEHGYIRKYSKKGNGPEIKSLKYYDSKLGNHIDITPKDSNNKVVLQSVSPWR





ADVYFNKTTGKYEILGLKYADLQFEKGTGTYKISQLKYNDIKKKEGVDSDSEFKFTLYKNDLLLVKDT





ETKEQQLFRFLSRTMPKQKHYVELKPYDKQKFEGGEALIKVLGNVANSGQCKKGLGKSNISIYKVRTD





VLGNQHIIKNEGDKPKLDF






Streptococcus thermophilus CRISPR3 Cas9 (St3Cas9) Nickase (D10A)



(SEQ ID NO: 9)



MTKPYSIGLAIGTNSVGWAVITDNYKVPSKKMKVLGNTSKKYIKKNLLGVLLFDSGITAEGRRLKRTA






RRRYTRRRNRILYLQEIFSTEMATLDDAFFQRLDDSFLVPDDKRDSKYPIFGNLVEEKVYHDEFPTIYHL





RKYLADSTKKADLRLVYLALAHMIKYRGHFLIEGEFNSKNNDIQKNFQDFLDTYNAIFESDLSLENSKQ





LEEIVKDKISKLEKKDRILKLFPGEKNSGIFSEFLKLIVGNQADFRKCFNLDEKASLHFSKESYDEDLETL





LGYIGDDYSDVFLKAKKLYDAILLSGFLTVTDNETEAPLSSAMIKRYNEHKEDLALLKEYIRNISLKTYN





EVFKDDTKNGYAGYIDGKTNQEDFYVYLKNLLAEFEGADYFLEKIDREDFLRKQRTFDNGSIPYQIHLQ





EMRAILDKQAKFYPFLAKNKERIEKILTFRIPYYVGPLARGNSDFAWSIRKRNEKITPWNFEDVIDKESS





AEAFINRMTSFDLYLPEEKVLPKHSLLYETFNVYNELTKVRFIAESMRDYQFLDSKQKKDIVRLYFKDK





RKVTDKDIIEYLHAIYGYDGIELKGIEKQFNSSLSTYHDLLNIINDKEFLDDSSNEAIIEEIIHTLTIFEDRE





MIKQRLSKFENIFDKSVLKKLSRRHYTGWGKLSAKLINGIRDEKSGNTILDYLIDDGISNRNFMQLIHDD





ALSFKKKIQKAQIIGDEDKGNIKEVVKSLPGSPAIKKGILQSIKIVDELVKVMGGRKPESIVVEMARENQ





YTNQGKSNSQQRLKRLEKSLKELGSKILKENIPAKLSKIDNNALQNDRLYLYYLQNGKDMYTGDDLDI





DRLSNYDIDHIIPQAFLKDNSIDNKVLVSSASNRGKSDDFPSLEVVKKRKTFWYQLLKSKLISQRKFDNL





TKAERGGLLPEDKAGFIQRQLVETRQITKHVARLLDEKFNNKKDENNRAVRTVKIITLKSTLVSQFRKD





FELYKVREINDFHHAHDAYLNAVIASALLKKYPKLEPEFVYGDYPKYNSFRERKSATEKVYFYSNIMNI





FKKSISLADGRVIERPLIEVNEETGESVWNKESDLATVRRVLSYPQVNVVKKVEEQNHGLDRGKPKGL





FNANLSSKPKPNSNENLVGAKEYLDPKKYGGYAGISNSFAVLVKGTIEKGAKKKITNVLEFQGISILDRI





NYRKDKLNFLLEKGYKDIELIIELPKYSLFELSDGSRRMLASILSTNNKRGEIHKGNQIFLSQKFVKLLYH





AKRISNTINENHRKYVENHKKEFEELFYYILEFNENYVGAKKNGKLLNSAFQSWQNHSIDELCSSFIGPT





GSERKGLFELTSRGSAADFEFLGVKIPRYRDYTPSSLLKDATLIHQSVTGLYETRIDLAKLGEG 






S. aureus Cas9 wild type



(SEQ ID NO: 10)



MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKK






LLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQI





SRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLE





TRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENE





KLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIEN





AELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIA





IFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQK





MINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIP





RSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEER





DINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGY





KHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFK





DYKYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDP





QTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSR





NKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNN





DLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYE





VKSKKHPQIIKKG






S. aureus Cas9 Nickase (D10A)



(SEQ ID NO: 11)



MKRNYILGLAIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKK






LLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQI





SRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLE





TRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENE





KLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIEN





AELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIA





IFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQK





MINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIP





RSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEER





DINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKG





YKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDF





KDYKYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHD





PQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSR





NKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNN





DLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYE





VKSKKHPQIIKKG 






Streptococcus thermophilus wild type CRISPR3 Cas9 (St3Cas9)



(SEQ ID NO: 12)



MTKPYSIGLDIGTNSVGWAVITDNYKVPSKKMKVLGNTSKKYIKKNLLGVLLFDSGITAEGRRLKRTA






RRRYTRRRNRILYLQEIFSTEMATLDDAFFQRLDDSFLVPDDKRDSKYPIFGNLVEEKVYHDEFPTIYHL





RKYLADSTKKADLRLVYLALAHMIKYRGHFLIEGEFNSKNNDIQKNFQDFLDTYNAIFESDLSLENSKQ





LEEIVKDKISKLEKKDRILKLFPGEKNSGIFSEFLKLIVGNQADFRKCFNLDEKASLHFSKESYDEDLETL





LGYIGDDYSDVFLKAKKLYDAILLSGFLTVTDNETEAPLSSAMIKRYNEHKEDLALLKEYIRNISLKTYN





EVFKDDTKNGYAGYIDGKTNQEDFYVYLKNLLAEFEGADYFLEKIDREDFLRKQRTFDNGSIPYQIHLQ





EMRAILDKQAKFYPFLAKNKERIEKILTFRIPYYVGPLARGNSDFAWSIRKRNEKITPWNFEDVIDKESS





AEAFINRMTSFDLYLPEEKVLPKHSLLYETFNVYNELTKVRFIAESMRDYQFLDSKQKKDIVRLYFKDK





RKVTDKDIIEYLHAIYGYDGIELKGIEKQFNSSLSTYHDLLNIINDKEFLDDSSNEAIIEEIIHTLTIFEDRE





MIKQRLSKFENIFDKSVLKKLSRRHYTGWGKLSAKLINGIRDEKSGNTILDYLIDDGISNRNFMQLIHDD





ALSFKKKIQKAQIIGDEDKGNIKEVVKSLPGSPAIKKGILQSIKIVDELVKVMGGRKPESIVVEMARENQ





YTNQGKSNSQQRLKRLEKSLKELGSKILKENIPAKLSKIDNNALQNDRLYLYYLQNGKDMYTGDDLDI





DRLSNYDIDHIIPQAFLKDNSIDNKVLVSSASNRGKSDDFPSLEVVKKRKTFWYQLLKSKLISQRKFDNL





TKAERGGLLPEDKAGFIQRQLVETRQITKHVARLLDEKFNNKKDENNRAVRTVKIITLKSTLVSQFRKD





FELYKVREINDFHHAHDAYLNAVIASALLKKYPKLEPEFVYGDYPKYNSFRERKSATEKVYFYSNIMNI





FKKSISLADGRVIERPLIEVNEETGESVWNKESDLATVRRVLSYPQVNVVKKVEEQNHGLDRGKPKGL





FNANLSSKPKPNSNENLVGAKEYLDPKKYGGYAGISNSFAVLVKGTIEKGAKKKITNVLEFQGISILDRI





NYRKDKLNFLLEKGYKDIELIIELPKYSLFELSDGSRRMLASILSTNNKRGEIHKGNQIFLSQKFVKLLYH





AKRISNTINENHRKYVENHKKEFEELFYYILEFNENYVGAKKNGKLLNSAFQSWQNHSIDELCSSFIGPT





GSERKGLFELTSRGSAADFEFLGVKIPRYRDYTPSSLLKDATLIHQSVTGLYETRIDLAKLGEG 






Streptococcus thermophilus CRISPR1 Cas9 wild type (St1Cas9)



(SEQ ID NO: 13)



MSDLVLGLDIGIGSVGVGILNKVTGEIIHKNSRIFPAAQAENNLVRRTNRQGRRLTRRKKHRRVRLNRL






FEESGLITDFTKISINLNPYQLRVKGLTDELSNEELFIALKNMVKHRGISYLDDASDDGNSSIGDYAQIVK





ENSKQLETKTPGQIQLERYQTYGQLRGDFTVEKDGKKHRLINVFPTSAYRSEALRILQTQQEFNPQITDE





FINRYLEILTGKRKYYHGPGNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEFRAAKASYTAQEFNL





LNDLNNLTVPTETKKLSKEQKNQIINYVKNEKAMGPAKLFKYIAKLLSCDVADIKGYRIDKSGKAEIHT





FEAYRKMKTLETLDIEQMDRETLDKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANS





SIFGKGWHNFSVKLMMELIPELYETSEEQMTILTRLGKQKTTSSSNKTKYIDEKLLTEEIYNPVVAKSVR





QAIKIVNAAIKEYGDFDNIVIEMARETNEDDEKKAIQKIQKANKDEKDAAMLKAANQYNGKAELPHSV





FHGHKQLATKIRLWHQQGERCLYTGKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQE





KGQRTPYQALDSMDDAWSFRELKAFVRESKTLSNKKKEYLLTEEDISKFDVRKKFIERNLVDTRYASR





VVLNALQEHFRAHKIDTKVSVVRGQFTSQLRRHWGIEKTRDTYHHHAVDALIIAASSQLNLWKKQKN





TLVSYSEDQLLDIETGELISDDEYKESVFKAPYQHFVDTLKSKEFEDSILFSYQVDSKFNRKISDATIYAT





RQAKVGKDKADETYVLGKIKDIYTQDGYDAFMKIYKKDKSKFLMYRHDPQTPEKVIEPILENYPNKQI





NEKGKEVPCNPFLKYKEEHGYIRKYSKKGNGPEIKSLKYYDSKLGNHIDITPKDSNNKVVLQSVSPWR





ADVYFNKTTGKYEILGLKYADLQFEKGTGTYKISQLKYNDIKKKEGVDSDSEFKFTLYKNDLLLVKDT





ETKEQQLFRFLSRTMPKQKHYVELKPYDKQKFEGGEALIKVLGNVANSGQCKKGLGKSNISIYKVRTD





VLGNQHIIKNEGDKPKLDF





CasX from Sulfolobus islandicus (strain REY15A)


(SEQ ID NO: 14)



MEVPLYNIFGDNYIIQVATEAENSTIYNNKVEIDDEELRNVLNLAYKIAKNNEDAAAERRGKAKKKKG






LEGETTTSNIILPLSGNDKNPWTETLKCYNFPTTVALSEVFKNFSQVKECEEVSAPSFVKPEFYKFGRSP





GMVERTRRVKLEVEPHYLIMAAAGWVLTRLGKAKVSEGDYVGVNVFTPTRGILYSLIQNVNGIVPGIK





PETAFGLWIARKVVSSVTNPNVSVVSIYTISDAVGQNPTTINGGFSIDLTKLLEKRDLLSERLEAIARNAL





SISSNMRERYIVLANYIYEYLTGSKRLEDLLYFANRDLIMNLNSDDGKVRDLKLISAYVNGELIRGEG 





CasY from Sulfolobus islandicus (strain REY15A)


(SEQ ID NO: 15)



MEVPLYNIFGDNYIIQVATEAENSTIYNNKVEIDDEELRNVLNLAYKIAKNNEDAAAERRGKAKKKKG






LEGETTTSNIILPLSGNDKNPWTETLKCYNFPTTVALSEVFKNFSQVKECEEVSAPSFVKPEFYLFGRSPG





MVERTRRVKLEVEPHYLIIAAAGWVLTRLGKAKVSEGDYVGVNVFTPTRGILYSLIQNVNGIVPGIKPE





TAFGLWIARKVVSSVTNPNVSVVRIYTISDAVGQNPTTINGGFSIDLTKLLEKRYLLSERLEAIARNALSI





SSNMRERYIVLANYIYEYLTGSKRLEDLLYFANRDLIMNLNSDDGKVRDLKLISAYVNGELIRGEG 






Some aspects of the disclosure provide Cas9 domains that have different PAM specificities. Typically, Cas9 proteins, such as Cas9 from S. pyogenes (spCas9), require a canonical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability to edit desired bases within a genome. In some embodiments, the base editing fusion proteins provided herein may need to be placed at a precise location, for example where a target base is placed within a 4-base region (e.g., an “editing window”), which is approximately 15 bases upstream of the PAM. See Komor, A. C., et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533, 420-424 (2016), the entire contents of which are hereby incorporated by reference. Accordingly, in some embodiments, any of the fusion proteins provided herein may contain a Cas9 domain that is capable of binding a nucleotide sequence that does not contain a canonical (e.g., NGG) PAM sequence. Cas9 domains that bind to non-canonical PAM sequences have been described in the art and would be apparent to the skilled artisan. For example, Cas9 domains that bind non-canonical PAM sequences have been described in Kleinstiver, B. P., et al., “Engineered CRISPR-Cas9 nucleases with altered PAM specificities” Nature 523, 481-485 (2015); and Kleinstiver, B. P., et al., “Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition” Nature Biotechnology 33, 1293-1298 (2015); the entire contents of each are hereby incorporated by reference.
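The PAM constraint described above can be made concrete with a short Python sketch that asks, for a chosen target base, whether any canonical NGG PAM lies at a suitable distance downstream on the same strand. The function name `pam_sites_for_target` and the default distance range of 13 to 16 bases (a 4-base window centered near 15 bases upstream of the PAM) are assumptions made for illustration, not values fixed by this disclosure, and the DNA string is a toy example.

```python
def pam_sites_for_target(strand, target_index, min_dist=13, max_dist=16):
    """Return start indices of NGG PAMs that place the target in the window.

    strand       -- DNA sequence written 5'->3' (only this strand is scanned)
    target_index -- 0-based index of the base to be edited
    min_dist,
    max_dist     -- allowed distance, in bases, from the target base to the
                    first base of the PAM (assumed window; see lead-in)
    """
    hits = []
    for dist in range(min_dist, max_dist + 1):
        pam_start = target_index + dist
        # An NGG PAM needs three bases: any base followed by "GG".
        if pam_start + 3 <= len(strand) and strand[pam_start + 1:pam_start + 3] == "GG":
            hits.append(pam_start)
    return hits


# Toy strand: target C at index 5, a TGG PAM starting 15 bases downstream.
toy_strand = "AAAAA" + "C" + "A" * 14 + "TGG" + "AAA"
pam_hits = pam_sites_for_target(toy_strand, 5)
no_hits = pam_sites_for_target("A" * 30, 5)
print(pam_hits, no_hits)
```

An empty result for a target of interest is the situation in which a Cas9 domain with a non-canonical PAM specificity becomes useful; the check could be extended to other PAM patterns by replacing the "GG" comparison.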


For example, a napDNAbp domain with altered PAM specificity may be used, such as a domain with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Francisella novicida Cpf1 (SEQ ID NO: 16; D917, E1006, and D1255), which has the following amino acid sequence:










Wild type Francisella novicida Cpf1 



(D917, E1006, and D1255 are bolded and underlined)


(SEQ ID NO: 16)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFEDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






DANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN







Francisella novicida Cpf1 D917A 



(A917, E1006, and D1255 are bolded and underlined)


(SEQ ID NO: 17)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIARGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFEDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






DANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN







Francisella novicida Cpf1 E1006A 



(D917, A1006, and D1255 are bolded and underlined)


(SEQ ID NO: 18)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFADLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






DANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN







Francisella novicida Cpf1 D1255A 



(D917, E1006, and A1255 are bolded and underlined)


(SEQ ID NO: 19)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFEDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






AANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN







Francisella novicida Cpf1 D917A/E1006A 



(A917, A1006, and D1255 are bolded and underlined)


(SEQ ID NO: 20)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIARGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFADLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






DANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN







Francisella novicida Cpf1 D917A/D1255A 



(A917, E1006, and A1255 are bolded and underlined)


(SEQ ID NO: 21)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIARGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFEDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






AANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN







Francisella novicida Cpf1 E1006A/D1255A 



(D917, A1006, and A1255 are bolded and underlined) 


(SEQ ID NO: 22)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFADLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






AANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN







Francisella novicida Cpf1 D917A/E1006A/D1255A 



(A917, A1006, and A1255 are bolded and underlined)


(SEQ ID NO: 23)



MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVCIS






EDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILW





LKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPKFLE





NKAKYESLKDKAPEAINYEQIKKDLAELLTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITKFNT





IIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVTTMQ





SFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEYITQ





QIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQ





NKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEHFYL





VFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYLGV





MNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKNG





SPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENISES





YIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK





ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND





VHILSIARGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKE





MKEGYLSQVVHEIAKLVIEYNAIVVFADLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDK





TGGVLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICY





NLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEY





GHGECIKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDA






AANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN
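The variant names used for the Francisella novicida Cpf1 sequences above (e.g., D917A/E1006A) follow the usual convention of wild-type residue, 1-based position, and replacement residue. As an illustration only, the following Python sketch shows how such substitutions could be applied to a parent sequence while checking that the expected wild-type residue is present. The function name `apply_substitutions` and the ten-residue example string are hypothetical and are not part of the disclosure.

```python
import re


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each substitution (e.g. "D917A") applied.

    Positions are 1-based. A ValueError is raised if a substitution is
    malformed, out of range, or names a wild-type residue that does not
    match the residue actually present at that position.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Illustrative use on a short stand-in string (not a full Cpf1 sequence).
example = apply_substitutions("MSIYQEFVNK", "S2A/Q5A".split("/"))
```

A double or triple variant name can be split on "/" and passed as a list, as in the last line of the sketch.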






An additional napDNAbp domain with altered PAM specificity, such as a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 519):


(SEQ ID NO: 519)



MKYKIGLDIGITSIGWAVINLDIPRIEDLGVRIFDRAENPKTGESLALPRRLARSARRRLRRRK






HRLERIRRLFVREGILTKEELNKLFEKKHEIDVWQLRVEALDRKLNNDELARILLHLAKRRGF





RSNRKSERTNKENSTMLKHIEENQSILSSYRTVAEMVVKDPKFSLHKRNKEDNYTNTVARDD





LEREIKLIFAKQREYGNIVCTEAFEHEYISIWASQRPFASKDDIEKKVGFCTFEPKEKRAPKAT





YTFQSFTVWEHINKLRLVSPGGIRALTDDERRLIYKQAFHKNKITFHDVRTLLNLPDDTRFKG





LLYDRNTTLKENEKVRFLELGAYHKIRKAIDSVYGKGAAKSFRPIDFDTFGYALTMFKDDTDI





RSYLRNEYEQNGKRMENLADKVYDEELIEELLNLSFSKFGHLSLKALRNILPYMEQGEVYST





ACERAGYTFTGPKKKQKTVLLPNIPPIANPVVMRALTQARKVVNAIIKKYGSPVSIHIELAREL





SQSFDERRKMQKEQEGNRKKNETAIRQLVEYGLTLNPTGLDIVKFKLWSEQNGKCAYSLQPI





EIERLLEPGYTEVDHVIPYSRSLDDSYTNKVLVLTKENREKGNRTPAEYLGLGSERWQQFETF





VLTNKQFSKKKRDRLLRLHYDENEENEFKNRNLNDTRYISRFLANFIREHLKFADSDDKQKV





YTVNGRITAHLRSRWNFNKNREESNLHHAVDAAIVACTTPSDIARVTAFYQRREQNKELSKK





TDPQFPQPWPHFADELQARLSKNPKESIKALNLGNYDNEKLESLQPVFVSRMPKRSITGAAH





QETLRRYIGIDERSGKIQTVVKKKLSEIQLDKTGHFPMYGKESDPRTYEAIRQRLLEHNNDPK





KAFQEPLYKPKKNGELGPIIRTIKIIDTTNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPI





YTIDMMKGILPNKAIEPNKPYSEWKEMTEDYTFRFSLYPNDLIRIEFPREKTIKTAVGEEIKIKD





LFAYYQTIDSSNGGLSLVSHDNNFSLRSIGSRTLKRFEKYQVDVLGNIYKVRGEKRVGVASSS





HSKAGETIRPL 






In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an Argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is an ssDNA-guided endonuclease. NgAgo binds 5′ phosphorylated ssDNA of ~24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM). Use of a nuclease-inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., 34(7): 768-73 (2016), PubMed PMID: 27136078; Swarts et al., Nature, 507(7491): 258-61 (2014); and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference. The sequence of Natronobacterium gregoryi Argonaute is provided in SEQ ID NO: 24.


The disclosed fusion proteins may comprise a napDNAbp domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Natronobacterium gregoryi Argonaute (SEQ ID NO: 24), which has the following amino acid sequence:











(SEQ ID NO: 24)



MTVIDLDSTTTADELTSGHTYDISVTLTGVYDNTD






EQHPRMSLAFEQDNGERRYITLWKNTTPKDVFTYD






YATGSTYIFTNIDYEVKDGYENLTATYQTTVENAT






AQEVGTTDEDETFAGGEPLDHHLDDALNETPDDAE






TESDSGHVMTSFASRDQLPEWTLHTYTLTATDGAK






TDTEYARRTLAYTVRQELYTDHDAAPVATDGLMLL






TPEPLGETPLDLDCGVRVEADETRTLDYTTAKDRL






LARELVEEGLKRSLWDDYLVRGIDEVLSKEPVLTC






DEFDLHERYDLSVEVGHSGRAYLHINFRHRFVPKL






TLADIDDDNIYPGLRVKTTYRPRRGHIVWGLRDEC






ATDSLNTLGNQSVVAYHRNNQTPINTDLLDAIEAA






DRRVVETRRQGHGDDAVSFPQELLAVEPNTHQIKQ






FASDGFHQQARSKTRLSASRCSEKAQAFAERLDPV






RLNGSTVEFSSEFFTGNNEQQLRLLYENGESVLTF






RDGARGAHPDETFSKGIVNPPESFEVAVVLPEQQA






DTCKAQWDTMADLLNQAGAPPTRSETVQYDAFSSP






ESISLNVAGAIDPSEVDAAFVVLPPDQEGFADLAS






PTETYDELKKALANMGIYSQMAYFDRFRDAKIFYT






RNVALGLLAAAGGVAFTTEHAMPGDADMFIGIDVS






RSYPEDGASGQINIAATATAVYKDGTILGHSSTRP






QLGEKLQSTDVRDIMKNAILGYQQVTGESPTHIVI






HRDGFMNEDLDPATEFLNEQGVEYDIVEIRKQPQT






RLLAVSDVQYDTPVKSIAAINQNEPRATVATFGAP






EYLATRDGGGLPRPIQIERVAGETDIETLTRQVYL






LSQSHIQVHNSTARLPITTAYADQASTHATKGYLV






QTGAFESNVGFL






Cas9 variant with decreased electrostatic interactions between the Cas9 and DNA backbone



(SEQ ID NO: 25)



DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLG






NTDRHSIKKNLIGALLFDSGETALATRLKRTARRR






YTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL






VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKK






LVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP






DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAI






LSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSL






GLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQ






IGDQYADLFLAAKNLSDAILLSDILRVNTEITKAP






LSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIF






FDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGT






EELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHA






ILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLA






RGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSF






IERMTAFDKNLPNEKVLPKHSLLYEYFTVYNELTK






VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTV






KQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHD






LLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREM






IEERLKTYAHLFDDKVMKQLKRRRYTGWGALSRKL






INGIRDKQSGKTILDFLKSDGFANRNFMALIHDDS






LTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKG






ILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQ






KGQKNSRERMKRIEEGIKELGSQILKEHPVENTQL






QNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHI






VPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVV






KKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSEL






DKAGFIKRQLVETRAITKHVAQILDSRMNTKYDEN






DKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY






HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVY






DVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEIT






LANGEIRKRPLIETNGETGEIVWDKGRDFATVRKV






LSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA






RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKK






LKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVK






KDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL






ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQ






HKHYLDEIIEQISEFSKRVILADANLDKVLSAYNK






HRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTI






DRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLG






GD






CasY (ncbi.nlm.nih.gov/protein/APG80656.1)

>APG80656.1 CRISPR-associated protein CasY [uncultured Parcubacteria group bacterium]



(SEQ ID NO: 26)



MSKRHPRISGVKGYRLHAQRLEYTGKSGAMRTIKY






PLYSSPSGGRTVPREIVSAINDDYVGLYGLSNFDD






LYNAEKRNEEKVYSVLDFWYDCVQYGAVFSYTAPG






LLKNVAEVRGGSYELTKTLKGSHLYDELQIDKVIK






FLNKKEISRANGSLDKLKKDIIDCFKAEYRERHKD






QCNKLADDIKNAKKDAGASLGERQKKLFRDFFGIS






EQSENDKPSFTNPLNLTCCLLPFDTVNNNRNRGEV






LFNKLKEYAQKLDKNEGSLEMWEYIGIGNSGTAFS






NFLGEGFLGRLRENKITELKKAMMDITDAWRGQEQ






EEELEKRLRILAALTIKLREPKFDNHWGGYRSDIN






GKLSSWLQNYINQTVKIKEDLKGHKKDLKKAKEMI






NRFGESDTKEEAVVSSLLESIEKIVPDDSADDEKP






DIPAIAIYRRFLSDGRLTLNRFVQREDVQEALIKE






RLEAEKKKKPKKRKKKSDAEDEKETIDFKELFPHL






AKPLKLVPNFYGDSKRELYKKYKNAAIYTDALWKA






VEKIYKSAFSSSLKNSFFDTDFDKDFFIKRLQKIF






SVYRRFNTDKWKPIVKNSFAPYCDIVSLAENEVLY






KPKQSRSRKSAAIDKNRVRLPSTENIAKAGIALAR






ELSVAGFDWKDLLKKEEHEEYIDLIELHKTALALL






LAVTETQLDISALDFVENGTVKDFMKTRDGNLVLE






GRFLEMFSQSIVFSELRGLAGLMSRKEFITRSAIQ






TMNGKQAELLYIPHEFQSAKITTPKEMSRAFLDLA






PAEFATSLEPESLSEKSLLKLKQMRYYPHYFGYEL






TRTGQGIDGGVAENALRLEKSPVKKREIKCKQYKT






LGRGQNKIVLYVRSSYYQTQFLEWFLHRPKNVQTD






VAVSGSFLIDEKKVKTRWNYDALTVALEPVSGSER






VFVSQPFTIFPEKSAEEEGQRYLGIDIGEYGIAYT






ALEITGDSAKILDQNFISDPQLKTLREEVKGLKLD






QRRGTFAMPSTKIARIRESLVHSLRNRIHHLALKH






KAKIVYELEVSRFEEGKQKIKKVYATLKKADVYSE






IDADKNLQTTVWGKLAVASEISASYTSQFCGACKK






LWRAEMQVDETITTQELIGTVRVIKGGTLIDAIKD






FMRPPIFDENDTPFPKYRDFCDKHHISKKMRGNSC






LFICPFCRANADADIQASQTIALLRYVKEEKKVED






YFERFRKLKNIKVLGQMKKI






High-fidelity Cas9 domain



(SEQ ID NO: 394)



DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLG






NTDRHSIKKNLIGALLFDSGETAEATRLKRTARRR






YTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL






VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKK






LVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP






DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAI






LSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSL






GLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQ






IGDQYADLFLAAKNLSDAILLSDILRVNTEITKAP






LSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIF






FDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGT






EELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHA






ILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLA






RGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSF






IERMTAFDKNLPNEKVLPKHSLLYEYFTVYNELTK






VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTV






KQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHD






LLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREM






IEERLKTYAHLFDDKVMKQLKRRRYTGWGALSRKL






INGIRDKQSGKTILDFLKSDGFANRNFMALIHDDS






LTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKG






ILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQ






KGQKNSRERMKRIEEGIKELGSQILKEHPVENTQL






QNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHI






VPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVV






KKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSEL






DKAGFIKRQLVETRAITKHVAQILDSRMNTKYDEN






DKLIREVKVITLKSKLVSDFRKDFQFYKVREINNY






HHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVY






DVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEIT






LANGEIRKRPLIETNGETGEIVWDKGRDFATVRKV






LSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA






RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKK






LKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVK






KDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL






ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQ






HKHYLDEIIEQISEFSKRVILADANLDKVLSAYNK






HRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTI






DRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLG






GD






C2c1 (uniprot.org/uniprot/T0D7A2#)

>sp|T0D7A2|C2C1_ALIAG CRISPR-associated endonuclease C2c1 OS = Alicyclobacillus acidoterrestris (strain ATCC 49025/DSM 3922/CIP 106132/NCIMB 13137/GD3B) GN = c2c1 PE = 1 SV = 1



(SEQ ID NO: 395)



MAVKSIKVKLRLDDMPEIRAGLWKLHKEVNAGVRY






YTEWLSLLRQENLYRRSPNGDGEQECDKTAEECKA






ELLERLRARQVENGHRGPAGSDDELLQLARQLYEL






LVPQAIGAKGDAQQIARKFLSPLADKDAVGGLGIA






KAGNKPRWVRMREAGEPGWEEEKEKAETRKSADRT






ADVLRALADFGLKPLMRVYTDSEMSSVEWKPLRKG






QAVRTWDRDMFQQAIERMMSWESWNQRVGQEYAKL






VEQKNRFEQKNFVGQEHLVHLVNQLQQDMKEASPG






LESKEQTAHYVTGRALRGSDKVFEKWGKLAPDAPF






DLYDAEIKNVQRRNTRRFGSHDLFAKLAEPEYQAL






WREDASFLTRYAVYNSILRKLNHAKMFATFTLPDA






TAHPIWTRFDKLGGNLHQYTFLFNEFGERRHAIRF






HKLLKVENGVAREVDDVTVPISMSEQLDNLLPRDP






NEPIALYFRDYGAEQHFTGEFGGAKIQCRRDQLAH






MHRRRGARDVYLNVSVRVQSQSEARGERRPPYAAV






FRLVGDNHRAFVHFDKLSDYLAEHPDDGKLGSEGL






LSGLRVMSVDLGLRTSASISVFRVARKDELKPNSK






GRVPFFFPIKGNDNLVAVHERSQLLKLPGETESKD






LRAIREERQRTLRQLRTQLAYLRLLVRCGSEDVGR






RERSWAKLIEQPVDAANHMTPDWREAFENELQKLK






SLHGICSDKEWMDAVYESVRRVWRHMGKQVRDWRK






DVRSGERPKIRGYAKDVVGGNSIEQIEYLERQYKF






LKSWSFFGKVSGQVIRAEKGSRFAITLREHIDHAK






EDRLKKLADRIIMEALGYVYALDERGKGKWVAKYP






PCQLILLEELSEYQFNNDRPPSENNQLMQWSHRGV






FQELINQAQVHDLLVGTMYAAFSSRFDARTGAPGI






RCRRVPARCTQEHNPEPFPWWLNKFVVEHTLDACP






LRADDLIPTGEGEIFVSPFSAEEGDFHQIHADLNA






AQNLQQRLWSDFDISQIRLRCDWGEVDGELVLIPR






LTGKRTADSYSNKVFYTNTGVTYYERERGKKRRKV






FAQEKLSEEEAELLVEADEAREKSVVLMRDPSGII






NRGNWTRQKEFWSMVNQRIEGYLVKQIRSRVPLQD






SACENTGDI






C2c2 (uniprot.org/uniprot/P0DOC6)

>sp|P0DOC6|C2C2_LEPSD CRISPR-associated endoribonuclease C2c2 OS = Leptotrichia shahii (strain DSM 19757/CCUG 47503/CIP 107916/JCM 16776/LB37) GN = c2c2 PE = 1 SV = 1



(SEQ ID NO: 396)



MGNLFGHKRWYEVRDKKDFKIKRKVKVKRNYDGNK






YILNINENNNKEKIDNNKFIRKYINYKKNDNILKE






FTRKFHAGNILFKLKGKEGIIRIENNDDFLETEEV






VLYIEAYGKSEKLKALGITKKKIIDEAIRQGITKD






DKKIEIKRQENEEEIEIDIRDEYTNKTLNDCSIIL






RIIENDELETKKSIYEIFKNINMSLYKIIEKIIEN






ETEKVFENRYYEEHLREKLLKDDKIDVILTNFMEI






REKIKSNLEILGFVKFYLNVGGDKKKSKNKKMLVE






KILNINVDLTVEDIADFVIKELEFWNITKRIEKVK






KVNNEFLEKRRNRTYIKSYVLLDKHEKFKIERENK






KDKIVKFFVENIKNNSIKEKIEKILAEFKIDELIK






KLEKELKKGNCDTEIFGIFKKHYKVNFDSKKFSKK






SDEEKELYKIIYRYLKGRIEKILVNEQKVRLKKME






KIEIEKILNESILSEKILKRVKQYTLEHIMYLGKL






RHNDIDMTTVNTDDFSRLHAKEELDLELITFFAST






NMELNKIFSRENINNDENIDFFGGDREKNYVLDKK






ILNSKIKIIRDLDFIDNKNNITNNFIRKFTKIGTN






ERNRILHAISKERDLQGTQDDYNKVINIIQNLKIS






DEEVSKALNLDVVFKDKKNIITKINDIKISEENNN






DIKYLPSFSKVLPEILNLYRNNPKNEPFDTIETEK






IVLNALIYVNKELYKKLILEDDLEENESKNIFLQE






LKKTLGNIDEIDENIIENYYKNAQISASKGNNKAI






KKYQKKVIECYIGYLRKNYEELFDFSDFKMNIQEI






KKQIKDINDNKTYERITVKTSDKTIVINDDFEYII






SIFALLNSNAVINKIRNRFFATSVWLNTSEYQNII






DILDEIMQLNTLRNECITENWNLNLEEFIQKMKEI






EKDFDDFKIQTKKEIFNNYYEDIKNNILTEFKDDI






NGCDVLEKKLEKIVIFDDETKFEIDKKSNILQDEQ






RKLSNINKKDLKKKVDQYIKDKDQEIKSKILCRII






FNSDFLKKYKKEIDNLIEDMESENENKFQEIYYPK






ERKNELYIYKKNLFLNIGNPNFDKIYGLISNDIKM






ADAKFLFNIDGKNIRKNKISEIDAILKNLNDKLNG






YSKEYKEKYIKKLKENDDFFAKNIQNKNYKSFEKD






YNRVSEYKKIRDLVEFNYLNKIESYLIDINWKLAI






QMARFERDMHYIVNGLRELGIIKLSGYNTGISRAY






PKRNGSDGFYTTTAYYKFFDEESYKKFEKICYGFG






IDLSENSEINKPENESIRNYISHFYIVRNPFADYS






IAEQIDRVSNLLSYSTRYNNSTYASVFEVFKKDVN






LDYDELKKKFKLIGNNDILERLMKPKKVSVLELES






YNSDYIKNLIIELLTKIENTNDTL






C2c3, translated from >CEPX01008730.1 marine metagenome genome assembly TARA_037_MES_0.1-0.22_contig TARA_037_MES_0.1-0.22_scaffold22115_1, whole genome shotgun sequence.



(SEQ ID NO: 397)



MRSNYHGGRNARQWRKQISGLARRTKETVFTYKFP






LETDAAEIDFDKAVQTYGIAEGVGHGSLIGLVCAF






HLSGFRLFSKAGEAMAFRNRSRYPTDAFAEKLSAI






MGIQLPTLSPEGLDLIFQSPPRSRDGIAPVWSENE






VRNRLYTNWTGRGPANKPDEHLLEIAGEIAKQVFP






KFGGWDDLASDPDKALAAADKYFQSQGDFPSIASL






PAAIMLSPANSTVDFEGDYIAIDPAAETLLHQAVS






RCAARLGRERPDLDQNKGPFVSSLQDALVSSQNNG






LSWLFGVGFQHWKEKSPKELIDEYKVPADQHGAVT






QVKSFVDAIPLNPLFDTTHYGEFRASVAGKVRSWV






ANYWKRLLDLKSLLATTEFTLPESISDPKAVSLFS






GLLVDPQGLKKVADSLPARLVSAEEAIDRLMGVGI






PTAADIAQVERVADEIGAFIGQVQQFNNQVKQKLE






NLQDADDEEFLKGLKIELPSGDKEPPAINRISGGA






PDAAAEISELEEKLQRLLDARSEHFQTISEWAEEN






AVTLDPIAAMVELERLRLAERGATGDPEEYALRLL






LQRIGRLANRVSPVSAGSIRELLKPVFMEEREFNL






FFHNRLGSLYRSPYSTSRHQPFSIDVGKAKAIDWI






AGLDQISSDIEKALSGAGEALGDQLRDWINLAGFA






ISQRLRGLPDTVPNALAQVRCPDDVRIPPLLAMLL






EEDDIARDVCLKAFNLYVSAINGCLFGALREGFIV






RTRFQRIGTDQIHYVPKDKAWEYPDRLNTAKGPIN






AAVSSDWIEKDGAVIKPVETVRNLSSTGFAGAGVS






EYLVQAPHDWYTPLDLRDVAHLVTGLPVEKNITKL






KRLTNRTAFRMVGASSFKTHLDSVLLSDKIKLGDF






TIIIDQHYRQSVTYGGKVKISYEPERLQVEAAVPV






VDTRDRTVPEPDTLFDHIVAIDLGERSVGFAVFDI






KSCLRTGEVKPIHDNNGNPVVGTVAVPSIRRLMKA






VRSHRRRRQPNQKVNQTYSTALQNYRENVIGDVCN






RIDTLMERYNAFPVLEFQIKNFQAGAKQLEIVYGS







S. canis (ScCas9)




(SEQ ID NO: 520)



MEKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVL






GNTNRKSIKKNLMGALLFDSGETAEATRLKRTARR






RYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEESF






LVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRK






KLADSPEKADLRLIYLALAHIIKFRGHFLIEGKLN






AENSDVAKLFYQLIQTYNQLFEESPLDEIEVDAKG






ILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALA






LGLTPNFKSNFDLTEDAKLQLSKDTYDDDLDELLG






QIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKA






PLSASMVKRYDEHHQDLALLKTLVRQQFPEKYAEI






FKDDTKNGYAGYVGIGIKHRKRTTKLATQEEFYKF






IKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSI






PHQIHLKELHAILRRQEEFYPFLKENREKIEKILT






FRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEE






VVDKGASAQSFIERMTNFDEQLPNKKVLPKHSLLY






EYFTVYNELTKVKYVTERMRKPEFLSGEQKKAIVD






LLFKTNRKVTVKQLKEDYFKKIECFDSVEIIGVED






RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIV






LTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRH






YTGWGRLSRKMINGIRDKQSGKTILDFLKSDGFSN






RNFMQLIHDDSLTFKEEIEKAQVSGQGDSLHEQIA






DLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVI






EMARENQTTTKGLQQSRERKKRIEEGIKELESQIL






KENPVENTQLQNEKLYLYYLQNGRDMYVDQELDIN






RLSDYDVDHIVPQSFIKDDSIDNKVLTRSVENRGK






SDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLT






KAERGGLSEADKAGFIKRQLVETRQITKHVARILD






SRMNTKRDKNDKPIREVKVITLKSKLVSDFRKDFQ






LYKVRDINNYHHAHDAYLNAVVGTALIKKYPKLES






EFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSN






IMNFFKTEVKLANGEIRKRPLIETNGETGEVVWNK






EKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESIL






SKRESAKLIPRKKGWDTRKYGGFGSPTVAYSILVV






AKVEKGKAKKLKSVKVLVGITIMEKGSYEKDPIGF






LEAKGYKDIKKELIFKLPKYSLFELENGRRRMLAS






ATELQKANELVLPQHLVRLLYYTQNISATTGSNNL






GYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLK






SSFDEQFAVSDSILLSNSFVSLLKYTSFGASGGFT






FLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYET






RTDLSQLGGD






In some embodiments, the base editors described herein can include any Cas9 equivalent. As used herein, the term “Cas9 equivalent” is a broad term that encompasses any napDNAbp protein that serves the same function as Cas9 in the present base editors, even though its amino acid primary sequence and/or its three-dimensional structure may be different from and/or evolutionarily unrelated to those of Cas9. Thus, while Cas9 equivalents include any Cas9 ortholog, homolog, mutant, or variant described or embraced herein that is evolutionarily related, Cas9 equivalents also embrace proteins that may have evolved through convergent evolution to have the same or similar function as Cas9, but which do not necessarily have any similarity to Cas9 with regard to amino acid sequence and/or three-dimensional structure. The base editors described herein embrace any Cas9 equivalent that would provide the same or similar function as Cas9, even if the Cas9 equivalent is based on a protein that arose through convergent evolution.


For example, CasX is a Cas9 equivalent that reportedly has the same function as Cas9 but which evolved through convergent evolution. Thus, the CasX protein described in Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223, is contemplated to be used with the base editors described herein. In addition, any variant or modification of CasX is contemplated and within the scope of the present disclosure.


Cas9 is a bacterial enzyme that evolved in a wide variety of species. However, the Cas9 equivalents contemplated herein may also be obtained from archaea, which constitute a domain and kingdom of single-celled prokaryotic microbes different from bacteria.


In some embodiments, Cas9 equivalents may refer to CasX or CasY, which have been described in, for example, Burstein et al., “New CRISPR-Cas systems from uncultivated microbes.” Cell Res. 2017 Feb. 21. doi: 10.1038/cr.2017.21, the entire contents of which are hereby incorporated by reference. Using genome-resolved metagenomics, a number of CRISPR-Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems were discovered, CRISPR-CasX and CRISPR-CasY, which are among the most compact systems yet discovered. In some embodiments, Cas9 refers to CasX, or a variant of CasX. In some embodiments, Cas9 refers to CasY, or a variant of CasY. It should be appreciated that other RNA-guided DNA binding proteins may be used as a nucleic acid programmable DNA binding protein (napDNAbp), and are within the scope of this disclosure. See also Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223. Any of these Cas9 equivalents are contemplated.


In some embodiments, the Cas9 equivalent comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring CasX or CasY protein. In some embodiments, the napDNAbp is a naturally-occurring CasX or CasY protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas moiety or any Cas moiety provided herein.
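The percent-identity thresholds recited above can be illustrated with a short calculation. The following Python sketch is only one possible convention and is not a method prescribed by this disclosure: it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it divides the number of identical columns by the number of alignment columns in which at least one sequence has a residue. The function names and the ten-residue example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as "-". Columns where both sequences have a gap are
    ignored; a column with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Illustrative use on short stand-in strings (one mismatch in ten columns).
identity = percent_identity("MKYKIGLDIG", "MKYRIGLDIG")
passes_85 = meets_identity_threshold("MKYKIGLDIG", "MKYRIGLDIG", 85.0)
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same pair, so the convention should be stated whenever a threshold is applied.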


In various embodiments, the nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), CasX, CasY, Cpf1, C2c1, C2c2, C2c3, Argonaute, Cas12a, and Cas12b. One example of a nucleic acid programmable DNA-binding protein that has different PAM specificity than Cas9 is Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 (Cpf1). Similar to Cas9, Cpf1 is also a class 2 CRISPR effector. It has been shown that Cpf1 mediates robust DNA interference with features distinct from Cas9. Cpf1 is a single RNA-guided endonuclease lacking tracrRNA, and it utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN). Moreover, Cpf1 cleaves DNA via a staggered DNA double-stranded break. Out of 16 Cpf1-family proteins, two enzymes from Acidaminococcus and Lachnospiraceae are shown to have efficient genome-editing activity in human cells. Cpf1 proteins are known in the art and have been described previously, for example, in Yamano et al., “Crystal structure of Cpf1 in complex with guide RNA and target DNA.” Cell (165) 2016, p. 949-962; the entire contents of which are hereby incorporated by reference. The state of the art may also now refer to Cpf1 enzymes as Cas12a.
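As an illustration of the T-rich PAM motifs mentioned above (TTN, TTTN, or YTN), the following Python sketch locates candidate PAM positions on one strand of a DNA string. It is a simplified example under stated assumptions: only the IUPAC codes N (any base) and Y (C or T) are expanded, only the strand supplied is scanned, and no protospacer or guide RNA logic is modeled. The function names and the example DNA string are hypothetical.

```python
import re

# Minimal IUPAC expansion; only the codes needed for the motifs above.
IUPAC = {
    "A": "A",
    "C": "C",
    "G": "G",
    "T": "T",
    "N": "[ACGT]",
    "Y": "[CT]",
}


def pam_to_regex(pam):
    """Translate a PAM written with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(dna, pam):
    """Return 0-based start indices of every (possibly overlapping) PAM match."""
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile(f"(?=({pam_to_regex(pam)}))")
    return [match.start() for match in pattern.finditer(dna.upper())]


# Illustrative use on a short stand-in string.
dna = "GATTTACCTG"
sites = {pam: find_pam_sites(dna, pam) for pam in ("TTN", "TTTN", "YTN", "NGG")}
```

In this example the canonical Cas9 PAM (NGG) has no match on the supplied strand, while each of the T-rich motifs has at least one.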


In still other embodiments, the Cas protein may include any CRISPR associated protein, including but not limited to, Cas12a, Cas12b, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof, and preferably comprising a nickase mutation (e.g., a mutation corresponding to the D10A mutation of the wild type Cas9 polypeptide of SEQ ID NO: 1).


In various other embodiments, the napDNAbp can be any of the following proteins: a Cas9, a Cpf1, a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a, a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, a circularly permuted Cas9, or an Argonaute (Ago) domain, or a variant thereof.


Exemplary Cas9 equivalent protein sequences can include the following:













Description
Sequence







AsCas12a (previously known as Cpf1), Acidaminococcus sp. (strain BV3L6), UniProtKB U2UMQ6

MTQFEGFTNLYQVSKTLRFELIPQG
KTLKHIQEQGFIEEDKARNDHYKEL
KPIIDRIYKTYADQCLQLVQLDWEN
LSAAIDSYRKEKTEETRNALIEEQA
TYRNAIHDYFIGRTDNLTDAINKRH
AEIYKGLFKAELFNGKVLKQLGTVT
TTEHENALLRSFDKFTTYFSGFYEN
RKNVFSAEDISTAIPHRIVQDNFPK
FKENCHIFTRLITAVPSLREHFENV



KKAIGIFVSTSIEEVFSFPFYNQLL



TQTQIDLYNQLLGGISREAGTEKIK



GLNEVLNLAIQKNDETAHIIASLPH



RFIPLFKQILSDRNTLSFILEEFKS



DEEVIQSFCKYKTLLRNENVLETAE



ALFNELNSIDLTHIFISHKKLETIS



SALCDHWDTLRNALYERRISELTGK



ITKSAKEKVRQRSLKHEDINLQEII



SAAGKELSEAFKQKTSEILSHAHAA



LDQPLPTTLKKQEEKEILKSQLDSL



LGLYHLLDWFAVDESNEVDPEFSAR



LTGIKLEMEPSLSFYNKARNYATKK



PYSVEKFKLNFQMPTLASGWDVNKE



KNNGAILFVKNGLYYLGIMPKQKGR



YKALSFEPTEKTSEGFDKMYYDYFP



DAAKMIPKCSTQLKAVTAHFQTHTT



PILLSNNFIEPLEITKEIYDLNNPE



KEPKKFQTAYAKKTGDQKGYREALC



KWIDFTRDFLSKYTKTTSIDLSSLR



PSSQYKDLGEYYAELNPLLYHISFQ



RIAEKEIMDAVETGKLYLFQIYNKD



FAKGHHGKPNLHTLYWTGLFSPENL



AKTSIKLNGQAELFYRPKSRMKRMA



HRLGEKMLNKKLKDQKTPIPDTLYQ



ELYDYVNHRLSHDLSDEARALLPNV



ITKEVSHEIIKDRRFTSDKFFFHVP



ITLNYQAANSPSKFNQRVNAYLKEH



PETPIIGIDRGERNLIYITVIDSTG



KILEQRSLNTIQQFDYQKKLDNREK



ERVAARQAWSVVGTIKDLKQGYLSQ



VIHEIVDLMIHYQAVVVLENLNFGF



KSKRTGIAEKAVYQQFEKMLIDKLN



CLVLKDYPAEKVGGVLNPYQLTDQF



TSFAKMGTQSGFLFYVPAPYTSKID



PLTGFVDPFVWKTIKNHESRKHFLE



GFDFLHYDVKTGDFILHFKMNRNLS



FQRGLPGFMPAWDIVFEKNETQFDA



KGTPFIAGKRIVPVIENHRFTGRYR



DLYPANELIALLEEKGIVFRDGSNI



LPKLLENDDSHAIDTMVALIRSVLQ



MRNSNAATGEDYINSPVRDLNGVCF



DSRFQNPEWPMDADANGAYHIALKG



QLLLNHLKESKDLKLQNGISNQDWL



AYIQELRN (SEQ ID NO: 120)





AsCas12a nickase (e.g., R1226A)

MTQFEGFTNLYQVSKTLRFELIPQG
KTLKHIQEQGFIEEDKARNDHYKEL
KPIIDRIYKTYADQCLQLVQLDWEN
LSAAIDSYRKEKTEETRNALIEEQA



TYRNAIHDYFIGRTDNLTDAINKRH



AEIYKGLFKAELFNGKVLKQLGTVT



TTEHENALLRSFDKFTTYFSGFYEN



RKNVFSAEDISTAIPHRIVQDNFPK



FKENCHIFTRLITAVPSLREHFENV



KKAIGIFVSTSIEEVFSFPFYNQLL



TQTQIDLYNQLLGGISREAGTEKIK



GLNEVLNLAIQKNDETAHIIASLPH



RFIPLFKQILSDRNTLSFILEEFKS



DEEVIQSFCKYKTLLRNENVLETAE



ALFNELNSIDLTHIFISHKKLETIS



SALCDHWDTLRNALYERRISELTGK



ITKSAKEKVRQRSLKHEDINLQEII



SAAGKELSEAFKQKTSEILSHAHAA



LDQPLPTTLKKQEEKEILKSQLDSL



LGLYHLLDWFAVDESNEVDPEFSAR



LTGIKLEMEPSLSFYNKARNYATKK



PYSVEKFKLNFQMPTLASGWDVNKE



KNNGAILFVKNGLYYLGIMPKQKGR



YKALSFEPTEKTSEGFDKMYYDYFP



DAAKMIPKCSTQLKAVTAHFQTHTT



PILLSNNFIEPLEITKEIYDLNNPE



KEPKKFQTAYAKKTGDQKGYREALC



KWIDFTRDFLSKYTKTTSIDLSSLR



PSSQYKDLGEYYAELNPLLYHISFQ



RIAEKEIMDAVETGKLYLFQIYNKD



FAKGHHGKPNLHTLYWTGLFSPENL



AKTSIKLNGQAELFYRPKSRMKRMA



HRLGEKMLNKKLKDQKTPIPDTLYQ



ELYDYVNHRLSHDLSDEARALLPNV



ITKEVSHEIIKDRRFTSDKFFFHVP



ITLNYQAANSPSKFNQRVNAYLKEH



PETPIIGIDRGERNLIYITVIDSTG



KILEQRSLNTIQQFDYQKKLDNREK



ERVAARQAWSVVGTIKDLKQGYLSQ



VIHEIVDLMIHYQAVVVLENLNFGF



KSKRTGIAEKAVYQQFEKMLIDKLN



CLVLKDYPAEKVGGVLNPYQLTDQF



TSFAKMGTQSGFLFYVPAPYTSKID



PLTGFVDPFVWKTIKNHESRKHFLE



GFDFLHYDVKTGDFILHFKMNRNLS



FQRGLPGFMPAWDIVFEKNETQFDA



KGTPFIAGKRIVPVIENHRFTGRYR



DLYPANELIALLEEKGIVFRDGSNI



LPKLLENDDSHAIDTMVALIRSVLQ



MANSNAATGEDYINSPVRDLNGVCF



DSRFQNPEWPMDADANGAYHIALKG



QLLLNHLKESKDLKLQNGISNQDWL



AYIQELRN (SEQ ID NO: 121)





LbCas12a
MNYKTGLEDFIGKESLSKTLRNALI


(previously
PTESTKIHMEEMGVIRDDELRAEKQ


known as
QELKEIMDDYYRTFIEEKLGQIQGI


Cpf1)
QWNSLFQKMEETMEDISVRKDLDKI



Lachnospiraceae

QNEKRKEICCYFTSDKRFKDLFNAK



bacterium

LITDILPNFIKDNKEYTEEEKAEKE


GAM79
QTRVLFQRFATAFTNYFNQRRNNFS


Ref Seq.
EDNISTAISFRIVNENSEIHLQNMR


WP_
AFQRIEQQYPEEVCGMEEEYKDMLQ


119623382.1
EWQMKHIYSVDFYDRELTQPGIEYY



NGICGKINEHMNQFCQKNRINKNDF



RMKKLHKQILCKKSSYYEIPFRFES



DQEVYDALNEFIKTMKKKEIIRRCV



HLGQECDDYDLGKIYISSNKYEQIS



NALYGSWDTIRKCIKEEYMDALPGK



GEKKEEKAEAAAKKEEYRSIADIDK



IISLYGSEMDRTISAKKCITEICDM



AGQISIDPLVCNSDIKLLQNKEKTT



EIKTILDSFLHVYQWGQTFIVSDII



EKDSYFYSELEDVLEDFEGITTLYN



HVRSYVTQKPYSTVKFKLHFGSPTL



ANGWSQSKEYDNNAILLMRDQKFYL



GIFNVRNKPDKQIIKGHEKEEKGDY



KKMIYNLLPGPSKMLPKVFITSRSG



QETYKPSKHILDGYNEKRHIKSSPK



FDLGYCWDLIDYYKECIHKHPDWKN



YDFHFSDTKDYEDISGFYREVEMQG



YQIKWTYISADEIQKLDEKGQIFLF



QIYNKDFSVHSTGKDNLHTMYLKNL



FSEENLKDIVLKLNGEAELFFRKAS



IKTPIVHKKGSVLVNRSYTQTVGNK



EIRVSIPEEYYTEIYNYLNHIGKGK



LSSEAQRYLDEGKIKSFTATKDIVK



NYRYCCDHYFLHLPITINFKAKSDV



AVNERTLAYIAKKEDIHIIGIDRGE



RNLLYISVVDVHGNIREQRSFNIVN



GYDYQQKLKDREKSRDAARKNWEEI



EKIKELKEGYLSMVIHYIAQLVVKY



NAVVAMEDLNYGFKTGRFKVERQVY



QKFETMLIEKLHYLVFKDREVCEEG



GVLRGYQLTYIPESLKKVGKQCGFI



FYVPAGYTSKIDPTTGFVNLFSFKN



LTNRESRQDFVGKFDEIRYDRDKKM



FEFSFDYNNYIKKGTILASTKWKVY



TNGTRLKRIVVNGKYTSQSMEVELT



DAMEKMLQRAGIEYHDGKDLKGQIV



EKGIEAEIIDIFRLTVQMRNSRSES



EDREYDRLISPVLNDKGEFFDTATA



DKTLPQDADANGAYCIALKGLYEVK



QIKENWKENEQFPRNKLVQDNKTWF



DFMQKKRYL (SEQ ID NO: 122)





PcCas12a-
MAKNFEDFKRLYSLSKTLRFEAKPI


previously
GATLDNIVKSGLLDEDEHRAASYVK


known as
VKKLIDEYHKVFIDRVLDDGCLPLE


Cpf1
NKGNNNSLAEYYESYVSRAQDEDAK



Prevotella

KKFKEIQQNLRSVIAKKLTEDKAYA



copri

NLFGNKLIESYKDKEDKKKIIDSDL


Ref Seq.
IQFINTAESTQLDSMSQDEAKELVK


WP_
EFWGFVTYFYGFFDNRKNMYTAEEK


119227726.1
STGIAYRLVNENLPKFIDNIEAFNR



AITRPEIQENMGVLYSDFSEYLNVE



SIQEMFQLDYYNMLLTQKQIDVYNA



IIGGKTDDEHDVKIKGINEYINLYN



QQHKDDKLPKLKALFKQILSDRNAI



SWLPEEFNSDQEVLNAIKDCYERLA



ENVLGDKVLKSLLGSLADYSLDGIF



IRNDLQLTDISQKMFGNWGVIQNAI



MQNIKRVAPARKHKESEEDYEKRIA



GIFKKADSFSISYINDCLNEADPNN



AYFVENYFATFGAVNTPTMQRENLF



ALVQNAYTEVAALLHSDYPTVKHLA



QDKANVSKIKALLDAIKSLQHFVKP



LLGKGDESDKDERFYGELASLWAEL



DTVTPLYNMIRNYMTRKPYSQKKIK



LNFENPQLLGGWDANKEKDYATIIL



RRNGLYYLAIMDKDSRKLLGKAMPS



DGECYEKMVYKFFKDVTTMIPKCST



QLKDVQAYFKVNTDDYVLNSKAFNK



PLTITKEVFDLNNVLYGKYKKFQKG



YLTATGDNVGYTHAVNVWIKFCMDF



LNSYDSTCIYDFSSLKPESYLSLDA



FYQDANLLLYKLSFARASVSYINQL



VEEGKMYLFQIYNKDFSEYSKGTPN



MHTLYWKALFDERNLADVVYKLNGQ



AEMFYRKKSIENTHPTHPANHPILN



KNKDNKKKESLFDYDLIKDRRYTVD



KFMFHVPITMNFKSVGSENINQDVK



AYLRHADDMHIIGIDRGERHLLYLV



VIDLQGNIKEQYSLNEIVNEYNGNT



YHTNYHDLLDVREEERLKARQSWQT



IENIKELKEGYLSQVIHKITQLMVR



YHAIVVLEDLSKGFMRSRQKVEKQV



YQKFEKMLIDKLNYLVDKKTDVSTP



GGLLNAYQLTCKSDSSQKLGKQSGF



LFYIPAWNTSKIDPVTGFVNLLDTH



SLNSKEKIKAFFSKFDAIRYNKDKK



WFEFNLDYDKFGKKAEDTRTKWTLC



TRGMRIDTFRNKEKNSQWDNQEVDL



TTEMKSLLEHYYIDIHGNLKDAISA



QTDKAFFTGLLHILKLTLQMRNSIT



GTETDYLVSPVADENGIFYDSRSCG



NQLPENADANGAYNIARKGLMLIEQ



IKNAEDLNNVKFDISNKAWLNFAQQ



KPYKNG (SEQ ID NO: 123)





ErCas12a-
MFSAKLISDILPEFVIHNNNYSASE


previously
KEEKTQVIKLFSRFATSFKDYFKNR


known as
ANCFSANDISSSSCHRIVNDNAEIF


Cpf1
FSNALVYRRIVKNLSNDDINKISGD



Eubacterium

MKDSLKEMSLEEIYSYEKYGEFITQ



rectale

EGISFYNDICGKVNLFMNLYCQKNK


Ref Seq.
ENKNLYKLRKLHKQILCIADTSYEV


WP_11922364
PYKFESDEEVYQSVNGFLDNISSKH


2.1
IVERLRKIGENYNGYNLDKIYIVSK



FYESVSQKTYRDWETINTALEIHYN



NILPGNGKSKADKVKKAVKNDLQKS



ITEINELVSNYKLCPDDNIKAETYI



HEISHILNNFEAQELKYNPEIHLVE



SELKASELKNVLDVIMNAFHWCSVF



MTEELVDKDNNFYAELEEIYDEIYP



VISLYNLVRNYVTQKPYSTKKIKLN



FGIPTLADGWSKSKEYSNNAIILMR



DNLYYLGIFNAKNKPDKKIIEGNTS



ENKGDYKKMIYNLLPGPNKMIPKVF



LSSKTGVETYKPSAYILEGYKQNKH



LKSSKDFDITFCHDLIDYFKNCIAI



HPEWKNFGFDFSDTSTYEDISGFYR



EVELQGYKIDWTYISEKDIDLLQEK



GQLYLFQIYNKDFSKKSSGNDNLHT



MYLKNLFSEENLKDIVLKLNGEAEI



FFRKSSIKNPIIHKKGSILVNRTYE



AEEKDQFGNIQIVRKTIPENIYQEL



YKYFNDKSDKELSDEAAKLKNVVGH



HEAATNIVKDYRYTYDKYFLHMPIT



INFKANKTSFINDRILQYIAKEKDL



HVIGIDRGERNLIYVSVIDTCGNIV



EQKSFNIVNGYDYQIKLKQQEGARQ



IARKEWKEIGKIKEIKEGYLSLVIH



EISKMVIKYNAIIAMEDLSYGFKKG



RFKVERQVYQKFETMLINKLNYLVF



KDISITENGGLLKGYQLTYIPDKLK



NVGHQCGCIFYVPAAYTSKIDPTTG



FVNIFKFKDLTVDAKREFIKKFDSI



RYDSDKNLFCFTFDYNNFITQNTVM



SKSSWSVYTYGVRIKRRFVNGRFSN



ESDTIDITKDMEKTLEMTDINWRDG



HDLRQDIIDYEIVQHIFEIFKLTVQ



MRNSLSELEDRDYDRLISPVLNENN



IFYDSAKAGDALPKDADANGAYCIA



LKGLYEIKQITENWKEDGKFSRDKL



KISNKDWFDFIQNKRYL



(SEQ ID NO: 124)





CsCas12a-
MNYKTGLEDFIGKESLSKTLRNALI


previously
PTESTKIHMEEMGVIRDDELRAEKQ


known as
QELKEIMDDYYRAFIEEKLGQIQGI


Cpf1
QWNSLFQKMEETMEDISVRKDLDKI



Clostridium sp.

QNEKRKEICCYFTSDKRFKDLFNAK


AF34-
LITDILPNFIKDNKEYTEEEKAEKE


10BH
QTRVLFQRFATAFTNYFNQRRNNFS


Ref Seq.
EDNISTAISFRIVNENSEIHLQNMR


WP_
AFQRIEQQYPEEVCGMEEEYKDMLQ


118538418.1
EWQMKHIYLVDFYDRVLTQPGIEYY



NGICGKINEHMNQFCQKNRINKNDF



RMKKLHKQILCKKSSYYEIPFRFES



DQEVYDALNEFIKTMKEKEIICRCV



HLGQKCDDYDLGKIYISSNKYEQIS



NALYGSWDTIRKCIKEEYMDALPGK



GEKKEEKAEAAAKKEEYRSIADIDK



IISLYGSEMDRTISAKKCITEICDM



AGQISTDPLVCNSDIKLLQNKEKTT



EIKTILDSFLHVYQWGQTFIVSDII



EKDSYFYSELEDVLEDFEGITTLYN



HVRSYVTQKPYSTVKFKLHFGSPTL



ANGWSQSKEYDNNAILLMRDQKFYL



GIFNVRNKPDKQIIKGHEKEEKGDY



KKMIYNLLPGPSKMLPKVFITSRSG



QETYKPSKHILDGYNEKRHIKSSPK



FDLGYCWDLIDYYKECIHKHPDWKN



YDFHFSDTKDYEDISGFYREVEMQG



YQIKWTYISADEIQKLDEKGQIFLF



QIYNKDFSVHSTGKDNLHTMYLKNL



FSEENLKDIVLKLNGEAELFFRKAS



IKTPVVHKKGSVLVNRSYTQTVGDK



EIRVSIPEEYYTEIYNYLNHIGRGK



LSTEAQRYLEERKIKSFTATKDIVK



NYRYCCDHYFLHLPITINFKAKSDI



AVNERTLAYIAKKEDIHIIGIDRGE



RNLLYISVVDVHGNIREQRSFNIVN



GYDYQQKLKDREKSRDAARKNWEEI



EKIKELKEGYLSMVIHYIAQLVVKY



NAVVAMEDLNYGFKTGRFKVERQVY



QKFETMLIEKLHYLVFKDREVCEEG



GVLRGYQLTYIPESLKKVGKQCGFI



FYVPAGYTSKIDPTTGFVNLFSFKN



LTNRESRQDFVGKFDEIRYDRDKKM



FEFSFDYNNYIKKGTMLASTKWKVY



TNGTRLKRIVVNGKYTSQSMEVELT



DAMEKMLQRAGIEYHDGKDLKGQIV



EKGIEAEIIDIFRLTVQMRNSRSES



EDREYDRLISPVLNDKGEFFDTATA



DKTLPQDADANGAYCIALKGLYEVK



QIKENWKENEQFPRNKLVQDNKTWF



DFMQKKRYL



(SEQ ID NO: 125)





BhCas12b
MATRSFILKIEPNEEVKKGLWKTHE



Bacillus

VLNHGIAYYMNILKLIRQEAIYEHH



hisashii

EQDPKNPKKVSKAEIQAELWDFVLK


Ref Seq.
MQKCNSFTHEVDKDEVFNILRELYE


WP_
ELVPSSVEKKGEANQLSNKFLYPLV


095142515.1
DPNSQSGKGTASSGRKPRWYNLKIA



GDPSWEEEKKKWEEDKKKDPLAKIL



GKLAEYGLIPLFIPYTDSNEPIVKE



IKWMEKSRNQSVRRLDKDMFIQALE



RFLSWESWNLKVKEEYEKVEKEYKT



LEERIKEDIQALKALEQYEKERQEQ



LLRDTLNTNEYRLSKRGLRGWREII



QKWLKMDENEPSEKYLEVFKDYQRK



HPREAGDYSVYEFLSKKENHFIWRN



HPEYPYLYATFCEIDKKKKDAKQQA



TFTLADPINHPLWVRFEERSGSNLN



KYRILTEQLHTEKLKKKLTVQLDRL



IYPTESGGWEEKGKVDIVLLPSRQF



YNQIFLDIEEKGKHAFTYKDESIKF



PLKGTLGGARVQFDRDHLRRYPHKV



ESGNVGRIYFNMTVNIEPTESPVSK



SLKIHRDDFPKVVNFKPKELTEWIK



DSKGKKLKSGIESLEIGLRVMSIDL



GQRQAAAASIFEVVDQKPDIEGKLF



FPIKGTELYAVHRASFNIKLPGETL



VKSREVLRKAREDNLKLMNQKLNFL



RNVLHFQQFEDITEREKRVTKWISR



QENSDVPLVYQDELIQIRELMYKPY



KDWVAFLKQLHKRLEVEIGKEVKHW



RKSLSDGRKGLYGISLKNIDEIDRT



RKFLLRWSLRPTEPGEVRRLEPGQR



FAIDQLNHLNALKEDRLKKMANTII



MHALGYCYDVRKKKWQAKNPACQII



LFEDLSNYNPYEERSRFENSKLMKW



SRREIPRQVALQGEIYGLQVGEVGA



QFSSRFHAKTGSPGIRCSVVTKEKL



QDNRFFKNLQREGRLTLDKIAVLKE



GDLYPDKGGEKFISLSKDRKCVTTH



ADIMAAQNLQKRFWTRTHGFYKVYC



KAYQVDGQTVYIPESKDQKQKIIEE



FGEGYFILKDGVYEWVNAGKLKIKK



GSSKQSSSELVDSDILKDSFDLASE



LKGEKLMLYRDPSGNVFPSDKWMAA



GVFFGKLERILISKLTNQYSISTIE



DDSSKQSM



(SEQ ID NO: 126)





ThCas12b
MSEKTTQRAYTLRLNRASGECAVCQ



Thermomonas

NNSCDCWHDALWATHKAVNRGAKAF



hydrothermalis

GDWLLTLRGGLCHTLVEMEVPAKGN


Ref Seq.
NPPQRPTDQERRDRRVLLALSWLSV


WP_
EDEHGAPKEFIVATGRDSADDRAKK


072754838
VEEKLREILEKRDFQEHEIDAWLQD



CGPSLKAHIREDAVWVNRRALFDAA



VERIKTLTWEEAWDFLEPFFGTQYF



AGIGDGKDKDDAEGPARQGEKAKDL



VQKAGQWLSARFGIGTGADFMSMAE



AYEKIAKWASQAQNGDNGKATIEKL



ACALRPSEPPTLDTVLKCISGPGHK



SATREYLKTLDKKSTVTQEDLNQLR



KLADEDARMCRKKVGKKGKKPWADE



VLKDVENSCELTYLQDNSPARHREF



SVMLDHAARRVSMAHSWIKKAEQRR



RQFESDAQKLKNLQERAPSAVEWLD



RFCESRSMTTGANTGSGYRIRKRAI



EGWSYVVQAWAEASCDTEDKRIAAA



RKVQADPEIEKFGDIQLFEALAADE



AICVWRDQEGTQNPSILIDYVTGKT



AEHNQKRFKVPAYRHPDELRHPVFC



DFGNSRWSIQFAIHKEIRDRDKGAK



QDTRQLQNRHGLKMRLWNGRSMTDV



NLHWSSKRLTADLALDQNPNPNPTE



VTRADRLGRAASSAFDHVKIKNVFN



EKEWNGRLQAPRAELDRIAKLEEQG



KTEQAEKLRKRLRWYVSFSPCLSPS



GPFIVYAGQHNIQPKRSGQYAPHAQ



ANKGRARLAQLILSRLPDLRILSVD



LGHRFAAACAVWETLSSDAFRREIQ



GLNVLAGGSGEGDLFLHVEMTGDDG



KRRTVVYRRIGPDQLLDNTPHPAPW



ARLDRQFLIKLQGEDEGVREASNEE



LWTVHKLEVEVGRTVPLIDRMVRSG



FGKTEKQKERLKKLRELGWISAMPN



EPSAETDEKEGEIRSISRSVDELMS



SALGTLRLALKRHGNRARIAFAMTA



DYKPMPGGQKYYFHEAKEASKNDDE



TKRRDNQIEFLQDALSLWHDLFSSP



DWEDNEAKKLWQNHIATLPNYQTPE



EISAELKRVERNKKRKENRDKLRTA



AKALAENDQLRQHLHDTWKERWESD



DQQWKERLRSLKDWIFPRGKAEDNP



SIRHVGGLSITRINTISGLYQILKA



FKMRPEPDDLRKNIPQKGDDELENF



NRRLLEARDRLREQRVKQLASRIIE



AALGVGRIKIPKNGKLPKRPRTTVD



TPCHAVVIESLKTYRPDDLRTRREN



RQLMQWSSAKVRKYLKEGCELYGLH



FLEVPANYTSRQCSRTGLPGIRCDD



VPTGDFLKAPWWRRAINTAREKNGG



DAKDRFLVDLYDHLNNLQSKGEALP



ATVRVPRQGGNLFIAGAQLDDTNKE



RRAIQADLNAAANIGLRALLDPDWR



GRWWYVPCKDGTSEPALDRIEGSTA



FNDVRSLPTGDNSSRRAPREIENLW



RDPSGDSLESGTWSPTRAYWDTVQS



RVIELLRRHAGLPTS



(SEQ ID NO: 127)





LsCas12b
MSIRSFKLKLKTKSGVNAEQLRRGL



Laceyella

WRTHQLINDGIAYYMNWLVLLRQED



sacchari

LFIRNKETNEIEKRSKEEIQAVLLE


WP_
RVHKQQQRNQWSGEVDEQTLLQALR


132221894.1
QLYEEIVPSVIGKSGNASLKARFFL



GPLVDPNNKTTKDVSKSGPTPKWKK



MKDAGDPNWVQEYEKYMAERQTLVR



LEEMGLIPLFPMYTDEVGDIHWLPQ



ASGYTRTWDRDMF



QQAIERLLSWESWNRRVRERRAQFE



KKTHDFASRFSESDVQWMNKLREYE



AQQEKSLEENAFAPNEPYALTKKAL



RGWERVYHSWMRLDSAASEEAYWQE



VATCQTAMRGEFGDPAIYQFLAQKE



NHDIWRGYPERVIDFAELNHLQREL



RRAKEDATFTLPDSVDHPLWVRYEA



PGGTNIHGYDLVQDTKRNLTLILDK



FILPDENGSWHEVKKVPFSLAKSKQ



FHRQVWLQEEQKQKKREVVFYDYST



NLPHLGTLAGAKLQWDRNFLNKRTQ



QQIEETGEIGKVFFNISVDVRPAVE



VKNGRLQNGLGKALTVLTHPDGTKI



VTGWKAEQLEKWVGESGRVSSLGLD



SLSEGLRVMSIDLGQRTSATVSVFE



ITKEAPDNPYKFFYQLEGTEMFAVH



QRSFLLALPGENPPQKIKQMREIRW



KERNRIKQQVDQLSAILRLHKKVNE



DERIQAIDKLLQKVASWQLNEEIAT



AWNQALSQLYSKAKENDLQWNQAIK



MAHHQLEPVVGKQISLWRKDLSTGR



QGIAGLSLWSIEELEATKKLLTRVV



SKRSREPGWKRIERFETFAKQIQHH



INQVKENRLKQLANLIVMTALGYKY



DQEQKKWIEVYPACQVVLFENLRSY



RFSFERSRRENKKLMEWSHRSIPKL



VQMQGELFGLQVADVYAAYSSRYHG



RTGAPGIRCHALTEADLRNETNIIH



ELIEAGFIKEEHRPYLQQGDLVPWS



GGELFATLQKPYDNPRILTLHADIN



AAQNIQKRFWHPSMWFRVNCESVME



GEIVTYVPKNKTVHKKQGKTFRFVK



VEGSDVYEWAKWSKNRNKNTFSSIT



ERKPPSSMILFRDPSGTFFKEQEWV



EQKTFWGKVQSMIQAYMKKTIVRQR



MEE (SEQ ID NO: 128)





DtCas12b
MVLGRKDDTAELRRALWTTHEHVNL



Desulfonatronum

AVAEVERVLLRCRGRSYWTLDRRGD



thiodismutans

PVHVPESQVAEDALAMAREAQRRNG


WP_
WPVVGEDEEILLALRYLYEQIVPSC


031386437
LLDDLGKPLKGDAQKIGTNYAGPLF



DSDTCRRDEGKDVACCGPFHEVAGK



YLGALPEWATPISKQEFDGKDASHL



RFKATGGDDAFFRVSIEKANAWYED



PANQDALKNKAYNKDDWKKEKDKGI



SSWAVKYIQKQLQLGQDPRTEVRRK



LWLELGLLPLFIPVFDKTMVGNLWN



RLAVRLALAHLLSWESWNHRAVQDQ



ALARAKRDELAALFLGMEDGFAGLR



EYELRRNESIKQHAFEPVDRPYVVS



GRALRSWTRVREEWLRHGDTQESRK



NICNRLQDRLRGKFGDPDVFHWLAE



DGQEALWKERDCVTSFSLLNDADGL



LEKRKGYALMTFADARLHPRWAMYE



APGGSNLRTYQIRKTENGLWADVVL



LSPRNESAAVEEKTFNVRLAPSGQL



SNVSFDQIQKGSKMVGRCRYQSANQ



QFEGLLGGAEILFDRKRIANEQHGA



TDLASKPGHVWFKLTLDVRPQAPQG



WLDGKGRPALPPEAKHFKTALSNKS



KFADQVRPGLRVLSVDLGVRSFAAC



SVFELVRGGPDQGTYFPAADGRTVD



DPEKLWAKHERSFKITLPGENPSRK



EEIARRAAMEELRSLNGDIRRLKAI



LRLSVLQEDDPRTEHLRLFMEAIVD



DPAKSALNAELFKGFGDDRFRSTPD



LWKQHCHFFHDKAEKVVAERFSRWR



TETRPKSSSWQDWRERRGYAGGKSY



WAVTYLEAVRGLILRWNMRGRTYGE



VNRQDKKQFGTVASALLHHINQLKE



DRIKTGADMIIQAARGFVPRKNGAG



WVQVHEPCRLILFEDLARYRFRTDR



SRRENSRLMRWSHREIVNEVGMQGE



LYGLHVDTTEAGFSSRYLASSGAPG



VRCRHLVEEDFHDGLPGMHLVGELD



WLLPKDKDRTANEARRLLGGMVRPG



MLVPWDGGELFATLNAASQLHVIHA



DINAAQNLQRRFWGRCGEAIRIVCN



QLSVDGSTRYEMAKAPKARLLGALQ



QLKNGDAPFHLTSIPNSQKPENSYV



MTPTNAGKKYRAGPGEKSSGEEDEL



ALDIVEQAEELAQGRKTFFRDPSGV



FFAPDRWLPSEIYWSRIRRRIWQVT



LERNSSGRQERAEMDEMPY



(SEQ ID NO:129)









The napDNAbp domains of the split nucleobase editors described herein may also comprise Cas12a/Cpf1 (dCpf1) variants that may be used as a guide nucleotide sequence-programmable DNA-binding protein domain. The Cas12a/Cpf1 protein has a RuvC-like endonuclease domain that is similar to the RuvC domain of Cas9 but does not have an HNH endonuclease domain, and the N-terminus of Cpf1 does not have the alpha-helical recognition lobe of Cas9. It was shown in Zetsche et al., Cell, 163, 759-771, 2015 (which is incorporated herein by reference) that the RuvC-like domain of Cpf1 is responsible for cleaving both DNA strands, and that inactivation of the RuvC-like domain inactivates Cpf1 nuclease activity.


In some embodiments, the napDNAbp is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an argonaute protein. In some embodiments, the disclosure provides napDNAbp domains that comprise SpCas9 variants that recognize and work best with NRRH, NRCH, and NRTH PAMs. See PCT Application No. PCT/US2019/47996, incorporated by reference herein. In some embodiments, the disclosed base editors comprise a napDNAbp domain selected from SpCas9-NRRH, SpCas9-NRTH, and SpCas9-NRCH.
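The PAM designations NRRH, NRCH, and NRTH use degenerate nucleotide codes. As an illustration only, the following minimal Python sketch checks whether a concrete four-nucleotide sequence satisfies such a pattern, assuming the standard IUPAC nucleotide code (for example, N is any base, R is A or G, and H is A, C, or T). The function name `pam_matches` and the lookup table name are hypothetical and are not part of the disclosure.

```python
# Standard IUPAC degenerate nucleotide codes (DNA alphabet).
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pattern: str, candidate: str) -> bool:
    """Return True if `candidate` (concrete bases) satisfies the degenerate `pattern`."""
    pattern = pattern.upper()
    candidate = candidate.upper()
    if len(pattern) != len(candidate):
        return False
    # Every candidate base must be one of the bases allowed by the pattern code.
    return all(base in IUPAC_CODES[code] for code, base in zip(pattern, candidate))


if __name__ == "__main__":
    for pam in ("NRRH", "NRCH", "NRTH"):
        print(pam, pam_matches(pam, "AGAA"))
```

A sketch of this kind only tests sequence compatibility with a PAM pattern; it says nothing about the editing efficiency a given variant achieves at that PAM.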


In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRRH. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRRH. The SpCas9-NRRH has an amino acid sequence as presented in SEQ ID NO: 435 (underlined residues are mutated relative to SpCas9, as set forth in SEQ ID NO: 1):











 (SEQ ID NO: 435)



MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVL






GNTDRHSIKKNLIGALLFDSGETAEATRLKRTARR






RYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF






LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK






KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLN






PDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA






ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS






LGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA






QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKA






PLSASMVKRYDEHHQDLTLLKALVRQQLPEKYKEI






FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG






TEELLVKLNREDLLRKQRTFDNGIIPHQIHLGELH






AILRRQGDFYPFLKDNREKIEKILTFRIPYYVGPL






ARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS






FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELT






KVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVT






VKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYH






DLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE






MIEERLKTYAHLFDDKVMKQLKRLRYTGWGRLSRK






LINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDD






SLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKK






GILQTVKVVDELVKVMGGHKPENIVIEMARENQTT






QKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQ






LQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDH






IVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEV






VKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSE






LDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDE






NDKLIREVKVITLKSKLVSDFRKDFQFYKVREINN






YHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKV






YDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRK






VLSMPQVNIVKKTEVQTGGFSKESILPKGNSDKLI






ARKKDWDPKKYGGFNSPTAAYSVLVVAKVEKGKSK






KLKSVKELLGITIMERSSFEKNPIGFLEAKGYKEV






KKDLIIKLPKYSLFELENGRKRMLASAGVLHKGNE






LALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE






QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYN






KHRDKPIREQAENIIHLFTLTNLGVPAAFKYFDTT






IDKKRYTSTKEVLDATLIHQSITGLYETRIDLSQL






GGD
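The sequence-identity thresholds recited for these variants (at least 90%, 95%, 98%, or 99%) can be illustrated with a simple calculation. The following Python sketch is a deliberately simplified illustration: it assumes the two sequences are already aligned and of equal length, as is the case for substitution-only variants of the same parent protein. In practice, percent identity is generally determined with a sequence alignment program, and the function names here are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned and of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float) -> bool:
    """Return True if the percent identity is at least `threshold` (e.g., 90, 95, 98, or 99)."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    # Toy ten-residue fragments differing at one position (illustrative only).
    reference = "MDKKYSIGLD"
    variant = "MDKKYSIGLA"
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant, 95))
```

Position-by-position comparison does not handle insertions or deletions, which is why an alignment step is needed for sequences of different lengths.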






In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRCH. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRCH. The SpCas9-NRCH has an amino acid sequence as presented in SEQ ID NO: 436 (underlined residues are mutated relative to SpCas9):











(SEQ ID NO: 436)



MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVL






GNTDRHSIKKNLIGALLFDSGETAEATRLKRTARR






RYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF






LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK






KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLN






PDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA






ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS






LGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA






QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKA






PLSASMVKRYDEHHQDLTLLKALVRQQLPEKYKEI






FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMD






GTEELLVKLNREDLLRKQRTFDNGIIPHQIHLGEL






HAILRRQGDFYPFLKDNREKIEKILTFRIPYYVGP






LARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQ






SFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNEL






TKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKV






TVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTY






HDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDR






EMIEERLKTYAHLFDDKVMKQLKRLRYTGWGRLSR






KLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHD






DSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIK






KGILQTVKVVDELVKVMGGHKPENIVIEMARENQT






TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENT






QLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVD






HIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEE






VVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLS






ELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD






ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREIN






NYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYK






VYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTE






ITLANGEIRKRPLIETNGETGEIVWDKGRDFATVR






KVLSMPQVNIVKKTEVQTGGFSKESILPKGNSDKL






IARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKS






KKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKE






VKKDLIIKLPKYSLFELENGRKRMLASAGVLQKGN






ELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFV






EQHKHYLDEIIEQISEFSKRVILADANLDKVLSAY






NKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDT






TINRKQYNTTKEVLDATLIRQSITGLYETRIDLSQ






LGGD






In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NRTH. In some embodiments, the disclosed base editors comprise a napDNAbp domain that comprises SpCas9-NRTH. The SpCas9-NRTH has an amino acid sequence as presented in SEQ ID NO: 437 (underlined residues are mutated relative to SpCas9):











 (SEQ ID NO: 437)



MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVL






GNTDRHSIKKNLIGALLFDSGETAEATRLKRTARR






RYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF






LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK






KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLN






PDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA






ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS






LGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA






QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKA






PLSASMVKRYDEHHQDLTLLKALVRQQLPEKYKEI






FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMD






GTEELLVKLNREDLLRKQRTFDNGIIPHQIHLGEL






HAILRRQGDFYPFLKDNREKIEKILTFRIPYYVGP






LARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQ






SFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNEL






TKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKV






TVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTY






HDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDR






EMIEERLKTYAHLFDDKVMKQLKRLRYTGWGRLSR






KLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHD






DSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIK






KGILQTVKVVDELVKVMGGHKPENIVIEMARENQT






TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENT






QLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVD






HIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEE






VVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLS






ELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD






ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREIN






NYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYK






VYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTE






ITLANGEIRKRPLIETNGETGEIVWDKGRDFATVR






KVLSMPQVNIVKKTEVQTGGFSKESILPKGNSDKL






IARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKS






KKLKSVKELLGITIMERSSFEKNPIGFLEAKGYKE






VKKDLIIKLPKYSLFELENGRKRMLASASVLHKGN






ELALPSKYVNFLYLASHYEKLKGSSEDNKQKQLFV






EQHKHYLDEIIEQISEFSKRVILADANLDKVLSAY






NKHRDKPIREQAENIIHLFTLTNLGASAAFKYFDT






TIGRKLYTSTKEVLDATLIHQSITGLYETRIDLSQ






LGGD






The napDNAbp domains of the split nucleobase editors of the present disclosure may also comprise Cas9 variants with modified PAM specificities. Some aspects of this disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′, where N is A, C, G, or T) at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGG-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNG-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNT-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGT-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAT-3′ PAM sequence at its 3′-end. In still other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAG-3′ PAM sequence at its 3′-end.
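Because N denotes any of A, C, G, or T, each PAM pattern recited above stands for a set of concrete trinucleotides. As an illustration only, the following Python sketch enumerates the concrete sequences covered by a pattern. It handles only the symbols used in this paragraph (A, C, G, T, and N); the function name `expand_pam` is hypothetical.

```python
from itertools import product

# Only the symbols used in the PAM patterns above: concrete bases plus N (any base).
ALLOWED_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def expand_pam(pattern: str) -> list[str]:
    """List every concrete nucleotide sequence covered by a PAM pattern such as 'NGG'."""
    choices = [ALLOWED_BASES[symbol] for symbol in pattern.upper()]
    # Cartesian product over the allowed bases at each position.
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    print(expand_pam("NGG"))
    print(len(expand_pam("NNG")))
    # Every canonical NGG PAM is also covered by the broader NNG pattern.
    print(set(expand_pam("NGG")) <= set(expand_pam("NNG")))
```

Expanding the patterns in this way makes it easy to see which patterns overlap, for example that every 5′-NGG-3′ sequence is also a 5′-NNG-3′ sequence.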


In some embodiments, the disclosed adenine base editors comprise a napDNAbp domain comprising an SpCas9-NG, which has a PAM that corresponds to NGN. In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SpCas9-NG. The sequence of SpCas9-NG is provided below:











 (SEQ ID NO: 554)



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVL






GNTDRHSIKKNLIGALLFDSGETAEATRLKRTARR






RYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF






LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK






KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLN






PDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA






ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS






LGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA






QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKA






PLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEI






FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG






TEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELH






AILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPL






ARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS






FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELT






KVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVT






VKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYH






DLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE






MIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRK






LINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDD






SLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKK






GILQTVKVVDELVKVMGRHKPENIVIEMARENQTT






QKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQ






LQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDH






IVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEV






VKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSE






LDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDE






NDKLIREVKVITLKSKLVSDFRKDFQFYKVREINN






YHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKV






YDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRK






VLSMPQVNIVKKTEVQTGGFSKESIRPKRNSDKLI






ARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSK






KLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEV






KKDLIIKLPKYSLFELENGRKRMLASARFLQKGNE






LALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE






QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYN






KHRDKPIREQAENIIHLFTLTNLGAPRAFKYFDTT






IDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQL






GGD






In some embodiments, the disclosed base editors comprise a napDNAbp domain comprising an SaCas9-KKH, which has a PAM that corresponds to NNNRRT. In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to SaCas9-KKH. The sequence of SaCas9-KKH is provided below:



S. aureus Cas9 nickase KKH (D10A/E782K/N968K/R1015H) (SaCas9-KKH)











 (SEQ ID NO: 555)



MGKRNYILGLAIGITSVGYGIIDYETRDVIDAGVR






LFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKK






LLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEE






FSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQIS






RNSKALEEKYVAELQLERLKKDGEVRGSINRFKTS






DYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRR






TYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEEL






RSVKYAYNADLYNALNDLNNLVITRDENEKLEYYE






KFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYR






VTSTGKPEFTNLKVYHDIKDITARKEIIENAELLD






QIAKILTIYQSSEDIQEELTNLNSELTQEEIEQIS






NLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFN






RLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSF






IQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQK






MINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKI






KLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHII






PRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSS






SDSKISYETFKKHILNLAKGKGRISKTKKEYLLEE






RDINRFSVQKDFINRNLVDTRYATRGLMNLLRSYF






RVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYK






HHAEDALIIANADFIFKEWKKLDKAKKVMENQMFE






EKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDY






KYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNN






LNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQK






LKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNG






PVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSL






KPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNS






KCYEEAKKLKKISNQAEFIASFYKNDLIKINGELY






RVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPP






HIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQI






IKKG






In some embodiments, the disclosed adenine base editors comprise a napDNAbp domain comprising an xCas9, an evolved variant of SpCas9. In some embodiments, the disclosed base editors comprise a napDNAbp domain that has a sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to xCas9. The sequence of xCas9 is provided below:











 (SEQ ID NO: 556)



MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVL






GNTDRHSIKKNLIGALLFDSGETAEATRLKRTARR






RYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF






LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK






KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLN






PDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA






ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS






LGLTPNFKSNFDLAEDTKLQLSKDTYDDDLDNLLA






QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKA






PLSASMIKLYDEHHQDLTLLKALVRQQLPEKYKEI






FFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG






TEELLVKLNREDLLRKQRTFDNGIIPHQIHLGELH






AILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPL






ARGNSRFAWMTRKSEETITPWNFEKVVDKGASAQS






FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELT






KVKYVTEGMRKPAFLSGDQKKAIVDLLFKTNRKVT






VKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYH






DLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE






MIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRK






LINGIRDKQSGKTILDFLKSDGFANRNFIQLIHDD






SLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKK






GILQTVKVVDELVKVMGRHKPENIVIEMARENQTT






QKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQ






LQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDH






IVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEV






VKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSE






LDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDE






NDKLIREVKVITLKSKLVSDFRKDFQFYKVREINN






YHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKV






YDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEI






TLANGEIRKRPLIETNGETGEIVWDKGRDFATVRK






VLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLI






ARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSK






KLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEV






KKDLIIKLPKYSLFELENGRKRMLASAGVLQKGNE






LALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE






QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYN






KHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTT






IDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQL






GGD






In various embodiments, the base editors disclosed herein may comprise a circular permutant of Cas9. The terms “circularly permuted Cas9,” “circular permutant” of Cas9, and “CP-Cas9” refer to any Cas9 protein, or variant thereof, that occurs as, or has been engineered as, a circular permutant variant, which means that the N-terminus and the C-terminus of a Cas9 protein (e.g., a wild type Cas9 protein) have been topologically rearranged. Such circularly permuted Cas9 proteins, or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491-511 and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, Jan. 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The present disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).


In some embodiments, circular permutant Cas9 variants may be defined as a topological rearrangement of a Cas9 primary structure based on the following method, which is based on S. pyogenes Cas9 of SEQ ID NO: 1: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which dissects the original protein into an N-terminal portion and a C-terminal portion; and (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to precede the original N-terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue. The CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain. For example, the CP site may be located (relative to the S. pyogenes Cas9 of SEQ ID NO: 1) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282. Thus, once relocated to the N-terminus, original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N-terminal amino acid. These CP-Cas9 proteins may be referred to as Cas9-CP181, Cas9-CP199, Cas9-CP230, Cas9-CP270, Cas9-CP310, Cas9-CP1010, Cas9-CP1016, Cas9-CP1023, Cas9-CP1029, Cas9-CP1041, Cas9-CP1247, Cas9-CP1249, and Cas9-CP1282, respectively. This description is not meant to be limited to making CP variants from SEQ ID NO: 1, but may be implemented to make CP variants in any Cas9 sequence, either at CP sites that correspond to these positions, or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
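Steps (a) and (b) above amount to a simple rearrangement of the primary sequence, which can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive procedure: residue positions are 1-indexed, the optional linker between the original C-terminus and the original N-terminus is supplied by the caller, and the function names, the toy sequence, and the linker used in the example are all hypothetical.

```python
def circularly_permute(sequence: str, cp_site: int, linker: str = "") -> str:
    """Move the region starting at the 1-indexed CP site in front of the original N-terminal region.

    The residue at `cp_site` becomes the new N-terminal residue; an optional
    linker joins the original C-terminus to the original N-terminus.
    """
    if not 1 < cp_site <= len(sequence):
        raise ValueError("cp_site must be an internal, 1-indexed residue position")
    original_c_terminal_region = sequence[cp_site - 1:]   # includes the CP site residue
    original_n_terminal_region = sequence[:cp_site - 1]
    return original_c_terminal_region + linker + original_n_terminal_region


def cp_name(cp_site: int) -> str:
    """Name a permutant after its CP site, following the Cas9-CP<site> convention."""
    return f"Cas9-CP{cp_site}"


if __name__ == "__main__":
    toy_protein = "ABCDEFGH"   # toy stand-in for a Cas9 primary structure
    toy_linker = "GGS"         # hypothetical linker, for illustration only
    print(cp_name(5), circularly_permute(toy_protein, 5, toy_linker))
```

The same function applies to any Cas9 sequence and any internal CP site; only the site number and the linker change.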


Exemplary CP-Cas9 amino acid sequences, based on the Cas9 of SEQ ID NO: 1, are provided below in which linker sequences are indicated by underlining and optional methionine (M) residues are indicated in bold. It should be appreciated that the disclosure provides CP-Cas9 sequences that do not include a linker sequence or that include different linker sequences. It should be appreciated that CP-Cas9 sequences may be based on Cas9 sequences other than that of SEQ ID NO: 1 and any examples provided herein are not meant to be limiting. Exemplary CP-Cas9 sequences are as follows:














CP name
Sequence
SEQ ID NO: 







CP1012 (SEQ ID NO: 282)
DYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETN
GETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA
RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEK
NPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK
YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN
LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKE
VLDATLIHQSITGLYETRIDLSQLGGDGGSGGSGGSGGSGGSGGSGGDKKYSIGL
AIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRL
KRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF
GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDL
NPONSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQL
PGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGD
QYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALV
RQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLN
REDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIP
YYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPN
EKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKV
TVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENED
ILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLING
IRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHI
ANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRE
RMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLS
DYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA
KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD
ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIK
KYPKLESEFVYG

CP1028 (SEQ ID NO: 283)
EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFAT
VRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSP
TVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKK
DLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG
SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPI
REQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE
TRIDLSQLGGDGGSGGSGGSGGSGGSGGSGGMDKKYSIGLAIGTNSVGWAVITDE
YKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRI
CYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPT
IYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPONSDVDKLFIQLV
QTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIAL
SLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDA
ILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGS
IPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAW
MTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFT
VYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIEC
FDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDR
EMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLK
SDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQ
TVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ
ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKD
DSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE
RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLK
SKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYK
VYDVRKMIAKSEQ

CP1041 (SEQ ID NO: 284)
NIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIV
KKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVE
KGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE
LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE
QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTL
TNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGG
SGGSGGSGGSGGSGGSGGDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNT
DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKV
DDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDK
ADLRLIYLALAHMIKFRGHFLIEGDLNPONSDVDKLFIQLVQTYNQLFEENPINA
SGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDL
AEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEIT
KAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGAS
QEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAIL
RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNF
EEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEG
MRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFN
ASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLF
DDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIH
DDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR
HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQN
EKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKN
RGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK
RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFY
KVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQE
IGKATAKYFFYS

CP1249 (SEQ ID NO: 285)
PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIR
EQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYET
RIDLSQLGGDGGSGGSGGSGGSGGSGGSGGMDKKYSIGLAIGTNSVGWAVITDEY
KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC
YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTI
YHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPONSDVDKLFIQLVQ
TYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS
LGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAI
LLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQS
KNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSI
PHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWM
TRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTV
YNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECF
DSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE
MIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKS
DGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT
VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI
LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDD
SIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAER
GGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKS
KLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKV
YDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG
EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKD
WDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID
FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNF
LYLASHYEKLKGS

CP1300 (SEQ ID NO: 286)
KPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITG
LYETRIDLSQLGGDGGSGGSGGSGGSGGSGGSGGDKKYSIGLAIGTNSVGWAVIT
DEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKN
RICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY
PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPONSDVDKLFIQ
LVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLI
ALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS
DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF
DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDN
GSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRF
AWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEY
FTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKI
ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFE
DREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDF
LKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGI
LQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELG
SQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL
KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK
AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVIT
LKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGD
YKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNG
ETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR
KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKN
PIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKY
VNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANL
DKVLSAYNKHRD

Cas9 circular permutants may be useful in the base editing constructs described herein. Exemplary C-terminal fragments of Cas9, based on the Cas9 of SEQ ID NO: 1, which may be rearranged to an N-terminus of Cas9, are provided below. It should be appreciated that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting. These exemplary CP-Cas9 fragments have the following sequences:

CP name / Sequence / SEQ ID NO:

CP1012 C-terminal fragment (SEQ ID NO: 287)
DYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETN
GETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA
RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEK
NPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK
YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN
LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKE
VLDATLIHQSITGLYETRIDLSQLGGD

CP1028 C-terminal fragment (SEQ ID NO: 288)
EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFAT
VRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSP
TVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKK
DLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG
SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPI
REQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE
TRIDLSQLGGD

CP1041 C-terminal fragment (SEQ ID NO: 289)
NIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIV
KKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVE
KGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE
LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE
QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTL
TNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

CP1249 C-terminal fragment (SEQ ID NO: 290)
PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIR
EQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYET
RIDLSQLGGD

CP1300 C-terminal fragment (SEQ ID NO: 291)
KPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITG
LYETRIDLSQLGGD

An exemplary alignment of four Cas9 sequences is provided below. The Cas9 sequences in the alignment are: Sequence 1 (S1): SEQ ID NO: 1|WP_010922251| gi 499224711|type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes]; Sequence 2 (S2): SEQ ID NO: 27|WP_039695303|gi 746743737|type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus gallolyticus]; Sequence 3 (S3): SEQ ID NO: 28|WP_045635197|gi 782887988|type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mitis]; Sequence 4 (S4): SEQ ID NO: 29|5AXW_A|gi 924443546|Staphylococcus aureus Cas9. The HNH domain (bold and underlined) and the RuvC domain (boxed) are identified for each of the four sequences. Amino acid residues 10 and 840 in S1 and the homologous amino acids in the aligned sequences are identified with an asterisk following the respective amino acid residue.

S1     1  --MDKK-YSIGLD*IGTNSVGWAVITDEYKVESKKEKVLGNTDRESIKKNLI--GALLEDSG--ETAKATRLKRTARRRYT  73
S2     1  --MTKKNYSIGLD*IGTNSVGWAVITDDYKVPAKKMKVIGNTDKEYIKKNLL--GALLEDSG--ETAKATRLKRTARRRYT  74
S3     1  --M-KKGYSIGLD*IGTNSVGFAVITDDYKVESKEMEVLGNTDERFIKKNLI--GALLFDEG--TTAKARRLKRTARRRYT  73
S4     1  GSHMKRNYILGLD*IGITSVGYGII--DYET-----------------RDVIDAGVRIFKEANVENNEGRRSKRGARRLKR  61

S1    74  RRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRL  153
S2    75  RRKNRLRYLQEIFANEIAKVDESFFQRLDESFLTDDDKTEDSHPIFGNKAEEDAYHQKFPTIYHLRKHLADSSEKADLRL  154
S3    74  RRKNRLRYLQEIFSEEMSKVDSSFFHRLDDSFLIPEDKRESKYPIFATLTEEKEYHKQFPTIYHLRKQLADSKEKTDLRL  153
S4    62  RRRHRIQRVKKLL--------------FDYNLLTD--------------------HSELSGINPYEARVKGLSQKLSEEE  107

S1   154  IYLALAHMIKERGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEK  233
S2   155  VYLALAHMIKFRGHFLIEGELNAENTDVQKIFADFVGVYNRTFDDSHLSEITVDVASILTEKISKSRRLENLIKYYPTEK  234
S3   154  IYLALAHMIKYRGHFLYEEAFDIKNNDIQKIFNEFISIYDNTFEGSSLSGQNAQVEAIFTDKISKSAKRERVLKLEPDEK  233
S4   108  FSAALLHLAKRRG----------------------VHNVNEVEEDT----------------------------------  131

S1   234  KNGLFGNLIALSLGLTPNEKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEIT  313
S2   235  KNTLFGNLIALALGLQPNEKTNFKLSEDAKLQFSKDTYEEDLEELLGKIGDDYADLFTSAKNLYDAILLSGILTVDDNST  314
S3   234  STGLFSEFLKLIVGNQADFKKHFDLEDKAPLQFSKDTYDEDLENLLGQIGDDFTDLFVSAKKLYDAILLSGILTVTDPST  313
S4   132  -----GNELS------------------TKEQISRN--------------------------------------------  144

S1   314  KAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKM--DGTEELLV  391
S2   315  KAPLSASMIKRYVEHHEDLEKLKEFIKANKSELYHDIFKDKNKNGYAGYIENGVKQDEFYKYLKNILSKIKIDGSDYFLD  394
S3   314  KAPLSASMIERYENHQNDLAALKQFIKNNLPEKYDEVFSDQSKDGYAGYIDGKTTQETFYKYIKNLLSKF--EGTDYFLD  391
S4   145  ----SKALEEKYVAELQ-------------------------------------------------LERLKKDG------  165

S1   392  KLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEE  471
S2   395  KIEREDFLRKQRTFDNGSIPHQIHLQEMHAILRRQGDYYPFLKEKQDRIEKILTFRIPYYVGPLVRKDSRFAWAEYRSDE  474
S3   392  KIEREDFLRKQRTFDNGSIPHQIHLQEMNAILRRQGEYYPFLKDNKEKIEKILTFRIPYYVGPLARGNRDFAWLTRNSDE  471
S4   166  --EVRGSINRFKTSD--------YVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGP--GEGSPFGW------K  227

S1   472  TITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL  551
S2   475  KITPWNFDKVIDKEKSAEKFITRMTLNDLYLPEEKVLPKHSHVYETYAVYNELTKIKYVNEQGKE-SFFDSNMKQEIFDH  553
S3   472  AIRPWNFEEIVDKASSAEDFINKMTNYDLYLPEEKVLPKHSLLYETFAVYNELTKVKFIAEGLRDYQFLDSGQKKQIVNQ  551
S4   228  DIKEW---------------YEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEK---LEYYEKFQIIEN  289

S1   552  LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDR---FNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFED  628
S2   554  VFKENRKVTKEKLLNYLNKEFFEYRIKDLIGLDKENKSFNASLGTYHDLKKIL-DKAFLDDKVNEEVIEDIIKTLTLFED  632
S3   552  LEKENRKVTEKDIIHYLHN-VDGYDGIELKGIEKQ---FNASLSTYHDLLKIIKDKEEMDDAKNEAILENIVHTLTIFED  627
S4   290  VFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEF---TNLKVYHDIKDITARKEII---ENAELLDQIAKILTIYQS  363

S1   629  REMIEERLKTYAHLFDDKVMKQLKR-RRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKED  707
S2   633  KDMIHERLQKYSDIFTANQLKKLER-RHYTGWGRLSYKLINGIRNKENNKTILDYLIDDGSANRNFMQLINDDTLPFKQI  711
S3   628  REMIKQRLAQYDSLFDEKVIKALTR-RHYTGWGKLSAKLINGICDKQTGNTILDYLIDDGKINRNFMQLINDDGLSFKEI  706
S4   364  SEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDE-----LWHTNDNQTAIENRLKLVP----------  428

S1   708  [embedded image]  781
S2   712  [embedded image]  784
S3   707  [embedded image]  779
S4   429  [embedded image]  505

S1   782  KRIEEGIKELGSQIL-------KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD----YDVDH*IVPQSFLKDD  850
S2   785  KKLQNSLKELGSNILNEEKPSYIEDKVENSHLQNDQLFLYYIQNGKDMYTGDELDIDHLSD----YDIDH*IIPQAFIKDD  860
S3   780  KRIEDSLKILASGL---DSNILKENPTDNNQLQNDRLFLYYLQNGKDMYTGEALDINQLSS----YDIDH*IIPQAFIKDD  852
S4   506  ERIEEIIRTTGK---------------ENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDH*IIPRSVSFDN  570

S1   851  [embedded image]  922
S2   861  [embedded image]  932
S3   853  [embedded image]  924
S4   571  [embedded image]  650

S1   923  [embedded image]  1002
S2   933  [embedded image]  1012
S3   925  [embedded image]  1004
S4   651  [embedded image]  712

S1  1003  [embedded image]  1077
S2  1013  [embedded image]  1083
S3  1005  [embedded image]  1081
S4   713  [embedded image]  764

S1  1078  [embedded image]  1149
S2  1084  [embedded image]  1158
S3  1082  [embedded image]  1156
S4   765  [embedded image]  835

S1  1150  EKGKSKKLKSVKELLGITIMERSSFEKNPI-DFLEAKG------YKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKG  1223
S2  1159  EKGKAKKLKTVKELVGISIMERSFFEENPV-EFLENKG------YHNIREDKLIKLPKYSLFEFEGGRRRLLASASELQKG  1232
S3  1157  EKGKAKKLKTVKTLVGITIMEKAAFEENPI-TFLENKG------YHNVRKENILCLPKYSLFELENGRRRLLASAKELQKG  1230
S4   836  DPQTYQKLK---------LIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKV  907

S1  1224  NELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEITEQISEFSKRVILADANLDKVLSAYNKH------  1297
S2  1233  NEMVLPGYLVELLYHAHRADNF-----NSTEYLNYVSEHKKEFEKVLSCVEDFANLYVDVEKNLSKIRAVADSM------  1301
S3  1231  NEIVLPVYLTTLLYHSKNVHKL-----DEPGHLEYIQKHRNEFKDLLNLVSEFSQKYVLADANLEKIKSLYADN------  1299
S4   908  VKLSLKPYRFD-VYLDNGVYKFV-----TVKNLDVIK--KENYYEVNSKAYEEAKKLKKISNQAEFIASFYNNDLIKING  979

S1  1298  RDKPIREQAENITHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSIT--------GLYETRI----DLSQL  1365
S2  1302  DNFSIEEISNSFINLLTLTALGAPADFNFLGEKIPRKRYTSTKECLNATLIHQSIT--------GLYETRI----DLSKL  1369
S3  1300  EQADIEILANSFINLLTFTALGAPAAFKFFGKDIDRKRYTTVSEILNATLIHQSIT--------GLYETWI----DLSKL  1367
S4   980  ELYRVIGVNNDLLNRIEVNMIDITYR-EYLENMNDKRPPRIIKTIASKT---QSIKKYSTDILGNLYEVKSKKHPQIIKK  1055

S1  1366  GGD  1368
S2  1370  GEE  1372
S3  1368  GED  1370
S4  1056  G--  1056

The alignment demonstrates that amino acid sequences and amino acid residues that are homologous to a reference Cas9 amino acid sequence or amino acid residue can be identified across Cas9 sequence variants, including, but not limited to, Cas9 sequences from different species, by identifying the amino acid sequence or residue that aligns with the reference sequence or the reference residue using alignment programs and algorithms known in the art. This disclosure provides Cas9 variants in which one or more of the amino acid residues identified by an asterisk in SEQ ID NOs: 1 and 27-29 (e.g., S1, S2, S3, and S4, respectively) are mutated as described herein. The residues D10 and H840 in Cas9 of SEQ ID NO: 1 that correspond to the residues identified in SEQ ID NOs: 1 and 27-29 by an asterisk are referred to herein as “homologous” or “corresponding” residues. Such homologous residues can be identified by sequence alignment, e.g., as described above, and by identifying the sequence or residue that aligns with the reference sequence or residue. Similarly, mutations in Cas9 sequences that correspond to mutations identified in SEQ ID NO: 1 herein, e.g., mutations of residues 10 and 840 in SEQ ID NO: 1, are referred to herein as “homologous” or “corresponding” mutations. For example, the mutations corresponding to the D10A mutation in SEQ ID NO: 1 (S1) for the four aligned sequences above are D11A for S2, D10A for S3, and D13A for S4; the corresponding mutations for H840A in SEQ ID NO: 1 (S1) are H850A for S2, H842A for S3, and H560A for S4.
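As an illustration of how a corresponding residue can be read off an alignment, the following is a minimal sketch that walks two aligned rows column by column and counts non-gap residues. The function names `strip_marks` and `map_residue` are hypothetical, and only the first twenty columns of the alignment rows S1-S4 are used, with the asterisk markers removed; this is one simple way to perform the lookup and is not a substitute for the alignment programs known in the art.

```python
from typing import Optional, Tuple


def strip_marks(row: str) -> str:
    """Remove the asterisk markers that flag residues of interest."""
    return row.replace("*", "")


def map_residue(ref_row: str, target_row: str, ref_pos: int,
                ref_start: int = 1, target_start: int = 1
                ) -> Optional[Tuple[int, str]]:
    """Map 1-based residue `ref_pos` of the reference to the target.

    Both rows must come from the same alignment block and use '-' for gaps.
    `ref_start` and `target_start` are the residue numbers at which each row
    begins. Returns (position, residue) in the target, or None if the
    reference residue is absent from the block or aligns to a gap.
    """
    ref_row, target_row = strip_marks(ref_row), strip_marks(target_row)
    if len(ref_row) != len(target_row):
        raise ValueError("aligned rows must have equal length")
    ref_i, tgt_i = ref_start - 1, target_start - 1
    for r, t in zip(ref_row, target_row):
        if r != "-":
            ref_i += 1
        if t != "-":
            tgt_i += 1
        if r != "-" and ref_i == ref_pos:
            return (tgt_i, t) if t != "-" else None
    return None


# First columns of the alignment rows shown above (assumed excerpt).
S1 = "--MDKK-YSIGLD*IGTNSVG"
S2 = "--MTKKNYSIGLD*IGTNSVG"
S3 = "--M-KKGYSIGLD*IGTNSVG"
S4 = "GSHMKRNYILGLD*IGITSVG"

if __name__ == "__main__":
    for name, row in (("S2", S2), ("S3", S3), ("S4", S4)):
        print(name, map_residue(S1, row, 10))
```

Applied to residue 10 of S1, the sketch returns positions 11, 10, and 13 for S2, S3, and S4, consistent with the D11A, D10A, and D13A mutations named in the text.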


A total of 250 Cas9 sequences (SEQ ID NOs: 1 and 27-275) from different species are provided. Amino acid residues corresponding to residues 10 and 840 of SEQ ID NO: 1 may be identified in the same manner as outlined above. All of these Cas9 sequences may be used in accordance with the present disclosure.

  • WP_010922251.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 1
  • WP_039695303.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus gallolyticus] SEQ ID NO: 27
  • WP_045635197.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mitis] SEQ ID NO: 28
  • 5AXW_A Cas9, Chain A, Crystal Structure [Staphylococcus aureus] SEQ ID NO: 29
  • WP_009880683.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 30
  • WP_010922251.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 31
  • WP_011054416.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 32
  • WP_011284745.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 33
  • WP_011285506.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 34
  • WP_011527619.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 35
  • WP_012560673.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 36
  • WP_014407541.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 37
  • WP_020905136.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 38
  • WP_023080005.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 39
  • WP_023610282.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 40
  • WP_030125963.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 41
  • WP_030126706.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 42
  • WP_031488318.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 43
  • WP_032460140.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 44
  • WP_032461047.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 45
  • WP_032462016.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 46
  • WP_032462936.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 47
  • WP_032464890.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 48
  • WP_033888930.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 49
  • WP_038431314.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 50
  • WP_038432938.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 51
  • WP_038434062.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pyogenes] SEQ ID NO: 52
  • BAQ51233.1 CRISPR-associated protein, Csn1 family [Streptococcus pyogenes] SEQ ID NO: 53
  • KGE60162.1 hypothetical protein MGAS2111_0903 [Streptococcus pyogenes MGAS2111] SEQ ID NO: 54
  • KGE60856.1 CRISPR-associated endonuclease protein [Streptococcus pyogenes SS1447] SEQ ID NO: 55
  • WP_002989955.1 MULTISPECIES: type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus] SEQ ID NO: 56
  • WP_003030002.1 MULTISPECIES: type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus] SEQ ID NO: 57
  • WP_003065552.1 MULTISPECIES: type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus] SEQ ID NO: 58
  • WP_001040076.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 59
  • WP_001040078.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 60
  • WP_001040080.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 61
  • WP_001040081.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 62
  • WP_001040083.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 63
  • WP_001040085.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 64
  • WP_001040087.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 65
  • WP_001040088.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 66
  • WP_001040089.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 67
  • WP_001040090.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 68
  • WP_001040091.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 69
  • WP_001040092.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 70
  • WP_001040094.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 71
  • WP_001040095.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 72
  • WP_001040096.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 73
  • WP_001040097.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 74
  • WP_001040098.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 75
  • WP_001040099.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 76
  • WP_001040100.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 77
  • WP_001040104.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 78
  • WP_001040105.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 79
  • WP_001040106.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 80
  • WP_001040107.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 81
  • WP_001040108.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 82
  • WP_001040109.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 83
  • WP_001040110.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 84
  • WP_015058523.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 85
  • WP_017643650.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 86
  • WP_017647151.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 87
  • WP_017648376.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 88
  • WP_017649527.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 89
  • WP_017771611.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 90
  • WP_017771984.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 91
  • CFQ25032.1 CRISPR-associated protein [Streptococcus agalactiae] SEQ ID NO: 92
  • CFV16040.1 CRISPR-associated protein [Streptococcus agalactiae] SEQ ID NO: 93
  • KLJ37842.1 CRISPR-associated protein Csn1 [Streptococcus agalactiae] SEQ ID NO: 94
  • KLJ72361.1 CRISPR-associated protein Csn1 [Streptococcus agalactiae] SEQ ID NO: 95
  • KLL20707.1 CRISPR-associated protein Csn1 [Streptococcus agalactiae] SEQ ID NO: 96
  • KLL42645.1 CRISPR-associated protein Csn1 [Streptococcus agalactiae] SEQ ID NO: 97
  • WP_047207273.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 98
  • WP_047209694.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 99
  • WP_050198062.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 100
  • WP_050201642.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 101
  • WP_050204027.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 102
  • WP_050881965.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 103
  • WP_050886065.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus agalactiae] SEQ ID NO: 104
  • AHN30376.1 CRISPR-associated protein Csn1 [Streptococcus agalactiae 138P] SEQ ID NO: 105
  • EAO78426.1 reticulocyte binding protein [Streptococcus agalactiae H36B] SEQ ID NO: 106
  • CCW42055.1 CRISPR-associated protein, SAG0894 family [Streptococcus agalactiae ILRI112] SEQ ID NO: 107
  • WP_003041502.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus anginosus] SEQ ID NO: 108
  • WP_037593752.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus anginosus] SEQ ID NO: 109
  • WP_049516684.1 CRISPR-associated protein Csn1 [Streptococcus anginosus] SEQ ID NO: 110
  • GAD46167.1 hypothetical protein ANG6_0662 [Streptococcus anginosus T5] SEQ ID NO: 111
  • WP_018363470.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus caballi] SEQ ID NO: 112
  • WP_003043819.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus canis] SEQ ID NO: 113
  • WP_006269658.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus constellatus] SEQ ID NO: 114
  • WP_048800889.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus constellatus] SEQ ID NO: 115
  • WP_012767106.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus dysgalactiae] SEQ ID NO: 116
  • WP_014612333.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus dysgalactiae] SEQ ID NO: 117
  • WP_015017095.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus dysgalactiae] SEQ ID NO: 118
  • WP_015057649.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus dysgalactiae] SEQ ID NO: 119
  • WP_048327215.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus dysgalactiae] SEQ ID NO: 143
  • WP_049519324.1 CRISPR-associated protein Csn1 [Streptococcus dysgalactiae] SEQ ID NO: 144
  • WP_012515931.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus equi] SEQ ID NO: 145
  • WP_021320964.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus equi] SEQ ID NO: 146
  • WP_037581760.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus equi] SEQ ID NO: 147
  • WP_004232481.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus equinus] SEQ ID NO: 148
  • WP_009854540.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus gallolyticus] SEQ ID NO: 149
  • WP_012962174.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus gallolyticus] SEQ ID NO: 150
  • WP_039695303.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus gallolyticus] SEQ ID NO: 151
  • WP_014334983.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus infantarius] SEQ ID NO: 152
  • WP_003099269.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus iniae] SEQ ID NO: 153
  • AHY15608.1 CRISPR-associated protein Csn1 [Streptococcus iniae] SEQ ID NO: 154
  • AHY17476.1 CRISPR-associated protein Csn1 [Streptococcus iniae] SEQ ID NO: 155
  • ESR09100.1 hypothetical protein IUSA1_08595 [Streptococcus iniae IUSA1] SEQ ID NO: 156
  • AGM98575.1 CRISPR-associated protein Cas9/Csn1, subtype II/NMEMI [Streptococcus iniae SF1] SEQ ID NO: 157
  • ALF27331.1 CRISPR-associated protein Csn1 [Streptococcus intermedius] SEQ ID NO: 158
  • WP_018372492.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus massiliensis] SEQ ID NO: 159
  • WP_045618028.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mitis] SEQ ID NO: 160
  • WP_045635197.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mitis] SEQ ID NO: 161
  • WP_002263549.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 162
  • WP_002263887.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 163
  • WP_002264920.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 164
  • WP_002269043.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 165
  • WP_002269448.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 166
  • WP_002271977.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 167
  • WP_002272766.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 168
  • WP_002273241.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 169
  • WP_002275430.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 170
  • WP_002276448.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 171
  • WP_002277050.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 172
  • WP_002277364.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 173
  • WP_002279025.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 174
  • WP_002279859.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 175
  • WP_002280230.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 176
  • WP_002281696.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 177
  • WP_002282247.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 178
  • WP_002282906.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 179
  • WP_002283846.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 180
  • WP_002287255.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 181
  • WP_002288990.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 182
  • WP_002289641.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 183
  • WP_002290427.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 184
  • WP_002295753.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 185
  • WP_002296423.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 186
  • WP_002304487.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 187
  • WP_002305844.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 188
  • WP_002307203.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 189
  • WP_002310390.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 190
  • WP_002352408.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 191
  • WP_012997688.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 192
  • WP_014677909.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 193
  • WP_019312892.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 194
  • WP_019313659.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 195
  • WP_019314093.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 196
  • WP_019315370.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 197
  • WP_019803776.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 198
  • WP_019805234.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 199
  • WP_024783594.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 200
  • WP_024784288.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 207
  • WP_024784666.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 208
  • WP_024784894.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 209
  • WP_024786433.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus mutans] SEQ ID NO: 210
  • WP_049473442.1 CRISPR-associated protein Csn1 [Streptococcus mutans] SEQ ID NO: 211
  • WP_049474547.1 CRISPR-associated protein Csn1 [Streptococcus mutans] SEQ ID NO: 212
  • EMC03581.1 hypothetical protein SMU69_09359 [Streptococcus mutans NLML4] SEQ ID NO: 213
  • WP_000428612.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus oralis] SEQ ID NO: 214
  • WP_000428613.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus oralis] SEQ ID NO: 215
  • WP_049523028.1 CRISPR-associated protein Csn1 [Streptococcus parasanguinis] SEQ ID NO: 216
  • WP_003107102.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus parauberis] SEQ ID NO: 217
  • WP_054279288.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus phocae] SEQ ID NO: 218
  • WP_049531101.1 CRISPR-associated protein Csn1 [Streptococcus pseudopneumoniae] SEQ ID NO: 219
  • WP_049538452.1 CRISPR-associated protein Csn1 [Streptococcus pseudopneumoniae] SEQ ID NO: 220
  • WP_049549711.1 CRISPR-associated protein Csn1 [Streptococcus pseudopneumoniae] SEQ ID NO: 221
  • WP_007896501.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus pseudoporcinus] SEQ ID NO: 222
  • EFR44625.1 CRISPR-associated protein, Csn1 family [Streptococcus pseudoporcinus SPIN 20026] SEQ ID NO: 223
  • WP_002897477.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus sanguinis] SEQ ID NO: 224
  • WP_002906454.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus sanguinis] SEQ ID NO: 225
  • WP_009729476.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus sp. F0441] SEQ ID NO: 226
  • CQR24647.1 CRISPR-associated protein [Streptococcus sp. FF10] SEQ ID NO: 227
  • WP_000066813.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus sp. M334] SEQ ID NO: 228
  • WP_009754323.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus sp. taxon 056] SEQ ID NO: 229
  • WP_044674937.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus suis] SEQ ID NO: 230
  • WP_044676715.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus suis] SEQ ID NO: 231
  • WP_044680361.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus suis] SEQ ID NO: 232
  • WP_044681799.1 type II CRISPR RNA-guided endonuclease Cas9 [Streptococcus suis] SEQ ID NO: 233
  • WP_049533112.1 CRISPR-associated protein Csn1 [Streptococcus suis] SEQ ID NO: 234
  • WP_029090905.1 type II CRISPR RNA-guided endonuclease Cas9 [Brochothrix thermosphacta] SEQ ID NO: 235
  • WP_006506696.1 type II CRISPR RNA-guided endonuclease Cas9 [Catenibacterium mitsuokai] SEQ ID NO: 236
  • AIT42264.1 Cas9hc:NLS:HA [Cloning vector pYB196] SEQ ID NO: 237
  • WP_034440723.1 type II CRISPR endonuclease Cas9 [Clostridiales bacterium S5-A11] SEQ ID NO: 238
  • AKQ21048.1 Cas9 [CRISPR-mediated gene targeting vector p(bhsp68-Cas9)] SEQ ID NO: 239
  • WP_004636532.1 type II CRISPR RNA-guided endonuclease Cas9 [Dolosigranulum pigrum] SEQ ID NO: 240
  • WP_002364836.1 MULTISPECIES: type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus] SEQ ID NO: 241
  • WP_016631044.1 MULTISPECIES: type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus] SEQ ID NO: 242
  • EMS75795.1 hypothetical protein H318_06676 [Enterococcus durans IPLA 655] SEQ ID NO: 243
  • WP_002373311.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 244
  • WP_002378009.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 245
  • WP_002407324.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 246
  • WP_002413717.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 247
  • WP_010775580.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 248
  • WP_010818269.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 249
  • WP_010824395.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 250
  • WP_016622645.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 251
  • WP_033624816.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 252
  • WP_033625576.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 253
  • WP_033789179.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecalis] SEQ ID NO: 254
  • WP_002310644.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 255
  • WP_002312694.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 256
  • WP_002314015.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 257
  • WP_002320716.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 258
  • WP_002330729.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 259
  • WP_002335161.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 260
  • WP_002345439.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 261
  • WP_034867970.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 262
  • WP_047937432.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus faecium] SEQ ID NO: 263
  • WP_010720994.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus hirae] SEQ ID NO: 264
  • WP_010737004.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus hirae] SEQ ID NO: 265
  • WP_034700478.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus hirae] SEQ ID NO: 266
  • WP_007209003.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus italicus] SEQ ID NO: 267
  • WP_023519017.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus mundtii] SEQ ID NO: 268
  • WP_010770040.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus phoeniculicola] SEQ ID NO: 269
  • WP_048604708.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus sp. AM1] SEQ ID NO: 270
  • WP_010750235.1 type II CRISPR RNA-guided endonuclease Cas9 [Enterococcus villorum] SEQ ID NO: 271
  • AII16583.1 Cas9 endonuclease [Expression vector pCas9] SEQ ID NO: 272
  • WP_029073316.1 type II CRISPR RNA-guided endonuclease Cas9 [Kandleria vitulina] SEQ ID NO: 273
  • WP_031589969.1 type II CRISPR RNA-guided endonuclease Cas9 [Kandleria vitulina] SEQ ID NO: 274
  • KDA45870.1 CRISPR-associated protein Cas9/Csn1, subtype II/NMEMI [Lactobacillus animalis] SEQ ID NO: 275
  • WP_039099354.1 type II CRISPR RNA-guided endonuclease Cas9 [Lactobacillus curvatus] SEQ ID NO: 521
  • AKP02966.1 hypothetical protein ABB45_04605 [Lactobacillus farciminis] SEQ ID NO: 522
  • WP_010991369.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria innocua] SEQ ID NO: 523
  • WP_033838504.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria innocua] SEQ ID NO: 524
  • EHN60060.1 CRISPR-associated protein, Csn1 family [Listeria innocua ATCC 33091] SEQ ID NO: 525
  • EFR89594.1 crispr-associated protein, Csn1 family [Listeria innocua FSL S4-378] SEQ ID NO: 526
  • WP_038409211.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria ivanovii] SEQ ID NO: 527
  • EFR95520.1 crispr-associated protein Csn1 [Listeria ivanovii FSL F6-596] SEQ ID NO: 528
  • WP_003723650.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 529
  • WP_003727705.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 530
  • WP_003730785.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 531
  • WP_003733029.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 532
  • WP_003739838.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 533
  • WP_014601172.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 534
  • WP_023548323.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 535
  • WP_031665337.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 536
  • WP_031669209.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 537
  • WP_033920898.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria monocytogenes] SEQ ID NO: 538
  • AKI42028.1 CRISPR-associated protein [Listeria monocytogenes] SEQ ID NO: 539
  • AKI50529.1 CRISPR-associated protein [Listeria monocytogenes] SEQ ID NO: 540
  • EFR83390.1 crispr-associated protein Csn1 [Listeria monocytogenes FSL F2-208] SEQ ID NO: 541
  • WP_046323366.1 type II CRISPR RNA-guided endonuclease Cas9 [Listeria seeligeri] SEQ ID NO: 542
  • AKE81011.1 Cas9 [Plant multiplex genome editing vector pYLCRISPR/Cas9Pubi-H] SEQ ID NO: 543
  • CUO82355.1 Uncharacterized protein conserved in bacteria [Roseburia hominis] SEQ ID NO: 544
  • WP_033162887.1 type II CRISPR RNA-guided endonuclease Cas9 [Sharpea azabuensis] SEQ ID NO: 545
  • AGZ01981.1 Cas9 endonuclease [synthetic construct] SEQ ID NO: 546
  • AKA60242.1 nuclease deficient Cas9 [synthetic construct] SEQ ID NO: 547
  • AKS40380.1 Cas9 [Synthetic plasmid pFC330] SEQ ID NO: 548
  • 4UN5_B Cas9, Chain B, Crystal Structure SEQ ID NO: 549


Cytosine Deaminase Domains

In some embodiments, nucleobase editors that convert a C to T comprise a cytosine deaminase. A “cytosine deaminase” refers to an enzyme that catalyzes the chemical reaction “cytosine + H2O → uracil + NH3” or “5-methyl-cytosine + H2O → thymine + NH3.” As is apparent from the reaction formulas, such chemical reactions result in a C to U/T nucleobase change. In the context of a gene, such a nucleotide change, or mutation, may in turn lead to an amino acid change in the encoded protein, which may affect the protein's function, e.g., loss-of-function or gain-of-function. In some embodiments, the C to T nucleobase editor comprises a dCas9 or nCas9 fused to a cytosine deaminase. In some embodiments, the cytosine deaminase domain is fused to the N-terminus of the dCas9 or nCas9.
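As a rough illustration of how a single C to T change in a gene can alter the encoded amino acid, the following Python sketch applies the change to the coding strand of a codon and translates the result with the standard genetic code. The function names (`c_to_t`, `translate`) and the example codons are hypothetical and are not taken from this disclosure; the sketch models only the net sequence outcome, not the enzymatic mechanism.

```python
# Illustrative sketch only: models the net C-to-T outcome on a coding strand.
# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def c_to_t(dna, index):
    """Return a copy of `dna` with the C at 0-based `index` replaced by T."""
    if dna[index] != "C":
        raise ValueError(f"expected C at index {index}, found {dna[index]}")
    return dna[:index] + "T" + dna[index + 1:]


def translate(dna):
    """Translate complete codons of a coding-strand DNA string ('*' = stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    for codon, index in (("CAG", 0), ("CCA", 1)):
        edited = c_to_t(codon, index)
        print(codon, translate(codon), "->", edited, translate(edited))
```

In this sketch, editing the first base of a glutamine codon (CAG) yields a stop codon (TAG), and editing the second base of a proline codon (CCA) yields a leucine codon (CTA), which illustrates the loss-of-function and missense outcomes mentioned above.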


Non-limiting examples of suitable cytosine deaminase domains are provided below, as SEQ ID NOs: 276-298 and 487.











Human AID



(SEQ ID NO: 276)



MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLD






FGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC






ARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQ






IAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRIL






LPLYEVDDLRDAFRTLGL






Mouse AID



(SEQ ID NO: 277)



MDSLLMKQKKFLYHFKNVRWAKGRHETYLCYVVKRRDSATSCSLD






FGHLRNKSGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC






ARHVAEFLRWNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQ






IGIMTFKDYFYCWNTFVENRERTFKAWEGLHENSVRLTRQLRRIL






LPLYEVDDLRDAFRMLGF






Dog AID



(SEQ ID NO: 278)



MDSLLMKQRKFLYHFKNVRWAKGRHETYLCYVVKRRDSATSFSLD






FGHLRNKSGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC






ARHVADFLRGYPNLSLRIFAARLYFCEDRKAEPEGLRRLHRAGVQ






IAIMTFKDYFYCWNTFVENREKTFKAWEGLHENSVRLSRQLRRIL






LPLYEVDDLRDAFRTLGL






Bovine AID



(SEQ ID NO: 279)



MDSLLKKQRQFLYQFKNVRWAKGRHETYLCYVVKRRDSPTSFSLD






FGHLRNKAGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDC






ARHVADFLRGYPNLSLRIFTARLYFCDKERKAEPEGLRRLHRAGV






QIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRI






LLPLYEVDDLRDAFRTLGL






Mouse APOBEC-3



(SEQ ID NO: 280)



MGPFCLGCSHRKCYSPIRNLISQETFKFHFKNLGYAKGRKDTFLC






YEVTRKDCDSPVSLHHGVFKNKDNIHAEICFLYWFHDKVLKVLSP






REEFKITWYMSWSPCFECAEQIVRFLATHHNLSLDIFSSRLYNVQ






DPETQQNLCRLVQEGAQVAAMDLYEFKKCWKKFVDNGGRRFRPWK






RLLTNFRYQDSKLQEILRPCYIPVPSSSSSTLSNICLTKGLPETR






FCVEGRRMDPLSEEEFYSQFYNQRVKHLCYYHRMKPYLCYQLEQF






NGQAPLKGCLLSEKGKQHAEILFLDKIRSMELSQVTITCYLTWSP






CPNCAWQLAAFKRDRPDLILHIYTSRLYFHWKRPFQKGLCSLWQS






GILVDVMDLPQFTDCWTNFVNPKRPFWPWKGLEIISRRTQRRLRR






IKESWGLQDLVNDFGNLQLGPPMS






Rat APOBEC-3



(SEQ ID NO: 281)



MGPFCLGCSHRKCYSPIRNLISQETFKFHFKNLRYAIDRKDTFLC






YEVTRKDCDSPVSLHHGVFKNKDNIHAEICFLYWFHDKVLKVLSP






REEFKITWYMSWSPCFECAEQVLRFLATHHNLSLDIFSSRLYNIR






DPENQQNLCRLVQEGAQVAAMDLYEFKKCWKKFVDNGGRRFRPWK






KLLTNFRYQDSKLQEILRPCYIPVPSSSSSTLSNICLTKGLPETR






FCVERRRVHLLSEEEFYSQFYNQRVKHLCYYHGVKPYLCYQLEQF






NGQAPLKGCLLSEKGKQHAEILFLDKIRSMELSQVIITCYLTWSP






CPNCAWQLAAFKRDRPDLILHIYTSRLYFHWKRPFQKGLCSLWQS






GILVDVMDLPQFTDCWTNFVNPKRPFWPWKGLEIISRRTQRRLHR






IKESWGLQDLVNDFGNLQLGPPMS







Rhesus macaque APOBEC-3G




(SEQ ID NO: 130)



MVEPMDPRTFVSNFNNRPILSGLNTVWLCCEVKTKDPSGPPLDAK






IFQGKVYSKAKYHPEMRFLRWFHKWRQLHHDQEYKVTWYVSWSPC






TRCANSVATFLAKDPKVTLTIFVARLYYFWKPDYQQALRILCQKR






GGPHATMKIMNYNEFQDCWNKFVDGRGKPFKPRNNLPKHYTLLQA






TLGELLRHLMDPGTFTSNFNNKPWVSGQHETYLCYKVERLHNDTW






VPLNQHRGFLRNQAPNIHGFPKGRHAELCFLDLIPFWKLDGQQYR






VTCFTSWSPCFSCAQEMAKFISNNEHVSLCIFAARIYDDQGRYQE






GLRALHRDGAKIAMMNYSEFEYCWDTFVDRQGRPFQPWDGLDEHS






QALSGRLRAI



(italic: nucleic acid editing domain; underline: cytoplasmic localization signal)






Chimpanzee APOBEC-3G



(SEQ ID NO: 131)



MKPHFRNPVERMYQDTFSDNFYNRPILSHRNTVWLCYEVKTKGPS






RPPLDAKIFRGQVYSKLKYHPEMRFFHWFSKWRKLHRDQEYEVTW






YISWSPCTKCTRDVATFLAEDPKVTLTIFVARLYYFWDPDYQEAL






RSLCQKRDGPRATMKIMNYDEFQHCWSKFVYSQRELFEPWNNLPK






YYILLHIMLGEILRHSMDPPTFTSNFNNELWVRGRHETYLCYEVE






RLHNDTWVLLNQRRGFLCNQAPHKHGFLEGRHAELCFLDVIPFWK






LDLHQDYRVTCFTSWSPCFSCAQEMAKFISNNKHVSLCIFAARIY






DDQGRCQEGLRTLAKAGAKISIMTYSEFKHCWDTFVDHQGCPFQP






WDGLEEHSQALSGRLRAILQNQGN






Green monkey APOBEC-3G



(SEQ ID NO: 132)



MNPQIRNMVEQMEPDIFVYYFNNRPILSGRNTVWLCYEVKTKDPS






GPPLDANIFQGKLYPEAKDHPEMKFLHWFRKWRQLHRDQEYEVTW






YVSWSPCTRCANSVATFLAEDPKVTLTIFVARLYYFWKPDYQQAL






RILCQERGGPHATMKIMNYNEFQHCWNEFVDGQGKPFKPRKNLPK






HYTLLHATLGELLRHVMDPGTFTSNFNNKPWVSGQRETYLCYKVE






RSHNDTWVLLNQHRGFLRNQAPDRHGFPKGRHAELCFLDLIPFWK






LDDQQYRVTCFTSWSPCFSCAQKMAKFISNNKHVSLCIFAARIYD






DQGRCQEGLRTLHRDGAKIAVMNYSEFEYCWDTFVDRQGRPFQPW






DGLDEHSQALSGRLRAI






Human APOBEC-3G



(SEQ ID NO: 133)



MKPHFRNTVERMYRDTFSYNFYNRPILSRRNTVWLCYEVKTKGPS






RPPLDAKIFRGQVYSELKYHPEMRFFHWFSKWRKLHRDQEYEVTW






YISWSPCTKCTRDMATFLAEDPKVTLTIFVARLYYFWDPDYQEAL






RSLCQKRDGPRATMKIMNYDEFQHCWSKFVYSQRELFEPWNNLPK






YYILLHIMLGEILRHSMDPPTFTFNFNNEPWVRGRHETYLCYEVE






RMHNDTWVLLNQRRGFLCNQAPHKHGFLEGRHAELCFLDVIPFWK






LDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKHVSLCIFTARIY






DDQGRCQEGLRTLAEAGAKISIMTYSEFKHCWDTFVDHQGCPFQP






WDGLDEHSQDLSGRLRAILQNQEN






Human APOBEC-3F



(SEQ ID NO: 134)



MKPHFRNTVERMYRDTFSYNFYNRPILSRRNTVWLCYEVKTKGPS






RPRLDAKIFRGQVYSQPEHHAEMCFLSWFCGNQLPAYKCFQITWF






VSWTPCPDCVAKLAEFLAEHPNVTLTISAARLYYYWERDYRRALC






RLSQAGARVKIMDDEEFAYCWENFVYSEGQPFMPWYKFDDNYAFL






HRTLKEILRNPMEAMYPHIFYFHFKNLRKAYGRNESWLCFTMEVV






KHHSPVSWKRGVFRNQVDPETHCHAERCFLSWFCDDILSPNTNYE






VTWYTSWSPCPECAGEVAEFLARHSNVNLTIFTARLYYFWDTDYQ






EGLRSLSQEGASVEIMGYKDFKYCWENFVYNDDEPFKPWKGLKYN






FLFLDSKLQEILE






Human APOBEC-3B



(SEQ ID NO: 135)



MNPQIRNPMERMYRDTFYDNFENEPILYGRSYTWLCYEVKIKRGR






SNLLWDTGVFRGQVYFKPQYHAEMCFLSWFCGNQLPAYKCFQITW






FVSWTPCPDCVAKLAEFLSEHPNVTLTISAARLYYYWERDYRRAL






CRLSQAGARVTIMDYEEFAYCWENFVYNEGQQFMPWYKFDENYAF






LHRTLKEILRYLMDPDTFTFNFNNDPLVLRRRQTYLCYEVERLDN






GTWVLMDQHMGFLCNEAKNLLCGFYGRHAELRFLDLVPSLQLDPA






QIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDY






DPLYKEALQMLRDAGAQVSIMTYDEFEYCWDTFVYRQGCPFQPWD






GLEEHSQALSGRLRAILQNQGN






Human APOBEC-3C:



(SEQ ID NO: 137)



MNPQIRNPMKAMYPGTFYFQFKNLWEANDRNETWLCFTVEGIKRR






SVVSWKTGVFRNQVDSETHCHAERCFLSWFCDDILSPNTKYQVTW






YTSWSPCPDCAGEVAEFLARHSNVNLTIFTARLYYFQYPCYQEGL






RSLSQEGVAVEIMDYEDFKYCWENFVYNDNEPFKPWKGLKTNFRL






LKRRLRESLQ






Human APOBEC-3A:



(SEQ ID NO: 138)



MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTS






VKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIY






RVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPL






YKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLD






EHSQALSGRLRAILQNQGN






Human APOBEC-3H:



(SEQ ID NO: 139)



MALLTAETFRLQFNNKRRLRRPYYPRKALLCYQLTPQNGSTPTRG






YFENKKKCHAEICFINEIKSMGLDETQCYQVTCYLTWSPCSSCAW






ELVDFIKAHDHLNLGIFASRLYYHWCKPQQKGLRLLCGSQVPVEV






MGFPKFADCWENFVDHEKPLSFNPYKMLEELDKNSRAIKRRLERI






KIPGVRAQGRYMDILCDAEV






Human APOBEC-3D



(SEQ ID NO: 140)



MNPQIRNPMERMYRDTFYDNFENEPILYGRSYTWLCYEVKIKRGR






SNLLWDTGVFRGPVLPKRQSNHRQEVYFRFENHAEMCFLSWFCGN






RLPANRRFQITWFVSWNPCLPCVVKVTKFLAEHPNVTLTISAARL






YYYRDRDWRWVLLRLHKAGARVKIMDYEDFAYCWENFVCNEGQPF






MPWYKFDDNYASLHRTLKEILRNPMEAMYPHIFYFHFKNLLKACG






RNESWLCFTMEVTKHHSAVFRKRGVFRNQVDPETHCHAERCFLSW






FCDDILSPNTNYEVTWYTSWSPCPECAGEVAEFLARHSNVNLTIF






TARLCYFWDTDYQEGLCSLSQEGASVKIMGYKDFVSCWKNFVYSD






DEPFKPWKGLQTNFRLLKRRLREILQ






Human APOBEC-1



(SEQ ID NO: 292)



MTSEKGPSTGDPTLRRRIEPWEFDVFYDPRELRKEACLLYEIKWG






MSRKIWRSSGKNTTNHVEVNFIKKFTSERDFHPSMSCSITWFLSW






SPCWECSQAIREFLSRHPGVTLVIYVARLFWHMDQQNRQGLRDLV






NSGVTIQIMRASEYYHCWRNFVNYPPGDEAHWPQYPPLWMMLYAL






ELHCIILSLPPCLKISRRWQNHLTFFRLHLQNCHYQTIPPHILLA






TGLIHPSVAWR






Mouse APOBEC-1



(SEQ ID NO: 293)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWG






GRHSVWRHTSQNTSNHVEVNFLEKFTTERYFRPNTRCSITWFLSW






SPCGECSRAITEFLSRHPYVTLFIYIARLYHHTDQRNRQGLRDLI






SSGVTIQIMTEQEYCYCWRNFVNYPPSNEAYWPRYPHLWVKLYVL






ELYCIILGLPPCLKILRRKQPQLTFFTITLQTCHYQRIPPHLLWA






TGLK






Rat APOBEC-1



(SEQ ID NO: 294)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWG






GRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSW






SPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI






SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVL






ELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWA






TGLK






Petromyzon marinus CDA1 (pmCDA1)



(SEQ ID NO: 295)



MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGER






RACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINW






YSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEKNARNQ






IGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKT






LKRAEKRRSELSIMIQVKILHTTKSPAV






Evolved pmCDA1 (evoCDA1)



(SEQ ID NO: 487)



MTDAEYVRIHEKLDIYTFKKQFSNNKKSVSHRCYVLFELKRRGER






RACFWGYAVNKPQSGTERGIHAEIFSIRKVEEYLRDNPGQFTINW






YSSWSPCADCAEKILEWYNQELRGNGHTLKIWVCKLYYEKNARNQ






IGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKT






LKRAEKRRSELSIMFQVKILHTTKSPAV






Human APOBEC3G D316R_D317R



(SEQ ID NO: 296)



MKPHFRNTVERMYRDTFSYNFYNRPILSRRNTVWLCYEVKTKGPS






RPPLDAKIFRGQVYSELKYHPEMRFFHWFSKWRKLHRDQEYEVTW






YISWSPCTKCTRDMATFLAEDPKVTLTIFVARLYYFWDPDYQEAL






RSLCQKRDGPRATMKIMNYDEFQHCWSKFVYSQRELFEPWNNLPK






YYILLHIMLGEILRHSMDPPTFTFNFNNEPWVRGRHETYLCYEVE






RMHNDTWVLLNQRRGFLCNQAPHKHGFLEGRHAELCFLDVIPFWK






LDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKHVSLCIFTARIY






RRQGRCQEGLRTLAEAGAKISIMTYSEFKHCWDTFVDHQGCPFQP






WDGLDEHSQDLSGRLRAILQNQEN






Human APOBEC3G chain A



(SEQ ID NO: 297)



MDPPTFTFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGF






LCNQAPHKHGFLEGRHAELCFLDVIPFWKLDLDQDYRVTCFTSWS






PCFSCAQEMAKFISKNKHVSLCIFTARIYDDQGRCQEGLRTLAEA






GAKISIMTYSEFKHCWDTFVDHQGCPFQPWDGLDEHSQDLSGRLR






AILQ






Human APOBEC3G chain A D120R_D121R



(SEQ ID NO: 298)



MDPPTFTFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGF






LCNQAPHKHGFLEGRHAELCFLDVIPFWKLDLDQDYRVTCFTSWS






PCFSCAQEMAKFISKNKHVSLCIFTARIYRRQGRCQEGLRTLAEA






GAKISIMTYSEFKHCWDTFVDHQGCPFQPWDGLDEHSQDLSGRLR






AILQ









Adenosine Deaminase Domains

In some embodiments, a nucleobase editor converts an A to G. In some embodiments, the nucleobase editor comprises an adenosine deaminase. An “adenosine deaminase” is an enzyme involved in purine metabolism. It is needed for the breakdown of adenosine from food and for the turnover of nucleic acids in tissues. Its primary function in humans is the development and maintenance of the immune system. An adenosine deaminase catalyzes hydrolytic deamination of adenosine (forming inosine, which base pairs as G) in the context of DNA. There are no known naturally occurring adenosine deaminases that act on DNA. Instead, known naturally occurring adenosine deaminase enzymes act only on RNA (tRNA or mRNA). Evolved deoxyadenosine deaminase enzymes that accept DNA substrates and deaminate dA to deoxyinosine for use in adenosine nucleobase editors have been described, e.g., in PCT Application No. PCT/US2017/045381, filed Aug. 3, 2017, which published as WO 2018/027078; PCT Application No. PCT/US2019/033848, filed May 23, 2019, which published as WO 2019/226953; and PCT Application No. PCT/US2020/028568, filed Apr. 17, 2020; each of which is herein incorporated by reference.
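The A to G outcome described above can be sketched in Python as a two-step string operation: the target A is first written as inosine (I), and the inosine is then read as G, so that an A•T base pair becomes a G•C base pair. The function names and the four-base example strand are hypothetical; this is a bookkeeping sketch of the sequence outcome under those assumptions, not a model of the enzyme or of cellular repair.

```python
# Illustrative sketch only: tracks the sequence outcome of A -> I -> G.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def deaminate_adenine(strand, index):
    """Replace the A at 0-based `index` with I (inosine)."""
    if strand[index] != "A":
        raise ValueError(f"expected A at index {index}, found {strand[index]}")
    return strand[:index] + "I" + strand[index + 1:]


def resolve_inosine(strand):
    """Read every inosine as G, since inosine base pairs as G."""
    return strand.replace("I", "G")


def complement(strand):
    """Return the base-by-base complement (same left-to-right order)."""
    return "".join(COMPLEMENT[base] for base in strand)


if __name__ == "__main__":
    original = "GATC"
    intermediate = deaminate_adenine(original, 1)
    edited = resolve_inosine(intermediate)
    print(original, complement(original))
    print(intermediate)
    print(edited, complement(edited))
```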


Non-limiting examples of evolved adenosine deaminases that accept DNA as substrates and that are suitable for use as adenosine deaminase domains of the disclosed adenine nucleobase editors are provided below. In some embodiments, the adenosine deaminase domain of any of the disclosed nucleobase editors comprises an amino acid sequence having at least 85% identity, at least 90% identity, at least 95% identity, at least 98% identity, or at least 99% identity to an amino acid sequence comprising any one of SEQ ID NOs: 141, 314-321, 358, 407, 409-420, 422-424, 426-431, 433, 434, 438-457, 491-495, and 514.


In some embodiments, the adenosine deaminase domain of any of the disclosed nucleobase editors comprises an amino acid sequence having at least 85% identity, at least 90% identity, at least 95% identity, at least 98% identity, or at least 99% identity to an amino acid sequence comprising SEQ ID NO: 492 (TadA 7.10). In some embodiments, the adenosine deaminase domain of the disclosed nucleobase editors comprises an amino acid sequence comprising SEQ ID NO: 492.


In some embodiments, the adenosine deaminase domain of any of the disclosed nucleobase editors comprises an amino acid sequence having at least 85% identity, at least 90% identity, at least 95% identity, at least 98% identity, or at least 99% identity to an amino acid sequence comprising SEQ ID NO: 494 (TadA-8e). In some embodiments, the adenosine deaminase domain of the disclosed nucleobase editors comprises an amino acid sequence comprising SEQ ID NO: 494.
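To make the identity thresholds above concrete, the following Python sketch computes percent identity between two sequences that are assumed to be already aligned and of equal length, and compares the result against a threshold. This disclosure does not prescribe a particular alignment tool or identity formula in this passage, so the denominator used here (alignment length) and the function names are assumptions. The two example strings are the 71-residue middle segments of ecTadA (SEQ ID NO: 314) and ecTadA (D108N) (SEQ ID NO: 315) as listed below.

```python
# Illustrative sketch only: identity over pre-aligned, equal-length sequences.
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns holding the same residue ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity(aligned_a, aligned_b, threshold):
    """True if percent identity is at least `threshold` (e.g., 85, 90, 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Middle segments of SEQ ID NO: 314 and SEQ ID NO: 315 (one substitution).
WT_SEGMENT = (
    "QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC"
)
D108N_SEGMENT = (
    "QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC"
)

if __name__ == "__main__":
    print(round(percent_identity(WT_SEGMENT, D108N_SEGMENT), 2))
    for threshold in (85, 90, 95, 98, 99):
        print(threshold, meets_identity(WT_SEGMENT, D108N_SEGMENT, threshold))
```

Over this 71-residue segment a single substitution leaves 70 of 71 positions identical, so the pair satisfies the 98% threshold but not the 99% threshold; over the full-length sequences the same substitution would give a higher value.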










ecTadA



(SEQ ID NO: 314)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (D108N)


(SEQ ID NO: 315)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD
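The substitution labels used in these headings (e.g., D108N) can be checked against the listed sequences with a short Python sketch. The sketch assumes that residue numbering counts an initiator methionine that is not shown in the listing, so that the first listed residue is position 2; this is an inference from the listings themselves (for example, S2A changes the first listed residue), not a statement made in this disclosure. The function name `apply_substitution` and its `first_position` parameter are hypothetical.

```python
# Illustrative sketch only: applies a label such as "D108N" to a listed sequence.
import re

# ecTadA as listed for SEQ ID NO: 314 (three lines concatenated).
ECTADA = (
    "SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM"
    "QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC"
    "AALLSDFFRMRRQEIKAQKKAQSSTD"
)


def apply_substitution(sequence, label, first_position=2):
    """Apply a substitution like 'D108N'; `first_position` numbers sequence[0]."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - first_position
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


if __name__ == "__main__":
    variant = apply_substitution(ECTADA, "D108N")
    print(variant[69:140])  # middle line of the variant listing
```

Raising an error when the expected wild-type residue is not found is a deliberate choice here, since it flags labels whose numbering does not match the listing.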





ecTadA (D108G)


(SEQ ID NO: 316)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARGAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (D108V)


(SEQ ID NO: 317)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARVAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (H8Y, D108N, N127S)


(SEQ ID NO: 318)



SEVEFSYEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMSHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (H8Y, D108N, N127S, E155D)


(SEQ ID NO: 319)



SEVEFSYEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMSHRVEITEGILADEC





AALLSDFFRMRRQDIKAQKKAQSSTD





ecTadA (H8Y, D108N, N127S, E155G)


(SEQ ID NO: 320)



SEVEFSYEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMSHRVEITEGILADEC





AALLSDFFRMRRQGIKAQKKAQSSTD





ecTadA (H8Y, D108N, N127S, E155V)


(SEQ ID NO: 321)



SEVEFSYEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMSHRVEITEGILADEC





AALLSDFFRMRRQVIKAQKKAQSSTD





ecTadA (A106V, D108N, D147Y, and E155V)


(SEQ ID NO: 407)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSYFFRMRRQVIKAQKKAQSSTD





ecTadA (S2A, I49F, A106V, D108N, D147Y, E155V)


(SEQ ID NO: 409)




AEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPFGRHDPTAHAEIMALRQGGLVM







QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSYFFRMRRQVIKAQKKAQSSTD





ecTadA (H8Y, A106T, D108N, N127S, K160S)


(SEQ ID NO: 410)



SEVEFSYEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGTRNAKTGAAGSLMDVLHHPGMSHRVEITEGILADEC





AALLSDFFRMRRQEIKAQSKAQSSTD





ecTadA (R26G, L84F, A106V, R107H, D108N, H123Y, A142N, A143D, D147Y, E155V, I156F)


(SEQ ID NO: 411)



SEVEFSHEYWMRHALTLAKRAWDEGEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVHNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NDLLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (E25G, R26G, L84F, A106V, R107H, D108N, H123Y, A142N, A143D, D147Y, E155V, I156F)


(SEQ ID NO: 412)



SEVEFSHEYWMRHALTLAKRAWDGGEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM





QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVHNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NDLLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (E25D, R26G, L84F, A106V, R107K, D108N, H123Y, A142N, A143G, D147Y, E155V, I156F)


(SEQ ID NO: 413)



SEVEFSHEYWMRHALTLAKRAWDDGEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVKNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NGLLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (R26Q, L84F, A106V, D108N, H123Y, A142N, D147Y, E155V, I156F)


(SEQ ID NO: 414)



SEVEFSHEYWMRHALTLAKRAWDEQEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NALLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (E25M, R26G, L84F, A106V, R107P, D108N, H123Y, A142N, A143D, D147Y, E155V, I156F)


(SEQ ID NO: 415)



SEVEFSHEYWMRHALTLAKRAWDMGEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVPNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NDLLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (R26C, L84F, A106V, R107H, D108N, H123Y, A142N, D147Y, E155V, I156F)


(SEQ ID NO: 416)



SEVEFSHEYWMRHALTLAKRAWDECEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVHNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NALLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (L84F, A106V, D108N, H123Y, A142N, A143L, D147Y, E155V, I156F)


(SEQ ID NO: 417)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NLLLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (R26G, L84F, A106V, D108N, H123Y, A142N, D147Y, E155V, I156F)


(SEQ ID NO: 418)



SEVEFSHEYWMRHALTLAKRAWDEGEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC






NALLSYFFRMRRQVFKAQKKAQSSTD






ecTadA (R51H, L84F, A106V, D108N, H123Y, D147Y, E155V, I156F, K157N)


(SEQ ID NO: 419)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGHHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





AALLSYFFRMRRQVFNAQKKAQSSTD





ecTadA (E25A, R26G, L84F, A106V, R107N, D108N, H123Y, A142N, A143E, D147Y, E155V, I156F)


(SEQ ID NO: 420)



SEVEFSHEYWMRHALTLAKRAWDAGEVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVNNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





NELLSYFFRMRRQVFKAQKKAQSSTD





ecTadA (N37T, P48T, L84F, A106V, D108N, H123Y, D147Y, E155V, I156F)


(SEQ ID NO: 422)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHTNRVIGEGWNRTIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYLHYPGMNHRVEITEGILADEC





AALLSYFFRMRRQVFKAQKKAQSSTD





ecTadA (N37S, L84F, A106V, D108N, H123Y, D147Y, E155V, I156F)


(SEQ ID NO: 423)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHSNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLSYFFRMRRQVFKAQKKAQSSTD





ecTadA (H36L, L84F, A106V, D108N, H123Y, D147Y, E155V, I156F)


(SEQ ID NO: 424)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLSYFFRMRRQVFKAQKKAQSSTD





ecTadA (H36L, P48L, L84F, A106V, D108N, H123Y, D147Y, E155V, I156F)


(SEQ ID NO: 426)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRLIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





AALLSYFFRMRRQVFKAQKKAQSSTD





ecTadA (H36L, L84F, A106V, D108N, H123Y, D147Y, E155V, K157N, I156F)


(SEQ ID NO: 427)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLSYFFRMRRQVFNAQKKAQSSTD





ecTadA (H36L, L84F, A106V, D108N, H123Y, S146C, D147Y, E155V, I156F)


(SEQ ID NO: 428)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLCYFFRMRRQVFKAQKKAQSSTD





ecTadA (L84F, A106V, D108N, H123Y, S146R, D147Y, E155V, I156F)


(SEQ ID NO: 429)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





AALLRYFFRMRRQVFKAQKKAQSSTD





ecTadA (N37S, R51H, L84F, A106V, D108N, H123Y, D147Y, E155V, I156F)


(SEQ ID NO: 430)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHSNRVIGEGWNRPIGHHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLSYFFRMRRQVFKAQKKAQSSTD





ecTadA (R51L, L84F, A106V, D108N, H123Y, D147Y, E155V, I156F, K157N)


(SEQ ID NO: 431)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGLHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





AALLSYFFRMRRQVFNAQKKAQSSTD





saTadA (D108N)


(SEQ ID NO: 433)



GSHMTNDIYFMTLAIEEAKKAAQLGEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERAAKVLGSWR






LEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGADNPKGGCSGSLMNLLQQSNFNHRAIVDKGVLKEACSTLLT





TFFKNLRANKKSTN





saTadA (D107A_D108N)


(SEQ ID NO: 434)



GSHMTNDIYFMTLAIEEAKKAAQLGEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERAAKVLGSWR






LEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGAANPKGGCSGSLMNLLQQSNFNHRAIVDKGVLKEACSTLLT





TFFKNLRANKKSTN





saTadA (G26P_D107A_D108N)


(SEQ ID NO: 141)



GSHMTNDIYFMTLAIEEAKKAAQLPEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERAAKVLGSWR






LEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGAANPKGGCSGSLMNLLQQSNFNHRAIVDKGVLKEACSTLLT





TFFKNLRANKKSTN





saTadA (G26P_D107A_D108N_S142A)


(SEQ ID NO: 358)



GSHMTNDIYFMTLAIEEAKKAAQLPEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERAAKVLGSWR






LEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGAANPKGGCSGSLMNLLQQSNFNHRAIVDKGVLKEACATLL





TTFFKNLRANKKSTN





saTadA (D107A_D108N_S142A)


(SEQ ID NO: 514)



GSHMTNDIYFMTLAIEEAKKAAQLGEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERAAKVLGSWR






LEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGAANPKGGCSGSLMNLLQQSNFNHRAIVDKGVLKEACATLL





TTFFKNLRANKKSTN





ecTadA (P48S)


(SEQ ID NO: 438)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRSIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (P48T)


(SEQ ID NO: 439)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRTIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (P48A)


(SEQ ID NO: 440)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRAIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (A142N)


(SEQ ID NO: 441)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC






NALLSDFFRMRRQEIKAQKKAQSSTD






ecTadA (W23R)


(SEQ ID NO: 442)



SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECA





ALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (W23L)


(SEQ ID NO: 443)



SEVEFSHEYWMRHALTLAKRALDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECA





ALLSDFFRMRRQEIKAQKKAQSSTD





ecTadA (R152P)


(SEQ ID NO: 444)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMPRQEIKAQKKAQSSTD





ecTadA (R152H)


(SEQ ID NO: 445)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADEC





AALLSDFFRMHRQEIKAQKKAQSSTD





ecTadA (L84F, A106V, D108N, H123Y, D147Y, E155V, I156F)


(SEQ ID NO: 446)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





AALLSYFFRMRRQVFKAQKKAQSSTD
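
The substitutions named in an entry header can be checked against the residues actually listed. The following is a minimal sketch, assuming that the listed sequences begin at residue 2 (that is, the initiator methionine is counted as position 1 but not shown); this offset is an assumption inferred from the listing, and the helper names `parse_mutations`, `residue_at`, and `missing_mutations` are hypothetical. The sequence is SEQ ID NO: 446 as listed above, joined across its wrapped lines.

```python
import re

# ecTadA (L84F, A106V, D108N, H123Y, D147Y, E155V, I156F), SEQ ID NO: 446,
# joined from the three wrapped lines of the listing.
SEQ_446 = (
    "SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM"
    "QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC"
    "AALLSYFFRMRRQVFKAQKKAQSSTD"
)

# Assumption: the first listed residue is position 2 (initiator Met omitted).
FIRST_RESIDUE_NUMBER = 2

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(header):
    """Split 'L84F, A106V' or 'L84F_A106V' into (old, position, new) tuples."""
    tokens = [t for t in re.split(r"[,_\s]+", header.strip("() ")) if t]
    parsed = []
    for token in tokens:
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"not a substitution: {token!r}")
        old, pos, new = match.groups()
        parsed.append((old, int(pos), new))
    return parsed


def residue_at(sequence, position):
    """Return the residue at a 1-based position, given the assumed offset."""
    return sequence[position - FIRST_RESIDUE_NUMBER]


def missing_mutations(sequence, header):
    """Return substitutions from the header whose new residue is absent."""
    return [
        f"{old}{pos}{new}"
        for old, pos, new in parse_mutations(header)
        if residue_at(sequence, pos) != new
    ]


# An empty list means every substitution named in the header is present.
print(missing_mutations(SEQ_446, "L84F, A106V, D108N, H123Y, D147Y, E155V, I156F"))
```

The check only confirms that the substituted residue is present at each named position; it does not confirm the identity of the original residue, which would require the wild-type sequence.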





ecTadA (H36L, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, E155V, I156F, K157N)


(SEQ ID NO: 447)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRPIGLHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLCYFFRMRRQVFNAQKKAQSSTD





ecTadA (H36L, P48S, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, E155V, I156F, K157N)


(SEQ ID NO: 448)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRSIGLHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLCYFFRMRRQVFNAQKKAQSSTD





ecTadA (H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, E155V, I156F, K157N)


(SEQ ID NO: 449)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





AALLCYFFRMRRQVFNAQKKAQSSTD





ecTadA (W23L, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F, K157N)


(SEQ ID NO: 450)



SEVEFSHEYWMRHALTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLCYFFRMPRQVFNAQKKAQSSTD





ecTadA (W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F, K157N)


(SEQ ID NO: 479)



SEVEFSHEYWMRHALTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQ






NYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECA





ALLCYFFRMPRQVFNAQKKAQSSTD






Staphylococcus aureus TadA:



(SEQ ID NO: 451)



MGSHMTNDIYFMTLAIEEAKKAAQLGEVPIGAIITKDDEVIARAHNLRETLQQPTAHAEHIAIERAAKVLGSW






RLEGCTLYVTLEPCVMCAGTIVMSRIPRVVYGADDPKGGCSGSLMNLLQQSNFNHRAIVDKGVLKEACSTL





LTTFFKNLRANKKSTN






Bacillus subtilis TadA:



(SEQ ID NO: 452)



MTQDELYMKEAIKEAKKAEEKGEVPIGAVLVINGEIIARAHNLRETEQRSIAHAEMLVIDEACKALGTWRLE






GATLYVTLEPCPMCAGAVVLSRVEKVVFGAFDPKGGCSGTLMNLLQEERFNHQAEVVSGVLEEECGGMLS





AFFRELRKKKKAARKNLSE






Salmonella typhimurium (S. typhimurium) TadA:



(SEQ ID NO: 453)



MPPAFITGVTSLSDVELDHEYWMRHALTLAKRAWDEREVPVGAVLVHNHRVIGEGWNRPIGRHDPTAHAEI






MALRQGGLVLQNYRLLDTTLYVTLEPCVMCAGAMVHSRIGRVVFGARDAKTGAAGSLIDVLHHPGMNHR





VEIIEGVLRDECATLLSDFFRMRRQEIKALKKADRAEGAGPAV






Shewanella putrefaciens (S. putrefaciens) TadA:



(SEQ ID NO: 454)



MDEYWMQVAMQMAEKAEAAGEVPVGAVLVKDGQQIATGYNLSISQHDPTAHAEILCLRSAGKKLENYRL






LDATLYITLEPCAMCAGAMVHSRIARVVYGARDEKTGAAGTVVNLLQHPAFNHQVEVTSGVLAEACSAQL





SRFFKRRRDEKKALKLAQRAQQGIE






Haemophilus influenzae F3031 (H. influenzae) TadA:



(SEQ ID NO: 455)



MDAAKVRSEFDEKMMRYALELADKAEALGEIPVGAVLVDDARNIIGEGWNLSIVQSDPTAHAEIIALRNGA






KNIQNYRLLNSTLYVTLEPCTMCAGAILHSRIKRLVFGASDYKTGAIGSRFHFFDDYKMNHTLEITSGVLAEE





CSQKLSTFFQKRREEKKIEKALLKSLSDK






Caulobacter crescentus (C. crescentus) TadA:



(SEQ ID NO: 456)



MRTDESEDQDHRMMRLALDAARAAAEAGETPVGAVILDPSTGEVIATAGNGPIAAHDPTAHAEIAAMRAA






AAKLGNYRLTDLTLVVTLEPCAMCAGAISHARIGRVVFGADDPKGGAVVHGPKFFAQPTCHWRPEVTGGV





LADESADLLRGFFRARRKAKI






Geobacter sulfurreducens (G. sulfurreducens) TadA:



(SEQ ID NO: 457)



MSSLKKTPIRDDAYWMGKAIREAAKAAARDEVPIGAVIVRDGAVIGRGHNLREGSNDPSAHAEMI






AIRQAARRSANWRLTGATLYVTLEPCLMCMGAIILARLERVVFGCYDPKGGAAGSLYDLSADPRL





NHQVRLSPGVCQEECGTMLSDFFRDLRRRKKAKATPALFIDERKVPPEP






Streptococcus pyogenes (S. pyogenes) TadA:



(SEQ ID NO: 491)



MPYSLEEQTYFMQEALKEAEKSLQKAEIPIGCVIVKDGEIIGRGHNAREESNQAIMHAEIMAINEAN






AHEGNWRLLDTTLFVTIEPCVMCSGAIGLARIPHVIYGASNQKFGGADSLYQILTDERLNHRVQVE





RGLLAADCANIMQTFFRQGRERKKIAKHLIKEQSDPFD





TadA7.10:


(SEQ ID NO: 492)



SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQ






GGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNH





RVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD





TadA7.10 (V106W) (E. coli)


(SEQ ID NO: 493)



SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQ






GGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGWRNAKTGAAGSLMDVLHYPGMNH





RVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD





TadA-8e (E. coli)


(SEQ ID NO: 494)



SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQ






GGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNH





RVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN





TadA-8e (V106W) (E. coli)


(SEQ ID NO: 495)



SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQ






GGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGWRNSKRGAAGSLMNVLNYPGMNH





RVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN
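
Two entries of equal length can be compared position by position to list the substitutions that distinguish them. The following is a minimal sketch comparing TadA7.10 (SEQ ID NO: 492) with TadA-8e (SEQ ID NO: 494) as listed above. It assumes that the first listed residue is position 2 (initiator methionine omitted from the listing) and that the two sequences differ only by substitutions; the function name `substitutions` is hypothetical.

```python
# Sequences joined from the wrapped lines of the listing.
TADA_7_10 = (
    "SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQ"
    "GGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNH"
    "RVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD"
)
TADA_8E = (
    "SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQ"
    "GGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNH"
    "RVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN"
)


def substitutions(parent, variant, first_residue_number=2):
    """List differences as 'A109S'-style strings (old residue, position, new)."""
    if len(parent) != len(variant):
        # Insertions and deletions would need an alignment, not handled here.
        raise ValueError("sequences must have equal length")
    return [
        f"{p}{i}{v}"
        for i, (p, v) in enumerate(zip(parent, variant), start=first_residue_number)
        if p != v
    ]


print(substitutions(TADA_7_10, TADA_8E))
# ['A109S', 'T111R', 'D119N', 'H122N', 'Y147D', 'F149Y', 'T166I', 'D167N']
```

The same comparison applied to an entry and its V106W counterpart would be expected to report a single difference at position 106.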






In some embodiments, the adenosine deaminase domain comprises an E. coli TadA (SEQ ID NO: 314). Additional non-limiting examples of ecTadA deaminase mutants suitable for the adenine nucleobase editors of the disclosure, together with the constructs expressing nucleobase editors that comprise each modified ecTadA, are provided in Table 1.
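
The "Mutations in TadA" entries of Table 1 use a compact notation: substitutions joined by underscores, "wild-type" or "wt" for an unmodified domain, "+" between two parenthesized sets, and a trailing "x 2". The following is a minimal sketch of a parser for that notation. It assumes that "+" separates the first and second TadA domains of a construct and that "x 2" means the same set applies to both domains; this reading is an assumption, and the function name `parse_mutation_cell` is hypothetical.

```python
import re


def parse_mutation_cell(cell):
    """Return one list of substitutions per TadA domain named in a table cell."""
    text = re.sub(r"\s+", "", cell)          # cells may be wrapped across lines
    repeat = 1
    match = re.search(r"[xX](\d+)$", text)   # trailing "x 2" or "X 2"
    if match:
        repeat = int(match.group(1))
        text = text[:match.start()]
    domains = []
    for part in text.split("+"):
        part = part.strip("()")
        if part.lower() in ("wild-type", "wt"):
            domains.append([])               # no substitutions in this domain
        else:
            domains.append([m for m in part.split("_") if m])
    # Copy each list so that repeated domains do not share one object.
    return [list(d) for _ in range(repeat) for d in domains]


print(parse_mutation_cell("(wild-type) + (A106V_D108N_ D147Y_E155V)"))
print(parse_mutation_cell("(A106V_ D108N) x 2"))
```

Entries with free-text annotations (for example "E59A cat dead_") or nested notation (for example "(wt + R74Q)") are outside the scope of this sketch and would need separate handling.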









TABLE 1







EcTadA mutants for A to G nucleobase editor









Name
Construct Architecture
Mutations in TadA





pNMG-142
pCMV_ecTadA_XTEN_
wild-type



Cas9n_SGGS_NLS



pNMG-143
pCMV_ecTadA_XTEN_
D108N



Cas9n_SGGS_NLS



pNMG-144
pCMV_ecTadA_XTEN_
A106V_D108N



Cas9n_SGGS_NLS



pNMG-145
pCMV_ecTadA_XTEN_
D108G



Cas9n_SGGS_NLS



pNMG-146
pCMV_ecTadA_XTEN_
R107C_D108N



Cas9n_SGGS_NLS



pNMG-147
pCMV_ecTadA_XTEN_
D108V



Cas9n_SGGS_NLS



pNMG-155
pCMV_ecTadA_XTEN_
D108N



deadCas9_




SGGS_UGI_NLS



pNMG-156
pCMV_ecTadA_XTEN_
D108N



nCas9_SGGS_




UGI_SGGS_NLS



pNMG-157
pCMV_ecTadA_XTEN_
D108G



deadCas9_SGGS_




UGI_SGGS_NLS



pNMG-158
pCMV_ecTadA_XTEN_
D108G



nCas9_SGGS_




UGI_SGGS_NLS



pNMG-160
pCMV_ecTadA_XTEN_
D108N



nCas9_SGGS_AAG*




(E125Q)_SGGS_NLS



pNMG-161
pCMV_ecTadA_XTEN_
D108N



Cas9n_SGGS_




EndoV*(D35A)_NLS



pNMG-162
pCMV_ecTadA_XTEN_
H8Y_D108N_N127S_



Cas9n_SGGS_NLS
D147Y_Q154H


pNMG-163
pCMV_ecTadA_XTEN_
H8Y_R24W_D108N_



Cas9n_SGGS_NLS
N127S_D147Y_E155V


pNMG-164
pCMV_ecTadA_XTEN_
D108N_D147Y_E155V



Cas9n_SGGS_NLS



pNMG-165
pCMV_ecTadA_XTEN_
H8Y_D108N_N127S



Cas9n_SGGS_NLS



pNMG-171
pCMV_Cas9n_XTEN_
wild-type



ecTadA_SGGS_NLS



pNMG-172
pCMV_Cas9n_XTEN_
D108N



ecTadA_SGGS_NLS



pNMG-173
pCMV_Cas9n_XTEN_
H8Y_D108N_N127S_



ecTadA_SGGS_NLS
D147Y_Q154H


pNMG-174
pCMV_Cas9n_XTEN_
H8Y_R24W_D108N_



ecTadA_SGGS_NLS
N127S_D147Y_E155V


pNMG-175
pCMV_Cas9n_XTEN_
D108N_D147Y_E155V



ecTadA_SGGS_NLS



pNMG-176
pCMV_Cas9n_XTEN_
H8Y_D108N_N127S



ecTadA_SGGS_NLS



pNMG-177
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9n_SGGS_NLS
D147Y_E155V


pNMG-178
pCMV_ecTadA_XTEN_
D108N_D147Y_E155V



Cas9n_SGGS_




UGI_SGGS_NLS



pNMG-179
pCMV_ecTadA_
A106V_D108N_



XTEN_Cas9n_
D147Y_E155V



SGGS_AAG*(E125Q)_




SGGS_NLS



pNMG-180
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9n_SGGS_
D147Y_E155V



UGI_SGGS_NLS



pNMG-181
pCMV_ecTadA_XTEN_
D108N_D147Y_E155V



Cas9n_SGGS_AAG*




(E125Q)_SGGS_NLS



pNMG-182
pCMV_ecTadA_SGGS_
D108N_D147Y_E155V



nCas9_SGGS_NLS



pNMG-183
pCMV_ecTadA_(SGGS)2-
D108N_D147Y_E155V



XTEN-(SGGS)2_




nCas9_SGGS_NLS



pNMG-235
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9n_XTEN_AAG*
D147Y_E155V



(E125A)_SGGS_NLS



pNMG-236
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9n_XTEN_AAG*
D147Y_E155V



(E125Q)_SGGS_NLS



pNMG-237
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9n_XTEN_
D147Y_E155V



AAG*(wt)_SGGS_NLS



pNMG-238
pCMV_AAG*(E125A)_
A106V_D108N_



XTEN_ecTadA_
D147Y_E155V



XTEN_Cas9n_SGGS_NLS



pNMG-239
pCMV_AAG*(wt)_
A106V_D108N_



XTEN_ecTadA_
D147Y_E155V



XTEN_Cas9n_SGGS_NLS



pNMG-240
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9n_XTEN_
D147Y_E155V



EndoV*(D35A)_SGGS_NLS



pNMG-241
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9n_XTEN_
D147Y_E155V



EndoV*(wt)_SGGS_NLS



pNMG-242
pCMV_EndoV*(D35A)_
A106V_D108N_



XTEN_ecTadA_
D147Y_E155V



XTEN_Cas9n_SGGS_NLS



pNMG-243
pCMV_EndoV*(wt)_
A106V_D108N_



XTEN_ecTadA_




XTEN_Cas9n_SGGS_NLS
D147Y_E155V


pNMG-247
pCMV_ecTadA_XTEN_Cas9
wild-type



(wild-type)_SGGS_NLS



pNMG-248
pCMV_ecTadA_XTEN_Cas9
D108N_D147Y_



(wild-type)_SGGS_NLS
E155V


pNMG-249
pCMV_ecTadA_XTEN_Cas9
A106V_D108N_



(wild-type)_SGGS_NLS
D147Y_E155V


pNMG-250
pCMV_ecTadA_XTEN_
D108N_D147Y_



Cas9 (wild-type)_
E155V



SGGS_UGI_SGGS_NLS



pNMG-251
pCMV_ecTadA_XTEN_
A106V_D108N_



Cas9 (wild-type)_SGGS_
D147Y_E155V



AAG*(E125Q)_SGGS_NLS



pNMG-274
pCMV_ecTadA_SGGS_NLS
wild-type



(no Cas9 fusion)



pNMG-275
pCMV_ecTadA_SGGS_NLS
A106V_D108N_



(no Cas9 fusion)
D147Y_E155V


pNMG-276
pCMV_ecTadA-(SGGS)2-
(wild-type) +



XTEN-(SGGS)2_
(wild-type)



ecTadA_XTEN_nCas9_




SGGS_NLS



pNMG-277
pCMV_ecTadA-(SGGS)2-
(A106V_D108N_



XTEN-(SGGS)2_
D147Y_E155V) +



ecTadA_XTEN_nCas9_
(A106V_D108N_



SGGS_NLS
D147Y_E155V)


pNMG-278
pCMV_ecTadA_XTEN_
D108Q_D147Y_



nCas9_SGGS_NLS
E155V


pNMG-279
pCMV_ecTadA_XTEN_
D108M_D147Y_



nCas9_SGGS_NLS
E155V


pNMG-280
pCMV_ecTadA_XTEN_
D108L_D147Y_



nCas9_SGGS_NLS
E155V


pNMG-281
pCMV_ecTadA_XTEN_
D108K_D147Y_



nCas9_SGGS_NLS
E155V


pNMG-282
pCMV_ecTadA_XTEN_
D108I_D147Y_



nCas9_SGGS_NLS
E155V


pNMG-283
pCMV_ecTadA_XTEN_
D108F_D147Y_



nCas9_SGGS_NLS
E155V


pNMG-284
pCMV_ecTadA_LONGER
(wild-type) +



LINKER (92 a.a.)_
(A106V_D108N_



ecTadA_XTEN_nCas9_
D147Y_E155V)



SGGS_NLS



pNMG-285
pCMV_ecTadA_LONGER
(A106V_D108N_



LINKER (92 a.a.)_
D147Y_



ecTadA_XTEN_nCas9_
E155V) + (A106V_



SGGS_NLS
D108N_D147Y)


pNMG-285b
pCMV_ecTadA_LONGER
(A106V_D108N_



LINKER (92 a.a.)_
D147Y_



ecTadA_XTEN_nCas9_
E155V) + (A106V_



SGGS_NLS
D108N_D147Y)


pNMG-286
pCMV_ecTadA_XTEN_
A106V_D108M_



nCas9_SGGS_NLS
D147Y_E155V


pNMG-287
pCMV_ecTadA-(SGGS)2-
(A106V_D108N_



XTEN-(SGGS)2_
D147Y_E155V) +



ecTadA_XTEN-nCas9
(A106V_D108N_



(S. aureus)_SGGS_NLS
D147Y_E155V)


pNMG-289
pCMV_ecTadA-(SGGS)2-
(A106V_D108N_



XTEN-(SGGS)2_
D147Y_E155V) +



ecTadA_XTEN_nCas9_
(A106V_D108N_



SGGS_UGI_NLS
D147Y_E155V)


pNMG-290
pCMV_ecTadA-(SGGS)2-
(A106V_D108N_



XTEN-(SGGS)2_ecTadA_
D147Y_E155V) +



(SGGS)2-XTEN-(SGGS)2_
(A106V_D108N_



nCas9_SGGS_UGI_NLS
D147Y_E155V)


pNMG-293
pCMV_ecTadA_XTEN_
E59A_A106V_



Cas9n_SGGS_NLS
D108N_




D147Y_E155V


pNMG-294
pCMV_ecTadA_XTEN_
E59A



Cas9n_SGGS_NLS



pNMG-295
pCMV_ecTadA_SGGS_NLS
E59A



(no Cas9 fusion)



pNMG-296
pCMV_ecTadA_SGGS_NLS
E59A cat dead_



(no Cas9 fusion)
A106V_D108N_




D147Y_E155V


pNMG-297
pCMV_ecTadA-(SGGS)2-
(A106V_D108N_



XTEN-(SGGS)2_
D147Y_E155V) +



ecTadA_XTEN_nCas9_
(wild-type)



SGGS_NLS



pNMG-298
pCMV_ecTadA-(SGGS)2-
(D108M_D147Y_



XTEN-(SGGS)2_
E155V) + (D108M_



ecTadA_XTEN_nCas9_
D147Y_E155V)



SGGS_NLS



pNMG-320
pCMV_ecTadA-(SGGS)2-
(wild-type) +



XTEN-(SGGS)2_
(A106V_



ecTadA_XTEN_nCas9_
D108N_D147Y_



SGGS_NLS
E155V)


pNMG-321
pCMV_ecTadA-(SGGS)2-
(E59A_A106V_



XTEN-(SGGS)2_
D108N_



ecTadA_XTEN_nCas9_
D147Y_E155V) +



SGGS_NLS
(A106V_D108N_




D147Y_E155V)


pNMG-322
pCMV_ecTadA-(SGGS)2-
(A106V_D108N_



XTEN-(SGGS)2_
D147Y_



ecTadA_XTEN_nCas9_
E155V) + (E59A_



SGGS_NLS
A106V_D108N_




D147Y_E155V)


pNMG-335
pCMV_TadA3p-XTEN-
wild-type



TadA2p-XTEN-nCas9-NLS



pNMG-336
pCMV_ecTadA_(SGGS)2-
L84F_A106V_



XTEN-(SGGS)2_
D108N_H123Y_



nCas9_SGGS_UGI_
D147Y_E155V_



SGGS_NLS
I156Y


pNMG-337
pCMV_ecTadA_(SGGS)2-
A106V_D108N_



XTEN-(SGGS)2_
D147Y_E155V



nCas9_SGGS_UGI_




SGGS_NLS



pNMG-338
pCMV_ecTadA_(SGGS)2-
L84F_A106V_



XTEN-(SGGS)2_
D108N_H123Y_



nCas9_SGGS_UGI_
D147Y_E155V_



SGGS_NLS
I156F


pNMG-339
pCMV_ecTadA-(SGGS)2-
(L84F_A106V_



XTEN-(SGGS)2_
D108N_



ecTadA_(SGGS)2-
H123Y_D147Y_



XTEN-(SGGS)2_nCas9_
E155V_I156Y) +



SGGS_UGI_SGGS_NLS
(L84F_A106V_




D108N_




H123Y_D147Y_




E155V_I156Y)


pNMG-340
pCMV_ecTadA-(SGGS)
(A106V_D108N_



2-XTEN-(SGGS)2_ecTadA_
D147Y_E155V) +



(SGGS)2-XTEN-(SGGS)2_
(A106V_D108N_



nCas9_SGGS_UGI_
D147Y_E155V)



SGGS_NLS



pNMG-341
pCMV_ecTadA-(SGGS)2-
(L84F_A106V_



XTEN-(SGGS)2_
D108N_



ecTadA_(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_SGGS_
E155V_I156F) +



UGI_SGGS_NLS
(L84F_A106V_




D108N_




H123Y_D147Y_




E155V_I156F)


pNMG-345
pCMV_S. aureusTadA-
wild-type



(SGGS)2-XTEN-(SGGS)2-





S.aureusTadA-(SGGS)2-





XTEN-(SGGS)2-nCas9_




SGGS_NLS



pNMG-346
pCMV_S. aureusTadA-
(D108N) +



(SGGS)2-XTEN-(SGGS)2-
(D108N)




S.aureusTadA-(SGGS)2-





XTEN-(SGGS)2-nCas9_




SGGS_NLS



pNMG-347
pCMV_S. aureusTadA-
(D107A_D108N) +



(SGGS)2-XTEN-(SGGS)2-
(D107A_D108N)




S.aureusTadA-(SGGS)2-





XTEN-(SGGS)2-nCas9_




SGGS_NLS



pNMG-348
pCMV_S. aureusTadA-
(G26P_D107A_



(SGGS)2-XTEN-(SGGS)2-
D108N) + (G26P_




S.aureusTadA-(SGGS)2-

D107A_D108N)



XTEN-(SGGS)2-nCas9_




SGGS_NLS



pNMG-349
pCMV_S. aureusTadA-
(G26P_D107A_



(SGGS)2-XTEN-(SGGS)2-
D108N_S142A) +




S.aureusTadA-(SGGS)2-

(G26P_D107A_



XTEN-(SGGS)2-nCas9_
D108N_S142A)



SGGS_NLS



pNMG-350
pCMV_S. aureusTadA-
(D107A_D108N_



(SGGS)2-XTEN-(SGGS)2-
S142A) + (D107A_




S.aureusTadA-(SGGS)2-

D108N_S142A)



XTEN-(SGGS)2-nCas9_




SGGS_NLS



pNMG-351
pCMV_ecTadA_(SGGS)2-
(R26G_L84F_



XTEN-(SGGS)2_
A106V_



nCas9_SGGS_NLS
R107H_D108N_




H123Y_A142N_




A143D_D147Y_




E155V_I156F)


pNMG-352
pCMV_ecTadA_(SGGS)2-
(E25G_R26G_



XTEN-(SGGS)2_
L84F_A106V_



nCas9_SGGS_NLS
R107H_D108N_




H123Y_A142N_




A143D_D147Y_




E155V_I156F)


pNMG-353
pCMV_ecTadA_(SGGS)2-
(E25D_R26G_



XTEN-(SGGS)2_
L84F_A106V_



nCas9_SGGS_NLS
R107K_D108N_




H123Y_A142N_




A143G_D147Y_




E155V_I156F)


pNMG-354
pCMV_ecTadA_(SGGS)2-
(R26Q_L84F_



XTEN-(SGGS)2_
A106V_



nCas9_SGGS_NLS
D108N_H123Y_




A142N_D147Y_




E155V_I156F)


pNMG-355
pCMV_ecTadA_(SGGS)2-
(E25M_R26G_



XTEN-(SGGS)2_
L84F_A106V_



nCas9_SGGS_NLS
R107P_D108N_




H123Y_A142N_




A143D_D147Y_




E155V_I156F)


pNMG-356
pCMV_ecTadA_(SGGS)2-
(R26C_L84F_



XTEN-(SGGS)2_
A106V_R107H_



nCas9_SGGS_NLS
D108N_H123Y_




A142N_D147Y_




E155V_I156F)


pNMG-357
pCMV_ecTadA_(SGGS)2-
(L84F_A106V_



XTEN-(SGGS)2_
D108N_



nCas9_SGGS_NLS
H123Y_A142N_




A143L_D147Y_




E155V_I156F)


pNMG-358
pCMV_ecTadA_(SGGS)2-
(R26G_L84F_A106V_



XTEN-(SGGS)2_
D108N_H123Y_



nCas9_SGGS_NLS
A142N_D147Y_




E155V_I156F)


pNMG-359
pCMV_ecTadA_(SGGS)2-
(E25A_R26G_



XTEN-(SGGS)2_
L84F_A106V_



nCas9_SGGS_NLS
R107N_D108N_




H123Y_A142N_




A143E_D147Y_




E155V_I156F)


pNMG-360
pCMV_ecTadA-(SGGS)
(R26G_L84F_



2-XTEN-(SGGS)2-
A106V_R107H_



ecTadA-(SGGS)2-XTEN-
D108N_H123Y_



(SGGS)2_nCas9_
A142N_A143D_



SGGS_NLS
D147Y_E155V_




I156F) + (R26G_




L84F_A106V_




R107H_D108N_




H123Y_A142N_




A143D_D147Y_




E155V_I156F)


pNMG-361
pCMV_ecTadA-(SGGS)
(E25G_R26G_



2-XTEN-(SGGS)2-
L84F_



ecTadA-(SGGS)2-XTEN-
A106V_R107H_



(SGGS)2_nCas9_
D108N_H123Y_



SGGS_NLS
A142N_A143D_




D147Y_E155V_




I156F) X 2


pNMG-362
pCMV_ecTadA-(SGGS)
(E25G_R26G_



2-XTEN-(SGGS)2-
L84F_



ecTadA-(SGGS)2-XTEN-
A106V_R107H_



(SGGS)2_nCas9_
D108N_H123Y_



SGGS_NLS
A142N_A143D_




D147Y_E155V_




I156F) X 2


pNMG-363
pCMV_ecTadA-(SGGS)
(R26Q_L84F_



2-XTEN-(SGGS)2-
A106V_D108N_



ecTadA-(SGGS)2-XTEN-
H123Y_A142N_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) X 2


pNMG-364
pCMV_ecTadA-(SGGS)
(E25M_R26G_L84F_



2-XTEN-(SGGS)2-
A106V_R107P_



ecTadA-(SGGS)2-XTEN-
D108N_H123Y_



(SGGS)2_nCas9_
A142N_A143D_



SGGS_NLS
D147Y_E155V_




I156F) X 2


pNMG-365
pCMV_ecTadA-(SGGS)
(R26C_L84F_



2-XTEN-(SGGS)2-
A106V_



ecTadA-(SGGS)2-XTEN-
R107H_D108N_



(SGGS)2_nCas9_
H123Y_A142N_



SGGS_NLS
D147Y_E155V_




I156F) X 2


pNMG-366
pCMV_ecTadA-(SGGS)
(L84F_A106V_



2-XTEN-(SGGS)2-
D108N_H123Y_



ecTadA-(SGGS)2-XTEN-
A142N_A143L_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) X 2


pNMG-367
pCMV_ecTadA-(SGGS)
(R26G_L84F_



2-XTEN-(SGGS)2-
A106V_D108N_



ecTadA-(SGGS)2-XTEN-
H123Y_A142N_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) X 2


pNMG-368
pCMV_ecTadA-(SGGS)
(E25A_R26G_



2-XTEN-(SGGS)2-
L84F_



ecTadA-(SGGS)2-XTEN-
A106V_R107N_



(SGGS)2_nCas9_
D108N_H123Y_



SGGS_NLS
A142N_A143E_




D147Y_E155V_




I156F) X 2


pNMG-369
pCMV_ecTadA-(SGGS)2-
(L84F_A106V_



XTEN-(SGGS)2-
D108N_H123Y_



ecTadA-(SGGS)2-XTEN-
D147Y_E155V_



(SGGS)2_nCas9_
I156Y) + (L84F_



SGGS_NLS
A106V_D108N_




H123Y_D147Y_




E155V_I156Y)


pNMG-370
pCMV_ecTadA-(SGGS)
(A106V_D108N_



2-XTEN-(SGGS)2-
D147Y_E155V) +



ecTadA-(SGGS)2-XTEN-
(A106V_D108N_



(SGGS)2_nCas9_
D147Y_E155V)



SGGS_NLS



pNMG-371
pCMV_ecTadA-(SGGS)2-
(L84F_A106V_



XTEN-(SGGS)2-
D108N_H123Y_



ecTadA-(SGGS)2-XTEN-
D147Y_E155V_



(SGGS)2_nCas9_
I156F) + (L84F_



SGGS_NLS
A106V_D108N_




H123Y_D147Y_




E155V_I156F)


pNMG-372
pCMV_ecTadA_(SGGS)
A106V_D108N_



2-XTEN-(SGGS)2_
A142N_D147Y_



Cas9n_SGGS_NLS
E155V


pNMG-373
pCMV_ecTadA_(SGGS)
R26G_A106V_



2-XTEN-(SGGS)2_
D108N_A142N_



Cas9n_SGGS_NLS
D147Y_E155V


pNMG-374
pCMV_ecTadA_(SGGS)2-
E25D_R26G_



XTEN-(SGGS)2_
A106V_R107K_



Cas9n_SGGS_NLS
D108N_A142N_




A143G_D147Y_




E155V


pNMG-375
pCMV_ecTadA_(SGGS)2-
R26G_A106V_



XTEN-(SGGS)2_
D108N_R107H_



Cas9n_SGGS_NLS
A142N_A143D_




D147Y_E155V


pNMG-376
pCMV_ecTadA_(SGGS)2-
E25D_R26G_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_NLS
A142N_D147Y_




E155V


pNMG-377
pCMV_ecTadA_(SGGS)2-
A106V_R107K_



XTEN-(SGGS)2_
D108N_A142N_



Cas9n_SGGS_NLS
D147Y_E155V


pNMG-378
pCMV_ecTadA_(SGGS)2-
A106V_D108N_



XTEN-(SGGS)2_
A142N_A143G_



Cas9n_SGGS_NLS
D147Y_E155V


pNMG-379
pCMV_ecTadA_(SGGS)2-
A106V_D108N_



XTEN-(SGGS)2_
A142N_A143L_



Cas9n_SGGS_NLS
D147Y_E155V


pNMG-382
pCMV_ecTadA-(SGGS)2-
A106V_D108N_



XTEN-(SGGS)2-
A142N_D147Y_



ecTadA-(SGGS)2-
E155V X 2



XTEN-(SGGS)2_




nCas9_SGGS_NLS



pNMG-383
pCMV_ecTadA-(SGGS)2-
R26G_A106V_



XTEN-(SGGS)2-
D108N_A142N_



ecTadA-(SGGS)2-
D147Y_E155V X 2



XTEN-(SGGS)2_




nCas9_SGGS_NLS



pNMG-384
pCMV_ecTadA-(SGGS)2-
E25D_R26G_



XTEN-(SGGS)2-
A106V_R107K_



ecTadA-(SGGS)2-
D108N_A142N_



XTEN-(SGGS)2_
A143G_D147Y_



nCas9_SGGS_NLS
E155V X 2


pNMG-385
pCMV_ecTadA-(SGGS)2-
R26G_A106V_



XTEN-(SGGS)2-
D108N_



ecTadA-(SGGS)2-
R107H_A142N_



XTEN-(SGGS)2_
A143D_D147Y_



nCas9_SGGS_NLS
E155V X 2


pNMG-386
pCMV_ecTadA-(SGGS)2-
E25D_R26G_



XTEN-(SGGS)2-
A106V_D108N_



ecTadA-(SGGS)2-
A142N_D147Y_



XTEN-(SGGS)2_
E155V X 2



nCas9_SGGS_NLS



pNMG-387
pCMV_ecTadA-(SGGS)2-
A106V_R107K_



XTEN-(SGGS)2-
D108N_



ecTadA-(SGGS)2-
A142N_D147Y_



XTEN-(SGGS)2_
E155V X 2



nCas9_SGGS_NLS



pNMG-388
pCMV_ecTadA-(SGGS)2-
A106V_D108N_



XTEN-(SGGS)2-
A142N_



ecTadA-(SGGS)2-
A143G_D147Y_



XTEN-(SGGS)2_
E155V X 2



nCas9_SGGS_NLS



pNMG-389
pCMV_ecTadA-(SGGS)2-
A106V_D108N_



XTEN-(SGGS)2-
A142N_



ecTadA-(SGGS)2-
A143L_D147Y_



XTEN-(SGGS)2_
E155V X 2



nCas9_SGGS_NLS



pNMG-391
pCMV_ecTadA_(SGGS)2-
H36L_R51L_



XTEN-(SGGS)2_
L84F_



Cas9n_SGGS_
A106V_D108N_



UGI_SGGS_NLS
H123Y_S146C_




D147Y_E155V_




I156F_K157N


pNMG-392
pCMV_ecTadA_(SGGS)2-
N37T_P48T_



XTEN-(SGGS)2_
M70L_



Cas9n_SGGS_
L84F_A106V_



UGI_SGGS_NLS
D108N_H123Y_




D147Y_I49V_




E155V_I156F


pNMG-393
pCMV_ecTadA_(SGGS)2-
N37S_L84F_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_
H123Y_D147Y_



UGI_SGGS_NLS
E155V_I156F_




K161T


pNMG-394
pCMV_ecTadA_(SGGS)2-
H36L_L84F_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_
H123Y_D147Y_



UGI_SGGS_NLS
Q154H_E155V_




I156F


pNMG-395
pCMV_ecTadA_(SGGS)2-
N72S_L84F_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_
H123Y_S146R_



UGI_SGGS_NLS
D147Y_E155V_




I156F


pNMG-396
pCMV_ecTadA_(SGGS)2-
H36L_P48L_L84F_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_
H123Y_E134G_



UGI_SGGS_NLS
D147Y_E155V_




I156F


pNMG-397
pCMV_ecTadA_(SGGS)2-
H36L_L84F_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_
H123Y_D147Y_



UGI_SGGS_NLS
E155V_I156F_




K157N


pNMG-398
pCMV_ecTadA_(SGGS)2-
H36L_L84F_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_
H123Y_S146C_



UGI_SGGS_NLS
D147Y_E155V_




I156F


pNMG-399
pCMV_ecTadA_(SGGS)2-
L84F_A106V_



XTEN-(SGGS)2_
D108N_H123Y_



Cas9n_SGGS_
S146R_D147Y_



UGI_SGGS_NLS
E155V_I156F_




K161T


pNMG-400
pCMV_ecTadA_(SGGS)2-
N37S_R51H_



XTEN-(SGGS)2_
D77G_L84F_



Cas9n_SGGS_
A106V_D108N_



UGI_SGGS_NLS
H123Y_D147Y_




E155V_I156F


pNMG-401
pCMV_ecTadA_(SGGS)2-
R51L_L84F_



XTEN-(SGGS)2_
A106V_D108N_



Cas9n_SGGS_
H123Y_D147Y_



UGI_SGGS_NLS
E155V_I156F_




K157N


pNMG-402
pCMV_ecTadA-(SGGS)2-
(H36L_R51L_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_S146C_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F_K157N) x 2


pNMG-403
pCMV_ecTadA-(SGGS)2-
(N37T_P48T_



XTEN-(SGGS)2-ecTadA-
M70L_L84F_



(SGGS)2-XTEN-
A106V_D108N_



(SGGS)2_nCas9_
H123Y_D147Y_



SGGS_NLS
I49V_E155V_




I156F) x 2


pNMG-404
pCMV_ecTadA-(SGGS)2-
(N37S_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K161T) x 2


pNMG-405
pCMV_ecTadA-(SGGS)2-
(H36L_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_
Q154H_E155V_



SGGS_NLS
I156F) x 2


pNMG-406
pCMV_ecTadA-(SGGS)2-
(N72S_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_S146R_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) x 2


pNMG-407
pCMV_ecTadA-(SGGS)2-
(H36L_P48L_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_E134G_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) x 2


pNMG-408
pCMV_ecTadA-(SGGS)2-
(H36L_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K157N) x 2


pNMG-409
pCMV_ecTadA-(SGGS)2-
(H36L_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_S146C_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) x 2


pNMG-410
pCMV_ecTadA-(SGGS)2-
(L84F_A106V_



XTEN-(SGGS)2-ecTadA-
D108N_H123Y_



(SGGS)2-XTEN-
S146R_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K161T) x 2


pNMG-411
pCMV_ecTadA-(SGGS)2-
(N37S_R51H_D77G_



XTEN-(SGGS)2-ecTadA-
L84F_A106V_



(SGGS)2-XTEN-
D108N_H123Y_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) x 2


pNMG-412
pCMV_ecTadA-(SGGS)2-
(R51L_L84F_



XTEN-(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K157N) x 2


pNMG-440
pCMV_ecTadA_
D24G_Q71R_



(SGGS)2-XTEN-
L84F_H96L_



(SGGS)2_Cas9n_SGGS_
A106V_D108N_



UGI_SGGS_NLS
H123Y_D147Y_




E155V_I156F_K160E


pNMG-441
pCMV_ecTadA_
H36L_G67V_



(SGGS)2-XTEN-
L84F_A106V_



(SGGS)2_Cas9n_SGGS_
D108N_H123Y_



UGI_SGGS_NLS
S146T_D147Y_




E155V_I156F


pNMG-442
pCMV_ecTadA_
Q71L_L84F_



(SGGS)2-XTEN-
A106V_D108N_



(SGGS)2_Cas9n_SGGS_
H123Y_L137M_



UGI_SGGS_NLS
A143E_D147Y_




E155V_I156F


pNMG-443
pCMV_ecTadA_
E25G_L84F_



(SGGS)2-XTEN-
A106V_



(SGGS)2_Cas9n_SGGS_
D108N_H123Y_



UGI_SGGS_NLS
D147Y_E155V_




I156F_Q159L


pNMG-444
pCMV_ecTadA_
L84F_A91T_



(SGGS)2-XTEN-
F104I_



(SGGS)2_Cas9n_SGGS_
A106V_D108N_



UGI_SGGS_NLS
H123Y_D147Y_




E155V_I156F


pNMG-445
pCMV_ecTadA_
N72D_L84F_



(SGGS)2-XTEN-
A106V_



(SGGS)2_Cas9n_SGGS_
D108N_H123Y_



UGI_SGGS_NLS
G125A_D147Y_




E155V_I156F


pNMG-446
pCMV_ecTadA_
P48S_L84F_



(SGGS)2-XTEN-
S97C_



(SGGS)2_Cas9n_SGGS_
A106V_D108N_



UGI_SGGS_NLS
H123Y_D147Y_




E155V_I156F


pNMG-447
pCMV_ecTadA_
W23G_L84F_



(SGGS)2-XTEN-
A106V_D108N_



(SGGS)2_Cas9n_SGGS_
H123Y_D147Y_



UGI_SGGS_NLS
E155V_I156F


pNMG-448
pCMV_ecTadA_
D24G_P48L_Q71R_



(SGGS)2-XTEN-
L84F_A106V_



(SGGS)2_Cas9n_SGGS_
D108N_H123Y_



UGI_SGGS_NLS
D147Y_E155V_




I156F_Q159L


pNMG-449
pCMV_ecTadA-
(D24G_Q71R_



(SGGS)2-XTEN-
L84F_H96L_



(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K160E) x 2


pNMG-450
pCMV_ecTadA-
(H36L_G67V_



(SGGS)2-XTEN-
L84F_



(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_S146T_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) x 2


pNMG-451
pCMV_ecTadA-
(Q71L_L84F_



(SGGS)2-XTEN-
A106V_



(SGGS)2-ecTadA-
D108N_H123Y_



(SGGS)2-XTEN-
L137M_A143E_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F) x 2


pNMG-452
pCMV_ecTadA-
(E25G_L84F_



(SGGS)2-XTEN-
A106V_D108N_



(SGGS)2-ecTadA-
H123Y_D147Y_



(SGGS)2-XTEN-
E155V_I156F_



(SGGS)2_nCas9_
Q159L) x 2



SGGS_NLS



pNMG-453
pCMV_ecTadA-
(L84F_A91T_



(SGGS)2-XTEN-
F104I_A106V_



(SGGS)2-ecTadA-
D108N_H123Y_



(SGGS)2-XTEN-
D147Y_E155V_



(SGGS)2_nCas9_
I156F) x 2



SGGS_NLS



pNMG-454
pCMV_ecTadA-
(N72D_L84F_



(SGGS)2-XTEN-
A106V_D108N_



(SGGS)2-ecTadA-
H123Y_G125A_



(SGGS)2-XTEN-
D147Y_E155V_



(SGGS)2_nCas9_
I156F) x 2



SGGS_NLS



pNMG-455
pCMV_ecTadA-
(P48S_L84F_



(SGGS)2-XTEN-
S97C_A106V_



(SGGS)2-ecTadA-
D108N_H123Y_



(SGGS)2-XTEN-
D147Y_E155V_



(SGGS)2_nCas9_
I156F) x 2



SGGS_NLS



pNMG-456
pCMV_ecTadA-
(W23G_L84F_



(SGGS)2-XTEN-
A106V_



(SGGS)2-ecTadA-
D108N_H123Y_



(SGGS)2-XTEN-
D147Y_E155V_



(SGGS)2_nCas9_
I156F) x 2



SGGS_NLS



pNMG-457
pCMV_ecTadA-
(D24G_P48L_



(SGGS)2-XTEN-
Q71R_L84F_



(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
Q159L) x 2


pNMG-473
pCMV_ecTadA_(SGGS)2-
L84F_A106V_



XTEN-(SGGS)2_
D108N_H123Y_



Cas9n_SGGS_
A142N_D147Y_



UGI_SGGS_NLS
E155V_I156F


pNMG-474
pCMV_ecTadA-
L84F_A106V_



(SGGS)2-XTEN-
D108N_H123Y_



(SGGS)2-ecTadA-
A142N_D147Y_



(SGGS)2-XTEN-
E155V_



(SGGS)2_nCas9_
I156F x 2



SGGS_NLS



pNMG-475
pCMV_ecTadA-
(wild-type) +



(SGGS)2-XTEN-
(A106V_D108N_



(SGGS)2-ecTadA-
D147Y_E155V)



(SGGS)2-XTEN-




(SGGS)2_nCas9_




SGGS_NLS



pNMG-476
pCMV_ecTadA-
(wild-type) +



(SGGS)2-XTEN-
(L84F_A106V_



(SGGS)2-ecTadA-
D108N_H123Y_



(SGGS)2-XTEN-
D147Y_E155V_



(SGGS)2_nCas9_
I156F)



SGGS_NLS



pNMG-477
pCMV_ecTadA-
(wild-type) +



(SGGS)2-XTEN-
(H36L_R51L_



(SGGS)2-ecTadA-
L84F_A106V_



(SGGS)2-XTEN-
D108N_H123Y_



(SGGS)2_nCas9_
S146C_D147Y_



SGGS_NLS
E155V_I156F_




K157N)


pNMG-478
pCMV_ecTadA-
(wild-type) +



(SGGS)2-XTEN-
(N37S_L84F_



(SGGS)2-ecTadA-
A106V_D108N_



(SGGS)2-XTEN-
H123Y_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K161T)


pNMG-479
pCMV_ecTadA-
(wild-type) +



(SGGS)2-XTEN-
(L84F_A106V_



(SGGS)2-ecTadA-
D108N_H123Y_



(SGGS)2-XTEN-
S146R_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K161T)


pNMG-480
pCMV_ecTadA_
wild-type



(SGGS)2-XTEN-




(SGGS)2_Cas9n_




SGGS_NLS



pNMG-481
pCMV_ecTadA_
A106V_D108N



(SGGS)2-XTEN-




(SGGS)2_Cas9n_




SGGS_NLS



pNMG-482
pCMV_ecTadA-
wild-type +



(SGGS)2-XTEN-
wild-type



(SGGS)2-ecTadA-




(SGGS)2-XTEN-




(SGGS)2_nCas9_




SGGS_NLS



pNMG-483
pCMV_ecTadA-(SGGS)2-
(A106V_



XTEN-(SGGS)2-
D108N) x 2



ecTadA-(SGGS)2-




XTEN-(SGGS)2_




nCas9_SGGS_NLS



pNMG-484
pCMV_ecTadA-(SGGS)2-
(wild-type) +



XTEN-(SGGS)2-
(A106V_D108N)



ecTadA-(SGGS)2-




XTEN-(SGGS)2_




nCas9_SGGS_NLS



pNMG-485
pCMV_ecTadA_(SGGS)2-
H36L_R51L_



XTEN-(SGGS)2_Cas9n_
L84F_A106V_



SGGS_UGI_
D108N_H123Y_



SGGS_NLS
A142N_S146C_




D147Y_E155V_




I156F_K157N


pNMG-486
pCMV_ecTadA_(SGGS)2-
N37S_L84F_



XTEN-(SGGS)2_Cas9n_
A106V_D108N_



SGGS_UGI_
H123Y_A142N_



SGGS_NLS
D147Y_E155V_




I156F_K161T


pNMG-487
pCMV_ecTadA_(SGGS)2-
L84F_A106V_



XTEN-(SGGS)2_Cas9n_
D108N_D147Y_



SGGS_UGI_
E155V_I156F



SGGS_NLS



pNMG-488
pCMV_ecTadA_(SGGS)2-
R51L_L84F_



XTEN-(SGGS)2_Cas9n_
A106V_D108N_



SGGS_UGI_
H123Y_S146C_



SGGS_NLS
D147Y_E155V_




I156F_K157N_K161T


pNMG-489
pCMV_ecTadA_(SGGS)2-
L84F_A106V_



XTEN-(SGGS)2_Cas9n_
D108N_H123Y_



SGGS_UGI_
S146C_D147Y_



SGGS_NLS
E155V_I156F_




K161T


pNMG-490
pCMV_ecTadA_(SGGS)2-
L84F_A106V_D108N_



XTEN-(SGGS)2_Cas9n_
H123Y_S146C_



SGGS_UGI_
D147Y_E155V_



SGGS_NLS
I156F_K157N_




K160E_K161T


pNMG-491
pCMV_ecTadA_(SGGS)2-
L84F_A106V_D108N_



XTEN-(SGGS)2_Cas9n_
H123Y_S146C_



SGGS_UGI_
D147Y_E155V_



SGGS_NLS
I156F_K157N_K160E


pNMG-492
pCMV_ecTadA-(SGGS)2-
(wt) + (L84F_



XTEN-(SGGS)2-
A106V_D108N_



ecTadA-(SGGS)2-XTEN-
H123Y_A142N_



(SGGS)2_nCas9_
D147Y_E155V_



SGGS_NLS
I156F)


pNMG-493
pCMV_ecTadA-(SGGS)2-
(wt) + (D24G_



XTEN-(SGGS)2-
Q71R_L84F_H96L_



ecTadA-(SGGS)2-XTEN-
A106V_D108N_



(SGGS)2_nCas9_
H123Y_D147Y_



SGGS_NLS
E155V_I156F_K160E)


pNMG-494
pCMV_ecTadA-(SGGS)2-
(wt) + (H36L_R51L_



XTEN-(SGGS)2-
L84F_A106V_D108N_



ecTadA-(SGGS)2-XTEN-
H123Y_A142N_



(SGGS)2_nCas9_
S146C_D147Y_



SGGS_NLS
E155V_I156F_K157N)


pNMG-495
pCMV_ecTadA-(SGGS)2-
(wt) + (N37S_



XTEN-(SGGS)2-
L84F_A106V_D108N_



ecTadA-(SGGS)2-XTEN-
H123Y_A142N_D147Y_



(SGGS)2_nCas9_
E155V_I156F_K161T)



SGGS_NLS



pNMG-496
pCMV_ecTadA-(SGGS)2-
(wt) + (L84F_



XTEN-(SGGS)2-
A106V_D108N_D147Y_



ecTadA-(SGGS)2-XTEN-
E155V_I156F)



(SGGS)2_nCas9_




SGGS_NLS



pNMG-497
pCMV_ecTadA-(SGGS)2-
(wt) + (R51L_



XTEN-(SGGS)2-
L84F_A106V_D108N_



ecTadA-(SGGS)2-XTEN-
H123Y_S146C_D147Y_



(SGGS)2_nCas9_
E155V_I156F_



SGGS_NLS
K157N_K161T)


pNMG-498
pCMV_ecTadA-(SGGS)2-
(wt) + (L84F_



XTEN-(SGGS)2-
A106V_D108N_H123Y_



ecTadA-(SGGS)2-XTEN-
S146C_D147Y_



(SGGS)2_nCas9_
E155V_



SGGS_NLS
I156F_K161T)


pNMG-499
pCMV_ecTadA-(SGGS)2-
(wt) + (L84F_



XTEN-(SGGS)2-
A106V_D108N_H123Y_



ecTadA-(SGGS)2-XTEN-
S146C_D147Y_E155V_



(SGGS)2_nCas9_
I156F_K157N_



SGGS_NLS
K160E_K161T)


pNMG-500
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N_K160E)



pNMG-513
pCMV_ecTadA-92 a.a.-ecTadA-32a.a._nCas9_SGGS_NLS
(wt) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-514
pCMV_ecTadA-92 a.a.-ecTadA-32a.a._nCas9_SGGS_NLS
(L84F_A106V_D108N_H123Y_D147Y_E155V_I156F) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-515
pCMV_ecTadA-92 a.a.-ecTadA-32a.a._nCas9_SGGS_NLS
(wt) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-516
pCMV_ecTadA-92 a.a.-ecTadA-32a.a._nCas9_SGGS_NLS
(L84F_A106V_D108N_H123Y_D147Y_E155V_I156F) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-517
pCMV_ecTadA-92 a.a.-ecTadA-32a.a._nCas9_SGGS_NLS
(wt) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-518
pCMV_ecTadA-92 a.a.-ecTadA-32a.a._nCas9_SGGS_NLS
(L84F_A106V_D108N_H123Y_D147Y_E155V_I156F) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-519
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
R74Q



pNMG-520
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
R74Q L84F_A106V_D108N_H123Y_D147Y_E155V_I156F


pNMG-521
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
R74A_L84F_A106V_D108N_H123Y_D147Y_E155V_I156F


pNMG-522
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
R98Q



pNMG-523
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
R129Q



pNMG-524
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt + R74Q) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)



pNMG-525
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt + R74Q) + (R74Q_L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)



pNMG-526
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(R74A_L84F_A106V_D108N_H123Y_D147Y_E155V_I156F) + (R74A_L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-527
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt + R98Q) + (L84F_R98Q_A106V_D108N_H123Y_D147Y_E155V_I156F)



pNMG-528
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt + R129Q) + (L84F_A106V_D108N_H123Y_R129Q_D147Y_E155V_I156F)



pNMG-529
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(L84F_A106V_D108N_H123Y_D147Y_E155V_I156F) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-530
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)


pNMG-543
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48S_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F)


pNMG-544
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48T_I49V_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F_L157N)


pNMG-545
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
P48S_A142N



pNMG-546
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
P48T_I49V_A142N



pNMG-547
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (P48S_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F)



pNMG-548
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48S_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F) + (P48S_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F)


pNMG-549
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48S_A142N) + (P48S_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F)


pNMG-550
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48S_A142N) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)



pNMG-551
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (P48T_I49V_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F_L157N)


pNMG-552
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48T_I49V_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F_L157N) + (P48T_I49V_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F_L157N)


pNMG-553
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48T_I49V_A142N) + (P48T_I49V_L84F_A106V_D108N_H123Y_A142N_D147Y_E155V_I156F_L157N)


pNMG-554
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(P48T_I49V_A142N) + (L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)



pNMG-555
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-556
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-557
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-558
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-559
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-560
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-561
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-562
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-563
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
wild-type



pNMG-564
pCMV_ecTadA-24 a.a. linker-ecTadA-24 a.a. linker_nCas9_SGGS_NLS
(H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-565
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_XTEN_MBD4_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-566
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_XTEN_TDG_SGGS_NLS
(wt) + (H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)



pNMG-572
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-573
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_A142N_D147Y_E155V_I156F_K157N)


pNMG-574
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48T_I49V_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-575
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48T_I49V_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_E155V_I156F_K157N)


pNMG-576
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-577
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-578
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48T_I49V_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-579
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-580
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N) + (H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-581
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-583
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_E155V_I156F_K157N)


pNMG-586
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-588
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-603
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-604
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-605
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146R_D147Y_E155V_I156F_K161T)


pNMG-606
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152H_E155V_I156F_K157N)


pNMG-607
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-608
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-609
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142A_S146C_D147Y_E155V_I156F_K157N)


pNMG-610
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142A_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-611
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-612
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)


pNMG-613
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146R_D147Y_E155V_I156F_K161T)


pNMG-614
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-615
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-616
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-617
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-618
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-619
pCMV_ecTadA- 32 a.a.-_nCas9_SGGS_NLS
(W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-620
pCMV_ecTadA-(SGGS)2-XTEN-(SGGS)2-ecTadA-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS
(wt) + (W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-621
pCMV_ecTadA- 32 a.a. linker-ecTadA- 24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-622
pCMV_ecTadA- 32 a.a. linker-ecTadA- 24 a.a. linker_nCas9_SGGS_NLS
(wt) + (H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-623
pCMV_ecTadA- 32 a.a. linker-ecTadA- 24 a.a. linker_nCas9_SGGS_NLS
(wt) + (W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)


pNMG-624
pCMV_ecTadA- 32 a.a. linker-ecTadA- 24 a.a. linker_nCas9_SGGS_NLS
(wt) + (W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)









In some embodiments, the adenosine deaminase comprises one or more of a W23X, H36X, N37X, P48X, I49X, R51X, N72X, L84X, S97X, A106X, D108X, H123X, G125X, A142X, S146X, D147X, R152X, E155X, I156X, K157X, and/or K161X mutation in SEQ ID NO: 314, or one or more corresponding mutations in another adenosine deaminase, where the presence of X indicates any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises one or more of W23L, W23R, H36L, P48S, P48A, R51L, L84F, A106V, D108N, H123Y, A142N, S146C, D147Y, R152P, E155V, I156F, and/or K157N mutation in SEQ ID NO: 314, or one or more corresponding mutations in another adenosine deaminase.


In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, or twelve mutations selected from H36X, P48X, R51X, L84X, A106X, D108X, H123X, S146X, D147X, E155X, I156X, and K157X in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase, where X indicates the presence of any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, or twelve mutations selected from H36L, P48S, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, E155V, I156F, and K157N in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of a H36L, P48S, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, E155V, I156F, and K157N mutation in SEQ ID NO: 314, or corresponding mutations in another adenosine deaminase.


In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen mutations selected from H36X, P48X, R51X, L84X, A106X, D108X, H123X, A142X, S146X, D147X, E155X, I156X, and K157X in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase, where X indicates the presence of any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or thirteen mutations selected from H36L, P48S, R51L, L84F, A106V, D108N, H123Y, A142N, S146C, D147Y, E155V, I156F, and K157N in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of a H36L, P48S, R51L, L84F, A106V, D108N, H123Y, A142N, S146C, D147Y, E155V, I156F, and K157N mutation in SEQ ID NO: 314, or corresponding mutations in another adenosine deaminase.


In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, or fourteen mutations selected from W23X, H36X, P48X, R51X, L84X, A106X, D108X, H123X, A142X, S146X, D147X, E155X, I156X, and K157X in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase, where X indicates the presence of any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, or fourteen mutations selected from W23L, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, A142N, S146C, D147Y, E155V, I156F, and K157N in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of a W23L, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, A142N, S146C, D147Y, E155V, I156F, and K157N mutation in SEQ ID NO: 314, or corresponding mutations in another adenosine deaminase.


In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, or fifteen mutations selected from W23X, H36X, P48X, R51X, L84X, A106X, D108X, H123X, A142X, S146X, D147X, R152X, E155X, I156X, and K157X in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase, where X indicates the presence of any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, or fifteen mutations selected from W23L, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, A142N, S146C, D147Y, R152P, E155V, I156F, and K157N in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of a W23L, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, A142N, S146C, D147Y, R152P, E155V, I156F, and K157N mutation in SEQ ID NO: 314, or corresponding mutations in another adenosine deaminase.


In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, or fourteen mutations selected from W23X, H36X, P48X, R51X, L84X, A106X, D108X, H123X, S146X, D147X, R152X, E155X, I156X, and K157X in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase, where X indicates the presence of any amino acid other than the corresponding amino acid in the wild-type adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, or fourteen mutations selected from W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F, and K157N in SEQ ID NO: 314, or a corresponding mutation or mutations in another adenosine deaminase. In some embodiments, the adenosine deaminase comprises or consists of a W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F, and K157N mutation in SEQ ID NO: 314, or corresponding mutations in another adenosine deaminase.
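The substitution labels used above (e.g., H36L, or combined labels such as H36L_P48S_R51L) follow a wild-type residue, position, substituted residue pattern. As a minimal illustrative sketch, assuming 1-based positions and a toy sequence that is not SEQ ID NO: 314, such labels could be parsed and applied as follows. The function names parse_mutations and apply_mutations are hypothetical and are not part of the disclosure.

```python
import re

# One substitution token: wild-type residue, 1-based position, substituted residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(label):
    """Split a label such as 'H36L_P48S' into (wild_type, position, substitute) tuples."""
    mutations = []
    for token in label.strip("()").split("_"):
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        wild_type, position, substitute = match.groups()
        mutations.append((wild_type, int(position), substitute))
    return mutations


def apply_mutations(sequence, mutations):
    """Return a copy of `sequence` with each substitution applied (positions are 1-based)."""
    residues = list(sequence)
    for wild_type, position, substitute in mutations:
        # Refuse to edit if the sequence does not carry the stated wild-type residue.
        if position > len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = substitute
    return "".join(residues)


# Toy 10-residue sequence chosen for illustration only (an assumption).
toy_sequence = "MSEVEFSHEY"
print(apply_mutations(toy_sequence, parse_mutations("S2A_H8L")))
```

The wild-type check in apply_mutations is a design choice of this sketch: it flags a label that is applied to the wrong reference sequence or numbered with a different offset.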


Nucleobase Editors


In some aspects, split nucleobase editors may be used in the present disclosure. Some aspects of the present disclosure relate to compositions comprising (i) a first nucleotide sequence encoding an N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N; and (ii) a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the nucleobase editor.
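The two-part arrangement described above can be pictured with a short sketch. It treats the editor as a plain string, uses placeholder tags in place of real intein-N and intein-C sequences, and models splicing as simple removal of the tags; the split position, the tags, and the function names are all assumptions made for illustration, and the sequence requirements that real inteins impose at the splice junction are not modeled.

```python
# Placeholder tags standing in for intein sequences (hypothetical, for illustration only).
INTEIN_N = "<intein-N>"
INTEIN_C = "<intein-C>"


def split_editor(editor, split_index):
    """Split `editor` into an N-terminal half fused to intein-N and intein-C fused to the C-terminal half."""
    if not 0 < split_index < len(editor):
        raise ValueError("split_index must fall strictly inside the sequence")
    n_half = editor[:split_index] + INTEIN_N
    c_half = INTEIN_C + editor[split_index:]
    return n_half, c_half


def splice(n_half, c_half):
    """Model protein splicing: excise both intein tags and join the flanking portions."""
    if not n_half.endswith(INTEIN_N) or not c_half.startswith(INTEIN_C):
        raise ValueError("halves must carry intein-N and intein-C at the junction")
    return n_half[: -len(INTEIN_N)] + c_half[len(INTEIN_C):]


toy_editor = "MKTAYIAKQR"  # toy sequence, not a sequence from the disclosure
n_half, c_half = split_editor(toy_editor, 4)
print(n_half, c_half, splice(n_half, c_half))
```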


Nucleobase editor variants are contemplated. For example, a nucleobase editor variant may also be “split” as described herein. The split nucleobase editors may comprise an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the nucleobase editor sequences (SEQ ID NOs: 303-313, 362, 364, 365, 369-372, 399-406, 482, 489-490, 515-518, 550-552, and SEQ ID NOs: 323-342, 379-383, 385-388, 458-478, 480, 483, and 553) provided herein.


In some embodiments, the N-terminal portion of a split nucleobase editor comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the corresponding N-terminal portion of any one of the nucleobase editors provided herein (e.g., a nucleobase editor comprising an N-terminal amino acid sequence of any one of SEQ ID NOs: 303-313, 362, 364, 365, 369-372, 399-406, 482, 489-490, 515-518, 550-552, and SEQ ID NOs: 323-342, 379-383, 385-388, 458-478, 480, 483, and 553). In some embodiments, the N-terminal portion of the split nucleobase editor comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 30%, no more than 25%, no more than 20%, no more than 15%, no more than 10%, no more than 5%, no more than 1% longer or shorter) than the corresponding portion of any of the nucleobase editors provided herein. In some embodiments, the N-terminal portion of the split nucleobase editor comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 200 amino acids, no more than 150 amino acids, no more than 100 amino acids, no more than 50 amino acids, no more than 10 amino acids, no more than 5 amino acids, or no more than 2 amino acids longer or shorter) than the corresponding portion of any of the nucleobase editors provided herein.


In some embodiments, the C-terminal portion of a split nucleobase editor comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the corresponding C-terminal portion of any one of the nucleobase editors provided herein (e.g., a nucleobase editor comprising a C-terminal amino acid sequence of any one of SEQ ID NOs: 303-313, 362, 364, 365, 369-372, 399-406, 482, 489-490, 515-518, 550-552, or SEQ ID NOs: 323-342, 379-383, 385-388, 458-478, 480, 483, and 553). In some embodiments, the C-terminal portion of the split nucleobase editor comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 30%, no more than 25%, no more than 20%, no more than 15%, no more than 10%, no more than 5%, no more than 1% longer or shorter) than the corresponding portion of any of the nucleobase editors provided herein. In some embodiments, the C-terminal portion of the split nucleobase editor comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 200 amino acids, no more than 150 amino acids, no more than 100 amino acids, no more than 50 amino acids, no more than 10 amino acids, no more than 5 amino acids, or no more than 2 amino acids longer or shorter) than the corresponding portion of any of the nucleobase editors provided herein.
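The percent-identity thresholds recited above can be illustrated with a small sketch. It assumes the two sequences have already been aligned (equal length, with "-" marking gaps) and that identity is counted over all alignment columns; other conventions exist, and this text does not specify which one applies. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a, aligned_b, threshold_percent):
    """True if the pair is at least `threshold_percent` identical under the convention above."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Toy aligned sequences (assumptions, not sequences from the disclosure).
print(percent_identity("MKTAYIAK", "MKTAYIAR"))
print(meets_identity("MKTAYIAK", "MKTAYIAR", 85.0))
```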


Exemplary adenine and cytidine nucleobase editors are described in Rees & Liu, Base editing: precision chemistry on the genome and transcriptome of living cells, Nat. Rev. Genet. 2018; 19(12):770-788; as well as U.S. Patent Publication No. 2018/0073012, published Mar. 15, 2018, which issued as U.S. Pat. No. 10,113,163, on Oct. 30, 2018; U.S. Patent Publication No. 2017/0121693, published May 4, 2017, which issued as U.S. Pat. No. 10,167,457 on Jan. 1, 2019; PCT Publication No. WO 2017/070633, published Apr. 27, 2017; U.S. Patent Publication No. 2015/0166980, published Jun. 18, 2015; U.S. Pat. No. 9,840,699, issued Dec. 12, 2017; and U.S. Pat. No. 10,077,453, issued Sep. 18, 2018, the contents of each of which are incorporated herein by reference in their entireties.


Non-limiting, exemplary types of nucleobase editors (including C to T, A to G, and C to G nucleobase editors) and their respective sequences are provided below. In some embodiments, the nucleobase editor is a variant of the nucleobase editors described herein. For example, in some embodiments, the nucleobase editor is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a nucleobase editor described herein (exemplary sequences are provided below). In some embodiments, the nucleobase editor comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 30%, no more than 25%, no more than 20%, no more than 15%, no more than 10%, no more than 5%, or no more than 1% longer or shorter) than any of the nucleobase editors provided herein. In some embodiments, the nucleobase editor comprises an amino acid sequence that is shorter or longer in length (e.g., by no more than 500 amino acids, no more than 450 amino acids, no more than 400 amino acids, no more than 350 amino acids, no more than 300 amino acids, no more than 250 amino acids, no more than 200 amino acids, no more than 150 amino acids, no more than 100 amino acids, no more than 50 amino acids, no more than 10 amino acids, or no more than 5 amino acids longer or shorter) than any of the nucleobase editors provided herein.


Cytidine Nucleobase Editors

In some aspects, the methods of the present disclosure provide cytidine nucleobase editors (CBEs) comprising a napDNAbp domain and a cytosine deaminase domain that enzymatically deaminates a cytosine nucleobase of a C:G nucleobase pair to a uracil. The uracil may be subsequently converted to a thymine (T) by the cell's DNA repair and replication machinery. The mismatched guanine (G) on the opposite strand may subsequently be converted to an adenine (A) by the cell's DNA repair and replication machinery. In this manner, a target C:G nucleobase pair is ultimately converted to a T:A nucleobase pair.
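The net outcome of the pathway described above (C:G, then a U:G intermediate, then T:A) can be sketched on a toy duplex. The sketch only models the end result at a single, caller-chosen position; it does not model the editing window, guide RNA targeting, or repair kinetics, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Base-pair states named in the text, from starting pair to final pair.
STAGES = ["C:G", "U:G", "T:A"]


def edit_c_to_t(strand, index):
    """Return (edited strand, complementary strand) after a C-to-T edit at a 0-based index.

    The complementary strand is written base-for-base under the edited strand.
    """
    if strand[index] != "C":
        raise ValueError("target position is not a C")
    edited = strand[:index] + "T" + strand[index + 1:]
    partner = "".join(COMPLEMENT[base] for base in edited)
    return edited, partner


# Toy 4-nt sequence for illustration only.
print(" -> ".join(STAGES))
print(edit_c_to_t("GACT", 2))
```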


In some aspects, the base editing methods of the disclosure comprise the use of a cytidine nucleobase editor. Exemplary cytidine nucleobase editors include, but are not limited to, BE3, BE3.9max, BE4max, BE4-SaKKH, BE3.9-NG, BE3.9-NRRH, or BE4max-VRQR. In certain embodiments, the cytidine nucleobase editor used in the disclosed methods is a BE4max, BE4-SaKKH, BE4max-VQR, or BE4max-VRQR. Other CBEs may be used to deaminate a C nucleobase in accordance with the disclosed methods.


In some aspects, the disclosure provides complexes of nucleobase editors and guide RNAs that comprise a CBE. Exemplary cytidine nucleobase editors of the disclosed complexes include, but are not limited to, BE3, BE3.9max, BE4max, BE4-SaKKH, BE3.9-NG, BE3.9-NRRH, BE4max-VQR, or BE4max-VRQR. In certain embodiments, the cytidine nucleobase editor used in the disclosed complexes is a BE4max, BE4-SaKKH, BE4max-VQR, or BE4max-VRQR. Other CBEs may be used to deaminate a C nucleobase in accordance with the disclosed complexes.


Exemplary complexes of CBEs may provide an off-target editing frequency of less than 2.0% after being contacted with a nucleic acid molecule comprising a target sequence, e.g., a target nucleobase pair. Further exemplary CBE complexes provide an off-target editing frequency of less than 1.5% after being contacted with a nucleic acid molecule comprising a target sequence comprising a target nucleobase pair. Further exemplary CBE complexes may provide an off-target editing frequency of less than 1.25%, less than 1.1%, less than 1%, less than 0.75%, less than 0.5%, less than 0.4%, less than 0.25%, less than 0.2%, less than 0.15%, less than 0.1%, less than 0.05%, or less than 0.025%, after being contacted with a nucleic acid molecule comprising a target sequence.


For instance, the cytidine nucleobase editors YE1-BE4, YE1-CP1028, YE1-SpCas9-NG (also referred to herein as YE1-NG), R33A-BE4, and R33A+K34A-BE4-CP1028, which are described below, may exhibit off-target editing frequencies of less than 0.75% (e.g., about 0.4% or less) while maintaining on-target editing efficiencies of about 60% or more, in target sequences in mammalian cells. Each of these nucleobase editors comprises modified cytosine deaminases (e.g., YE1, R33A, or R33A+K34A) and may further comprise a Cas9 domain with an expanded PAM window (e.g., SpCas9-NG or circularly permuted Cas9 domains, e.g., CP1028). These five nucleobase editors may be the most preferred for applications in which off-target editing, and in particular Cas9-independent off-target editing, must be minimized. In particular, nucleobase editors comprising a YE1 deaminase domain provide efficient on-target editing with greatly decreased Cas9-independent editing, as confirmed by whole-genome sequencing.


Exemplary CBEs may further possess an on-target editing efficiency of more than 50% after being contacted with a nucleic acid molecule comprising a target sequence. Further exemplary CBEs possess an on-target editing efficiency of more than 60% after being contacted with a nucleic acid molecule comprising a target sequence. Further exemplary CBEs possess an on-target editing efficiency of more than 65%, more than 70%, more than 75%, more than 80%, more than 82.5%, or more than 85% after being contacted with a nucleic acid molecule comprising a target sequence. The disclosed CBEs may exhibit indel frequencies of less than 0.75%, less than 0.6%, less than 0.5%, less than 0.4%, less than 0.3%, or less than 0.2% after being contacted with a nucleic acid molecule containing a target sequence.
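The percentages recited above can be related to raw counts with a short sketch. It assumes, purely for illustration, that each frequency is computed as edited (or indel-containing) sequencing reads divided by total reads at a site; the read counts, the default thresholds (taken from the example values in the text), and the function names are assumptions and not measured data.

```python
def frequency_percent(event_reads, total_reads):
    """Percentage of reads at a site that carry the event of interest."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= event_reads <= total_reads:
        raise ValueError("event_reads must lie between 0 and total_reads")
    return 100.0 * event_reads / total_reads


def meets_profile(on_target, off_target, indel,
                  min_on_target=60.0, max_off_target=0.75, max_indel=0.75):
    """Check one editor's percentages against example thresholds from the text."""
    return on_target > min_on_target and off_target < max_off_target and indel < max_indel


# Hypothetical read counts, not experimental results.
on_target = frequency_percent(6200, 10000)
off_target = frequency_percent(40, 10000)
indel = frequency_percent(30, 10000)
print(on_target, off_target, indel, meets_profile(on_target, off_target, indel))
```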


The disclosed CBEs may further comprise one or more nuclear localization signals (NLSs) and/or two or more uracil glycosylase inhibitor (UGI) domains. Thus, the nucleobase editors may comprise the structure: NH2-[first nuclear localization sequence]-[cytosine deaminase domain]-[napDNAbp domain]-[first UGI domain]-[second UGI domain]-[second nuclear localization sequence]-COOH, wherein each instance of “]-[” indicates the presence of an optional linker sequence. Exemplary CBEs may have a structure that comprises the “BE4max” architecture, with an NH2-[NLS]-[cytosine deaminase]-[Cas9 nickase]-[UGI domain]-[UGI domain]-[NLS]-COOH structure, having optimized nuclear localization signals and wherein the napDNAbp domain comprises a Cas9 nickase. This BE4max structure was reported to have optimized codon usage for expression in human cells, as reported in Koblan et al., Nat Biotechnol. 2018; 36(9):843-846, herein incorporated by reference.


In other embodiments, exemplary CBEs may have a structure that comprises a modified BE4max architecture that contains a napDNAbp domain comprising a Cas9 variant other than Cas9 nickase, such as SpCas9-NG, xCas9, or circular permutant CP1028. Accordingly, exemplary CBEs may comprise the structure: NH2-[NLS]-[cytosine deaminase]-[xCas9]-[UGI domain]-[UGI domain]-[NLS]-COOH; or NH2-[NLS]-[cytosine deaminase]-[SpCas9-NG]-[UGI domain]-[UGI domain]-[NLS]-COOH, wherein each instance of “]-[” indicates the presence of an optional linker sequence.


The disclosed CBEs may comprise modified (or evolved) cytosine deaminase domains, such as deaminase domains that recognize an expanded PAM sequence, have improved efficiency of deaminating 5′-GC targets, and/or make edits in a narrower target window. In some embodiments, the disclosed cytidine nucleobase editors comprise evolved nucleic acid programmable DNA binding proteins (napDNAbp), such as an evolved Cas9.


Exemplary cytidine nucleobase editors comprise amino acid sequences that are at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to any one of the amino acid sequences SEQ ID NOs: 362, 365, 370-372, 399, 482, 489, 490, and 515-518. In particular embodiments, the disclosed cytidine nucleobase editors comprise an amino acid sequence that is at least 90% identical to any one of SEQ ID NOs: 365, 372, 399, 482, and 490. In particular embodiments, the disclosed cytidine nucleobase editors comprise the amino acid sequence of any one of SEQ ID NOs: 365, 372, 399, 482, and 490.


Where indicated, “BE4-” and “—BE4” refer to the BE4max architecture, or NH2-[first nuclear localization sequence]-[cytosine deaminase domain]-[32aa linker]-[SpCas9 nickase (nCas9, or nSpCas9) domain]-[9aa linker]-[first UGI domain]-[9aa-linker]-[second UGI domain]-[second nuclear localization sequence]-COOH. Where indicated, “BE4max, modified with SpCas9-NG” and “—SpCas9-NG” refer to a modified BE4max architecture in which the SpCas9 nickase domain has been replaced with an SpCas9-NG, i.e., NH2-[first nuclear localization sequence]-[cytosine deaminase domain]-[32aa linker]-[SpCas9-NG]-[9aa linker]-[first UGI domain]-[9aa-linker]-[second UGI domain]-[second nuclear localization sequence]-COOH.
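As a purely illustrative aid, the domain order described above can be written as an ordered list and rendered in the NH2-[...]-COOH notation. The sketch below is in Python; the names `BE4MAX_PARTS`, `render`, and `swap_domain` are hypothetical, and the list simply restates the domain and linker order given in the preceding paragraph.

```python
# Minimal sketch of the BE4max domain order as stated in the text above.
# The data structure and helper names are assumptions for illustration.

BE4MAX_PARTS = [
    "first nuclear localization sequence",
    "cytosine deaminase domain",
    "32aa linker",
    "SpCas9 nickase",
    "9aa linker",
    "first UGI domain",
    "9aa linker",
    "second UGI domain",
    "second nuclear localization sequence",
]


def render(parts):
    """Render an ordered list of parts in NH2-[...]-[...]-COOH notation."""
    return "NH2-" + "-".join("[{}]".format(p) for p in parts) + "-COOH"


def swap_domain(parts, old, new):
    """Return a copy of parts with every occurrence of `old` replaced by `new`."""
    if old not in parts:
        raise ValueError("domain not present: {}".format(old))
    return [new if p == old else p for p in parts]


if __name__ == "__main__":
    print(render(BE4MAX_PARTS))
    # Modified architecture in which the nickase is replaced with SpCas9-NG.
    print(render(swap_domain(BE4MAX_PARTS, "SpCas9 nickase", "SpCas9-NG")))
```

Replacing a single list element models the "BE4max, modified with SpCas9-NG" architecture while leaving the linkers and flanking domains unchanged.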


As discussed above, preferred nucleobase editors comprise modified cytosine deaminases (e.g., YE1, R33A, or R33A+K34A) and may further comprise a modified napDNAbp domain such as a Cas9 domain with an expanded PAM window (e.g., SpCas9-NG). For the purposes of clarity, the cytosine deaminase domain in some of the following amino acid sequences may be indicated in bold, and the napDNAbp domains may be underlined.


Non-limiting examples of C to T nucleobase editors are provided below, as SEQ ID NOs: 303-313, 362, 364, 365, 367, 369-372, 399-406, 482, 489-490, 515-518, and 550-552.
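For readers who wish to enumerate the recited sequence identifiers, the following Python sketch expands a SEQ ID NO list containing ranges into individual numbers. The function name `expand_seq_ids` is hypothetical, and the parsing rules (comma-separated tokens, hyphenated inclusive ranges, an optional "and") are assumptions based on the format of the list above.

```python
# Minimal sketch: expand "303-313, 362, ..., and 550-552" into explicit integers.
# Parsing conventions are assumed from the list format used in the text.

def expand_seq_ids(spec):
    """Expand a comma-separated list of SEQ ID numbers and inclusive ranges."""
    ids = []
    for token in spec.replace("and", "").split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            low, high = (int(part) for part in token.split("-"))
            ids.extend(range(low, high + 1))  # ranges are inclusive
        else:
            ids.append(int(token))
    return ids


if __name__ == "__main__":
    spec = ("303-313, 362, 364, 365, 367, 369-372, 399-406, "
            "482, 489-490, 515-518, and 550-552")
    seq_ids = expand_seq_ids(spec)
    print(len(seq_ids), seq_ids[0], seq_ids[-1])
```

Applied to the list above, the sketch yields 37 individual SEQ ID NOs, from 303 through 552.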










His6-rAPOBEC1-XTEN-dCas9 for Escherichia coli expression



(SEQ ID NO: 303)



MGSSHHHHHHMSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQ






NTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHAD





PRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLP





PCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNS





VGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC





YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDST





DKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS





ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLL





AQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPE





KYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPH





QIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE





VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQK





KAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEEN





EDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKT





ILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV





DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNE





KLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEE





VVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSR





MNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK





LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG





EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFD





SPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS





LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYL





DEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKR





YTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





rAPOBEC1-XTEN-dCas9-NLS for mammalian expression


(SEQ ID NO: 304)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQ





LTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYK





VPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAK





VDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLAL





AHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENL





IAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFL





AAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNG





YAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRR





QEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFI





ERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNR





KVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFAN





RNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHK





PENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRD





MYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ





LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR





EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVY





DVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATV





RKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAK





VEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLA





SAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVIL





ADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSPKKKRKV





hAPOBEC1-XTEN-dCas9-NLS for mammalian expression


(SEQ ID NO: 305)



MTSEKGPSTGDPTLRRRIEPWEFDVFYDPRELRKEACLLYEIKWGMSRKIWRSSGKNTTNHVEVN






FIKKFTSERDFHPSMSCSITWFLSWSPCWECSQAIREFLSRHPGVTLVIYVARLFWHMDQQNRQGL





RDLVNSGVTIQIMRASEYYHCWRNFVNYPPGDEAHWPQYPPLWMMLYALELHCIILSLPPCLKISR





RWQNHLTFFRLHLQNCHYQTIPPHILLATGLIHPSVAWRSGSETPGTSESATPESDKKYSIGLAIGTN





SVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC





YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDST





DKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILS





ARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLL





AQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPE





KYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPH





QIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE





VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQK





KAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEEN





EDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKT





ILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV





DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNE





KLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEE





VVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSR





MNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK





LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG





EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFD





SPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS





LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYL





DEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRK





RYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





rAPOBEC1-XTEN-dCas9-UGI-NLS


(SEQ ID NO: 306)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQ





LTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYK





VPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAK





VDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLAL





AHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENL





IAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFL





AAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNG





YAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRR





QEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFI





ERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNR





KVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFAN





RNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHK





PENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRD





MYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ





LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR





EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVY





DVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATV





RKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAK





VEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLA





SAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVIL





ADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAY





DESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





rAPOBEC1-XTEN-SpCas9 nickase-UGI-NLS (BE3)


(SEQ ID NO: 307)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQ





LTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYK





VPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAK





VDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLAL





AHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENL





IAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFL





AAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNG





YAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRR





QEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFI





ERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNR





KVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFAN





RNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHK





PENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRD





MYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ





LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR





EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVY





DVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATV





RKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAK





VEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLA





SAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVIL





ADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAY





DESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





pmCDA1-XTEN-dCas9-UGI (bacteria)


(SEQ ID NO: 308)



MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGI






HAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEK





NARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMI





QVKILHTTKSPAVSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL





VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEG





DLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFG





NLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI





LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEE





FYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE





KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNE





KVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYF





KKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLK





TYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSL





TFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN





QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR





LSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF





DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLV





SDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQ





EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNI





VKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKS





VKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL





ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA





YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRID





LSQLGGDSGGSMTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVM





LLTSDAPEYKPWALVIQDSNGENKIKML





pmCDA1-XTEN-nCas9-UGI-NLS (mammalian construct)


(SEQ ID NO: 309)



MTDAEYVRIHEKLDIYTFKKQFFNNKKSVSHRCYVLFELKRRGERRACFWGYAVNKPQSGTERGI






HAEIFSIRKVEEYLRDNPGQFTINWYSSWSPCADCAEKILEWYNQELRGNGHTLKIWACKLYYEK





NARNQIGLWNLRDNGVGLNVMVSEHYQCCRKIFIQSSHNQLNENRWLEKTLKRAEKRRSELSIMI





QVKILHTTKSPAVSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL





VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEG





DLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFG





NLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI





LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEE





FYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE





KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNE





KVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYF





KKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLK





TYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSL





TFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN





QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR





LSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF





DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLV





SDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQ





EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNI





VKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKS





VKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL





ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA





YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRID





LSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL





TSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





huAPOBEC3G-XTEN-dCas9-UGI (bacteria)


(SEQ ID NO: 310)



MDPPTFTFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGFLCNQAPHKHGFLEGRHAE






LCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKHVSLCIFTARIYDDQGRCQEGL





RTLAEAGAKISIMTYSEFKHCWDTFVDHQGCPFQPWDGLDEHSQDLSGRLRAILQSGSETPGTSES





ATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEAT





RLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAY





HEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQ





LFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED





AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDE





HHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKL





NREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSR





FAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK





VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASL





GTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYT





GWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEH





IANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIK





ELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDN





KVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK





RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTE





ITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS





DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFL





EAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG





SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTL





TNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSMTNLSDIIEKE





TGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGE





NKIKML





huAPOBEC3G-XTEN-nCas9-UGI-NLS (mammalian construct)


(SEQ ID NO: 311)



MDPPTFTFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGFLCNQAPHKHGFLEGRHAE






LCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKHVSLCIFTARIYDDQGRCQEGL





RTLAEAGAKISIMTYSEFKHCWDTFVDHQGCPFQPWDGLDEHSQDLSGRLRAILQSGSETPGTSES





ATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEAT





RLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAY





HEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQ





LFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED





AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDE





HHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKL





NREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSR





FAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK





VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASL





GTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYT





GWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEH





IANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIK





ELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN





KVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK





RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTE





ITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS





DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFL





EAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG





SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTL





TNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKET





GKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGE





NKIKMLSGGSPKKKRKV





huAPOBEC3G (D316R_D317R)-XTEN-nCas9-UGI-NLS (mammalian construct)


(SEQ ID NO: 312)



MDPPTFTFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGFLCNQAPHKHGFLEGRHAE






LCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKHVSLCIFTARIYRRQGRCQEGL





RTLAEAGAKISIMTYSEFKHCWDTFVDHQGCPFQPWDGLDEHSQDLSGRLRAILQSGSETPGTSES





ATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEAT





RLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAY





HEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQ





LFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED





AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDE





HHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKL





NREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSR





FAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTK





VKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASL





GTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYT





GWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEH





IANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIK





ELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDN





KVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK





RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTE





ITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS





DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFL





EAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG





SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTL





TNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKET





GKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGE





NKIKMLSGGSPKKKRKV





High fidelity nucleobase editor


(SEQ ID NO: 313)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEY





KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA





KVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLA





LAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLEN





LIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF





LAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKN





GYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILR





RQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS





FIERMTAFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKT





NRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT





LTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGALSRKLINGIRDKQSGKTILDFLKSDG





FANRNFMALIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMG





RHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQN





GRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNY





WRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRAITKHVAQILDSRMNTKYDEND





KLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGD





YKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGR





DFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSV





LVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGR





KRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEF





SKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVL





DATLIHQSITGLYETRIDLSQLGGD





rAPOBEC1-XTEN-SaCas9n-UGI-NLS (SaBE3 and SaBE3.9max)


(SEQ ID NO: 399)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESKRNYILGLDIGITSVGYGIIDYETR






DVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEA







RVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLE







RLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFG







WKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIEN







VFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKI







LTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKL







VPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMIN







EMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIP







RSVSFDNSFNNKVLVKQEEASKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLL







EERDINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKK







ERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPH







QIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKS







PEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKL







NAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAK







KLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTI







ASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVI






GNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





rAPOBEC1-XTEN-SaCas9n-UGI-NLS


(SEQ ID NO: 400)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESKRNYILGLDIGITSVGYGIIDYETR






DVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEA







RVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLE







RLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFG







WKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIEN







VFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKI







LTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKL







VPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMIN







EMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIP







RSVSFDNSFNNKVLVKQEEASKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLL







EERDINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKK







ERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPH







QIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKS







PEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKL







NAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAK







KLKKISNQAEFIASFYKNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPHIIKTI







ASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVI






GNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





Nucleobase Editor 4-SSB


(SEQ ID NO: 401)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEY





KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA





KVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLA





LAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLEN





LIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF





LAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKN





GYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILR





RQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS





FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKT





NRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT





LTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGF





ANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR





HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNG





RDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW





RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDK





LIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDY





KVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF





ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLV





VAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR





MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSK





RVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLD





ATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSASRGVNKVILVGNLGQDPEVRYMPNGGAVANI





TLATSESWRDKATGEMKEQTEWHRVVLFGKLAEVASEYLRKGSQVYIEGQLRTRKWTDQSGQD





RYTTEVVVNVGGTMQMLGGRQGGGAPAGGNIGGGQPQGGWGQPQQPQGGNQFSGGAQSRPQQ





SAPAAPSNEPPMDFDDDIPFSGGSPKKKRKV





Nucleobase Editor 4-(GGS)3


(SEQ ID NO: 402)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEY





KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA





KVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLA





LAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLEN





LIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF





LAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKN





GYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILR





RQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS





FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKT





NRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT





LTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGF





ANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR





HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNG





RDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW





RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDK





LIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDY





KVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF





ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLV





VAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR





MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSK





RVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLD





ATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNK





PESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





Nucleobase Editor 4-XTEN


(SEQ ID NO: 403)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEY





KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA





KVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLA





LAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLEN





LIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF





LAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKN





GYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILR





RQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS





FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKT





NRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT





LTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGF





ANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR





HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNG





RDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW





RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDK





LIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDY





KVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF





ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLV





VAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR





MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSK





RVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLD





ATLIHQSITGLYETRIDLSQLGGDSGSETPGTSESATPESTNLSDIIEKETGKQLVIQESILMLPEEVEE





VIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





Nucleobase Editor 4-32 aa linker


(SEQ ID NO: 404)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGL





AIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRR





KNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKL





VDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDL





DNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQ





QLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDN





GSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITP





WNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAF





LSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDF





LDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRD





KQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQ





TVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENT





QLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSD





NVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVA





QILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTAL





IKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIE





TNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLII





KLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQ





HKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDT





TIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPE





EVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKR





KV





Nucleobase Editor 4-2X UGI


(SEQ ID NO: 405)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEY





KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA





KVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLA





LAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLEN





LIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLF





LAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKN





GYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILR





RQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS





FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKT





NRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT





LTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGF





ANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR





HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNG





RDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW





RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDK





LIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDY





KVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF





ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLV





VAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR





MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSK





RVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLD





ATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILV





HTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSTNLSDIIEKETGKQLVIQESIL





MLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSP





KKKRKV





Nucleobase Editor 4 (BE4)


(SEQ ID NO: 406)



MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFI






EKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLI





SSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQP





QLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGL





AIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRR





KNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKL





VDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA





KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDL





DNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQ





QLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDN





GSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITP





WNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAF





LSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDF





LDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRD





KQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQ





TVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENT





QLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSD





NVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVA





QILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTAL





IKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIE





TNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLII





KLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQ





HKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDT





TIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQE





SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSG





GSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSD





APEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





BE4max (also AncBE4max)


(SEQ ID NO: 482)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGE





TAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVD





EVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQ





TYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFD





LAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK





RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEEL





LVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLAR





GNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYN





ELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFN





ASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRR





RYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSL





HEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIE





EGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDD





SIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKA





GFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINN





YHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMN





FFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIL





PKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEK





NPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY





EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI





IHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSG





GSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKP





WALVIQDSNGENKIKMLSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDIL





VHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





AID-BE4max


(SEQ ID NO: 489)



MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR






YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRR





LHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAF





RTLGLSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSF





FHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKF





RGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPG





EKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNL





SDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYI





DGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFY





PFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTN





FDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVK





QLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE





MIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQ





LIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ





ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAK





LITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVI





TLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRK





MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVL





SMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKG





KSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGE





LQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADA





NLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSI





TGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILV





HTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSGGSGGSTNLSDIIEKETGKQL





VIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKM





LSGGSPKKKRKV





AID-VRQR-BE4max


(SEQ ID NO: 490)



MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLR






YISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRR





LHRAGVQIAIMTFKDYFYCWNTFVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAF





RTLGLSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSF





FHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKF





RGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPG





EKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNL





SDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYI





DGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFY





PFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTN





FDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVK





QLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE





MIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQ





LIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ





ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAK





LITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVI





TLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRK





MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVL





SMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKG





KSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARE





LQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADA





NLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSI





TGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILV





HTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSGGSGGSTNLSDIIEKETGKQL





VIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKM





LSGGSKRTADGSEFEPKKKRKV





AncBE4max 689


(SEQ ID NO: 515)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEIKWG







TSHKIWRHSSKNTTKHVEVNFIEKFTSERHFCPSTSCSITWFLSWSPCGECSKAITEFLSQHPN







VTLVIYVARLYHHMDQQNRQGLRDLVNSGVTIQIMTAPEYDYCWRNFVNYPPGKEAHWPR







YPPLWMKLYALELHAGILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSG






GSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNT





DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF





LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIE





GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLF





GNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSD





ILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQE





EFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNR





EKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPN





EKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYF





KKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLK





TYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSL





TFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN





QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR





LSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF





DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLV





SDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQ





EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNI





VKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKS





VKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL





ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA





YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRID





LSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTD





ENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLP





EEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSKRTA





DGSEFEPKKKRKV





YE1-BE4


(SEQ ID NO: 516)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSYSPCGECSRAITEFLSRYPHVTLFIYIA





RLYHHADPENRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLEL





YCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESATP





ESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEAT






RLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY







PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINA







SGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDD







LDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPE







KYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIH







LGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA







SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTN







RKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFE







DREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQL







IHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARE







NQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD







YDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE







RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYK







VREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMN







FFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRN







SDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK







GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQK







QLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYF







DTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQES






ILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG





SGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDA





PEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





YE2-BE4


(SEQ ID NO: 517)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSYSPCGECSRAITEFLSRYPHVTLFIYIA





RLYHHADPRNRQGLEDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLEL





YCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESATP





ESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEAT






RLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY







PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINA







SGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDD







LDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPE







KYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIH







LGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA







SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTN







RKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFE







DREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQL







IHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARE







NQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD







YDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE







RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYK







VREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMN







FFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRN







SDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK







GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQK







QLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYF







DTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQES






ILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG





SGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDA





PEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV








YEE-BE4


(SEQ ID NO: 518)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSYSPCGECSRAITEFLSRYPHVTLFIYIA





RLYHHADPENRQGLEDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLEL





YCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESATP





ESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEAT






RLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY







PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINA







SGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDD







LDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPE







KYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIH







LGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA







SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTN







RKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFE







DREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQL







IHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARE







NQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD







YDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE







RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYK







VREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMN







FFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRN







SDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK







GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQK







QLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYF







DTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQES






ILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG





SGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDA





PEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





EE-BE4


(SEQ ID NO: 550)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHHADPENRQGLEDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEA






TRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEK







YPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPIN







ASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDD







DLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLP







EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQI







HLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDK







GASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK







TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL







FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFM







QLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMA







RENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL







SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK







AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF







YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIM







NFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKR







NSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEA







KGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ







KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKY







FDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQE






SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSG





GSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSD





APEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





R33A-BE4


(SEQ ID NO: 551)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELAKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEA






TRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEK







YPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPIN







ASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDD







DLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLP







EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQI







HLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDK







GASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK







TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL







FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFM







QLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMA







RENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL







SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK







AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF







YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIM







NFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKR







NSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEA







KGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ







KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKY






FDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQE





SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSG





GSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSD





APEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





R33A + K34A-BE4


(SEQ ID NO: 552)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELAAETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEA






TRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEK







YPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPIN







ASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDD







DLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLP







EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQI







HLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDK







GASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK







TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL







FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFM







QLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMA







RENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL







SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK







AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF







YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIM







NFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKR







NSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEA







KGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ







KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKY







FDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQE






SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSG





GSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSD





APEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





FERNY-BE4


(SEQ ID NO: 362)



MKRTADGSEFESPKKKRKVFERNYDPRELRKETYLLYEIKWGKSGKLWRHWCQNNRTQHAEVY






FLENIFNARRFNPSTHCSITWYLSWSPCAECSQKIVDFLKEHPNVNLEIYVARLYYHEDERNRQGL





RDLVNSGVTIRIMDLPDYNYCWKTFVSDQGGDEDYWPGHFAPWIKQYSLKLSGGSSGGSSGSETP





GTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFD






SGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDE







VAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQL







FEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS







KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKAL







VRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDN







GSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNF







EEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAI







VDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDI







VLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA







NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPEN







IVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQE







LDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRK







FDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFR







KDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYF







FYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSK







ESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNP







IDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP







EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA







PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGK






QLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKI





KMLSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENV





MLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





AALN-BE4


(SEQ ID NO: 364)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELAAETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHLANPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEA






TRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEK







YPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPIN







ASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDD







DLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLP







EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQI







HLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDK







GASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK







TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL







FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFM







QLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMA







RENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL







SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK







AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF







YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIM







NFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKR







NSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEA







KGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ







KQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKY







FDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQE






SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSG





GSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSD





APEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





BE4max, modified with SpCas9-NG (“BE4-NG”)


(SEQ ID NO: 365)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEA






TRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEK







YPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPIN






ASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDD






DLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLP







EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQI







HLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDK







GASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK







TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL







FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFM







QLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMA







RENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL







SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK







AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF







YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIM







NFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIRPKR







NSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK







GYKEVKKDLIIKLPKYSLFELENGRKRMLASARFLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQK







QLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPRAFKYF







DTTIDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQES






ILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG





SGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDA





PEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





BE4max-SaKKH


(SEQ ID NO: 369)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSGKRNYILGLAIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRR






RHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGN







ELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTY







IDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRD







ENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEII







ENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQI







AIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQK







MINEMQKRNRQTNERIEEHRITGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIP







RSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDI







NRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHH







AEDALHANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKY







SHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKL







KLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLK







PYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKNDLIKINGELYRVI







GVNNDLLNRIEVNMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGS







GGSGGSGGSTNLSDHEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAP







EYKPWALVIQDSNGENKIKMLSGGSGGSGGSTNLSDHEKETGKQLVIQESILMLPEEVEEVIGNKPESDI







LVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV






BE4max-NRRH


(SEQ ID NO: 370)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRH






SIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYI





ARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLE





LYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGGSSGGSSGSETPGTSESAT





PESSGGSSGGSDKKYSIGLTIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEA







TRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEK









YPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPIN









ASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDD









DLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMVKRYDEHHQDLTLLKALVRQQLP









EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLISKQRTFDNGIIPHQI









HLGELHAILRRQGDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDK









GASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFK









TNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL









FEDREMIEERLKTYAHLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFM









QLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGGHKPENIVIEMA









RENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL









SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK









AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF









YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIM









NFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKG









NSDKLIARKKDWDPKKYGGFNSPTAAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIGFLEA









KGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLHKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQ









KQLFVEQHKHYLDEHEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGVPAAFKY









FDTTIDKKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDITEKETGKQLVIQE






SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSG





GSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSD





APEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV





BE4max-VQR


(SEQ ID NO: 371)



MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEI







NWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRA







ITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVN







YSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQ







RLPPHILWATGLKSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWA








VITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEI









FSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKA









DLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA









RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNL









LAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLP









EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNG









SIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETIT









PWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPA









FLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKD









KDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLI









NGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSP









AIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI









LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLT









RSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQL









VETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAH









DAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEIT









LANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS









DKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF







LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLK







GSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIH









LFTLTNLGAPAAFKYFDTTIDRIWYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSG






GSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDA





PEYKPWALVIQDSNGENKIKMLSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEE





VIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSKRTA






DGSEFEPKKKRKV






BE4max-VRQR


(SEQ ID NO: 372)




MKRTADGSEFESPKKKRKVSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEI








NWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRA







ITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVN







YSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQ







RLPPHILWATGLKSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWA








VITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEI









FSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKA









DLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA









RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNL









LAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLP









EKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNG









SIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETIT









PWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPA









FLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKD









KDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLI









NGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSP









AIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI









LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLT









RSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQL









VETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAH









DAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEIT









LANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS









DKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF









LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLASHYEKLK









GSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIH









LFTLTNLGAPAAFKYFDTTIDRIWYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSG






GSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDA





PEYKPWALVIQDSNGENKIKMLSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEE





VIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSKRTA






DGSEFEPKKKRKV







Adenine Nucleobase Editors

In some aspects, the base editing methods of the disclosure comprise the use of an adenine nucleobase editor. Exemplary adenine nucleobase editors include, but are not limited to, ABE7.10 (or ABEmax), ABE8e, ABE8e-SaKKH, ABE8e-NG, ABE-xCas9, ABE7.10-SaKKH, ABE7.10-NG, ABE7.10-VRQR, ABE7.10-VQR, ABE8e-NRTH, ABE8e-NRRH, ABE8e-VQR, or ABE8e-VRQR. In certain embodiments, the adenine nucleobase editor used in the disclosed methods is an ABE8e or an ABE7.10. ABE8e is sometimes referred to herein as “ABE8” or “ABE8.0”. The ABE8e nucleobase editor and variants thereof may comprise an adenosine deaminase domain containing a TadA-8e adenosine deaminase monomer (monomer form) or a TadA-8e adenosine deaminase homodimer or heterodimer (dimer form). Other ABEs may be used to deaminate an A nucleobase in accordance with the disclosed methods.


In some aspects, the disclosure provides complexes of adenine nucleobase editors and guide RNAs. Exemplary adenine nucleobase editors of the disclosed complexes include, but are not limited to, ABE7.10 (or ABEmax), ABE8e, ABE8e-SaKKH, ABE8e-NG, ABE-xCas9, ABE7.10-SaKKH, ABE7.10-NG, ABE7.10-VRQR, ABE7.10-VQR, ABE8e-NRTH, ABE8e-NRRH, ABE8e-VQR, or ABE8e-VRQR. In certain embodiments, the adenine nucleobase editor of any of the disclosed complexes is an ABE8e or an ABE7.10. Other ABEs may be used to deaminate an A nucleobase in accordance with the disclosed complexes.


The disclosed complexes of ABEs may possess an on-target editing efficiency of more than 50% after being contacted with a nucleic acid molecule comprising a target sequence. Further exemplary ABE complexes possess an on-target editing efficiency of more than 60% after being contacted with a nucleic acid molecule comprising a target sequence. Further exemplary ABEs possess an on-target editing efficiency of more than 65%, more than 70%, more than 75%, more than 80%, more than 82.5%, or more than 85% after being contacted with a nucleic acid molecule comprising a target sequence. The disclosed ABE complexes may exhibit indel frequencies of less than 0.75%, less than 0.6%, less than 0.5%, less than 0.4%, less than 0.3%, or less than 0.2% after being contacted with a nucleic acid molecule containing a target sequence.


Some aspects of the disclosure provide fusion proteins that comprise a nucleic acid programmable DNA binding protein (napDNAbp) and at least two adenosine deaminase domains. Without wishing to be bound by any particular theory, dimerization of adenosine deaminases (e.g., in cis or in trans) may improve the ability (e.g., efficiency) of the fusion protein to modify a nucleic acid base, for example to deaminate adenine. In some embodiments, any of the fusion proteins may comprise 2, 3, 4 or 5 adenosine deaminase domains. In some embodiments, any of the fusion proteins provided herein comprises two adenosine deaminases. In some embodiments, any of the fusion proteins provided herein contains only two adenosine deaminases. In some embodiments, the adenosine deaminases are the same. In some embodiments, the adenosine deaminases are any of the adenosine deaminases provided herein. In some embodiments, the adenosine deaminases are different.


In some embodiments, the first adenosine deaminase is any of the adenosine deaminases provided herein, and the second adenosine deaminase is any of the adenosine deaminases provided herein, but is not identical to the first adenosine deaminase. As one example, the fusion protein may comprise a first adenosine deaminase and a second adenosine deaminase that both comprise the amino acid sequence of SEQ ID NO: 10, which contains the W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F, and K157N mutations relative to ecTadA (SEQ ID NO: 1). In some embodiments, the fusion protein may comprise a first adenosine deaminase that comprises the amino acid sequence of SEQ ID NO: 1, and a second adenosine deaminase domain that comprises the amino acid sequence of TadA7.10 of SEQ ID NO: 10. In certain embodiments, the first and/or second deaminase is a TadA-8e deaminase. Additional fusion protein constructs comprising two adenosine deaminase domains are illustrated herein and are provided in the art.
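The substitution notation used above (e.g., W23R denotes a tryptophan at position 23 replaced by arginine) may be illustrated with a minimal Python sketch. The wild-type sequence below is copied from the N-terminal deaminase portion of the ecTadA(wt)-XTEN-nCas9-NLS construct listed in this disclosure (the residues preceding the XTEN linker) and is assumed, for illustration only, to correspond to ecTadA with 1-based numbering starting at the initial methionine. The function name apply_mutations is hypothetical, and the output is not asserted to be identical to SEQ ID NO: 10.

```python
import re

# Deaminase portion of the ecTadA(wt)-XTEN-nCas9-NLS listing (assumption: this
# 167-residue stretch is the ecTadA domain, numbered from Met1).
ECTADA_WT = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV"
    "MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE"
    "CAALLSDFFRMRRQEIKAQKKAQSSTD"
)

# Substitutions recited for SEQ ID NO: 10 relative to ecTadA.
TADA_7_10_MUTATIONS = (
    "W23R; H36L; P48A; R51L; L84F; A106V; D108N; "
    "H123Y; S146C; D147Y; R152P; E155V; I156F; K157N"
)


def apply_mutations(sequence: str, mutations: str) -> str:
    """Apply semicolon-separated point substitutions such as 'D108N'.

    Positions are 1-based. A ValueError is raised if the residue found at a
    position does not match the wild-type residue named in the substitution.
    """
    residues = list(sequence)
    for token in mutations.split(";"):
        token = token.strip()
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3)
        )
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {token}")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{token}: found {residues[position - 1]} at position {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


mutant = apply_mutations(ECTADA_WT, TADA_7_10_MUTATIONS)
print(mutant)
```

Checking the wild-type residue before each substitution guards against an off-by-one numbering error, which is the most common mistake when a sequence is trimmed or carries an added leading residue.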


In some embodiments, the fusion protein comprises two adenosine deaminases (e.g., a first adenosine deaminase and a second adenosine deaminase). In some embodiments, the fusion protein comprises a first adenosine deaminase and a second adenosine deaminase. In some embodiments, the first adenosine deaminase is N-terminal to the second adenosine deaminase in the fusion protein. In some embodiments, the first adenosine deaminase is C-terminal to the second adenosine deaminase in the fusion protein. In some embodiments, the first adenosine deaminase and the second adenosine deaminase are fused directly or via a linker. In some embodiments, the linker is any of the linkers provided herein, for example, any of the linkers described in the “Linkers” section. In some embodiments, the linker comprises the amino acid sequence of any one of SEQ ID NOs: 135-152. In some embodiments, the linker is 32 amino acids in length. In some embodiments, the linker comprises the amino acid sequence (SGGS)2-SGSETPGTSESATPES-(SGGS)2 (SEQ ID NO: 136), which may also be referred to as (SGGS)2-XTEN-(SGGS)2 (SEQ ID NO: 136). In some embodiments, the linker comprises the amino acid sequence (SGGS)n-SGSETPGTSESATPES-(SGGS)n (SEQ ID NO: 142), wherein n is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In some embodiments, the first adenosine deaminase is the same as the second adenosine deaminase. In some embodiments, the first adenosine deaminase and the second adenosine deaminase are any of the adenosine deaminases described herein. In some embodiments, the first adenosine deaminase and the second adenosine deaminase are different. In some embodiments, the first adenosine deaminase is any of the adenosine deaminases provided herein. In some embodiments, the second adenosine deaminase is any of the adenosine deaminases provided herein but is not identical to the first adenosine deaminase. In some embodiments, the first adenosine deaminase is an ecTadA adenosine deaminase.
In some embodiments, the first adenosine deaminase comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the amino acid sequences set forth in any one of SEQ ID NOs: 1-10, or to any of the adenosine deaminases provided herein. In some embodiments, the first adenosine deaminase comprises the amino acid sequence of SEQ ID NO: 1. In some embodiments, the second adenosine deaminase comprises an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the amino acid sequences set forth in any one of SEQ ID NOs: 1-10, or to any of the adenosine deaminases provided herein. In some embodiments, the second adenosine deaminase comprises the amino acid sequence of SEQ ID NO: 10.
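The (SGGS)n-SGSETPGTSESATPES-(SGGS)n linker family described above may be expanded into explicit amino acid sequences with a short Python sketch. The function name build_linker is hypothetical and is used only for illustration; the sketch simply repeats the SGGS motif n times on each side of the XTEN segment and confirms that n = 2 gives a 32-residue linker.

```python
XTEN = "SGSETPGTSESATPES"  # 16-residue XTEN segment recited above
SGGS = "SGGS"


def build_linker(n: int) -> str:
    """Return (SGGS)n-XTEN-(SGGS)n for n from 0 to 10 inclusive."""
    if not 0 <= n <= 10:
        raise ValueError("n must be between 0 and 10")
    return SGGS * n + XTEN + SGGS * n


for n in range(0, 4):
    linker = build_linker(n)
    print(n, len(linker), linker)
```

For n = 2 the sketch produces SGGSSGGSSGSETPGTSESATPESSGGSSGGS, which is 8 + 16 + 8 = 32 amino acids long, consistent with the 32-amino-acid linker recited above.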


In some embodiments, the general architecture of exemplary fusion proteins with a first adenosine deaminase, a second adenosine deaminase, and a napDNAbp comprises any one of the following structures, where NLS is a nuclear localization sequence (e.g., any NLS provided herein), NH2 is the N-terminus of the fusion protein, and COOH is the C-terminus of the fusion protein.


Fusion proteins comprising a first adenosine deaminase, a second adenosine deaminase, and a napDNAbp.


NH2-[first adenosine deaminase]-[second adenosine deaminase]-[napDNAbp]-COOH;


NH2-[first adenosine deaminase]-[napDNAbp]-[second adenosine deaminase]-COOH;


NH2-[napDNAbp]-[first adenosine deaminase]-[second adenosine deaminase]-COOH;


NH2-[second adenosine deaminase]-[first adenosine deaminase]-[napDNAbp]-COOH;


NH2-[second adenosine deaminase]-[napDNAbp]-[first adenosine deaminase]-COOH;


NH2-[napDNAbp]-[second adenosine deaminase]-[first adenosine deaminase]-COOH.


In some embodiments, the fusion proteins provided herein do not comprise a linker. In some embodiments, a linker is present between one or more of the domains or proteins (e.g., first adenosine deaminase, second adenosine deaminase, and/or napDNAbp). In some embodiments, the “]-[” used in the general architecture above indicates the presence of an optional linker.


Fusion proteins comprising a first adenosine deaminase, a second adenosine deaminase, a napDNAbp, and an NLS.

  • NH2-[NLS]-[first adenosine deaminase]-[second adenosine deaminase]-[napDNAbp]-COOH;
  • NH2-[first adenosine deaminase]-[NLS]-[second adenosine deaminase]-[napDNAbp]-COOH;
  • NH2-[first adenosine deaminase]-[second adenosine deaminase]-[NLS]-[napDNAbp]-COOH;
  • NH2-[first adenosine deaminase]-[second adenosine deaminase]-[napDNAbp]-[NLS]-COOH;
  • NH2-[NLS]-[first adenosine deaminase]-[napDNAbp]-[second adenosine deaminase]-COOH;
  • NH2-[first adenosine deaminase]-[NLS]-[napDNAbp]-[second adenosine deaminase]-COOH;
  • NH2-[first adenosine deaminase]-[napDNAbp]-[NLS]-[second adenosine deaminase]-COOH;
  • NH2-[first adenosine deaminase]-[napDNAbp]-[second adenosine deaminase]-[NLS]-COOH;
  • NH2-[NLS]-[napDNAbp]-[first adenosine deaminase]-[second adenosine deaminase]-COOH;
  • NH2-[napDNAbp]-[NLS]-[first adenosine deaminase]-[second adenosine deaminase]-COOH;
  • NH2-[napDNAbp]-[first adenosine deaminase]-[NLS]-[second adenosine deaminase]-COOH;
  • NH2-[napDNAbp]-[first adenosine deaminase]-[second adenosine deaminase]-[NLS]-COOH;
  • NH2-[NLS]-[second adenosine deaminase]-[first adenosine deaminase]-[napDNAbp]-COOH;
  • NH2-[second adenosine deaminase]-[NLS]-[first adenosine deaminase]-[napDNAbp]-COOH;
  • NH2-[second adenosine deaminase]-[first adenosine deaminase]-[NLS]-[napDNAbp]-COOH;
  • NH2-[second adenosine deaminase]-[first adenosine deaminase]-[napDNAbp]-[NLS]-COOH;
  • NH2-[NLS]-[second adenosine deaminase]-[napDNAbp]-[first adenosine deaminase]-COOH;
  • NH2-[second adenosine deaminase]-[NLS]-[napDNAbp]-[first adenosine deaminase]-COOH;
  • NH2-[second adenosine deaminase]-[napDNAbp]-[NLS]-[first adenosine deaminase]-COOH;
  • NH2-[second adenosine deaminase]-[napDNAbp]-[first adenosine deaminase]-[NLS]-COOH;
  • NH2-[NLS]-[napDNAbp]-[second adenosine deaminase]-[first adenosine deaminase]-COOH;
  • NH2-[napDNAbp]-[NLS]-[second adenosine deaminase]-[first adenosine deaminase]-COOH;
  • NH2-[napDNAbp]-[second adenosine deaminase]-[NLS]-[first adenosine deaminase]-COOH;
  • NH2-[napDNAbp]-[second adenosine deaminase]-[first adenosine deaminase]-[NLS]-COOH.
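The architectures listed above correspond to every ordering of the component domains: three domains give 6 orderings, and adding an NLS gives 24. A minimal Python sketch, assuming only the domain labels used in the lists above, enumerates them. The function name architectures is hypothetical, and the output order follows itertools.permutations rather than the order of the lists above.

```python
from itertools import permutations

DOMAINS = (
    "first adenosine deaminase",
    "second adenosine deaminase",
    "napDNAbp",
)


def architectures(include_nls: bool = False) -> list[str]:
    """Enumerate N-to-C-terminal domain orderings as NH2-[...]-...-COOH strings."""
    parts = DOMAINS + (("NLS",) if include_nls else ())
    return [
        "NH2-" + "-".join(f"[{part}]" for part in order) + "-COOH"
        for order in permutations(parts)
    ]


print(len(architectures()))                  # orderings without an NLS
print(len(architectures(include_nls=True)))  # orderings with an NLS
for line in architectures(include_nls=True):
    print(line)
```

As in the lists above, each "]-[" junction in the generated strings may be read as the position of an optional linker.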


Exemplary ABEs include, without limitation, the following fusion proteins. For the purposes of clarity, the adenosine deaminase domain may be shown in Bold; mutations of the ecTadA deaminase domain are shown in Bold underlining; the XTEN linker is shown in italics; the UGI/AAG/EndoV domains are shown in Bold italics; and NLS is shown in underlined italics:


In some embodiments, an A to G nucleobase editor comprises the structure of NH2-[second adenosine deaminase]-[first adenosine deaminase]-[dCas9]-COOH. In some embodiments, the second adenosine deaminase is a wild-type ecTadA (SEQ ID NO: 314). In some embodiments, a linker is used between each domain. In some embodiments, the linker is 32 amino acids long and comprises the amino acid sequence of SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 384).


Exemplary adenine nucleobase editors comprise amino acid sequences that are at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or at least 99.5% identical to any one of the amino acid sequences of SEQ ID NOs: 379, 380, 382, 383, 386, 388, 478, and 483. In particular embodiments, the disclosed adenine nucleobase editors comprise an amino acid sequence that is at least 90% identical to any of SEQ ID NOs: 388, 478, and 483. In particular embodiments, the disclosed adenine nucleobase editors comprise an amino acid sequence of any of SEQ ID NOs: 388, 478, and 483.
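The percent identity thresholds recited above may be illustrated with a simplified Python sketch. This passage does not specify an alignment method, so the sketch assumes two sequences of equal length that are compared position by position without gaps; sequences of different lengths would first need to be aligned with an alignment tool. The function names percent_identity and highest_threshold_met are hypothetical, and the example compares the ecTadA deaminase portion of the wild-type construct listed in this disclosure with a D108N variant of it.

```python
# Thresholds recited above, in percent.
IDENTITY_THRESHOLDS = (85.0, 90.0, 95.0, 98.0, 99.0, 99.5)

# Deaminase portion of the ecTadA(wt)-XTEN-nCas9-NLS listing (assumption: this
# stretch is the ecTadA domain, numbered from Met1).
ECTADA_WT = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV"
    "MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE"
    "CAALLSDFFRMRRQEIKAQKKAQSSTD"
)
# D108N variant: position 108 (index 107) changed from D to N.
ECTADA_D108N = ECTADA_WT[:107] + "N" + ECTADA_WT[108:]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity for two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def highest_threshold_met(identity: float):
    """Return the largest recited threshold satisfied, or None if none is."""
    met = [t for t in IDENTITY_THRESHOLDS if identity >= t]
    return max(met) if met else None


identity = percent_identity(ECTADA_WT, ECTADA_D108N)
print(f"{identity:.2f}% identical; meets 'at least {highest_threshold_met(identity)}%'")
```

In this example a single substitution in a 167-residue domain gives 166/167 matching positions, which satisfies the "at least 99%" threshold but not the "at least 99.5%" threshold.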


Non-limiting examples of A to G nucleobase editors are provided below, as SEQ ID NOs: 323-342, 379-383, 385-388, 458-478, 480, 483, and 553.










ecTadA(wt)-XTEN-nCas9-NLS



(SEQ ID NO: 323)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





ecTadA(D108N)-XTEN-nCas9-NLS: (mammalian construct, active on DNA)


(SEQ ID NO: 324)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





ecTadA(D108G)-XTEN-nCas9-NLS: (mammalian construct, active on DNA, A to G editing)


(SEQ ID NO: 325)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARGAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





ecTadA(D108V)-XTEN-nCas9-NLS: (mammalian construct, active on DNA, A to G editing)


(SEQ ID NO: 326)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARVAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





ecTadA(D108N)-XTEN-nCas9-UGI-NLS (BE3 analog of A to G editor)


(SEQ ID NO: 327)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHT





AYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





ecTadA(D108G)-XTEN-nCas9-UGI-NLS (BE3 analog of A to G editor)


(SEQ ID NO: 328)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARGAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHT





AYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





ecTadA(D108V)-XTEN-nCas9-UGI-NLS (BE3 analog of A to G editor)


(SEQ ID NO: 329)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARVAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHT





AYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





ecTadA(D108N)-XTEN-dCas9-UGI-NLS (mammalian cells, BE2 analog of A to G editor)


(SEQ ID NO: 330)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHT





AYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV
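
The nCas9 and dCas9 constructs listed above share the fragment DKKYSIGLAIGTNSVGW and appear to differ within the Cas9 portion at the fragment SDYDVDHIVPQ (nCas9 listings) versus SDYDVDAIVPQ (dCas9 listings). The following is a minimal sketch, not part of the original listing, of how a listed sequence might be checked against its label. The function name and the residue names D10A and H840 (the numbering commonly used for S. pyogenes Cas9) are assumptions made for illustration.

```python
# Motifs copied from the sequences in this listing. Reading them as the
# D10A and H840/H840A positions of S. pyogenes Cas9 is an assumption.
D10A_MOTIF = "DKKYSIGLAIGTNSVGW"   # alanine in place of the first catalytic aspartate
H840_MOTIF = "SDYDVDHIVPQ"         # histidine retained (nickase listings)
A840_MOTIF = "SDYDVDAIVPQ"         # histidine replaced by alanine (dCas9 listings)


def classify_cas9_domain(protein: str) -> str:
    """Return 'nCas9', 'dCas9', or 'unclassified' for a protein sequence.

    Whitespace and line breaks are removed first, so a sequence copied
    from the wrapped listing can be passed in directly.
    """
    seq = "".join(protein.split()).upper()
    if D10A_MOTIF not in seq:
        return "unclassified"
    if H840_MOTIF in seq:
        return "nCas9"
    if A840_MOTIF in seq:
        return "dCas9"
    return "unclassified"
```

For example, passing the full sequence of SEQ ID NO: 329 with its line breaks would be expected to return 'nCas9' under these assumptions, and SEQ ID NO: 330 would be expected to return 'dCas9'.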





ecTadA(D108G)-XTEN-dCas9-UGI-NLS (mammalian cells, BE2 analog of A to G editor)


(SEQ ID NO: 331)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARGAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHT





AYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





ecTadA(D108V)-XTEN-dCas9-UGI-NLS (mammalian cells, BE2 analog of A to G editor)


(SEQ ID NO: 332)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARVAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVH





TAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV





ecTadA(D108N)-XTEN-nCas9-AAG(E125Q)-NLS: contains cat. alkyladenosine glycosylase


(SEQ ID NO: 333)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSKGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRI





VETQAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLET





MRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVG





VGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQASGGSPKKKRKV





ecTadA(D108G)-XTEN-nCas9-AAG(E125Q)-NLS: contains cat. alkyladenosine glycosylase


(SEQ ID NO: 334)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARGAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSKGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRI





VETQAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLET





MRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVG





VGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQASGGSPKKKRKV





ecTadA(D108V)-XTEN-nCas9-AAG(E125Q)-NLS: contains cat. alkyladenosine glycosylase


(SEQ ID NO: 335)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARVAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSKGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRI





VETQAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLET





MRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVG





VGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQASGGSPKKKRKV





ecTadA(D108N)-XTEN-nCas9-EndoV(D35A)-NLS: contains cat. endonuclease V


(SEQ ID NO: 336)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSDLASLRAQQIELASSVIREDRLDKDPPDLIAGAAVGFEQGGE





VTRAAMVLLKYPSLELVEYKVARIATTMPYIPGFLSFREYPALLAAWEMLSQKPDLVFVDGHGISHPRRLGV





ASHFGLLVDVPTIGVAKKRLCGKFEPLSSEPGALAPLMDKGEQLAWVWRSKARCNPLFIATGHRVSVDSAL





AWVQRCMKGYRLPEPTRWADAVASERPAFVRYTANQPSGGSPKKKRKV





ecTadA(D108G)-XTEN-nCas9-EndoV(D35A)-NLS: contains cat. endonuclease V


(SEQ ID NO: 337)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARGAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSDLASLRAQQIELASSVIREDRLDKDPPDLIAGAAVGFEQGGE





VTRAAMVLLKYPSLELVEYKVARIATTMPYIPGFLSFREYPALLAAWEMLSQKPDLVFVDGHGISHPRRLGV





ASHFGLLVDVPTIGVAKKRLCGKFEPLSSEPGALAPLMDKGEQLAWVWRSKARCNPLFIATGHRVSVDSAL





AWVQRCMKGYRLPEPTRWADAVASERPAFVRYTANQPSGGSPKKKRKV





ecTadA(D108V)-XTEN-nCas9-EndoV(D35A)-NLS: contains cat. endonuclease V


(SEQ ID NO: 338)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARVAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSDLASLRAQQIELASSVIREDRLDKDPPDLIAGAAVGFEQGGE





VTRAAMVLLKYPSLELVEYKVARIATTMPYIPGFLSFREYPALLAAWEMLSQKPDLVFVDGHGISHPRRLGV





ASHFGLLVDVPTIGVAKKRLCGKFEPLSSEPGALAPLMDKGEQLAWVWRSKARCNPLFIATGHRVSVDSAL





AWVQRCMKGYRLPEPTRWADAVASERPAFVRYTANQPSGGSPKKKRKV





Variant resulting from first round of evolution (in bacteria)


ecTadA(H8Y_D108N_N127S)-XTEN-dCas9


(SEQ ID NO: 339)



MSEVEFSYEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMSHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGD
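
The construct labels in this listing name substitutions in the form H8Y_D108N_N127S, that is, wild-type residue, position (counting the initial methionine as position 1), and replacement residue. The following is a minimal sketch, not part of the original listing, of applying such a label to a wild-type sequence. The function name is hypothetical, and the wild-type ecTadA string is assembled from the "ecTadA(wild type)" portion of the pNMG-616 listing on the assumption that the deaminase ends at ...KAQKKAQSSTD, immediately before the linker.

```python
import re

# Assumed wild-type ecTadA, copied from the first deaminase of pNMG-616
# (SEQ ID NO: 459); the C-terminal boundary before the linker is an assumption.
ECTADA_WT = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV"
    "MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE"
    "CAALLSDFFRMRRQEIKAQKKAQSSTD"
)

_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(wild_type: str, label: str) -> str:
    """Apply substitutions such as 'H8Y_D108N_N127S' to wild_type.

    Positions are 1-based. A ValueError is raised if a token is malformed,
    out of range, or names a wild-type residue that is not actually present.
    """
    residues = list(wild_type)
    for token in label.split("_"):
        match = _MUTATION.match(token)
        if match is None:
            raise ValueError(f"malformed substitution: {token!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range: {token!r}")
        if residues[pos - 1] != old:
            raise ValueError(
                f"{token!r} expects {old} at {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)
```

Under these assumptions, apply_mutations(ECTADA_WT, "H8Y_D108N_N127S") yields a deaminase beginning MSEVEFSYEYW, consistent with the first line of SEQ ID NO: 339. The residue check is a deliberate design choice: it flags a label whose numbering does not match the sequence it is applied to.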





Enriched variants from second round of evolution (in bacteria)


ecTadA(H8Y_D108N_N127S_E155X)-XTEN-dCas9; X = D, G or V


(SEQ ID NO: 340)



MSEVEFSYEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMSHRVEITEGILADE





CAALLSDFFRMRRQXIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGD
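
SEQ ID NO: 340 uses the placeholder X at position 155 to stand for D, G, or V. The following is a minimal sketch, not part of the original listing, of expanding such a placeholder into the individual sequences it represents. The function name is hypothetical, and for brevity the example uses only the listed fragment that contains the placeholder.

```python
# Fragment of SEQ ID NO: 340 containing the placeholder (residues 141-167
# of the deaminase, if numbering starts at the initial methionine).
FRAGMENT_WITH_X = "CAALLSDFFRMRRQXIKAQKKAQSSTD"


def expand_placeholder(seq: str, choices: str = "DGV", placeholder: str = "X") -> dict:
    """Return one concrete sequence per allowed residue, keyed by that residue.

    Exactly one placeholder is expected; anything else raises ValueError so
    that an unexpected X elsewhere in the sequence is not silently replaced.
    """
    if seq.count(placeholder) != 1:
        raise ValueError("expected exactly one placeholder in the sequence")
    return {aa: seq.replace(placeholder, aa) for aa in choices}
```

Calling expand_placeholder(FRAGMENT_WITH_X) returns three fragments, keyed "D", "G", and "V", that differ only at the position marked X.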





pNMG-160: ecTadA(D108N)-XTEN-nCas9-GGS-AAG*(E125Q)-GGS-NLS


(SEQ ID NO: 341)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDGGSKGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPNGTELRGRI





VETQAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLET





MRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVG





VGHAGEWARKPLRFYVRGSPWVSVVDRVAEQDTQAGGSPKKKRKV





pNMG-161: ecTadA(D108N)-XTEN-nCas9-GGS-EndoV*(D35A)-GGS-NLS


(SEQ ID NO: 342)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARNAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDGGSDLASLRAQQIELASSVIREDRLDKDPPDLIAGAAVGFEQGGEV





TRAAMVLLKYPSLELVEYKVARIATTMPYIPGFLSFREYPALLAAWEMLSQKPDLVFVDGHGISHPRRLGVA





SHFGLLVDVPTIGVAKKRLCGKFEPLSSEPGALAPLMDKGEQLAWVWRSKARCNPLFIATGHRVSVDSALA





WVQRCMKGYRLPEPTRWADAVASERPAFVRYTANQPGGSPKKKRKV





pNMG-371: ecTadA(L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)-SGGS-SGGS-XTEN-SGGS-SGGS-ecTadA(L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)-SGGS-SGGS-XTEN-SGGS-SGGS-nCas9-SGGS-NLS


(SEQ ID NO: 458)



SEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVM






QNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADEC





AALLSYFFRMRRQVFKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLSYFFRMRRQVF





KAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETALATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQLEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-616 amino acid sequence: ecTadA(wild type)-(SGGS)2-XTEN-(SGGS)2-ecTadA(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 459)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-624 amino acid sequence: ecTadA(wild type)-32 a.a. linker-ecTadA(W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_K157N)-24 a.a. linker_nCas9_SGGS_NLS


(SEQ ID NO: 460)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKK





HERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI





QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAE





DAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL





TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTF





DNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE





VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL





LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLI





HDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT





TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHI





VPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELD





KAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE





IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS





LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISE





FSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-476 amino acid sequence (evolution #3 heterodimer, wt TadA + TadA evo #3


mutations): ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(L84F_A106V_D108N_H123Y_D147Y_E155V_I156F)-(SGGS)2-XTEN-


(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 461)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLSYFFRMRRQVF





KAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-477 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)-


(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 462)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRPIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV
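The construct headers in this listing describe each evolved ecTadA domain as a set of underscore-separated point substitutions (for example H36L, meaning His at position 36 replaced by Leu). The following is a minimal Python sketch, not part of the original disclosure, of how such a list can be applied to the wild-type domain and checked against a listed sequence. It assumes that the wild-type ecTadA domain is the first 167 residues of the sequences above (everything before the first SGGS linker residue) and that positions are numbered from the initiator methionine as residue 1; the names WT_ECTADA and apply_mutations are hypothetical helpers introduced only for illustration.

```python
import re

# Wild-type ecTadA domain as read from the first 167 residues of the
# listed sequences (assumption: the domain ends immediately before the
# first SGGS linker residue; Met is counted as position 1).
WT_ECTADA = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV"
    "MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE"
    "CAALLSDFFRMRRQEIKAQKKAQSSTD"
)

# One substitution token: original residue, 1-based position, new residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(wild_type, mutations):
    """Apply an underscore-separated substitution list such as 'H36L_R51L'.

    Raises ValueError if a token is malformed, out of range, or names an
    original residue that does not match the wild-type sequence.
    """
    residues = list(wild_type)
    for token in mutations.split("_"):
        match = _MUTATION.match(token)
        if match is None:
            raise ValueError(f"malformed mutation token: {token!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range in {token!r}")
        if wild_type[pos - 1] != old:
            raise ValueError(
                f"{token!r} expects {old} at {pos}, found {wild_type[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# Substitution list taken from the pNMG-477 header above.
PNMG_477 = "H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N"
evolved = apply_mutations(WT_ECTADA, PNMG_477)
```

Because the function rejects any token whose original residue disagrees with the wild-type sequence, it can also serve as a quick consistency check between a construct header and the numbering used in the listing.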





pNMG-558 amino acid sequence: ecTadA(wild-type)-32 a.a. linker-


ecTadA(H36L_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)-


24 a.a. linker_nCas9_SGGS_NLS


(SEQ ID NO: 463)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRPIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKK





HERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI





QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAE





DAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL





TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTF





DNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE





VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL





LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLI





HDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT





TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHI





VPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELD





KAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE





IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS





LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISE





FSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-576 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_


K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 464)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRSIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-577 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_A142N_D147Y_E155V_I156F_


K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 465)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRSIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-586 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_


K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 466)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-588 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_A142N_D147Y_E155V_I156F_


K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 467)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-620 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_


I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 468)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-617 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_E155V_


I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 469)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-618 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_


E155V_I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 470)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMAPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-620 amino acid sequence: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_


I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_GGS_NLS


(SEQ ID NO: 471)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-621 amino acid sequence: ecTadA(wild-type)-32 a.a. linker-


ecTadA(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_I156F_


K157N)-24 a.a. linker_nCas9_GGS_NLS


(SEQ ID NO: 472)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKK





HERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI





QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAE





DAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL





TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTF





DNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE





VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL





LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLI





HDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT





TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHI





VPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELD





KAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE





IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS





LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISE





FSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-622 amino acid sequence: ecTadA(wild-type)-32 a.a. linker-


ecTadA(H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_E155V_


I156F_K157N)-24 a.a. linker_nCas9_GGS_NLS


(SEQ ID NO: 473)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKK





HERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI





QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAE





DAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL





TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTF





DNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE





VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL





LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLI





HDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT





TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHI





VPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELD





KAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE





IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS





LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISE





FSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSPKKKRKV





pNMG-623 amino acid sequence: ecTadA(wild-type)-32 a.a. linker-


ecTadA(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_E155V_


I156F_K157N)-24 a.a. linker_nCas9_GGS_NLS


(SEQ ID NO: 474)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKK





HERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI





QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAE





DAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL





TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTF





DNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE





VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL





LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTL





FEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLI





HDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT





TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHI





VPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELD





KAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHA





HDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGE





IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK





KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS





LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISE





FSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIH





QSITGLYETRIDLSQLGGDSGGSPKKKRKV





ABE6.3 ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(H36L_P48S_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_E155V_I156F_K157N)-


(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 475)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRSIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV*
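The construct names in this listing describe each evolved ecTadA domain by an underscore-separated list of point mutations (for example H36L_P48S_R51L_...). The following is a minimal sketch, not part of the disclosed methods, of how such a label could be parsed and applied to a wild-type sequence. The function names are hypothetical, and the numbering is assumed to be 1-based with the initiator Met of wild-type ecTadA as position 1, which appears consistent with the sequences listed above. Only the first 69 residues of wild-type ecTadA are used, so only mutations in that range are applied.

```python
import re
from typing import List, Tuple

# (wild-type residue, 1-based position, replacement residue)
Mutation = Tuple[str, int, str]


def parse_mutations(label: str) -> List[Mutation]:
    """Parse an underscore-separated label such as 'H36L_P48S_R51L'."""
    mutations = []
    for token in label.split("_"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        mutations.append((match.group(1), int(match.group(2)), match.group(3)))
    return mutations


def apply_mutations(wild_type: str, mutations: List[Mutation]) -> str:
    """Apply point mutations, checking the expected wild-type residue first."""
    residues = list(wild_type)
    for old, pos, new in mutations:
        if pos < 1 or pos > len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != old:
            raise ValueError(
                f"expected {old} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# First 69 residues of wild-type ecTadA as listed above (assumed numbering: M = 1).
WT_PREFIX = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV"
)

# Only the ABE6.3 mutations that fall inside the 69-residue prefix.
evolved_prefix = apply_mutations(WT_PREFIX, parse_mutations("H36L_P48S_R51L"))
```

The residue check in apply_mutations is a deliberate design choice: it fails loudly if a label and a sequence use different numbering, rather than silently mutating the wrong position.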





ABE7.8 ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_E155V_


I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 476)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV*





ABE7.9 ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(W23L_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_R152P_


E155V_I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 477)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRALDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV*
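The names of ABE7.8 (SEQ ID NO: 476) and ABE7.9 (SEQ ID NO: 477) differ only by the R152P mutation. As a hedged illustration, the sketch below compares two equal-length sequences position by position; the function name is hypothetical, and the two fragments are the corresponding 70-residue lines copied from the listings above, not the full sequences. Positions reported are relative to the fragment, not to ecTadA numbering.

```python
from typing import List, Tuple


def point_differences(a: str, b: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for this comparison")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Corresponding lines from the ABE7.8 and ABE7.9 listings above.
ABE7_8_FRAGMENT = (
    "CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMRRQVF"
)
ABE7_9_FRAGMENT = (
    "CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMPRQVF"
)

differences = point_differences(ABE7_8_FRAGMENT, ABE7_9_FRAGMENT)
```

For sequences of different lengths, an alignment step would be needed before a comparison of this kind; that is outside the scope of this sketch.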





ABE7.10 ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(W23R_H36L_P48A_R51L_L84F_A106V_D108N_H123Y_S146C_D147Y_R152P_


E155V_I156F_K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 478)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV*





ABE6.4: ecTadA(wild-type)-(SGGS)2-XTEN-(SGGS)2-


ecTadA(H36L_P48S_R51L_L84F_A106V_D108N_H123Y_A142N_S146C_D147Y_E155V_I156F_


K157N)-(SGGS)2-XTEN-(SGGS)2_nCas9_SGGS_NLS


(SEQ ID NO: 480)



MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLV






MQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADE





CAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHA





LTLAKRAWDEREVPVGAVLVLNNRVIGEGWNRSIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEP





CVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECNALLCYFFRMRRQVF





NAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKK





FKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEES





FLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD





NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN





FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR





YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE





DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSE





ETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG





EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL





EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA





NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL





SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE





RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVRE





INNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKT





EITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKD





LIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY





LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK





EVLDATLIHQSITGLYETRIDLSQLGGDSGGSPKKKRKV
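The construct names above describe the domains as joined by (SGGS)2-XTEN-(SGGS)2 linkers and terminated by an NLS. The following minimal sketch locates those elements in a sequence by exact string matching. The motif strings are an assumption based on reading the listed sequences (SGGSSGGS for (SGGS)2, SGSETPGTSESATPES for XTEN, and PKKKRKV for the NLS), and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed motif strings, read off the sequences listed above.
MOTIFS = {
    "(SGGS)2": "SGGSSGGS",
    "XTEN": "SGSETPGTSESATPES",
    "NLS": "PKKKRKV",
}


def find_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return sorted (1-based start, motif name) pairs, including overlaps."""
    hits = []
    for name, motif in MOTIFS.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={re.escape(motif)})", sequence):
            hits.append((match.start() + 1, name))
    return sorted(hits)


# Linker region copied from the listings above (between the two ecTadA domains).
LINKER_FRAGMENT = "QSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEF"
linker_hits = find_motifs(LINKER_FRAGMENT)
```

Exact matching is enough for annotating these listings, but it would miss linker or NLS variants that differ by even one residue.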





ABEmax


(SEQ ID NO: 483)



MKRTADGSEFESPKKKRKVMSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIG






RHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDV





LHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSG





GSSGGSSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQG





GLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGIL





ADECAALLCYFFRMPRQVFNAQKKAQSSTDSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIG





TNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEI





FSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLAL





AHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGE





KKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI





LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPI





LEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVG





PLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELT





KVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDL





LKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGI





RDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV





VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYY





LQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ





LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITL





KSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEI





GKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQT





GGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFE





KNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG





SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAP





AAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRTADGSEFEPKKKRKV





ABE8e (monomer)


(SEQ ID NO: 379)



MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGW






NRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSK





RGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGS





SGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIK





KNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEED





KKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP





DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIA





LSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVN





TEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFI





KPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKIL





TFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPK





HSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECF





DSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLF





DDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI





QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKG





QKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDV





DHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK





AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRK





DFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA





TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTE





VQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELL





GITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK





YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKH





RDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLG





GDSGGSKRTADGSEFEPKKKRKV





ABE8e (dimer)


(SEQ ID NO: 380)



MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGW






NRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDA





KTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTDSGGSSGG





SSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIG





EGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGV





RNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGG





SSGGSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTD





RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFL





VEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEG





DLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFG





NLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDI





LRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEE





FYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE





KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNE





KVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYF





KKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLK





TYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSL





TFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN





QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR





LSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF





DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLV





SDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQ





EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNI





VKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKS





VKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL





ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA





YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRID





LSQLGGDSGGSKRTADGSEFEPKKKRKV





SaABE8e


(SEQ ID NO: 381)



MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGW






NRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSK





RGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGS





SGSETPGTSESATPESSGGSSGGSGKRNYILGLAIGITSVGYGIIDYETRDVIDAGVRLFKEANVENN





EGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALL





HLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSD





YVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTY





FPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILV





NEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE





LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVD





DFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTT





GKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEE





NSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL





VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN





ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVD





KKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKL





KLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKV





VKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNN





DLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNL





YEVKSKKHPQIIKKGSGGSKRTADGSEFEPKKKRKV





SpCas9NG-ABE8e (“ABE8e-NG”)


(SEQ ID NO: 382)



MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWN






RAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGA





AGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGSSGS





ETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLI





GALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHE





RHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSD





VDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGL





TPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKA





PLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEK





MDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPY





YVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLY





EYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIS





GVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVM





KQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVS





GQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRE





RMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQS





FLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLS





ELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKV





REINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYS





NIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKEST





RPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFE





KNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARFLQKGNELALPSKYVNFLYLASH





YEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAE





NIIHLFTLTNLGAPRAFKYFDTTIDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSKRT





ADGSEFEPKKKRKV





SaKKH-ABE8e (“ABE8e-KKH”)


(SEQ ID NO: 383)



MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGW






NRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSK





RGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGS





SGSETPGTSESATPESSGGSSGGSGKRNYILGLAIGITSVGYGIIDYETRDVIDAGVRLFKEANVENN





EGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALL





HLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSD





YVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTY





FPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILV





NEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSE





LTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVD





DFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTT





GKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEE





NSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNL





VDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN





ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVD





KKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKL





KLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKV





VKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKN





DLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNL





YEVKSKKHPQIIKKGSGGSKRTADGSEFEPKKKRKV





ABE8-NRTH: NLS TadA linker, TadA, NRTH


(SEQ ID NO: 553)



MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGW






NRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSK





RGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGS





SGSETPGTSESATPESSGGSSGGSSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWN






RAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAG







SLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGSSGSETPGTS







ESATPESSGGSSGGSDKKYSIGLTIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGE







TAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAY







HEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEE







NPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKD







TYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMVKRYDEHHQDLTLLKALVR







QQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGII







PHQIHLGELHAILRRQGDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEV







VDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL







LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT







LTLFEDREMIEERLKTYAHLFDDKVMKQLKRLRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR







NFMQLIHDDSLTFKEDIQKAQVSCQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGGHKPENIVI







EMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDI







NRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDN







LTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDF







QFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYS







NIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIL







PKGNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIGF







LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASASVLHKGNELALPSKYVNFLYLASHYEKLKGSSEDN







KQKQLFVEQHKHYLDEHEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGASAAF







KYFDTTIGRKLYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSKRTADGSEFEPKKKRKV






ABE8-NRRH: NLS TadA linker, TadA, NRRH


(SEQ ID NO: 385)




MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNN









RVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGA









MIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFY









RMPRQVFNAQKKAQSSINSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWM







RHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRL







IDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILAD







ECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKK








YSIGLTIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARR









RYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTI









YHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEEN









PINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQ









LSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMVKRYDEHHQ









DLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLN









REDLLRKQRTFDNGIIPHQIHLGELHAILRRQGDFYPFLKDNREKIEKILTFRIPYYVGPLARGNS









RFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNEL









TKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFN









ASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKR









LRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSCQG









DSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGGHKPENIVIEMARENQTTQKGQKNSRER









MKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQ









SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGG









LSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQF









YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYF









FYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQT









GGFSKESILPKGNSDKLIARKKDWDPKKYGGFNSPTAAYSVLVVAKVEKGKSKKLKSVKELLGIT









IMERSSFEKNPIGFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLHKGNELALPSKY









VNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKH









RDKPIREQAENIIHLFTLTNLGVPAAFKYFDTTIDKKRYTSTKEVLDATLIHQSITGLYETRIDLSQ









LGGDSGGSKRTADGSEFEPKKKRKV






xCas9(3.7)-ABE(7.10): (ecTadA(wt)-linker(32 aa)-ecTadA*(7.10)-linker(32 aa)-


nxCas9(3.7)-NLS):


(SEQ ID NO: 386)




[sequence segment not reproduced in source]

SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVI









GEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVITEPCVMCAGAMIHSRIGRVVF









GVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD








[sequence segment not reproduced in source]
DKKYSIGLAIGTNSVGWAVITDEYKVPSK






KFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA





KVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLR





LIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINTASGVDAKAILSA





RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDTKLQLSKDTYDDDL





DNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKLYDEHHQDLTLLK





ALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRED





LLRKQRTFDNGIIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNS





RFAWMTRKSEETITPWNFEKVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTV





YNELTKVKYVTEGMRKPAFLSGDQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVET





SGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHL





FDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFIQLIHDDSL





TFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIE





MARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRD





MYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMK





NYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMN





TKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKK





YPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRP





LIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR





KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFL





EAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLQKGNELALPSKYVNFLYLASHY





EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIR





EQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLG





GD[sequence segment not reproduced in source]PKKKRKV





ABE8-VRQR: NLS TadA linker, TadA, SpCas9-VRQR


(SEQ ID NO: 387)




MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNN










RVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGA









MIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFY









RMPRQVFNACIKKAQSSINSGGSSGGSSGSETPGTSESATPESSGGSSGGSSEVEFSHEYWM







RHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLI







DATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADE







CAALLCDFYRMPRQVFNAQKKAQSSINSGGSSGGSSGSETPGTSESATPESSGGSSGGSDKKY







SIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRR







YTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIY







HLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPI







NASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQL







SKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQD







LTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNRE







DLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSR







FAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELT







KVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNA







SLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRR







RYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQG







DSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERM







KRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQS







FLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGL







SELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFY







KVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFF







YSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTG







GFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI







MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYV







NFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR







DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQL







GGDSGGSKRTADGSEFEPKKKRKV






ABE8e(TadA-8e V106W)


(SEQ ID NO: 388)



MKRTADGSEFESPKKKRKVSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGW






NRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGWRNS





KRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSINSGGSSG





GSSGSETPGTSESATPESSGGSSGGSDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHS





IKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEE





DKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLN





PDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLI





ALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRV





NTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYK





FIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEK





ILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVL





PKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIE





CFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAH





LFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKE





DIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQ





KGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY





DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNL





TKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDF





RKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIG





KATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVK





KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVK





ELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELAL





PSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAY





NKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDL





SQLGGDSGGSKRTADGSEFEPKKKRKV






For the full AAV genome sequences that encode the CBE3.9max and ABEmax nucleobase editor constructs used in Examples 4 and 5, described below, see FIGS. 26A-26U. All constructs were cloned in the px601 backbone, and pseudospacer-containing backbones were cut with Esp3I/BsmBI endonucleases. Primers listed in FIGS. 25A-25B were annealed and ligated using standard molecular biology techniques. The U6-sgRNA cassette was omitted from the ABEmax N-terminal constructs to keep the total construct size under the maximum AAV particle packaging limit.
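The size constraint described above can be illustrated with a short sketch that sums the lengths of the elements in a construct and compares the total to a packaging limit. Everything in it is an assumption made for illustration: the limit of 5,000 bp reflects the approximate AAV packaging limit cited in the background, and the element names and lengths are hypothetical rather than taken from FIGS. 26A-26U.

```python
from typing import Dict

# Assumed limit, based on the approximately 5 kb AAV packaging limit
# cited in the background; the exact cutoff is illustrative.
PACKAGING_LIMIT_BP = 5000


def total_length(elements: Dict[str, int]) -> int:
    """Sum the lengths (in bp) of all elements in a construct."""
    return sum(elements.values())


def fits_packaging_limit(
    elements: Dict[str, int], limit_bp: int = PACKAGING_LIMIT_BP
) -> bool:
    """Return True if the construct is at or under the packaging limit."""
    return total_length(elements) <= limit_bp


# Hypothetical element lengths, for illustration only.
n_terminal_with_sgrna = {
    "ITRs": 290,
    "promoter": 500,
    "editor N-terminal half with intein-N": 3900,
    "polyA signal": 200,
    "U6-sgRNA cassette": 350,
}

# The same construct with the U6-sgRNA cassette omitted.
n_terminal_without_sgrna = {
    name: length
    for name, length in n_terminal_with_sgrna.items()
    if name != "U6-sgRNA cassette"
}
```

With these hypothetical lengths, the construct totals 5,240 bp with the cassette and 4,890 bp without it, so only the second version falls under the assumed limit.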


Uracil Glycosylase Inhibitor Domains

In some embodiments, the N-terminal portion of a split nucleobase editor further comprises an inhibitor of uracil glycosylase (UGI). In some embodiments, the first nucleotide sequence encodes a polypeptide of the structure: NH2-[UGI]-[nucleobase modifying enzyme]-[N-terminal portion of dCas9 or nCas9]-[intein-N]. In some embodiments, the first nucleotide sequence encodes a polypeptide of the structure: NH2-[nucleobase modifying enzyme]-[UGI]-[N-terminal portion of dCas9 or nCas9]-[intein-N].


In some embodiments, the C-terminal portion of a split nucleobase editor further comprises an enzyme that inhibits the activity of uracil glycosylase (UGI). In some embodiments, the second nucleotide sequence encodes a polypeptide of the structure: NH2-[intein-C]-[C-terminal portion of dCas9 or nCas9]-[UGI]-COOH. In some embodiments, the second nucleotide sequence encodes a polypeptide of the structure: NH2-[intein-C]-[C-terminal portion of dCas9 or nCas9]-[nucleobase modifying enzyme]-[UGI]-COOH. In some embodiments, the second nucleotide sequence encodes a polypeptide of the structure: NH2-[intein-C]-[C-terminal portion of dCas9 or nCas9]-[UGI]-[nucleobase modifying enzyme]-COOH.


Non-limiting, exemplary uracil glycosylase inhibitor sequences are provided below.










Bacillus phage PBS2 (Bacteriophage PBS2) Uracil-DNA glycosylase inhibitor


(SEQ ID NO: 299)


MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDE





STDENVMLLTSDAPEYKPWALVIQDSNGENKIKML






Erwinia tasmaniensis SSB (thermostable single-stranded DNA binding protein)


(SEQ ID NO: 300)


MASRGVNKVILVGNLGQDPEVRYMPNGGAVANITLATSESWRDKQTGET





KEKTEWHRVVLFGKLAEVAGEYLRKGSQVYIEGALQTRKWTDQAGVEKY





TTEVVVNVGGTMQMLGGRSQGGGASAGGQNGGSNNGWGQPQQPQGGNQF





SGGAQQQARPQQQPQQNNAPANNEPPIDFDDDIP





UdgX (binds to uracil in DNA but does not excise)


(SEQ ID NO: 301)


MAGAQDFVPHTADLAELAAAAGECRGCGLYRDATQAVFGAGGRSARIMM





IGEQPGDKEDLAGLPFVGPAGRLLDRALEAADIDRDALYVTNAVKHFKF





TRAAGGKRRIHKTPSRTEVVACRPWLIAEMTSVEPDVVVLLGATAAKAL





LGNDFRVTQHRGEVLHVDDVPGDPALVATVHPSSLLRGPKEERESAFAG





LVDDLRVAADVRP





UDG (catalytically inactive human UDG, binds to uracil in DNA but does not excise)


(SEQ ID NO: 302)


MIGQKTLYSFFSPSPARKRHAPSPEPAVQGTGVAGVPEESGDAAAIPAK





KAPAGQEEPGTPPSSPLSAEQLDRIQRNKAAALLRLAARNVPVGFGESW





KKHLSGEFGKPYFIKLMGFVAEERKHYTVYPPPHQVFTWTQMCDIKDVK





VVILGQEPYHGPNQAHGLCFSVQRPVPPPPSLENIYKELSTDIEDFVHP





GHGDLSGWAKQGVLLLNAVLTVRAHQANSHKERGWEQFTDAVVSWLNQN





SNGLVFLLWGSYAQKKGSAIDRKRHHVLQTAHPSPLSVYRGFFGCRHFS





KTNELLQKSGKKPIDWKEL






In some embodiments, the N-terminal portion and the C-terminal portion of the nucleobase editor are joined to form a complete nucleobase editor. In some embodiments, the split nucleobase editor may comprise any one of the following structures:


NH2-[nucleobase modifying enzyme]-[dCas9 or nCas9]-COOH


NH2-[UGI]-[nucleobase modifying enzyme]-[dCas9 or nCas9]-COOH


NH2-[nucleobase modifying enzyme]-[UGI]-[dCas9 or nCas9]-COOH


NH2-[nucleobase modifying enzyme]-[dCas9 or nCas9]-[UGI]-COOH


NH2-[dCas9 or nCas9]-[nucleobase modifying enzyme]-COOH


NH2-[UGI]-[dCas9 or nCas9]-[nucleobase modifying enzyme]-COOH


NH2-[dCas9 or nCas9]-[UGI]-[nucleobase modifying enzyme]-COOH or


NH2-[dCas9 or nCas9]-[nucleobase modifying enzyme]-[UGI]-COOH.


In some embodiments, the first nucleotide sequence or the second nucleotide sequence (encoding either the split Cas9 protein or the split nucleobase editor) is operably linked to a nucleotide sequence encoding at least one bipartite nuclear localization signal (NLS). For example, the first nucleotide sequence may be operably linked to a nucleotide sequence encoding one or more (e.g., 2, 3, 4, 5, or more) bipartite NLSs. In some embodiments, the second nucleotide sequence may be operably linked to a nucleotide sequence encoding one or more (e.g., 2, 3, 4, 5, or more) bipartite NLSs. As such, the split Cas9 or split nucleobase editor formed by joining the N-terminal portion and the C-terminal portion may comprise one or more bipartite NLSs. For example, the split Cas9 or split nucleobase editor may comprise any one of the following structures (bNLS means one or more bipartite nuclear localization signals):


NH2-bNLS-[Cas9]-COOH


NH2-[Cas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-COOH


NH2-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-COOH


NH2-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[UGI]-[nucleobase modifying enzyme]-[dCas9 or nCas9]-COOH


NH2-[UGI]-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-COOH


NH2-[UGI]-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-COOH


NH2-[UGI]-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[UGI]-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-COOH


NH2-bNLS-[UGI]-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-COOH


NH2-bNLS-[UGI]-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-COOH


NH2-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-COOH


NH2-[UGI]-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-COOH


NH2-[UGI]-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-COOH


NH2-bNLS-[UGI]-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[UGI]-[dCas9 or nCas9]-COOH


NH2-[nucleobase modifying enzyme]-bNLS-[UGI]-[dCas9 or nCas9]-COOH


NH2-[nucleobase modifying enzyme]-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-[dCas9 or nCas9]-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[UGI]-bNLS-[dCas9 or nCas9]-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[UGI]-[dCas9 or nCas9]-bNLS-COOH


NH2-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-COOH


NH2-[nucleobase modifying enzyme]-bNLS-[UGI]-[dCas9 or nCas9]-bNLS-COOH


NH2-[nucleobase modifying enzyme]-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-[dCas9 or nCas9]-bNLS-COOH


NH2-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-[UGI]-COOH


NH2-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-[UGI]-COOH


NH2-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-[UGI]-COOH


NH2-[nucleobase modifying enzyme]-[dCas9 or nCas9]-[UGI]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-[UGI]-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-[UGI]-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-[UGI]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-[UGI]-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-[UGI]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-[dCas9 or nCas9]-bNLS-[UGI]-bNLS-COOH


NH2-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-[UGI]-bNLS-COOH


NH2-bNLS-[nucleobase modifying enzyme]-bNLS-[dCas9 or nCas9]-bNLS-[UGI]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-COOH


NH2-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[UGI]-[dCas9 or nCas9]-[nucleobase modifying enzyme]-COOH


NH2-[UGI]-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-COOH


NH2-[UGI]-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-[UGI]-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-COOH


NH2-bNLS-[UGI]-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-bNLS-[UGI]-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-[UGI]-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-[UGI]-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-bNLS-[UGI]-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[UGI]-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-[UGI]-[nucleobase modifying enzyme]-COOH


NH2-[dCas9 or nCas9]-bNLS-[UGI]-[nucleobase modifying enzyme]-COOH


NH2-[dCas9 or nCas9]-[UGI]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-[dCas9 or nCas9]-[UGI]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[UGI]-[nucleobase modifying enzyme]-COOH


NH2-bNLS-[dCas9 or nCas9]-[UGI]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-bNLS-[dCas9 or nCas9]-[UGI]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-[dCas9 or nCas9]-bNLS-[UGI]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-[dCas9 or nCas9]-bNLS-[UGI]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-[dCas9 or nCas9]-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[UGI]-bNLS-[nucleobase modifying enzyme]-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[UGI]-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-[dCas9 or nCas9]-bNLS-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[UGI]-bNLS-[nucleobase modifying enzyme]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-[UGI]-COOH


NH2-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-[UGI]-COOH


NH2-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-[UGI]-COOH


NH2-[dCas9 or nCas9]-[nucleobase modifying enzyme]-[UGI]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-[UGI]-COOH


NH2-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-[UGI]-COOH


NH2-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-[UGI]-bNLS-COOH


NH2-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-COOH


NH2-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-[UGI]-bNLS-COOH


NH2-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-COOH


NH2-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-[UGI]-bNLS-COOH


NH2-bNLS-[dCas9 or nCas9]-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-COOH


NH2-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-COOH


or


NH2-bNLS-[dCas9 or nCas9]-bNLS-[nucleobase modifying enzyme]-bNLS-[UGI]-bNLS-COOH
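By way of illustration only, the structures listed above appear to follow a simple combinatorial pattern: for a given order of domains, a bNLS may occupy any non-empty subset of the positions before, between, and after the domains. The following Python sketch enumerates such layouts under that assumption. The function name bnls_variants and the variable names are hypothetical and are not part of the disclosure.

```python
from itertools import product


def bnls_variants(domains):
    """Return every layout of `domains` with bNLS at one or more positions.

    A layout with n domains has n + 1 candidate bNLS positions:
    before the first domain, between adjacent domains, and after the last.
    """
    n_slots = len(domains) + 1
    variants = []
    for mask in product([False, True], repeat=n_slots):
        if not any(mask):
            continue  # skip the layout that carries no bNLS at all
        parts = []
        for i, domain in enumerate(domains):
            if mask[i]:
                parts.append("bNLS")
            parts.append(f"[{domain}]")
        if mask[-1]:
            parts.append("bNLS")
        variants.append("NH2-" + "-".join(parts) + "-COOH")
    return variants


two_domain = bnls_variants(["nucleobase modifying enzyme", "dCas9 or nCas9"])
three_domain = bnls_variants(
    ["UGI", "nucleobase modifying enzyme", "dCas9 or nCas9"]
)

for structure in two_domain:
    print(structure)
```

Under this assumption a two-domain order yields 2^3 - 1 layouts and a three-domain order yields 2^4 - 1 layouts; the sketch is a convenience for checking a list of structures for omissions or repeats and does not limit the structures contemplated herein.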


Herein, “NH2—” represents the N-terminus of a protein or polypeptide, and “—COOH” represents the C-terminus of a protein or polypeptide. “]-[” represents a peptide bond or a linker. In some embodiments, linkers may be used to link any of the protein or protein domains described herein. The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In some embodiments, the linker is a polypeptide or based on amino acids. In some embodiments, the linker is not peptide-like. In some embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In some embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In some embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In some embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In some embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In some embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In some embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In some embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In some embodiments, the linker comprises a polyethylene glycol moiety (PEG). In some embodiments, the linker comprises amino acids. In some embodiments, the linker comprises a peptide. In some embodiments, the linker comprises an aryl or heteroaryl moiety. In some embodiments, the linker is based on a phenyl ring. The linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. 
Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.


In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is a bond (e.g., a covalent bond), an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-110, 110-120, 120-130, 130-140, 140-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. In some embodiments, a linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 377), which may also be referred to as the XTEN linker. In some embodiments, a linker comprises the amino acid sequence: SGGS (SEQ ID NO: 378). In some embodiments, a linker comprises the amino acid sequence: (SGGS)n (SEQ ID NO: 557), (GGGS)n (SEQ ID NO: 558), (GGGGS)n (SEQ ID NO: 559), (G)n (SEQ ID NO: 390), (EAAAK)n (SEQ ID NO: 560), (GGS)n (SEQ ID NO: 562), SGSETPGTSESATPES (SEQ ID NO: 377), or (XP)n (SEQ ID NO: 563) motif, or a combination of any of these, wherein n is independently an integer between 1 and 30, inclusive, and wherein X is any amino acid. In some embodiments, n is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. In some embodiments, the linker comprises the amino acid sequences SGSETPGTSESATPES (SEQ ID NO: 377) and SGGS (SEQ ID NO: 378). In some embodiments, the linker comprises the amino acid sequence: SGGSSGSETPGTSESATPESSGGS (SEQ ID NO: 561). In some embodiments, a linker comprises the amino acid sequence: SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 384). In some embodiments, a linker comprises the amino acid sequence: GGSGGSPGSPAGSPTSTEEGTSESATPESGPGTSTEPSEGSAPGSPAGSPTSTEEGTSTEPSEGSAPGTSTEPSEGSAPGTSESATPESGPGSEPATSGGSGGS (SEQ ID NO: 564).


In some embodiments, the linker is 24 amino acids in length. In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPES (SEQ ID NO: 343). In some embodiments, the linker is 40 amino acids in length. In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSSGGSSGGS (SEQ ID NO: 391). In some embodiments, the linker is 64 amino acids in length. In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGSSGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 392). In some embodiments, the linker is 92 amino acids in length. In some embodiments, the linker comprises the amino acid sequence PGSPAGSPTSTEEGTSESATPESGPGTSTEPSEGSAPGSPAGSPTSTEEGTSTEPSEGSAPGTSTEPSEGSAPGTSESATPESGPGSEPATS (SEQ ID NO: 393).
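By way of illustration only, the 24-, 40-, and 64-amino acid linkers recited above, as well as the 32-amino acid linker of SEQ ID NO: 384, can each be written as a concatenation of the SGGS unit (SEQ ID NO: 378) and the XTEN linker (SEQ ID NO: 377). The following Python sketch assembles them from those two units and reports their lengths. The helper name build_linker and the variable names are hypothetical.

```python
XTEN = "SGSETPGTSESATPES"  # SEQ ID NO: 377 (16 amino acids)
SGGS = "SGGS"              # SEQ ID NO: 378 (4 amino acids)


def build_linker(*units):
    """Concatenate linker units, N-terminus to C-terminus."""
    return "".join(units)


# Unit compositions inferred from the sequences recited in the text.
linker_24 = build_linker(SGGS, SGGS, XTEN)
linker_32 = build_linker(SGGS, SGGS, XTEN, SGGS, SGGS)
linker_40 = build_linker(SGGS, SGGS, XTEN, SGGS, SGGS, SGGS, SGGS)
linker_64 = build_linker(
    SGGS, SGGS, XTEN, SGGS, SGGS, SGGS, SGGS, XTEN, SGGS, SGGS
)

for name, seq in [
    ("SEQ ID NO: 343", linker_24),
    ("SEQ ID NO: 384", linker_32),
    ("SEQ ID NO: 391", linker_40),
    ("SEQ ID NO: 392", linker_64),
]:
    print(name, len(seq), seq)
```

Building linkers from named units in this way makes it straightforward to confirm that a recited sequence has the stated length before it is cloned into a construct.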


In some embodiments, the first and second nucleotide sequences are on the same nucleic acid vector. In some embodiments, the first and second nucleotide sequences are on different nucleic acid vectors. In some embodiments, the vector is a plasmid. In some embodiments, the nucleic acid vector is a recombinant genome of an adeno-associated virus (rAAV). In some embodiments, the nucleic acid vector is the genome of an adeno-associated virus packaged in a rAAV particle. In some embodiments, the first and/or the second nucleotide sequence is operably linked to a promoter. In some embodiments, the nucleic acid vector further comprises a nucleotide sequence encoding one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) gRNAs operably linked to a promoter. In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is an inducible promoter.


An inducible promoter of the present disclosure may be induced by (or repressed by) one or more physiological condition(s), such as changes in light, pH, temperature, radiation, osmotic pressure, saline gradients, cell surface binding, and the concentration of one or more extrinsic or intrinsic inducing agent(s). An extrinsic inducer signal or inducing agent may comprise, without limitation, amino acids and amino acid analogs, saccharides and polysaccharides, nucleic acids, protein transcriptional activators and repressors, cytokines, toxins, petroleum-based compounds, metal containing compounds, salts, ions, enzyme substrate analogs, hormones, or combinations thereof.


Inducible promoters of the present disclosure include any inducible promoter described herein or known to one of ordinary skill in the art. Examples of inducible promoters include, without limitation, chemically/biochemically-regulated and physically-regulated promoters such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoters (e.g., promoters derived from metallothionein (proteins that bind and sequester metal ions) genes from yeast, mouse and human), pathogenesis-regulated promoters (e.g., induced by salicylic acid, ethylene or benzothiadiazole (BTH)), temperature/heat-inducible promoters (e.g., heat shock promoters), and light-regulated promoters (e.g., light responsive promoters from plant cells). Other inducible promoter systems are known in the art and may be used in accordance with the present disclosure.


In some embodiments, inducible promoters of the present disclosure function in prokaryotic cells (e.g., bacterial cells). Examples of inducible promoters for use in prokaryotic cells include, without limitation, bacteriophage promoters (e.g., Pls1con, T3, T7, SP6, PL) and bacterial promoters (e.g., Pbad, PmgrB, Ptrc2, Plac/ara, Ptac, Pm), or hybrids thereof (e.g., PLlacO, PLtetO). Examples of bacterial promoters for use in accordance with the present disclosure include, without limitation, positively regulated E. coli promoters, such as positively regulated σ70 promoters (e.g., inducible pBad/araC promoter, Lux cassette right promoter, modified lambda Prm promoter, plac Or2-62 (positive), pBad/AraC with extra REN sites, pBad, P(Las) TetO, P(Las) CIO, P(Rhl), Pu, FecA, pRE, cadC, hns, pLas, pLux), σS promoters (e.g., Pdps), σ32 promoters (e.g., heat shock), and σ54 promoters (e.g., glnAp2); negatively regulated E. coli promoters such as negatively regulated σ70 promoters (e.g., Promoter (PRM+), modified lambda Prm promoter, TetR-TetR-4C P(Las) TetO, P(Las) CIO, P(Lac) IQ, RecA_DlexO_DLacO1, dapAp, FecA, Pspac-hy, pcI, plux-cI, plux-lac, CinR, CinL, glucose controlled, modified Pr, modified Prm+, FecA, Pcya, rec A (SOS), Rec A (SOS), EmrR_regulated, BetI_regulated, pLac_lux, pTet_Lac, pLac/Mnt, pTet/Mnt, LsrA/cI, pLux/cI, LacI, LacIQ, pLacIQ1, pLas/cI, pLas/Lux, pLux/Las, pRecA with LexA binding site, reverse BBa_R0011, pLacI/ara-1, pLacIq, rrnB P1, cadC, hns, PfhuA, pBad/araC, nhaA, OmpF, RcnR), σS promoters (e.g., Lutz-Bujard LacO with alternative sigma factor σ38), σ32 promoters (e.g., Lutz-Bujard LacO with alternative sigma factor σ32), and σ54 promoters (e.g., glnAp2); negatively regulated B. subtilis promoters such as repressible B. subtilis σA promoters (e.g., Gram-positive IPTG-inducible, Xyl, hyper-spank) and σB promoters. Other inducible microbial promoters may be used in accordance with the present disclosure.


In some embodiments, inducible promoters of the present disclosure function in eukaryotic cells (e.g., mammalian cells). Examples of inducible promoters for use in eukaryotic cells include, without limitation, chemically-regulated promoters (e.g., alcohol-regulated promoters, tetracycline-regulated promoters, steroid-regulated promoters, metal-regulated promoters, and pathogenesis-related (PR) promoters) and physically-regulated promoters (e.g., temperature-regulated promoters and light-regulated promoters).


Guide RNAs

The present disclosure further provides guide RNAs for use in accordance with the disclosed base editors and methods of editing. The disclosure provides guide RNAs that are designed to recognize target sequences. Such gRNAs may be designed to have guide sequences (or “spacers”) having complementarity to a protospacer within the target sequence. Guide RNAs are also provided for use with one or more of the disclosed fusion proteins, e.g., in the disclosed methods of editing a nucleic acid molecule. Such gRNAs may be designed to have guide sequences having complementarity to a protospacer within a target sequence to be edited, and to have backbone sequences that interact specifically with the napDNAbp domains of any of the disclosed nucleobase editors, such as Cas9 nickase domains of the disclosed nucleobase editors.


The disclosure further provides methods for editing a target nucleic acid molecule, e.g., a single nucleobase within a genome, with a nucleobase editor described herein, e.g., a split nucleobase editor. Such methods involve transducing (e.g., via transfection) cells with a plurality of complexes each comprising a fusion protein (e.g., a fusion protein comprising a Cas9 nickase (nCas9) domain and an adenosine deaminase domain) and a gRNA molecule. In some embodiments, the gRNA is bound to the napDNAbp domain (e.g., nCas9 domain) of the fusion protein. In some embodiments, each gRNA comprises a guide sequence of at least 10 contiguous nucleotides (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 contiguous nucleotides) that is complementary to a target sequence. In certain embodiments, the methods involve the transfection of nucleic acid constructs (e.g., plasmids) that each (or together) encode the components of a complex of fusion protein and gRNA molecule.


Some aspects of the invention relate to guide sequences (“guide RNA” or “gRNA”) that are capable of guiding a napDNAbp or a nucleobase editor comprising a napDNAbp to a target site, e.g. a target site in the NPC1 gene or TMC1 gene. Exemplary guide sequences suitable for targeting the NPC1 and Tmc1 genes and used in the experiments of Examples 1-4 are provided in Table 6 (SEQ ID NOs: 669-743). The guide RNA may be 15-100 nucleotides in length and comprise a sequence of at least 10, at least 15, or at least 20 contiguous nucleotides that is complementary to a target nucleotide sequence. The guide RNA may comprise a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides that is complementary to a target nucleotide sequence.
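By way of illustration only, the length and complementarity criteria described above can be checked computationally. The following Python sketch tests whether a candidate guide RNA is 15-100 nucleotides in length and contains at least a chosen number of contiguous nucleotides complementary to a target strand. The function names, the default thresholds as arguments, and the 20-nucleotide protospacer are hypothetical and are not taken from Table 6.

```python
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def reverse_complement_dna(seq):
    """Reverse complement of a 5'->3' DNA sequence."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(seq.upper()))


def pairing_rna(target_strand_dna):
    """RNA (5'->3') that pairs perfectly, antiparallel, with a DNA strand."""
    return "".join(
        DNA_TO_RNA_COMPLEMENT[b] for b in reversed(target_strand_dna.upper())
    )


def longest_complementary_run(guide_rna, target_strand_dna):
    """Longest contiguous stretch of the guide that pairs with the target.

    Computed as the longest common substring between the guide and the
    RNA that would pair perfectly with the target strand.
    """
    partner = pairing_rna(target_strand_dna)
    guide = guide_rna.upper()
    best = 0
    prev = [0] * (len(partner) + 1)
    for g in guide:
        cur = [0] * (len(partner) + 1)
        for j, p in enumerate(partner, start=1):
            if g == p:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def guide_is_acceptable(guide_rna, target_strand_dna,
                        min_run=10, min_len=15, max_len=100):
    """Apply the length window and the minimum contiguous complementarity."""
    if not (min_len <= len(guide_rna) <= max_len):
        return False
    return longest_complementary_run(guide_rna, target_strand_dna) >= min_run


protospacer = "GACCTTGATCGGAATCCGTA"           # hypothetical example
target_strand = reverse_complement_dna(protospacer)
guide = protospacer.replace("T", "U")          # guide matches the protospacer
print(guide_is_acceptable(guide, target_strand))
```

The sketch treats only Watson-Crick pairing and ignores PAM requirements and off-target considerations, which would be evaluated separately.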


In other aspects, the present specification provides complexes comprising the nucleobase editors described herein and a gRNA bound to the Cas9 domain of the fusion protein, such as a single guide RNA. In various embodiments, nucleobase editors (e.g., the split nucleobase editors provided herein) can be complexed, bound, or otherwise associated with (e.g., via any type of covalent or non-covalent bond) one or more guide sequences, i.e., the sequence which becomes associated or bound to the nucleobase editor and directs its localization to a specific target sequence having complementarity to the guide sequence or a portion thereof. The particular design aspects of a guide sequence will depend upon the nucleotide sequence of a genomic target site of interest (e.g., in human NPC) and the type of napDNA/RNAbp (e.g., type of Cas protein) present in the nucleobase editor, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc. Accordingly, in some embodiments, the disclosure provides compositions comprising complexes of any of the disclosed nucleobase editors and a guide RNA comprising a guide sequence comprising a nucleotide sequence of any of SEQ ID NOs: 669-743. In some embodiments of the disclosed complexes, the guide RNA comprises a sequence that differs from any of SEQ ID NOs: 669-743 by no more than 1, 2, 3, or 4 nucleotides.


In some embodiments, the disclosure provides compositions comprising i) vectors encoding any of the disclosed nucleobase editors and ii) a guide RNA comprising a guide sequence comprising a nucleotide sequence of any of SEQ ID NOs: 669-743. In some embodiments, these vectors comprise i) a nucleic acid encoding an N-terminal portion of a split nucleobase editor, ii) a nucleic acid encoding a C-terminal portion of a split nucleobase editor, and iii) a guide RNA comprising a guide sequence comprising a nucleotide sequence of any of SEQ ID NOs: 669-743. In some embodiments of the disclosed vectors, the guide RNA comprises a sequence that differs from any of SEQ ID NOs: 669-743 by no more than 1, 2, 3, or 4 nucleotides.


The present disclosure also provides compositions of guide RNAs. In particular embodiments, the disclosure provides compositions of guide RNAs comprising a guide sequence comprising a nucleotide sequence of any of SEQ ID NOs: 669-743. The present disclosure also provides methods of editing target DNA sequences in an NPC1 gene or a TMC1 gene using compositions and/or complexes comprising any of the disclosed guide RNAs.


In some embodiments, a guide sequence is less than about 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a nucleobase editor to a target sequence may be assessed by any suitable assay. For example, the components of a nucleobase editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence (e.g., a HGADFN 167 or HGADFN 188 cell line), such as by transfection with vectors encoding the components of a nucleobase editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a nucleobase editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.


In addition to the SDS, the gRNA comprises a scaffold sequence (corresponding to the tracrRNA in the native CRISPR/Cas system) that is required for its association with Cas9 (sometimes referred to herein as the “gRNA handle,” “gRNA core” or “gRNA backbone”). In various embodiments, the guide RNA scaffold binds an S. pyogenes Cas9. In other embodiments, the guide RNA scaffold binds an S. aureus Cas9. In some embodiments, the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by an S. pyogenes Cas9 protein or domain, such as an SpCas9 domain of the disclosed nucleobase editors. The backbone structure recognized by an SpCas9 protein may comprise the sequence 5′-[guide sequence]-guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu-3′ (SEQ ID NO: 443), wherein the guide sequence comprises a sequence that is complementary to the protospacer of the target sequence. See U.S. Publication No. 2015/0166981, published Jun. 18, 2015, the disclosure of which is incorporated by reference herein. In other embodiments, the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by an S. aureus Cas9 protein. The backbone structure recognized by an SaCas9 protein may comprise the sequence 5′-[guide sequence]-guuuuaguacucuguaaugaaaauuacagaaucuacuaaaacaaggcaaaaugccguguuuaucucgucaacuuguuggcgagauuuuuuu-3′ (SEQ ID NO: 565).
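By way of illustration only, a full-length single guide RNA of the form 5′-[guide sequence]-[scaffold]-3′ can be assembled by appending the SpCas9 backbone sequence recited above (SEQ ID NO: 443) to a guide sequence. In the following Python sketch the function name assemble_sgrna and the example 20-nucleotide guide sequence are hypothetical.

```python
# SpCas9 backbone sequence as recited in the text (SEQ ID NO: 443).
SPCAS9_SCAFFOLD = (
    "guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaagug"
    "gcaccgagucggugcuuuuu"
)


def assemble_sgrna(guide_sequence, scaffold=SPCAS9_SCAFFOLD):
    """Return 5'-[guide sequence]-[scaffold]-3' as a single RNA string.

    The guide is normalised to upper-case RNA (T -> U) so that it stays
    visually distinct from the lower-case scaffold.
    """
    guide = guide_sequence.upper().replace("T", "U")
    invalid = set(guide) - set("ACGU")
    if invalid:
        raise ValueError(f"unexpected characters in guide: {sorted(invalid)}")
    return guide + scaffold


sgrna = assemble_sgrna("GACCTTGATCGGAATCCGTA")  # hypothetical guide sequence
print(sgrna)
```

A scaffold for a different Cas9 ortholog could be supplied through the scaffold argument in the same manner.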


In other embodiments, the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by a Lachnospiraceae bacterium Cas12a protein. The backbone structure recognized by an LbCas12a protein may comprise the sequence 5′-[guide sequence]-uaauuucuacuaaguguagau-3′ (SEQ ID NO: 566). In other embodiments, the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that is recognized by an Acidaminococcus sp. BV3L6 Cas12a protein. The backbone structure recognized by an AsCas12a protein may comprise the sequence 5′-[guide sequence]-uaauuucuacucuuguagau-3′ (SEQ ID NO: 567).


Other non-limiting, suitable gRNA scaffold sequences that may be used in accordance with the present disclosure are listed in Table 2. In other embodiments, the guide RNAs for use in accordance with the disclosed methods of editing comprise a backbone structure that comprises any of SEQ ID NOs: 359-361, 363, 366, 368, and 569-575.









TABLE 2

Guide RNA Handle Sequences

Organism | gRNA scaffold sequence | SEQ ID NO
S. pyogenes | GUUUAAGAGCUAUGCUGGAAAGCCACGGUGAAAAAGUUCAACUAUUGCCUGAUCGGAAUAAAUUUGAACGAUACGACAGUCGGUGCUUUUUUU | 359
S. pyogenes | GUUUAAGAGCUAGAAAUAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU | 360
S. thermophilus CRISPR1 | GUUUUUGUACUCUCAAGAUUCAAUAAUCUUGCAGAAGCUACAAAGAUAAGGCUUCAUGCCGAAAUCAACACCCUGUCAUUUUAUGGCAGGGUGUUUU | 361
S. thermophilus CRISPR3 | GUUUUAGAGCUGUGUUGUUUGUUAAAACAACACAGCGAGUUAAAAUAAGGCUUAGUCCGUACUCAACUUGAAAAGGUGGCACCGAUUCGGUGUUUUU | 568
C. jejuni | AAGAAAUUUAAAAAGGGACUAAAAUAAAGAGUUUGCGGGACUCUGCGGGGUUACAAUCCCCUAAAACCGCUUUU | 363
F. novicida | AUCUAAAAUUAUAAAUGUACCAAAUAAUUAAUGCUCUGUAAUCAUUUAAAAGUAUUUUGAACGGACCUCUGUUUGACACGUCUGAAUAACUAAAA | 569
S. thermophilus2 | UGUAAGGGACGCCUUACACAGUUACUUAAAUCUUGCAGAAGCUACAAAGAUAAGGCUUCAUGCCGAAAUCAACACCCUGUCAUUUUAUGGCAGGGUGUUUUCGUUAUUU | 570
M. mobile | UGUAUUUCGAAAUACAGAUGUACAGUUAAGAAUACAUAAGAAUGAUACAUCACUAAAAAAAGGCUUUAUGCCGUAACUACUACUUAUUUUCAAAAUAAGUAGUUUUUUUU | 366
L. innocua | AUUGUUAGUAUUCAAAAUAACAUAGCAAGUUAAAAUAAGGCUUUGUCCGUUAUCAACUUUUAAUUAAGUAGCGCUGUUUCGGCGCUUUUUUU | 571
S. pyogenes | GUUGGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU | 368
S. mutans | GUUGGAAUCAUUCGAAACAACACAGCAAGUUAAAAUAAGGCAGUGAUUUUUAAUCCAGUCCGUACACAACUUGAAAAAGUGCGCACCGAUUCGGUGCUUUUUUAUUU | 572
S. thermophilus | UUGUGGUUUGAAACCAUUCGAAACAACACAGCGAGUUAAAAUAAGGCUUAGUCCGUACUCAACUUGAAAAGGUGGCACCGAUUCGGUGUUUUUUUU | 573
N. meningitidis | ACAUAUUGUCGCACUGCGAAAUGAGAACCGUUGCUACAAUAAGGCCGUCUGAAAAGAUGUGCCGCAACGCUCUGCCCCUUAAAGCUUCUGCUUUAAGGGGCA | 574
P. multocida | GCAUAUUGUUGCACUGCGAAAUGAGAGACGUUGCUACAAUAAGGCUUCUGAAAAGAAUGACCGUAACGCUCUGCCCCUUGUGAUUCUUAAUUGCAAGGGGCAUCGUUUUU | 575









In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and P A Carr & G M Church, 2009, Nature Biotechnology 27(12): 1151-62). Additional algorithms may be found in Chuai, G. et al., DeepCRISPR: optimized CRISPR guide RNA design by deep learning, Genome Biol. 19:80 (2018), and PCT Application No. PCT/US2018/065886 and U.S. Pat. No. 8,871,445, issued Oct. 28, 2014, the entireties of each of which are incorporated herein by reference.
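By way of illustration only, a guide sequence can be given a crude first-pass screen for self-complementarity before a full folding algorithm is applied. The following Python sketch reports the longest perfectly paired hairpin stem that a sequence could form with itself. It is a simplified heuristic that is assumed here for illustration: it performs no free-energy calculation, considers only Watson-Crick pairs, and is not an implementation of mFold or RNAfold. The function name and the minimum loop length of three nucleotides are hypothetical choices.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_hairpin_stem(rna, min_loop=3):
    """Length of the longest perfect hairpin stem within `rna`.

    For every possible innermost base pair (i, j) enclosing at least
    `min_loop` unpaired nucleotides, the stem is extended outward while
    consecutive positions keep forming Watson-Crick pairs.
    """
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    best = 0
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            length = 0
            a, b = i, j
            while a >= 0 and b < n and (rna[a], rna[b]) in WATSON_CRICK:
                length += 1
                a -= 1
                b += 1
            best = max(best, length)
    return best


# A sequence designed to fold back on itself versus one that cannot pair.
print(longest_hairpin_stem("GGGGAAAACCCC"))
print(longest_hairpin_stem("AAAAAAAA"))
```

Candidate guide sequences with long predicted stems could then be passed to a thermodynamic folding program for a proper assessment.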


In general, a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins. 
In preferred embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a guide sequence, the first block of lower case letters represents the tracr mate sequence, the second block of lower case letters represents the tracr sequence, and the final poly-T sequence represents the transcription terminator: (1) NNNNNNNNgtttttgtactctcaagatttaGAAAtaaatcttgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 201); (2) NNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 202); (3) NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgtTTTTT (SEQ ID NO: 203); (4) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgcTTTTTT (SEQ ID NO: 204); (5) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAATAGcaagttaaaataaggctagtccgttatcaacttgaaaaagtgTTTTTTT (SEQ ID NO: 205); and (6) NNNNNNNNNNNNNNNNNNNNgttttagagctagAAATAGcaagttaaaataaggctagtccgttatcaTTTTTTTT (SEQ ID NO: 206). In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.
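The letter-case convention described above (guide "N" bases, a lower case tracr mate block, an upper case loop, a lower case tracr block, and a terminal poly-T) lends itself to a simple parser. The following Python sketch splits such a construct into its parts with a regular expression. The function name, the dictionary keys, and the assumption of a 20-nucleotide guide placeholder in the example are illustrative choices and are not part of the disclosure; the example construct follows the layout of sequence (4) above.

```python
import re

# Illustrative sketch only. Pattern mirrors the notation used in the text:
# N-run guide, lower-case tracr mate, upper-case loop, lower-case tracr, poly-T.
CHIMERIC_GUIDE = re.compile(r"^(N+)([acgt]+)([ACGT]+)([acgt]+)(T+)$")


def split_chimeric_guide(construct: str) -> dict:
    """Split a single-transcript guide construct into its annotated parts."""
    cleaned = "".join(construct.split())  # drop whitespace left by line wrapping
    match = CHIMERIC_GUIDE.match(cleaned)
    if match is None:
        raise ValueError("construct does not follow the expected case convention")
    guide, tracr_mate, loop, tracr, terminator = match.groups()
    return {
        "guide": guide,
        "tracr_mate": tracr_mate,
        "loop": loop,
        "tracr": tracr,
        "terminator": terminator,
    }


# Example laid out like sequence (4); the 20-base guide placeholder is assumed.
EXAMPLE = (
    "N" * 20
    + "gttttagagcta"
    + "GAAA"
    + "tagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgc"
    + "TTTTTT"
)
PARTS = split_chimeric_guide(EXAMPLE)
```

In this example the loop is recovered as the four-nucleotide sequence GAAA and the terminator as six T nucleotides, consistent with the preferences stated above.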


It will be apparent to those of skill in the art that in order to target any of the fusion proteins comprising a Cas9 domain and a deaminase, as disclosed herein, to a target site to be edited, it is typically necessary to co-express the fusion protein together with a guide RNA, e.g., an sgRNA. As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.


Recombinant Adeno-Associated Viral (rAAV) Vectors


Some aspects of the present disclosure relate to using recombinant adeno-associated virus vectors for the delivery of a split Cas9 protein or a split nucleobase editor into a cell. Because the full-length Cas9 protein or nucleobase editor exceeds the packaging limit of rAAV (˜4.9 kb), the N-terminal portion and the C-terminal portion of the Cas9 protein or the nucleobase editor are delivered by separate rAAV vectors or particles into the same cell.
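The size constraint described above can be illustrated with simple arithmetic. The following Python sketch checks whether a payload fits within the ˜4.9 kb packaging limit and estimates the payload of each half after splitting. The 5.2 kb editor size is taken from the Summary; the split position, the intein coding lengths, and the regulatory sequence overhead are hypothetical values chosen only for illustration.

```python
# Illustrative sketch only. The 4.9 kb limit and 5.2 kb editor size come from
# the text; every other number below is a hypothetical placeholder.
RAAV_LIMIT_BP = 4900


def fits_in_raav(payload_bp: int, limit_bp: int = RAAV_LIMIT_BP) -> bool:
    """Return True if the payload is within the assumed packaging limit."""
    return payload_bp <= limit_bp


def split_payloads(editor_bp: int, split_at_bp: int, intein_n_bp: int,
                   intein_c_bp: int, regulatory_bp: int) -> tuple[int, int]:
    """Estimate the payload of the N-terminal and C-terminal vectors.

    Each half carries its portion of the editor, its intein, and its own
    regulatory sequences (promoter, terminator, and the like).
    """
    n_half = split_at_bp + intein_n_bp + regulatory_bp
    c_half = (editor_bp - split_at_bp) + intein_c_bp + regulatory_bp
    return n_half, c_half


EDITOR_BP = 5200  # from the Summary; excludes guide RNA and regulatory sequences
N_HALF, C_HALF = split_payloads(
    EDITOR_BP,
    split_at_bp=2600,    # hypothetical split position
    intein_n_bp=300,     # hypothetical intein-N coding length
    intein_c_bp=110,     # hypothetical intein-C coding length
    regulatory_bp=1000,  # hypothetical promoter + terminator overhead
)
```

With these placeholder values the unsplit editor alone exceeds the limit, while each half (3900 bp and 3710 bp) fits with room to spare.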


As such, in some embodiments, a composition for delivering the split Cas9 protein or split nucleobase editor into a cell (e.g., a mammalian cell, a human cell) is provided. In some embodiments, the composition of the present disclosure comprises: (i) a first recombinant adeno-associated virus (rAAV) particle comprising a first nucleotide sequence encoding an N-terminal portion of a Cas9 protein or nucleobase editor fused at its C-terminus to an intein-N; and (ii) a second recombinant adeno-associated virus (rAAV) particle comprising a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein or nucleobase editor. The rAAV particles of the present disclosure comprise an rAAV vector (i.e., a recombinant genome of the rAAV) encapsidated in the viral capsid proteins.


In some embodiments, any of the disclosed rAAV vectors encoding the N-terminal portions or the C-terminal portions of the split nucleobase editors may comprise a nucleotide sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the sequences depicted in FIGS. 26A-26U (SEQ ID NOs: 642-653). In particular embodiments, the disclosed rAAV vectors comprise a nucleotide sequence that is at least 90% identical to any one of the sequences set forth as SEQ ID NOs: 642-653. In some embodiments, the disclosed rAAV vectors comprise a nucleotide sequence that comprises any one of the sequences of SEQ ID NOs: 642-653.
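For illustration, a percent identity threshold of the kind recited above can be evaluated as follows. This Python sketch assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and counts identical, non-gap columns over the alignment length; other definitions of percent identity use a different denominator, so the convention here is an assumption, as are the function names.

```python
# Illustrative sketch only: percent identity over a pre-computed alignment.
# The denominator (full alignment length) is an assumed convention.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity between two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """Return True if the identity is at least the given threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned ten-nucleotide sequences that differ at a single position are 90% identical and would satisfy an "at least 90% identical" criterion but not an "at least 95% identical" one.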


In some embodiments, any of the disclosed nucleic acid molecules encoding an N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N may comprise a nucleotide sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the nucleotide sequences of SEQ ID NOs: 642, 644, 646, 648, 650, and 652. In some embodiments, any of the disclosed nucleic acid molecules encoding an N-terminal portion of a nucleobase editor may comprise a nucleotide sequence that differs by about 10, 15, 20, 25, 30, 35, 40, 45, 50, or more than 50 nucleotides from any one of the sequences of SEQ ID NOs: 642, 644, 646, 648, 650, and 652. In particular embodiments, any of the disclosed nucleic acid molecules encoding an N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N may comprise a nucleotide sequence that comprises any one of the sequences of SEQ ID NOs: 642, 644, 646, 648, 650, and 652.


In some embodiments, any of the disclosed nucleic acid molecules encoding a C-terminal portion of a nucleobase editor fused at its N-terminus to an intein-C may comprise a nucleotide sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the nucleotide sequences of SEQ ID NOs: 643, 645, 647, 649, 651, and 653. In some embodiments, any of the disclosed nucleic acid molecules encoding a C-terminal portion of a nucleobase editor may comprise a nucleotide sequence that differs by about 10, 15, 20, 25, 30, 35, 40, 45, 50, or more than 50 nucleotides from any one of the sequences of SEQ ID NOs: 643, 645, 647, 649, 651, and 653. In particular embodiments, any of the disclosed nucleic acid molecules encoding a C-terminal portion of a nucleobase editor fused at its N-terminus to an intein-C may comprise a nucleotide sequence that comprises any one of the sequences of SEQ ID NOs: 643, 645, 647, 649, 651, and 653.


In some embodiments, the disclosure provides compositions comprising a first nucleic acid molecule encoding an N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N that comprises a nucleotide sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the nucleotide sequences of SEQ ID NOs: 642, 644, 646, 648, 650, and 652; and a second nucleic acid molecule encoding a C-terminal portion of a nucleobase editor fused at its N-terminus to an intein-C that comprises a nucleotide sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to any one of the nucleotide sequences of SEQ ID NOs: 643, 645, 647, 649, 651, and 653. In particular embodiments, the compositions comprise a first nucleic acid molecule that comprises any one of the nucleotide sequences of SEQ ID NOs: 642, 644, 646, 648, 650, and 652, and a second nucleic acid molecule that comprises any one of the nucleotide sequences of SEQ ID NOs: 643, 645, 647, 649, 651, and 653. The disclosure also provides rAAV particles comprising any of the first nucleic acid molecules and second nucleic acid molecules described herein.


In some embodiments, the rAAV vector comprises: (1) a heterologous nucleic acid region comprising the first or second nucleotide sequence encoding the N-terminal portion or C-terminal portion of a split Cas9 protein or a split nucleobase editor in any form as described herein, (2) one or more nucleotide sequences comprising a sequence that facilitates expression of the heterologous nucleic acid region (e.g., a promoter), and (3) one or more nucleic acid regions comprising a sequence that facilitates integration of the heterologous nucleic acid region (optionally with the one or more nucleic acid regions comprising a sequence that facilitates expression) into the genome of a cell. In some embodiments, viral sequences that facilitate integration comprise Inverted Terminal Repeat (ITR) sequences. In some embodiments, the first or second nucleotide sequence encoding the N-terminal portion or C-terminal portion of a split Cas9 protein or a split nucleobase editor is flanked on each side by an ITR sequence. In some embodiments, the nucleic acid vector further comprises a region encoding an AAV Rep protein as described herein, either contained within the region flanked by ITRs or outside the region. The ITR sequences can be derived from any AAV serotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) or can be derived from more than one serotype. In some embodiments, the ITR sequences are derived from AAV2, AAV8, AAV9, or AAV6.


Thus, in some embodiments, the rAAV particles disclosed herein comprise at least one rAAV2 particle, rAAV6 particle, rAAV8 particle, rPHP.B particle, rPHP.eB particle, or rAAV9 particle, or a variant thereof. In particular embodiments, the disclosed rAAV particles are rPHP.B particles, rPHP.eB particles, or rAAV9 particles.


ITR sequences and plasmids containing ITR sequences are known in the art and commercially available (see, e.g., products and services available from Vector Biolabs, Philadelphia, Pa.; Cellbiolabs, San Diego, Calif.; Agilent Technologies, Santa Clara, Calif.; and Addgene, Cambridge, Mass.; and Gene delivery to skeletal muscle results in sustained expression and systemic delivery of a therapeutic protein. Kessler P D, Podsakoff G M, Chen X, McQuiston S A, Colosi P C, Matelis L A, Kurtzman G J, Byrne B J. Proc Natl Acad Sci USA. 1996 Nov. 26; 93(24):14082-7; and Curtis A. Machida. Methods in Molecular Medicine™. Viral Vectors for Gene Therapy Methods and Protocols. 10.1385/1-59259-304-6:201 © Humana Press Inc. 2003. Chapter 10. Targeted Integration by Adeno-Associated Virus. Matthew D. Weitzman, Samuel M. Young Jr., Toni Cathomen and Richard Jude Samulski; U.S. Pat. Nos. 5,139,941 and 5,962,313, all of which are incorporated herein by reference). Exemplary ITR sequences are provided below.









AAV2:


(SEQ ID NO: 576)


TTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGAC


CAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGA


GCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCCT





AAV3:


(SEQ ID NO: 577)


TTGGCCACTCCCTCTATGCGCACTCGCTCGCTCGGTGGGGCCTGGCGAC


CAAAGGTCGCCAGACGGACGTGCTTTGCACGTCCGGCCCCACCGAGCGA


GCGAGTGCGCATAGAGGGAGTGGCCAACTCCATCACTAGAGGTATGGC





AAV5:


(SEQ ID NO: 578)


CTCTCCCCCCTGTCGCGTTCGCTCGCTCGCTGGCTCGTTTGGGGGGGTG


GCAGCTCAAAGAGCTGCCAGACGACGGCCCTCTGGCCGTCGCCCCCCCA


AACGAGCCAGCGAGCGAGCGAACGCGACAGGGGGGAGAGTGCCACACTC


TCAAGCAAGGGGGTTTTGTA





AAV6:


(SEQ ID NO: 389)


TTGCCCACTCCCTCTATGCGCGCTCGCTCGCTCGGTGGGGCCTGCGGAC


CAAAGGTCCGCAGACGGCAGAGCTCTGCTCTGCCGGCCCCACCGAGCGA


GCGAGCGCGCATAGAGGGAGTGGGCAACTCCATCACTAGGGGTA






In some embodiments, the rAAV vector of the present disclosure comprises one or more regulatory elements to control the expression of the heterologous nucleic acid region (e.g., promoters, transcriptional terminators, and/or other regulatory elements). In some embodiments, the first and/or second nucleotide sequence is operably linked to one or more (e.g., 1, 2, 3, 4, 5, or more) transcriptional terminators. Non-limiting examples of transcriptional terminators that may be used in accordance with the present disclosure include transcription terminators of the bovine growth hormone gene (bGH), human growth hormone gene (hGH), SV40, CW3, ϕ, or combinations thereof. The efficiencies of several transcriptional terminators have been tested to determine their respective effects on the expression level of the split Cas9 protein or the split nucleobase editor (e.g., see FIG. 4). In some embodiments, the transcriptional terminator used in the present disclosure is a bGH transcriptional terminator. In some embodiments, the rAAV vector further comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE). In certain embodiments, the WPRE is a truncated WPRE sequence, such as W3. In some embodiments, the WPRE is inserted 5′ of the transcriptional terminator.
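The arrangement of elements described for some embodiments (a payload flanked by ITRs, a promoter preceding the coding sequence, and a WPRE inserted 5′ of the transcriptional terminator) can be represented as an ordered list and checked programmatically. The following Python sketch is a hypothetical bookkeeping aid only; the class, the element labels, and the particular checks are assumptions, and other embodiments may order or omit elements differently.

```python
from dataclasses import dataclass

# Illustrative sketch only: an ordered 5'-to-3' list of vector elements and a
# few layout checks drawn from the embodiments described in the text.


@dataclass(frozen=True)
class Element:
    name: str
    kind: str  # one of: "ITR", "promoter", "cds", "WPRE", "terminator"


def validate_cassette(elements: list[Element]) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    kinds = [e.kind for e in elements]
    problems = []

    def first(kind):
        return kinds.index(kind) if kind in kinds else None

    if len(kinds) < 3 or kinds[0] != "ITR" or kinds[-1] != "ITR":
        problems.append("payload should be flanked on each side by an ITR")
    cds, promoter = first("cds"), first("promoter")
    terminator, wpre = first("terminator"), first("WPRE")
    if cds is None:
        problems.append("no split-editor coding sequence present")
        return problems
    if promoter is None or promoter > cds:
        problems.append("a promoter should precede the coding sequence")
    if terminator is not None and terminator < cds:
        problems.append("the terminator should follow the coding sequence")
    if wpre is not None and terminator is not None and not (cds < wpre < terminator):
        problems.append("WPRE should sit 5' of the transcriptional terminator")
    return problems


LAYOUT = [
    Element("5' ITR", "ITR"),
    Element("promoter", "promoter"),
    Element("N-terminal portion fused to intein-N", "cds"),
    Element("W3 (truncated WPRE)", "WPRE"),
    Element("bGH terminator", "terminator"),
    Element("3' ITR", "ITR"),
]
```

Swapping the WPRE and terminator entries in this list would cause the check to report that the WPRE is no longer 5′ of the terminator.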


In some embodiments, the composition comprising the rAAV particle (in any form contemplated herein) further comprises a pharmaceutically acceptable carrier. In some embodiments, the composition is formulated in appropriate pharmaceutical vehicles for administration to human or animal subjects.


Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. The terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein.


Methods of Treatment and Uses

Other aspects of the present disclosure provide methods of delivering the split Cas9 protein or the split nucleobase editor into a cell to form a complete and functional Cas9 protein or nucleobase editor. For example, in some embodiments, a cell is contacted with a composition described herein (e.g., compositions comprising nucleotide sequences encoding the split Cas9 or the split nucleobase editor or AAV particles containing nucleic acid vectors comprising such nucleotide sequences). In some embodiments, the contacting results in the delivery of such nucleotide sequences into a cell, wherein the N-terminal portion of the Cas9 protein or the nucleobase editor and the C-terminal portion of the Cas9 protein or the nucleobase editor are expressed in the cell and are joined to form a complete Cas9 protein or a complete nucleobase editor.


It should be appreciated that any rAAV particle, nucleic acid molecule or composition provided herein may be introduced into the cell in any suitable way, either stably or transiently. In some embodiments, the disclosed proteins may be transfected into the cell. In some embodiments, the cell may be transduced or transfected with a nucleic acid molecule. For example, a cell may be transduced (e.g., with a virus encoding a split protein), or transfected (e.g., with a plasmid encoding a split protein) with a nucleic acid molecule that encodes a split protein, or an rAAV particle containing a viral genome encoding one or more nucleic acid molecules. Such transduction may be a stable or transient transduction. In some embodiments, cells expressing a split protein or containing a split protein may be transduced or transfected with one or more guide RNA sequences, for example in delivery of a split Cas9 (e.g., nCas9) protein. In some embodiments, a plasmid expressing a split protein may be introduced into cells through electroporation, transient transfection (e.g., lipofection), stable genome integration (e.g., nucleofection or piggyBac), viral transduction, or other methods known to those of skill in the art.


In some aspects, the invention provides methods comprising delivering one or more base editor-encoding polynucleotides, one or more transcripts thereof, and/or one or more proteins expressed therefrom, to a cell using a non-viral delivery method. Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 1991/17424; WO 1991/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).


In certain embodiments, the compositions provided herein comprise a lipid and/or polymer. In certain embodiments, the lipid and/or polymer is cationic. The preparation of such lipid particles is well known. See, e.g. U.S. Pat. Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; 4,921,757; and 9,737,604, each of which is incorporated herein by reference.


In some embodiments, the target nucleotide sequence is a DNA sequence in a genome, e.g. a eukaryotic genome. In certain embodiments, the target nucleotide sequence is in a mammalian (e.g. a human) genome.


The target nucleotide sequence may comprise a target sequence (e.g., a point mutation) associated with a disease, disorder, or condition. The target sequence may comprise a T to C (or A to G) point mutation associated with a disease, disorder, or condition, wherein the deamination of the mutant C base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition. The target sequence may otherwise comprise a G to A (or C to T) point mutation associated with a disease, disorder, or condition, wherein the deamination of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition. The target sequence may encode a protein, wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to a wild-type codon. The target sequence may also be at a splice site, wherein the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript. In addition, the target may be at a non-coding sequence of a gene, such as a promoter, wherein the point mutation results in increased or decreased expression of the gene.


Thus, in some aspects, the deamination of a mutant C results in a change of the amino acid encoded by the mutant codon, which in some cases can result in the expression of a wild-type amino acid. In other aspects, the deamination of a mutant A results in a change of the amino acid encoded by the mutant codon, which in some cases can result in the expression of a wild-type amino acid.
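The net sequence outcomes described above can be sketched in a few lines of Python. The sketch models only the final result on the strand being read: a cytidine base editor (CBE) converts a C:G pair to T:A, and an adenine base editor (ABE) converts an A:T pair to G:C. When the deaminated base lies on the opposite strand, the read strand shows G to A (for a CBE) or T to C (for an ABE). The function name and the "CBE"/"ABE" labels as arguments are assumptions for illustration; editing windows, efficiencies, and byproducts are not modeled.

```python
# Illustrative sketch only: net base-pair conversions as seen on one strand.
# Keys are the base at the target position on the read strand.
OUTCOMES = {
    "CBE": {"C": "T", "G": "A"},  # C:G -> T:A (G->A when C is on the other strand)
    "ABE": {"A": "G", "T": "C"},  # A:T -> G:C (T->C when A is on the other strand)
}


def base_edit(strand: str, index: int, editor: str) -> str:
    """Return the strand after the given editor acts at the 0-based index."""
    if editor not in OUTCOMES:
        raise ValueError("editor must be 'CBE' or 'ABE'")
    base = strand[index].upper()
    if base not in OUTCOMES[editor]:
        raise ValueError(f"{editor} cannot act on a {base} at this position")
    return strand[:index] + OUTCOMES[editor][base] + strand[index + 1:]
```

For example, a T to C mutation that produced the triplet ACC from ATC would be reverted by a CBE acting at the middle position, and a mutation that produced TAT from TGT would be reverted by an ABE at the same position.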


The methods described herein involving contacting a cell with a composition or rAAV particle can occur in vitro, ex vivo, or in vivo. In certain embodiments, the step of contacting occurs in a subject. In certain embodiments, the subject has been diagnosed with a disease, disorder, or condition.


In some embodiments, the methods disclosed herein involve contacting a mammalian cell with a composition or rAAV particle. In particular embodiments, the methods involve contacting a retinal cell, cortical cell or cerebellar cell.


The split Cas9 protein or split nucleobase editor delivered using the methods described herein preferably has activity comparable to that of the original Cas9 protein or nucleobase editor (i.e., the unsplit protein delivered to a cell or expressed in a cell as a whole). For example, the split Cas9 protein or split nucleobase editor retains at least 50% (e.g., at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) of the activity of the original Cas9 protein or nucleobase editor. In some embodiments, the split Cas9 protein or split nucleobase editor is more active (e.g., 2-fold, 5-fold, 10-fold, 100-fold, 1000-fold, or more) than the original Cas9 protein or nucleobase editor.


The compositions described herein may be administered to a subject in need thereof in a therapeutically effective amount to treat and/or prevent a disease or disorder the subject is suffering from. Any disease or disorder that may be treated and/or prevented using CRISPR/Cas9-based genome-editing technology may be treated by the split Cas9 protein or the split nucleobase editor described herein. It is to be understood that, if the nucleotide sequences encoding the split Cas9 protein or the nucleobase editor do not further encode a gRNA, a separate nucleic acid vector encoding the gRNA may be administered together with the compositions described herein.


Exemplary suitable diseases, disorders or conditions include, without limitation, cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), chronic obstructive pulmonary disease (COPD), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy, hereditary lymphedema, familial Alzheimer's disease, prion disease, chronic infantile neurologic cutaneous articular syndrome (CINCA), congenital deafness, Niemann-Pick disease type C (NPC), and desmin-related myopathy (DRM). In particular embodiments, the disease or condition is Niemann-Pick disease type C (NPC).


In some embodiments, the disease, disorder or condition is associated with a point mutation in an NPC1 gene, a DNMT1 gene, a PCSK9 gene, or a TMC1 gene. In certain embodiments, the point mutation is a T3182C mutation in NPC1, which results in an I1061T amino acid substitution.


In certain embodiments, the point mutation is an A545G mutation in TMC1, which results in a Y182C amino acid substitution. TMC1 encodes a protein that forms mechanosensitive ion channels in sensory hair cells of the inner ear and is required for normal auditory function. The Y182C amino acid substitution is associated with congenital deafness.
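The relationship between the nucleotide positions and amino acid positions recited above can be checked with simple arithmetic: assuming positions are counted from the first nucleotide of the coding sequence, nucleotide 545 falls at the second position of codon 182, and nucleotide 3182 falls at the second position of codon 1061. The following Python sketch performs this calculation. The specific wild-type codons used in the example (TAT for tyrosine and ATT for isoleucine) are illustrative choices among the possible codons and are not taken from the disclosure; the function names and the partial codon table are likewise assumptions.

```python
# Illustrative sketch only. Positions are assumed to be 1-based and counted
# from the first nucleotide of the coding sequence.

# Partial codon table covering only the codons used in this example.
CODON_TABLE = {
    "TAT": "Tyr", "TAC": "Tyr", "TGT": "Cys", "TGC": "Cys",
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
    "ACT": "Thr", "ACC": "Thr", "ACA": "Thr",
}


def locate_in_codon(cds_position: int) -> tuple[int, int]:
    """Map a coding-sequence position to (codon number, position within codon)."""
    codon_number = (cds_position - 1) // 3 + 1
    offset = (cds_position - 1) % 3 + 1
    return codon_number, offset


def substitute(codon: str, offset: int, new_base: str) -> str:
    """Return the codon with the base at the 1-based offset replaced."""
    return codon[: offset - 1] + new_base + codon[offset:]


# A545G in TMC1: middle base of codon 182; e.g., TAT (Tyr) -> TGT (Cys).
TMC1 = locate_in_codon(545)
# T3182C in NPC1: middle base of codon 1061; e.g., ATT (Ile) -> ACT (Thr).
NPC1 = locate_in_codon(3182)
```

Because both mutations fall at the middle position of their codons, the A to G change converts any tyrosine codon to a cysteine codon, and the T to C change converts any isoleucine codon to a threonine codon, consistent with the Y182C and I1061T substitutions recited above.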


In some embodiments, the disease, disorder or condition is associated with a point mutation that generates a stop codon, for example, a premature stop codon within the coding region of a gene.


Additional exemplary diseases, disorders and conditions include cystic fibrosis (see, e.g., Schwank et al., Functional repair of CFTR by CRISPR/Cas9 in intestinal stem cell organoids of cystic fibrosis patients. Cell stem cell. 2013; 13: 653-658; and Wu et al., Correction of a genetic disease in mouse via use of CRISPR-Cas9. Cell stem cell. 2013; 13: 659-662, neither of which uses a deaminase fusion protein to correct the genetic defect); phenylketonuria, e.g., phenylalanine to serine mutation at position 835 (mouse) or 240 (human) or a homologous residue in phenylalanine hydroxylase gene (T>C mutation), see, e.g., McDonald et al., Genomics. 1997; 39:402-405; Bernard-Soulier syndrome (BSS), e.g., phenylalanine to serine mutation at position 55 or a homologous residue, or cysteine to arginine at residue 24 or a homologous residue in the platelet membrane glycoprotein IX (T>C mutation), see, e.g., Noris et al., British Journal of Haematology. 1997; 97: 312-320, and Ali et al., Hematol. 2014; 93: 381-384; epidermolytic hyperkeratosis (EHK), e.g., leucine to proline mutation at position 160 or 161 (if counting the initiator methionine) or a homologous residue in keratin 1 (T>C mutation), see, e.g., Chipev et al., Cell. 1992; 70: 821-828, see also accession number P04264 in the UNIPROT database at www[dot]uniprot[dot]org; chronic obstructive pulmonary disease (COPD), e.g., leucine to proline mutation at position 54 or 55 (if counting the initiator methionine) or a homologous residue in the processed form of α1-antitrypsin or residue 78 in the unprocessed form or a homologous residue (T>C mutation), see, e.g., Poller et al., Genomics. 1993; 17: 740-743, see also accession number P01011 in the UNIPROT database; Charcot-Marie-Tooth disease type 4J, e.g., isoleucine to threonine mutation at position 41 or a homologous residue in FIG4 (T>C mutation), see, e.g., Lenk et al., PLoS Genetics. 2011; 7: e1002104; neuroblastoma (NB), e.g., leucine to proline mutation at position 197 or a homologous residue in Caspase-9 (T>C mutation), see, e.g., Kundu et al., 3 Biotech. 2013, 3:225-234; von Willebrand disease (vWD), e.g., cysteine to arginine mutation at position 509 or a homologous residue in the processed form of von Willebrand factor, or at position 1272 or a homologous residue in the unprocessed form of von Willebrand factor (T>C mutation), see, e.g., Lavergne et al., Br. J. Haematol. 1992; 82: 66-72, see also accession number P04275 in the UNIPROT database; myotonia congenita, e.g., cysteine to arginine mutation at position 277 or a homologous residue in the muscle chloride channel gene CLCN1 (T>C mutation), see, e.g., Weinberger et al., The J. of Physiology. 2012; 590: 3449-3464; hereditary renal amyloidosis, e.g., stop codon to arginine mutation at position 78 or a homologous residue in the processed form of apolipoprotein AII or at position 101 or a homologous residue in the unprocessed form (T>C mutation), see, e.g., Yazaki et al., Kidney Int. 2003; 64: 11-16; dilated cardiomyopathy (DCM), e.g., tryptophan to arginine mutation at position 148 or a homologous residue in the FOXD4 gene (T>C mutation), see, e.g., Minoretti et al., Int. J. of Mol. Med. 2007; 19: 369-372; hereditary lymphedema, e.g., histidine to arginine mutation at position 1035 or a homologous residue in VEGFR3 tyrosine kinase (A>G mutation), see, e.g., Irrthum et al., Am. J. Hum. Genet. 2000; 67: 295-301; familial Alzheimer's disease, e.g., isoleucine to valine mutation at position 143 or a homologous residue in presenilin1 (A>G mutation), see, e.g., Gallo et al., J. Alzheimer's disease. 2011; 25: 425-431; prion disease, e.g., methionine to valine mutation at position 129 or a homologous residue in prion protein (A>G mutation), see, e.g., Lewis et al., J. of General Virology. 2006; 87: 2443-2449; chronic infantile neurologic cutaneous articular syndrome (CINCA), e.g., tyrosine to cysteine mutation at position 570 or a homologous residue in cryopyrin (A>G mutation), see, e.g., Fujisawa et al., Blood. 2007; 109: 2903-2911; and desmin-related myopathy (DRM), e.g., arginine to glycine mutation at position 120 or a homologous residue in αB crystallin (A>G mutation), see, e.g., Kumar et al., J. Biol. Chem. 1999; 274: 24137-24141. The entire contents of all references and database entries are incorporated herein by reference.


Suitable routes of administering the composition for pain suppression include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, parenteral, and intracerebroventricular administration.


The compositions of this disclosure may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., a carrier or vehicle.


Treatment of a disease or disorder includes delaying the development or progression of the disease, or reducing disease severity. Treating the disease does not necessarily require curative results.


As used herein, “delaying” the development of a disease means to defer, hinder, slow, retard, stabilize, and/or postpone progression of the disease. This delay can be of varying lengths of time, depending on the history of the disease and/or individuals being treated. A method that “delays” or alleviates the development of a disease, or delays the onset of the disease, is a method that reduces the probability of developing one or more symptoms of the disease in a given time frame and/or reduces the extent of the symptoms in a given time frame, when compared to not using the method. Such comparisons are typically based on clinical studies, using a number of subjects sufficient to give a statistically significant result.


“Development” or “progression” of a disease means initial manifestations and/or ensuing progression of the disease. Development of the disease can be detectable and assessed using standard clinical techniques as well known in the art. However, development also refers to progression that may be undetectable. For purpose of this disclosure, development or progression refers to the biological course of the symptoms. “Development” includes occurrence, recurrence, and onset.


As used herein “onset” or “occurrence” of a disease includes initial onset and/or recurrence. Conventional methods, known to those of ordinary skill in the art of medicine, can be used to administer the isolated polypeptide or pharmaceutical composition to the subject, depending upon the type of disease to be treated or the site of the disease.


In some aspects, the present disclosure provides uses of any one of the split nucleobase editors described herein and a guide RNA targeting this nucleobase editor to a target in the manufacture of a medicament. In some aspects, uses of any one of the nucleobase editors and guide RNAs described herein are provided in the manufacture of a kit for base editing, wherein the base editing comprises contacting the nucleic acid molecule with the split nucleobase editor and guide RNA under conditions suitable for the substitution of the adenine (A) of an A:T nucleobase pair in the target with a guanine (G), or for the substitution of the cytosine (C) of a C:G nucleobase pair in the target with a thymine (T). In some embodiments, the step of contacting induces separation of the double-stranded DNA at a target region. In some embodiments, the step of contacting further comprises nicking one strand of the double-stranded DNA, wherein the one strand comprises an unmutated strand.


In some embodiments of the described uses, the step of contacting is performed in vitro. In other embodiments, the step of contacting is performed in vivo. In some embodiments, the step of contacting is performed in a subject (e.g., a human subject or a non-human animal subject). In some embodiments, the step of contacting is performed in a cell, such as a human or non-human animal cell.


The present disclosure also provides uses of any one of the nucleobase editors or any one of the complexes of nucleobase editors and guide RNAs described herein as a medicament. The present disclosure also provides uses of the described pharmaceutical compositions or cells comprising, and vectors or rAAV particles encoding, any of the disclosed nucleobase editors or complexes herein as a medicament. In particular embodiments, the medicament is for treatment of Niemann-Pick disease type C (NPC), congenital deafness, or hearing loss.


Kits

The compositions of the present disclosure may be assembled into kits. In some embodiments, the kit comprises nucleic acid vectors for the expression of the nucleobase editors described herein. In some embodiments, the kit further comprises appropriate guide nucleotide sequences (e.g., gRNAs) or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein or nucleobase editor to the desired target sequence.


The kits described herein may include one or more containers housing components for performing the methods described herein and, optionally, instructions for use. Any of the kits described herein may further comprise components needed for performing the assay methods. Each component of the kits, where applicable, may be provided in liquid form (e.g., in solution) or in solid form (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit.


In some embodiments, the kits may optionally include instructions and/or promotion for use of the components provided. As used herein, “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which can also reflect approval by the agency of manufacture, use or sale for animal administration. As used herein, “promoted” includes all methods of doing business including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity including written, oral and electronic communication of any form, associated with the disclosure. Additionally, the kits may include other components depending on the specific application, as described herein.


The kits may contain any one or more of the components described herein in one or more containers. The components may be prepared sterilely, packaged in a syringe, and shipped refrigerated. Alternatively, the components may be housed in a vial or other container for storage. A second container may have other components prepared sterilely. Alternatively, the kits may include the active agents premixed and shipped in a vial, tube, or other container.


The kits may have a variety of forms, such as a blister pouch, a shrink wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box or a bag. The kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art. The kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc.


Host Cells

Cells that may contain any of the compositions described herein include prokaryotic cells and eukaryotic cells. The methods described herein are used to deliver a Cas9 protein or a nucleobase editor into a eukaryotic cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cell is in vitro (e.g., a cultured cell). In some embodiments, the cell is in vivo (e.g., in a subject such as a human subject). In some embodiments, the cell is ex vivo (e.g., isolated from a subject and may be administered back to the same or a different subject).


Mammalian cells of the present disclosure include human cells, primate cells (e.g., vero cells), rat cells (e.g., GH3 cells, OC23 cells) or mouse cells (e.g., MC3T3 cells). There are a variety of human cell lines, including, without limitation, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells. In some embodiments, rAAV vectors are delivered into human embryonic kidney (HEK) cells (e.g., HEK 293 or HEK 293T cells). In some embodiments, rAAV vectors are delivered into stem cells (e.g., human stem cells) such as, for example, pluripotent stem cells (e.g., human pluripotent stem cells including human induced pluripotent stem cells (hiPSCs)). A stem cell refers to a cell with the ability to divide for indefinite periods in culture and to give rise to specialized cells. A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but not alone capable of sustaining full organismal development. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663-76, 2006, incorporated by reference herein). Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).


Additional non-limiting examples of cell lines that may be used in accordance with the present disclosure include 293-T, 3T3, 4T1, 721, 9L, A-549, A172, A20, A253, A2780, A2780ADR, A2780cis, A431, ALC, B16, B35, BCP-1, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C2C12, C3H-10T1/2, C6, C6/36, Cal-27, CGR8, CHO, CML T1, CMT, COR-L23, COR-L23/5010, COR-L23/CPR, COR-L23/R23, COS-7, COV-434, CT26, D17, DH82, DU145, DuCaP, E14Tg2a, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, Hepa1c1c7, High Five cells, HL-60, HMEC, HT-29, HUVEC, J558L cells, Jurkat, JY cells, K562 cells, KCL22, KG1, Ku812, KYO1, LNCaP, Ma-Mel 1, 2, 3 . . . 48, MC-38, MCF-10A, MCF-7, MDA-MB-231, MDA-MB-435, MDA-MB-468, MDCK II, MG63, MONO-MAC 6, MOR/0.2R, MRC5, MTD-1A, MyEnd, NALM-1, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NW-145, OPCN/OPCT, Peer, PNT-1A/PNT 2, PTK2, Raji, RBL cells, RenCa, RIN-5F, RMA/RMAS, S2, Saos-2 cells, Sf21, Sf9, SiHa, SKBR3, SKOV-3, T-47D, T2, T84, THP1, U373, U87, U937, VCaP, WM39, WT-49, X63, YAC-1 and YAR cells.


Without further elaboration, it is believed that one skilled in the art can, based on the above description, utilize the present disclosure to its fullest extent. The following specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever. All publications cited herein are incorporated by reference for the purposes or subject matter referenced herein.


EXAMPLES

In order that the invention described herein may be more fully understood, the following examples are set forth. The synthetic examples described in this application are offered to illustrate the compounds and methods provided herein and are not to be construed in any way as limiting their scope.


Example 1: AAV Delivery of Split Nucleobase Editor

This study was designed to show that a nucleobase editor may be delivered by recombinant AAV (rAAV) in two sections, which may be joined to form a complete and active nucleobase editor in cells via protein splicing. Different elements of the rAAV constructs were tested for optimized nucleobase editor expression and activity.


Recombinant AAV (rAAV) is widely used for transgene delivery. Transgenes are inserted into the AAV genome between the inverted terminal repeat (ITR) sequences and packaged into AAV viral particles, which are used to transduce a host cell (e.g., mammalian cell, human cell). However, there is a limitation on the size of the transgene that may be packaged into rAAV, typically approximately 4.9 kilobases. Nucleic acids encoding a nucleobase editor (e.g., cytosine deaminase-dCas9-UGI) typically exceed the packaging limit of rAAV. As described herein, the nucleic acids encoding a nucleobase editor were split (see FIG. 1A), and each section was packaged into a separate rAAV particle. The two sections of the nucleobase editor were delivered to the cells and ligated to form a complete nucleobase editor via protein splicing (e.g., mediated by an intein, such as the DnaE intein; see FIG. 1C). The ligated, complete nucleobase editor was active in editing target bases (see FIG. 1B). The rAAV constructs encoding the split nucleobase editors were tested in different cell lines, e.g., U118 and HEK293T, and were active in editing the target base (see FIGS. 3A-3B and FIGS. 5A-5B).
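
By way of illustration only, the size constraint described above may be expressed as a simple check. The following Python sketch is not part of the tested constructs. The 4,900 bp limit is the approximate figure given above and the 5,200 bp coding-sequence length is the figure given in the Summary for S. pyogenes Cas9-based editors; the function names, the split fraction, and the per-vector overhead (promoter, intein, terminator, and guide RNA cassette) are hypothetical placeholders chosen for the example.

```python
# Illustrative sketch only. The ~4.9 kb limit comes from the text above;
# the split fraction and overhead are hypothetical placeholders, not
# measured construct sizes.
AAV_PACKAGING_LIMIT_BP = 4900


def fits_in_raav(transgene_bp, limit_bp=AAV_PACKAGING_LIMIT_BP):
    """Return True if a transgene of the given length is within the limit."""
    return transgene_bp <= limit_bp


def split_payload_sizes(coding_bp, split_fraction, overhead_bp):
    """Estimate the payload carried by each vector after splitting.

    coding_bp      -- length of the intact editor coding sequence (bp)
    split_fraction -- fraction of the coding sequence in the N-terminal half
    overhead_bp    -- assumed extra sequence carried by EACH vector (bp)
    """
    n_coding = round(coding_bp * split_fraction)
    c_coding = coding_bp - n_coding
    return n_coding + overhead_bp, c_coding + overhead_bp


if __name__ == "__main__":
    intact_bp = 5200  # coding sequence alone, per the Summary
    print("intact editor fits:", fits_in_raav(intact_bp))
    n_bp, c_bp = split_payload_sizes(intact_bp, split_fraction=0.5, overhead_bp=1500)
    print("N-terminal half:", n_bp, "bp, fits:", fits_in_raav(n_bp))
    print("C-terminal half:", c_bp, "bp, fits:", fits_in_raav(c_bp))
```

Under these assumed numbers the intact coding sequence alone exceeds the limit, whereas each half leaves room for regulatory elements; actual construct sizes depend on the elements selected.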


Different transcriptional terminators and nuclear localization signals (NLS) were tested in the rAAV constructs to optimize the expression and activity of the nucleobase editors (see FIGS. 4, 6, and 7).


Example 2: Editing of DNMT1 Gene in Mouse Neuron Using AAV Encoded Split Nucleobase Editor

This study was designed to test the base editing activity of an AAV-encoded split nucleobase editor in vivo. A split nucleobase editor as shown in FIG. 1A was used. The amino acid sequence of the linker between the dCas9 domain and the deaminase domain is SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 384). A guide RNA targeting a well-characterized site in the DNMT1 gene was selected. It was expected that the cells would be able to tolerate the editing. These experiments aimed to determine whether an AAV-encoded split nucleobase editor can edit the locus in vitro or in vivo in several cell types, including primary neurons.


In one experiment, AAV vectors encoding the split nucleobase editor and a guide RNA targeting DNMT1 were used to transduce dissociated mouse cortical neurons, two days after the cortical neurons were isolated and cultured. The neurons were harvested 16 days post transduction and the DNMT1 gene was sequenced (FIG. 8A) to determine editing efficiency as well as off-target effects. An editing efficiency of 17.34% (C to T editing, darker grey in FIG. 8B) was detected, while only 0.82% of undesired editing (C to G or C to A change, lighter grey in FIG. 8B) was detected.


In another experiment, cultured mouse Neuro-2 cells were either transduced with AAV vectors encoding the split nucleobase editor and a guide RNA targeting DNMT1, or transfected with lipid-encapsulated DNA encoding the nucleobase editor and guide RNA, allowing direct comparison of editing efficiency using different delivery methods of the nucleobase editor (FIG. 9A). An editing efficiency of 5.96% (C to T editing, dark grey in FIG. 9B) was observed for AAV encoded split nucleobase editor, while an editing efficiency of 27.3% (C to T editing, dark grey in FIG. 9B) was observed for lipid-transfected DNA encoded nucleobase editor. The amount of undesired products was 0.15% for AAV encoded split nucleobase editor and 1.3% for lipid-transfected DNA encoded nucleobase editor (C to G or C to A change, lighter grey in FIG. 9B).
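
For illustration only, the editing outcomes reported in this Example may be compared by computing the fraction of edited alleles that carry the desired C-to-T change. The following Python sketch uses only the percentages reported above; the term “product purity,” the function name, and the dictionary labels are introduced here for the example and do not appear in the experiments described.

```python
# Illustrative sketch: fraction of edited alleles that carry the desired edit.
# "Product purity" is a label introduced for this example.


def product_purity(desired_pct, undesired_pct):
    """Desired edits as a fraction of all edited alleles."""
    total = desired_pct + undesired_pct
    if total <= 0:
        raise ValueError("no edited alleles reported")
    return desired_pct / total


# (desired C-to-T %, undesired C-to-G or C-to-A %) as reported in Example 2
REPORTED = {
    "cortical neurons, AAV": (17.34, 0.82),
    "Neuro-2 cells, AAV": (5.96, 0.15),
    "Neuro-2 cells, lipid-transfected DNA": (27.3, 1.3),
}

if __name__ == "__main__":
    for label, (desired, undesired) in REPORTED.items():
        purity = product_purity(desired, undesired)
        print(f"{label}: {purity:.1%} of edited alleles are C-to-T")
```

In each reported condition, the desired C-to-T product accounts for more than 95% of edited alleles by this measure.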


Example 3: AAV-Mediated Central Nervous System, Liver, Heart, and Muscle Delivery of Cytosine and Adenine Nucleobase Editors
Results
Development of a Split-Intein Approach to CBE and ABE Reconstitution

It was reasoned that the use of a trans-splicing intein would enable CBE and ABE to be divided into halves that are each smaller than the AAV packaging size limit, enabling dual AAV packaging of nucleobase editors (FIG. 10A). To generate a split-intein CBE, each split DnaE intein half from Nostoc punctiforme (Npu)18 was fused to each half of the original CBE BE3, dividing BE3 within the S. pyogenes Cas9 domain15,19 immediately before Cys 574 or Thr 638. It was observed that dividing BE3 just before Cys 574 with the split Npu intein (referred to hereafter as the Npu-BE3 construct) resulted in robust on-target base editing (34±6.4% average editing by high-throughput sequencing among unsorted cells targeting six genomic loci, FIG. 10B) in HEK293T cells following co-transfection of plasmids expressing each split half, plus a third plasmid expressing sgRNA. Notably, target C.G-to-T.A editing efficiency was higher, rather than lower, than editing levels following transfection of a plasmid expressing an intact BE3, which resulted in an average of 22±7.9% editing across the six sites (FIGS. 10B and 10C), indicating that intein splicing at Cys 574 does not limit editing efficiency in this system. It is believed that higher expression levels of each split-intein nucleobase editor half, relative to that of the much larger intact nucleobase editor proteins, may account for increased editing from split-intein nucleobase editors. Interestingly, the second tested BE3 split site, ahead of Thr 638, did not support robust base editing (averaging 10±10% editing across six sites) even though both split sites support Cas9 nuclease activity15, suggesting that nucleobase editors impose additional requirements for productive intein splicing or productive editing compared to Cas9 nuclease.
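
The split-and-splice logic described above may be sketched, for illustration only, as a toy string model in Python. The sequences below are short placeholders rather than actual Cas9 or Npu DnaE sequences, and the function names are hypothetical. The check that the C-terminal half begins with Cys, Ser, or Thr reflects the general requirement of inteins for a nucleophilic residue at that position and is a simplification; the model says nothing about splicing efficiency, which differed between the two split sites tested.

```python
# Toy model of dividing a protein "immediately before" a residue and
# rejoining the halves by trans-splicing. All sequences are placeholders.

NUCLEOPHILES = {"C", "S", "T"}  # simplification: allowed first C-extein residue


def split_before(protein, residue_number):
    """Split so that the C-terminal half starts at the 1-based residue_number."""
    if not 1 < residue_number <= len(protein):
        raise ValueError("split site must fall inside the protein")
    idx = residue_number - 1
    return protein[:idx], protein[idx:]


def fuse_to_inteins(n_half, c_half, intein_n, intein_c):
    """Build the two separately expressed fusions: N-half+IntN and IntC+C-half."""
    return n_half + intein_n, intein_c + c_half


def trans_splice(n_fusion, c_fusion, intein_n, intein_c):
    """Excise both intein parts and ligate the flanking halves."""
    if not n_fusion.endswith(intein_n) or not c_fusion.startswith(intein_c):
        raise ValueError("fusions do not carry the expected intein parts")
    n_half = n_fusion[: len(n_fusion) - len(intein_n)]
    c_half = c_fusion[len(intein_c):]
    if not c_half or c_half[0] not in NUCLEOPHILES:
        raise ValueError("C-terminal half must start with Cys, Ser, or Thr")
    return n_half + c_half


if __name__ == "__main__":
    toy_protein = "MDKKYSCGLT"          # placeholder; Cys is residue 7
    int_n, int_c = "nnnn", "cccc"       # placeholder intein halves
    n_half, c_half = split_before(toy_protein, 7)
    n_fusion, c_fusion = fuse_to_inteins(n_half, c_half, int_n, int_c)
    spliced = trans_splice(n_fusion, c_fusion, int_n, int_c)
    print(n_fusion, c_fusion, spliced, spliced == toy_protein)
```

In this model the spliced product is identical to the starting sequence, mirroring the scarless reconstitution of the full-length editor that the split-intein approach is intended to achieve.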


After identifying a BE3 split site that does not impair base editing efficiencies following intein splicing, split-intein CBE performance was optimized. The performance of the Npu split intein was compared with that of Cfa, a synthetic split intein developed from the consensus sequences of fast-splicing DnaE homologs from a variety of organisms20. Npu-BE3 outperformed Cfa-BE3, which resulted in 25±10% average base editing (FIGS. 10B and 10C). To incorporate recent architectural improvements in the newer BE4 nucleobase editor5, as well as improved expression and nuclear localization of BE4max6, Npu-BE4 constructs were generated and two codon usages were tested. Consistent with the recent report6, it was observed that codon and nuclear localization signal (NLS) optimization of Npu-BE4max resulted in higher base editing efficiencies than Npu-BE4 using IDT codon optimization (44±4.2% editing vs. 26±3.0% editing, FIG. 10D). It was also found that the second UGI domain did not increase the editing efficiency of Npu-BE4max; a single UGI in the BEmax architecture yields 48±3.0% editing (FIGS. 10D and 10E). In light of these results, the second UGI was omitted from future AAV constructs to minimize viral genome size, resulting in a spliced NLS- and codon-optimized APOBEC-Cas9 nickase-UGI construct that is referred to hereafter as CBE3.9max.


Using the Cys 574 Cas9 split site and the Npu split intein, a split optimized adenine nucleobase editor (Npu-ABEmax) construct was also generated that reconstitutes ABEmax6 activity to edit a test site in the mouse DNMT1 gene (63±5.4% A.T-to-G.C editing from Npu-ABEmax, compared to 63±6.3% editing from non-split ABEmax, FIG. 10F). Finally, seven split sites were screened in S. aureus Cas9-BE3 (SaBE3)21, and a site was identified immediately before Cys 535 that fully recapitulated unsplit SaBE3 activity in HEK293T cells (FIGS. 16A-16F). A recent report demonstrated that another intein split site, preceding Ser 740, reconstitutes full-length SaCas9 nuclease activity and supports split Sa-BE3 activity in vivo22. Together, these results establish optimized split-intein CBE and ABE halves that, upon protein splicing, reconstitute cytosine and adenine nucleobase editors with no apparent loss in editing efficiency.


Development of Split-Intein CBE and ABE AAV

After developing a viable way to divide both classes of nucleobase editors into split intein-fused halves, a series of AAV particles was generated and characterized to optimize base editing efficiency and minimize AAV genome size to support efficient AAV production23. Several post-transcriptional regulatory element sequences (PREs) and sgRNA positions were tested in the context of AAV, rather than plasmid delivery, to maximize the in vivo relevance of the optimization process.


To avoid effects specific to cultured cells, PHP.B24, an evolved AAV variant that efficiently crosses the blood-brain barrier in mice, was used to test PRE variants in the mouse CNS. 1×10^11 vg of PHP.B-CMV-eGFP-NLS was delivered into 8-week-old mice by retro-orbital injection, and brain tissue was harvested for imaging after a 3-week incubation. W3, a truncated Woodchuck hepatitis virus PRE (WPRE) sequence25, increased PHP.B-delivered GFP-NLS expression levels in the brain ~19-fold compared to no regulatory sequence (FIGS. 11A-11E). This increase in payload gene expression was comparable to the increase from using the full-length WPRE sequence (20-fold; FIGS. 11A-11C), but W3 is 350 bp smaller than full-length WPRE.


Although the tendency of the CMV promoter to be silenced over time in vivo may be beneficial for some genome editing applications by minimizing off-target editing opportunities19,26,27, silencing was avoided to maximize editing efficiency in this initial study. The Cbh promoter is a ubiquitous, constitutive promoter that is less sensitive to silencing in vivo than the CMV promoter28. Exemplary nucleobase editor AAV constructs therefore contained the W3 sequence, Npu intein, and Cbh promoter, which is referred to hereafter as v3 AAV. To optimize split-base editor AAV configurations, murine 3T3 cells were transduced with dual v3 AAV-PHP.B encoding split-CBE3.9 and a validated sgRNA targeting the mouse DNMT1 locus29. DNMT1 acts redundantly with DNMT3a in the mammalian brain30 and is therefore well-suited for proof-of-concept studies. A dose of 2×10^11 viral genomes (vg) of v3 AAV per well of 50,000 NIH 3T3 cells, using a 1:1 ratio of the two AAVs, resulted in 14±4.8% C.G-to-T.A editing at the DNMT1 locus. NLS- and codon-optimized CBE3.9max constructs, termed v4 AAV-CBE3.9max, improved C.G-to-T.A editing efficiency to 37±18%, a 2.6-fold increase relative to unoptimized v3 AAV CBE3.9 (FIGS. 11D and 11E).


After optimizing PRE, promoter, NLS, and codon usage, the impact of different guide RNA placements and orientations within the AAV genome was tested. Guide RNA transcription efficiency is known to be sensitive to proximity and orientation relative to AAV ITRs31. Moving the U6-sgRNA cassette to the 3′ end of the viral genome and reversing its orientation31, yielding v5 AAV, improved C.G-to-T.A editing efficiency a further 1.5-fold relative to v4 AAV, for a total 3.9-fold improvement compared to the initial v3 AAV constructs (56±12% for v5 AAV-CBE3.9max versus 14±4.8% for v3 AAV-CBE3.9). These transduction experiments were repeated at a lower virus dose, 2×10^10 vg per well, and 14-fold higher C.G-to-T.A editing efficiency was observed for v5 AAV compared to v3 AAV, and 5.6-fold higher editing for v5 AAV compared to v4 AAV (1.7±0.73% for v3 AAV-CBE3.9, 4.1±2.2% for v4 AAV-CBE3.9max, and 23±5.2% for v5 AAV-CBE3.9max) (FIGS. 11D and 11E). Based on these results, the optimized v5 AAV architecture was used for all subsequent experiments.
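
For illustration only, the fold improvements and per-cell doses discussed above can be recomputed from the reported mean editing efficiencies. The following Python sketch uses only values stated in this Example (mean editing percentages, doses of 2×10^11 and 2×10^10 vg per well, and 50,000 cells per well); the function and variable names are hypothetical. Because the inputs are rounded means, recomputed ratios may differ slightly from the reported fold changes.

```python
# Illustrative sketch: fold changes between AAV architectures and the
# viral-genome dose per cell, computed from rounded values in the text.


def fold_change(improved_pct, baseline_pct):
    """Ratio of two mean editing efficiencies."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return improved_pct / baseline_pct


def vg_per_cell(vg_per_well, cells_per_well):
    """Viral genomes applied per cell in a well."""
    return vg_per_well / cells_per_well


# Mean C.G-to-T.A editing (%) at the DNMT1 locus, keyed by dose in vg per well
EDITING = {
    2e11: {"v3": 14.0, "v4": 37.0, "v5": 56.0},
    2e10: {"v3": 1.7, "v4": 4.1, "v5": 23.0},
}
CELLS_PER_WELL = 50_000

if __name__ == "__main__":
    for dose, means in EDITING.items():
        print(f"dose {dose:.0e} vg/well = {vg_per_cell(dose, CELLS_PER_WELL):.0e} vg/cell")
        print(f"  v4 vs v3: {fold_change(means['v4'], means['v3']):.1f}-fold")
        print(f"  v5 vs v4: {fold_change(means['v5'], means['v4']):.1f}-fold")
        print(f"  v5 vs v3: {fold_change(means['v5'], means['v3']):.1f}-fold")
```

The relative benefit of the v5 architecture is larger at the lower dose, consistent with the editor being more limiting when fewer viral genomes are delivered per cell.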


Next, the performance of the optimized AAV split-intein nucleobase editor constructs was characterized in vivo. AAV9 is reported to transduce tissues including liver, skeletal muscle, heart, and CNS32-34. Dual AAV9 particles were generated in the v5 AAV architecture encoding the optimized split CBE3.9max (FIG. 11D) or ABEmax nucleobase editors (FIG. 17), together with a guide RNA programmed to install a point mutation in DNMT1, resulting in A8T for CBE3.9max, and a silent mutation for ABEmax. Systemic (retro-orbital) injections of v5 AAV9-CBEmax or v5 AAV9-ABEmax were performed in 6- to 9-week-old C57BL/6 mice. Four weeks after injection of 2×10^12 vg total per mouse, DNMT1 editing was measured in the heart, skeletal muscle, brain, liver, lung, kidney, spleen, and reproductive organs. Following a single dual-AAV injection, both split-intein ABE and CBE v5 AAVs resulted in substantial whole-organ base editing of heart (CBE: 15±3.8% C.G-to-T.A editing efficiency in unsorted cells; ABE: 20±1.4% A.T-to-G.C editing efficiency in unsorted cells), skeletal muscle (CBE: 4.4±2.4%; ABE: 9.2±4.0%), and liver (CBE: 21±17%; ABE: 38±2.9%) (FIGS. 12A and 12B), three organs that are reported to be transduced by AAV9. Consistent with the previously reported intravenous transduction profile of AAV935, there was little editing in lung, kidney, spleen, and reproductive organs, and no detectable editing in harvested sperm (FIGS. 18A-18C). Together, these results establish that AAV9 delivery of split-intein CBE and ABE enables efficient in vivo base editing in tissues known to be transduced by AAV9.


A recent study by Ryu, Kim and coworkers reported AAV-mediated delivery of ABE split by trans-mRNA splicing8. To directly compare the efficiency of AAV-delivered nucleobase editors reconstituted through split intein-mediated protein splicing versus trans-mRNA splicing, the rAAV constructs reported in Ryu et al.8 were modified by replacing the muscle-specific Spc5-12 promoter with the Cbh promoter for ubiquitous expression, and by replacing the DMD-targeting sgRNA with the DNMT1-targeting sgRNA. In side-by-side comparisons measuring base editing in three tissues, split intein-spliced v5 AAV ABE on average provided 4.5-fold higher base editing efficiencies than trans-mRNA-spliced ABE (FIG. 12D). These results suggest that intein-mediated nucleobase editor protein splicing is more efficient than nucleobase editor mRNA trans-splicing. This efficiency difference may arise from the requirements of AAV genome concatamerization36 followed by transcription and splicing of the ITR sequences, which have been reported to destabilize pre-mRNA37, for successful trans-mRNA splicing.


Notably, base editing efficiencies in heart and skeletal muscle from split-intein AAV9 constructs (FIGS. 12A-12D) are comparable to or higher than gene rescue efficiencies reported to improve phenotypes in DMD animal models38,39, and editing in the liver is above the correction thresholds required for phenotypic improvement in several inborn errors of metabolism40-42. These findings suggest that the split-AAV nucleobase editor systems reported here may be suitable for developing treatments to correct animal models of human genetic diseases. It is further noted that these constructs have been optimized for general editing efficiency, and not for application-specific improvements including tissue- or cell type-specific promoters, which could further improve specificity and activity in therapeutically relevant cells. Tissues that are not well-transduced by intravenous AAV9 injections may be transduced by other existing AAV variants, such as AAV4 transduction of the lung43, or by different delivery routes, such as AAV9 transduction of kidney cells by retrograde ureteral infusion44.


Recently, Villiger et al. developed an intein-split S. aureus CBE (see Villiger, L. et al. Nature Medicine 24, 1519-1525 (2018), incorporated herein by reference). To compare those constructs to the v5 constructs described herein, a v5 S. aureus CBE using intein-split SaBE3.9max was generated, which has the same NLS and codon optimizations as the S. pyogenes Npu-BE3.9max construct, and was cloned into the v5 AAV architecture. Dual AAV genomes were then packaged in AAV8 with an sgRNA designed to generate the PCSK9 W8X mutation31. Three-week-old mice were injected retro-orbitally with either 1×10^11 or 1×10^12 total vg per animal, and liver tissue was harvested for high-throughput sequencing 4 weeks after injection. The Villiger constructs were modified only by replacement of the liver-specific P3 promoter with Cbh, and of the Pah-targeting guide with the PCSK9 W8X guide. At the higher dose, the constructs performed comparably (v5 AAV saCBE: 20±0.9% W8X-encoding alleles; Villiger saCBE: 18±1.6% W8X-encoding alleles). At the lower dose, however, no reduction in editing was observed for the v5 AAV saCBE constructs (25±6.0% W8X alleles), whereas a substantial reduction in editing efficiency was observed for the Villiger constructs (8.2±3.2% W8X alleles) (FIG. 18C). It was concluded that the higher 1×10^12 vg dose reaches an editing ceiling due to processes extrinsic to the nucleobase editor, such as host DNA repair processes or cell state-specific factors. At the lower dose of the Villiger constructs, the nucleobase editor itself is limiting. These results demonstrate that the v5 AAV saCBE constructs can outperform the corresponding constructs developed by Villiger.
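
For illustration only, the dose dependence described above may be summarized as the fraction of high-dose editing that is retained after a ten-fold reduction in dose. The following Python sketch uses the mean W8X allele percentages reported above; the metric, the function name, and the dictionary labels are introduced for the example and are not part of the reported analysis.

```python
# Illustrative sketch: editing retained after a ten-fold dose reduction.
# Values are the reported mean percentages of W8X-encoding alleles.


def retained_fraction(low_dose_pct, high_dose_pct):
    """Low-dose editing expressed as a fraction of high-dose editing."""
    if high_dose_pct <= 0:
        raise ValueError("high-dose editing must be positive")
    return low_dose_pct / high_dose_pct


# construct -> (editing at 1e11 vg, editing at 1e12 vg)
W8X_EDITING = {
    "v5 AAV saCBE": (25.0, 20.0),
    "Villiger saCBE": (8.2, 18.0),
}

if __name__ == "__main__":
    for construct, (low, high) in W8X_EDITING.items():
        frac = retained_fraction(low, high)
        print(f"{construct}: {frac:.2f} of high-dose editing retained at the lower dose")
```

A retained fraction near or above 1 is consistent with editing at the higher dose being limited by factors other than editor abundance, whereas a fraction well below 1 is consistent with the editor being limiting at the lower dose.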


Base Editing in CNS by Split-Intein CBE and ABE AAV

The above results establish an in vivo CBE and ABE delivery solution for somatic tissues transduced following systemic AAV injection. Delivery to the central nervous system (CNS), however, is especially challenging. Although AAV9 has been reported45 to cross the blood-brain barrier and transduce CNS cells, minimal editing was observed in the brain following adult retro-orbital injection (FIGS. 12A-12D). To enable in vivo base editing of cells in the CNS, three complementary approaches were explored. First, neonatal intracerebroventricular (P0 ICV) injections were performed. Similar to intrathecal injections currently used to deliver nusinersen to treat spinal muscular atrophy (SMA) patients46, ICV injections are direct injections into cerebrospinal fluid. Second, retro-orbital injections were performed in six-week-old mice using split-intein nucleobase editor AAV based on PHP.eB, a laboratory-evolved AAV9 variant with improved ability to penetrate the blood-brain barrier in C57BL/6 mice47-49. Finally, subretinal injections were performed to directly transduce retinal tissue, given that AAV-mediated retinal transduction has already been shown to treat ocular disorders11.


For all CNS delivery experiments, dual split-intein CBE or ABE v5 AAV targeting DNMT1 were combined together with an AAV encoding a Cbh promoter-driven nuclear membrane-localized GFP-KASH29 fusion to enable FACS isolation of cells with GFP-positive nuclei. Sorting for GFP-positive cells enriches cell types that are transducible by AAV and that can transcribe genes from the Cbh promoter. This enrichment is especially useful in the CNS, where the heterogeneity of interspersed cell types limits enrichment from physical dissection alone. For example, in the cerebellum, only Purkinje cells, comprising less than 1% of total cerebellar tissue50,51, are well-transduced by known AAV variants at P052,53. These neurons, however, are critically important as their degeneration causes a number of cerebellar ataxias54,55. FACS isolation facilitates quantification of editing in this sparse population, as shown by comparison of editing among sorted and unsorted cell populations (FIGS. 13A-13F).


To determine optimal AAV variants for P0 ICV injections, 4×10^10 vg total of v5 CBE AAV was co-injected with 1×10^10 vg of KASH-GFP (FIG. 13A). Four AAV variants were tested that were hypothesized to efficiently transduce CNS cells following these neonatal direct brain injections: AAV8 and AAV9, which have both been reported to transduce neurons following P0 injections52, and laboratory-evolved PHP.B and PHP.eB AAV variants24,47, which efficiently transduce CNS tissue in older animals. Measurements of GFP-positive nuclei by flow cytometry showed that in cortical tissue, transduction percentages varied from 43±2.2% (AAV8) to 65±4.4% (PHP.eB). In cerebellar tissue, none of the four serotypes efficiently transduced cells (AAV8: 0.8±0.4%; AAV9: 2.7±0.7%; PHP.B: 1.6±0.2%; PHP.eB: 2.5±0.5%) (FIG. 13B). The low transduction in cerebellum is consistent with previous reports that Purkinje cells represent nearly all cerebellar neurons transduced following P0 injections52,53,56. To confirm that transduced cerebellar cells were Purkinje neurons, L7-GFP mice, which express cytoplasmic GFP in Purkinje neurons, were injected with an mCherry-expressing AAV9 construct, and robust transduction was observed only in GFP-positive cells (FIGS. 19A-19B). Importantly, most Purkinje cells were transduced, suggesting that GFP-positive nuclei reflect a relatively large and unbiased sample of the overall Purkinje cell population. Taken together, these results suggest that all four variants transduce CNS cells with comparable efficiency.


Next, cerebellar and cortical tissue were sequenced. In cortex, it was found that all four tested AAV variants mediated comparable and efficient C.G-to-T.A base editing among GFP-positive cells (65-70% base editing), as well as among unsorted cells (32-50% base editing) (FIG. 13C). In cerebellum, all four AAV variants again resulted in comparable and efficient base editing (FIG. 13C), resulting in 35-52% editing among GFP-positive cells. Since Purkinje cells form the vast majority of transduced cerebellar cells52,53,56 but represent only a small percentage of cerebellar tissue, base editing in unsorted cerebellar tissue was inefficient as expected, ranging from 0.52% (AAV8) to 2.5% (AAV9).
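
As a rough consistency check, offered for illustration only, the editing expected in unsorted tissue can be estimated as the product of the fraction of GFP-positive nuclei and the editing efficiency among GFP-positive cells. The following Python sketch rests on two simplifying assumptions that are not established by the experiments described: that editing occurs only in GFP-positive cells, and that GFP-positive nuclei mark the same cells that received both editor halves. The function name and dictionary layout are hypothetical; the input ranges are those reported above.

```python
# Rough consistency check under stated simplifying assumptions:
# unsorted editing ~= (fraction of GFP-positive nuclei) x (editing in GFP+ cells).


def expected_unsorted_editing(fraction_transduced, editing_in_sorted):
    """Both arguments and the result are fractions between 0 and 1."""
    for value in (fraction_transduced, editing_in_sorted):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be fractions between 0 and 1")
    return fraction_transduced * editing_in_sorted


# (lowest, highest) values reported across the four AAV variants
REPORTED_RANGES = {
    "cortex": {"transduced": (0.43, 0.65), "sorted_editing": (0.65, 0.70)},
    "cerebellum": {"transduced": (0.008, 0.027), "sorted_editing": (0.35, 0.52)},
}

if __name__ == "__main__":
    for tissue, r in REPORTED_RANGES.items():
        low = expected_unsorted_editing(r["transduced"][0], r["sorted_editing"][0])
        high = expected_unsorted_editing(r["transduced"][1], r["sorted_editing"][1])
        print(f"{tissue}: estimated unsorted editing {low:.2%} to {high:.2%}")
```

The estimates fall in the same general range as the unsorted editing reported above for each tissue, which is consistent with the low unsorted cerebellar editing being attributable to the sparse transduced population rather than to inefficient editing within transduced cells.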


Having demonstrated cytosine base editing in the brain with v5 AAV-CBE3.9max, adenine base editing was tested with v5 AAV-delivered ABEmax. Since all AAV variants tested produced similar CBE3.9max base editing efficiencies, P0 ICV injections of split-intein ABEmax were characterized using only AAV9. It was observed that AAV9-delivered split-intein ABEmax edited cortex with high efficiency (87±4.0% A.T-to-G.C editing among GFP-positive cells; 43±9.1% editing among unsorted cells) and cerebellum (64±5.6% among GFP-positive cells; 1.3±0.5% among unsorted cells, consistent with the small percentage of Purkinje neurons in cerebellum) (FIG. 13D).


Although direct CNS injections resulted in robust base editing in the brain, it was also investigated whether peripheral delivery of AAV via intravenous injection might efficiently edit the CNS, since intravenous injections offer substantial convenience, cost, and safety advantages. 4×1012 vg of v5 AAV-PHP.eB encoding CBE3.9max mixed with 2×1011 vg GFP-KASH were injected retro-orbitally into nine-week-old animals (FIG. 13E). After 3-4 weeks, brain tissue was harvested and sorted. Highly efficient C.G-to-T.A base editing was observed in cortex (74±1.2% among GFP-positive cells, and 59±3.0% among unsorted cells) and cerebellum (70±2.6% among GFP-positive cells, and 35±3.0% among unsorted cells; FIG. 13F). These data indicated that, in contrast to P0 ICV injection, intravenous injection of PHP.eB AAV in adult mice results in robust base editing in unsorted cerebellar tissue, likely due to an increase in the types of cells transduced in adult tissue following expression of AAV receptor proteins. Unlike the restrictive tropism observed at P0, in adult animals PHP.eB transduces several cell types in cerebellum including granule cells and Olig2+ oligodendrocytes24. Collectively, these findings establish high-efficiency cytosine and adenine base editing in the central nervous system of a mammal.


In Vivo Base Editing of Retinal Cells

Genome editing approaches to treating inherited ocular disorders are of special interest given the accessibility of the eye, its immune-privileged status, and the prevalence and impact of congenital blindness. Therefore, the ability of subretinal injections of split-intein ABEmax v5 AAV or split-intein CBE3.9max v5 AAV to efficiently base edit photoreceptors and other retinal cells was tested. Rhodopsin-Cre mice, which express Cre only in retinal rod photoreceptor cells, were bred to Ai9 mice57 to generate animals that express tdTomato only in rod photoreceptor cells. Subretinal injections of split-intein CBE3.9max or ABEmax dual AAV were performed, targeting DNMT1 in two-week-old mice (FIG. 14A). Two AAV variants were tested: PHP.B, as used above for P0 injections, and Anc80, which contains a computationally reconstructed ancestral AAV capsid sequence58. PHP.B-Cbh-GFP or Anc80-Cbh-GFP was co-injected as a marker for transduced cells.


Three weeks post-injection, retinal cells were sorted into GFP+/tdTomato+ (transduced rods), GFP+/tdTomato− (marker transduced non-rods), GFP−/tdTomato+ (unmarked rods), or double-negative (unmarked non-rods) cells. PHP.B-GFP transduced 65±2.8% of rods and 9.6±1.4% of non-rods, while a 6-fold lower dose of Anc80-GFP transduced cells much less efficiently (FIG. 14B). When delivered at the same dose (5×109 vg), both PHP.B and Anc80 showed comparable transduction efficiency in the retina, and the majority of cells transduced by both variants were photoreceptors (FIG. 14C). Both PHP.B and Anc80 AAV efficiently delivered split-intein nucleobase editors into retinal cells, with PHP.B-mediated split-intein CBE3.9max resulting in 48±5.9% C.G-to-T.A editing among GFP+/tdTomato+ rod photoreceptors (19±8.7% among all tdTomato-positive rods), and Anc80-mediated split-intein ABEmax resulting in 37±22% A.T-to-G.C editing among GFP+/tdTomato+ rod photoreceptors (26±16% editing among all rod photoreceptor cells) (FIGS. 14D-14F). These editing efficiencies, even among unsorted PHP.B-transduced rod photoreceptors, are similar to the frequencies of wild-type alleles required to improve retinal function in mosaic Pde6b mutant mice59. The editing efficiencies observed are also comparable to those reported in preclinical data for EDIT-101, a single-vector AAV treatment for Leber congenital amaurosis that delivers Cas9 nuclease60, suggesting that dual-vector AAV co-transduction in retinal tissue can achieve therapeutically relevant editing efficiencies.


Interestingly, although ABE delivery generated very few indels in retinal cells, consistent with previous results from cultured cells4, and both ABE and CBE delivery in non-retinal tissues in the experiments described above generally resulted in base edit:indel ratios >10:1 (FIGS. 22A-22C), CBE delivery to retinal cells generated substantial indels, with base edit:indel ratios between 2:1 and 1:1. Despite the substantial frequency of indels, there was little overlap between indel-containing and base-edited alleles. Excluding indel-containing reads did not reduce the number of reads with C.G-to-T.A editing (FIGS. 20A-20B), indicating that base edited alleles in general do not contain indels. These observations suggest that CBE-mediated indels in retinal cells occur through uracil excision pathways that are mutually exclusive with pathways that lead to cytosine base editing outcomes, or that base edited or indel-containing products are poor substrates for subsequent indel-generating or base editing processes, respectively.


In Vivo Correction of a Causal Niemann-Pick Mutation in Mouse CNS

Integrating the above developments, AAV-mediated in vivo nucleobase editor delivery was applied to correct a mutation associated with human disease in the CNS of an animal. NPC1 mediates intracellular lipid transport, and loss-of-function mutations cause Niemann-Pick type C (NPC) disease, a neurodegenerative ataxia. NPC1 c.3182T>C (encoding Ile1061Thr) is the most prevalent mutation in humans that causes NPC1 disease61,62. Previous work suggests that Niemann-Pick disease is primarily a CNS disorder; genetic deletion of NPC1 in the CNS alone causes Niemann-Pick disease in mice63, while expression of wild-type NPC1 in the CNS alone prevents the disease64,65. Furthermore, deletion of NPC1 in Purkinje cells alone causes motor impairment66. Chimeric studies suggest that the death of Purkinje neurons is cell-autonomous and therefore amenable to mosaic rescue67. NPC1I1061T homozygous mice develop ataxia and have a reduced lifespan of approximately 17 weeks62.


To test if base editing of NPC1I1061T in the CNS might extend lifespan, P0 NPC1I1061T (c.3182T>C) homozygous mice were injected with 4×1010 or 1×1011 vg total CBE3.9max v5 AAV9 (2×1010 or 5×1010 vg of each AAV half) targeting the NPC1I1061T mutation and 1×1010 vg of KASH-GFP, which are referred to as low dose and medium dose, respectively. Base editing at this site should directly reverse the I1061T mutation back to wild-type NPC1 (FIG. 15A). Although no difference was found in lifespan between low-dose and untreated animals (FIG. 15B), medium-dose animals survived significantly longer than untreated animals (FIG. 15C, 12% longer median lifespan; χ2=4.631, df=1, p=0.031 by Mantel-Cox test). Animals were euthanized at the onset of morbidity to harvest brain tissue for high-throughput DNA sequencing, and GFP-positive cortical and cerebellar nuclei were sorted as described above (FIGS. 13A-13F).


To determine if v5 AAV9-CBE injection increases the number of surviving Purkinje neurons, a cohort of age-matched injected and untreated mice was compared at P98-P105, close to the lifespan of the untreated mice. In agreement with the observed lifespan extension, injection of AAV9 AAV-CBE increased the number of surviving Purkinje neurons, from 24% of wild-type to 38% of wild-type (uninjected, 5.1±1.2 Purkinje neurons per mm of Purkinje cell layer; injected, 8.0±0.8 PCs/mm; wild-type, 21.1±5.5 PCs/mm; uninjected vs. injected, p=0.03) (FIG. 15G). Quantitatively similar increases in Purkinje cell survival mediated by small molecules in NPC1−/− mice have previously been associated with lifespan increases similar to those that were observed80. These results demonstrate that AAV-mediated CNS base editing of NPC1 increases the survival of Purkinje neurons to an extent consistent with the lifespan increase of the treated mice. To further probe the possibility that NPC1 base editing improves cellular markers of NPC1 disease and to determine whether the CBE-mediated mosaic rescue might provide systemic benefits, CD68+ reactive microglia, a measure of CNS inflammation65,81, were examined. The density of CD68+ cells and total CD68+ tissue area in mice injected with AAV9 AAV-CBE were quantified, revealing modest decreases in CD68+ tissue area in agreement with the modest increase in Purkinje cell survival (FIG. 15H, decrease from 19.9±0.05% to 16.7±0.08%; p=0.005; single-channel images are included in FIG. 28A). Although CD68+ cell density decreased from 913±26 to 850±30 cells/mm2, this difference was not statistically significant (FIG. 28B, p=0.15).
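The percent-of-wild-type figures above follow directly from the reported Purkinje cell densities. The short Python sketch below reproduces that arithmetic from the mean values quoted in the text; the variable and function names are illustrative only and are not part of the original analysis.

```python
# Mean Purkinje cell (PC) densities quoted in the text, in PCs per mm
# of Purkinje cell layer. Error terms are omitted in this sketch.
PC_PER_MM = {"uninjected": 5.1, "injected": 8.0, "wild_type": 21.1}


def percent_of_wild_type(density, wild_type_density):
    """Express a PC density as a percentage of the wild-type density."""
    return 100.0 * density / wild_type_density


uninjected_pct = percent_of_wild_type(PC_PER_MM["uninjected"], PC_PER_MM["wild_type"])
injected_pct = percent_of_wild_type(PC_PER_MM["injected"], PC_PER_MM["wild_type"])

print(f"uninjected: {uninjected_pct:.0f}% of wild-type")
print(f"injected:   {injected_pct:.0f}% of wild-type")
```

Rounded to whole percentages, the two ratios match the 24% and 38% values stated in the paragraph.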


In animals given a low dose of v5 AAV, the NPC1I1061T mutation was corrected with 31±16% efficiency in unsorted cortical nuclei, and in 46±22% of GFP-positive nuclei. In cerebellum, editing of 0.4±0.5% was observed in unsorted tissue, and 11±8.4% in GFP-positive nuclei, which correspond to the critical Purkinje neuron population that must be edited to treat NPC1 disease. In medium-dose animals, cortical editing of 48±8.2% and 81±3.7% was observed in unsorted and sorted nuclei, respectively, and cerebellar editing of 0.3±0.2% and 42±14% of unsorted and sorted nuclei, respectively (FIG. 15D). In all cases, C-to-T editing without bystander edits or indels was predominant among edited alleles; over 94% of edited alleles cleanly correct the I1061T mutation and encode the wild-type allele (FIGS. 15E and 15F).


It was also assessed whether off-target editing might occur in the sorted cerebellar and cortical nuclei. Candidate loci were identified using two methods: CRISPOR, a bioinformatics method that predicts off-target sites with Cas9 activity, and CIRCLE-seq, which empirically determines off-target Cas9 loci and was performed on gDNA harvested from the liver of an untreated NPC1I1061T mouse. Amplicon sequencing was then performed to confirm editing at eight total candidate loci identified by either method. Only a single confirmed off-target site was observed, an intronic sequence in Epas1 >3 kb away from the nearest exonic sequences, which was edited at a low efficiency of 0.3±0.05% (FIGS. 29A-29D).


Previous work with mosaic animals has shown that approximately 30-40% wild-type cells are required for measurable phenotypic improvement. Since the above data suggest ˜11% Purkinje cell editing in low-dose animals with no lifespan extension, and ˜42% Purkinje cell editing in medium-dose animals with modest but significant lifespan extension, the results broadly agree with the modest lifespan gains observed in mosaic animal studies67. It is noted that unedited cells may have degenerated, and thus editing levels in sequenced tissue represent upper limits of the initial percentage of edited cells. To minimize the effect of degeneration on the frequency of edited cells, base editing was measured in heterozygous NPC1I1061T/+ mice, which do not show NPC1 disease phenotypes, following medium-dose P0 injections. At P29, it was found that 31±5.8% of GFP-positive cerebellar nuclei were edited, which increased to 54±10% at P110. In sorted cortical nuclei, the percent of edited cells increased from 59±5.4% to 82±7.2% (FIGS. 21A-21B), suggesting that C.G to T.A editing continues for more than four weeks after P0 injection.


To test whether CBE is chronically expressed, NPC1+/+ mice were injected with v5 AAV-CBE at P0 and brains were harvested at P110 for staining against Cas9 and GFP. Expression of both Cas9 and GFP was observed at P110 in cerebellar and cortical tissue (FIGS. 21B-21C), suggesting that, consistent with previous studies, AAV mediates long-term neuronal transgene expression. Although the above data are consistent with a prolonged editing activity window, and though NPC1+/− heterozygotes do not have any cellular markers of disease67, the possibility that the apparent continued editing in heterozygotes may simply be the result of a survival advantage in edited cells cannot be ruled out.


These results establish that dual AAV split-intein nucleobase editor delivery in Niemann-Pick type C mice directly corrects a substantial fraction of pathogenic alleles in the CNS. Together, these results demonstrate for the first time base editing to treat an animal model of a human CNS disease, correcting the causal mutation and prolonging lifespan.


Discussion

This study describes an optimized dual AAV system that delivers split-intein cytosine and adenine nucleobase editors, resulting in therapeutically relevant in vivo genome editing efficiencies following injection of ˜1013-1014 vg/kg, a dosage comparable to those currently used in human gene therapy trials32. The optimizations described above greatly improve the efficiency of AAV-encoded nucleobase editors and may also be useful to other AAV-based systems for the delivery of genome editing agents8,22. Many somatic cell types of therapeutic and scientific interest can be efficiently transduced with known AAV variants, including hematopoietic cells68, liver69, sensory organs11, and CNS32, suggesting that this work may facilitate a broad range of studies in animal models of many human genetic diseases. Finally, different injection routes were tested to deliver AAV-packaged split-base editors in postnatal mice, demonstrating, for the first time, efficient base editing in brain and retina and enabling causal gene correction and partial phenotypic rescue of Niemann-Pick type C disease.


The mouse studies described here use AAV injections of no more than 4×1012 vg per 20-g animal, which corresponds to a maximum dose of 2×1014 vg/kg, consistent with the maximum dosages delivered intravenously in non-human primate studies and clinical trials32 for CNS delivery. Notably, in the eye, subretinal injections of the optimized nucleobase editor AAVs achieve genome editing efficiencies comparable to those of preclinical delivery systems optimized for retinal editing60. Intravenous v5 AAV injections also achieve therapeutically relevant editing levels in liver, muscle, and cardiac tissue. The viral base editing systems developed in this study therefore are suitable for testing base editing strategies in animal models of human disease, a key step in advancing base editing towards human therapeutic application. AAV optimization (FIGS. 11A-11E) reduced the viral dose required for efficient base editing to amounts known to be tolerated by humans, enabling more practical and therapeutically relevant editing in animal models of human genetic diseases compared to the much higher doses previously used in trans-splicing mRNA viral vectors8.
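The conversion from an absolute dose to a body-weight-normalized dose stated at the start of this paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 4×10^12 vg dose and 20-g body mass come from the text, while the function name is illustrative.

```python
import math


def dose_per_kg(total_vg, body_mass_g):
    """Convert an absolute AAV dose (vg) to vg per kg of body mass."""
    return total_vg / (body_mass_g / 1000.0)


# Maximum dose used in the mouse studies: 4e12 vg in a 20-g animal.
max_dose = dose_per_kg(4e12, 20.0)
print(f"{max_dose:.1e} vg/kg")
```

The result agrees with the 2×10^14 vg/kg maximum quoted above; `math.isclose` is a convenient way to compare such floating-point values.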


While it was initially anticipated that the requirement of simultaneous transduction by two viruses would sharply lower editing efficiencies, the surprisingly high overall in vivo editing efficiencies observed even among unsorted cells (for example, up to 59% of cortex), together with similar levels of transduction of single AAVs expressing GFP (FIG. 13B) strongly suggest that transducible cells are particularly amenable to transduction by multiple AAVs. Editing efficiency may be further increased by tissue-specific optimization such as selection of a delivery route that biases AAV concentrations towards relevant tissues, such as hepatic artery injections to transduce liver71, and tissue-specific promoter and terminator variation to enhance expression in specific cell types.
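The initial expectation of sharply lower efficiency can be illustrated with a simple independence model, which is an assumption made here for illustration and not a model used in the study: if each of two viruses transduced a cell independently with probability p, only a fraction p² of cells would receive both halves. The Python sketch below shows what that assumption would imply for the 59% editing figure quoted above.

```python
import math


def expected_co_transduction(p_single):
    """Fraction of cells receiving both AAVs if transduction by each
    virus were independent (an illustrative assumption)."""
    return p_single ** 2


def min_single_transduction(co_fraction):
    """Per-virus transduction fraction needed to reach a given
    co-transduced fraction under the same independence assumption."""
    return math.sqrt(co_fraction)


# Editing requires both halves, so 59% editing among unsorted cells
# sets a lower bound on the co-transduced fraction.
required = min_single_transduction(0.59)
print(f"per-virus transduction needed under independence: {required:.0%}")
```

Under independence, each virus alone would have to transduce roughly three quarters of all cells to account for 59% editing, which is one way to see why the text concludes that transducible cells are particularly amenable to transduction by multiple AAVs.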


The split-intein nucleobase editor delivery system developed here brings the strengths of base editing, including high editing efficiency, minimization of unwanted byproducts arising from double-stranded DNA breaks, and compatibility with post-mitotic somatic cells2,9, to in vivo settings in the diverse tissue types that are well-transduced by natural or engineered AAVs. The split-intein dual AAV approach described here may also facilitate the in vivo delivery of genes that are too large for a direct gene augmentation approach.


Methods
Cell Culture

HEK293T/17 (ATCC CRL-11268) and 3T3 cells (ATCC CRL-1658) were maintained in DMEM (Thermo Fisher 10569044) supplemented with 10% (v/v) fetal bovine serum (Thermo Fisher), at 37° C. with 5% CO2. Cells were verified to be free of mycoplasma by ATCC upon purchase, and periodically during culture.


HEK293T and 3T3 Transfection and Genomic DNA Preparation

HEK293T cells were seeded into 48-well Poly-D-Lysine-coated plates (Corning 354509) at 30,000 cells/well. One day after plating, cells were transfected by Lipofectamine 2000 (Thermo Fisher) according to the manufacturer's directions with 1 μg DNA in a 1:1 molar ratio of nucleobase editor and sgRNA plasmids, plus 10 ng of fluorescent protein expression plasmid as a transfection control. Cells were cultured for 3 days before genomic DNA was extracted by replacement of culture media with 100 μL lysis buffer (10 mM Tris-HCl, pH 7.5, 0.05% SDS, 25 μg/mL proteinase K (NEB)) and incubation at 37° C. for 1 hour. Proteinase K was inactivated by 30-minute incubation at 80° C. 3T3 cells were transfected using the same procedure at 50,000 cells/well.


Western Blotting

HEK293T cells were seeded into 12-well plates at 125,000 cells per well. Cells were transfected as described above with all amounts scaled up 3×. For conditions with transfection of only one split-half, EGFP-expressing plasmid was used to normalize the amount of DNA used. 3 days after transfection, cells were gently lifted and triturated by pipetting PBS across the well surface. 10% of the volume was removed for HTS analysis, and the remaining cells were washed with ice-cold PBS, and incubated on ice for 15 minutes in lysis buffer (300 mM NaCl, 50 mM Tris pH 8, 1% IGEPAL, 0.5% deoxycholic acid, 10 mM MgCl2) plus 25 U/mL salt active nuclease (Arcticzymes 70910-202) to reduce lysate viscosity and complete EDTA-free protease inhibitor cocktail (Roche). After 10 minutes, SDS and EDTA were added to 0.5% and 1 mM, respectively, and lysates were rocked an additional 15 minutes at 4° C. before clarification by centrifugation at 14,000 g for 15 minutes at 4° C. Lysates were normalized using BCA (Pierce BCA Protein Assay Kit), and 2.5 mg of reduced protein was loaded onto each gel lane. Transfer was performed with an iBlot 2 dry blotting system (Thermo Fisher) using the following program: 20 V for 1 minute, then 23 V for 4 minutes, then 25 V for 2 minutes for a total transfer time of 7 minutes. Blocking was performed at room temperature for 30 minutes with block buffer: 1% BSA in TBST (150 mM NaCl, 0.5% Tween-20, 50 mM Tris-Cl, pH 7.5). Membranes were then incubated in primary antibody diluted in block buffer at 4° C. overnight. After a wash step, secondary antibodies diluted in TBST were added. Membranes were washed again and imaged using a LI-COR Odyssey. Wash steps were 3×5 minute washes in TBST. Primary antibodies used were rabbit anti-GAPDH, 1:1000 (Cell Signaling Technologies D16H11); rabbit anti-HA, 1:1000 (Cell Signaling Technologies C29F4); and mouse anti-FLAG, 1 μg/mL (clone M2, Sigma F1804). LI-COR IRDye 680RD goat anti-rabbit (#926-68071) and goat anti-mouse (#926-68070) secondary antibodies were used at 1:10,000-1:20,000 dilutions.


High-Throughput Sequencing and Data Analysis

Genomic DNA was amplified by qPCR using Phusion Hot Start II DNA polymerase, with SYBR Gold used for quantification. 3% DMSO was added to all gDNA PCR reactions. To minimize PCR bias, reactions were stopped during the exponential amplification phase. 1 μL of the unpurified gDNA PCR product was used as a template for subsequent barcoding PCR (8 cycles, annealing temperature 61° C.). Pooled barcoding PCR products were gel-extracted (MinElute columns, Qiagen) and quantified by qPCR (KAPA KK4824) or Qubit dsDNA HS assay kit (Thermo Fisher). Sequencing of pooled amplicons was performed using an Illumina MiSeq according to the manufacturer's instructions. All oligonucleotide sequences used for gDNA amplification are provided in FIGS. 25A-25B.


Initial de-multiplexing and FASTQ generation were performed by bcl2fastq2 running on BaseSpace (Illumina) with the following flags: --ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --ignore-missing-controls --auto-set-to-zero-barcode-mismatches --find-adapters-with-sliding-window --adapter-stringency 0.9 --mask-short-adapter-reads 35 --minimum-trimmed-read-length 35. Alignment of fastq files and quantification of editing frequency was performed by CRISPResso2 in batch mode with the following flags: --min_bp_quality_or_N 20 --base_editor_output -p 2 -w 20 -wc -10.
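Because flag spacing is easy to corrupt when such command lines are copied, it can help to assemble the CRISPResso2 invocation programmatically. The Python sketch below only builds and prints the argument list using the flags listed above; it assumes the batch-mode entry point is the `CRISPRessoBatch` executable taking a `--batch_settings` file, and the file name `batch_settings.txt` is a hypothetical placeholder. Nothing here is taken from the original analysis scripts.

```python
import shlex


def crispresso_batch_command(batch_settings_path):
    """Build the argument list for a CRISPResso2 batch run using the
    flags reported in the text. The settings path is a placeholder."""
    return [
        "CRISPRessoBatch",
        "--batch_settings", batch_settings_path,
        "--min_bp_quality_or_N", "20",  # mask bases below Q20 as N
        "--base_editor_output",         # emit base-editor summary outputs
        "-p", "2",                      # number of processes
        "-w", "20",                     # quantification window size
        "-wc", "-10",                   # quantification window center
    ]


command = crispresso_batch_command("batch_settings.txt")
# shlex.join quotes each argument so the printed line can be pasted into a shell.
print(shlex.join(command))
```

Keeping each flag and its value as separate list items avoids the fused tokens (for example a value running into the next flag) that would otherwise be misparsed; the list could be passed to `subprocess.run` on a machine where CRISPResso2 is installed.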


AAV Production

AAV production was performed as previously described24 with some alterations. HEK293T/17 cells were maintained in DMEM/10% FBS without antibiotic in 150 mm dishes (Thermo Fisher 157150), and passaged every 2-3 days. Cells for production were split 1:3 1 day before PEI transfection. 5.7 μg AAV genome, 11.4 μg pHelper (Clontech), and 22.8 μg rep-cap plasmid were transfected per plate. 1 day after transfection, media was exchanged for DMEM/5% FBS. 3 days after transfection, cells were scraped with a rubber cell scraper (Corning), pelleted by centrifugation for 10 minutes at 2000 g, resuspended in 500 μL hypertonic lysis buffer per plate (40 mM Tris base, 500 mM NaCl, 2 mM MgCl2) with 100 U/mL salt active nuclease (Arcticzymes 70910-202), and incubated at 37° C. for 1 h to lyse cells.


Media was decanted, combined with a 5× solution of 40% PEG in 2.5 M NaCl (final concentration 8% PEG/500 mM NaCl), incubated on ice for 2 hours to facilitate PEG precipitation, and centrifuged at 3200 g for 40 minutes. The supernatant was discarded and the pellet resuspended in 500 μL lysis buffer per plate and added to the cell lysate. Incubation at 37° C. was continued for 30 minutes. Crude lysates were either incubated at 4° C. overnight or directly used for ultracentrifugation.
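The final PEG and NaCl concentrations quoted above follow from diluting a 5× stock. The Python sketch below works through that dilution; the 20 mL media volume is a hypothetical example, and the function names are illustrative.

```python
def final_concentration(stock_concentration, fold):
    """Concentration after diluting a stock solution `fold`-fold."""
    return stock_concentration / fold


def stock_volume_needed(media_volume_ml, fold):
    """Volume of a `fold`x stock to add to a given media volume so that
    the stock makes up 1/fold of the final mixture."""
    return media_volume_ml / (fold - 1)


peg_percent = final_concentration(40.0, 5)   # 40% PEG stock
nacl_mM = final_concentration(2500.0, 5)     # 2.5 M NaCl stock, in mM
stock_ml = stock_volume_needed(20.0, 5)      # hypothetical 20 mL of media

print(f"final PEG: {peg_percent}% ; final NaCl: {nacl_mM} mM")
print(f"stock to add to 20 mL media: {stock_ml} mL")
```

One part stock to four parts media gives the 8% PEG/500 mM NaCl final concentration stated in the protocol.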


Cell lysates were gently clarified by centrifugation at 2000 g for 10 minutes and added to Beckman Quick-seal tubes via 16-gauge 5″ disposable needles (Air-Tite N165). A discontinuous iodixanol gradient was formed by sequentially floating layers: 9 mL 15% iodixanol in 500 mM NaCl and 1×PBS-MK (1×PBS plus 1 mM MgCl2 and 2.5 mM KCl), 6 mL 25% iodixanol in 1×PBS-MK, and 5 mL each of 40% and 60% iodixanol in 1×PBS-MK. Phenol red at a final concentration of 1 μg/mL was added to the 15, 25, and 60% layers to facilitate identification.


Ultracentrifugation was performed using a Ti 70 rotor in a Sorvall WX+ series ultracentrifuge (Thermo Fisher) at 58,600 rpm for 2:15 (h:mm) at 18° C. Following ultracentrifugation, roughly 4 mL of solution was withdrawn from the 40%-60% iodixanol interface via an 18-gauge needle, dialyzed with PBS containing 0.001% F-68, and ultrafiltered via 100-kD MWCO columns (EMD Millipore). The concentrated viral solution was sterile-filtered using a 0.22 μm filter, quantified via qPCR (AAVpro Titration Kit v.2, Clontech), and stored at 4° C. until use.


Animals

All experiments in live animals were approved by the Broad Institute and Massachusetts Eye and Ear Institutional Animal Care and Use Committees. NPC1 mice were euthanized at the onset of morbidity, defined as profound ataxia leading to an inability to acquire food and water, as evidenced by a low body condition score and minimal responsiveness to touch. Wild-type C57BL/6 mice were from Charles River (#027). Jackson Labs supplied all transgenic mice: Npc1tm(I1061T)Dso (#027704), Ai9 (#007909), Rhodopsin-iCre (#015850), and L7-GFP (#004690).


Retro-Orbital Injections

AAV was diluted to 200 μL in 0.9% NaCl (Fresenius Kabi 918610) before injection. Anesthesia was induced with 4% isoflurane. Following induction as measured by unresponsiveness to a toe pinch, the right eye was protruded by gentle pressure on the skin, and a tuberculin syringe advanced, with the bevel facing away from the eye, into the retrobulbar sinus where AAV mix was slowly injected. For assessments of CNS editing, 1×1011 vg GFP-KASH virus was added to the injection mix as a transduction marker. gDNA was purified from minced tissue using Agencourt DNAdvance kits (Beckman Coulter A48705) in accordance with the manufacturer's directions.


P0 Ventricle Injections

Drummond PCR pipettes (5-000-1001-X10) were pulled at ramp and passed through a Kimwipe three times, resulting in a tip size of roughly 100 μm. A small amount of Fast Green was added to the AAV injection solution to assess ventricle targeting. The injection solution was loaded via front-filling using the included Drummond plungers. P0 pups were anesthetized by placement on ice for 2-3 minutes, until they were immobile and unresponsive to a toe pinch. 2 μL of injection mix was injected freehand into each ventricle. Ventricle targeting was assessed by the spread of Fast Green throughout the ventricles via transillumination of the head.


Nuclear Isolation and Sorting

Cerebella were separated from the brain with surgical scissors, hemispheres were separated using a scalpel, and the hippocampus and neocortex were separated from underlying midbrain tissue with a curved spatula. Nuclei were isolated from brain tissue as previously described72. All steps were performed on ice or at 4° C. Dissected tissue was homogenized using a glass dounce homogenizer (Sigma D8938) (20 strokes with pestle A followed by 20 strokes with pestle B) in 2 mL ice-cold EZ-PREP buffer (Sigma NUC-101). Samples were incubated for 5 minutes with an additional 2 mL EZ-PREP buffer. Nuclei were centrifuged at 500 g for 5 minutes, and the supernatant removed. Samples were resuspended with gentle pipetting in 4 mL ice-cold Nuclei Suspension Buffer (NSB) consisting of 100 μg/mL BSA and 3.33 μM Vybrant DyeCycle Violet (Thermo Fisher) in 1×PBS, and centrifuged at 500 g for 5 minutes. The supernatant was removed and nuclei were resuspended in 1-2 mL NSB, passed through a 35 μm strainer, and sorted into 200 μL Agencourt DNAdvance lysis buffer using a MoFlo Astrios (Beckman Coulter) at the Broad Institute flow cytometry core. Genomic DNA was purified according to the Agencourt DNAdvance instructions for 200 μL volume.


P14 Sub-Retinal Injections

1 μL of AAV mix for sub-retinal injections consisted of 4×109 vg of each split CBE nucleobase editor half, and 2×109 vg GFP for the PHP.B variant. The Anc80+CBE3.9max mixture was divided equally: 3.3×108 vg of each split nucleobase editor half, and 3.3×108 vg GFP. The Anc80+ABEmax mixture consisted of 4.5×108 vg of each split nucleobase editor half, and 4.5×108 vg GFP. PHP.B or Anc80 GFP alone at 5×109 vg/μL was injected into wild-type C57BL/6 mice to assess transduction efficiency. P14 mice were anesthetized by intraperitoneal injection of ketamine (140 mg/kg) and xylazine (14 mg/kg). Using a microscope for visualization, a small incision was made at the limbus by a 30-gauge needle, and a Hamilton syringe with a 33-gauge blunt-ended needle was used to inject 1 μL of AAV mix. Following injection, mice were placed on a 37° C. warming pad until they recovered.
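The composition of each 1 μL sub-retinal injection mix can be tallied with a short Python sketch. The per-component doses are those listed above (written in exponent form, e.g. 4e9 for 4×10^9 vg); the dictionary layout and names are illustrative only.

```python
import math

# Viral genomes (vg) per 1 uL injection, as listed in the text.
MIXES = {
    "PHP.B + CBE3.9max": {"editor_half_vg": 4e9, "gfp_vg": 2e9},
    "Anc80 + CBE3.9max": {"editor_half_vg": 3.3e8, "gfp_vg": 3.3e8},
    "Anc80 + ABEmax": {"editor_half_vg": 4.5e8, "gfp_vg": 4.5e8},
}


def total_vg(mix):
    """Total vg per injection: two split-editor halves plus the GFP marker."""
    return 2 * mix["editor_half_vg"] + mix["gfp_vg"]


totals = {name: total_vg(mix) for name, mix in MIXES.items()}

# Ratio of the GFP marker doses in the PHP.B and Anc80 CBE mixes.
gfp_fold = MIXES["PHP.B + CBE3.9max"]["gfp_vg"] / MIXES["Anc80 + CBE3.9max"]["gfp_vg"]

for name, vg in totals.items():
    print(f"{name}: {vg:.2e} vg total")
print(f"PHP.B GFP dose / Anc80 GFP dose: {gfp_fold:.1f}")
```

The GFP dose ratio of about 6 is consistent with the 6-fold lower Anc80-GFP dose mentioned in the retinal editing results.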


Retina Dissociation and Cell Sorting

Three weeks post-injection, eyes were enucleated and stored in BGJB medium (Thermo Fisher) on ice as described previously73. Retinas were isolated under a fluorescent dissection microscope to record the transfected region and dissociated into single cells by incubation in solution A containing 1 mg/mL pronase (Sigma-Aldrich) and 2 mM EGTA in BGJB medium at 37° C. for 20 minutes. Solution A was gently removed, followed by addition of an equal amount of solution B containing 100 U/mL DNase I (New England Biolabs), 0.5% BSA, and 2 mM EGTA in BGJB medium. Cells were collected and re-suspended in 1×PBS, filtered through a cell strainer (BD Biosciences, San Jose, Calif.), and sorted using a FACSAriaII (BD Biosciences).


Retinal Histology

Mice injected with PHP.B or Anc80 GFP alone were sacrificed 3 weeks post-injection and perfused with 4% paraformaldehyde in 1×PBS. Eyes were dissected and eye cups were embedded in OCT freezing medium. 10 μm retinal cryosections were cut and stained with DAPI. Images were taken using an Eclipse Ti microscope (Nikon).


Brain Immunohistochemistry

Mice were transcardially perfused with PBS followed by 4% PFA. Harvested brains were rotated in 4% PFA at 4° C. overnight for post-fixation. Brains were transferred to 30% sucrose in 1×PBS for cryoprotection and rotated at 4° C. until equilibrated, as assessed by loss of buoyancy. Cryoprotected brains were frozen in a dry ice-ethanol bath and sectioned horizontally on a Leica CM1950 at 20 μm. Slides were rinsed with 10 mM glycine in PBS before blocking and permeabilization in 3% BSA (Jackson Immunoresearch) and 0.1% Triton X-100 in PBS. Slides were incubated in primary antibody at 4° C. overnight, washed three times for 10 minutes each with PBS containing 0.1% Triton X-100 (PBSTx), incubated with secondary antibody at room temperature for 1 hour, washed 3×10 minutes with PBSTx, and mounted in ProLong Diamond Antifade with DAPI (Thermo Fisher). Slides were cured overnight at room temperature before imaging. Care was taken to minimize light exposure at all steps. Primary antibodies used were as follows: chicken anti-GFP, 10 μg/mL (Abcam ab13970); rabbit anti-RFP, 1.6 μg/mL (Rockland 600-401-379); rabbit anti-Calbindin, 0.1 μg/mL (Cell Signaling Technology D1I4Q). Alexa-conjugated goat secondary antibodies (Thermo Fisher) were used at 1:500. Images were captured and stitched at 10× magnification using a Zeiss Axio Scan.Z1. Image intensity was kept below 50% saturation to prevent oversaturation.


Image Analysis

Images were analyzed using ImageJ (Fiji), ilastik74, and CellProfiler75. A subset of images were manually analyzed by a blinded experimenter to validate the accuracy of the final imaging pipelines. Differences between the automated and manual counts were <10%.


Off-Target Analysis

CIRCLE-seq was performed as previously described76. PCR amplification before sequencing was conducted using PhusionU polymerase, and products were gel-purified and quantified with a KAPA library quantification kit before loading onto an Illumina MiSeq. Data were processed using the CIRCLE-Seq analysis pipeline with parameters: “read_threshold: 4; window_size: 3; mapq_threshold: 50; start_threshold: 1; gap_threshold: 3; mismatch_threshold: 6; merged_analysis: True”. The three sites found by CIRCLE-seq analysis were chosen for PCR amplification and high-throughput sequencing. CRISPOR analysis77 was performed, and the top five off-target candidates by CFD score were analyzed by amplicon sequencing.


NPC1I1061T Survival Measurements


NPC1I1061T mice were euthanized at the onset of morbidity, defined functionally as profound ataxia leading to an inability to acquire food and water, as evidenced by a low body condition score78,79 and minimal responsiveness to touch. In all cases, low body condition score preceded profound ataxia. Profound ataxia was the diagnostic criterion for moribundity. The endpoint was designed to minimize suffering while providing accurate survival data. Euthanasia recommendations were made by a blinded veterinary technician. All survival groups were mixed-gender.


Statistical Analysis

The logrank (Mantel-Cox) test was used to compare Kaplan-Meier survival curves (GraphPad).
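For readers unfamiliar with the logrank test, the Python sketch below shows one standard way to compute the two-group statistic and its p-value (chi-square with one degree of freedom). It is not the GraphPad implementation used in the study, and the lifespans in the toy call are hypothetical values chosen only to exercise the function. The final lines check that the reported χ2=4.631 corresponds to the reported p-value.

```python
import math


def chi2_df1_p_value(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank (Mantel-Cox) test.

    times_*: survival or censoring times; events_*: True if death was
    observed, False if the animal was censored at that time.
    Returns (chi2, p_value).
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for s, _, g in samples if s >= t and g == 0)  # at risk, group A
        n_b = sum(1 for s, _, g in samples if s >= t and g == 1)  # at risk, group B
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected deaths in A under the null
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    chi2 = (observed_a - expected_a) ** 2 / variance
    return chi2, chi2_df1_p_value(chi2)


# Toy call with hypothetical lifespans in weeks (not data from the study).
toy_chi2, toy_p = logrank_test([15, 16, 17, 17], [True] * 4,
                               [17, 18, 19, 20], [True] * 4)

# Consistency check on the statistic reported in the text.
reported_p = chi2_df1_p_value(4.631)
print(f"p for chi2=4.631, df=1: {reported_p:.3f}")
```

The consistency check gives a p-value that rounds to the 0.031 reported for the medium-dose survival comparison.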


Data and Materials Availability

Key plasmids from this work are available from Addgene (depositor: David R. Liu) and other plasmids are available upon request. All unmodified reads for sequencing-based data in the manuscript are available from the NCBI Sequence Read Archive, accession number PRJNA532891. AAV genome sequences are provided as FIGS. 26A-26U.


REFERENCES



  • 1 Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic acids research 42, D980-985, doi:10.1093/nar/gkt1113 (2014).

  • 2 Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nature reviews. Genetics 19, 770-788, doi:10.1038/s41576-018-0059-1 (2018).

  • 3 Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424, doi:10.1038/nature17946 (2016).

  • 4 Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464-471, doi:10.1038/nature24644 (2017).

  • 5 Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A nucleobase editors with higher efficiency and product purity. Sci Adv 3, eaao4774, doi:10.1126/sciadv.aao4774 (2017).

  • 6 Koblan, L. W. et al. Improving cytidine and adenine nucleobase editors by expression optimization and ancestral reconstruction. Nature biotechnology, doi:10.1038/nbt.4172 (2018).

  • 7 Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, doi:10.1126/science.aaf8729 (2016).

  • 8 Ryu, S. M. et al. Adenine base editing in mouse embryos and an adult mouse model of Duchenne muscular dystrophy. Nature biotechnology 36, 536-539, doi:10.1038/nbt.4148 (2018).

  • 9 Yeh, W. H., Chiang, H., Rees, H. A., Edge, A. S. B. & Liu, D. R. In vivo base editing of post-mitotic sensory cells. Nat Commun 9, 2184, doi:10.1038/s41467-018-04580-3 (2018).

  • 10 Chadwick, A. C., Wang, X. & Musunuru, K. In Vivo Base Editing of PCSK9 (Proprotein Convertase Subtilisin/Kexin Type 9) as a Therapeutic Alternative to Genome Editing. Arterioscler Thromb Vasc Biol 37, 1741-1747, doi:10.1161/ATVBAHA.117.309881 (2017).

  • 11 Russell, S. et al. Efficacy and safety of voretigene neparvovec (AAV2-hRPE65v2) in patients with RPE65-mediated inherited retinal dystrophy: a randomised, controlled, open-label, phase 3 trial. Lancet 390, 849-860, doi:10.1016/S0140-6736(17)31868-8 (2017).

  • 12 Carvalho, L. S. et al. Evaluating Efficiencies of Dual AAV Approaches for Retinal Targeting. Front Neurosci 11, 503, doi:10.3389/fnins.2017.00503 (2017).

  • 13 Wu, Z., Yang, H. & Colosi, P. Effect of genome size on AAV vector packaging. Molecular therapy: the journal of the American Society of Gene Therapy 18, 80-86, doi:10.1038/mt.2009.255 (2010).

  • 14 Liu, D. R., Levy, Jonathan M., Yeh, Wei Hsi. AAV Delivery Of Nucleobase Editors. International Patent Application Publication No. WO 2018/027078 (2018).

  • 15 Truong, D. J. J. et al. Development of an intein-mediated split-Cas9 system for gene therapy. Nucleic acids research 43, 6450-6458, doi:10.1093/nar/gkv601 (2015).



  • 16 Zetsche, B., Volz, S. E. & Zhang, F. A split-Cas9 architecture for inducible genome editing and transcription modulation. Nature biotechnology 33, 139-142, doi:10.1038/nbt.3149 (2015).

  • 17 Wright, A. V. et al. Rational design of a split-Cas9 enzyme complex. Proc Natl Acad Sci USA 112, 2984-2989, doi:10.1073/pnas.1501698112 (2015).
  • 18 Zettler, J., Schutz, V. & Mootz, H. D. The naturally split Npu DnaE intein exhibits an extraordinarily high rate in the protein trans-splicing reaction. FEBS letters 583, 909-914, doi:10.1016/j.febslet.2009.02.003 (2009).
  • 19 Davis, K. M., Pattanayak, V., Thompson, D. B., Zuris, J. A. & Liu, D. R. Small molecule-triggered Cas9 protein with improved genome-editing specificity. Nat Chem Biol 11, 316-318, doi:10.1038/nchembio.1793 (2015).
  • 20 Stevens, A. J. et al. Design of a Split Intein with Exceptional Protein Splicing Activity. J Am Chem Soc 138, 2162-2165, doi:10.1021/jacs.5b13528 (2016).
  • 21 Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytosine deaminase fusions. Nature biotechnology 35, 371-376 (2017).
  • 22 Villiger, L. et al. Treatment of a metabolic liver disease by in vivo genome base editing in adult mice. Nature medicine 24, 1519-1525, doi:10.1038/s41591-018-0209-1 (2018).
  • 23 Grieger, J. C. & Samulski, R. J. Packaging capacity of adeno-associated virus serotypes: impact of larger genomes on infectivity and postentry steps. Journal of virology 79, 9933-9944, doi:10.1128/JVI.79.15.9933-9944.2005 (2005).
  • 24 Deverman, B. E. et al. Cre-dependent selection yields AAV variants for widespread gene transfer to the adult brain. Nature biotechnology 34, 204-209, doi:10.1038/nbt.3440 (2016).
  • 25 Choi, J. H. et al. Optimization of AAV expression cassettes to improve packaging capacity and transgene expression in neurons. Mol Brain 7, 17, doi:10.1186/1756-6606-7-17 (2014).
  • 26 Zuris, J. A. et al. Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo. Nature biotechnology 33, 73-80, doi:10.1038/nbt.3081 (2015).
  • 27 Rees, H. A. et al. Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat Commun 8, 15790, doi:10.1038/ncomms15790 (2017).
  • 28 Gray, S. J. et al. Optimizing promoters for recombinant adeno-associated virus-mediated gene expression in the peripheral and central nervous system using self-complementary vectors. Hum Gene Ther 22, 1143-1153, doi:10.1089/hum.2010.245 (2011).
  • 29 Swiech, L. et al. In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9. Nature biotechnology 33, 102-106, doi:10.1038/nbt.3055 (2015).
  • 30 Feng, J. et al. Dnmt1 and Dnmt3a maintain DNA methylation and regulate synaptic function in adult forebrain neurons. Nature neuroscience 13, 423-430, doi:10.1038/nn.2514 (2010).
  • 31 Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191, doi:10.1038/nature14299 (2015).
  • 32 Mendell, J. R. et al. Single-Dose Gene-Replacement Therapy for Spinal Muscular Atrophy. N Engl J Med 377, 1713-1722, doi:10.1056/NEJMoa1706198 (2017).
  • 33 Wu, Z., Asokan, A. & Samulski, R. J. Adeno-associated virus serotypes: vector toolkit for human gene therapy. Molecular therapy: the journal of the American Society of Gene Therapy 14, 316-327, doi:10.1016/j.ymthe.2006.05.009 (2006).
  • 34 Duan, D. Systemic AAV Micro-dystrophin Gene Therapy for Duchenne Muscular Dystrophy. Molecular therapy: the journal of the American Society of Gene Therapy, doi:10.1016/j.ymthe.2018.07.011 (2018).
  • 35 Inagaki, K. et al. Robust systemic transduction with AAV9 vectors in mice: efficient global cardiac gene transfer superior to that of AAV8. Molecular therapy: the journal of the American Society of Gene Therapy 14, 45-53, doi:10.1016/j.ymthe.2006.03.014 (2006).
  • 36 Duan, D., Yue, Y. & Engelhardt, J. F. Expanding AAV packaging capacity with trans-splicing or overlapping vectors: a quantitative comparison. Molecular therapy: the journal of the American Society of Gene Therapy 4, 383-391, doi:10.1006/mthe.2001.0456 (2001).
  • 37 Xu, Z. et al. Trans-splicing adeno-associated viral vector-mediated gene therapy is limited by the accumulation of spliced mRNA but not by dual vector coinfection efficiency. Hum Gene Ther 15, 896-905, doi:10.1089/hum.2004.15.896 (2004).
  • 38 van Putten, M. et al. Low dystrophin levels increase survival and improve muscle pathology and function in dystrophin/utrophin double-knockout mice. FASEB journal: official publication of the Federation of American Societies for Experimental Biology 27, 2484-2495, doi:10.1096/fj.12-224170 (2013).
  • 39 Li, D., Yue, Y. & Duan, D. Marginal level dystrophin expression improves clinical outcome in a strain of dystrophin/utrophin double knockout mice. PloS one 5, e15286, doi:10.1371/journal.pone.0015286 (2010).
  • 40 Tuchman, M., Jaleel, N., Morizono, H., Sheehy, L. & Lynch, M. G. Mutations and polymorphisms in the human ornithine transcarbamylase gene. Hum Mutat 19, 93-107, doi:10.1002/humu.10035 (2002).
  • 41 Treacy, E. P. et al. Analysis of Phenylalanine Hydroxylase Genotypes and Hyperphenylalaninemia Phenotypes Using L-[1-13C]Phenylalanine Oxidation Rates in Vivo: A Pilot Study 1. Pediatric Research 42, 430, doi:10.1203/00006450-199710000-00002 (1997).
  • 42 Hamman, K. et al. Low therapeutic threshold for hepatocyte replacement in murine phenylketonuria. Molecular therapy: the journal of the American Society of Gene Therapy 12, 337-344, doi:10.1016/j.ymthe.2005.03.025 (2005).
  • 43 Zincarelli, C., Soltys, S., Rengo, G. & Rabinowitz, J. E. Analysis of AAV serotypes 1-9 mediated gene expression and tropism in mice after systemic injection. Molecular therapy: the journal of the American Society of Gene Therapy 16, 1073-1080, doi:10.1038/mt.2008.76 (2008).
  • 44 Asico, L. D. et al. Nephron segment-specific gene expression using AAV vectors. Biochem Biophys Res Commun 497, 19-24, doi:10.1016/j.bbrc.2018.01.169 (2018).
  • 45 Foust, K. D. et al. Intravascular AAV9 preferentially targets neonatal neurons and adult astrocytes. Nature biotechnology 27, 59-65, doi:10.1038/nbt.1515 (2009).
  • 46 Mercuri, E. et al. Nusinersen versus Sham Control in Later-Onset Spinal Muscular Atrophy. N Engl J Med 378, 625-635, doi:10.1056/NEJMoa1710504 (2018).
  • 47 Chan, K. Y. et al. Engineered AAVs for efficient noninvasive gene delivery to the central and peripheral nervous systems. Nature neuroscience, doi:10.1038/nn.4593 (2017).
  • 48 Hordeaux, J. et al. The Neurotropic Properties of AAV-PHP.B Are Limited to C57BL/6J Mice. Molecular therapy: the journal of the American Society of Gene Therapy, doi:10.1016/j.ymthe.2018.01.018 (2018).
  • 49 Huang, Q. et al. Delivering genes across the blood-brain barrier: LY6A, a novel cellular receptor for AAV-PHP.B capsids. bioRxiv, 538421, doi:10.1101/538421 (2019).
  • 50 Harvey, R. J. & Napper, R. M. Quantitative study of granule and Purkinje cells in the cerebellar cortex of the rat. J Comp Neurol 274, 151-157, doi:10.1002/cne.902740202 (1988).
  • 51 Vogel, M. W., Sunter, K. & Herrup, K. Numerical matching between granule and Purkinje cells in lurcher chimeric mice: a hypothesis for the trophic rescue of granule cells from target-related cell death. The Journal of neuroscience: the official journal of the Society for Neuroscience 9, 3454-3462 (1989).
  • 52 Kim, J. Y. et al. Viral transduction of the neonatal brain delivers controllable genetic mosaicism for visualising and manipulating neuronal circuits in vivo. Eur J Neurosci 37, 1203-1220, doi:10.1111/ejn.12126 (2013).
  • 53 Kim, J. Y., Grunke, S. D., Levites, Y., Golde, T. E. & Jankowsky, J. L. Intracerebroventricular viral injection of the neonatal mouse brain for persistent and widespread neuronal transduction. Journal of visualized experiments: JoVE, 51863, doi:10.3791/51863 (2014).
  • 54 Hoxha, E., Balbo, I., Miniaci, M. C. & Tempia, F. Purkinje Cell Signaling Deficits in Animal Models of Ataxia. Front Synaptic Neurosci 10, 6, doi:10.3389/fnsyn.2018.00006 (2018).
  • 55 Matilla-Duenas, A. et al. Consensus paper: pathological mechanisms underlying neurodegeneration in spinocerebellar ataxias. Cerebellum 13, 269-302, doi:10.1007/s12311-013-0539-y (2014).
  • 56 Chakrabarty, P. et al. Capsid serotype and timing of injection determines AAV transduction in the neonatal mice brain. PloS one 8, e67680, doi:10.1371/journal.pone.0067680 (2013).
  • 57 Madisen, L. et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nature neuroscience 13, 133-140, doi:10.1038/nn.2467 (2010).
  • 58 Zinn, E. et al. In Silico Reconstruction of the Viral Evolutionary Lineage Yields a Potent Gene Therapy Vector. Cell Rep 12, 1056-1068, doi:10.1016/j.celrep.2015.07.019 (2015).
  • 59 Koch, S. F. et al. Genetic rescue models refute nonautonomous rod cell death in retinitis pigmentosa. Proc Natl Acad Sci USA 114, 5259-5264, doi:10.1073/pnas.1615394114 (2017).
  • 60 Maeder, M. L. et al. Development of a gene-editing approach to restore vision loss in Leber congenital amaurosis type 10. Nature medicine, doi:10.1038/s41591-018-0327-9 (2019).
  • 61 Park, W. D. et al. Identification of 58 novel mutations in Niemann-Pick disease type C: correlation with biochemical phenotype and importance of PTC1-like domains in NPC1. Hum Mutat 22, 313-325, doi:10.1002/humu.10255 (2003).
  • 62 Praggastis, M. et al. A murine Niemann-Pick C1 I1061T knock-in model recapitulates the pathological features of the most prevalent human disease allele. The Journal of neuroscience: the official journal of the Society for Neuroscience 35, 8091-8106, doi:10.1523/JNEUROSCI.4173-14.2015 (2015).
  • 63 Yu, T., Shakkottai, V. G., Chung, C. & Lieberman, A. P. Temporal and cell-specific deletion establishes that neuronal Npc1 deficiency is sufficient to mediate neurodegeneration. Human Molecular Genetics 20, 4440-4451, doi:10.1093/hmg/ddr372 (2011).
  • 64 Loftus, S. K. et al. Rescue of neurodegeneration in Niemann-Pick C mice by a prion-promoter-driven Npc1 cDNA transgene. Hum Mol Genet 11, 3107-3114 (2002).
  • 65 Lopez, M. E., Klein, A. D., Dimbil, U. J. & Scott, M. P. Anatomically defined neuron-based rescue of neurodegenerative Niemann-Pick type C disorder. The Journal of neuroscience: the official journal of the Society for Neuroscience 31, 4367-4378, doi:10.1523/JNEUROSCI.5981-10.2011 (2011).
  • 66 Elrick, M. J. et al. Conditional Niemann-Pick C mice demonstrate cell autonomous Purkinje cell neurodegeneration. Human Molecular Genetics 19, 837-847, doi:10.1093/hmg/ddp552 (2010).
  • 67 Ko, D. C. et al. Cell-autonomous death of cerebellar purkinje neurons with autophagy in Niemann-Pick type C disease. PLoS Genet 1, 81-95, doi:10.1371/journal.pgen.0010007 (2005).
  • 68 Ling, C. et al. High-Efficiency Transduction of Primary Human Hematopoietic Stem/Progenitor Cells by AAV6 Vectors: Strategies for Overcoming Donor-Variation and Implications in Genome Editing. Scientific reports 6, 35495, doi:10.1038/srep35495 (2016).
  • 69 Nathwani, A. C. et al. Long-term safety and efficacy of factor IX gene therapy in hemophilia B. N Engl J Med 371, 1994-2004, doi:10.1056/NEJMoa1407309 (2014).
  • 70 Hinderer, C. et al. Severe Toxicity in Nonhuman Primates and Piglets Following High-Dose Intravenous Administration of an Adeno-Associated Virus Vector Expressing Human SMN. Hum Gene Ther, doi:10.1089/hum.2018.015 (2018).
  • 71 Manno, C. S. et al. Successful transduction of liver in hemophilia by AAV-Factor IX and limitations imposed by the host immune response. Nature medicine 12, 342-347, doi:10.1038/nm1358 (2006).
  • 72 Habib, N. et al. Massively parallel single-nucleus RNA-seq with DroNc-seq. Nature methods 14, 955-958, doi:10.1038/nmeth.4407 (2017).
  • 73 Li, P. et al. Allele-Specific CRISPR-Cas9 Genome Editing of the Single-Base P23H Mutation for Rhodopsin-Associated Dominant Retinitis Pigmentosa. The CRISPR Journal 1, 55-64, doi:10.1089/crispr.2017.0009 (2018).
  • 74 Sommer, C., Strähle, C., Köthe, U. & Hamprecht, F. A. in Eighth IEEE International Symposium on Biomedical Imaging (ISBI2011). 230-233.
  • 75 Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol 7, R100, doi:10.1186/gb-2006-7-10-r100 (2006).
  • 76 Tsai, S. Q. et al. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nature methods 14, 607-614, doi:10.1038/nmeth.4278 (2017).
  • 77 Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol 17, 148, doi:10.1186/s13059-016-1012-2 (2016).
  • 78 Ullman-Cullere, M. H. & Foltz, C. J. Body condition scoring: a rapid and accurate method for assessing health status in mice. Lab Anim Sci 49, 319-323 (1999).
  • 79 Foltz, C. & Ullman-Cullere, M. Guidelines for Assessing the Health and Condition of Mice. Lab Animal 28 (1998).
  • 80 Langmade, S. J. et al. Pregnane X receptor (PXR) activation: a mechanism for neuroprotection in a mouse model of Niemann-Pick C disease. Proc Natl Acad Sci USA 103, 13807-13812, doi:10.1073/pnas.0606218103 (2006).
  • 81 Hughes, M. P. et al. AAV9 intracerebroventricular gene therapy improves lifespan, locomotor function and pathology in a mouse model of Niemann-Pick type C1 disease. Hum Mol Genet 27, 3079-3098, doi:10.1093/hmg/ddy212 (2018).
  • 82 L. D. Landegger, B. Pan, C. Askew, S. J. Wassmer, S. D. Gluck, A. Galvin, R. Taylor, A. Forge, K. M. Stankovic, J. R. Holt, L. H. Vandenberghe, A synthetic AAV vector enables safe and efficient gene transfer to the mammalian inner ear. Nature Biotechnology 35, 280-284 (2017).
  • 83 B. W. Thuronyi, L. W. Koblan, J. M. Levy, W.-H. Yeh, C. Zheng, G. A. Newby, C. Wilson, M. Bhaumik, O. Shubina-Oleinik, J. R. Holt, D. R. Liu, Continuous evolution of nucleobase editors with expanded target compatibility and improved activity. Nature Biotechnology, (2019).


Example 4: Editing of TMC1 Gene in Baringo Mice Using AAV Encoded Split Nucleobase Editor

Sensory hair cells of Baringo mice have a complete loss of auditory sensory transduction, and the mice are thus profoundly deaf. The Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mouse model is homozygous for a recessive loss-of-function T.A-to-C.G mutation in Tmc1 (c.A545G) that replaces Tyr 182 with Cys (p.Y182C) and results in profound deafness by 4 weeks of age. TMC1 protein is required for proper sensory transduction in hair cells of the cochlea. To repair the p.Y182C mutation, several optimized cytidine nucleobase editors (CBEmax variants) and guide RNAs were tested in Baringo mouse embryonic fibroblasts. The most promising CBE, derived from an activation-induced cytosine deaminase (AID), was packaged into dual AAV vectors using a split-intein system. The dual AID-CBEmax AAVs were injected into the inner ears of Baringo mice at postnatal day 1 (P1). Injected mice showed up to 51% correction of the c.A545G point mutation in Tmc1 transcripts, which restored the wild-type Tmc1 coding sequence (c.A545A) in sensory hair cells of the inner ear. Repair of Tmc1 in vivo rescued hair-cell sensory transduction, hair-cell morphology, and substantial low-frequency hearing four weeks post-injection.


Base Editing Tmc1 In Vitro

To develop a base editing strategy capable of correcting the Baringo mutation (Tmc1 c.A545G), protospacer sequences at the target site were searched. Three protospacer-adjacent motifs (PAMs) were identified that allow binding of S. pyogenes Cas9 (SpCas9, AGG PAM) or the engineered VRQR SpCas9 variant (GGA or TGA PAM) to the target locus in a manner that positions the target Tmc1 nucleotide within or near the cytosine base editing activity window (approximately protospacer positions 4-8, counting the PAM as positions 21-23). Three candidate guide RNAs position this target C:G base pair at protospacer position 8 (sgRNA1, AGG PAM), position 7 (sgRNA2, GGA PAM), or position 10 (sgRNA3, TGA PAM) (FIG. 30A).


Potential bystander edits near the target nucleotide in Tmc1, which is located in the sequence 5′ . . . AACAGGAAGCACGAGGCCAC . . . 3′ (SEQ ID NO: 513), were considered. When the target nucleotide is at protospacer position 8 (C8), no other C nucleotides lie within the canonical CBE activity window (18). The closest bystander C, at protospacer position 10, if edited to a T would result in a silent mutation, because both TCG and TCA on the opposite DNA strand encode serine. The nearest non-silent Cs are located at C−8 and C15, well outside the base editing activity window when using any of the three candidate sgRNAs described above (FIG. 30A). Thus, anticipated products of base editing should revert Cys 182 back to Tyr, with minimal other non-synonymous amino acid changes (FIG. 34).
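The bystander reasoning above can be checked with a short calculation. The following is a minimal sketch that assumes the target nucleotide is the C at the tenth position of the partial sequence quoted above, that the quoted strand is the protospacer strand, and that the activity window spans protospacer positions 4-8. The function names are hypothetical, and only the 20 nucleotides quoted in the text are examined.

```python
# Partial Tmc1 sequence quoted in the text. ASSUMPTION: the target nucleotide
# is the C at index 9 (0-based), i.e., the tenth nucleotide shown.
LOCAL_SEQ = "AACAGGAAGCACGAGGCCAC"
TARGET_INDEX = 9
WINDOW = range(4, 9)        # protospacer positions 4-8, inclusive
PROTOSPACER_LENGTH = 20


def cytosine_positions(seq, target_index, target_position):
    """Protospacer positions (1-20) of every C in the quoted sequence,
    given the protospacer position assigned to the target C."""
    offset = target_position - target_index
    positions = [i + offset for i, base in enumerate(seq) if base == "C"]
    return [p for p in positions if 1 <= p <= PROTOSPACER_LENGTH]


def bystanders_in_window(seq, target_index, target_position, window=WINDOW):
    """Cs other than the target that fall inside the activity window."""
    return [p for p in cytosine_positions(seq, target_index, target_position)
            if p in window and p != target_position]


# Target positions for the three candidate guides, as stated in the text.
for name, target_position in [("sgRNA1", 8), ("sgRNA2", 7), ("sgRNA3", 10)]:
    in_window = target_position in WINDOW
    others = bystanders_in_window(LOCAL_SEQ, TARGET_INDEX, target_position)
    print(f"{name}: target C{target_position} in window = {in_window}, "
          f"bystander Cs in window = {others}")
```

Under these assumptions, none of the three guides places a second C inside the window, and only sgRNA3 places the target itself outside it.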


The target Tmc1 nucleotide is in an AGC sequence context. It was previously noted that APOBEC1-derived CBEs (including the commonly used BE3 and BE4 variants) edit GC targets less efficiently, consistent with the known DNA sequence preferences of APOBEC1 deaminase. In contrast with APOBEC1, the CDA1 deaminase from P. marinus and human AID deaminase both deaminate GC substrates efficiently. To compare the activity of CDA1- and AID-derived nucleobase editors at the Baringo mutation site, variants of the nuclear localization-optimized, codon-optimized BE4max (also known as APOBEC1-BE4max) were constructed in which APOBEC1 is replaced with CDA1 (resulting in CDA1-BE4max), with a highly active laboratory-evolved CDA1 variant recently described83 (resulting in evoCDA1-BE4max), or with human AID deaminase (resulting in AID-BE4max).


Next, cells from Baringo mouse embryos were isolated to compare the editing efficiency of APOBEC1-BE4max, CDA1-BE4max, evoCDA1-BE4max, and AID-BE4max for targeting Tmc1. Mouse embryonic fibroblasts (MEFs) were extracted from Baringo embryos at day 13.5. The ability of APOBEC1-BE4max, CDA1-BE4max, evoCDA1-BE4max, and AID-BE4max to convert the target Tmc1 base pair from pathogenic C:G to wildtype T:A using sgRNA1 was evaluated.


To minimize variability from nucleobase editor expression differences among cells, plasmids encoding each nucleobase editor as a P2A-GFP fusion were constructed and GFP-positive cells were analyzed by high-throughput DNA sequencing (HTS). Since P2A is a self-cleaving peptide that couples GFP production with full-length nucleobase editor translation, GFP-positive cells must also express nucleobase editor. Baringo MEF cells were nucleofected with two-plasmid mixtures in which one plasmid expressed sgRNA1 and the other expressed APOBEC1-BE4max-P2A-GFP, CDA1-BE4max-P2A-GFP, evoCDA1-BE4max-P2A-GFP, or AID-BE4max-P2A-GFP. After three days, the GFP-positive cells were isolated and sequenced.


As anticipated, APOBEC1-BE4max+sgRNA1 showed inefficient (mean±SEM of 2.0±0.7%) editing at GC8, likely due to the disfavored sequence context of the target C. In contrast, CDA1-BE4max resulted in 12-fold improved target base editing efficiency (23±1.4%), AID-BE4max resulted in 21-fold more efficient editing (43±0.6%), and evoCDA1-BE4max resulted in 25-fold higher editing (50±2.8%), compared to APOBEC1-BE4max (FIG. 30B). APOBEC1-BE4max, CDA1-BE4max, and AID-BE4max all induced low (1.9%) indels at the target locus, while evoCDA1-BE4max resulted in a much higher (18±1.9%) indel frequency (FIG. 30B), consistent with previous findings83. The ratio of desired base edit:indels for AID-BE4max (ratio of 23) was much more favorable than for evoCDA1-BE4max (ratio of 2.7).
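The fold improvements and edit:indel ratios quoted above follow from simple division of the reported means. The following is a minimal sketch of that arithmetic, using only the rounded mean values given in the text; values recomputed from rounded means may differ slightly from the reported figures, which were presumably derived from unrounded data.

```python
# Mean target C-to-T editing (%) reported in the text for each editor + sgRNA1.
editing = {
    "APOBEC1-BE4max": 2.0,
    "CDA1-BE4max": 23.0,
    "AID-BE4max": 43.0,
    "evoCDA1-BE4max": 50.0,
}
# Mean indel frequencies (%) reported in the text for the two lead candidates.
indels = {
    "AID-BE4max": 1.9,
    "evoCDA1-BE4max": 18.0,
}

baseline = editing["APOBEC1-BE4max"]

# Fold improvement in editing relative to APOBEC1-BE4max.
fold = {name: value / baseline
        for name, value in editing.items() if name != "APOBEC1-BE4max"}

# Ratio of desired base edit to indels (higher is more favorable).
edit_to_indel = {name: editing[name] / indels[name] for name in indels}

for name, value in fold.items():
    print(f"{name}: {value:.1f}-fold vs APOBEC1-BE4max")
for name, value in edit_to_indel.items():
    print(f"{name}: edit:indel ratio {value:.1f}")
```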


Subsequently, the effect of varying the position of the Baringo mutation within the protospacer was tested using sgRNA1, sgRNA2, and sgRNA3, which place the target C at protospacer positions 8, 7, or 10, respectively (FIG. 30A). SpCas9-based AID-BE4max was used with sgRNA1 to access its AGG PAM, and AID-VRQR-BE4max, which contains the VRQR variant of SpCas9 that is compatible with NGA PAM sites, was used with sgRNA2 and sgRNA3 to access their GGA or TGA PAMs, respectively. Baringo MEF cells were transfected with plasmids encoding each nucleobase editor-P2A-GFP:sgRNA pair, sorted for GFP-positive cells, and analyzed by HTS. Editing of 43±0.6% from AID-BE4max+sgRNA1, 39±1.4% from AID-VRQR-BE4max+sgRNA2, and 23±1.4% from AID-VRQR-BE4max+sgRNA3 was observed (FIG. 30C). Since sgRNA1, which accesses the AGG PAM, resulted in the highest editing efficiency, consistent with its placement of the target nucleotide within the canonical CBE activity window (positions 4-8), AID-BE4max+sgRNA1 was chosen for in vivo studies using a dual-AAV delivery system.


Dual-AAV Delivery of Tmc1-Targeted Nucleobase Editors In Vitro

To successfully prevent mutant Tmc1-mediated hearing loss using base editing, the nucleobase editor and guide RNA, or their encoding DNA, must be delivered into cochlear hair cells in the inner ear. Anc80L65, an ancestrally reconstructed AAV hereafter referred to as Anc80, was selected due to its demonstrated safety and efficacy in the mouse inner ear82. To validate the ability of Anc80 to deliver genes into inner hair cells (IHCs) and outer hair cells (OHCs) of Baringo mice, 7.2×10^8 vg of Anc80 AAV encoding GFP driven by the chicken β-actin hybrid (Cbh) promoter was administered by intracochlear injection into the inner ear of P1 Baringo mice. This viral dose, corresponding to 1.8×10^9 vg/kg, is well within the range of AAV doses known to be tolerated in human retina in clinical applications. High viral transduction efficiency was observed in IHCs (41.7% in the apex and 22.6% in the base of the cochlea) and low transduction in OHCs (8.3% in the apex and 2.6% in the base of the cochlea) (FIGS. 35A-35C).


Since the coding sequence of nucleobase editors (~5.2 kb) exceeds the DNA capacity of AAVs, AID-BE4max was modified in two ways to enable AAV-mediated delivery. First, the nucleobase editor was divided into two halves (an N-terminal half and a C-terminal half) between Glu573 and Cys574, and each nucleobase editor half was fused with one half of the Npu trans-splicing split intein. Co-expression of both nucleobase editor-intein halves results in rapid protein splicing, reconstituting full-length nucleobase editor. Second, the second uracil glycosylase inhibitor (UGI) domain was removed, yielding AID-BE3.9max. It was recently shown that removing the second UGI copy in split-intein CBE variants minimally affects base editing efficiency. These two changes enabled the nucleobase editor, along with sgRNA1 and all necessary promoter and regulatory sequences, to fit within two AAVs (≤4,849 bp each).
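The split-intein design described above can be illustrated schematically. The following is a minimal sketch under stated assumptions: the protein and intein sequences are toy placeholders rather than the actual AID-BE3.9max or Npu intein sequences, the function names are hypothetical, and splicing is modeled as an idealized removal of both intein halves. The split coordinate is interpreted in whichever numbering scheme the residue numbers Glu573 and Cys574 refer to.

```python
def split_with_inteins(protein, last_n_residue, intein_n, intein_c):
    """Split `protein` after residue `last_n_residue` (1-based) and fuse the
    halves to intein halves: (N-half + intein-N, intein-C + C-half)."""
    n_half = protein[:last_n_residue]
    c_half = protein[last_n_residue:]
    return n_half + intein_n, intein_c + c_half


def splice(n_fusion, c_fusion, intein_n, intein_c):
    """Idealized trans-splicing: excise both intein halves, join the exteins."""
    if not (n_fusion.endswith(intein_n) and c_fusion.startswith(intein_c)):
        raise ValueError("fusions do not carry the expected intein halves")
    return n_fusion[:-len(intein_n)] + c_fusion[len(intein_c):]


# Toy placeholder protein with Glu (E) at residue 573 and Cys (C) at 574.
toy_protein = "M" + "A" * 571 + "E" + "C" + "A" * 100
# Toy placeholder intein halves (lowercase so they are easy to spot).
INTEIN_N = "nnnn"
INTEIN_C = "cccc"

n_fusion, c_fusion = split_with_inteins(toy_protein, 573, INTEIN_N, INTEIN_C)
rejoined = splice(n_fusion, c_fusion, INTEIN_N, INTEIN_C)
print("N-terminal extein ends with:", n_fusion[572])
print("C-terminal extein starts with:", c_fusion[len(INTEIN_C)])
print("full-length protein reconstituted:", rejoined == toy_protein)
```

Each fusion would be encoded on a separate AAV genome, so that neither genome needs to carry the full-length coding sequence.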


To test whether this split-intein dual AAV strategy mediated efficient base editing of Tmc1, Baringo MEF cells were transduced with dual AAVs encoding AID-BE3.9max+sgRNA1 at two dosages. The high dose of the N-terminal half was 6.1×10^8 vg and the low dose was 3.1×10^7 vg; the high dose of the C-terminal half was 8.3×10^8 vg and the low dose was 4.2×10^7 vg. After applying the dual AAV encoding AID-BE3.9max+sgRNA1 to MEF cells, cells were cultured for two weeks before analyzing editing outcomes using HTS (FIG. 30D). Treatment of Baringo MEF cells with the high dose of AID-BE3.9max AAV resulted in 57% editing (with 4.6% indels) of pathogenic C.G to wild-type T.A at Tmc1Y182C/Y182C in unsorted cells. Treatment of the MEF cells with the low dose of AID-BE3.9max AAV resulted in 5-10% editing (FIG. 30D). Given the high editing efficiency from high-dose AAV treatment, without sorting for AAV-infected cells, dual AID-BE3.9max+sgRNA1 was used for subsequent in vivo experiments.


Off-Target Analysis of Tmc1 Base Editing

Next, base editing at off-target genomic loci bound by the Cas9:sgRNA1 complex was investigated. Previous reports using unbiased genome-wide off-target detection methods for nucleobase editors have observed that off-target substrates of nucleobase editors are generally a subset of off-targets for the corresponding Cas9 nuclease. CIRCLE-seq, a current unbiased, sensitive, cell-free off-target detection protocol, was used to identify potential off-target editing sites associated with Cas9 and sgRNA1. Genomic DNA was extracted from Baringo MEFs and fragmented, the ~500-bp DNA fragments were ligated into circles, and the circles were incubated with Cas9 and sgRNA1. After Cas9 incubation, the cut circles were ligated to adaptors and the locations of DNA cleavage events were identified by HTS (FIG. 31A). This process applied to sgRNA1 resulted in the identification of 28 candidate off-target sites with notable CIRCLE-seq signals (>10 reads).


Then, amplicon sequencing was performed to measure base editing at the ten genomic sites with the largest number of CIRCLE-seq reads, comprising the on-target site and the top nine off-target sites (FIG. 31A). The on-target base editing efficiency observed for the Baringo allele (in Baringo MEF cells transduced with AAV in vitro) was 57% (FIG. 31B). HTS of the candidate off-target amplicons revealed no editing at any protospacer position above that of an untreated control sample (≤0.1% mutation frequency above the untreated control) at any of the nine off-target sites tested (FIG. 31B and FIG. 36). Collectively, these data suggest that base editing of Tmc1Y182C/Y182C by AAV-delivered AID-BE3.9max and sgRNA1 occurs efficiently and is not accompanied by substantial editing at candidate off-target sites identified by CIRCLE-seq.
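The site selection and background comparison described above can be sketched as two small steps: rank candidate sites by CIRCLE-seq read count, then flag any site whose mutation frequency in treated cells exceeds the untreated control by more than 0.1%. The function names, site labels, and all numerical values other than the 57% on-target efficiency are hypothetical and serve only to illustrate the comparison.

```python
def top_sites(read_counts, n=10):
    """Return the n site names with the most CIRCLE-seq reads, highest first."""
    return sorted(read_counts, key=read_counts.get, reverse=True)[:n]


def sites_above_background(treated, control, threshold=0.1):
    """Sites whose treated mutation frequency (%) exceeds the untreated
    control by more than `threshold` percentage points."""
    return [site for site in treated
            if treated[site] - control.get(site, 0.0) > threshold]


# Invented read counts; only the on-target label reflects the text.
read_counts = {"on-target": 900, "OT1": 120, "OT2": 75, "OT3": 40, "OT4": 12}
selected = top_sites(read_counts, n=3)

# Invented mutation frequencies (%), except the 57% on-target value.
treated = {"on-target": 57.0, "OT1": 0.06, "OT2": 0.04}
control = {"on-target": 0.03, "OT1": 0.05, "OT2": 0.04}

print("sites selected for amplicon sequencing:", selected)
print("sites edited above background:", sites_above_background(treated, control))
```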


Characterizing Sensory Transduction Currents in Tmc1Y182C/Y182C; Tmc2Δ/Δ mice


While the Tmc1 Y182C mutation is known to cause deafness in Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mice by 4 weeks of age, the consequence of this mutation on hair cell function has not been previously reported. To determine the effect of the Baringo mutation on sensory transduction currents, the cochlea from Baringo mice was dissected at P8 and currents were recorded from the sensory hair cells on the day of dissection. Robust hair-cell current amplitudes were observed (FIGS. 37A-37B).


Based on previous reports, it was hypothesized that the robust currents in P8 mice were the result of transient expression of Tmc2, which encodes transmembrane channel-like 2 and is redundant with Tmc1 in neonatal mice (P8 or younger). To isolate the consequences of the Y182C substitution on transduction current, Baringo mice were crossed with Tmc2 knockout mice to generate Tmc1Y182C/Y182C; Tmc2Δ/Δ mice. Hair cells from Tmc1Y182C/Y182C; Tmc2Δ/Δ mice lacked sensory transduction currents entirely (FIGS. 37A-37B), even during the first postnatal week (P7-8). Collectively, these findings indicate that the Baringo mutation results in a complete loss of TMC1 function. It was concluded that after early postnatal expression of Tmc2 has declined to near zero, the loss of sensory transduction in mature hair cells due to the c.A545G point mutation is the proximal cause of deafness in Baringo mice. These results also suggest that successful base editing of the Tmc1Y182C/Y182C mutation might restore hair-cell sensory transduction and perhaps auditory function.


Tmc1 Base Editing In Vivo

After establishing that AAV-mediated base editing can directly correct the Tmc1Y182C/Y182C mutation in cultured Baringo MEF cells (FIG. 30), and that hair cells from Tmc1Y182C/Y182C; Tmc2Δ/Δ mice lack sensory transduction, the ability of intracochlear injection of dual AAV encoding AID-BE3.9max+sgRNA1 to correct DNA encoding Tmc1Y182C/Y182C was tested. The injection was performed at P1 and the organ of Corti (the part of the cochlea containing hair cells) was extracted from bulk cochlear tissue of treated Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mice at P14. DNA from cochlear tissue of injected Baringo mice was sequenced, and base editing was observed at the Tmc1 locus in the organ of Corti from all three treated mice examined (FIG. 31C). Even though hair cells are estimated to make up less than 2% of the total cells in the dissected organ of Corti harvested for DNA sequencing, the whole organ of Corti from treated mice contained the desired base edit in Tmc1 at an average frequency of 2.3±0.4% (FIG. 31C). Since Anc80 AAV is known to preferentially target IHCs, 2.3% editing in the entire organ of Corti is consistent with substantial base editing of IHCs.


To more directly assess the base editing efficiency of hair cells within organ of Corti samples, cochlear Tmc1 mRNA of treated mice was sequenced by reverse transcription of total mRNA and amplicon sequencing using primers specific to Tmc1. Given that Tmc1 in the cochlea is expressed only in hair cells, base-edited Tmc1 cDNA observed in the cochlea likely reflects base editing of hair cells. Indeed, 10 to 51% editing efficiency of Tmc1 mRNA was observed, which is 5- to 25-fold higher than DNA editing levels measured in bulk organ of Corti tissue (FIG. 31C). Together, these observations confirm successful in vivo base editing of the Tmc1 locus from treatment with dual AAV.


AAV-Mediated In Vivo Base Editing Preserves Inner Hair Cell Stereocilia Morphology

Inner and outer hair cells of Baringo mice begin to die around four weeks of age, progressing from the base of the cochlea toward the apex. To investigate the ability of AAV-delivered AID-BE3.9max+sgRNA1 to preserve hair cells and hair bundle morphology, Baringo mice were injected at P1 and euthanized at P28, and inner ear tissue was excised for histological examination. No overt evidence of inflammation or tissue damage was observed in any of the injected ears. Cochleas were harvested and the entire organ of Corti was dissected, mounted and stained. Given the lack of high-quality anti-TMC1 antibody to visualize TMC1 directly, an anti-Myo7A antibody stain was used to label surviving hair cells. Confocal microscopy analysis of the immunostained organ of Corti tissue revealed no significant differences in overall OHC or IHC survival between untreated and treated Baringo mice (FIGS. 38A-38C). Both groups had significant loss of OHCs, especially in the basal region of the cochlea where almost no surviving OHCs were observed. The IHCs of both groups appeared, by confocal microscopy, to be mostly intact in both apical and basal turns of the cochlea, consistent with prior characterization of Baringo mice.


Hair bundle morphology was observed using scanning electron microscopy (SEM). High resolution SEM images revealed striking morphological differences between treated and untreated Baringo hair bundles, particularly in the cochlear apex. Baringo mice injected with AAV-AID-BE3.9max+sgRNA1 had both IHC and OHC bundles from the apical end of the cochlea with morphologies more similar to those of wild-type mice than untreated Baringo mice (FIGS. 31D-31F). At the basal end of cochlea from treated Baringo mice, IHC, but not OHC hair bundles showed preserved morphologies compared to untreated Baringo mice (FIGS. 39A-39C). These morphological differences suggest that treatment with AID-BE3.9max+sgRNA1 promotes preservation of normal hair bundle morphology, which is otherwise disrupted in untreated Baringo mice. Since normal hair bundle morphology is a prerequisite for normal hair cell function, these findings raise the possibility that preservation of hair bundles from base editing with AID-BE3.9max+sgRNA1 might render Baringo hair cells functional.


Base Editing Tmc1 In Vivo Restores Hair-Cell Sensory Transduction Current

After establishing that AAV-mediated base editing can directly correct the Tmc1Y182C/Y182C mutation in cultured Baringo MEF cells (FIGS. 30A-30D), and that hair cells from Tmc1Y182C/Y182C; Tmc2Δ/Δ mice lack sensory transduction, it was next tested whether intracochlear injection of dual AAV encoding AID-BE3.9max+sgRNA1 could rescue sensory transduction currents in auditory hair cells of Tmc1Y182C/Y182C; Tmc2Δ/Δ mice. To identify hair cells with functional sensory transduction, uptake of FM1-43, a styryl dye that enters hair cells through sensory transduction channels, was visualized. Hair cells lacking functional TMC1 and TMC2 proteins do not internalize FM1-43, whereas cells with functional sensory transduction channels readily take up FM1-43.


FM1-43 uptake was imaged in two groups of Tmc1Y182C/Y182C; Tmc2Δ/Δ mice: an untreated control group, and a treated group that received an intracochlear injection of 1 μL containing 7.2×10^8 vg total of dual AAV encoding AID-BE3.9max+sgRNA1 at P1. After 5-7 days of treatment, cochleas from both groups of Tmc1Y182C/Y182C; Tmc2Δ/Δ mice were dissected and cultured in vitro for 7-10 days, and FM1-43 was then applied. No FM1-43 uptake was observed in the IHCs or OHCs of untreated mice, whereas robust FM1-43 uptake was observed in 75±10% (n=4 cochleas) of IHCs of treated mice, with very little FM1-43 uptake in OHCs of treated mice (FIGS. 32A-32B). These results suggest restoration of function in IHCs of base-editor-treated mice, but not in untreated mice.


To directly assess the effect of in vivo base editing on IHC function, sensory transduction currents from IHCs were recorded. For these recordings, 3.1×10^9 vg of each AAV encoding AID-BE3.9max+sgRNA1 was injected into the inner ear of P1 Tmc1Y182C/Y182C; Tmc2Δ/Δ mice, and the organ of Corti was extracted at P5. Extracted P5 organ of Corti tissue was maintained in culture and incubated for an additional 7-10 days before cellular recording. In agreement with the FM1-43 uptake data (FIGS. 32A-32B), IHCs of mice injected with dual AAV encoding AID-BE3.9max+sgRNA1 displayed robust sensory transduction at both time points tested (P14 and P18) (FIG. 32C). Indeed, nine of fourteen IHCs from treated mice exhibited current amplitudes that were indistinguishable from those of wild-type (Tmc1Y182C/Y182C; Tmc2+/+) mice. In contrast, untreated Tmc1Y182C/Y182C; Tmc2Δ/Δ mice showed no transduction currents in any of the four tested IHCs at P8 (FIG. 32C, leftmost data).


Collectively, these results demonstrate that in vivo delivery of dual AAVs encoding AID-BE3.9max and sgRNA1 restored wild-type (FIG. 32C, in black) sensory transduction in a substantial fraction of IHCs from treated Tmc1Y182C/Y182C; Tmc2Δ/Δ mice, which without treatment show no sensory transduction currents.


In Vivo Base Editing Rescues Auditory Function

The rescue of IHC morphology and the restoration of IHC sensory transduction in base-edited Baringo mice suggest that these mice may exhibit rescued cochlear function compared to untreated Baringo mice, which are profoundly deaf at 4 weeks of age. To test this possibility, auditory brainstem responses (ABRs) were measured at P30 in untreated Baringo mice and in Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mice injected at P1.


The ABR threshold is the lowest decibel (dB) level needed to generate identifiable auditory brainstem waveforms. Representative families of ABR waveforms recorded in response to 5.6-kHz tone bursts of varying sound intensity are illustrated in FIGS. 33A-33B. The waveform families in FIGS. 33A-33B were selected to illustrate representative responses of wild-type (Tmc1Y182C/Y182C; Tmc2+/+) control mice with or without intracochlear injection of dual AAV encoding AID-BE3.9max+sgRNA1 (7.2×10^8 total viral genomes) (FIG. 33A), and of Baringo mice with or without the same AAV treatment (FIG. 33B). The ABR threshold for a 5.6 kHz tone burst for wild-type (Tmc1Y182C/Y182C; Tmc2+/+) control groups (injected or uninjected) was 30 dB (FIG. 33A; lighter-shaded lines at 30 dB). In contrast, untreated Baringo mice showed no detectable ABR thresholds at the maximum sound level tested (110 dB), indicating profound deafness (FIG. 33B). Importantly, treated Baringo mice had ABR thresholds as low as 60 dB (FIG. 33B), representing at least 50 dB of improvement compared to untreated Baringo mice.


A summary plot of ABR thresholds as a function of frequency for all four groups is illustrated in FIG. 33C. Of the ten untreated Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mice, none showed detectable auditory function at any frequency tested, even at 110 dB. In contrast, of 15 Baringo (Tmc1Y182C/Y182C; Tmc2+/+) mice injected with AAV encoding AID-BE3.9max+sgRNA1, nine showed rescue of some auditory function, with ABR thresholds at 5.6 kHz and 8.0 kHz averaging ~90 dB, and ABR thresholds at higher frequencies (11.3 kHz, 16.0 kHz, 22.6 kHz, and 32.0 kHz) averaging ~95-100 dB (FIG. 33C). Thus, across all treated Baringo mice, AAV-delivered AID-BE3.9max+sgRNA1 improved ABR thresholds by at least 5 to at least 50 dB across all frequencies tested.
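The threshold comparisons above reduce to simple arithmetic, which the following minimal Python sketch illustrates. It assumes that an ear with no detectable response at the maximum level tested (110 dB) has a threshold of at least 110 dB, so any improvement computed against it is a lower bound. The function names and the example response table are hypothetical and are not part of the described methods.

```python
MAX_LEVEL_DB = 110  # highest sound level tested, per the text


def abr_threshold(responses):
    """Return the lowest level (dB) with an identifiable waveform, or None.

    `responses` maps sound level in dB to True/False (waveform identified).
    """
    detected = [level for level, seen in responses.items() if seen]
    return min(detected) if detected else None


def min_improvement_db(treated_threshold_db, max_level_db=MAX_LEVEL_DB):
    """Lower bound on improvement versus an ear unresponsive at max_level_db."""
    return max_level_db - treated_threshold_db


# Hypothetical response table for one treated ear (illustration only).
example = {50: False, 55: False, 60: True, 65: True, 70: True}
threshold = abr_threshold(example)
print(threshold, min_improvement_db(threshold))
```

With the 60 dB threshold quoted for the best-responding treated mice, the lower bound is 110 − 60 = 50 dB, matching the "at least 50 dB" figure in the text.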


The function of outer hair cells (OHCs) was also measured using distortion product otoacoustic emissions (DPOAE) (FIG. 33D). DPOAE analysis revealed that none of the 15 treated Baringo mice showed recovery of DPOAEs relative to untreated mice. The lack of DPOAEs suggests a lack of OHC recovery, consistent with the lack of functional recovery of OHCs and the lack of OHC bundles in the base (FIGS. 39A-39C). This lack of DPOAE recovery likely resulted from the lower viral transduction efficiency of Anc80 in OHCs, as previously reported, or from the lower efficiency of the Cbh promoter in OHCs, as noted above.


Finally, to rule out any possible adverse effects of the injection procedure, AAV transduction, or post-splicing intein peptide in the ABR or DPOAE tests, AAV encoding AID-BE3.9max+sgRNA1 was injected into the inner ears of four wild-type mice (FIGS. 33C-33D; lighter-shaded lines, n=4). ABR and DPOAE thresholds of treated wild-type mice were not significantly different (each frequency has a p-value >0.1) than those of the untreated wild-type mice (FIGS. 33C-33D; blue lines), confirming that the injection technique, viral capsid, AID-BE3.9max, and sgRNA1 did not have any apparent effect on auditory function in the absence of the Tmc1Y182C/Y182C mutation.


Collectively, these results demonstrate that AAV-mediated base editing of Tmc1Y182C/Y182C improves auditory function in Baringo mice and represent the first in vivo rescue of a recessive sensory impairment disease by base editing.


Discussion

Recessive loss-of-function mutations cause most known genetic hearing loss diseases. As described herein, base editing was used in vitro and in vivo to correct a point mutation in transmembrane channel-like 1 (Tmc1) that causes profound deafness. Base editing fully restored hair-cell function in a subset of cells, preserved hair-cell morphology, and rescued auditory sensitivity, especially to low frequencies, in a mouse model of human recessive deafness. These results represent the first correction (rather than disruption) of a pathogenic mutation in the inner ear resulting in improved auditory function and demonstrate the promise of base editing to directly correct loss-of-function recessive mutations. Among 108 recorded human TMC1 mutations that likely cause genetic hearing loss, 72 can, in principle, be corrected with cytosine or adenine nucleobase editors (Table 5). The focus of these Examples was on a recessive loss-of-function mutation; however, the nucleobase editors described herein may also be used to correct dominant mutations.


In vivo delivery of AAV encoding an optimized nucleobase editor and guide RNA resulted in up to 50% base editing efficiency in restoring the wild-type coding sequence of Tmc1 in hair cells (HCs) in Baringo mice. Importantly, base-edited hair cells were mostly IHCs, which upon treatment resisted the morphological degeneration normally seen in untreated Baringo mice. IHCs of treated mice also exhibited normal sensory transduction currents, unlike IHCs of untreated Baringo mice. Treated mice exhibited ABR thresholds at 5.6 kHz improved by at least 10-50 dB compared to the undetectable ABR thresholds observed in untreated Baringo mice. Given that the untreated Baringo mouse model used herein has no detectable auditory function at 4 weeks of age, this level of auditory function rescue represents a major improvement. For a patient with a similar loss-of-function TMC1 mutation, a corresponding improvement would represent the difference between hearing nothing at all and being able to detect salient auditory cues in the environment, such as alarms, ringing phones, or sirens from an emergency vehicle. Moreover, this level of auditory function could be supplemented with hearing aids that extend auditory functional recovery.


To rescue auditory sensitivity over a greater range of frequencies, it will be necessary to develop a similarly efficient base editing delivery strategy for editing outer hair cells (OHCs). The development of viral capsids or promoters capable of supporting dual OHC transduction with higher efficiency thus holds promise to further improve outcomes of correcting mutations that cause genetic hearing loss. In addition, the onset of degeneration at the basal (high-frequency) end of the cochlea is thought to occur earlier than at the apical (low-frequency) end, suggesting the importance of treating as early as possible to rescue high-frequency auditory function.


Materials and Methods
Study Design

The methods described herein aimed to use base editing in the post-natal mouse inner ear to correct a recessive loss-of-function point mutation that causes congenital deafness, resulting in the rescue of hair-cell sensory transduction, hair-cell morphology, and auditory function. Nucleobase editor variants that correct a recessive mutation in Tmc1 were identified in cultured cells and in vivo. AAV vectors were used to deliver nucleobase editors in vitro and in vivo, and editing outcomes were evaluated using high-throughput sequencing, quantitative RT-PCR, immunolocalization and confocal microscopy, scanning electron microscopy, imaging of FM1-43 uptake, single-cell transduction current recording, histology and imaging of whole cochleas, and measurement of ABR and DPOAE thresholds. Left ears were injected and right ears were used as uninjected controls. Each experiment was replicated as indicated by n values in the figure legends. All experiments with mice and viral vectors were approved by the Institutional Animal Care and Use Committee (Protocols #17-03-3396R and 18-01-3610R) at Boston Children's Hospital and the Institutional Biosafety Committee.


Mice

Wild-type control mice were C57BL/6J (Jackson Laboratories). Two genotypes of mutant mice were used: Tmc1Y182C/Y182C; Tmc2+/+ and Tmc1Y182C/Y182C; Tmc2Δ/Δ. The Tmc1p.Y182C “Baringo” mice were obtained from Murdoch Children's Research Institute (The Royal Children's Hospital, Australia). Mice with genotype Tmc1Y182C/Y182C; Tmc2Δ/Δ were obtained by crossing Tmc1Δ/Δ; Tmc2Δ/Δ mice with Tmc1Y182C/Y182C; Tmc2+/+ mice. Mice that carried mutant alleles of Tmc1 and Tmc2 were on C57BL/6J or BALB/c backgrounds as described previously. All procedures met the NIH guidelines for the care and use of laboratory animals and were approved by the Institutional Animal Care and Use Committees at Boston Children's Hospital (Protocols #17-03-3396R and 18-01-3610R). Mice ages P0-P1 were used for in vivo delivery of viral vectors according to the protocols mentioned above. Mice were genotyped using toe clip (before P8) or ear punch (after P8), and PCR was performed as described previously. For all studies, both male and female mice were used in approximately equal proportions.


Baringo (Tmc1Y182C/Y182C; Tmc2+/+) Mouse Embryonic Fibroblast Cell Generation


Baringo females at 3-4 weeks of age were treated with a single intra-peritoneal injection of 5 U each of pregnant mare's serum gonadotropin (Prospec) followed by human chorionic gonadotropin (Sigma) after 44-45 hours and paired with Baringo males. The following morning, females were examined for copulatory plugs to confirm matings and marked as 0.5 dpc. At day 13.5, females were sacrificed by CO2 inhalation followed by cervical dislocation. Embryos were harvested in PBS under aseptic conditions. To harvest primary embryonic fibroblasts, each embryo was eviscerated and the head was removed. The remaining parts of each embryo were minced to prepare single-cell suspensions and treated with 0.25% Trypsin-EDTA (Gibco) at 37° C. for 10 minutes, followed by centrifugation for 10 minutes. Pellets were resuspended in growth media containing DMEM, 10% FBS, and penicillin-streptomycin (100 U/mL) and plated on 15-cm tissue culture plates, then incubated at 37° C. until confluent. The Baringo colony is maintained ad libitum and all animal procedures are approved by the Children's Hospital IACUC in compliance with relevant ethical regulations.


Nucleofection and Viral Infection of Baringo (Tmc1Y182C/Y182C; Tmc2+/+) MEF Cells


MEF cells were cultivated until confluent, then pooled. Replicates were performed on the same day using three separate nucleofections followed by cultivation in separate wells. Each nucleofection contained 400 ng nucleobase editor as a P2A-GFP plasmid and 100 ng guide RNA plasmid. Transfection programs were optimized following manufacturer's instructions (CZ-167, P4 Primary Cell 4D-Nucleofector X Kit, Lonza). Cells were sorted at the MIT FACS core three days after nucleofection and genomic DNA was purified directly after sorting. Next, high-throughput DNA sequencing (HTS) was performed. For AAV infection, each AAV was added to a single well of a 48-well plate. After 2 weeks, the DNA was extracted and analyzed by HTS.


Genomic DNA Purification

Genomic DNA was purified from sorted cells or cochlea tissue using Agencourt DNAdvance kits (Beckman Coulter A48705) following the manufacturer's directions.


RNA Isolation from the Cochlea


RNA isolation was performed with the RNeasy Plus Micro Kit (QIAGEN) according to the manufacturer's instructions. In brief, 250 μL of RLT Plus Buffer (QIAGEN) containing β-mercaptoethanol was added to each tube containing one cochlea; tissue was homogenized by pipetting, fast freezing, and vortexing, and transferred into a DNA eliminator column. Subsequent binding and washing steps for RNA isolation using the RNeasy columns were performed according to the manufacturer's instructions. RNA was eluted from the RNeasy column with 45 μL of RNase-free water (QIAGEN). Total RNA was converted into cDNA on the same day.


cDNA Generation for Targeted RNA Amplicon Sequencing


cDNA was generated from the isolated RNA using the ProtoScript® II First Strand cDNA Synthesis Kit (New England Biolabs) according to the manufacturer's instructions with Oligo-dT primers. Amplification of cDNA for high-throughput sequencing was performed to the top of the linear range (29 cycles) using qPCR as described below. High-throughput sequencing of amplicons was performed as described below. Sequences were aligned to the reference sequence for each RNA, obtained from the NCBI.


CIRCLE-seq

CIRCLE-seq was performed as previously described. PCR amplification before sequencing was conducted using PhusionU polymerase, and products were gel-purified and quantified with a KAPA library quantification kit before loading onto an Illumina MiSeq. Data was processed using the CIRCLE-Seq analysis pipeline with parameters: “read_threshold: 4; window_size: 3; mapq_threshold: 50; start_threshold: 1; gap_threshold: 3; mismatch_threshold: 6; merged_analysis: True”. The top ten most common sites based on CIRCLE-seq read count were chosen for PCR amplification and high-throughput sequencing.
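As a minimal sketch of the site-selection step described above, the quoted pipeline parameters can be held in a dictionary and candidate sites ranked by read count. The site identifiers and counts below are hypothetical placeholders, and the tie-breaking rule (alphabetical by site name) is an assumption that the source does not specify.

```python
# Parameters as quoted in the text for the CIRCLE-seq analysis pipeline.
CIRCLE_SEQ_PARAMS = {
    "read_threshold": 4,
    "window_size": 3,
    "mapq_threshold": 50,
    "start_threshold": 1,
    "gap_threshold": 3,
    "mismatch_threshold": 6,
    "merged_analysis": True,
}


def top_sites(read_counts, n=10):
    """Return the n sites with the highest read counts, highest first.

    Ties are broken alphabetically by site name (an assumption).
    """
    ranked = sorted(read_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical read counts for 15 candidate sites (illustration only).
counts = {f"site_{i}": i for i in range(1, 16)}
for site, reads in top_sites(counts):
    print(site, reads)
```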


High-Throughput DNA Sequencing and Data Analysis

Genomic DNA was amplified by qPCR using Q5 High-Fidelity 2× Master Mix with SYBR Gold for quantification. To minimize PCR bias, reactions were stopped during the exponential amplification phase. 2 μL of the unpurified gDNA PCR product was used as a template for subsequent barcoding PCR (8 cycles, annealing temperature 61° C.). Pooled barcoding PCR products were gel-extracted (MinElute columns, Qiagen) and quantified by qPCR (KAPA KK4824). Sequencing of pooled amplicons was performed using an Illumina MiSeq according to the manufacturer's instructions. All oligonucleotide sequences used for gDNA amplification are provided in Table 3.


Initial de-multiplexing and FASTQ generation were performed by bcl2fastq2 running on BaseSpace (Illumina) with the following flags: --ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --ignore-missing-controls --auto-set-to-zero-barcode-mismatches --find-adapters-with-sliding-window --adapter-stringency 0.9 --mask-short-adapter-reads 38 --minimum-trimmed-read-length 38. Alignment of fastq files and quantification of editing frequency were performed by CRISPResso2 in batch mode with the following flags: --min_bp_quality_or_N 20 --base_editor_output -p 2 -w 20 -wc -10.
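For readers scripting this step, the following sketch shows one way to assemble the CRISPResso2 batch-mode argument list from the flags quoted above. The executable name `CRISPRessoBatch`, the `--batch_settings` option, and the file name `batch.txt` are assumptions added for illustration and are not stated in the text; only the five analysis flags come from the source. The sketch builds the argument list and does not run anything.

```python
import shlex

# Analysis flags as quoted in the text.
CRISPRESSO2_FLAGS = (
    "--min_bp_quality_or_N 20 --base_editor_output -p 2 -w 20 -wc -10"
)


def build_command(batch_file, executable="CRISPRessoBatch"):
    """Return an argv-style list for a batch run (nothing is executed).

    `executable` and `--batch_settings` are assumptions, not from the text.
    """
    return [
        executable,
        "--batch_settings",
        batch_file,
        *shlex.split(CRISPRESSO2_FLAGS),
    ]


# "batch.txt" is a hypothetical batch settings file.
print(" ".join(build_command("batch.txt")))
```

Keeping the flags as a list of separate tokens avoids the kind of run-together arguments (for example "20--base_editor_output") that a shell would misparse.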


For quantification of conversion to wild-type Tmc1 protein (FIGS. 30A-30D), the percentages of aligned reads around the target site that matched the sequences given in Table 4, all of which contain the targeted coding mutation with no other non-silent mutations or indels, were summed for each replicate from the CRISPResso2 allele table.
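The summation described above can be checked against the example values listed in Table 4. The sketch below is illustrative only: the variable and function names are hypothetical, and the four allele percentages are copied from Table 4.

```python
# (allele sequence, % of aligned reads) pairs copied from Table 4.
TABLE_4_ALLELES = [
    ("CCACCTGAGGAATAGGAAGTACGAGGCCACTGAGGAAC", 25.23),
    ("CCACCTGAGGAATAGGAAGTATGAGGCCACTGAGGAAC", 10.51),
    ("CCACCTGAGGAACAGGAAGTACGAGGCCACTGAGGAAC", 6.73),
    ("CCACCTGAGGAACAGGAAGTATGAGGCCACTGAGGAAC", 1.37),
]


def total_conversion(alleles):
    """Sum the read percentages of all alleles encoding wild-type protein."""
    return sum(percent for _, percent in alleles)


total = total_conversion(TABLE_4_ALLELES)
print(f"{total:.2f}% of reads, reported as {round(total)}%")
```

The four percentages sum to 43.84%, consistent with the 44% total conversion reported in the note to Table 4.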


Tissue Preparation

Temporal bones were harvested from mouse pups at P0-P5. Pups were euthanized by rapid decapitation and temporal bones were dissected in MEM (Invitrogen, Carlsbad, Calif.) supplemented with 10 mM HEPES, 0.05 mg/ml ampicillin, and 0.01 mg/ml ciprofloxacin at pH 7.4. The membranous labyrinth was isolated under a dissection scope, Reissner's membrane was peeled back, and the tectorial membrane and stria vascularis were mechanically removed. Organ of Corti cultures were pinned flatly beneath a pair of thin glass fibers adhered at one end with Sylgard to an 18-mm round glass coverslip. Tissues were either used acutely or kept in culture in presence of 1% Fetal Bovine Serum. Cultures were maintained for 7 to 10 days. For mice older than P10, temporal bones were harvested after euthanizing the animal with inhaled CO2, and cochlear whole mounts were generated.


Electrophysiological Recording

Recordings were performed in standard artificial perilymph solution containing (in mM): 144 NaCl, 0.7 NaH2PO4, 5.8 KCl, 1.3 CaCl2, 0.9 MgCl2, 5.6 D-glucose, and 10 HEPES-NaOH, adjusted to pH 7.4 and 320 mOsmol/kg. Vitamins (1:50) and amino acids (1:100) were added from concentrates (Invitrogen, Carlsbad, Calif.). Hair cells were viewed from the apical surface using an upright Axioskop FS microscope (Zeiss, Oberkochen, Germany) equipped with a 63× water immersion objective with differential interference contrast optics. Recording pipettes (3-5 MΩ) were pulled from borosilicate capillary glass (Garner Glass, Claremont, Calif.) and filled with intracellular solution containing (in mM): 135 KCl, 5 EGTA-KOH, 10 HEPES, 2.5 K2ATP, 3.5 MgCl2, 0.1 CaCl2, pH 7.4. Currents were recorded under whole-cell voltage-clamp at a holding potential of −64 mV at room temperature. Data were acquired using an Axopatch 200A (Molecular devices, Palo Alto, Calif.) filtered at 10 kHz with a low pass Bessel filter, digitized at ≥20 kHz with a 12-bit acquisition board (Digidata 1322) and pClamp 8.2 and 10.5 (Molecular Devices, Palo Alto, Calif.). Data were analyzed offline with OriginLab software.


Viral Vector Generation

Anc80L65 vectors carrying the split coding sequences of AID-BE3.9max, inteins, sgRNA1, and Cbh promoter (a hybrid form of the chicken β-actin promoter) were generated using a helper virus free system and a double transfection method. All viruses were produced by the Viral Core at Boston Children's Hospital. Titers were calculated by qPCR with ITR primers (LITR-F: GACCTTTGGTCGCCCGGCCT (SEQ ID NO: 481); LITR-R: GAGTTGGCCACTCCCTCTCTGC (SEQ ID NO: 484)) and GFP primers (GFP-F: AGAACGGCATCAAGGTGAAC (SEQ ID NO: 485); GFP-R: GAACTCCAGCAGGACCATGT (SEQ ID NO: 486)). All three vectors were purified using an iodixanol step gradient followed by ion exchange chromatography. Virus aliquots were stored at −80° C. The titer was 6.11×10^12 per mL for BE3.9max-AID-N-terminal and 8.26×10^12 per mL for C-terminal virus.


FM1-43 Imaging

FM1-43 (Invitrogen) was diluted in extracellular recording solution (5 μM final concentration) and applied to tissues for 10 seconds, then washed three times in extracellular recording solution to remove excess and prevent uptake via endocytosis. After 5 minutes the intracellular FM1-43 was imaged (Zeiss Axioscope FS Plus) using an FM1-43 filter set and epifluorescence light source with a 63× water immersion objective, or by confocal microscopy.


Confocal Microscopy

All injected and non-injected cochleae were harvested after animals were sacrificed by CO2 inhalation. Temporal bones were removed and immersion fixed for 1 hour at room temperature with 4% paraformaldehyde. Cochleae were then rinsed in PBS and stored at 4° C. in preparation for dissection and immunohistochemistry. Before dissection, temporal bones were decalcified in 120 mM EDTA for 24 h (for P30). For the subsequent immunohistochemical analysis, tissues were infiltrated with 0.01% Triton X-100 for 30 minutes and blocked in 2.5% normal goat serum (Jackson ImmunoResearch) and 2.5% bovine serum albumin (Jackson ImmunoResearch) diluted in PBS (blocking solution) for 1 h and subsequently stained with a rabbit anti-Myosin VIIa primary antibody (Proteus Biosciences, Product #: 25-6790, 1:500 dilution in blocking solution) at 4° C. overnight. A secondary antibody cocktail consisting of a mixture of donkey anti-rabbit antibody conjugated to AlexaFluor 555 (Life Technologies, 1:200 dilution (2 mg/mL)), AlexaFluor 555-phalloidin and AlexaFluor 647-phalloidin (Molecular Probes, 1:200 dilution (2 mg/mL)) as a counterstain to label filamentous actin was applied for 2 h. Samples were mounted on glass coverslips with Vectashield mounting medium (Vector Laboratories), and imaged at 10×-63× magnification using a Zeiss LSM800 confocal microscope. Three-dimensional projection images were generated from Z-stacks using ZenBlue (Zeiss).


Scanning Electron Microscopy (SEM)

SEM was performed at ~P30 (4 weeks) along the organ of Corti of control and mutant mice. Organ of Corti explants were fixed in 2.5% glutaraldehyde in 0.1 M cacodylate buffer (Electron Microscopy Sciences) supplemented with 2 mM CaCl2 for 1 hour at room temperature. Specimens were dehydrated in a graded series of acetone (35%, 70%, 95%, and 100% (×2)), critical-point dried from liquid CO2, sputter-coated with 4-5 nm of platinum (Q150T, Quorum Technologies, United Kingdom), and observed with a field emission scanning electron microscope (S-4800, Hitachi, Japan).


Auditory Brainstem Responses (ABR)

ABR recordings were conducted from mice anesthetized via IP injection (0.1 mL/10 g-body weight) with 1 mL of ketamine (50 mg/mL) and 0.75 mL of xylazine (20 mg/mL). Subcutaneous needle electrodes were inserted into the skin (a) dorsally between the two ears (reference electrode); (b) behind the left pinna (recording electrode); and (c) dorsally at the rump of the animal (ground electrode). Prior to the onset of ABR testing, the meatus at the base of the pinna was trimmed away to expose the ear canal, and sound pressure at the entrance of the ear canal was calibrated for each individual test subject at all stimulus frequencies. For ABR recordings the ear canal and hearing apparatus (EPL Acoustic system, MEE, Boston) were presented with 5-millisecond tone pips. ABR potentials were amplified (10,000×), filtered (0.3-10 kHz), and digitized using custom data acquisition software (LabVIEW) from the Eaton-Peabody Laboratories Cochlear Function Test Suite. Sound level was raised in 5 to 10 dB steps from 0 to 110 dB sound pressure level (decibels SPL). At each level, 512 to 1024 responses were averaged (with stimulus polarity alternated) after “artifact rejection”. Threshold was determined by visual inspection. Data were analyzed and plotted using Origin-2015 (OriginLab Corporation, MA).


Distortion Product Otoacoustic Emissions (DPOAE)

DPOAE data were collected under the same conditions, and during the same recording sessions, as ABR data. DPOAE at 2f1−f2 were measured with f2 frequencies from 5.6 to 45.2 kHz in half-octave steps (f2/f1=1.22) and L1−L2=10 dB SPL. At each f2, L2 was varied between 10 and 80 dB sound-pressure level (SPL) in 10 dB SPL increments. DPOAE threshold was defined from the average spectra as the L2-level eliciting a DPOAE of magnitude 5 dB SPL above the noise floor. The mean noise floor level was under 0 dB across all frequencies. Iso-response curves were interpolated from plots of DPOAE amplitude versus sound level. Threshold was defined as the f2 level required to produce DPOAEs above 0 dB.
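The stimulus relationships stated above (f2/f1 = 1.22, L1 − L2 = 10 dB, distortion product at 2f1 − f2) can be worked through numerically. The sketch below is an illustration under assumptions: the function names and the example amplitude table are hypothetical, and the threshold function implements only the first definition given in the text (lowest L2 at which the DPOAE is at least 5 dB above the noise floor).

```python
F2_F1_RATIO = 1.22   # f2/f1, per the text
LEVEL_DIFF_DB = 10   # L1 - L2, per the text


def primaries(f2_khz, l2_db):
    """Derive f1, L1, and the 2f1 - f2 distortion frequency from f2 and L2."""
    f1_khz = f2_khz / F2_F1_RATIO
    return {
        "f1_khz": f1_khz,
        "l1_db": l2_db + LEVEL_DIFF_DB,
        "dp_khz": 2 * f1_khz - f2_khz,
    }


def dpoae_threshold(amplitudes_db, noise_floor_db, margin_db=5):
    """Lowest L2 whose DPOAE amplitude reaches noise floor + margin, or None.

    `amplitudes_db` maps L2 (dB SPL) to measured DPOAE amplitude (dB SPL).
    """
    for l2 in sorted(amplitudes_db):
        if amplitudes_db[l2] >= noise_floor_db + margin_db:
            return l2
    return None


p = primaries(16.0, 40)
print(f"f1={p['f1_khz']:.2f} kHz, L1={p['l1_db']} dB, 2f1-f2={p['dp_khz']:.2f} kHz")

# Hypothetical amplitude series for one f2 (illustration only).
print(dpoae_threshold({10: -8, 20: -3, 30: 6, 40: 15}, noise_floor_db=-5))
```

For an f2 of 16.0 kHz, f1 is about 13.11 kHz and the distortion product falls at about 10.23 kHz.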


In Vivo Injection of AAV

Inner ear injections were performed as approved by the Institutional Animal Care and Use Committees at Boston Children's Hospital (animal protocols #17-03-3396R and 18-01-3610R). Pups were anesthetized by rapid induction of hypothermia for 2-4 minutes on ice water until loss of consciousness, and this state was maintained on a cooling platform for 10-15 minutes during the surgery. Approximately 1 μL of dual AAV was injected into neonatal mice at P0-P1. Upon anesthesia, a post-auricular incision was made to expose the otic bulla and visualize the cochlea. Standard post-operative care was applied.


Statistical Analysis

Statistical analyses were performed with Origin 2016 (OriginLab Corporation) or Prism 7. Data are presented as mean values ±standard deviations (SD) or standard error of the mean (SEM) as noted in the text and figure legend. Student's t-test was used to determine statistical significance (p-values). Error bars and n values of biological replicates for experiments are defined in the respective paragraphs and figure legends.
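Because results are reported as mean ± SD or mean ± SEM, it can help to see how the two differ for the same data. The sketch below uses only the Python standard library; the three input values are hypothetical and are not measurements from this study.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample SD, SEM) for a list of replicate values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(values))      # SEM = SD / sqrt(n)
    return mean, sd, sem


# Hypothetical replicate values, for illustration only.
example = [2.0, 2.2, 2.7]
mean, sd, sem = summarize(example)
print(f"mean={mean:.2f}, SD={sd:.2f}, SEM={sem:.2f}, n={len(example)}")
```

Since SEM shrinks with the square root of n while SD does not, stating which of the two an error bar represents, as the figure legends do, is needed to interpret it.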









TABLE 3

Primers used for high-throughput DNA sequencing.

Primer Name | Sequence
HTS_fwd_Baringo_gDNA | TGGAGTTCAGACGTGTGCTCTTCCGATCTCCCTTATTGGAAGTCAGGGCTTA (SEQ ID NO: 579)
HTS_rev_Baringo_gDNA | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGGAGGATCACTAAGAGAAGGCT (SEQ ID NO: 580)
HTS_fwd_Baringo_cDNA | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNAATGAAGGCGCTCTTGGGAA (SEQ ID NO: 581)
HTS_rev_Baringo_cDNA | TGGAGTTCAGACGTGTGCTCTTCCGATCTCGTACGGTAAACCCCAGAGG (SEQ ID NO: 582)
HTS_fwd_Baringo_off_1 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGGTGTCCGCCTGGCTC (SEQ ID NO: 583)
HTS_rev_Baringo_off_1 | TGGAGTTCAGACGTGTGCTCTTCCGATCTCACCTGTCCTCTGGTCTGGA (SEQ ID NO: 584)
HTS_fwd_Baringo_off_2 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACAAAAGAAGGGGGAGCGAC (SEQ ID NO: 585)
HTS_rev_Baringo_off_2 | TGGAGTTCAGACGTGTGCTCTTCCGATCTTGCACAGCATAAAAGGGTGC (SEQ ID NO: 586)
HTS_fwd_Baringo_off_3 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGGCAAGGGGCATCCTTATGT (SEQ ID NO: 587)
HTS_rev_Baringo_off_3 | TGGAGTTCAGACGTGTGCTCTTCCGATCTCACTGAAACTTGCCATCGCC (SEQ ID NO: 496)
HTS_fwd_Baringo_off_4 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNTCTGAACAGGTTAGAGGGTGC (SEQ ID NO: 497)
HTS_rev_Baringo_off_4 | TGGAGTTCAGACGTGTGCTCTTCCGATCTAATTCCTAAGTTCCAGGGAGTC (SEQ ID NO: 498)
HTS_fwd_Baringo_off_5 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGCTCATTCTAAAATTCATAGCCT (SEQ ID NO: 499)
HTS_rev_Baringo_off_5 | TGGAGTTCAGACGTGTGCTCTTCCGATCTTAGCATGCTGGGAACCAGAC (SEQ ID NO: 500)
HTS_fwd_Baringo_off_6 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNAGGTCCTAGGGTCATTCGGG (SEQ ID NO: 501)
HTS_rev_Baringo_off_6 | TGGAGTTCAGACGTGTGCTCTTCCGATCTAGTAGCCTTCAGCTGCCAAC (SEQ ID NO: 502)
HTS_fwd_Baringo_off_7 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGCCTCTGACTGTGTGGCAAG (SEQ ID NO: 503)
HTS_rev_Baringo_off_7 | TGGAGTTCAGACGTGTGCTCTTCCGATCTACATTGCCTTCTCCACTCTTCC (SEQ ID NO: 504)
HTS_fwd_Baringo_off_8 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACCAGGGCATGTCATGAAAAC (SEQ ID NO: 505)
HTS_rev_Baringo_off_8 | TGGAGTTCAGACGTGTGCTCTTCCGATCTCAGGAGCACACCTATCAGGC (SEQ ID NO: 506)
HTS_fwd_Baringo_off_9 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNACTAGAGCCACTAGGAAGAGGG (SEQ ID NO: 507)
HTS_rev_Baringo_off_9 | TGGAGTTCAGACGTGTGCTCTTCCGATCTTTCTAGCTTGCTCCTGGGCT (SEQ ID NO: 508)
















TABLE 4

CRISPResso2 output for base editing at the target locus.

Sequence | % conversion
CCACCTGAGGAATAGGAAGTACGAGGCCACTGAGGAAC (SEQ ID NO: 509) | 25.23
CCACCTGAGGAATAGGAAGTATGAGGCCACTGAGGAAC (SEQ ID NO: 510) | 10.51
CCACCTGAGGAACAGGAAGTACGAGGCCACTGAGGAAC (SEQ ID NO: 511) | 6.73
CCACCTGAGGAACAGGAAGTATGAGGCCACTGAGGAAC (SEQ ID NO: 512) | 1.37









An example of the CRISPResso2 output from a single AID-BE4max-mediated base editing experiment is shown. The c.A545G mutation is in italics, silent bystander cytosines are bold, and the AGG PAM is underlined. The total conversion to sequences encoding wild-type TMC1 protein was 44%.









TABLE 5

List of base editing targets to correct known pathogenic point mutations in TMC1.

Base editor | Pathogenic Mutation | GRCh37 Chromosome | GRCh37 Location
ABE | NM_138691.2(TMC1):c.−540C>T | 9 | 75136717
ABE | NM_138691.2(TMC1):c.−350C>T | 9 | 75192895
n/a | NM_138691.2(TMC1):c.−329C>A | 9 | 75192916
ABE | NM_138691.2(TMC1):c.−252C>T | 9 | 75231337
ABE | NM_138691.2(TMC1):c.−220C>T | 9 | 75231369
CBE | NM_138691.2(TMC1):c.−124T>C | 9 | 75242908
n/a | NM_138691.2(TMC1):c.7C>A (p.Pro3Thr) | 9 | 75263571
ABE | NM_138691.2(TMC1):c.65−10C>T | 9 | 75309449
ABE | NM_138691.2(TMC1):c.100C>T (p.Arg34Ter) | 9 | 75309494
n/a | NM_138691.2(TMC1):c.135C>A (p.Thr45=) | 9 | 75309529
n/a | NM_138691.2(TMC1):c.141T>A (p.Asp47Glu) | 9 | 75309535
n/a | NM_138691.2(TMC1):c.145A>C (p.Ile49Leu) | 9 | 75309539
ABE | NM_138691.2(TMC1):c.236+1G>A | 9 | 75309631
n/a | NM_138691.2(TMC1):c.237−5T>A | 9 | 75315429
n/a | NM_138691.2(TMC1):c.241G>A (p.Glu81Lys) | 9 | 75315438
CBE | NM_138691.2(TMC1):c.265T>C (p.Leu89=) | 9 | 75315462
ABE | NM_138691.2(TMC1):c.339G>A (p.Met113Ile) | 9 | 75315536
n/a | NM_138691.2(TMC1):c.373A>C (p.Lys125Gln) | 9 | 75355045
ABE | NM_138691.2(TMC1):c.403G>A (p.Gly135Arg) | 9 | 75355075
ABE | NM_138691.2(TMC1):c.421C>T (p.Arg141Trp) | 9 | 75355093
ABE | NM_138691.2(TMC1):c.448G>A (p.Ala150Thr) | 9 | 75355120
ABE | NM_138691.2(TMC1):c.472C>T (p.Arg158Cys) | 9 | 75357378
ABE | NM_138691.2(TMC1):c.473G>A (p.Arg158His) | 9 | 75357379
ABE | NM_138691.2(TMC1):c.483G>A (p.Glu161=) | 9 | 75357389
n/a | NM_138691.2(TMC1):c.534A>T (p.Glu178Asp) | 9 | 75357440
n/a | NM_138691.2(TMC1):c.557C>G (p.Ala186Gly) | 9 | 75366787
n/a | NM_138691.2(TMC1):c.603T>G (p.Val201=) | 9 | 75366833
n/a | NM_138691.2(TMC1):c.624C>A (p.Ser208Arg) | 9 | 75366854
ABE | NM_138691.2(TMC1):c.637C>T (p.Pro213Ser) | 9 | 75366867
ABE | NM_138691.2(TMC1):c.674C>T | 9 | 75369733
ABE | NM_138691.2(TMC1):c.684C>T (p.Thr228=) | 9 | 75369743
n/a | NM_138691.2(TMC1):c.703G>T (p.Ala235Ser) | 9 | 75369762
ABE | NM_138691.2(TMC1):c.742−12G>A | 9 | 75387317
ABE | NM_138691.2(TMC1):c.760G>A (p.Val254Ile) | 9 | 75387347
n/a | NM_138691.2(TMC1):c.777T>C (p.Tyr259=) | 9 | 75387364









The ClinVar database was searched for pathogenic SNPs in TMC1. Of the 108 pathogenic mutations found in patients, 72 are in principle reversible with a CBE or ABE nucleobase editor.
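The "in principle reversible" criterion can be illustrated by looking only at the type of base change. The sketch below is an assumption-laden illustration, not the procedure used to build Table 5: it maps each transition mutation to the editor whose chemistry could revert it on one strand or the other, and returns "n/a" for transversions. The function and variable names are hypothetical. Table 5 also lists some transition mutations as "n/a"; whatever additional criteria produced those entries are not modeled here.

```python
import re

# Editor whose chemistry could revert a pathogenic substitution (on either strand).
# ABE converts A.T to G.C, so it could revert G>A and C>T mutations.
# CBE converts C.G to T.A, so it could revert A>G and T>C mutations.
REVERTING_EDITOR = {
    ("C", "T"): "ABE",
    ("G", "A"): "ABE",
    ("T", "C"): "CBE",
    ("A", "G"): "CBE",
}

# Matches the "X>Y" part of an HGVS substitution, tolerating spaces around ">".
_SUBSTITUTION = re.compile(r"([ACGT])\s*>\s*([ACGT])")


def editor_for(hgvs: str) -> str:
    """Return 'ABE', 'CBE', or 'n/a' based only on the base change in hgvs."""
    match = _SUBSTITUTION.search(hgvs)
    if match is None:
        raise ValueError(f"no single-base substitution found in {hgvs!r}")
    return REVERTING_EDITOR.get((match.group(1), match.group(2)), "n/a")


if __name__ == "__main__":
    for name in (
        "NM_138691.2(TMC1):c.100C>T",
        "NM_138691.2(TMC1):c.265T>C",
        "NM_138691.2(TMC1):c.7C>A",
    ):
        print(name, "->", editor_for(name))
    # Fraction of ClinVar pathogenic TMC1 mutations stated to be reversible.
    print(f"{72 / 108:.1%} of 108 mutations")
```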


Exemplary guide sequences (expressed as protospacer sequences) suitable for targeting the NPC1 gene and used in the experiments of Examples 1-4 are provided in Table 6 below. The base editor and target correction are shown alongside the relevant guide sequence. Associated amino acid changes in the Niemann-Pick C1 (NPC1) protein are also shown. The target nucleotide (C or A) in the guide sequence is capitalized.









TABLE 6

List of guide RNA sequences used to correct known pathogenic point mutations in NPC1.

Base editor | Pathogenic Mutation | Guide sequence | SEQ ID NO:
CBE | NM_000271.5(NPC1):c.3591+2T>C | ctccgCgagtaccctgagca | 669
ABE | NM_000271.5(NPC1):c.3591+1G>A | ctccAtgagtaccctgagca | 670
CBE | NM_000271.5(NPC1):c.3566A>G (p.Glu1189Gly) | gccCcttccgcgcgctccac | 671
ABE | NM_000271.5(NPC1):c.3503G>A (p.Cys1168Tyr) | ttctAcagccacataaccag | 672
ABE | NM_000271.5(NPC1):c.3477+2T>C | gtgatggAgagtcctcatac | 673
CBE | NM_000271.5(NPC1):c.3467A>G (p.Asn1156Ser) | caggtCgaccaaggatacag | 674
ABE | NM_000271.5(NPC1):c.3451G>A (p.Ala1151Thr) | cActgtatccttggtcaacc | 675
CBE | NM_000271.5(NPC1):c.3425T>C (p.Met1142Thr) | ttaCgtggctctggggcatc | 676
ABE | NM_000271.5(NPC1):c.3289G>A (p.Asp1097Asn) | gacAacactatcttcaacct | 677
CBE | NM_000271.5(NPC1):c.3259T>C (p.Phe1087Leu) | tgtcCtctacgaacagtacc | 678
CBE | NM_000271.5(NPC1):c.3246-2A>G | cacacCggaggggagaggg | 679
ABE | NM_000271.5(NPC1):c.3229C>T (p.Arg1077Ter) | tcgAtaggcactgccgttaa | 680
CBE | NM_000271.5(NPC1):c.3182T>C (p.Ile1061Thr) | cttaCagccagtaatgtcac | 681
ABE | NM_000271.5(NPC1):c.3175C>T (p.Arg1059Ter) | aagtcAggctttcttcagag | 682
ABE | NM_000271.5(NPC1):c.3160G>A (p.Ala1054Thr) | ttgacActctgaagaaagcc | 683
CBE | NM_000271.5(NPC1):c.3127A>G (p.Thr1043Ala) | gCgtggtaggtcatgaagta | 684
ABE | NM_000271.5(NPC1):c.3104C>T (p.Ala1035Val) | gtacgtgActccgaccctgg | 685
CBE | NM_000271.5(NPC1):c.3056A>G (p.Tyr1019Cys) | actaCaggcagcatgtcccc | 686
ABE | NM_000271.5(NPC1):c.3042-1G>A | tcaAgggacatgctgcctat | 687
ABE | NM_000271.5(NPC1):c.2974G>A (p.Gly992Arg) | ctcagAggggagacttcatg | 688
ABE | NM_000271.5(NPC1):c.2932C>T (p.Arg978Cys) | cagcAaacgcaggcagggt | 689
ABE | NM_000271.5(NPC1):c.2893C>T (p.Gln965Ter) | aactAgtcagtgatattgtc | 690
ABE | NM_000271.5(NPC1):c.2873G>A (p.Arg958Gln) | tgtcAagtggacaatatcac | 691
ABE | NM_000271.5(NPC1):c.2872C>T (p.Arg958Ter) | actcAacagcaagacgactg | 692
ABE | NM_000271.5(NPC1):c.2861C>T (p.Ser954Leu) | gcaagacAactgtggcttca | 693
ABE | NM_000271.5(NPC1):c.2848G>A (p.Val950Met) | ggAtgaagccacagtcgtct | 694
ABE | NM_000271.5(NPC1):c.2842G>A (p.Asp948Asn) | tttcAactgggtgaagccac | 695
ABE | NM_000271.5(NPC1):c.2830G>A (p.Asp944Asn) | gatcAacgattatttcgact | 696
ABE | NM_000271.5(NPC1):c.2819C>T (p.Ser940Leu) | acAagggggcgaagcctatt | 697
ABE | NM_000271.5(NPC1):c.2801G>A (p.Arg934Gln) | ccAaataggcttcgccccct | 698
ABE | NM_000271.5(NPC1):c.2780C>T (p.Ala927Val) | gcAccgcgttaaatatctgc | 699
ABE | NM_000271.5(NPC1):c.2764C>T (p.Gln922Ter) | ctActgcaccagggaatcat | 700
ABE | NM_000271.5(NPC1):c.2761C>T (p.Gln921Ter) | ctAcaccagggaatcattgt | 701
ABE | NM_000271.5(NPC1):c.2728G>A (p.Gly910Ser) | tgtgcAgcggcatgggctgc | 702
ABE | NM_000271.5(NPC1):c.2713C>T (p.Gln905Ter) | gttctAccccttggaagaag | 703
ABE | NM_000271.5(NPC1):c.2665G>A (p.Val889Met) | gcctAtgtactttgtcctgg | 704
ABE | NM_000271.5(NPC1):c.2660C>T (p.Pro887Leu) | gcAgacccgcatgcaggtac | 705
ABE | NM_000271.5(NPC1):c.2594C>T (p.Ser865Leu) | gcatcAaaagagactgatcc | 706
CBE | NM_000271.5(NPC1):c.2474A>G (p.Tyr825Cys) | agaaCaggagtttttgaaga | 707
ABE | NM_000271.5(NPC1):c.2366G>A (p.Arg789His) | ttaaacAtcaagaggtaagt | 708
ABE | NM_000271.5(NPC1):c.2128C>T (p.Gln710Ter) | atacctAgtaggcctgcacc | 709
ABE | NM_000271.5(NPC1):c.2072C>T (p.Pro691Leu) | cAggatgacttcaatcacaa | 710
CBE | NM_000271.5(NPC1):c.2054T>C (p.Ile685Thr) | caCtgtgattgaagtcatcc | 711
ABE | NM_000271.5(NPC1):c.2050C>T (p.Leu684Phe) | gaAggtcaagggcaaccca | 712
ABE | NM_000271.5(NPC1):c.1990G>A (p.Val664Met) | tcAtgctgagctcggtggct | 713
ABE | NM_000271.5(NPC1):c.1948-1G>A | tcaAgtggattcgaaggtct | 714
ABE | NM_000271.5(NPC1):c.1947+1G>A | tctgAtaagccggggggggg | 715
ABE | NM_000271.5(NPC1):c.1918G>A (p.Gly640Arg) | ccttgAggcacatgaaaagc | 716
CBE | NM_000271.5(NPC1):c.1832A>G (p.Asp611Gly) | tcaCcttcaatacttcgttc | 717
ABE | NM_000271.5(NPC1):c.1819C>T (p.Arg607Ter) | tcAttcagcagtgaaggaaa | 718
ABE | NM_000271.5(NPC1):c.1628C>T (p.Pro543Leu) | cacAggaacactggtccacc | 719
ABE | NM_000271.5(NPC1):c.1554-1009G>A | acAggtgggtcatatgcaga | 720
ABE | NM_000271.5(NPC1):c.1553G>A (p.Arg518Gln) | tacAgtaagtggcaagagac | 721
ABE | NM_000271.5(NPC1):c.1552C>T (p.Arg518Trp) | accAtacgcagtacagaaag | 722
ABE | NM_000271.5(NPC1):c.1547G>A (p.Cys516Tyr) | actAcgtacggtaagtggca | 723
ABE | NM_000271.5(NPC1):c.1421C>T (p.Pro474Leu) | atacAgtgaaagaggggcca | 724
ABE | NM_000271.5(NPC1):c.1339C>T (p.Gln447Ter) | ttAtaagtcaagaacctgaa | 725
ABE | NM_000271.5(NPC1):c.1327-1G>A | caAgttcttgacttacaaat | 726
ABE | NM_000271.5(NPC1):c.81G>A (p.Trp27Ter) | tgAtatggagagtgtggaat | 727
ABE | NM_000271.5(NPC1):c.1312C>T (p.Gln438Ter) | ctAtatgtcaagcggaggtc | 728
ABE | NM_000271.5(NPC1):c.1298C>T (p.Pro433Leu) | ggaAgtccaaagggtacatc | 729
ABE | NM_000271.5(NPC1):c.1219C>T (p.Gln407Ter) | agctActccgtccggaagaa | 730
ABE | NM_000271.5(NPC1):c.1211G>A (p.Arg404Gln) | ttccAgacggagcagctcat | 731
ABE | NM_000271.5(NPC1):c.3G>A (p.Met1Ile) | cagcatAaccgctcgcggcc | 732
ABE | NM_000271.5(NPC1):c.1165C>T (p.Arg389Cys) | caggcAagcctggctgctgg | 733
ABE | NM_000271.5(NPC1):c.1142G>A (p.Trp381Ter) | ctAgtcagcccccagcagcc | 734
CBE | NM_000271.5(NPC1):c.1133T>C (p.Val378Ala) | aatccagCtgacctctggtc | 735
ABE | NM_000271.5(NPC1):c.956-1G>A | ccaAgagaggcgtcctgctg | 736
CBE | NM_000271.5(NPC1):c.1A>G (p.Met1Val) | ggtcaCgctgtggccgcgca | 737
ABE | NM_000271.5(NPC1):c.721C>T (p.Gln241Ter) | tcttAgcagctacatggtgc | 738
CBE | NM_000271.5(NPC1):c.631+2T>C | aggCaggtataaagattcca | 739
ABE | NM_000271.5(NPC1):c.530G>A (p.Cys177Tyr) | ctgtAtgggaaggacgctga | 740
ABE | NM_000271.5(NPC1):c.433C>T (p.Gln145Ter) | tattAtaactctttcacatt | 741
ABE | NM_000271.5(NPC1):c.346C>T (p.Arg116Ter) | tctgtcAagggctacatgtc | 742
CBE | NM_000271.5(NPC1):c.337T>C (p.Cys113Arg) | tgacaCgtagccctcgacag | 743
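Because the target nucleotide in each Table 6 guide sequence is marked by capitalization, its position within the protospacer can be read off programmatically. The following is a minimal sketch; the function name is hypothetical, and counting positions from the first listed nucleotide (1-based) is an assumption made here for illustration rather than a convention stated in this document.

```python
def target_position(protospacer: str) -> tuple[int, str]:
    """Return (1-based position, base) of the single capitalized nucleotide."""
    hits = [(i + 1, base) for i, base in enumerate(protospacer) if base.isupper()]
    if len(hits) != 1:
        raise ValueError(
            f"expected exactly one capitalized base, found {len(hits)} in {protospacer!r}"
        )
    return hits[0]


if __name__ == "__main__":
    # Guide sequences copied from Table 6 (SEQ ID NOs: 669 and 675).
    for guide in ("ctccgCgagtaccctgagca", "cActgtatccttggtcaacc"):
        position, base = target_position(guide)
        print(f"{guide}: target {base} at position {position}")
```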









Example 5: Image Analyses

To minimize variability, tissue from all conditions was harvested and processed at the same time. A single set of microscope settings was used to collect all images in FIGS. 23 and 24. The AxioScan czi to tif converter was used to convert czi files to multichannel tiffs.


For the determination of GFP nuclei (FIGS. 11A-11E), Purkinje neuron counts, and CD68+ cell counts (FIGS. 15A-15H), ilastik was used to identify fluorescent objects. Experimenter-annotated images (cropped subfields of the images included for publication) were used to manually train the pixel classification module of the program to accurately identify nuclei based on size and morphology. The trained pixel classification module was then used to analyze all images. The probability files from ilastik were imported into CellProfiler for counting. In CellProfiler, objects were detected and counted using the “Mask Image,” “Smooth,” “Enhance Edge,” “Identify Primary Objects,” and “Calculate Statistic” modules, and the program was instructed to count only objects with diameters in a specified range (between 15 and 100 pixels for GFP images; between 10 and 100 pixels for CD68 images). The “Overlay Outlines” module, which generates an image of outlined objects, was used to manually check the automated output. ilastik and CellProfiler are available at ilastik.org/documentation/pixelclassification/pixelclassification.html and Cellprofiler.org, respectively. The percentage of CD68+ area in the brain was calculated using CellProfiler and ImageJ by dividing the total CD68+ area from “Calculate Statistic” in CellProfiler by the total brain area as manually outlined in ImageJ. For quantification of GFP image intensity in FIGS. 11A-11E, ImageJ was used to quantify overall image intensity. A custom macro written in the ImageJ macro language (IJM) and generated from ImageJ's batch processing macro template was used to identify brain tissue, subtract background with a rolling-ball algorithm, and quantify signal intensity. The output is a csv file of the 8-bit image intensity histogram. Each of the 256 rows is a paired (intensity, pixel #) value, with the pixel #'s summing to the number of pixels in the image.
Pixels with an intensity of 1-15 (of 256 levels) were manually set to an intensity of zero after visual inspection showed that these pixels corresponded to small-diameter background fluorescence that was not removed by the rolling-ball algorithm (radius=100 px).
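The diameter gating and the CD68+ area percentage described above can be sketched as follows. This is an illustration under stated assumptions, not the CellProfiler pipeline itself: "diameter" is taken to mean the diameter of a circle with the same area as the object, and all function names and numeric inputs are hypothetical.

```python
import math


def equivalent_diameter(area_px: float) -> float:
    """Diameter (in pixels) of a circle having the given area (in pixels^2)."""
    return 2.0 * math.sqrt(area_px / math.pi)


def count_objects(areas_px, min_diameter: float, max_diameter: float) -> int:
    """Count objects whose equivalent diameter lies within [min, max]."""
    return sum(
        1 for area in areas_px
        if min_diameter <= equivalent_diameter(area) <= max_diameter
    )


def percent_area(positive_area_px: float, total_area_px: float) -> float:
    """Percentage of the total outlined area that is marker-positive."""
    if total_area_px <= 0:
        raise ValueError("total area must be positive")
    return 100.0 * positive_area_px / total_area_px


if __name__ == "__main__":
    # Hypothetical object areas: circles of diameter 20, 5, and 120 pixels.
    areas = [math.pi * 10**2, math.pi * 2.5**2, math.pi * 60**2]
    print(count_objects(areas, 15, 100))   # GFP gate from the text: 15-100 px
    print(percent_area(250.0, 1000.0))     # hypothetical CD68+ and brain areas
```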














/*
 * Macro template to process multiple images in a folder
 */

run("Bio-Formats Macro Extensions");

#@ File (label = "Input directory", style = "directory") input
#@ File (label = "Output directory", style = "directory") output
#@ String (label = "File suffix", value = ".tif") suffix

processFolder(input);

// function to scan folders/subfolders/files to find files with correct suffix
function processFolder(input) {
    list = getFileList(input);
    list = Array.sort(list);
    for (i = 0; i < list.length; i++) {
        if (File.isDirectory(input + File.separator + list[i]))
            processFolder(input + File.separator + list[i]);
        if (endsWith(list[i], suffix))
            processFile(input, output, list[i]);
    }
}

function processFile(input, output, file) {
    // Do the processing here by adding your own code.
    // Leave the print statements until things work, then remove them.
    print("Processing: " + input + File.separator + file);
    active_image = input + File.separator + file;
    open(active_image);

    // Set display range and threshold per channel before 8-bit conversion
    Stack.setChannel(1); //DAPI
    run("Enhance Contrast", "saturated=0.35");
    setAutoThreshold("Triangle dark no-reset");
    Stack.setChannel(2); //GFP
    setMinAndMax(0, 10000);
    DAPI = "C1-" + getTitle();
    GFP = "C2-" + getTitle();
    dir = getDirectory("image");
    run("8-bit");
    run("Split Channels");

    // Build a tissue ROI from the DAPI channel
    selectWindow(DAPI);
    run("Convert to Mask");
    run("Create Selection");
    roiManager("Add");
    // Enlarge, then shrink, the ROI by 60 pixels to close gaps in the outline
    roiManager("Select", 0);
    run("Enlarge...", "enlarge=60 pixel");
    roiManager("Update");
    roiManager("Select", 0);
    run("Enlarge...", "enlarge=-60 pixel");
    roiManager("Update");

    // Rolling-ball background subtraction (radius 100) on the GFP channel
    selectWindow(GFP);
    roiManager("Select", 0);
    run("Subtract Background...", "rolling=100");
    roiManager("Select", 0);

    // Save the processed GFP image and its intensity histogram
    GFP_tiff_path = output + File.separator + GFP;
    saveAs("Tiff", GFP_tiff_path);
    histo_title = getInfo("window.title");
    histo_save = output + File.separator + histo_title + ".csv";
    save_histogram();
    saveAs("Results", histo_save);
    roiManager("Reset");
    run("Close All");
}

// Write the 256-bin histogram (Value, Count) to the Results table
function save_histogram() {
    nBins = 256;
    run("Clear Results");
    row = 0;
    getHistogram(values, counts, nBins);
    for (i = 0; i < nBins; i++) {
        setResult("Value", row, values[i]);
        setResult("Count", row, counts[i]);
        row++;
    }
    updateResults();
}
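The histogram csv written by the macro above (columns "Value" and "Count") could be post-processed along the following lines. This is a minimal sketch, not the analysis actually performed: it applies the stated rule that intensities of 1-15 are set to zero and then reports the count-weighted mean intensity, which is only one possible summary of "overall image intensity" and is an assumption here. The function name and the sample data are hypothetical.

```python
import csv
import io


def mean_intensity(csv_text: str, background_max: int = 15) -> float:
    """Count-weighted mean intensity, treating 1..background_max as zero."""
    total_pixels = 0
    weighted_sum = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = int(float(row["Value"]))
        count = int(float(row["Count"]))
        if 1 <= value <= background_max:
            value = 0  # dim background pixels are counted but contribute no signal
        total_pixels += count
        weighted_sum += value * count
    if total_pixels == 0:
        raise ValueError("histogram contains no pixels")
    return weighted_sum / total_pixels


if __name__ == "__main__":
    # Hypothetical three-bin excerpt; a real file would have 256 rows.
    sample = "Value,Count\n0,10\n5,10\n100,20\n"
    print(mean_intensity(sample))
```

To process a saved file, the text can be read with `open(path, newline="").read()` and passed to `mean_intensity`; extra columns in the csv, such as a row index, are ignored.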









EQUIVALENTS AND SCOPE

In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all, of the group members are present in, employed in, or otherwise relevant to a given product or process.


Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein.


It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.


This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.


Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.

Claims
  • 1. A nucleic acid molecule encoding a N-terminal portion of a nucleobase editor fused at its C-terminus to a first intein sequence, wherein the nucleic acid molecule is operably linked to a first promoter, further comprising a nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, wherein the direction of transcription of the nucleic acid segment is reversed relative to the direction of transcription of the nucleic acid molecule.
  • 2. The nucleic acid molecule of claim 1, wherein the first intein sequence comprises the amino acid sequence as set forth in SEQ ID NO: 351.
  • 3. The nucleic acid molecule of claim 1 or 2 further comprising a transcriptional terminator.
  • 4. The nucleic acid molecule of claim 3, wherein the transcriptional terminator is the transcriptional terminator from a bGH gene, hGH gene, or SV40 gene.
  • 5. The nucleic acid molecule of any one of claims 1-4 further comprising a woodchuck hepatitis posttranscriptional regulatory element (WPRE) inserted 5′ of the transcriptional terminator, optionally wherein the WPRE is a truncated WPRE sequence.
  • 6. The nucleic acid molecule of claim 1, wherein the first promoter is a Cbh promoter.
  • 7. A composition comprising the nucleic acid molecule of any one of claims 1-6.
  • 8. A recombinant AAV (rAAV) particle comprising the nucleic acid molecule of any one of claims 1-6.
  • 9. A nucleic acid molecule encoding a C-terminal portion of a nucleobase editor fused at its N-terminus to an intein sequence, wherein the nucleic acid molecule is operably linked to a first promoter, further comprising a nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, wherein the direction of transcription of the nucleic acid segment is reversed relative to the direction of transcription of the nucleic acid molecule.
  • 10. The nucleic acid molecule of claim 9, wherein the intein sequence comprises the amino acid sequence as set forth in SEQ ID NO: 353.
  • 11. The nucleic acid molecule of claim 9 or 10 further comprising a transcriptional terminator.
  • 12. The nucleic acid molecule of claim 11, wherein the transcriptional terminator is the transcriptional terminator from a bGH gene, hGH gene, or SV40 gene.
  • 13. The nucleic acid molecule of any one of claims 9-12 further comprising a WPRE inserted 5′ of the transcriptional terminator.
  • 14. The nucleic acid molecule of any one of claims 9-12 further comprising a sequence encoding a uracil glycosylase inhibitor (UGI) at the 3′ end of the nucleic acid molecule.
  • 15. The nucleic acid molecule of claim 14, wherein the UGI comprises the amino acid sequence as set forth in any one of SEQ ID NOs: 299-302.
  • 16. The nucleic acid molecule of any one of claims 9-15, wherein the first promoter is a Cbh promoter.
  • 17. A composition comprising the nucleic acid molecule of any one of claims 9-16.
  • 18. A recombinant AAV (rAAV) particle comprising the nucleic acid molecule of any one of claims 9-16.
  • 19. The nucleic acid molecule of any one of claims 1-6 or 9-16, wherein the nucleobase editor comprises a deaminase.
  • 20. The nucleic acid molecule of claim 19, wherein the deaminase is a cytosine deaminase.
  • 21. The nucleic acid molecule of claim 19, wherein the deaminase is an adenine deaminase.
  • 22. A composition comprising: a) the nucleic acid molecule of any one of claims 1-6, and b) the nucleic acid molecule of any one of claims 9-16.
  • 23. An rAAV particle comprising: a) the nucleic acid molecule of any one of claims 1-6, and b) the nucleic acid molecule of any one of claims 9-16.
  • 24. The rAAV particle of claim 23 further comprising an rAAV2 particle, rAAV6 particle, rAAV8 particle, rPHP.B particle, rPHP.eB particle, or rAAV9 particle, or a variant thereof.
  • 25. The rAAV particle of claim 23 or 24, wherein the rAAV particle is an rAAV9 particle.
  • 26. The composition of claim 22 or the rAAV particle of any one of claims 23-25, wherein the first promoter of the nucleic acid molecule of any one of claims 1-6 and the first promoter of the nucleic acid molecule of any one of claims 9-16 are the same.
  • 27. The composition of claim 22 or the rAAV particle of any one of claims 23-25, wherein the second promoter of the nucleic acid molecule of any one of claims 1-6 and the second promoter of the nucleic acid molecule of any one of claims 9-16 are the same.
  • 28. A composition comprising: (i) a first nucleotide sequence encoding a N-terminal portion of a Cas9 protein fused at its C-terminus to an intein-N; and (ii) a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to a first promoter, wherein at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and wherein the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence.
  • 29. The composition of claim 28, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to at least one bipartite nuclear localization signal.
  • 30. The composition of claim 28 or 29, wherein the N-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-570, 1-571, 1-572, 1-573, 1-574, 1-575, 1-576, 1-634, 1-635, 1-636, 1-637, 1-638, 1-639, or 1-640 of SEQ ID NO: 3, or amino acids 1-431, 1-453, 1-457, 1-484, 1-501, 1-534, or 1-537 of SEQ ID NO: 11.
  • 31. The composition of any one of claims 28-30, wherein the C-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 571-1368, 572-1368, 573-1368, 574-1368, 575-1368, 576-1368, 577-1368, 635-1368, 636-1368, 637-1368, 638-1368, 639-1368, 640-1368, or 641-1368 of SEQ ID NO: 3, or amino acids 432-1054, 454-1054, 458-1054, 485-1054, 502-1054, 535-1054, or 538-1054 of SEQ ID NO: 11.
  • 32. The composition of any one of claims 28-31, wherein the N-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-573 or 1-637 of SEQ ID NO: 11 or SEQ ID NO: 3.
  • 33. The composition of any one of claims 28-32, wherein the C-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 574-1368 or 638-1368 of SEQ ID NO: 11 or SEQ ID NO: 3.
  • 34. The composition of any one of claims 28-33, wherein the intein-N comprises the amino acid sequence as set forth in SEQ ID NO: 351 or 355.
  • 35. The composition of any one of claims 28-34, wherein the intein-C comprises the amino acid sequence as set forth in SEQ ID NO: 353 or 357.
  • 36. The composition of any one of claims 28-33, wherein the intein-N comprises the amino acid sequence as set forth in SEQ ID NO: 351.
  • 37. The composition of any one of claims 28-34, wherein the intein-C comprises the amino acid sequence as set forth in SEQ ID NO: 353.
  • 38. The composition of any one of claims 28-37, wherein the first nucleotide sequence or the second nucleotide sequence further comprises a transcriptional terminator.
  • 39. The composition of claim 38, wherein the transcriptional terminator is the transcriptional terminator from a bGH gene.
  • 40. The composition of any one of claims 28-39, wherein the first nucleotide sequence or the second nucleotide sequence further comprises a WPRE inserted 5′ of the transcriptional terminator.
  • 41. The composition of any one of claims 28-40, wherein the bipartite nuclear localization signal comprises an amino acid sequence selected from the group consisting of:
  • 42. The composition of any one of claims 28-41, wherein the bipartite nuclear localization signal comprises the amino acid sequence as set forth in SEQ ID NO: 344 or 398.
  • 43. The composition of any one of claims 28-42, wherein the Cas9 protein is a catalytically inactive Cas9 (dCas9) or a Cas9 nickase (nCas9), and wherein the first nucleotide sequence of (i) further comprises a nucleotide sequence encoding a nucleobase modifying enzyme fused to the N-terminus of the N-terminal portion of the Cas9 protein.
  • 44. The composition of any one of claims 28-42, wherein the Cas9 protein is a catalytically inactive Cas9 (dCas9) or a Cas9 nickase (nCas9), and wherein the second nucleotide sequence of (ii) further comprises a nucleotide sequence encoding a nucleobase modifying enzyme fused to the C-terminus of the C-terminal portion of the Cas9 protein.
  • 45. The composition of claim 43 or 44, wherein the nucleobase modifying enzyme is a deaminase.
  • 46. The composition of claim 45, wherein the deaminase is a cytosine deaminase.
  • 47. The composition of claim 45, wherein the deaminase is an adenosine deaminase.
  • 48. The composition of any one of claims 28-47, wherein the second nucleotide sequence of (ii) further comprises a nucleotide sequence encoding a uracil glycosylase inhibitor (UGI) at the 3′ end of the second nucleotide sequence.
  • 49. The composition of claim 48, wherein the UGI comprises the amino acid sequence as set forth in any one of SEQ ID NOs: 299-302.
  • 50. The composition of any one of claims 28-49, wherein the first promoter is a Cbh promoter.
  • 51. The composition of any one of claims 28-49, wherein the second promoter is a U6 promoter.
  • 52. The composition of any one of claims 28-51, wherein the first nucleotide sequence and the second nucleotide sequence are on different vectors.
  • 53. The composition of claim 52, wherein each of the different vectors is a genome of a recombinant adeno-associated virus (rAAV).
  • 54. The composition of claim 53, wherein each vector is packaged in a rAAV particle.
  • 55. The composition of claim 54, wherein the rAAV particle is an rAAV2 particle, rAAV6 particle, rAAV8 particle, rPHP.B particle, rPHP.eB particle, or rAAV9 particle, or a variant thereof.
  • 56. The composition of claim 55, wherein the rAAV particle is an rAAV9 particle.
  • 57. A composition, comprising: (i) a first recombinant adeno associated virus (rAAV) particle comprising a first nucleotide sequence encoding a N-terminal portion of a Cas9 protein fused at its C-terminus to an intein-N; and (ii) a second recombinant adeno associated virus (rAAV) particle comprising a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to a first promoter, wherein at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and wherein the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence.
  • 58. A cell comprising at least one of a) the nucleic acid molecule of any one of claims 1-6, b) the nucleic acid molecule of any one of claims 9-16, and c) the nucleic acid molecule of any one of claims 19-21.
  • 59. A cell comprising the composition of any one of claims 7, 17, 22, or 26-57.
  • 60. A cell comprising the rAAV particle of any one of claims 8, 18, or 23-25.
  • 61. The cell of any one of claims 58-60, wherein the N-terminal portion of the Cas9 protein and the C-terminal portion of the Cas9 protein are joined together to form the Cas9 protein.
  • 62. The cell of any one of claims 58-61, wherein the cell is a prokaryotic cell.
  • 63. The cell of claim 62, wherein the cell is a bacterial cell.
  • 64. The cell of any one of claims 58-61, wherein the cell is a eukaryotic cell.
  • 65. The cell of claim 64, wherein the cell is a yeast cell, a plant cell, or a mammalian cell.
  • 66. The cell of claim 65, wherein the cell is a human cell.
  • 67. A kit comprising the composition of any one of claims 7, 17, 22, or 26-57.
  • 68. A kit comprising the rAAV particle of any one of claims 8, 18, or 23-25.
  • 69. A composition comprising: (i) a first nucleotide sequence encoding a N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N; and (ii) a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the nucleobase editor, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to a first promoter, wherein at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and wherein the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence.
  • 70. The composition of claim 69, wherein the intein-N comprises the amino acid sequence as set forth in SEQ ID NO: 351 or 355.
  • 71. The composition of claim 69 or 70, wherein the intein-C comprises the amino acid sequence as set forth in SEQ ID NO: 353 or 357.
  • 72. The composition of claim 69, wherein the intein-N comprises the amino acid sequence as set forth in SEQ ID NO: 351.
  • 73. The composition of claim 69 or 72, wherein the intein-C comprises the amino acid sequence as set forth in SEQ ID NO: 353.
  • 74. The composition of any one of claims 69-73, wherein the first nucleotide sequence or the second nucleotide sequence further comprises a transcriptional terminator.
  • 75. The composition of any one of claims 69-74, wherein the transcriptional terminator is a transcriptional terminator from a bGH gene, hGH gene, or SV40 gene.
  • 76. The composition of any one of claims 69-75, wherein the transcriptional terminator is the transcriptional terminator from a bGH gene.
  • 77. The composition of any one of claims 69-76, wherein the first nucleotide sequence or the second nucleotide sequence further comprises a WPRE inserted 5′ of the transcriptional terminator.
  • 78. The composition of any one of claims 69-77, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to at least one bipartite nuclear localization signal.
  • 79. The composition of any one of claims 69-78, wherein the bipartite nuclear localization signal comprises an amino acid sequence selected from the group consisting of:
  • 80. The composition of claim 79, wherein the bipartite nuclear localization signal comprises the amino acid sequence as set forth in SEQ ID NO: 344 or 398.
  • 81. The composition of any one of claims 69-80, wherein the nucleobase editor comprises a cytosine deaminase fused to the N-terminus of a catalytically inactive Cas9 or a Cas9 nickase.
  • 82. The composition of claim 81, wherein the cytosine deaminase is selected from the group consisting of: APOBEC1, APOBEC3, AID, and pmCDA1.
  • 83. The composition of claim 81 or 82, wherein the nucleobase editor further comprises a uracil glycosylase inhibitor (UGI).
  • 84. The composition of claim 83, wherein the UGI comprises the amino acid sequence of any one of SEQ ID NOs: 299-302.
  • 85. The composition of any one of claims 69-84, wherein the first promoter is a Cbh promoter.
  • 86. The composition of any one of claims 69-85, wherein the second promoter is a U6 promoter.
  • 87. The composition of any one of claims 69-86, wherein the nucleobase editor comprises an amino acid sequence having at least 90% identity, at least 95% identity, or at least 99% identity to the amino acid sequence as set forth in any one of SEQ ID NOs: 365, 372, 388, 399, 478, 482, 483, and 490.
  • 88. The composition of any one of claims 69-87, wherein the first nucleotide sequence and the second nucleotide sequence are on different vectors.
  • 89. The composition of claim 88, wherein each of the different vectors is a genome of a recombinant adeno-associated virus (rAAV).
  • 90. The composition of claim 89, wherein the vector is packaged in a rAAV particle.
  • 91. An rAAV particle comprising: (i) a first nucleotide sequence encoding an N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N; and (ii) a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the nucleobase editor, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to a first promoter, wherein at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and wherein the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence.
  • 92. The rAAV particle of claim 91, further comprising an rAAV2 particle, rAAV6 particle, rAAV8 particle, rPHP.B particle, rPHP.eB particle, or rAAV9 particle, or a variant thereof.
  • 93. The rAAV particle of claim 92, further comprising an rAAV9 particle.
  • 94. A composition comprising: (i) a first recombinant adeno-associated virus (rAAV) particle comprising a first nucleotide sequence encoding an N-terminal portion of a nucleobase editor fused at its C-terminus to an intein-N; and (ii) a second recombinant adeno-associated virus (rAAV) particle comprising a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the nucleobase editor, wherein at least one of the first nucleotide sequence and second nucleotide sequence is operably linked to a first promoter, wherein at least one of the first nucleotide sequence and second nucleotide sequence comprises at its 3′ end a gRNA nucleic acid segment encoding a guide RNA (gRNA) operably linked to a second promoter, and wherein the direction of transcription of the gRNA nucleic acid segment is reversed relative to the direction of transcription of the at least one nucleotide sequence.
  • 95. A cell comprising the composition of any one of claims 69-90 or the rAAV particle of any one of claims 91-93.
  • 96. The cell of claim 95, wherein the N-terminal portion of the nucleobase editor and the C-terminal portion of the nucleobase editor are joined together to form the nucleobase editor.
  • 97. The cell of claim 95 or 96, wherein the cell is a prokaryotic cell.
  • 98. The cell of claim 97, wherein the cell is a bacterial cell.
  • 99. The cell of claim 95 or 96, wherein the cell is a eukaryotic cell.
  • 100. The cell of claim 99, wherein the cell is a yeast cell, a plant cell, or a mammalian cell.
  • 101. The cell of claim 100, wherein the cell is a human cell.
  • 102. A kit comprising the composition of any one of claims 69-90 or the rAAV particle of any one of claims 91-93.
  • 103. A method comprising: contacting a cell with the composition of any one of claims 7, 17, 22, or 26-57 or the rAAV particle of any one of claims 8, 18, or 23-25, wherein the contacting results in the delivery of the first nucleotide sequence and the second nucleotide sequence into the cell, and wherein the N-terminal portion of the Cas9 protein and the C-terminal portion of the Cas9 protein are joined to form a Cas9 protein.
  • 104. A method comprising: contacting a cell with the composition of any one of claims 69-90 or the rAAV particle of any one of claims 91-93, wherein the contacting results in the delivery of the first nucleotide sequence and the second nucleotide sequence into the cell, and wherein the N-terminal portion of the nucleobase editor and the C-terminal portion of the nucleobase editor are joined to form a nucleobase editor.
  • 105. The method of claim 103 or 104, wherein the cell is a eukaryotic cell.
  • 106. The method of claim 105, wherein the cell is a mammalian cell.
  • 107. The method of claim 106, wherein the cell is a human cell.
  • 108. The method of claim 106 or 107, wherein the cell is a retinal cell.
  • 109. The method of claim 108, wherein the step of contacting results in an editing efficiency of at least about 40%, at least about 45%, at least about 47%, at least about 48%, at least about 49%, at least about 50%, or at least about 55%.
  • 110. The method of claim 106 or 107, wherein the cell is a cortical cell.
  • 111. The method of claim 110, wherein the step of contacting results in an editing efficiency of at least about 50%, at least about 55%, at least about 57%, at least about 58%, at least about 59%, at least about 60%, at least about 61%, or at least about 65%.
  • 112. The method of claim 106 or 107, wherein the cell is a cerebellar cell.
  • 113. The method of claim 112, wherein the step of contacting results in an editing efficiency of at least about 30%, at least about 32%, at least about 34%, at least about 35%, at least about 36%, at least about 37%, or at least about 40%.
  • 114. The method of any one of claims 103-113, wherein the step of contacting results in a base edit:indel ratio of at least about 5:1, 7:1, 8:1, 9:1, 10:1, 11:1, 12:1, 13:1 or greater than about 15:1.
  • 115. A method comprising: administering to a subject in need thereof a therapeutically effective amount of the composition of any one of claims 7, 17, 22, 26-57, or 69-90, or the rAAV particle of any one of claims 8, 18, 23-25, or 91-93.
  • 116. The method of claim 115, wherein the subject has a disease or disorder.
  • 117. The method of claim 116, wherein the disease or disorder is selected from the group consisting of: cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), chronic obstructive pulmonary disease (COPD), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy, hereditary lymphedema, familial Alzheimer's disease, prion disease, chronic infantile neurologic cutaneous articular syndrome (CINCA), Niemann-Pick disease type C (NPC), congenital deafness, and desmin-related myopathy (DRM).
  • 118. The method of claim 117, wherein the disease or disorder is Niemann-Pick disease, type C1 (NPC1).
  • 119. The method of any one of claims 115-118, wherein the rAAV particle is administered in a therapeutically effective amount of about 10^15, about 10^14, about 10^13, about 10^12, or less than about 10^12 vector genomes (vgs) per kg weight of the subject.
  • 120. The method of any one of claims 116-119, wherein the disease or disorder is associated with a point mutation in an NPC1 gene, a DNMT1 gene, a PCSK9 gene, or a Tmc1 gene.
  • 121. The method of claim 120, wherein the point mutation is a T3182C mutation in NPC1 or an A545G mutation in TMC1.
  • 122. The composition of any one of claims 28-57 or 69-90, wherein the Cas9 protein comprises a Cas9 selected from S. pyogenes Cas9, S. pyogenes Cas9 nickase, S. aureus Cas9, and S. aureus Cas9 nickase.
  • 123. The composition of any one of claims 28-31, wherein the N-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 1-534 of SEQ ID NO: 11.
  • 124. The composition of any one of claims 28-32, wherein the C-terminal portion of the Cas9 protein comprises a portion of any one of SEQ ID NOs: 1-129, 143-275, 282-291, 394-397, 435-437, 519-549, and 554-556 that corresponds to amino acids 535-1054 of SEQ ID NO: 11.
  • 125. The composition of any one of claims 69-86, wherein the nucleobase editor comprises an amino acid sequence having at least 90% identity, at least 95% identity, or at least 99% identity to the amino acid sequence as set forth in any one of SEQ ID NOs: 303-313, 362, 364, 365, 369-372, 399-406, 482, 489-490, 515-518, and 550-552.
  • 126. The composition of any one of claims 69-86, wherein the nucleobase editor comprises an amino acid sequence having at least 90% identity, at least 95% identity, or at least 99% identity to the amino acid sequence as set forth in any one of SEQ ID NOs: 323-342, 379-383, 385-388, 458-478, 480, 483, and 553.
  • 127. The composition of any one of claims 69-90 or 122-126, wherein the guide RNA comprises a nucleic acid sequence that is at least 90%, at least 95%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 669-743.
  • 128. The composition of claim 127, wherein the guide RNA comprises a nucleic acid sequence selected from the group consisting of
  • 129. The nucleic acid molecule of any one of claims 1-6, wherein the nucleic acid molecule comprises a sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 642, 644, 646, 648, 650, and 652.
  • 130. The nucleic acid molecule of any one of claims 9-16, wherein the nucleic acid molecule comprises a sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 643, 645, 647, 649, 651, and 653.
  • 131. A composition comprising the nucleic acid molecule of claim 129, and the nucleic acid molecule of claim 130.
  • 132. An rAAV particle comprising the nucleic acid molecule of claim 129, and the nucleic acid molecule of claim 130.
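
For illustration only, the arithmetic underlying the per-kilogram dose recited in claim 119 and the base edit:indel ratio recited in claim 114 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the 70 kg subject weight, and the example percentages are hypothetical and do not appear in the claims or the specification.

```python
"""Illustrative arithmetic only; all names and example values are hypothetical."""


def total_vector_genomes(dose_vg_per_kg: float, weight_kg: float) -> float:
    """Scale a per-kilogram dose (vg/kg) to a total dose for one subject."""
    if dose_vg_per_kg <= 0 or weight_kg <= 0:
        raise ValueError("dose and weight must be positive")
    return dose_vg_per_kg * weight_kg


def edit_to_indel_ratio(edit_pct: float, indel_pct: float) -> float:
    """Return the base edit:indel ratio from two percentages of sequenced alleles."""
    if edit_pct < 0 or indel_pct <= 0:
        raise ValueError("edit_pct must be >= 0 and indel_pct must be > 0")
    return edit_pct / indel_pct


def meets_ratio_threshold(edit_pct: float, indel_pct: float, min_ratio: float) -> bool:
    """Check whether the observed ratio is at least min_ratio (e.g. 10 for 10:1)."""
    return edit_to_indel_ratio(edit_pct, indel_pct) >= min_ratio


if __name__ == "__main__":
    # Hypothetical subject: 70 kg body weight, dosed at about 10^13 vg/kg.
    print(f"total dose: {total_vector_genomes(1e13, 70.0):.1e} vg")
    # Hypothetical sequencing result: 50% base editing, 5% indels.
    print(f"edit:indel ratio: {edit_to_indel_ratio(50.0, 5.0):.0f}:1")
```

Under these assumed values, a dose of about 10^13 vg/kg corresponds to a total of 7 x 10^14 vector genomes for the 70 kg subject, and 50% base editing with 5% indels corresponds to a 10:1 base edit:indel ratio.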
RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Applications, U.S. Ser. No. 62/850,523, filed May 20, 2019, and U.S. Ser. No. 62/949,275, filed Dec. 17, 2019, each of which is incorporated herein by reference.

GOVERNMENT SUPPORT

This invention was made with government support under grant numbers UG3 TR002636, U01 AI142756, RM1 HG009490, R35 GM118062, and R01 EB022376 awarded by the National Institutes of Health. The government has certain rights in the invention.

PCT Information
Filing Document | Filing Date | Country | Kind
PCT/US2020/033873 | 5/20/2020 | WO |

Provisional Applications (2)
Number | Date | Country
62949275 | Dec 2019 | US
62850523 | May 2019 | US