BASE EDITING ENZYMES

BACKGROUND

Cas enzymes, along with their associated Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) guide ribonucleic acids (RNAs), appear to be a pervasive (~45% of bacteria, ~84% of archaea) component of prokaryotic immune systems, serving to protect such microorganisms against non-self nucleic acids, such as infectious viruses and plasmids, by CRISPR-RNA guided nucleic acid cleavage. While the deoxyribonucleic acid (DNA) elements encoding CRISPR RNA elements may be relatively conserved in structure and length, their CRISPR-associated (Cas) proteins are highly diverse, containing a wide variety of nucleic acid-interacting domains. While CRISPR DNA elements have been observed as early as 1987, the programmable endonuclease cleavage ability of CRISPR complexes has only been recognized relatively recently, leading to the use of recombinant CRISPR systems in diverse DNA manipulation and gene editing applications.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. The XML copy, created on Jun. 6, 2024, is named 55921-742.301v3.xml and is 2,368,638 bytes in size.

SUMMARY

In some aspects, the present disclosure provides for a method of deaminating a cytosine residue in a eukaryotic nucleic acid sequence in a cell, comprising: contacting to said eukaryotic nucleic acid sequence a polypeptide with cytosine deaminase activity comprising a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof. In some embodiments, said eukaryotic nucleic acid sequence is a mammalian, primate, or human nucleic acid sequence. In some embodiments, said cell is a mammalian, primate, or human cell. In some embodiments, said eukaryotic nucleic acid sequence comprises single-stranded DNA (ssDNA) or ribonucleic acid (RNA). In some embodiments, said polypeptide with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 970-982, or a variant thereof. 
In some embodiments, said polypeptide with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 808, 810-811, 819, 826, 752, 777, or 823, or a variant thereof. In some embodiments, said eukaryotic nucleic acid sequence comprises double-stranded DNA (dsDNA). In some embodiments, said polypeptide with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 810-811. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nickase. In some embodiments, said polypeptide with cytosine deaminase activity further comprises said endonuclease or said nickase, wherein said endonuclease or said nickase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, 1120, 1122-1127, 1647, or a variant thereof. 
In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nickase, wherein said nickase comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, said uracil DNA glycosylase inhibitor comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, said FAM72A sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1121, or a variant thereof.
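
The percent sequence identity thresholds recited above can be illustrated with a minimal sketch. This passage does not fix an alignment algorithm, so the example below assumes the two sequences have already been aligned (equal length, with '-' marking gaps) and, as a further assumption, divides the number of identical columns by the total number of alignment columns. The function names and the toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: identity = identical, non-gap columns / total alignment columns.
    Other conventions (e.g. dividing by the shorter sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy 10-column alignment: one mismatch (I/L) and one gap, 8 identical columns.
    reference = "MKTAYIAKQR"
    candidate = "MKTAYLAK-R"
    print(percent_identity(reference, candidate))       # 80.0
    print(meets_threshold(reference, candidate, 80.0))  # True
```

Under this convention a gap counts against identity in the same way as a mismatch; a different gap treatment would change the reported percentage.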

In some aspects, the present disclosure provides for a method of deaminating a cytosine residue in a primate nucleic acid sequence in a cell, comprising: contacting to a primate nucleic acid sequence a polypeptide with cytosine deaminase activity comprising a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 599-638, 660-675, 828-835, or a variant thereof. In some embodiments, said eukaryotic nucleic acid sequence comprises double-stranded DNA (dsDNA), single-stranded DNA (ssDNA) or ribonucleic acid (RNA). In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nickase. In some embodiments, said polypeptide with cytosine deaminase activity further comprises said endonuclease or said nickase, wherein said endonuclease or said nickase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, 1120, 1122-1127, 1647, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nickase, wherein said nickase comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof. 
In some embodiments, said polypeptide with cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, said uracil DNA glycosylase inhibitor comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, said FAM72A sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1121, or a variant thereof.

In some aspects, the present disclosure provides for a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in a mammalian organism, wherein said nucleic acid encodes a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof. In some embodiments, said nucleic acid encodes a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982, or a variant thereof.

In some aspects, the present disclosure provides for a nucleic acid encoding any of the polypeptides described herein.

In some aspects, the present disclosure provides for a vector comprising any of the nucleic acids described herein.

In some aspects, the present disclosure provides for a fusion polypeptide comprising: (a) a domain with cytosine deaminase activity comprising a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof; and (b) a nucleic acid binding domain, an endonuclease domain, or a nickase domain. In some embodiments, said domain with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982, or a variant thereof. In some embodiments, said domain with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, or a variant thereof. 
In some embodiments, said fusion polypeptide comprises said endonuclease domain or said nickase domain, wherein said endonuclease domain or said nickase domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof. In some embodiments, said fusion protein comprises said nickase domain, wherein said nickase domain comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof. In some embodiments, said fusion protein comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 877-916 or 968-969, or a variant thereof.

In some aspects, the present disclosure provides for a system comprising: (a) any of the fusion proteins described herein (e.g., endonuclease-base editor or endonuclease-deaminase fusions); and (b) an engineered guide polynucleotide configured to form a complex with said endonuclease domain comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to said endonuclease domain. In some embodiments, said engineered guide polynucleotide further comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 917-931, 963-967, 1099-1105, or a variant thereof.

In some aspects, the present disclosure provides for a polypeptide with adenosine deaminase activity comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 50, 51, 385-443, 448-475, or a variant thereof, wherein said polypeptide comprises a substitution at at least one of residues T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150, or S165, or any combination thereof relative to SEQ ID NO: 50 when optimally aligned. In some embodiments, said substitution comprises T2X₁, D7X₁, E10X₁, M13X₄, W24X₁, G32X₁, K38X₂, G45X₂, G51X₅, A63X₇, E66X₅, E66X₂, R75H, C91R, G93X₆, H97X₆, H97X₅, A107X₅, E108X₂, D109N, P110H, H124X₆, A126X₂, H129R, H129N, F150P, F150S, S165X₅, or any combination thereof relative to SEQ ID NO: 50 or MG68-4 when optimally aligned, wherein X₁ is A or G; X₂ is D or E; X₃ is N or Q; X₄ is R or K; X₅ is I, L, M, or V; X₆ is F, Y, or W; and X₇ is S or T. In some embodiments, said polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 836-860, or a variant thereof. In some embodiments, said polypeptide comprises any one of SEQ ID NOs: 839, 841, 843, 844, 847, 848, 849, 850, 851, 852, 859, or a variant thereof. In some embodiments, said substitution comprises W24G, G51V, E108D, P110H, F150P, D7G, E10G, or H129N, or any combination thereof, relative to SEQ ID NO: 50 or MG68-4 when optimally aligned.
In some embodiments, said polypeptide further comprises a nucleic acid binding domain, an endonuclease domain, or a nickase domain. In some embodiments, said polypeptide comprises said endonuclease domain or said nickase domain, wherein said endonuclease domain or said nickase domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof. In some embodiments, said polypeptide comprises said nickase domain, wherein said nickase domain comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof.
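
The substitution shorthand used in this aspect (for example M13X₄, where X₄ is R or K) can be expanded into explicit point substitutions. The following is a minimal sketch: the residue classes X₁ through X₇ are copied from the definitions recited above, while the function name `expand_substitution` and the parsing approach are assumptions made only for illustration.

```python
import re

# Residue classes as recited above: X1 = A/G, X2 = D/E, X3 = N/Q, X4 = R/K,
# X5 = I/L/M/V, X6 = F/Y/W, X7 = S/T.
RESIDUE_CLASSES = {
    "X1": "AG",
    "X2": "DE",
    "X3": "NQ",
    "X4": "RK",
    "X5": "ILMV",
    "X6": "FYW",
    "X7": "ST",
}

# Map Unicode subscript digits to ASCII so "X₄" and "X4" are both accepted.
_SUBSCRIPTS = str.maketrans("₁₂₃₄₅₆₇", "1234567")
_PATTERN = re.compile(r"([A-Z])(\d+)(X[1-7]|[A-Z])")


def expand_substitution(shorthand: str) -> list[str]:
    """Expand e.g. 'M13X₄' into ['M13R', 'M13K']; 'R75H' is returned unchanged."""
    normalized = shorthand.translate(_SUBSCRIPTS)
    match = _PATTERN.fullmatch(normalized)
    if match is None:
        raise ValueError(f"unrecognized substitution: {shorthand!r}")
    wild_type, position, target = match.groups()
    # A plain residue letter expands to itself; a class label expands to its members.
    replacements = RESIDUE_CLASSES.get(target, target)
    return [f"{wild_type}{position}{residue}" for residue in replacements]


if __name__ == "__main__":
    print(expand_substitution("M13X₄"))  # ['M13R', 'M13K']
    print(expand_substitution("G51X₅"))  # ['G51I', 'G51L', 'G51M', 'G51V']
    print(expand_substitution("R75H"))   # ['R75H']
```

The expansion only enumerates the notation; whether a given substitution is present in a candidate polypeptide still depends on the optimal alignment to SEQ ID NO: 50 or MG68-4.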

In some aspects, the present disclosure provides for a system comprising: (a) any of the polypeptides or fusion polypeptides described herein; and (b) an engineered guide polynucleotide configured to form a complex with said endonuclease domain comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to said endonuclease domain. In some embodiments, said engineered guide polynucleotide further comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 917-931, 963-967, 1099-1105, or a variant thereof.

In some aspects, the present disclosure provides for a method of deaminating a cytosine residue in a cell, comprising introducing to said cell: (a) a vector encoding a polypeptide with cytosine deaminase activity; and (b) a vector encoding a FAM72A protein. In some embodiments, said vector encoding said FAM72A protein comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1115, or a variant thereof, or encodes a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1121, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease domain, or a nickase domain. 
In some embodiments, said polypeptide with cytosine deaminase activity comprises said endonuclease domain or said nickase domain, wherein said endonuclease domain or said nickase domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity comprises said nickase domain, wherein said nickase domain comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof.
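
The aspartate to alanine nickase mutations recited above are tied to a specific residue number for each reference sequence. A minimal sketch of that bookkeeping follows. The position table is copied from the text; the helper name `aspartate_to_alanine` and the short toy sequence are hypothetical, and the toy sequence is not SEQ ID NO: 70 or any other sequence of the Sequence Listing.

```python
# Residue number (1-based) of the aspartate to alanine mutation, keyed by SEQ ID NO,
# as recited in the text above.
NICKASE_D_TO_A_POSITION = {
    70: 9,
    71: 13,
    72: 13,
    74: 13,
    73: 12,
    75: 17,
    76: 23,
    597: 10,
}


def aspartate_to_alanine(sequence: str, position: int) -> str:
    """Return `sequence` with the aspartate (D) at 1-based `position` replaced by alanine (A)."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != "D":
        raise ValueError(
            f"expected aspartate (D) at position {position}, found {sequence[index]!r}"
        )
    return sequence[:index] + "A" + sequence[index + 1:]


if __name__ == "__main__":
    # Hypothetical toy sequence with an aspartate at residue 9 (illustration only).
    toy_sequence = "MKKYSIGLDIGTNS"
    print(aspartate_to_alanine(toy_sequence, NICKASE_D_TO_A_POSITION[70]))
    # MKKYSIGLAIGTNS
```

Checking that the residue at the stated position is actually an aspartate guards against applying the mutation to a sequence that is numbered differently from the reference.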

In some aspects, the present disclosure provides for an engineered nucleic acid editing polypeptide, comprising (i) a sequence with cytosine deaminase activity; and (ii) a sequence derived from a FAM72A protein. In some embodiments, said sequence with cytosine deaminase activity has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof. In some embodiments, said sequence derived from said FAM72A protein has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1121, or a variant thereof. In some embodiments, the polypeptide further comprises an endonuclease sequence comprising a RuvC domain and an HNH domain, wherein said endonuclease sequence is a sequence of a class 2, type II endonuclease. In some embodiments, said RuvC domain lacks nuclease activity. In some embodiments, said endonuclease comprises a nickase. In some embodiments, said class 2, type II endonuclease sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof. 
In some embodiments, said class 2, type II endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597 when optimally aligned.

In some aspects, the present disclosure provides for a method of editing a cytosine residue to a thymine residue in a cell, comprising contacting to said cell any of the cytosine deaminase fusion polypeptides described herein. In some embodiments, said cell is a prokaryotic, eukaryotic, mammalian, primate, or human cell.

In some aspects, the present disclosure provides for an engineered nucleic acid editing polypeptide, comprising: a plurality of domains derived from a Class 2, Type II endonuclease, wherein said domains comprise RUVC-I, REC, HNH, RUVC-III, and WED domains; and a domain comprising a base editor sequence, wherein said base editor sequence is inserted: (a) within said RUVC-I domain; (b) within said REC domain; (c) within said HNH domain; (d) within said RUVC-III domain; (e) within said WED domain; (f) prior to said HNH domain; (g) prior to said RUVC-III domain; or (h) between said RUVC-III and said WED domain. In some embodiments, said Class 2, Type II endonuclease comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof. In some embodiments, said Class 2, Type II endonuclease comprises a sequence having at least 80% sequence identity to SEQ ID NO: 1647, or a variant thereof. In some embodiments, said base editor sequence comprises a deaminase sequence. In some embodiments, said deaminase sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, 50, 51, 385-443, 448-475, or a variant thereof.
In some embodiments, said deaminase sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof. In some embodiments, said deaminase sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 50, 51, 385-443, 448-475, or a variant thereof. In some embodiments, said deaminase has at least 80% sequence identity to SEQ ID NO: 386, or a variant thereof. In some embodiments, said deaminase sequence comprises a substitution of one of residues T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150, or S165, or any combination thereof relative to SEQ ID NO: 50 or MG68-4 when optimally aligned. In some embodiments, said engineered nucleic acid editing polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1128-1160, or a variant thereof. 
In some embodiments, said engineered nucleic acid editing polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1137, 1140, 1142, 1143, 1146, 1149, 1151-1158, or a variant thereof. In some embodiments, said engineered nucleic acid editing polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1139, 1152, 1158, or a variant thereof.

In some aspects, the present disclosure provides for a polypeptide with adenosine deaminase activity comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 50, 51, 385-443, 448-475, or a variant thereof, wherein said polypeptide comprises a substitution of a wild-type residue for a non-wild-type residue at residue 109 and one other residue comprising any one of 24, 37, 49, 52, 83, 85, 107, 110, 112, 120, 123, 124, 147, 148, 150, 156, 157, 158, 166, 167, or 129, or any combination thereof relative to SEQ ID NO: 386 when optimally aligned. In some embodiments, said sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 386. In some embodiments, the polypeptide comprises a substitution of 109N and at least one other substitution comprising any one of 24R, 37L, 49A, 52L, 83S, 85F, 107V, 110S, 112R, 120N, 123N, 124Y, 147C, 148Y, 148R, 150Y, 156V, 157F, 158N, 166I, or 129N, or any combination thereof relative to SEQ ID NO: 386 when optimally aligned. In some embodiments, the polypeptide comprises any of the substitutions depicted in FIG. 34B. In some embodiments, said polypeptide has at least 80% sequence identity to any one of SEQ ID NOs: 1161-1183, or a variant thereof.
In some embodiments, said polypeptide has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1170, 1179, or 1166, or a variant thereof. In some embodiments, said polypeptide further comprises an endonuclease or a nickase. In some embodiments, said polypeptide comprises said endonuclease or said nickase, wherein said endonuclease or said nickase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof. In some embodiments, said polypeptide comprises said nickase, wherein said nickase comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof.

In some aspects, the present disclosure provides for a polypeptide with cytosine deaminase activity comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof; wherein said polypeptide comprises at least one of the alterations described in Table 12C. In some embodiments, said polypeptide has at least one substitution of a wild-type amino acid for a non-wild-type amino acid comprising any one of W90A, W90F, W90H, W90Y, Y120F, Y120H, Y121F, Y121H, Y121Q, Y121A, Y121D, Y121W, H122Y, H122F, H122I, H122A, H122W, H122D, Y121T, R33A, R34A, R34K, H122A, R33A, R34A, R52A, N57G, H122A, E123A, E123Q, W127F, W127H, W127Q, W127A, W127D, R39A, K40A, H128A, N63G, R58A, H121F, H121Y, H121Q, H121A, H121D, H121W, R33A, K34A, H122A, H121A, R52A, P26R, P26A, N27R, N27A, W44A, W45A, K49G, S50G, R51G, R121A, I122A, N123A, Y88F, Y120F, P22R, P22A, K23A, K41R, K41A, E54A, E54A, E55A, K30A, K30R, M32A, M32K, Y117A, K118A, I119A, I119H, R120A, R121A, P46A, P46R, N29A, R27A, or N50G, or any combination thereof, optionally relative to an APOBEC polypeptide. In some embodiments, the polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1208-1315, or a variant thereof.

In some aspects, the present disclosure provides for a polypeptide with cytosine deaminase activity comprising: a cytosine deaminase sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 835, 1275, 668, 774, 818, 671, 667, 650, 827, 819, 823, 814, 813, 817, 628, 826, 1223, 834, 618, 621, 669, 833, 830, or a variant thereof; and an endonuclease or a nickase. In some embodiments, said endonuclease or said nickase comprises a sequence having at least 80% identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, or 1122-1127, 1647, or a variant thereof. In some embodiments, said polypeptide comprises said nickase, wherein said nickase comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof. In some embodiments, said cytosine deaminase sequence has at least 80% sequence identity to any one of SEQ ID NOs: 1275, 835, or 774, or a combination thereof.

In some aspects, the present disclosure provides for a method of editing an APOA1 locus in a cell, comprising contacting to said cell (a) an RNA-guided endonuclease; and (b) an engineered guide nucleic acid structure, wherein said engineered guide nucleic acid structure is configured to form a complex with said endonuclease and said engineered guide nucleic acid structure comprises a spacer sequence configured to hybridize to a region of said APOA1 locus, wherein said engineered guide nucleic acid structure comprises a targeting sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to at least 18, 19, 20, 21, 22, 23, 24, 25, or 26 consecutive nucleotides of any one of SEQ ID NOs: 1455-1478 or a reverse complement thereof. In some embodiments, said engineered guide nucleic acid structure has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1431-1454. In some embodiments, said engineered guide nucleic acid structure comprises any of the nucleotide modifications recited in Table 13A. In some embodiments, said RNA-guided endonuclease is a class 2, type II endonuclease. 
In some embodiments, said RNA-guided endonuclease has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof.
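As a rough illustration of comparing a targeting sequence against consecutive nucleotides of a reference sequence or its reverse complement, the following Python sketch scans ungapped windows and reports the best percent identity. The function names, the default window of 18, and the toy reference are assumptions made for illustration; the sketch does not use any sequence from the Sequence Listing and does not replace an alignment-based identity calculation.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _normalize(seq: str) -> str:
    """Upper-case the sequence and treat RNA (U) as DNA (T) for comparison."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return _normalize(seq).translate(_COMPLEMENT)[::-1]


def best_window_identity(targeting: str, reference: str, window: int = 18) -> float:
    """Best ungapped percent identity over any `window` consecutive nucleotides.

    Compares every `window`-length stretch of the reference (and of its
    reverse complement) with every same-length stretch of the targeting
    sequence. Returns 0.0 if either sequence is shorter than `window`.
    """
    t = _normalize(targeting)
    best = 0.0
    for ref in (_normalize(reference), reverse_complement(reference)):
        for i in range(len(ref) - window + 1):
            ref_win = ref[i:i + window]
            for j in range(len(t) - window + 1):
                matches = sum(x == y for x, y in zip(ref_win, t[j:j + window]))
                best = max(best, 100.0 * matches / window)
    return best


# Hypothetical 25-nucleotide reference (illustration only).
toy_reference = "ACGTACGTTAGCCGATAGCTAGGCT"
toy_targeting = toy_reference[2:22]  # a 20-nucleotide stretch of the reference
identity_forward = best_window_identity(toy_targeting, toy_reference)
identity_reverse = best_window_identity(reverse_complement(toy_reference), toy_reference)
```

A targeting sequence would satisfy an "at least 80%" threshold in this simplified sense when the returned value is 80.0 or greater for the chosen window length.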

In some aspects, the present disclosure provides for a method of editing an ANGPTL3 locus in a cell, comprising contacting to said cell (a) an RNA-guided endonuclease; and (b) an engineered guide nucleic acid structure, wherein said engineered guide nucleic acid structure is configured to form a complex with said endonuclease and said engineered guide nucleic acid structure comprises a spacer sequence configured to hybridize to a region of said ANGPTL3 locus, wherein said engineered guide nucleic acid structure comprises a targeting sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to at least 18, 19, 20, 21, 22, 23, 24, 25, or 26 consecutive nucleotides of any one of SEQ ID NOs: 1484-1488 or a reverse complement thereof. In some embodiments, said engineered guide nucleic acid structure has at least 80% identity to any one of SEQ ID NOs: 1479-1483. In some embodiments, said engineered guide nucleic acid structure comprises any of the nucleotide modifications recited in Table 13A. In some embodiments, said RNA-guided endonuclease is a class 2, type II endonuclease. In some embodiments, said RNA-guided endonuclease has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof.

In some aspects, the present disclosure provides for a method of editing a TRAC locus in a cell, comprising contacting to said cell (a) an RNA-guided endonuclease; and (b) an engineered guide nucleic acid structure, wherein said engineered guide nucleic acid structure is configured to form a complex with said endonuclease and said engineered guide nucleic acid structure comprises a spacer sequence configured to hybridize to a region of said TRAC locus, wherein said engineered guide nucleic acid structure comprises a targeting sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to at least 18, 19, 20, 21, 22, 23, 24, 25, or 26 consecutive nucleotides of any one of SEQ ID NOs: 1491-1492 or a reverse complement thereof. In some embodiments, said engineered guide nucleic acid structure has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1489-1490. In some embodiments, said engineered guide nucleic acid structure comprises any of the nucleotide modifications recited in Table 13A. In some embodiments, said RNA-guided endonuclease is a class 2, type II endonuclease. 
In some embodiments, said RNA-guided endonuclease has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, 1122-1127, 1647, or a variant thereof.

In some aspects, the present disclosure provides for an engineered adenosine base editor polypeptide, wherein said polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1647-1653.

In some aspects, the present disclosure provides for a method of deaminating a cytosine residue in a eukaryotic nucleic acid sequence in a cell, comprising: contacting to said eukaryotic nucleic acid sequence a polypeptide with cytosine deaminase activity comprising a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof. In some embodiments, said eukaryotic nucleic acid sequence is a mammalian, primate, or human nucleic acid sequence. In some embodiments, said cell is a mammalian, primate, or human cell. In some embodiments, said eukaryotic nucleic acid sequence comprises single-stranded DNA (ssDNA) or ribonucleic acid (RNA). In some embodiments, said polypeptide with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 970-982, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity comprises a sequence having at least 80% identity to any one of SEQ ID NOs: 808, 810-811, 819, 826, 752, 777, or 823, or a variant thereof. In some embodiments, said eukaryotic nucleic acid sequence comprises double-stranded DNA (dsDNA). 
In some embodiments, said polypeptide with cytosine deaminase activity comprises a sequence having at least 80% identity to any one of SEQ ID NOs: 810-811. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nickase. In some embodiments, said polypeptide with cytosine deaminase activity further comprises said endonuclease or said nickase, wherein said endonuclease or said nickase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, 1120, or 1122-1127, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nickase, wherein said nickase comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, said uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. 
In some embodiments, said polypeptide with cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, said FAM72A sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1121, or a variant thereof.

In some aspects, the present disclosure provides for a method of deaminating a cytosine residue in a primate nucleic acid sequence in a cell, comprising: contacting to said primate nucleic acid sequence a polypeptide with cytosine deaminase activity comprising a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 599-638, 660-675, or 828-835, or a variant thereof. In some embodiments, said eukaryotic nucleic acid sequence comprises double-stranded DNA (dsDNA), single-stranded DNA (ssDNA) or ribonucleic acid (RNA). In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nucleic acid binding domain, an endonuclease, or a nickase. In some embodiments, said polypeptide with cytosine deaminase activity further comprises said endonuclease or said nickase, wherein said endonuclease or said nickase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, 1120, or 1122-1127, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a nickase, wherein said nickase comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof. 
In some embodiments, said polypeptide with cytosine deaminase activity further comprises a uracil DNA glycosylase inhibitor sequence. In some embodiments, said uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, said polypeptide with cytosine deaminase activity further comprises a FAM72A sequence. In some embodiments, said FAM72A sequence has at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1121, or a variant thereof.

In some aspects, the present disclosure provides for a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in a mammalian organism, wherein said nucleic acid encodes a sequence having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof. In some embodiments, said nucleic acid encodes a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982, or a variant thereof.

In some aspects, the present disclosure provides for a vector comprising any of the nucleic acids described herein. In some embodiments, the vector is a non-viral or a viral vector. In some embodiments the vector is a plasmid, minicircle, or plasmid vector. In some embodiments, the viral vector is an AAV vector.

In some aspects, the present disclosure provides for a fusion polypeptide comprising: (a) a domain with cytosine deaminase activity comprising a sequence having at least 80% identity to any one of SEQ ID NOs: 1-49, 444-447, 599-675, 744-835, 970-982, or a variant thereof; and (b) a nucleic acid binding domain, an endonuclease domain, or a nickase domain. In some embodiments, said domain with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, 668-671, 675, 650, 752, 774, 777, 806, 812, 816, 817, 818, 825, 827, 832, 832, 970-982, or a variant thereof. In some embodiments, said domain with cytosine deaminase activity comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 809-811, 819, 826, 752, 777, 823, or a variant thereof. 
In some embodiments, said fusion polypeptide comprises said endonuclease domain or said nickase domain, wherein said endonuclease domain or said nickase domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, or 1122-1127, or a variant thereof. In some embodiments, said fusion protein comprises said nickase domain, wherein said nickase domain comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof. In some embodiments, said fusion protein comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 877-916 or 968-969, or a variant thereof.

In some aspects, the present disclosure provides for a system comprising: (a) any of the fusion polypeptides described herein; and (b) an engineered guide polynucleotide configured to form a complex with said endonuclease domain comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to said endonuclease domain. In some embodiments, said engineered guide polynucleotide further comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 917-931, 963-967, or 1099-1105, or a variant thereof.

In some aspects, the present disclosure provides for a polypeptide with adenosine deaminase activity comprising: a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 50, 51, 385-443, 448-475, or a variant thereof, wherein said polypeptide comprises a substitution at at least one of residues T2, D7, E10, M13, W24, G32, K38, G45, G51, A63, E66, R75, C91, G93, H97, A107, E108, D109, P110, H124, A126, H129, F150, or S165, or any combination thereof relative to SEQ ID NO: 50 when optimally aligned. In some embodiments, said substitution comprises T2X₁, D7X₁, E10X₁, M13X₄, W24X₁, G32X₁, K38X₂, G45X₂, G51X₅, A63X₇, E66X₅, E66X₂, R75H, C91R, G93X₆, H97X₆, H97X₅, A107X₅, E108X₂, D109N, P110H, H124X₆, A126X₂, H129R, H129N, F150P, F150S, S165X₅, or any combination thereof relative to SEQ ID NO: 50 when optimally aligned, wherein X₁ is A or G; X₂ is D or E; X₃ is N or Q; X₄ is R or K; X₅ is I, L, M, or V; X₆ is F, Y, or W; and X₇ is S or T. In some embodiments, said polypeptide comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 836-860, or a variant thereof. In some embodiments, said polypeptide comprises any one of SEQ ID NOs: 839, 841, 843, 844, 847, 848, 849, 850, 851, 852, or 859. In some embodiments, said substitution comprises W24G, G51V, E108D, P110H, F150P, D7G, E10G, or H129N, or any combination thereof, relative to SEQ ID NO: 50 when optimally aligned. 
In some embodiments, said polypeptide further comprises a nucleic acid binding domain, an endonuclease domain, or a nickase domain. In some embodiments, said polypeptide comprises said endonuclease domain or said nickase domain, wherein said endonuclease domain or said nickase domain comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 70-78, 596, 597, Sequence Number: A598, SEQ ID NOs: 1120, or 1122-1127, or a variant thereof. In some embodiments, said polypeptide comprises said nickase domain, wherein said nickase domain comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597, or any combination thereof.
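The degenerate substitution notation recited above (e.g. K38X₂, where X₂ is D or E) can be expanded into concrete substitutions with a short Python sketch. The residue classes are copied from the text; the ASCII spelling `X2` for X₂, the function names, and the toy sequence are assumptions for illustration only. The sketch applies residue numbers directly and does not perform the optimal alignment to SEQ ID NO: 50 that the text refers to.

```python
import re

# Residue classes as defined in the text (X1 through X7).
RESIDUE_CLASSES = {
    "X1": "AG",
    "X2": "DE",
    "X3": "NQ",
    "X4": "RK",
    "X5": "ILMV",
    "X6": "FYW",
    "X7": "ST",
}

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)(X[1-7]|[A-Z])$")


def expand_substitution(code: str) -> list[str]:
    """Expand e.g. 'K38X2' into ['K38D', 'K38E']; 'R75H' stays ['R75H']."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"unrecognized substitution code: {code}")
    wild_type, position, replacement = match.groups()
    choices = RESIDUE_CLASSES.get(replacement, replacement)
    return [f"{wild_type}{position}{residue}" for residue in choices]


def apply_substitution(sequence: str, code: str) -> str:
    """Apply one concrete substitution (e.g. 'W3G') using 1-based numbering."""
    match = _SUBSTITUTION.match(code)
    if match is None or match.group(3) in RESIDUE_CLASSES:
        raise ValueError(f"expected a concrete substitution, got: {code}")
    wild_type, position, replacement = match.groups()
    index = int(position) - 1
    if index >= len(sequence) or sequence[index] != wild_type:
        raise ValueError(f"residue {position} is not {wild_type}")
    return sequence[:index] + replacement + sequence[index + 1:]


expanded_k38 = expand_substitution("K38X2")
expanded_g51 = expand_substitution("G51X5")
# Hypothetical toy sequence (illustration only), tryptophan at residue 3.
toy_variant = apply_substitution("MTWDE", "W3G")
```

Expanding each recited code in this way enumerates the concrete single-residue variants covered by a class, which can then be combined as the "any combination thereof" language contemplates.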

In some aspects, the present disclosure provides for a system comprising: (a) any of the polypeptides for base editor fusions described herein (e.g. endonuclease deaminase fusions); and (b) an engineered guide polynucleotide configured to form a complex with said endonuclease domain comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to said endonuclease domain. In some embodiments, said engineered guide polynucleotide further comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 917-931, 963-967, or 1099-1105.

In some aspects, the present disclosure provides for an engineered nucleic acid editing system, comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein said endonuclease is derived from an uncultivated microorganism, wherein said endonuclease is a class 2, type II endonuclease, wherein said endonuclease is configured to be deficient in nuclease activity; a base editor coupled to said endonuclease; and an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to said endonuclease. In some embodiments, said RuvC domain lacks nuclease activity. In some embodiments, said class 2, type II endonuclease comprises a nickase mutation. In some embodiments, said class 2, type II endonuclease comprises the aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597 when optimally aligned. In some embodiments, said endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NO: 72, or residue 17 relative to SEQ ID NO: 75 when optimally aligned. In some embodiments, said endonuclease comprises a sequence with at least 95% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof. 
In some aspects, the present disclosure provides for an engineered nucleic acid editing system comprising: an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof; a base editor coupled to said endonuclease; and an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to said endonuclease. In some aspects, the present disclosure provides for an engineered nucleic acid editing system comprising: an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of Sequence Numbers: A360-A368 or A598, or a variant thereof, wherein said endonuclease is a class 2, type II endonuclease, and wherein said endonuclease is configured to be deficient in nuclease activity; a base editor coupled to said endonuclease; and an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to said endonuclease. In some embodiments, said endonuclease comprises a nickase mutation. In some embodiments, said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, said class 2, type II endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597 when optimally aligned. 
In some embodiments, said base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, said base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51 or 385-390. In some embodiments, said RuvC domain lacks nuclease activity. In some embodiments, said endonuclease is derived from an uncultivated microorganism. In some embodiments, said endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, said endonuclease further comprises an HNH domain. In some embodiments, said engineered guide ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680, or a variant thereof. In some aspects, the present disclosure provides for an engineered nucleic acid editing system comprising, an engineered guide ribonucleic acid structure comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to an endonuclease, wherein said engineered ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680, or a variant thereof; a class 2, type II endonuclease configured to bind to said engineered guide ribonucleic acid; and a base editor coupled to said endonuclease. In some embodiments, said base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51 or 385-390. In some embodiments, said endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence selected from the group consisting of Sequence Numbers: A360-A368 or A598. 
In some embodiments, said base editor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, said base editor is an adenosine deaminase. In some embodiments, said adenosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, said base editor is a cytosine deaminase. In some embodiments, said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-49, 444-447, 594, or 58-66, or a variant thereof. In some embodiments, the system further comprises a uracil DNA glycosylase inhibitor coupled to said endonuclease or said base editor. In some embodiments, said uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67. In some embodiments, said engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides. In some embodiments, said engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising said guide ribonucleic acid sequence and said tracr ribonucleic acid sequence. In some embodiments, said guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaeal, eukaryotic, fungal, plant, mammalian, or human genomic sequence. In some embodiments, said guide ribonucleic acid sequence is 15-24 nucleotides in length. In some embodiments, said endonuclease comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease. In some embodiments, said NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs: 369-384, or a variant thereof. 
In some embodiments, said endonuclease is covalently coupled directly to said base editor or covalently coupled to said base editor through a linker. In some embodiments, said endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73 or 78, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, residue 8 relative to SEQ ID NO: 77, or residue 10 relative to SEQ ID NO: 597 when optimally aligned. In some embodiments, said endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NO: 72, or residue 17 relative to SEQ ID NO: 75 when optimally aligned. In some embodiments, a polypeptide comprises said endonuclease and said base editor. In some embodiments, said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, said system further comprises a source of Mg²⁺. In some embodiments: (a) said endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs: 70, 71, 73, 74, 76, 78, 77, or 78, or a variant thereof; (b) said guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to non-degenerate nucleotides of any one of SEQ ID NOs: 88, 89, 91, 92, 94, 96, 95, or 488; (c) said endonuclease is configured to bind to a PAM comprising any one of Sequence Numbers: A360, A361, A363, A365, A367, or A368; or (d) said base editor comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NOs: 58 or 595, or a variant thereof. 
In some embodiments: (a) said endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs: 70, 71, or 78, or a variant thereof; (b) said guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to non-degenerate nucleotides of at least one of SEQ ID NOs: 88, 89, or 96; (c) said endonuclease is configured to bind to a PAM comprising any one of Sequence Numbers: A360, A362, or A368; or (d) said base editor comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 594, or a variant thereof. In some embodiments, said sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm. In some embodiments, said sequence identity is determined by said BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment. In some embodiments, said endonuclease is configured to be catalytically dead. In some embodiments, said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

In some aspects, the present disclosure provides for a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein said nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein said endonuclease is derived from an uncultivated microorganism.

In some aspects, the present disclosure provides for a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein said nucleic acid encodes an endonuclease having at least 70% sequence identity to any one of SEQ ID NOs: 70-78 coupled to a base editor. In some embodiments, said endonuclease comprises a sequence encoding one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease. In some embodiments, said NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs: 369-384, or a variant thereof. In some embodiments, said organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.

In some aspects, the present disclosure provides for a vector comprising a nucleic acid sequence encoding a class 2, type II endonuclease coupled to a base editor, wherein said endonuclease is derived from an uncultivated microorganism.

In some aspects, the present disclosure provides for a vector comprising the nucleic acid of any of the aspects or embodiments described herein. In some embodiments, the vector further comprises a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to said endonuclease. In some embodiments, the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.

In some aspects, the present disclosure provides for a cell comprising the vector of any of the aspects or embodiments described herein.

In some aspects, the present disclosure provides for a method of manufacturing an endonuclease, comprising cultivating the cell of any of the aspects or embodiments described herein.

In some aspects, the present disclosure provides for a method for modifying a double-stranded deoxyribonucleic acid polynucleotide comprising contacting said double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein said endonuclease is derived from an uncultivated microorganism, wherein said endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to be deficient in nuclease activity; a base editor coupled to said endonuclease; and an engineered guide ribonucleic acid structure configured to bind to said endonuclease and said double-stranded deoxyribonucleic acid polynucleotide; wherein said double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM). In some embodiments, said endonuclease comprising a RuvC domain and an HNH domain is covalently coupled directly to said base editor or covalently coupled to said base editor through a linker. In some embodiments, said endonuclease comprising a RuvC domain and an HNH domain comprises a sequence with at least 95% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof.

In some aspects, the present disclosure provides for a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, comprising contacting said double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: a class 2, type II endonuclease, a base editor coupled to said endonuclease, and an engineered guide ribonucleic acid structure configured to bind to said endonuclease and said double-stranded deoxyribonucleic acid polynucleotide; wherein said double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM); and wherein said PAM comprises a sequence selected from the group consisting of SEQ ID NOs: 70-78 or 597. In some embodiments, said class 2, type II endonuclease is covalently coupled to said base editor or coupled to said base editor through a linker. In some embodiments, said base editor comprises a sequence with at least 70%, at least 80%, at least 90% or at least 95% identity to a sequence selected from SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, said base editor comprises an adenine deaminase; said double-stranded deoxyribonucleic acid polynucleotide comprises an adenine; and modifying said double-stranded deoxyribonucleic acid polynucleotide comprises converting said adenine to guanine. In some embodiments, said adenine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, said base editor comprises a cytosine deaminase; said double-stranded deoxyribonucleic acid polynucleotide comprises a cytosine; and modifying said double-stranded deoxyribonucleic acid polynucleotide comprises converting said cytosine to uracil. 
In some embodiments, said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 594, or 58-66, or a variant thereof. In some embodiments, said complex further comprises a uracil DNA glycosylase inhibitor coupled to said endonuclease or said base editor. In some embodiments, said uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, said double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to a sequence of said engineered guide ribonucleic acid structure and a second strand comprising said PAM. In some embodiments, said PAM is directly adjacent to the 3′ end of said sequence complementary to said sequence of said engineered guide ribonucleic acid structure. In some embodiments, said class 2, type II endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease. In some embodiments, said class 2, type II endonuclease is derived from an uncultivated microorganism. In some embodiments, said double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.

In some aspects, the present disclosure provides for a method of modifying a target nucleic acid locus, said method comprising delivering to said target nucleic acid locus said engineered nucleic acid editing system of any of the aspects or embodiments described herein, wherein said endonuclease is configured to form a complex with said engineered guide ribonucleic acid structure, and wherein said complex is configured such that upon binding of said complex to said target nucleic acid locus, said complex modifies a nucleotide of said target nucleic acid locus. In some embodiments, said engineered nucleic acid editing system comprises an adenine deaminase, said nucleotide is an adenine, and modifying said target nucleic acid locus comprises converting said adenine to a guanine. In some embodiments, said engineered nucleic acid editing system comprises a cytidine deaminase and a uracil DNA glycosylase inhibitor, said nucleotide is a cytosine and modifying said target nucleic acid locus comprises converting said cytosine to a uracil. In some embodiments, said target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA. In some embodiments, said target nucleic acid locus is in vitro. In some embodiments, said target nucleic acid locus is within a cell. In some embodiments, said cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell. In some embodiments, said cell is within an animal. In some embodiments, said cell is within a cochlea. In some embodiments, said cell is within an embryo. In some embodiments, said embryo is a two-cell embryo. In some embodiments, said embryo is a mouse embryo. 
In some embodiments, delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering the nucleic acid of any of the aspects or embodiments described herein or the vector of any of the aspects or embodiments described herein. In some embodiments, delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding said endonuclease. In some embodiments, said nucleic acid comprises a promoter to which said open reading frame encoding said endonuclease is operably linked. In some embodiments, delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a capped mRNA containing said open reading frame encoding said endonuclease. In some embodiments, delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a deoxyribonucleic acid (DNA) encoding said engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.

In some aspects, the present disclosure provides for an engineered nucleic acid editing polypeptide, comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein said endonuclease is derived from an uncultivated microorganism, wherein said endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to be deficient in nuclease activity; and a base editor coupled to said endonuclease. In some embodiments, said endonuclease comprises a sequence with at least 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof.

In some aspects, the present disclosure provides for an engineered nucleic acid editing polypeptide, comprising: an endonuclease having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof, wherein the endonuclease is configured to be deficient in nuclease activity; and a base editor coupled to said endonuclease. In some aspects, the present disclosure provides for an engineered nucleic acid editing polypeptide, comprising: an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of Sequence Numbers: A360-A368 or A598, wherein said endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to be deficient in nuclease activity; and a base editor coupled to said endonuclease. In some embodiments, said endonuclease is derived from an uncultivated microorganism. In some embodiments, said endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, said endonuclease further comprises an HNH domain. In some embodiments, said tracr ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs: 88-96, 488, 489, and 679-680. In some embodiments, said base editor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, said base editor is an adenine deaminase. 
In some embodiments, said adenine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, said base editor is a cytosine deaminase. In some embodiments, said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 594, or 58-66, or a variant thereof.

In some aspects, the present disclosure provides for an engineered nucleic acid editing polypeptide, comprising: an endonuclease, wherein said endonuclease is configured to be deficient in endonuclease activity; and a base editor coupled to said endonuclease, wherein said base editor comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 1-51, 385-386, 387-443, 444-447, 448-475, or 595, or a variant thereof. In some embodiments, said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, said endonuclease is configured to be catalytically dead. In some embodiments, said endonuclease is a class 2, type II endonuclease or a class 2, type V endonuclease. In some embodiments, said endonuclease comprises a sequence having at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof. In some embodiments, said endonuclease comprises a nickase mutation. In some embodiments, said endonuclease comprises the aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597 when optimally aligned. In some embodiments, said endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence selected from the group consisting of Sequence Numbers: A360-A368 or A598. In some embodiments, said base editor is an adenine deaminase. 
In some embodiments, said adenine deaminase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 50-51, 385-443, or 448-475, or a variant thereof. In some embodiments, said adenine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51, 385-390, or 595, or a variant thereof. In some embodiments, said base editor is a cytosine deaminase. In some embodiments, said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-49, 444-447, or a variant thereof. In some embodiments, the polypeptide further comprises a uracil DNA glycosylase inhibitor coupled to said endonuclease or said base editor. In some embodiments, said uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, said endonuclease comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease. In some embodiments, said NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs: 369-384, or a variant thereof. In some embodiments, said endonuclease is covalently coupled directly to said base editor or covalently coupled to said base editor through a linker.
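By way of illustration only, the phrase "residue N relative to SEQ ID NO: X when optimally aligned" can be read as a coordinate lookup through a pairwise alignment. The following Python sketch assumes the alignment has already been computed by some external tool and is supplied as two gapped strings of equal length. The function name `map_residue` and both toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def map_residue(aligned_ref, aligned_query, ref_position):
    """Map a 1-based residue number in the reference onto the aligned query.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Returns (query_position, query_residue), or None if the reference
    residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("alignment rows must have equal length")
    ref_index = 0
    query_index = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_index += 1
        if query_char != "-":
            query_index += 1
        if ref_char != "-" and ref_index == ref_position:
            if query_char == "-":
                return None
            return query_index, query_char
    raise ValueError("ref_position is beyond the end of the reference")


# Hypothetical toy alignment: the reference has an aspartate (D) at residue 9.
aligned_ref = "M--KKYSIGLDIGT"
aligned_query = "MSPEKYSLGLDLGT"

position, residue = map_residue(aligned_ref, aligned_query, 9)

# Apply the aspartate-to-alanine substitution at the mapped query position.
ungapped_query = aligned_query.replace("-", "")
mutated_query = ungapped_query[:position - 1] + "A" + ungapped_query[position:]
```

In this toy case the reference aspartate at residue 9 corresponds to residue 11 of the query, because the query carries a two-residue insertion near its N-terminus. The same lookup would apply to any of the residue numbers recited above, given an actual alignment.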

In some aspects, the present disclosure provides for a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein said nucleic acid encodes a sequence having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-51, 385-386, 387-443, 444-447, or 448-475, or a variant thereof. In some embodiments, said organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.

In some aspects, the present disclosure provides for a vector comprising the nucleic acid of any of the aspects or embodiments described herein. In some embodiments, the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.

In some aspects, the present disclosure provides for a cell comprising the vector of any one of the aspects or embodiments described herein.

In some aspects, the present disclosure provides for a method of manufacturing a base editor, comprising cultivating said cell of any one of the aspects or embodiments described herein.

In some aspects, the present disclosure provides for a system comprising: (a) the nucleic acid editing polypeptide of any of the aspects or embodiments described herein; and (b) an engineered guide ribonucleic acid structure configured to form a complex with said nucleic acid editing polypeptide comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a ribonucleic acid sequence configured to bind to said endonuclease. In some embodiments, said engineered guide ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680.
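As a non-limiting illustration, identity "to non-degenerate nucleotides" of a reference guide sequence can be sketched as a percent-identity calculation that skips the degenerate positions of the reference. The Python sketch below assumes that degenerate positions are written as `N`, that the two sequences are already aligned and of equal length, and that identity is counted per position. The function name `nondegenerate_identity` and both example sequences are hypothetical and are not taken from the Sequence Listing.

```python
def nondegenerate_identity(reference, candidate, degenerate="N"):
    """Percent identity over reference positions that are not degenerate.

    Assumes the two sequences are pre-aligned and of equal length.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must be pre-aligned to equal length")
    scored = [
        (ref_base, cand_base)
        for ref_base, cand_base in zip(reference.upper(), candidate.upper())
        if ref_base not in degenerate
    ]
    if not scored:
        raise ValueError("reference has no non-degenerate positions")
    matches = sum(1 for ref_base, cand_base in scored if ref_base == cand_base)
    return 100.0 * matches / len(scored)


# Hypothetical reference: four degenerate positions followed by a fixed region.
reference = "NNNNGUUUGAGAGCUA"
candidate = "ACGUGUUUGAGAGCUU"

identity = nondegenerate_identity(reference, candidate)
meets_80_percent = identity >= 80.0
```

Here 11 of the 12 non-degenerate reference positions match, so the candidate would satisfy an 80% threshold under these assumptions. The four `N` positions do not count for or against it.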

In some aspects, the present disclosure provides for an engineered nucleic acid editing system comprising: (a) an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs: 70-78, wherein the endonuclease comprises a RuvC domain lacking nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to the endonuclease.

In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to an endonuclease, wherein the tracr ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs: 88-96, 488, 489, and 679-680; and (b) a class 2, type II endonuclease configured to bind to the engineered guide ribonucleic acid.

In some embodiments, the endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence selected from the group consisting of Sequence Numbers: A360-A368. In some embodiments, the base editor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-51 and 385-475. In some embodiments, the base editor is an adenine deaminase. In some embodiments, the adenine deaminase comprises a sequence with at least 95% identity to SEQ ID NO: 57. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence with at least 95% identity to SEQ ID NO: 58. In some embodiments, the cytosine deaminase comprises a sequence with at least 95% identity to any one of SEQ ID NOs: 59-66.

In some embodiments, the engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides. In some embodiments, the engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising the guide ribonucleic acid sequence and the tracr ribonucleic acid sequence. In some embodiments, the guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaeal, eukaryotic, fungal, plant, mammalian, or human genomic sequence. In some embodiments, the guide ribonucleic acid sequence is 15-24 nucleotides in length. In some embodiments, the endonuclease comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease. In some embodiments, the endonuclease is covalently coupled directly to the base editor or covalently coupled to the base editor through a linker. In some embodiments, a polypeptide comprises the endonuclease and the base editor. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, the endonuclease comprises SEQ ID NO: 370. In some embodiments, the system further comprises a source of Mg²⁺.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 70; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 88; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A360.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 71; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 89; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A361.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 73; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 91; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A363.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 75; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 93; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A365.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 76; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 94; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A366.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 77; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 95; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A367.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 78; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 96; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A368.

In some embodiments, the base editor comprises an adenine deaminase. In some embodiments, the adenine deaminase comprises SEQ ID NO: 57. In some embodiments, the base editor comprises a cytosine deaminase. In some embodiments, the cytosine deaminase comprises SEQ ID NO: 58. In some embodiments, the engineered nucleic acid editing system described herein further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises SEQ ID NO: 67.

In some embodiments, the sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm. In some embodiments, the sequence identity is determined by said BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
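For illustration, the BLASTP parameters recited above can be expressed as an NCBI BLAST+ `blastp` command line. The Python sketch below only assembles the argument list and does not run the program. The file names are hypothetical, and reading "conditional compositional score matrix adjustment" as `-comp_based_stats 2` is an assumption about how that phrase corresponds to the BLAST+ option.

```python
import shlex


def blastp_identity_command(query_fasta, subject_fasta):
    """Assemble a blastp argument list mirroring the recited parameters."""
    return [
        "blastp",
        "-query", query_fasta,        # candidate protein sequence (FASTA)
        "-subject", subject_fasta,    # reference protein sequence (FASTA)
        "-word_size", "3",            # wordlength (W) of 3
        "-evalue", "10",              # expectation (E) of 10
        "-matrix", "BLOSUM62",        # BLOSUM62 scoring matrix
        "-gapopen", "11",             # gap existence cost of 11
        "-gapextend", "1",            # gap extension cost of 1
        "-comp_based_stats", "2",     # assumed: conditional compositional adjustment
        "-outfmt", "6 qseqid sseqid pident length",  # tabular, with percent identity
    ]


# Hypothetical file names, for illustration only.
command = blastp_identity_command("candidate.faa", "reference.faa")
command_line = shlex.join(command)
```

With BLAST+ installed, the resulting list could be passed to `subprocess.run`. The `pident` column of the tabular output would then report percent identity over the aligned region, which may differ from identity computed over the full length of a reference sequence.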

In some aspects, the present disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein the endonuclease is derived from an uncultivated microorganism.

In some aspects, the present disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes an endonuclease having at least 70% sequence identity to any one of SEQ ID NOs: 70-78 coupled to a base editor. In some embodiments, the endonuclease comprises a sequence encoding one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease. In some embodiments, the organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.

In some aspects, the present disclosure provides a vector comprising a nucleic acid sequence encoding a class 2, type II endonuclease coupled to a base editor, wherein said endonuclease is derived from an uncultivated microorganism. In some embodiments, the vector comprises the nucleic acid described herein. In some embodiments, the vector further comprises a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease comprising: a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and a tracr ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus. In some aspects, the present disclosure provides a cell comprising the vector described herein. In some aspects, the present disclosure provides a method of manufacturing an endonuclease, comprising cultivating the cell described herein.

In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultivated microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the RuvC domain lacks nuclease activity; a base editor coupled to the endonuclease; and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM).

In some aspects, the present disclosure provides a method for modifying a double-stranded deoxyribonucleic acid polynucleotide, comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a complex comprising: a class 2, type II endonuclease, a base editor coupled to the endonuclease, and an engineered guide ribonucleic acid structure configured to bind to the endonuclease and the double-stranded deoxyribonucleic acid polynucleotide; wherein the double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM); and wherein the PAM comprises a sequence selected from the group consisting of Sequence Numbers: A360-A368.

In some embodiments, the class 2, type II endonuclease is covalently coupled to the base editor or coupled to the base editor through a linker. In some embodiments, the base editor comprises a sequence with at least 70%, at least 80%, at least 90% or at least 95% identity to a sequence selected from SEQ ID NOs: 1-51 and 385-475. In some embodiments, the base editor comprises an adenine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises an adenine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the adenine to guanine. In some embodiments, the adenine deaminase comprises a sequence with at least 95% identity to SEQ ID NO: 57.

In some embodiments, the base editor comprises a cytosine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises a cytosine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the cytosine to uracil. In some embodiments, the cytosine deaminase comprises a sequence with at least 95% identity to SEQ ID NO: 58. In some embodiments, the cytosine deaminase comprises a sequence with at least 95% identity to any one of SEQ ID NOs: 59-66.

In some embodiments, the complex further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to a sequence of the engineered guide ribonucleic acid structure and a second strand comprising said PAM. In some embodiments, the PAM is directly adjacent to the 3′ end of the sequence complementary to the sequence of the engineered guide ribonucleic acid structure.

In some embodiments, the class 2, type II endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease. In some embodiments, the class 2, type II endonuclease is derived from an uncultivated microorganism. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.

In some aspects, the present disclosure provides a method of modifying a target nucleic acid locus, said method comprising delivering to said target nucleic acid locus the engineered nucleic acid editing system described herein, wherein the endonuclease is configured to form a complex with the engineered guide ribonucleic acid structure, and wherein the complex is configured such that upon binding of the complex to the target nucleic acid locus, the complex modifies a nucleotide of the target nucleic acid locus.

In some embodiments, the engineered nucleic acid editing system comprises an adenine deaminase, the nucleotide is an adenine, and modifying the target nucleic acid locus comprises converting the adenine to a guanine. In some embodiments, the engineered nucleic acid editing system comprises a cytidine deaminase and a uracil DNA glycosylase inhibitor, the nucleotide is a cytosine, and modifying the target nucleic acid locus comprises converting the cytosine to a uracil. In some embodiments, the target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA. In some embodiments, the target nucleic acid locus is in vitro. In some embodiments, the target nucleic acid locus is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell. In some embodiments, the cell is within an animal.

In some embodiments, the cell is within a cochlea. In some embodiments, the cell is within an embryo. In some embodiments, the embryo is a two-cell embryo. In some embodiments, the embryo is a mouse embryo. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering the nucleic acid described herein or the vector described herein. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding the endonuclease.

In some embodiments, the nucleic acid comprises a promoter to which the open reading frame encoding the endonuclease is operably linked. In some embodiments, delivering the engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a capped mRNA containing the open reading frame encoding the endonuclease. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a translated polypeptide. In some embodiments, delivering the engineered nucleic acid editing system to the target nucleic acid locus comprises delivering a deoxyribonucleic acid (DNA) encoding the engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.

In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide, comprising: an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs: 70-78, wherein the endonuclease comprises a RuvC domain lacking nuclease activity; and a base editor coupled to the endonuclease.

In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide, comprising: an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising Sequence Numbers: A360-A368, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease comprises a RuvC domain lacking nuclease activity; and a base editor coupled to the endonuclease.

Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:

FIG. 1 depicts example organizations of CRISPR loci of different classes and types.

FIG. 2 shows the structure of a base editor plasmid containing a T7 promoter driving expression of the systems described herein.

FIG. 3 shows plasmid maps for systems described herein. MGA contains TadA*(from ABE8.17m)-SV40 NLS and MGC contains APOBEC1 (from BE3) linked to a uracil glycosylase inhibitor and an SV40 NLS.

FIG. 4 shows predicted catalytic residues in the RuvCI domains of selected endonucleases described herein which are mutated to disrupt nuclease activity to generate nickase enzymes.

FIG. 5 depicts an example method for cloning a single guide RNA expression cassette into the systems described herein. One fragment comprises a T7 promoter plus spacer. The other fragment comprises spacer plus single guide scaffold sequence plus bidirectional terminator. The fragments are assembled into expression plasmids, resulting in functional constructs that can simultaneously express sgRNAs and base editors.

FIGS. 6A and 6B show sgRNA designs for lacZ targeting in E. coli. The spacer length used for the systems described herein was 22 nucleotides. For selected systems described herein, three sgRNAs targeting lacZ in E. coli were designed to determine editing windows.

FIG. 7 shows the nickase activity of selected mutated effectors. 600 bp double-stranded DNA fragments labeled with a fluorophore (6-FAM) on both 5′ ends were incubated with purified enzymes supplemented with their cognate sgRNAs. The reaction products were resolved on a 10% TBE-Urea denaturing gel. Double-stranded cleavage yields bands of 400 and 200 bases. Nickase activity yields bands of 600 and 200 bases.
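
By way of illustration only, the band pattern described for FIG. 7 can be reproduced with a short calculation. The sketch below is not part of the disclosed methods. It assumes that the cut site lies 200 nucleotides from the labeled 5′ end of one strand (called the top strand here), that both strands are cut at the same position in the double-stranded case, and that only 5′-labeled fragments are visible on the denaturing gel. The function name and its parameters are hypothetical.

```python
def labeled_fragments(length, cut_from_top_5prime, cut_top, cut_bottom):
    """Return sizes (nt) of the 5'-labeled single strands expected on a denaturing gel.

    length              -- length of the duplex in base pairs (600 in FIG. 7)
    cut_from_top_5prime -- assumed distance of the cut site from the top strand's 5' end
    cut_top, cut_bottom -- whether each strand is cleaved
    """
    sizes = set()
    # Top strand: the labeled piece runs from its 5' end to the cut site.
    sizes.add(cut_from_top_5prime if cut_top else length)
    # Bottom strand: its 5' end sits at the opposite end of the duplex.
    sizes.add(length - cut_from_top_5prime if cut_bottom else length)
    return sorted(sizes, reverse=True)


# Double-stranded cleavage: both strands are cut.
print(labeled_fragments(600, 200, cut_top=True, cut_bottom=True))    # [400, 200]
# Nickase activity: only one strand is cut, the other stays full length.
print(labeled_fragments(600, 200, cut_top=True, cut_bottom=False))   # [600, 200]
```

Under these assumptions, the presence of a full-length 600-base band together with a 200-base band distinguishes a nickase from an enzyme that cleaves both strands.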

FIGS. 8A, 8B, and 8C show Sanger sequencing results demonstrating base edits by selected systems described herein.

FIG. 9 shows how the systems described herein expand base-editing capabilities with the endonucleases and base editors described herein.

FIGS. 10A and 10B show base editing efficiencies of adenine base editors (ABEs) comprising TadA (ABE8.17m) and MG nickases. TadA is a tRNA adenine deaminase, and TadA (ABE8.17m) is an engineered variant of E. coli TadA. Twelve MG nickases fused with TadA (ABE8.17m) were constructed and tested in E. coli. Three guides were designed to target lacZ. Numbers shown in boxes indicate percentages of A to G conversion quantified by EditR. ABE8.17m was used as the positive control for the experiment.

FIGS. 11A and 11B show base editing efficiencies of cytosine base editors (CBEs) comprising rat APOBEC1, MG nickases, and the uracil glycosylase inhibitor of Bacillus subtilis bacteriophage (UGI (PBS1)). APOBEC1 is a cytosine deaminase. Twelve MG nickases fused to rAPOBEC1 on their N-terminus and UGI on their C-terminus were constructed and tested in E. coli. Three guides were designed to target lacZ. The numbers shown in boxes indicate percentages of C to T conversion quantified by EditR. BE3 was used as the positive control in the experiment.

FIGS. 12A and 12B show effects of MG uracil glycosylase inhibitors (UGIs) on the base-editing activities of CBEs. FIG. 12A depicts a graph showing base-editing activity of MGC15-1 and variants, which comprise an N-terminal APOBEC1, the MG15-1 nickase, and a C-terminal UGI. Three MG UGIs were tested for improvements of cytosine base editing activities in E. coli. FIG. 12B depicts a graph showing base editing activity of BE3, which comprises an N-terminal rAPOBEC1, the SpCas9 nickase, and a C-terminal UGI. Two MG UGIs were tested for improvements of cytosine base editing activities in HEK293T cells. Editing efficiencies were quantified by EditR.

FIGS. 13A and 13B depict maps of edited sites showing editing efficiencies of cytosine base editors comprising A0A2K5RDN7, an MG nickase, and an MG UGI. The constructs comprise an N-terminal A0A2K5RDN7, an MG nickase, and a C-terminal MG69-1. For simplicity, the identities of MG nickases are shown in the figure. BE3 was used as the positive control for base editing. An empty vector was used for the negative control. Three independent experiments were performed on different days. Abbreviations: R, repeat; NEG, negative control.

FIGS. 14A and 14B show a positive selection method for TadA characterization in E. coli. FIG. 14A shows a map of one plasmid system used for TadA selection. The vector comprises CAT (H193Y), an sgRNA expression cassette targeting CAT, and an ABE expression cassette. In this figure, N-terminal TadA from E. coli and a C-terminal SpCas9 (D10A) from Streptococcus pyogenes are shown. FIG. 14B shows sequencing traces demonstrating that, when the plasmid is transformed into E. coli cells, the A2 position of the template strand of CAT (H193Y) is edited, reverting the H193Y mutant to wild type and restoring its activity. Abbreviations: CAT, chloramphenicol acetyltransferase.

FIGS. 15A and 15B show that mutations caused by TadA enable high tolerance of chloramphenicol (Cm). FIG. 15A shows photographs of growth plates where different concentrations of chloramphenicol were used to select for antibiotic resistance of E. coli. In this example, wild type and two variants of TadA from E. coli (EcTadA) were tested. FIG. 15B shows a results summary table demonstrating that ABEs carrying mutated TadA show higher editing efficiencies than the wild type. In these experiments, colonies were picked from the plates with greater than or equal to 0.5 μg/mL Cm. For simplicity, identities of deaminases are shown in the table.

FIG. 16A shows photographs of growth plates to investigate MG TadA activity in positive selection. Eight MG68 TadA candidates were tested against 0 to 2 μg/mL of chloramphenicol (ABEs comprised N-terminal TadA variants and C-terminal SpCas9 (D10A) nickase). For simplicity, identities of deaminases are shown. In this experiment, colonies were picked from the plates with greater than or equal to 0.5 μg/mL Cm.

FIG. 16B summarizes the editing efficiencies of MG TadA candidates and demonstrates that MG68-3 and MG68-4 drove base edits of adenine.

FIGS. 17A and 17B show an improvement of base editing efficiency of MG68-4_nSpCas9 via a D109N mutation on MG68-4. FIG. 17A shows photographs of growth plates where wild type MG68-4 and its variant were tested against 0 to 4 μg/mL of chloramphenicol. For simplicity, identities of deaminases are shown. Adenine base editors in this experiment comprise N-terminal TadA variants and C-terminal SpCas9 (D10A) nickase. FIG. 17B shows a summary table depicting editing efficiencies of MG TadA candidates and demonstrates that MG68-4 and MG68-4 (D109N) showed base edits of adenine, with the D109N mutant showing increased activity. In this experiment, colonies were picked from the plates with greater than or equal to 0.5 μg/mL Cm.

FIGS. 18A and 18B show base editing of MG68-4 (D109N)_nMG34-1. FIG. 18A shows photographs of growth plates of an experiment where an ABE comprising N-terminal MG68-4 (D109N) and C-terminal SpCas9 (D10A) nickase was tested against 0 to 2 μg/mL of chloramphenicol. FIG. 18B shows a summary table depicting editing efficiencies with and without sgRNA. In this experiment, colonies were picked from the plates with greater than or equal to 1 μg/mL Cm.

FIG. 19 shows 28 MG68-4 variants designed for improvements of MG68-4-nMG34-1 base editing activity (SEQ ID NOs: 448-475). Twelve residues were selected for targeted mutagenesis to improve editing of the enzymes.

FIG. 20 shows the results of a gel-based deaminase assay showing activity of deaminases from several selected Families (MG93, MG138, and MG139). Enzymes were expressed in a bacterial (E. coli codon optimized) Purexpress cell lysate-derived in vitro transcription-translation system and incubated with 5′FAM-labeled ssDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) at 37° C. for 2.5 h. The resulting DNA was resolved on a denaturing polyacrylamide gel and imaged. The positive control is a sequence with a U synthetically incorporated at the same position as the target C and the negative control is a sequence with no U or C.

FIG. 21 shows a diagram illustrating base editing efficiencies of adenine base editors at specific nucleotide sites using MG68-4v1 fused to either nMG34-1 or nSpCas9. Nine guides were designed to target genomic loci of HEK293T cells. Abbreviations: MG68-4v1, MG68-4 (D109N); nMG34-1, MG34-1 nickase; nSpCas9, SpCas9 nickase.

FIGS. 22A, 22B, 22C, 22D, 22E, and 22F show in vivo base editing with engineered MG34-1 and MG35-1 nickases. FIGS. 22A and 22B show base editing in the E. coli genome at four target loci. FIG. 22A shows an ABE-MG34-1 base editor vs. a reference ABE-SpCas9 (both with TadA*(8.8m) deaminase). FIG. 22B shows a CBE-MG34-1 base editor vs. a reference CBE-SpCas9 (both with rAPOBEC1 deaminase and PBS1 UGI). FIG. 22C shows base editing in human HEK293T cells with an ABE-MG34-1 nickase at three target loci. The target sequence for each locus in FIGS. 22A, 22B, and 22C is shown above each heatmap. Expected edit positions are represented on the sequence by a subscript number and at each position on the heatmap (squares). Heatmaps in FIGS. 22A, 22B, and 22C represent the percentage of NGS reads supporting an edit. Values in FIGS. 22A and 22B represent the mean of two independent experiments, while values in FIG. 22C represent the mean of three independent biological replicates. FIG. 22D shows an E. coli survival assay. E. coli is transformed with a plasmid containing the ABE, a non-functional chloramphenicol acetyltransferase (CAT H193Y) gene, and an sgRNA that either targets the CAT gene (target spacer) or not (non-target spacer). E. coli survival under chloramphenicol selection is dependent on the ABE base editing the non-functional CAT gene to its wild type sequence. FIG. 22E, top panel, shows a diagram of an ABE construct with an engineered MG35-1 nickase containing a C-terminal TadA*-(7.10) monomer and an SV40 NLS fused to the C-terminus. FIG. 22E, bottom panel: transformed E. coli was grown on plates containing chloramphenicol concentrations of 0, 2, 3, 4, and 8 μg/mL. Plates also contained 100 μg/mL Carbenicillin and 0.1 mM IPTG. Colonies grown on plates containing chloramphenicol concentrations of 0, 2, 3, and 4 μg/mL were sequenced to assess reversion of the CAT gene. Experiments were performed in duplicate.

FIGS. 23A and 23B depict a gel-based deaminase assay showing activity of deaminases from one selected Family (MG139). Enzymes were expressed in a bacterial (E. coli codon optimized) Purexpress cell lysate-derived in vitro transcription-translation system and incubated with 5′FAM-labeled ssDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) at 37° C. for 2.5 h. The resulting DNA was resolved on a denaturing polyacrylamide gel and imaged, which is shown in FIG. 23A. The positive control is a sequence with a U synthetically incorporated at the same position as the target C and the negative control is a sequence with no U or C. FIG. 23B depicts the percentage of deamination activity of all the active cytidine deaminases on ssDNA. The taxonomic classification of the cytidine deaminases is shown.

FIG. 24 depicts a gel-based deaminase assay showing ssDNA and dsDNA activities of deaminases from several selected Families (MG93, MG138 and MG139). Enzymes were expressed in a bacterial (E. coli codon optimized) Purexpress cell lysate-derived in vitro transcription-translation system and incubated with 5′FAM-labeled ssDNA or dsDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) at 37° C. for 2.5 h. The resulting DNA was resolved on a denaturing polyacrylamide gel and imaged. The positive control for ssDNA activity is a sequence with a U synthetically incorporated at the same position as the target C and the negative control is a sequence with no U or C. The positive control for dsDNA activity is DddA toxin deaminase that has been documented as selective for a dsDNA substrate (Mok, B. Y., de Moraes, M. H., Zeng, J. et al. A bacterial cytidine deaminase toxin enables CRISPR-free mitochondrial base editing. Nature 583, 631-637 (2020). https://doi.org/10.1038/s41586-020-2477-4).

FIGS. 25A, 25B, and 25C depict data demonstrating that Cytosine Base Editors (CBEs) containing novel cytidine deaminases with spCas9, MG3-6, or MG34-1 effectors show varying editing levels in HEK293 cells. Each novel cytidine deaminase is fused via a linker to the N-terminus of the effector (spCas9, MG3-6, or MG34-1). A uracil glycosylase inhibitor domain (UGI or MG69-1) is fused to the C-terminus of the effector, followed by a Nuclear Localization Signal (NLS). Each CBE was transiently transfected into HEK293 cells and targeted to 5 distinct genomic locations with corresponding sgRNAs (spacer sequence indicated, targeted cytosines underlined). Editing levels (C to T (%)) of spacer sequence and surrounding cytosines are indicated for CBEs with each distinct cytidine deaminase effector (n=3).

FIGS. 26A, 26B, and 26C depict the activity of cytidine deaminases (CDAs) fused to MG3-6. Cytidine deaminases were fused to MG3-6 and their activity was assessed by targeting an engineered site in a reporter cell line. FIG. 26A shows relative activity of various CDAs; the controls used were a highly active CBE from the literature, A0A2K5RDN7, as well as rAPOBEC1. FIG. 26B shows quantification of activity of various CDAs in comparison to the highly active CDA A0A2K5RDN7. FIG. 26C shows MG139-52 activity, highlighting the G-A conversion that suggests editing of the opposite strand, that is, the strand in the DNA/RNA heteroduplex in the R-loop.

FIGS. 27A and 27B depict a toxicity assay in mammalian cells. Toxicity of CDAs was measured by stable expression of CDAs as CBEs (fused to MG3-6). HEK293T cells stably expressing CBEs were grown in puromycin for 3 days, and live cells were stained with crystal violet. Crystal violet dye was then solubilized with 1% SDS and quantified in a plate reader. FIG. 27A shows a picture of cells stained with crystal violet; FIG. 27B shows quantification of FIG. 27A. Absorbance was taken in a plate reader at 570 nm.

FIG. 28 depicts mutations identified from chloramphenicol selection in E. coli. The r1v1 variant was the starting variant for the evolution experiment. Twenty-four variants were identified, and the associated mutations are shown in the table.

FIG. 29 depicts beneficial mutations identified from variant screening in HEK293T. The predicted structure of MG68-4 is aligned with tRNA^Arg2 from S. aureus TadA (PDB 2B3J). Key mutated residues are highlighted in the structural display.

FIG. 30 depicts screening of MG68-4 variants in HEK293T cells. Four guides were used to screen the activity, editing window, and sequence preference of engineered variants.

FIG. 31 depicts the ABE-MG35-1 E. coli survival assay sequencing results. Surviving colonies were picked from plates under chloramphenicol selection for the first experimental replicate and Sanger-sequenced. Sequencing of four of five selected colonies shows a mutation from A back to G on the negative strand, restoring CAT function from Y193 back to H on the positive strand (boxed nucleotides). A bystander base edit was observed in two of the five sequenced colonies.
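
As a non-limiting illustration of the strand logic behind this reversion, the following sketch shows how an A to G change on the negative (template) strand converts a tyrosine codon on the positive strand into a histidine codon. The actual codon used at position 193 of the CAT construct is not stated in this section; the codons TAC and TAT below are assumed for illustration, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the four codons needed for this illustration.
CODON_TABLE = {"TAT": "Y", "TAC": "Y", "CAT": "H", "CAC": "H"}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def revert_by_template_edit(tyr_codon):
    """Apply an A-to-G edit on the negative strand opposite the first codon base."""
    negative = revcomp(tyr_codon)           # e.g. TAC -> GTA
    # The A that pairs with the leading T of the codon is the last base
    # of the negative strand when that strand is written 5' to 3'.
    if negative[-1] != "A":
        raise ValueError("expected an A opposite the first codon base")
    edited_negative = negative[:-1] + "G"   # adenine deamination is read as G
    return revcomp(edited_negative)         # codon as read on the positive strand


for codon in ("TAC", "TAT"):
    edited = revert_by_template_edit(codon)
    print(codon, CODON_TABLE[codon], "->", edited, CODON_TABLE[edited])
```

Under these assumptions, either tyrosine codon is converted to a histidine codon by a single A to G edit on the negative strand, which is consistent with the Y193 to H reversion described for the survival assay.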

FIG. 32 depicts increased cytosine base editing efficiency upon Fam72a expression.

FIG. 33 depicts data demonstrating that structurally optimized adenine base editors (ABEs) show varying editing levels in HEK293 cells. Each of 33 ABEs was constructed by inserting the MG68-4 (D109N) deaminase upstream, downstream, or within the MG3-6_3-8 (D13A) nickase enzyme and cloned into the pCMV vector. These plasmids were co-transfected with a plasmid containing one of 8 sgRNAs targeting the HEK293 genome. The data shown are from an sgRNA targeting the ACAGACAAAACTGTGCTAGACA sequence. Editing levels (A to G (%)) of A5, A7, A8, A9, and A10 within the spacer sequence are indicated, as well as cell viability of each individual experiment (n=2).
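
For orientation, the position labels used for FIG. 33 can be checked against the spacer sequence with a few lines of code. This is an illustrative sketch only; it assumes that positions are numbered from 1 starting at the 5′ end of the spacer as written, and the function name is hypothetical.

```python
def base_positions(spacer, base="A"):
    """Return 1-based positions of a given base within a spacer sequence."""
    return [i for i, nt in enumerate(spacer.upper(), start=1) if nt == base]


spacer = "ACAGACAAAACTGTGCTAGACA"
adenines = base_positions(spacer)
print(len(spacer))   # 22
print(adenines)      # [1, 3, 5, 7, 8, 9, 10, 18, 20, 22]

# The positions reported for FIG. 33 are a subset of the adenines in the spacer.
reported = [5, 7, 8, 9, 10]
print(all(p in adenines for p in reported))   # True
```

With this numbering, the reported positions A5, A7, A8, A9, and A10 each correspond to an adenine in the 22-nucleotide spacer.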

FIG. 34A-FIG. 34B depicts rational design of MG68-4 variants. FIG. 34A depicts structural alignment of E. coli TadA (PDB: 1z3a) and the predicted structure of MG68-4. The tRNA structure was retrieved from S. aureus TadA (PDB: 2b3j). FIG. 34B depicts mutations identified from EcTadA for the development of adenine base editors (ABE7.10, ABE8.8m, ABE8.17m, and ABE8e) and equivalent residues of EcTadA on MG68-4. The mutations of EcTadA were installed into MG68-4 accordingly. H129N was identified from a bacterial selection in E. coli. In general, the nuclear localization signal (SV40) was positioned on the C-terminus. For 2NLS constructs, one SV40 was used on the N-terminus and one SV40 was used on the C-terminus. For simplicity, deaminase sequences of adenine base editors are shown in the table. Abbreviations: MGA0.1, MG68-4; MGA1.1, MG68-4 (D109N); MGA2.1, MG68-4 (D109N/H129N); RD, rationally designed variants.

FIG. 35 depicts screening of adenine base editors in HEK293T cells. The top three variants are highlighted. The starting variant is MGA1.1. For 2NLS constructs, one SV40 was used on the N-terminus and one SV40 was used on the C-terminus. Abbreviations: MGA0.1, MG68-4; MGA1.1, MG68-4 (D109N); MGA2.1, MG68-4 (D109N/H129N); RD, rationally designed variants.

FIG. 36 depicts a table summarizing the base editing activity of rationally designed ABE variants described herein.

FIG. 37 depicts a gel-based deaminase assay showing activity of variant deaminases from several selected Families (MG93, MG139, and MG152). Enzymes were expressed in a bacterial (E. coli codon optimized) Purexpress cell lysate-derived in vitro transcription-translation system and incubated with 5′FAM-labeled ssDNA and USER enzyme (uracil DNA glycosylase and endonuclease VIII) at 37° C. for 2.5 h. The resulting DNA was resolved on a denaturing polyacrylamide gel and imaged. The positive control is a sequence with a U synthetically incorporated at the same position as the target C and the negative control is a sequence with no U or C.

FIG. 38A-FIG. 38C depicts a gel-based deaminase assay with dual fluorophores. FIG. 38A depicts a schematic of substrate design. Substrates were designed for minimal overlap between the two fluorophores. The emission peak for Cy3 is around 560 nm and the emission peak for Cy5.5 is around 700 nm. FIGS. 38B and 38C depict TBE-Urea gel images acquired using a Cy3 and a Cy5.5 filter, respectively. RF157 is a single nucleotide substrate with a FAM molecule that acts as a positive control to confirm that the USER enzyme is cutting in the reaction and that the filter works and can discriminate between the fluorophores. A mastermix is used as a negative control to provide a baseline measurement for the uncut substrate. FIG. 38B: Deaminases that preferentially cut the substrate at T at the −1 position give a fluorescent product of 65 nts. Substrates cut at C at the −1 position give a product of 45 nts. Deaminases active on both C and T at the −1 position will give a product of 30 nts. FIG. 38C: Deaminases that preferentially cut the substrate at G at the −1 position give a fluorescent product of 65 nts. Substrates cut at A at the −1 position give a product of 45 nts. Deaminases active on both A and G at the −1 position will give a product of 30 nts.

FIG. 39 depicts the percentage of deamination for each −1 position to the target Cytidine for each variant (MG93 and MG152 families) tested in this study.

FIG. 40 depicts the percentage of deamination for each −1 position to the target Cytidine for each variant (MG139 family) tested in this study.

FIG. 41A-FIG. 41C depicts a summary of activity data for novel and engineered CDAs as CBEs in mammalian cells. FIG. 41A depicts the maximum detected editing efficiency for all tested CDAs across 5 engineered spacers. FIG. 41B depicts the maximum detected activity normalized to internal positive control across 5 engineered spacers. The internal experimental positive control used for normalization was a highly active CDA “A0A2K5RDN7”. FIG. 41C depicts side by side comparison of one of the lead candidates “139-52-V6” versus the highly active positive control “A0A2K5RDN7” with 2 guides. 139-52-V6 shows similar editing efficiencies in comparison to the highly active tested CDA.

FIG. 42 depicts the −1 nt preference of CDAs with more than 1% editing activity as CBEs in mammalian cells. The comparison of the −1 nt preference in mammalian cells vs in vitro is shown. The −1 nt preference observed in mammalian cells as CBEs is for the most part comparable to the in vitro preference. The in vitro preference shows a more relaxed pattern than the CBE activity in mammalian cells.

FIG. 43A-FIG. 43C depicts an example in which wild-type MG139-52 and MG139-52v6, a variant carrying the N27A mutation, show differences in activity on ssDNA and/or on an RNA:DNA duplex. FIG. 43A depicts a structural prediction of MG139-52 using A3H as a template (PDB: 5W3V). The targeted mutation at N27 is indicated by an arrow and is located far from the catalytic center and recognition loop 7. FIG. 43B depicts a cartoon showing the DNA/RNA heteroduplex in the R-loop that is targeted by 139-52 WT. CRISPResso output shows the G-A conversion indicative of deamination in the DNA strand forming a DNA/RNA heteroduplex. FIG. 43C depicts CRISPResso output showing that the G-A change in the DNA/RNA heteroduplex was abrogated with the N27A variant. Instead, such modification happens outside the DNA/RNA heteroduplex, suggesting that deamination in the DNA/RNA heteroduplex has been impaired.

FIG. 44 depicts the editing window of lead CDAs in comparison to the highly active CDA A0A2K5RDN7. The editing window shown corresponds to ˜110 nts. The R loop (Cas9 target) is shown as a square. Lead candidates 152-6 and 139-52-V6 have smaller editing windows than A0A2K5RDN7, a favorable feature to avoid off target edits. Engineered CDA 139-52-V6 shows a smaller editing window than its WT counterpart 139-52.

FIG. 45 depicts the mammalian cytotoxicity of stably expressed CDAs as CBEs. CDAs, expressed as CBEs, were stably expressed in mammalian cells by lentiviral integration. The cytotoxicity was measured as fold change relative to a low-activity, low-cytotoxicity CDA (rAPOBEC). The lead candidates (high editing efficiency) show medium cytotoxic activity under these conditions. It is understood that the cytotoxic activity will be reduced when the system is expressed transiently.

FIG. 46A-FIG. 46B depicts the dimeric design of MG68-4 variants. FIG. 46A depicts the predicted structure of MG68-4 and structural alignment of MG68-4 with SaTadA (PDB code: 2b3j). The distance between N-terminus of the first monomer and C-terminus of the second monomer is shown. FIG. 46B depicts base editing efficiency comparing the monomeric and dimeric designs. TadA*8.8m was used for benchmarking. The target sequence is shown in the bar chart. Conversion of A to G was obtained from the highest editing position A8. All deaminases were fused to the N-terminus of MG34-1 (D10A). The editing was evaluated in HEK293T cells.

FIG. 47 depicts the effect of the D109Q mutation on C to G base substitution. A to G and C to G conversions were obtained from the target sequences 633 and 634, respectively. The editing efficiencies of residue C6 of target sequence 633 and residue A8 of target sequence 634 are shown. All deaminases were fused to the N-terminus of MG34-1 (D10A). The editing efficiency was evaluated in HEK293T cells.

FIG. 48 depicts base editing efficiency of the combinatorial library in HEK293T cells. Beneficial mutations identified from rational design and directed evolution were installed into MG68-4 to make the combinatorial library. The variants were inserted into 3-68_DIV30_M_RDr1v1_B. The editing efficiency was evaluated in HEK293T cells.

FIG. 49 depicts the effects of MG68-4 dimerization and/or MG68-4 amino acid sequence variants within the 3-68_DIV30 scaffold on A to G conversion percentage in HEK293T cells.

FIG. 50A-FIG. 50B depicts data demonstrating that the MG35-1 nickase can function as the scaffold of an adenine base editor in E. coli cells. FIG. 50A depicts a schematic of the MG35-1 adenine base editor (ABE) containing a C-terminal TadA*-(7.10) monomer and an SV40 NLS fused to the C-terminus. FIG. 50B depicts a chloramphenicol selection experiment used to assess MG35-1 ABE base editing. A plasmid containing the MG35-1 ABE, a non-functional chloramphenicol acetyltransferase (CAT) gene, and an sgRNA that either targets the CAT gene (targeting sgRNA) or does not target the CAT gene (non-targeting sgRNA) was transformed into BL21(DE3) (Lucigen) E. coli cells. E. coli survival under chloramphenicol selection was dependent on the MG35-1 ABE editing the non-functional CAT gene to its wild-type sequence. Transformed E. coli was plated on plates containing chloramphenicol concentrations of 0, 2, 3, 4, and 8 μg/mL. Plates also contained 100 μg/mL Carbenicillin and 0.1 mM IPTG. Colonies grown on plates containing chloramphenicol concentrations of 0, 2, 3, 4, and 8 μg/mL were sequenced to assess reversion of the CAT gene. Experiments were performed as n=2.

FIG. 51 depicts the activity of 3-6/8 ABE at Apoa1. High A to G conversion was observed with 26 Apoa1 guides. For all spacers shown in the graph, base conversion at all A positions within the spacer region is shown.

FIG. 52 depicts the activity of 3-6/8 ABE at Angptl3. High A to G conversion was observed with 5 Angptl3 guides. For all spacers shown in the graph, base conversion at all A positions within the spacer region is shown.

FIG. 53 depicts the activity of 3-6/8 ABE at Trac. High A to G conversion was observed with 2 Trac guides. For all spacers shown in the graph, base conversion at all A positions within the spacer region is shown.

FIG. 54 depicts the background 3-6/8 ABE activity at Apoa1. Primer pairs for active guides were tested on mock-nucleofected samples to assay background editing at targeted regions. Scale is from 0 to 1%.

FIG. 55A-FIG. 55E depict an E. coli survival assay with an nMG35-1 ABE. E. coli was transformed with a plasmid containing the nMG35-1-ABE, a non-functional chloramphenicol acetyltransferase (CAT Y193) gene, and an sgRNA that either targets the CAT gene (targeting spacer) or not (scramble spacer). FIG. 55A depicts a diagram showing the target sequences with the expected TAM. Cell growth is dependent on the ABE base editing the non-functional CAT gene (A at position 17 from the TAM/PAM, boxed) to restore activity. FIGS. 55B-55E depict the base editing activity in E. coli of base editors comprising nMG35-1 fused to the TadA deaminase with linkers of various lengths. The X axis shows the linkers listed in Table 14.

FIG. 56A-FIG. 56D depict the evaluation of nMG35-1 ABE base editing in an E. coli survival assay under chloramphenicol selection, where cell growth is dependent on the ABE base editing the non-functional CAT gene stop codon and restoring activity. FIGS. 56A-56B depict diagrams showing the target sequences with the expected TAM. The “A” base at position 11 (A) or 10 (B) from the TAM (boxes) is expected to be edited to “G” in order to revert the stop codon to glutamine and restore chloramphenicol (cm) resistance. FIG. 56C: E. coli was transformed with a plasmid containing the nMG35-1-ABE, a non-functional chloramphenicol acetyltransferase (CAT), and an sgRNA that either targets the CAT gene (targeting spacer) or not (no spacer). Transformed E. coli was grown on plates containing chloramphenicol concentrations of 0, 2, 4, and 8 μg/mL. Plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. The nMG35-1-ABE targeting both STOP98Q and STOP122Q contains both stop codons in the same gene that need to be reverted for CAT gene functionality. MIC: minimum inhibitory concentration. FIG. 56D depicts Sanger sequencing chromatograms of five of 18 colonies grown at 2 μg/mL of chloramphenicol for the nMG35-1 ABE double reversion of STOP98Q and STOP122Q in the CAT gene. The chromatogram of the colony that does not show reversion (colony 3) reveals a smaller peak for A to G conversion that is likely obscured due to co-transformation with an unedited plasmid.
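
By way of illustration only, the stop-codon reversion logic described for FIGS. 56A-56B may be sketched in Python as follows. The sketch assumes that the targeted “A” lies on the strand complementary to the coding strand, so that an A-to-G edit reads as a T-to-C change in the codon. The stop codon “TAG” is a placeholder and is not taken from the CAT sequence, and the helper names are hypothetical.

```python
# Minimal sketch: an A-to-G edit on the strand complementary to the coding
# strand converts a stop codon (e.g., TAG) into a glutamine codon (CAG).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
GLN_CODONS = {"CAA", "CAG"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def edit_a_to_g(seq: str, index: int) -> str:
    """Replace the A at 0-based `index` with G; raise if that base is not A."""
    if seq[index] != "A":
        raise ValueError(f"expected A at index {index}, found {seq[index]}")
    return seq[:index] + "G" + seq[index + 1:]


coding_codon = "TAG"                                 # placeholder stop codon (assumption)
target_strand = reverse_complement(coding_codon)     # strand carrying the editable A
edited_strand = edit_a_to_g(target_strand, 2)        # deamination read out as A-to-G
reverted_codon = reverse_complement(edited_strand)   # codon as read on the coding strand

print(coding_codon, "->", reverted_codon)
```

The same helpers apply to the placeholder stop codon “TAA”, which would revert to “CAA”, also a glutamine codon.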

FIG. 57 depicts data demonstrating that truncation of the predicted PLMP domain at the N-terminus of MG35-1 ablates function of the MG35-1 ABE in E. coli. E. coli was transformed with a plasmid containing the nMG35-1-ABE, a non-functional chloramphenicol acetyltransferase (CAT), and an sgRNA that either targets the CAT gene (WT (top row) or PLMP domain truncation (bottom row) MG35-1 ABE) or a non-target spacer (middle row: WT MG35-1 ABE with a scrambled spacer). Transformed E. coli was grown on plates containing chloramphenicol concentrations of 0, 2, and 4 μg/mL. Plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. MIC: minimum inhibitory concentration.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

The Sequence Listing filed herewith provides exemplary polynucleotide and polypeptide sequences for use in methods, compositions and systems according to the disclosure. Below are exemplary descriptions of sequences therein.

SEQ ID NOs: 1-47 show the full-length peptide sequences of MG66 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 48-49 show the full-length peptide sequences of MG67 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 50-51 show the full-length peptide sequences of MG68 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 52-56 show the sequences of uracil DNA glycosylase inhibitors suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 57-66 show the sequences of reference deaminases.

SEQ ID NO: 67 shows the sequence of a reference uracil DNA glycosylase inhibitor.

SEQ ID NO: 68 shows the sequence of an adenine base editor.

SEQ ID NO: 69 shows the sequence of a cytosine base editor.

SEQ ID NOs: 70-78 show the full-length peptide sequences of MG nickases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 79-87 show the protospacers and PAMs used in in vitro nickase assays described herein.

SEQ ID NOs: 88-96 show the sequences of single guide RNAs used in in vitro nickase assays described herein.

SEQ ID NOs: 97-156 show the sequences of spacers when targeting E. coli lacZ.

SEQ ID NOs: 157-176 show the sequences of primers when conducting site directed mutagenesis.

SEQ ID NOs: 177-178 show the sequences of primers for lacZ sequencing.

SEQ ID NOs: 179-342 show the sequences of primers used during amplification.

SEQ ID NOs: 343-345 show the sequences of primers for lacZ sequencing.

SEQ ID NOs: 346-359 show the sequences of primers used during amplification.

Sequence Numbers: A360-A368 show protospacer adjacent motifs suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 369-384 show nuclear localization sequences (NLS's) suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 385-443 show the full-length peptide sequences of MG68 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 444-447 show the full-length peptide sequences of MG121 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 448-475 show the full-length peptide sequences of MG68 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 476 and 477 show sequences of adenine base editors.

SEQ ID NOs: 478-482 show sequences of cytosine base editors.

SEQ ID NOs: 483-487 show the sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 488 and 489 show the sgRNA scaffold sequences for MG15-1 and MG34-1.

SEQ ID NOs: 490-522 show the sequences of spacers used to target genomic loci in E. coli and HEK293T cells.

SEQ ID NOs: 523-585 show the sequences of primers used during amplification and Sanger sequencing.

SEQ ID NOs: 584-585 show the sequences of primers used during amplification.

SEQ ID NO: 586 shows the sequence of an adenine base editor.

SEQ ID NO: 587 shows the sequence of a cytosine base editor.

SEQ ID NOs: 588-589 show sequences of adenine base editors.

SEQ ID NOs: 590-593 show the full-length peptide sequences of linkers suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NO: 594 shows the sequence of a cytosine deaminase.

SEQ ID NO: 595 shows the sequence of an adenosine deaminase.

SEQ ID NO: 596 shows the sequence of an MG34 active effector suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NO: 597 shows the sequence of an MG34 nickase suitable for the engineered nucleic acid editing systems described herein.

Sequence Number: A598 shows the sequence of an MG34 PAM.

SEQ ID NOs: 599-638 show the full-length peptide sequences of MG138 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 639-659 show the full-length peptide sequences of MG139 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 660-662 show the full-length peptide sequences of MG141 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 663-664 show the full-length peptide sequences of MG142 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 665-675 show the full-length peptide sequences of MG93 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 676-678 show sequences of adenine base editors.

SEQ ID NOs: 679-680 show the sgRNA scaffold sequences for MG34-1 and SpCas9.

SEQ ID NOs: 681-689 show spacer sequences used to target genomic loci in guide RNAs.

SEQ ID NOs: 690-707 show sequences of primers used to amplify genomic targets of adenine base editors (ABE) for next generation sequencing (NGS) analysis.

SEQ ID NO: 708 shows the sequence of a blasticidin (BSD) resistance cassette.

SEQ ID NOs: 709-719 show spacer sequences used to target genomic loci in guide RNAs.

SEQ ID NOs: 720-726 show the sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 728-729 show sequences of adenine base editors.

SEQ ID NOs: 730-736 show spacer sequences used to target genomic loci in guide RNAs.

SEQ ID NOs: 737-738 show the sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 739-740 show sequences of cytidine base editors.

SEQ ID NO: 741 shows the sequence of a plasmid suitable for encoding the A1CF gene.

SEQ ID NO: 742 shows the sequence of an RNA used to test CDAs for RNA activity.

SEQ ID NO: 743 shows the sequence of a labelled primer for poisoned primer extension assay used to test CDAs for RNA activity.

SEQ ID NOs: 744-827 show the full-length peptide sequences of MG139 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NO: 828 shows the full-length peptide sequence of an MG93 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NO: 829 shows the full-length peptide sequence of an MG142 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 830-835 show the full-length peptide sequences of MG152 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 836-860 show sequences of adenine base editors.

SEQ ID NOs: 861-864 show spacer sequences used to target genomic loci in guide RNAs.

SEQ ID NOs: 865-872 show sequences of primers used to amplify genomic targets of adenine base editors (ABE) for next generation sequencing (NGS) analysis.

SEQ ID NOs: 873-875 show the sequences of plasmids suitable for encoding the engineered nucleic acid editing systems described herein.

SEQ ID NO: 876 shows the sgRNA scaffold sequence for MG34-1.

SEQ ID NOs: 877-916 show sequences of cytosine base editors.

SEQ ID NOs: 917-931 show the sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 932-961 show sequences of primers used to amplify genomic targets of adenine base editors (ABE) for next generation sequencing (NGS) analysis.

SEQ ID NO: 962 shows a site engineered in mammalian cell line with 5 PAMs compatible with Cas9 and MG3-6 editing.

SEQ ID NOs: 963-967 show the sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 968-969 show sequences of cytosine base editors.

SEQ ID NO: 970 shows the full-length peptide sequence of an MG139 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 971-977 show the full-length peptide sequences of MG93 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 978-981 show the full-length peptide sequences of MG138 cytidine deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NO: 982 shows the full-length peptide sequence of an MG142 cytidine deaminase suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 983-1014 show the full-length peptide sequences of MG128 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1015-1026 show the full-length peptide sequences of MG129 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1027-1031 show the full-length peptide sequences of MG130 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1032-1040 show the full-length peptide sequences of MG131 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1041-1043 show the full-length peptide sequences of MG132 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1044-1057 show the full-length peptide sequences of MG133 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1058-1061 show the full-length peptide sequences of MG134 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1062-1069 show the full-length peptide sequences of MG135 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1070-1081 show the full-length peptide sequences of MG136 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1082-1098 show the full-length peptide sequences of MG137 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1099-1105 show the sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1106-1111 show the sequences of MG35 PAMs.

SEQ ID NO: 1112 shows the DNA sequence of a gene encoding the ABE-MG35-1 adenine base editor.

SEQ ID NO: 1113 shows the protein sequence of the ABE-MG35-1 adenine base editor.

SEQ ID NO: 1114 shows the nucleotide sequence of a plasmid encoding a Cas9-based cytosine base editor (CBE).

SEQ ID NO: 1115 shows the nucleotide sequence of a plasmid encoding Fam72a.

SEQ ID NOs: 1116-1117 show the sequences of Cas9-CBE target sites.

SEQ ID NOs: 1118-1119 show the sequences of NGS amplicons.

SEQ ID NO: 1120 shows the full-length peptide sequence of an MG35 nuclease.

SEQ ID NO: 1121 shows the full-length peptide sequence of Fam72A.

SEQ ID NOs: 1121-1127 show the full-length peptide sequences of MG35 nucleases.

SEQ ID NOs: 1128-1160 show the full-length peptide sequences of MG3-6/3-8 adenine base editors.

SEQ ID NOs: 1161-1186 show the full-length peptide sequences of MG34-1 adenine base editors.

SEQ ID NOs: 1187-1195 show the sequences of sgRNAs suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1196-1204 show spacer sequences used to target genomic loci in guide RNAs.

SEQ ID NO: 1205 shows the nucleotide sequence of a plasmid encoding an MG3-6/3-8 adenine base editor.

SEQ ID NO: 1206 shows the nucleotide sequence of a plasmid encoding an sgRNA suitable for an MG3-6/3-8 adenine base editor described herein.

SEQ ID NO: 1207 shows the nucleotide sequence of a plasmid encoding an MG34-1 adenine base editor.

SEQ ID NOs: 1208-1269 show the full-length peptide sequences of MG93 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1270-1296 show the full-length peptide sequences of MG139 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1297-1311 show the full-length peptide sequences of MG152 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1312-1313 show the full-length peptide sequences of MG138 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1314-1315 show the full-length peptide sequences of MG139 deaminases suitable for the engineered nucleic acid editing systems described herein.

SEQ ID NOs: 1316-1319 show the nucleotide sequences of 5′-FAM-labeled ssDNAs.

SEQ ID NOs: 1320-1321 show the nucleotide sequences of Cy5.5-labeled ssDNAs.

SEQ ID NOs: 1322-1355 show sequences of cytidine base editors.

SEQ ID NOs: 1356-1362 show the full-length peptide sequences of MG34-1 adenine base editors.

SEQ ID NOs: 1363-1415 show the full-length peptide sequences of MG3-6/3-8 adenine base editors.

SEQ ID NOs: 1416-1417 show the nucleotide sequences of sgRNAs suitable for use with MG34-1 adenine base editors described herein.

SEQ ID NO: 1418 shows the nucleotide sequence of an sgRNA suitable for use with MG3-6/3-8 adenine base editors described herein.

SEQ ID NOs: 1419-1420 show the DNA sequences of target sites suitable for targeting by MG34-1 adenine base editors described herein.

SEQ ID NO: 1421 shows a DNA sequence of a target site suitable for targeting by MG3-6/3-8 adenine base editors described herein.

SEQ ID NO: 1422 shows the nucleotide sequence of a plasmid suitable for expression of an MG34-1 adenine base editor described herein.

SEQ ID NO: 1423 shows the nucleotide sequence of a plasmid suitable for expression of an MG3-6/3-8 adenine base editor described herein.

SEQ ID NO: 1424 shows the full-length peptide sequence of an MG35-1 adenine base editor.

SEQ ID NOs: 1425-1426 show the nucleotide sequences of plasmids suitable for expression of MG35-1 adenine base editors and sgRNAs described herein.

SEQ ID NOs: 1427-1428 show the nucleotide sequences of sgRNAs suitable for use with MG35-1 adenine base editors described herein.

SEQ ID NOs: 1429-1430 show the DNA sequences of target sites suitable for targeting by MG35-1 adenine base editors described herein.

SEQ ID NOs: 1431-1454 show the nucleotide sequences of sgRNAs engineered to function with an MG3-6/3-8 adenine base editor in order to target APOA1.

SEQ ID NOs: 1455-1478 show the DNA sequences of APOA1 target sites.

SEQ ID NOs: 1479-1483 show the nucleotide sequences of sgRNAs engineered to function with an MG3-6/3-8 adenine base editor in order to target ANGPTL3.

SEQ ID NOs: 1484-1488 show the DNA sequences of ANGPTL3 target sites.

SEQ ID NOs: 1489-1490 show the nucleotide sequences of sgRNAs engineered to function with an MG3-6/3-8 adenine base editor in order to target TRAC.

SEQ ID NOs: 1491-1492 show the DNA sequences of TRAC sites.

SEQ ID NOs: 1493-1516 show the nucleotide sequences of NGS primers suitable for use in assessing base editing of APOA1.

SEQ ID NOs: 1517-1521 show the nucleotide sequences of NGS primers suitable for use in assessing base editing of ANGPTL3.

SEQ ID NOs: 1522-1523 show the nucleotide sequences of NGS primers suitable for use in assessing base editing of TRAC.

SEQ ID NOs: 1524-1547 show the nucleotide sequences of NGS primers suitable for use in assessing base editing of APOA1.

SEQ ID NOs: 1548-1552 show the nucleotide sequences of NGS primers suitable for use in assessing base editing of ANGPTL3.

SEQ ID NOs: 1553-1554 show the nucleotide sequences of NGS primers suitable for use in assessing base editing of TRAC.

SEQ ID NO: 1555 shows the nucleotide sequence of a plasmid suitable for use in mRNA production.

SEQ ID NOs: 1556-1562 show the full-length peptide sequences of MG131 adenine deaminase variants.

SEQ ID NOs: 1563-1566 show the full-length peptide sequences of MG134 adenine deaminase variants.

SEQ ID NOs: 1567-1574 show the full-length peptide sequences of MG135 adenine deaminase variants.

SEQ ID NOs: 1575-1589 show the full-length peptide sequences of MG137 adenine deaminase variants.

SEQ ID NOs: 1590-1599 show the full-length peptide sequences of MG68 adenine deaminase variants.

SEQ ID NOs: 1600-1602 show the full-length peptide sequences of MG132 adenine deaminase variants.

SEQ ID NOs: 1603-1616 show the full-length peptide sequences of MG133 adenine deaminase variants.

SEQ ID NOs: 1617-1624 show the full-length peptide sequences of MG136 adenine deaminase variants.

SEQ ID NOs: 1625-1633 show the full-length peptide sequences of MG129 adenine deaminase variants.

SEQ ID NOs: 1634-1638 show the full-length peptide sequences of MG130 adenine deaminase variants.

SEQ ID NOs: 1639-1644 show the full-length peptide sequences of MG34-1 adenine base editors.

SEQ ID NOs: 1645-1646 show the nucleotide sequences of ssDNA substrates suitable for testing adenine deaminase activity in vitro.

DETAILED DESCRIPTION

While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

The practice of some methods disclosed herein employs, unless otherwise indicated, techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA. See for example Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds.); the series Methods In Enzymology (Academic Press, Inc.), PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames and G. R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual, and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R. I. Freshney, ed. (2010)) (each of which is entirely incorporated by reference herein).

As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising.”

The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within one or more than one standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.
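
By way of illustration only, the percentage-based reading of “about” may be expressed as a simple tolerance check. The following Python sketch is not part of the methods disclosed herein; the function name `is_about` and the default tolerance of 10% are assumptions chosen for the example.

```python
def is_about(value: float, reference: float, tolerance: float = 0.10) -> bool:
    """Return True if `value` lies within `tolerance` (a fraction) of `reference`.

    Hypothetical helper: the tolerance corresponds to one of the example
    ranges (20%, 15%, 10%, 5%, or 1%); the 10% default is an assumption.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(value - reference) <= tolerance * abs(reference)


# A measured value of 104 compared against a stated value of "about 100"
# under three of the example tolerances.
for tol in (0.20, 0.05, 0.01):
    print(f"tolerance {tol:.0%}: {is_about(104.0, 100.0, tolerance=tol)}")
```

Under this sketch, 104 falls within “about 100” at tolerances of 20% and 5%, but not at a tolerance of 1%.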

As used herein, a “cell” generally refers to a biological cell. A cell may be the basic structural, functional or biological unit of a living organism. A cell may originate from any organism having one or more cells. Some non-limiting examples include: a prokaryotic cell, eukaryotic cell, a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a protozoan cell, a cell from a plant (e.g., cells from plant crops, fruits, vegetables, grains, soy bean, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, cannabis, tobacco, flowering plants, conifers, gymnosperms, ferns, clubmosses, hornworts, liverworts, mosses), an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like), seaweeds (e.g., kelp), a fungal cell (e.g., a yeast cell, a cell from a mushroom), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal (e.g., a pig, a cow, a goat, a sheep, a rodent, a rat, a mouse, a non-human primate, a human, etc.), and the like. Sometimes a cell does not originate from a natural organism (e.g., a cell can be synthetically made, sometimes termed an artificial cell).

The term “nucleotide,” as used herein, generally refers to a base-sugar-phosphate combination. A nucleotide may comprise a synthetic nucleotide. A nucleotide may comprise a synthetic nucleotide analog. Nucleotides may be monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)). The term nucleotide may include ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), guanosine triphosphate (GTP) and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives may include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleotide derivatives that confer nuclease resistance on the nucleic acid molecule containing them. The term nucleotide as used herein may refer to dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives. Illustrative examples of dideoxyribonucleoside triphosphates may include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP. A nucleotide may be unlabeled or detectably labeled, such as using moieties comprising optically detectable moieties (e.g., fluorophores). Labeling may also be carried out with quantum dots. Detectable labels may include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels and enzyme labels. Fluorescent labels of nucleotides may include but are not limited to fluorescein, 5-carboxyfluorescein (FAM), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4′-dimethylaminophenylazo)benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, Cyanine and 5-(2′-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS). Specific examples of fluorescently labeled nucleotides can include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP available from Perkin Elmer, Foster City, Calif.; FluoroLink DeoxyNucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink Fluor X-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP available from Amersham, Arlington Heights, Ill.; Fluorescein-15-dATP, Fluorescein-12-dUTP, Tetramethyl-rhodamine-6-dUTP, IR770-9-dATP, Fluorescein-12-ddUTP, Fluorescein-12-UTP, and Fluorescein-15-2′-dATP available from Boehringer Mannheim, Indianapolis, Ind.; and Chromosome Labeled Nucleotides, BODIPY-FL-14-UTP, BODIPY-FL-4-UTP, BODIPY-TMR-14-UTP, BODIPY-TMR-14-dUTP, BODIPY-TR-14-UTP, BODIPY-TR-14-dUTP, Cascade Blue-7-UTP, Cascade Blue-7-dUTP, fluorescein-12-UTP, fluorescein-12-dUTP, Oregon Green 488-5-dUTP, Rhodamine Green-5-UTP, Rhodamine Green-5-dUTP, tetramethylrhodamine-6-UTP, tetramethylrhodamine-6-dUTP, Texas Red-5-UTP, Texas Red-5-dUTP, and Texas Red-12-dUTP available from Molecular Probes, Eugene, Oreg. Nucleotides can also be labeled or marked by chemical modification. A chemically-modified single nucleotide can be biotin-dNTP. Some non-limiting examples of biotinylated dNTPs can include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).

The terms “polynucleotide,” “oligonucleotide,” and “nucleic acid” are used interchangeably to generally refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, either in single-, double-, or multi-stranded form. A polynucleotide may be exogenous or endogenous to a cell. A polynucleotide may exist in a cell-free environment. A polynucleotide may be a gene or fragment thereof. A polynucleotide may be DNA. A polynucleotide may be RNA. A polynucleotide may have any three-dimensional structure and may perform any function. A polynucleotide may comprise one or more analogs (e.g., altered backbone, sugar, or nucleobase). If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol containing nucleotides, biotin linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. Non-limiting examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers. The sequence of nucleotides may be interrupted by non-nucleotide components.

The terms “transfection” or “transfected” generally refer to introduction of a nucleic acid into a cell by non-viral or viral-based methods. The nucleic acid molecules may be gene sequences encoding complete proteins or functional portions thereof. See, e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 18.1-18.88.

The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein to generally refer to a polymer of at least two amino acid residues joined by peptide bond(s). This term does not connote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer may be interrupted by non-amino acids. The terms include amino acid chains of any length, including full length proteins, and proteins with or without secondary or tertiary structure (e.g., domains). The terms also encompass an amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation such as conjugation with a labeling component. The terms “amino acid” and “amino acids,” as used herein, generally refer to natural and non-natural amino acids, including, but not limited to, modified amino acids and amino acid analogues. Modified amino acids may include natural amino acids and non-natural amino acids, which have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid. Amino acid analogues may refer to amino acid derivatives. The term “amino acid” includes both D-amino acids and L-amino acids.

As used herein, the term “non-native” can generally refer to a nucleic acid or polypeptide sequence that is not found in a native nucleic acid or protein. Non-native may refer to affinity tags. Non-native may refer to fusions. Non-native may refer to a naturally occurring nucleic acid or polypeptide sequence that comprises mutations, insertions or deletions. A non-native sequence may exhibit or encode for an activity (e.g., enzymatic activity, methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.) that may also be exhibited by the nucleic acid or polypeptide sequence to which the non-native sequence is fused. A non-native nucleic acid or polypeptide sequence may be linked to a naturally-occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid or polypeptide sequence encoding a chimeric nucleic acid or polypeptide.

The term “promoter”, as used herein, generally refers to the regulatory DNA region which controls transcription or expression of a gene and which may be located adjacent to or overlapping a nucleotide or region of nucleotides at which RNA transcription is initiated. A promoter may contain specific DNA sequences which bind protein factors, often referred to as transcription factors, which facilitate binding of RNA polymerase to the DNA leading to gene transcription. A ‘basal promoter’, also referred to as a ‘core promoter’, may generally refer to a promoter that contains all the basic elements to promote transcriptional expression of an operably linked polynucleotide. Eukaryotic basal promoters can contain a TATA-box or a CAAT box.

The term “expression”, as used herein, generally refers to the process by which a nucleic acid sequence or a polynucleotide is transcribed from a DNA template (such as into mRNA or other RNA transcript) or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.

As used herein, “operably linked”, “operable linkage”, “operatively linked”, or grammatical equivalents thereof generally refer to juxtaposition of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein the elements are in a relationship permitting them to operate in the expected manner. For instance, a regulatory element, which may comprise promoter or enhancer sequences, is operatively linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and coding region so long as this functional relationship is maintained.

A “vector” as used herein, generally refers to a macromolecule or association of macromolecules that comprises or associates with a polynucleotide and which may be used to mediate delivery of the polynucleotide to a cell. Examples of vectors include plasmids, viral vectors, liposomes, and other gene delivery vehicles. The vector generally comprises genetic elements, e.g., regulatory elements, operatively linked to a gene to facilitate expression of the gene in a target.

As used herein, “an expression cassette” and “a nucleic acid cassette” are used interchangeably generally to refer to a combination of nucleic acid sequences or elements that are expressed together or are operably linked for expression. In some cases, an expression cassette refers to the combination of regulatory elements and a gene or genes to which they are operably linked for expression.

A “functional fragment” of a DNA or protein sequence generally refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence. A biological activity of a DNA sequence may be its ability to influence expression in a manner attributed to the full-length sequence.

As used herein, an “engineered” object generally indicates that the object has been modified by human intervention. According to non-limiting examples: a nucleic acid may be modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid may be modified by ligating it to a nucleic acid that it does not associate with in nature such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid may be synthesized in vitro with a sequence that does not exist in nature; a protein may be modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein may acquire a new function or property. An “engineered” system comprises at least one engineered component.

As used herein, “synthetic” and “artificial” are used interchangeably to refer to a protein or a domain thereof that has low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein. For example, VPR and VP64 domains are synthetic transactivation domains.

The term “tracrRNA” or “tracr sequence”, as used herein, can generally refer to a nucleic acid with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100% sequence identity or sequence similarity to a wild type example tracrRNA sequence (e.g., a tracrRNA from S. pyogenes, S. aureus, etc.). tracrRNA can refer to a nucleic acid with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity or sequence similarity to a wild type example tracrRNA sequence (e.g., a tracrRNA from S. pyogenes, S. aureus, etc.). tracrRNA may refer to a modified form of a tracrRNA that can comprise a nucleotide change such as a deletion, insertion, or substitution, variant, mutation, or chimera. A tracrRNA may refer to a nucleic acid that can be at least about 60% identical to a wild type example tracrRNA (e.g., a tracrRNA from S. pyogenes, S. aureus, etc.) sequence over a stretch of at least 6 contiguous nucleotides. For example, a tracrRNA sequence can be at least about 60% identical, at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100% identical to a wild type example tracrRNA (e.g., a tracrRNA from S. pyogenes, S. aureus, etc.) sequence over a stretch of at least 6 contiguous nucleotides. Type II tracrRNA sequences can be predicted on a genome sequence by identifying regions with complementarity to part of the repeat sequence in an adjacent CRISPR array.
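The prediction step described in the last sentence above can be sketched as a search for genomic k-mers that are the reverse complement of part of the CRISPR repeat. This is a minimal illustration only: the function names, the window length `k`, and the exact-match criterion are assumptions made for the example, and a practical search would typically tolerate mismatches and G:U wobble pairing.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_anti_repeats(genome: str, repeat: str, k: int = 10):
    """List (genome_start, repeat_start) pairs where a k-mer of the genome is
    the exact reverse complement of a k-mer of the repeat.

    Hypothetical helper: k=10 and exact matching are illustrative choices.
    """
    genome = genome.upper()
    repeat = repeat.upper()
    hits = []
    for i in range(len(repeat) - k + 1):
        probe = reverse_complement(repeat[i:i + k])
        start = genome.find(probe)
        while start != -1:
            hits.append((start, i))
            start = genome.find(probe, start + 1)
    return sorted(hits)


# Toy repeat and toy genome fragment (not real CRISPR sequences).
toy_repeat = "ACGTTGCAAGGCTA"
toy_genome = "TTTT" + reverse_complement(toy_repeat[2:12]) + "CCCC"
print(find_anti_repeats(toy_genome, toy_repeat))
```

Each hit marks a candidate anti-repeat region near which a tracrRNA might be encoded; candidates would still need to be checked for proximity to the CRISPR array and for downstream structure.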

As used herein, a “guide nucleic acid” can generally refer to a nucleic acid that may hybridize to another nucleic acid. A guide nucleic acid may be RNA. A guide nucleic acid may be DNA. The guide nucleic acid may be programmed to bind to a sequence of nucleic acid site-specifically. The nucleic acid to be targeted, or the target nucleic acid, may comprise nucleotides. The guide nucleic acid may comprise nucleotides. A portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid. The strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid may be called the complementary strand. The strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore may not be complementary to the guide nucleic acid, may be called the noncomplementary strand. A guide nucleic acid may comprise a polynucleotide chain and can be called a “single guide nucleic acid.” A guide nucleic acid may comprise two polynucleotide chains and may be called a “double guide nucleic acid.” If not otherwise specified, the term “guide nucleic acid” may be inclusive, referring to both single guide nucleic acids and double guide nucleic acids. A guide nucleic acid may comprise a segment that can be referred to as a “nucleic acid-targeting segment” or a “nucleic acid-targeting sequence.” A nucleic acid-targeting segment may comprise a sub-segment that may be referred to as a “protein binding segment” or “protein binding sequence” or “Cas protein binding segment”.

The term “sequence identity” or “percent identity” in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with parameters of; the Smith-Waterman homology search algorithm with parameters of a match of 2, a mismatch of −1, and a gap of −1; MUSCLE with default parameters; MAFFT with parameters retree of 2 and maxiterations of 1000; Novafold with default parameters; HMMER hmmalign with default parameters.
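As one illustration of how a percent identity figure can be obtained, the sketch below implements a basic Smith-Waterman local alignment using the match, mismatch, and gap values listed above (2, −1, and −1) and reports identity over the aligned columns. The function name and the choice to divide matches by alignment columns are assumptions for this example; the tools named above may compute and report identity differently.

```python
def smith_waterman_identity(a: str, b: str, match: int = 2,
                            mismatch: int = -1, gap: int = -1):
    """Return (best_local_score, percent_identity) for sequences a and b.

    Percent identity is counted over the columns of one optimal local
    alignment (identical residues / alignment columns * 100).
    """
    n, m = len(a), len(b)
    # Dynamic-programming matrix; row 0 and column 0 stay at zero.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    matches = columns = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1          # gap in b
        else:
            j -= 1          # gap in a
        columns += 1
    identity = 100.0 * matches / columns if columns else 0.0
    return best, identity


score, identity = smith_waterman_identity("AAAATAAAA", "AAAACAAAA")
print(score, round(identity, 1))
```

A threshold such as "at least 80% sequence identity" would then be evaluated by comparing the returned identity against 80.0, keeping in mind that the comparison window (local versus global) affects the result.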

As used herein, the term “RuvC_III domain” generally refers to a third discontinuous segment of a RuvC endonuclease domain (the RuvC nuclease domain being comprised of three discontiguous segments, RuvC_I, RuvC_II, and RuvC_III). A RuvC domain or segments thereof can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam HMM PF18541 for RuvC III).

As used herein, the term “HNH domain” generally refers to an endonuclease domain having characteristic histidine and asparagine residues. An HNH domain can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam HMM PF01844 for domain HNH).

As used herein, the term “base editor” generally refers to an enzyme that catalyzes the conversion of one target base or base pair into another (e.g. A:T to G:C, C:G to T:A) without requiring the creation and repair of a double-strand break. In some embodiments, the base editor is a deaminase.

As used herein, the term “deaminase” generally refers to a protein or enzyme that catalyzes a deamination reaction. In some embodiments, the deaminase is an adenosine deaminase, which catalyzes the hydrolytic deamination of adenine or adenosine (e.g., an engineered adenosine deaminase that deaminates adenosine in DNA). In some embodiments, the deaminase or deaminase domain is a cytidine (or cytosine) deaminase, catalyzing the hydrolytic deamination of cytidine (or cytosine) or deoxycytidine to uridine (or uracil) or deoxyuridine, respectively. In some embodiments, the deaminase or deaminase domain is a cytidine (or cytosine) deaminase domain, catalyzing the hydrolytic deamination of cytidine (or cytosine) to uridine (or uracil). In some embodiments, the deaminase or deaminase domain is a naturally-occurring deaminase from an organism, such as a human, chimpanzee, gorilla, monkey, cow, dog, rat, mouse, or bacterium (e.g. E. coli). In some embodiments, the deaminase or deaminase domain is a variant of a naturally-occurring deaminase from an organism that does not occur in nature.

The term “optimally aligned” in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acid residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.

Included in the current disclosure are variants of any of the enzymes described herein with one or more conservative amino acid substitutions. Such conservative substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide. Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., non-conserved residues) without altering the basic functions of the encoded proteins. Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of the endonuclease protein sequences described herein. In some embodiments, such conservatively substituted variants are functional variants. Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues or guide RNA binding residues of the endonuclease is not disrupted.

Also included in the current disclosure are variants of any of the enzymes described herein with substitution of one or more catalytic residues to decrease or eliminate activity of the enzyme (e.g. decreased-activity variants). In some embodiments, a decreased-activity variant of a protein described herein comprises a disrupting substitution of at least one, at least two, or all three catalytic residues. In some embodiments, any of the endonucleases described herein can comprise a nickase mutation. In some embodiments, any of the endonucleases described herein can comprise a RuvC domain lacking nuclease activity. In some embodiments, any of the endonucleases described herein can be configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, any of the endonucleases described herein can be configured to lack endonuclease activity or be catalytically dead.

Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties, W H Freeman & Co., 2nd edition (December 1993)). The following eight groups each contain amino acids that are conservative substitutions for one another:

- 1) Alanine (A), Glycine (G);
- 2) Aspartic acid (D), Glutamic acid (E);
- 3) Asparagine (N), Glutamine (Q);
- 4) Arginine (R), Lysine (K);
- 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
- 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
- 7) Serine (S), Threonine (T); and
- 8) Cysteine (C), Methionine (M)
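The eight groups above can be encoded directly to flag whether a given substitution is conservative under this table. The sketch below is illustrative only; the function names and the assumption that the two sequences are already aligned and of equal length are choices made for the example.

```python
# The eight conservative substitution groups listed above (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues appear together in at least one group."""
    if original == replacement:
        return False
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str):
    """Compare two pre-aligned, equal-length sequences position by position.

    Returns (position, reference_residue, variant_residue, conservative?)
    for every position that differs; positions are 1-based.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


print(classify_substitutions("MKAD", "MRAW"))
```

Because methionine appears in both group 5 and group 8, the lookup checks every group rather than mapping each residue to a single group.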

Overview

The discovery of new CRISPR enzymes with unique functionality and structure may offer the potential to further disrupt deoxyribonucleic acid (DNA) editing technologies, improving speed, specificity, functionality, and ease of use. Relative to the predicted prevalence of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems in microbes and the sheer diversity of microbial species, comparatively few functionally characterized CRISPR enzymes exist in the literature. This is partly because a huge number of microbial species may not be readily cultivated in laboratory conditions. Metagenomic sequencing from natural environmental niches that represent large numbers of microbial species may offer the potential to drastically increase the number of new CRISPR systems documented and speed the discovery of new oligonucleotide editing functionalities. A recent example of the fruitfulness of such an approach is demonstrated by the 2016 discovery of CasX/CasY CRISPR systems from metagenomic analysis of natural microbial communities.

CRISPR systems are RNA-directed nuclease complexes that have been described to function as an adaptive immune system in microbes. In their natural context, CRISPR systems occur in CRISPR (clustered regularly interspaced short palindromic repeats) operons or loci, which generally comprise two parts: (i) an array of short repetitive sequences (30-40 bp) separated by equally short spacer sequences, which encode the RNA-based targeting element; and (ii) ORFs encoding the nuclease polypeptide directed by the RNA-based targeting element alongside accessory proteins/enzymes. Efficient nuclease targeting of a particular target nucleic acid sequence generally requires both (i) complementary hybridization between the first 6-8 nucleic acids of the target (the target seed) and the crRNA guide; and (ii) the presence of a protospacer-adjacent motif (PAM) sequence within a defined vicinity of the target seed (the PAM usually being a sequence not commonly represented within the host genome). Depending on the exact function and organization of the system, CRISPR systems are commonly organized into 2 classes, 5 types and 16 subtypes based on shared functional characteristics and evolutionary similarity (see FIG. 1).
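The PAM requirement described above can be illustrated with a simple scan for candidate target sites, that is, stretches of a fixed length immediately followed by a sequence matching a PAM pattern written in IUPAC degenerate codes. This is a sketch under stated assumptions: the function names, the 20-nt spacer length, the 3′ PAM placement, and the default `NGG` pattern are illustrative choices and are not the PAMs of the enzymes described herein. Only the given strand is scanned.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(site: str, pam: str) -> bool:
    """True if every base of `site` is allowed by the IUPAC code in `pam`."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site, pam)
    )


def find_targets(dna: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (start, protospacer, pam_site) for each position on the given
    strand where a spacer_len stretch is directly followed by a PAM match.

    Hypothetical helper: PAM pattern and spacer length are example values.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - spacer_len - len(pam) + 1):
        protospacer = dna[i:i + spacer_len]
        site = dna[i + spacer_len:i + spacer_len + len(pam)]
        if pam_matches(site, pam):
            hits.append((i, protospacer, site))
    return hits


print(find_targets("A" * 20 + "TGG" + "CCC"))
```

A fuller treatment would also scan the reverse complement strand and would score seed complementarity between each candidate and the guide.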

Class I CRISPR systems have large, multisubunit effector complexes, and comprise Types I, III, and IV.

Type I CRISPR systems are considered of moderate complexity in terms of components. In Type I CRISPR systems, the array of RNA-targeting elements is transcribed as a long precursor crRNA (pre-crRNA) that is processed at repeat elements to liberate short, mature crRNAs that direct the nuclease complex to nucleic acid targets when they are followed by a suitable short consensus sequence called a protospacer-adjacent motif (PAM). This processing occurs via an endoribonuclease subunit (Cas6) of a large endonuclease complex called Cascade, which also comprises a nuclease (Cas3) protein component of the crRNA-directed nuclease complex. Type I nucleases function primarily as DNA nucleases.

Type III CRISPR systems may be characterized by the presence of a central nuclease, known as Cas10, alongside a repeat-associated mysterious protein (RAMP) that comprises Csm or Cmr protein subunits. Like in Type I systems, the mature crRNA is processed from a pre-crRNA using a Cas6-like enzyme. Unlike type I and II systems, type III systems appear to target and cleave DNA-RNA duplexes (such as DNA strands being used as templates for an RNA polymerase).

Type IV CRISPR systems possess an effector complex that comprises a highly reduced large subunit nuclease (csf1), two genes for RAMP proteins of the Cas5 (csf3) and Cas7 (csf2) groups, and, in some cases, a gene for a predicted small subunit; such systems are commonly found on endogenous plasmids.

Class II CRISPR systems generally have single-polypeptide multidomain nuclease effectors, and comprise Types II, V and VI.

Type II CRISPR systems are considered the simplest in terms of components. In Type II CRISPR systems, the processing of the CRISPR array into mature crRNAs does not require the presence of a special endonuclease subunit, but rather a small trans-encoded crRNA (tracrRNA) with a region complementary to the array repeat sequence; the tracrRNA interacts with both its corresponding effector nuclease (e.g. Cas9) and the repeat sequence to form a precursor dsRNA structure, which is cleaved by endogenous RNase III to generate a mature effector enzyme loaded with both tracrRNA and crRNA. Type II nucleases are known as DNA nucleases. Type II effectors generally exhibit a structure comprising a RuvC-like endonuclease domain that adopts the RNase H fold with an unrelated HNH nuclease domain inserted within the folds of the RuvC-like nuclease domain. The RuvC-like domain is responsible for the cleavage of the target (e.g., crRNA complementary) DNA strand, while the HNH domain is responsible for cleavage of the displaced DNA strand.

Type V CRISPR systems are characterized by a nuclease effector (e.g. Cas12) structure similar to that of Type II effectors, comprising a RuvC-like domain. Similar to Type II, most (but not all) Type V CRISPR systems use a tracrRNA to process pre-crRNAs into mature crRNAs; however, unlike Type II systems, which require RNase III to cleave the pre-crRNA into multiple crRNAs, Type V systems are capable of using the effector nuclease itself to cleave pre-crRNAs. Like Type II CRISPR systems, Type V CRISPR systems are again known as DNA nucleases. Unlike Type II CRISPR systems, some Type V enzymes (e.g., Cas12a) appear to have a robust single-stranded nonspecific deoxyribonuclease activity that is activated by the first crRNA directed cleavage of a double-stranded target sequence.

Type VI CRISPR systems have RNA-guided RNA endonucleases. Instead of RuvC-like domains, the single polypeptide effector of Type VI systems (e.g. Cas13) comprises two HEPN ribonuclease domains. Differing from both Type II and V systems, Type VI systems also may not require a tracrRNA in some instances for processing of pre-crRNA into crRNA. Similar to type V systems, however, some Type VI systems (e.g., C2C2) appear to possess robust single-stranded nonspecific nuclease (ribonuclease) activity activated by the first crRNA directed cleavage of a target RNA.

Because of their simpler architecture, Class II CRISPR systems have been most widely adopted for engineering and development for designer nuclease/genome editing applications.

One of the early adaptations of such a system for in vitro use can be found in Jinek et al. (Science. 2012 Aug. 17; 337(6096):816-21, which is entirely incorporated herein by reference). The Jinek study first described a system that involved (i) recombinantly-expressed, purified full-length Cas9 (e.g., a Class II, Type II enzyme) isolated from S. pyogenes SF370, (ii) purified mature ˜42 nt crRNA bearing a ˜20 nt 5′ sequence complementary to the target DNA sequence to be cleaved followed by a 3′ tracr-binding sequence (the whole crRNA being in vitro transcribed from a synthetic DNA template carrying a T7 promoter sequence); (iii) purified tracrRNA in vitro transcribed from a synthetic DNA template carrying a T7 promoter sequence, and (iv) Mg²⁺. Jinek later described an improved, engineered system wherein the crRNA of (ii) is joined to the 5′ end of (iii) by a linker (e.g., GAAA) to form a single fused synthetic guide RNA (sgRNA) capable of directing Cas9 to a target by itself.

Mali et al. (Science. 2013 Feb. 15; 339(6121): 823-826), which is entirely incorporated herein by reference, later adapted this system for use in mammalian cells by providing DNA vectors encoding (i) an ORF encoding codon-optimized Cas9 (e.g., a Class II, Type II enzyme) under a suitable mammalian promoter with a C-terminal nuclear localization sequence (e.g., SV40 NLS) and a suitable polyadenylation signal (e.g., TK pA signal); and (ii) an ORF encoding an sgRNA (having a 5′ sequence beginning with G followed by 20 nt of a complementary targeting nucleic acid sequence joined to a 3′ tracr-binding sequence, a linker, and the tracrRNA sequence) under a suitable Polymerase III promoter (e.g., the U6 promoter).
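The sgRNA layout described above (an optional 5′ G, a 20 nt targeting sequence, a tracr-binding sequence, a linker such as GAAA, and the tracrRNA sequence) can be sketched as a simple string assembly. The function names are hypothetical, and the tracr-binding and tracrRNA strings used in the example are short toy placeholders, not the S. pyogenes sequences or any sequence of the present disclosure.

```python
def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def build_sgrna(spacer: str, tracr_binding: str, tracr: str,
                linker: str = "GAAA", leading_g: bool = True,
                spacer_len: int = 20) -> str:
    """Concatenate sgRNA parts 5' to 3':
    [G] + spacer + tracr-binding sequence + linker + tracrRNA.

    Hypothetical helper; it checks only spacer length and alphabet.
    """
    spacer = to_rna(spacer)
    if len(spacer) != spacer_len:
        raise ValueError(f"spacer must be {spacer_len} nt, got {len(spacer)}")
    if set(spacer) - set("ACGU"):
        raise ValueError("spacer contains characters other than A, C, G, U/T")
    prefix = "G" if leading_g else ""
    return prefix + spacer + to_rna(tracr_binding) + to_rna(linker) + to_rna(tracr)


# Toy placeholder parts: an 8-nt "tracr-binding" stub and its complement.
sgrna = build_sgrna("ACGTACGTACGTACGTACGT", "GGCCAAUU", "AAUUGGCC")
print(sgrna, len(sgrna))
```

The leading G reflects the transcription start preference of the U6 promoter mentioned above; it can be switched off with `leading_g=False` for guides produced by other means.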

Base Editing

Base editing is the conversion of one target base or base pair into another (e.g. A:T to G:C, C:G to T:A) without requiring the creation and repair of a double-strand break. Base editing may be achieved with the help of DNA and RNA base editors that allow the introduction of point mutations at specific sites, in either DNA or RNA. Generally, DNA base editors may comprise a fusion of a catalytically inactive nuclease and a catalytically active base-modification enzyme that acts on single-stranded DNAs (ssDNAs). RNA base editors may comprise similar, RNA-specific enzymes. Base editing may increase the efficiency of gene modification, while reducing the off-target and random mutations in the DNA.
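The net sequence outcome of the two conversions named above can be illustrated with a toy model that rewrites C to T (cytosine base editing) or A to G (adenine base editing) within a window of a protospacer. The function name, the 1-based window convention, and the window positions used in the example are assumptions for illustration; actual editing windows depend on the particular editor and are not modeled here, nor are editing efficiencies or bystander preferences.

```python
# Net base changes on the edited strand for the two editor classes.
CONVERSIONS = {
    "CBE": ("C", "T"),  # cytosine base editor: C:G to T:A
    "ABE": ("A", "G"),  # adenine base editor:  A:T to G:C
}


def base_edit(protospacer: str, editor: str = "CBE", window=(4, 8)) -> str:
    """Return the protospacer with every editable base inside the window
    converted. `window` is an inclusive, 1-based (start, end) pair and is a
    hypothetical example value.
    """
    source, product = CONVERSIONS[editor]
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if start <= pos <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)


print(base_edit("AAACCCAAA", "CBE", window=(4, 5)))
print(base_edit("AAACCCAAA", "ABE", window=(1, 2)))
```

Bases of the editable type that fall outside the window are left unchanged, which is why the third C in the first example is not converted.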

DNA base editors are engineered ribonucleoprotein complexes that act as tools for single base substitution in cells and organisms. They may be created by fusing an engineered base-modification enzyme and a catalytically deficient CRISPR endonuclease variant that cannot cut dsDNA but is able to unfold the dsDNA in a protospacer adjacent motif (PAM) sequence-dependent manner, such that a guide RNA can find its complementary target to indicate a ssDNA scission site. The guide RNA anneals to the complementary DNA, displacing a fragment of ssDNA and directing the CRISPR ‘scissors’ to the base modification site. The cellular repair machinery will repair the nicked non-edited strand using information from the complementary edited template.

So far, two types of DNA editors, cytosine base editors (CBEs) and adenine base editors (ABEs), have been developed. They were shown to efficiently and precisely edit point mutations in DNA with minimal off-target DNA editing (see Nat Biotechnol. 2017; 35:435-437, Nat Biotechnol. 2017; 35:438-440 and Nat Biotechnol. 2017; 35:475-480, each of which is entirely incorporated herein by reference). However, recent findings indicate that off-target modifications are present in DNA, and that many off-target modifications are also introduced into RNA by DNA base editors.

MG Base Editors

In some aspects, the present disclosure provides for an engineered nucleic acid editing system, comprising: (a) an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultivated microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to be deficient in nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease. In some embodiments, the endonuclease comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof. In some cases, the RuvC domain lacks nuclease activity. In some cases, the endonuclease comprises a nickase mutation. In some cases, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some cases the ribonucleic acid sequence configured to bind to the endonuclease comprises a tracr sequence.

In some aspects, the present disclosure provides for an engineered nucleic acid editing system comprising: (a) an endonuclease having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof, wherein the endonuclease is configured to be deficient in nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease. In some cases, the ribonucleic acid sequence configured to bind to the endonuclease comprises a tracr sequence. In some cases, the RuvC domain lacks nuclease activity. In some cases, the endonuclease comprises a nickase mutation. In some cases, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

In some aspects, the present disclosure provides for an engineered nucleic acid editing system comprising: (a) an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of Sequence Numbers: A360-A368 or A598, wherein the endonuclease is a class 2, type II endonuclease, and the endonuclease is configured to be deficient in nuclease activity; (b) a base editor coupled to the endonuclease; and (c) an engineered guide ribonucleic acid structure configured to form a complex with the endonuclease comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a ribonucleic acid sequence configured to bind to the endonuclease. In some cases, the ribonucleic acid sequence configured to bind to the endonuclease comprises a tracr sequence. In some cases, the endonuclease comprises a nickase mutation. In some cases, the RuvC domain lacks nuclease activity. In some cases, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

In some aspects, the present disclosure provides an engineered nucleic acid editing system comprising: (a) an engineered guide ribonucleic acid structure comprising: (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and (ii) a tracr ribonucleic acid sequence configured to bind to an endonuclease, wherein the tracr ribonucleic acid sequence comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680, or a variant thereof; and (b) a class 2, type II endonuclease configured to bind to the engineered guide ribonucleic acid.

In some embodiments, the endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of Sequence Numbers: A360, A362, or A368. In some embodiments, the base editor comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, the base editor is an adenosine deaminase. In some embodiments, the adenosine deaminase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 594, or 58-66, or a variant thereof.

In some embodiments, the engineered nucleic acid editing system further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof.

The NLS can comprise any of the sequences in Table 1 below, or a combination thereof:

TABLE 1

Example NLS Sequences that can be used with Effectors According to the Disclosure

| Source | NLS amino acid sequence | SEQ ID NO: |
| --- | --- | --- |
| SV40 | PKKKRKV | 369 |
| nucleoplasmin bipartite NLS | KRPAATKKAGQAKKKK | 370 |
| c-myc NLS | PAAKRVKLD | 371 |
| c-myc NLS | RQRRNELKRSP | 372 |
| hRNPA1 M9 NLS | NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY | 373 |
| Importin-alpha IBB domain | RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV | 374 |
| Myoma T protein | VSRKRPRP | 375 |
| Myoma T protein | PPKKARED | 376 |
| p53 | PQPKKKPL | 377 |
| mouse c-abl IV | SALIKKKKKMAP | 378 |
| influenza virus NS1 | DRLRR | 379 |
| influenza virus NS1 | PKQKKRK | 380 |
| Hepatitis virus delta antigen | RKLKKKIKKL | 381 |
| mouse Mx1 protein | REKKKFLKRR | 382 |
| human poly (ADP-ribose) polymerase | KRKGDEVDGVDEVAKKKSKK | 383 |
| steroid hormone receptor (human) glucocorticoid | RKCLQAGMNLEARKTKK | 384 |

In some embodiments, the endonuclease is covalently coupled directly to the base editor or covalently coupled to the base editor through a linker. In some embodiments, linkers joining any of the enzymes or domains described herein can comprise one or multiple copies of a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SGGSSGGSSGSETPGTSESATPESSGGSSGGS, SGSETPGTSESATPESA, GSGGS, SGSETPGTSESATPES, SGGSS, or GAAA, or any other linker sequence described herein. In some embodiments, a polypeptide comprises the endonuclease and the base editor. In some embodiments, the endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, the endonuclease comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof. In some embodiments, the system further comprises a source of Mg²⁺.
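The coupling arrangements described above (direct coupling, or coupling through one or multiple copies of a linker) can be illustrated with a short sketch. The linker strings below are taken from the paragraph above; the domain sequences in the usage note are short hypothetical placeholders, not sequences from the disclosure, and the function name is an assumption.

```python
# Linker sequences recited in the text above.
LINKERS = (
    "SGGSSGGSSGSETPGTSESATPESSGGSSGGS",
    "SGSETPGTSESATPESA",
    "GSGGS",
    "SGSETPGTSESATPES",
    "SGGSS",
    "GAAA",
)


def fuse(domains, linker="GSGGS", copies=1):
    """Join domains N- to C-terminus with `copies` repeats of `linker`.

    copies=0 models direct covalent coupling with no linker.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return (linker * copies).join(domains)
```

For example, `fuse(["AAA", "CCC"], "GSGGS", copies=2)` yields a single polypeptide string in which the two placeholder domains are separated by two tandem copies of the GSGGS linker.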

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 70, or a variant thereof; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO: 88; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A360.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 71, or a variant thereof; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO: 89; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A361.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 73, or a variant thereof; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO: 91; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A363.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 75, or a variant thereof; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO: 93; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A365.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 76, or a variant thereof; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO: 94; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A366.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 77, or a variant thereof; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO: 95; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A367.

In some embodiments, the endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 78, or a variant thereof; the guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to at least one of SEQ ID NO: 96; and the endonuclease is configured to bind to a PAM comprising Sequence Number: A368.

In some embodiments, the base editor comprises an adenine deaminase. In some embodiments, the adenine deaminase comprises SEQ ID NO: 57, or a variant thereof. In some embodiments, the base editor comprises a cytosine deaminase. In some embodiments, the cytosine deaminase comprises SEQ ID NO: 58, or a variant thereof. In some embodiments, the engineered nucleic acid editing system described herein further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises SEQ ID NO: 67, or a variant thereof.

In some aspects, the present disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein the endonuclease is derived from an uncultivated microorganism.

In some aspects, the present disclosure provides a nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein the nucleic acid encodes an endonuclease having at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof coupled to a base editor. In some embodiments, the endonuclease comprises a sequence encoding one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease. In some embodiments, the organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.
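As an illustration of an NLS positioned "proximal to an N- or C-terminus", the following sketch searches a protein sequence for a few of the Table 1 motifs and reports those lying within a fixed number of residues of either end. The 50-residue window is an arbitrary assumption chosen for the example (this section does not define "proximal" numerically), and the function name is hypothetical.

```python
# A subset of the NLS sequences listed in Table 1.
NLS_TABLE_1 = {
    "SV40": "PKKKRKV",
    "c-myc": "PAAKRVKLD",
    "nucleoplasmin bipartite": "KRPAATKKAGQAKKKK",
}


def find_terminal_nls(protein: str, window: int = 50):
    """Return (name, terminus) pairs for NLS motifs near either terminus.

    A motif counts as N-proximal if it ends within the first `window`
    residues, and as C-proximal if it starts within the last `window`.
    """
    hits = []
    for name, motif in NLS_TABLE_1.items():
        start = protein.find(motif)
        while start != -1:
            end = start + len(motif)
            if end <= window:
                hits.append((name, "N"))
            if start >= len(protein) - window:
                hits.append((name, "C"))
            start = protein.find(motif, start + 1)
    return hits
```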

In some embodiments, the endonuclease comprising a RuvC domain and an HNH domain is covalently coupled directly to the base editor or covalently coupled to the base editor through a linker. In some embodiments, the endonuclease comprising a RuvC domain and an HNH domain comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof.

In some embodiments, the class 2, type II endonuclease is covalently coupled to the base editor or coupled to the base editor through a linker. In some embodiments, the base editor comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to a sequence selected from SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof. In some embodiments, the base editor comprises an adenine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises an adenine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the adenine to guanine. In some embodiments, the adenine deaminase comprises a sequence with at least 95% identity to SEQ ID NO: 57, or a variant thereof.

In some embodiments, the base editor comprises a cytosine deaminase; the double-stranded deoxyribonucleic acid polynucleotide comprises a cytosine; and modifying the double-stranded deoxyribonucleic acid polynucleotide comprises converting the cytosine to uracil. In some embodiments, the cytosine deaminase comprises a sequence with at least 95% identity to any one of SEQ ID NOs: 58-66, or a variant thereof.
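The two conversions described above (adenine to guanine by an adenine deaminase, cytosine to uracil by a cytosine deaminase) can be sketched as a simple string transformation. The editing window used here (positions 4 to 8 of the protospacer, 1-based) is a hypothetical value chosen only for illustration and is not stated in this section; the labels "ABE" and "CBE" and the function name are likewise assumptions.

```python
# Base conversions stated in the text: A -> G and C -> U.
CONVERSIONS = {"ABE": ("A", "G"), "CBE": ("C", "U")}


def deaminate(protospacer: str, editor: str, window=(4, 8)) -> str:
    """Apply the editor's base conversion inside the window.

    `window` is a 1-based inclusive (first, last) position range;
    bases outside the window are left unchanged.
    """
    source, product = CONVERSIONS[editor]
    first, last = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if base == source and first <= position <= last:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)
```

In this toy model every target base in the window is converted; in practice only a fraction of target bases would be expected to be edited.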

In some embodiments, the complex further comprises a uracil DNA glycosylase inhibitor. In some embodiments, the uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof. In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to a sequence of the engineered guide ribonucleic acid structure and a second strand comprising said PAM. In some embodiments, the PAM is directly adjacent to the 3′ end of the sequence complementary to the sequence of the engineered guide ribonucleic acid structure.
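The PAM geometry described above can be illustrated by scanning the PAM-containing strand for a protospacer immediately followed, at its 3′ end, by a PAM. This is a minimal sketch: the PAM "NGG" and the 20-nucleotide spacer length are illustrative assumptions only (the PAMs of the disclosure are identified by Sequence Numbers that are not reproduced in this section), only one strand is scanned, and the function name is hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_protospacers(strand: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (start, protospacer, pam) for each site on the given strand.

    A site is `spacer_len` bases directly followed (3' side) by a match
    to `pam`. A lookahead is used so that overlapping sites are reported.
    """
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(strand.upper())]
```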

In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide, comprising: an endonuclease comprising a RuvC domain and an HNH domain, wherein the endonuclease is derived from an uncultivated microorganism, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to be deficient in nuclease activity. In some embodiments, the endonuclease comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof.

In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide, comprising: an endonuclease having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof, wherein the endonuclease is configured to be deficient in nuclease activity; and a base editor coupled to the endonuclease.

In some aspects, the present disclosure provides an engineered nucleic acid editing polypeptide, comprising: an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of Sequence Numbers: A360-A368 or A598, wherein the endonuclease is a class 2, type II endonuclease, and wherein the endonuclease is configured to be deficient in nuclease activity; and a base editor coupled to the endonuclease.

In some embodiments, the endonuclease is derived from an uncultivated microorganism. In some embodiments, the endonuclease has less than 80% identity to a Cas9 endonuclease. In some embodiments, the endonuclease further comprises an HNH domain. In some embodiments, the ribonucleic acid sequence configured to bind the endonuclease comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs: 88-96, 488-489, or 679-680, or a variant thereof. In some embodiments, the ribonucleic acid sequence configured to bind the endonuclease comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680, or a variant thereof. In some embodiments, the base editor comprises a sequence with at least 70%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs:70-78 or 597, or a variant thereof. In some embodiments, the base editor is an adenine deaminase. 
In some embodiments, the adenosine deaminase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof. In some embodiments, the base editor is a cytosine deaminase. In some embodiments, the cytosine deaminase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 594, or 58-66, or a variant thereof.
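Identity "to non-degenerate nucleotides" of a guide sequence, as recited above, suggests that degenerate positions are left out of the comparison. A minimal sketch under that reading follows; it assumes the reference and candidate are the same length and ungapped, and that any reference symbol other than A, C, G, T, or U (for example N) marks a degenerate position. Both the reading and the function name are assumptions made for illustration.

```python
def identity_non_degenerate(reference: str, candidate: str) -> float:
    """Percent identity scored only at non-degenerate reference positions."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have equal length")
    scored = [(r, c) for r, c in zip(reference.upper(), candidate.upper())
              if r in "ACGTU"]
    if not scored:
        raise ValueError("reference has no non-degenerate positions")
    matches = sum(1 for r, c in scored if r == c)
    return 100.0 * matches / len(scored)
```

For a guide whose spacer is written as a run of N followed by a fixed scaffold, this scores only the scaffold portion.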

Systems of the present disclosure may be used for various applications, such as, for example, nucleic acid editing (e.g., gene editing) or binding to a nucleic acid molecule (e.g., sequence-specific binding). Such systems may be used, for example, for addressing (e.g., removing or replacing) a genetically inherited mutation that may cause a disease in a subject; inactivating a gene in order to ascertain its function in a cell; as a diagnostic tool to detect disease-causing genetic elements (e.g., via cleavage of reverse-transcribed viral RNA or an amplified DNA sequence encoding a disease-causing mutation); as deactivated enzymes in combination with a probe to target and detect a specific nucleotide sequence (e.g., a sequence encoding antibiotic resistance in bacteria); to render viruses inactive or incapable of infecting host cells by targeting viral genomes; to add genes or amend metabolic pathways to engineer organisms to produce valuable small molecules, macromolecules, or secondary metabolites; to establish a gene drive element for evolutionary selection; or to detect cell perturbations by foreign small molecules and nucleotides as a biosensor.

TABLE 2

Sequence Listing of Protein and Nucleic Acid Sequences Referred to Herein

Category | Sequence Number | SEQ ID NO: | Description | Type | Organism | Other Information or Sequence
MG66 putative cytidine deaminase | | 1 | MG66-2 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 2 | MG66-3 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 3 | MG66-4 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 4 | MG66-5 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 5 | MG66-6 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 6 | MG66-7 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 7 | MG66-8 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 8 | MG66-9 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 9 | MG66-10 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 10 | MG66-11 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 11 | MG66-12 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 12 | MG66-13 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 13 | MG66-14 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 14 | MG66-15 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 15 | MG66-18 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 16 | MG66-19 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 17 | MG66-20 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 18 | MG66-21 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 19 | MG66-22 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 20 | MG66-23 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 21 | MG66-24 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 22 | MG66-25 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 23 | MG66-26 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 24 | MG66-27 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 25 | MG66-28 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 26 | MG66-29 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 27 | MG66-30 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 28 | MG66-31 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 29 | MG66-32 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 30 | MG66-33 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 31 | MG66-34 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 32 | MG66-35 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 33 | MG66-36 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 34 | MG66-37 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 35 | MG66-38 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 36 | MG66-39 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 37 | MG66-40 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 38 | MG66-41 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 39 | MG66-42 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 40 | MG66-43 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 41 | MG66-44 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 42 | MG66-45 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 43 | MG66-46 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 44 | MG66-47 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 45 | MG66-48 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 46 | MG66-49 deaminase | protein | unknown | uncultivated organism
MG66 putative cytidine deaminase | | 47 | MG66-50 deaminase | protein | unknown | uncultivated organism
MG67 putative cytidine deaminase | | 48 | MG67-2 deaminase | protein | unknown | uncultivated organism
MG67 putative cytidine deaminase | | 49 | MG67-4 deaminase | protein | unknown | uncultivated organism
MG68 putative adenosine deaminase (TadA-like) | | 50 | MG68-1 deaminase | protein | unknown | uncultivated organism
MG68 putative adenosine deaminase (TadA-like) | | 51 | MG68-2 deaminase | protein | unknown | uncultivated organism
MG69 UGI | | 52 | MG69-1 deaminase | protein | unknown | uncultivated organism
MG69 UGI | | 53 | MG69-2 deaminase | protein | unknown | uncultivated organism
MG69 UGI | | 54 | MG69-3 deaminase | protein | unknown | uncultivated organism
MG69 UGI | | 55 | MG69-4 deaminase | protein | unknown | uncultivated organism
MG69 UGI | | 56 | MG69-5 deaminase | protein | unknown | uncultivated organism
reference deaminase | | 57 | P68398 TADA tRNA specific adenosine deaminase | protein | Escherichia coli strain K12 OX |
reference deaminase | | 58 | P38483 APOBEC 1 C U editing deaminase | protein | Rattus norvegicus |
reference deaminase | | 59 | Aicda XM 004869540 cytidine deaminase | protein | Heterocephalus glaber |
reference deaminase | | 60 | PmCDA1 L1 AVN88313.1 cytidine deaminase | protein | Petromyzon marinus |
reference deaminase | | 61 | PmCDA1 ABO15149.1 cytosine deaminase | protein | Petromyzon marinus |
reference deaminase | | 62 | NP 663745.1 DNA dC->dU-editing deaminase APOBEC-3A isoform a | protein | Homo sapiens |
reference deaminase | | 63 | Q9GZX7.1 AICDA Single-stranded DNA cytosine deaminase (Activation-induced cytidine deaminase, Cytidine aminohydrolase) | protein | Homo sapiens |
reference deaminase | | 64 | LpCDA1L1 3 AVN88320.1 cytidine deaminase | protein | Lampetra planeri |
reference deaminase | | 65 | LpCDA1L1 1 AVN88319.1 cytidine deaminase | protein | Lampetra planeri |
reference deaminase | | 66 | ljCDA1 cytidine deaminase | nucleotide | Lampetra planeri |
reference UGI | | 67 | P14739 UNGI BPPB2 (UGI) | protein | Bacillus phage PBS2 |
adenine base editor | | 68 | linker-His tag-adenine deaminase-linker-nickase-linker-SV40 NLS | protein | artificial sequence |
cytosine base editor | | 69 | linker-His tag-cytidine deaminase-linker-nickase-linker-uracil glycosylase inhibitor-linker-SV40 NLS | protein | artificial sequence |
nickase | | 70 | nMG1-4 (D9A) nickase | protein | artificial sequence |
nickase | | 71 | nMG1-6 (D13A) nickase | protein | artificial sequence |
nickase | | 72 | nMG3-6 (D13A) nickase | protein | artificial sequence |
nickase | | 73 | nMG3-7 (D12A) nickase | protein | artificial sequence |
nickase | | 74 | nMG3-8 (D13A) nickase | protein | artificial sequence |
nickase | | 75 | nMG4-5 (D17A) nickase | protein | artificial sequence |
nickase | | 76 | nMG14-1 (D23A) nickase | protein | artificial sequence |
nickase | | 77 | nMG15-1 (D8A) nickase | protein | artificial sequence |
nickase | | 78 | nMG18-1 (D12A) nickase | protein | artificial sequence |
target sequence | | 79 | nMG1-4 (D9A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 80 | nMG1-6 (D13A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 81 | nMG3-6 (D13A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 82 | nMG3-7 (D12A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 83 | nMG3-8 (D13A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 84 | nMG4-5 (D17A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 85 | nMG14-1 (D23A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 86 | nMG15-1 (D8A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
target sequence | | 87 | nMG18-1 (D12A) protospacer and PAM for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 88 | nMG1-4 (D9A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 89 | nMG1-6 (D13A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 90 | nMG3-6 (D13A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 91 | nMG3-7 (D12A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 92 | nMG3-8 (D13A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 93 | nMG4-5 (D17A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 94 | nMG14-1 (D23A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 95 | nMG15-1 (D8A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
single guide RNA | | 96 | nMG18-1 (D12A) single guide RNA for in vitro nickase assay | nucleotide | artificial sequence |
spacer | | 97 | MGA1-4 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 98 | MGA1-4 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 99 | MGA1-4 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 100 | MGA1-6 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 101 | MGA1-6 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 102 | MGA1-6 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 103 | MGA3-6 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 104 | MGA3-6 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 105 | MGA3-6 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 106 | MGA3-7 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 107 | MGA3-7 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 108 | MGA3-7 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 109 | MGA3-8 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 110 | MGA3-8 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 111 | MGA3-8 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 112 | MGA4-5 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 113 | MGA4-5 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 114 | MGA4-5 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 115 | MGA14-1 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 116 | MGA14-1 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 117 | MGA14-1 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 118 | MGA15-1 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 119 | MGA15-1 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 120 | MGA15-1 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 121 | MGA18-1 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 122 | MGA18-1 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 123 | MGA18-1 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 124 | ABE8.17m sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 125 | ABE8.17m sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 126 | ABE8.17m sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 127 | MGC1-4 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 128 | MGC1-4 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 129 | MGC1-4 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 130 | MGC1-6 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 131 | MGC1-6 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 132 | MGC1-6 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 133 | MGC3-6 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 134 | MGC3-6 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 135 | MGC3-6 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 136 | MGC3-7 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 137 | MGC3-7 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 138 | MGC3-7 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 139 | MGC3-8 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 140 | MGC3-8 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 141 | MGC3-8 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 142 | MGC4-5 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 143 | MGC4-5 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 144 | MGC4-5 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 145 | MGC14-1 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 146 | MGC14-1 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 147 | MGC14-1 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 148 | MGC15-1 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 149 | MGC15-1 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 150 | MGC15-1 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 151 | MGC18-1 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 152 | MGC18-1 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 153 | MGC18-1 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 154 | BE3 sgRNA spacer 1 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 155 | BE3 sgRNA spacer 2 (targeting E. coli lacZ) | nucleotide | artificial sequence |
spacer | | 156 | BE3 sgRNA spacer 3 (targeting E. coli lacZ) | nucleotide | artificial sequence |
primer | | 157 | Site-directed mutagenesis of MG1-4 (D9A) | nucleotide | artificial sequence |
primer | | 158 | Site-directed mutagenesis of MG1-4 (D9A) | nucleotide | artificial sequence |
primer | | 159 | Site-directed mutagenesis of MG1-6 (D13A) | nucleotide | artificial sequence |
primer | | 160 | Site-directed mutagenesis of MG1-6 (D13A) | nucleotide | artificial sequence |
primer | | 161 | Site-directed mutagenesis of MG3-6 (D13A) | nucleotide | artificial sequence |
primer | | 162 | Site-directed mutagenesis of MG3-6 (D13A) | nucleotide | artificial sequence |
primer | | 163 | Site-directed mutagenesis of MG3-7 (D12A) | nucleotide | artificial sequence |
primer | | 164 | Site-directed mutagenesis of MG3-7 (D12A) | nucleotide | artificial sequence |
primer | | 165 | Site-directed mutagenesis of MG3-8 (D13A) | nucleotide | artificial sequence |
primer | | 166 | Site-directed mutagenesis of MG3-8 (D13A) | nucleotide | artificial sequence |
primer | | 167 | Site-directed mutagenesis of MG4-5 (D17A) | nucleotide | artificial sequence |
primer | | 168 | Site-directed mutagenesis of MG4-5 (D17A) | nucleotide | artificial sequence |
primer | | 169 | Site-directed mutagenesis of MG14-1 (D23A) | nucleotide | artificial sequence |
primer | | 170 | Site-directed mutagenesis of MG14-1 (D23A) | nucleotide | artificial sequence |
primer | | 171 | Site-directed mutagenesis of MG15-1 (D8A) | nucleotide | artificial sequence |
primer | | 172 | Site-directed mutagenesis of MG15-1 (D8A) | nucleotide | artificial sequence |
primer | | 173 | Site-directed mutagenesis of MG18-1 (D12A) | nucleotide | artificial sequence |
primer | | 174 | Site-directed mutagenesis of MG18-1 (D12A) | nucleotide | artificial sequence |
primer | | 175 | Site-directed mutagenesis of SpCas9 (D10A) | nucleotide | artificial sequence |
primer | | 176 | Site-directed mutagenesis of SpCas9 (D10A) | nucleotide | artificial sequence |
primer | | 177 | For lacZ sequencing | nucleotide | artificial sequence |
primer | | 178 | For lacZ sequencing | nucleotide | artificial sequence |
primer | | 179 | Amplify the fragment for nickase assay | nucleotide | artificial sequence |
primer | | 180 | Amplify the fragment for nickase assay | nucleotide | artificial sequence |
primer | | 181 | Amplify T7 promoter-His tag-adenine deaminase for MGA entry plasmid | nucleotide | artificial sequence |
primer | | 182 | Amplify T7 promoter-His tag-adenine deaminase for MGA entry plasmid | nucleotide | artificial sequence |
primer | | 183 | Amplify SV40 NLS-vector backbone for MGA entry plasmid | nucleotide | artificial sequence |
primer | | 184 | Amplify SV40 NLS-vector backbone for MGA entry plasmid | nucleotide | artificial sequence |
primer | | 185 | Amplify vector backbone for MGA entry plasmid | nucleotide | artificial sequence |
primer | | 186 | Amplify vector backbone for MGA entry plasmid | nucleotide | artificial sequence |
primer | | 187 | Amplify T7 promoter-His-tag-cytosine deaminase for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 188 | Amplify T7 promoter-His-tag-cytosine deaminase for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 189 | Amplify UGI-SV40 NLS for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 190 | Amplify UGI-SV40 NLS for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 191 | Amplify SV40 NLS-vector backbone for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 192 | Amplify SV40 NLS-vector backbone for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 193 | Amplify vector backbone for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 194 | Amplify vector backbone for MGC entry plasmid | nucleotide | artificial sequence |
primer | | 195 | Amplify nMG1-4 (D9A) for pMGA expression plasmid | nucleotide | artificial sequence |

primer

196
Amplify nMG1-4 (D9A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

197
Amplify nMG1-6 (D13A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

198
Amplify nMG1-6 (D13A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

199
Amplify nMG3-6 (D13A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

200
Amplify nMG3-6 (D13A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

201
Amplify nMG3-7 (D12A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

202
Amplify nMG3-7 (D12A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

203
Amplify nMG3-8 (D13A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

204
Amplify nMG3-8 (D13A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

205
Amplify nMG4-5 (D17A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

206
Amplify nMG4-5 (D17A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

207
Amplify nMG14-1 (D23A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

208
Amplify nMG14-1 (D23A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

209
Amplify nMG15-1 (D8A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

210
Amplify nMG15-1 (D8A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

211
Amplify nMG18-1 (D12A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

212
Amplify nMG18-1 (D12A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

213
Amplify SpCas9 (D10A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

214
Amplify SpCas9 (D10A) for pMGA
nucleotide
artificial

expression plasmid

sequence

primer

215
Amplify nMG1-4 (D9A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

216
Amplify nMG1-4 (D9A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

217
Amplify nMG1-6 (D13A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

218
Amplify nMG1-6 (D13A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

219
Amplify nMG3-6 (D13A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

220
Amplify nMG3-6 (D13A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

221
Amplify nMG3-7 (D12A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

222
Amplify nMG3-7 (D12A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

223
Amplify nMG3-8 (D13A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

224
Amplify nMG3-8 (D13A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

225
Amplify nMG4-5 (D17A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

226
Amplify nMG4-5 (D17A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

227
Amplify nMG14-1 (D23A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

228
Amplify nMG14-1 (D23A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

229
Amplify nMG15-1 (D8A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

230
Amplify nMG15-1 (D8A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

231
Amplify nMG18-1 (D12A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

232
Amplify nMG18-1 (D12A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

233
Amplify SpCas9 (D10A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

234
Amplify SpCas9 (D10A) for pMGC
nucleotide
artificial

expression plasmid

sequence

primer

235
Amplify MGA1-4_sgRNA spacer 1
nucleotide
artificial

sequence

primer

236
Amplify MGA1-4_sgRNA spacer 1
nucleotide
artificial

sequence

primer

237
Amplify MGA1-4_sgRNA spacer 2
nucleotide
artificial

sequence

primer

238
Amplify MGA1-4_sgRNA spacer 2
nucleotide
artificial

sequence

primer

239
Amplify MGA1-4_sgRNA spacer 3
nucleotide
artificial

sequence

primer

240
Amplify MGA1-4_sgRNA spacer 3
nucleotide
artificial

sequence

primer

241
Amplify MGA1-6_sgRNA spacer 1
nucleotide
artificial

sequence

primer

242
Amplify MGA1-6_sgRNA spacer 1
nucleotide
artificial

sequence

primer

243
Amplify MGA1-6_sgRNA spacer 2
nucleotide
artificial

sequence

primer

244
Amplify MGA1-6_sgRNA spacer 2
nucleotide
artificial

sequence

primer

245
Amplify MGA1-6_sgRNA spacer 3
nucleotide
artificial

sequence

primer

246
Amplify MGA1-6_sgRNA spacer 3
nucleotide
artificial

sequence

primer

247
Amplify MGA3-6_sgRNA spacer 1
nucleotide
artificial

sequence

primer

248
Amplify MGA3-6_sgRNA spacer 1
nucleotide
artificial

sequence

primer

249
Amplify MGA3-6_sgRNA spacer 2
nucleotide
artificial

sequence

primer

250
Amplify MGA3-6_sgRNA spacer 2
nucleotide
artificial

sequence

primer

251
Amplify MGA3-6_sgRNA spacer 3
nucleotide
artificial

sequence

primer

252
Amplify MGA3-6_sgRNA spacer 3
nucleotide
artificial

sequence

primer

253
Amplify MGA3-7_sgRNA spacer 1
nucleotide
artificial

sequence

primer

254
Amplify MGA3-7_sgRNA spacer 1
nucleotide
artificial

sequence

primer

255
Amplify MGA3-7_sgRNA spacer 2
nucleotide
artificial

sequence

primer

256
Amplify MGA3-7_sgRNA spacer 2
nucleotide
artificial

sequence

primer

257
Amplify MGA3-7_sgRNA spacer 3
nucleotide
artificial

sequence

primer

258
Amplify MGA3-7_sgRNA spacer 3
nucleotide
artificial

sequence

primer

259
Amplify MGA4-5_sgRNA spacer 1
nucleotide
artificial

sequence

primer

260
Amplify MGA4-5_sgRNA spacer 1
nucleotide
artificial

sequence

primer

261
Amplify MGA4-5_sgRNA spacer 2
nucleotide
artificial

sequence

primer

262
Amplify MGA4-5_sgRNA spacer 2
nucleotide
artificial

sequence

primer

263
Amplify MGA4-5_sgRNA spacer 3
nucleotide
artificial

sequence

primer

264
Amplify MGA4-5_sgRNA spacer 3
nucleotide
artificial

sequence

primer

265
Amplify MGA14-1_sgRNA spacer 1
nucleotide
artificial

sequence

primer

266
Amplify MGA14-1_sgRNA spacer 1
nucleotide
artificial

sequence

primer

267
Amplify MGA14-1_sgRNA spacer 2
nucleotide
artificial

sequence

primer

268
Amplify MGA14-1_sgRNA spacer 2
nucleotide
artificial

sequence

primer

269
Amplify MGA14-1_sgRNA spacer 3
nucleotide
artificial

sequence

primer

270
Amplify MGA14-1_sgRNA spacer 3
nucleotide
artificial

sequence

primer

271
Amplify MGA15-1_sgRNA spacer 1
nucleotide
artificial

sequence

primer

272
Amplify MGA15-1_sgRNA spacer 1
nucleotide
artificial

sequence

primer

273
Amplify MGA15-1_sgRNA spacer 2
nucleotide
artificial

sequence

primer

274
Amplify MGA15-1_sgRNA spacer 2
nucleotide
artificial

sequence

primer

275
Amplify MGA15-1_sgRNA spacer 3
nucleotide
artificial

sequence

primer

276
Amplify MGA15-1_sgRNA spacer 3
nucleotide
artificial

sequence

primer

277
Amplify MGA18-1_sgRNA spacer 1
nucleotide
artificial

sequence

primer

278
Amplify MGA18-1_sgRNA spacer 1
nucleotide
artificial

sequence

primer

279
Amplify MGA18-1_sgRNA spacer 2
nucleotide
artificial

sequence

primer

280
Amplify MGA18-1_sgRNA spacer 2
nucleotide
artificial

sequence

primer

281
Amplify MGA18-1_sgRNA spacer 3
nucleotide
artificial

sequence

primer

282
Amplify MGA18-1_sgRNA spacer 3
nucleotide
artificial

sequence

primer

283
Amplify ABE8.17m_sgRNA spacer 1
nucleotide
artificial

sequence

primer

284
Amplify ABE8.17m_sgRNA spacer 1
nucleotide
artificial

sequence

primer

285
Amplify ABE8.17m_sgRNA spacer 2
nucleotide
artificial

sequence

primer

286
Amplify ABE8.17m_sgRNA spacer 2
nucleotide
artificial

sequence

primer

287
Amplify ABE8.17m_sgRNA spacer 3
nucleotide
artificial

sequence

primer

288
Amplify ABE8.17m_sgRNA spacer 3
nucleotide
artificial

sequence

primer

289
Amplify MGC1-4_spacer 1
nucleotide
artificial

sequence

primer

290
Amplify MGC1-4_spacer 1
nucleotide
artificial

sequence

primer

291
Amplify MGC1-4_spacer 2
nucleotide
artificial

sequence

primer

292
Amplify MGC1-4_spacer 2
nucleotide
artificial

sequence

primer

293
Amplify MGC1-4_spacer 3
nucleotide
artificial

sequence

primer

294
Amplify MGC1-4_spacer 3
nucleotide
artificial

sequence

primer

295
Amplify MGC1-6_spacer 1
nucleotide
artificial

sequence

primer

296
Amplify MGC1-6_spacer 1
nucleotide
artificial

sequence

primer

297
Amplify MGC1-6_spacer 2
nucleotide
artificial

sequence

primer

298
Amplify MGC1-6_spacer 2
nucleotide
artificial

sequence

primer

299
Amplify MGC1-6_spacer 3
nucleotide
artificial

sequence

primer

300
Amplify MGC1-6_spacer 3
nucleotide
artificial

sequence

primer

301
Amplify MGC3-6_spacer 1
nucleotide
artificial

sequence

primer

302
Amplify MGC3-6_spacer 1
nucleotide
artificial

sequence

primer

303
Amplify MGC3-6_spacer 2
nucleotide
artificial

sequence

primer

304
Amplify MGC3-6_spacer 2
nucleotide
artificial

sequence

primer

305
Amplify MGC3-6_spacer 3
nucleotide
artificial

sequence

primer

306
Amplify MGC3-6_spacer 3
nucleotide
artificial

sequence

primer

307
Amplify MGC3-7_spacer 1
nucleotide
artificial

sequence

primer

308
Amplify MGC3-7_spacer 1
nucleotide
artificial

sequence

primer

309
Amplify MGC3-7_spacer 2
nucleotide
artificial

sequence

primer

310
Amplify MGC3-7_spacer 2
nucleotide
artificial

sequence

primer

311
Amplify MGC3-7_spacer 3
nucleotide
artificial

sequence

primer

312
Amplify MGC3-7_spacer 3
nucleotide
artificial

sequence

primer

313
Amplify MGC4-5_spacer 1
nucleotide
artificial

sequence

primer

314
Amplify MGC4-5_spacer 1
nucleotide
artificial

sequence

primer

315
Amplify MGC4-5_spacer 2
nucleotide
artificial

sequence

primer

316
Amplify MGC4-5_spacer 2
nucleotide
artificial

sequence

primer

317
Amplify MGC4-5_spacer 3
nucleotide
artificial

sequence

primer

318
Amplify MGC4-5_spacer 3
nucleotide
artificial

sequence

primer

319
Amplify MGC14-1_spacer 1
nucleotide
artificial

sequence

primer

320
Amplify MGC14-1_spacer 1
nucleotide
artificial

sequence

primer

321
Amplify MGC14-1_spacer 2
nucleotide
artificial

sequence

primer

322
Amplify MGC14-1_spacer 2
nucleotide
artificial

sequence

primer

323
Amplify MGC14-1_spacer 3
nucleotide
artificial

sequence

primer

324
Amplify MGC14-1_spacer 3
nucleotide
artificial

sequence

primer

325
Amplify MGC15-1_spacer 1
nucleotide
artificial

sequence

primer

326
Amplify MGC15-1_spacer 1
nucleotide
artificial

sequence

primer

327
Amplify MGC15-1_spacer 2
nucleotide
artificial

sequence

primer

328
Amplify MGC15-1_spacer 2
nucleotide
artificial

sequence

primer

329
Amplify MGC15-1_spacer 3
nucleotide
artificial

sequence

primer

330
Amplify MGC15-1_spacer 3
nucleotide
artificial

sequence

primer

331
Amplify MGC18-1_spacer 1
nucleotide
artificial

sequence

primer

332
Amplify MGC18-1_spacer 1
nucleotide
artificial

sequence

primer

333
Amplify MGC18-1_spacer 2
nucleotide
artificial

sequence

primer

334
Amplify MGC18-1_spacer 2
nucleotide
artificial

sequence

primer

335
Amplify MGC18-1_spacer 3
nucleotide
artificial

sequence

primer

336
Amplify MGC18-1_spacer 3
nucleotide
artificial

sequence

primer

337
Amplify BE3_sgRNA spacer 1
nucleotide
artificial

sequence

primer

338
Amplify BE3_sgRNA spacer 1
nucleotide
artificial

sequence

primer

339
Amplify BE3_sgRNA spacer 2
nucleotide
artificial

sequence

primer

340
Amplify BE3_sgRNA spacer 2
nucleotide
artificial

sequence

primer

341
Amplify BE3_sgRNA spacer 3
nucleotide
artificial

sequence

primer

342
Amplify BE3_sgRNA spacer 3
nucleotide
artificial

sequence

primer

343
For lacZ sequencing
nucleotide
artificial

sequence

primer

344
For lacZ sequencing
nucleotide
artificial

sequence

primer

345
For lacZ sequencing
nucleotide
artificial

sequence

primer

346
Amplify sgRNA expression cassette
nucleotide
artificial

sequence

primer

347
Amplify sgRNA expression cassette
nucleotide
artificial

sequence

primer

348
Amplify MGA3-8_sgRNA spacer 1
nucleotide
artificial

sequence

primer

349
Amplify MGA3-8_sgRNA spacer 1
nucleotide
artificial

sequence

primer

350
Amplify MGA3-8_sgRNA spacer 2
nucleotide
artificial

sequence

primer

351
Amplify MGA3-8_sgRNA spacer 2
nucleotide
artificial

sequence

primer

352
Amplify MGA3-8_sgRNA spacer 3
nucleotide
artificial

sequence

primer

353
Amplify MGA3-8_sgRNA spacer 3
nucleotide
artificial

sequence

primer

354
Amplify MGC3-8_sgRNA spacer 1
nucleotide
artificial

sequence

primer

355
Amplify MGC3-8_sgRNA spacer 1
nucleotide
artificial

sequence

primer

356
Amplify MGC3-8_sgRNA spacer 2
nucleotide
artificial

sequence

primer

357
Amplify MGC3-8_sgRNA spacer 2
nucleotide
artificial

sequence

primer

358
Amplify MGC3-8_sgRNA spacer 3
nucleotide
artificial

sequence

primer

359
Amplify MGC3-8_sgRNA spacer 3
nucleotide
artificial

sequence

PAM
360

nMG1-4 (D9A) nickase PAM
nucleotide
artificial
nRRR

sequence

PAM
361

nMG1-6 (D13A) nickase PAM
nucleotide
artificial
nnRRAY

sequence

PAM
362

nMG3-6 (D13A) nickase PAM
nucleotide
artificial
nnRGGnT

sequence

PAM
363

nMG3-7 (D12A) nickase PAM
nucleotide
artificial
nnRnYAY

sequence

PAM
364

nMG3-8 (D13A) nickase PAM
nucleotide
artificial
nnRGGTY

sequence

PAM
365

nMG4-5 (D17A) nickase PAM
nucleotide
artificial
nRCCV

sequence

PAM
366

nMG14-1 (D23A) nickase PAM
nucleotide
artificial
nRnnGRKA

sequence

PAM
367

nMG15-1 (D8A) nickase PAM
nucleotide
artificial
nnnnC

sequence

PAM

368
nMG18-1 (D12A) nickase PAM
nucleotide
artificial
nRWART

sequence

NLS

369
SV40
nucleotide
artificial
Nuclear

sequence
localization

sequence

NLS

370
nucleoplasmin bipartite NLS
nucleotide

Nuclear

localization

sequence

NLS

371
c-myc NLS
nucleotide

Nuclear

localization

sequence

NLS

372
c-myc NLS
nucleotide

Nuclear

localization

sequence

NLS

373
hRNPA1 M9 NLS
nucleotide

Nuclear

localization

sequence

NLS

374
Importin-alpha IBB domain
nucleotide

Nuclear

localization

sequence

NLS

375
Myoma T protein
nucleotide

Nuclear

localization

sequence

NLS

376
Myoma T protein
nucleotide

Nuclear

localization

sequence

NLS

377
p53
nucleotide

Nuclear

localization

sequence

NLS

378
mouse c-abl IV
nucleotide

Nuclear

localization

sequence

NLS

379
influenza virus NS1
nucleotide

Nuclear

localization

sequence

NLS

380
influenza virus NS1
nucleotide

Nuclear

localization

sequence

NLS

381
Hepatitis virus delta antigen
nucleotide

Nuclear

localization

sequence

NLS

382
mouse Mx1 protein
nucleotide

Nuclear

localization

sequence

NLS

383
human poly(ADP-ribose) polymerase
nucleotide

Nuclear

localization

sequence

NLS

384
steroid hormone receptor (human)
nucleotide

Nuclear

glucocorticoid

localization

sequence

MG68

385
MG68-3 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

386
MG68-4 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

387
MG68-5 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

388
MG68-6 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

389
MG68-7 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

390
MG68-8 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

391
MG68-9 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

392
MG68-10 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

393
MG68-11 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

394
MG68-12 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

395
MG68-13 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

396
MG68-14 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

397
MG68-15 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

398
MG68-16 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

399
MG68-17 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

400
MG68-18 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

401
MG68-19 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

402
MG68-20 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

403
MG68-21 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

404
MG68-22 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

405
MG68-23 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

406
MG68-24 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

407
MG68-25 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

408
MG68-26 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

409
MG68-27 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

410
MG68-28 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

411
MG68-29 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

412
MG68-30 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

413
MG68-31 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

414
MG68-32 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

415
MG68-33 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

416
MG68-34 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

417
MG68-35 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

418
MG68-36 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

419
MG68-37 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

420
MG68-38 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

421
MG68-39 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

422
MG68-40 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

423
MG68-41 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

424
MG68-42 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

425
MG68-43 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

426
MG68-44 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

427
MG68-45 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

428
MG68-46 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

429
MG68-47 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

430
MG68-48 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

431
MG68-49 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

432
MG68-50 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

433
MG68-51 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

434
MG68-52 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

435
MG68-53 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

436
MG68-54 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

437
MG68-55 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

438
MG68-56 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

439
MG68-57 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

440
MG68-58 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

441
MG68-59 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

442
MG68-60 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG68

443
MG68-61 deaminase
protein
unknown
uncultivated

putative

organism

adenosine

deaminase

(TadA-

like)

MG121

444
MG121-1 deaminase
protein
unknown
uncultivated

deaminase

organism

MG121

445
MG121-2 deaminase
protein
unknown
uncultivated

deaminase

organism

MG121

446
MG121-3 deaminase
protein
unknown
uncultivated

deaminase

organism

MG121

447
MG121-4 deaminase
protein
unknown
uncultivated

deaminase

organism

MG68

448
MG68-4_V1
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

449
MG68-4_V2
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

450
MG68-4_V3
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

451
MG68-4_V4
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

452
MG68-4_V5
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

453
MG68-4_V6
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

454
MG68-4_V7
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

455
MG68-4_V8
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

456
MG68-4_V9
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

457
MG68-4_V10
protein
artificial

putative

adenosine

sequence

deaminase

(TadA-

like)

MG68

458
MG68-4_V11
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

459
MG68-4_V12
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

460
MG68-4_V13
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

461
MG68-4_V14
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

462
MG68-4_V15
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

463
MG68-4_V16
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

464
MG68-4_V17
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

465
MG68-4_V18
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

466
MG68-4_V19
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

467
MG68-4_V20
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

468
MG68-4_V21
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

469
MG68-4_V22
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

470
MG68-4_V23
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

471
MG68-4_V24
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

472
MG68-4_V25
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

473
MG68-4_V26
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

474
MG68-4_V27
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

MG68

475
MG68-4_V28
protein
artificial

putative

sequence

adenosine

deaminase

(TadA-

like)

adenine

476
MG68-4_V1-nMG34-1 (D10A)
protein
artificial

base

sequence

editor

adenine

477
MG68-4_V1-nSpCas9 (D10A)
protein
artificial

base

sequence

editor

cytosine

478
rAPOBEC1-nMG15-1 (D8A)
protein
artificial

base

sequence

editor

cytosine

479
rAPOBEC1-nMG15-1 (D8A)-UGI
protein
artificial

base

(PBS1)

sequence

editor

cytosine

480
rAPOBEC1-nMG15-1 (D8A)-MG69-1
protein
artificial

base

sequence

editor

cytosine

481
rAPOBEC1-nMG15-1 (D8A)-MG69-2
protein
artificial

base

sequence

editor

cytosine

482
rAPOBEC1-nMG15-1 (D8A)-MG69-
protein
artificial

base

sequence

editor

Plasmid

483
pET21-CAT (H193Y)-sgRNA-TadA-
nucleotide
artificial

nSpCas9 (D10A)

sequence

Plasmid

484
pET21-sgRNA-TadA (ABE8.17m)-
nucleotide
artificial

nMG34-1 (D10A)

sequence

Plasmid

485
pET21-sgRNA-rAPOBEC1-nMG34-1
nucleotide
artificial

(D10A)-UGI (PBS1)

sequence

Plasmid

486
pET21-CAT (H193Y)-sgRNA-MG68-
nucleotide
artificial

4 (D109N)-nMG34-1 (D10A)

sequence

Plasmid

487
pET21-CAT (H193Y)-sgRNA-MG68-
nucleotide
artificial

4 (D109N)-nSpCas9 (D10A)

sequence

sgRNA

488
MG15-1
nucleotide
artificial

scaffold

sequence

sequence

sgRNA

489
MG34-1
nucleotide
artificial

scaffold

sequence

sequence

spacer

490
rAPOBEC1-nMG15-1 (D8A) in E. coli
nucleotide
artificial

sequence

spacer

491
rAPOBEC1-nMG15-1 (D8A)-UGI
nucleotide
artificial

(PBS1) in E. coli

sequence

spacer

492
rAPOBEC1-nMG15-1 (D8A)-MG69-1
nucleotide
artificial

in E. coli

sequence

spacer

493
rAPOBEC1-nMG15-1 (D8A)-MG69-2
nucleotide
artificial

in E. coli

sequence

spacer

494
rAPOBEC1-nMG15-1 (D8A)-MG69-3
nucleotide
artificial

in E. coli

sequence

spacer

495
rAPOBEC1-nSpCas9 (D10A)-UGI
nucleotide
artificial

(PBS1) in HEK293T

sequence

spacer

496
rAPOBEC1-nSpCas9 (D10A) in
nucleotide
artificial

HEK293T

sequence

spacer

497
rAPOBEC1-nSpCas9 (D10A)-MG69-1
nucleotide
artificial

in HEK293T

sequence

spacer

498
rAPOBEC1-nSpCas9 (D10A)-MG69-2
nucleotide
artificial

in HEK293T

sequence

spacer

499
A0A2K5RDN7-nMG1-4 (D9A)-
nucleotide
artificial

MG69-1_site 1 in HEK293T

sequence

spacer

500
A0A2K5RDN7-nMG1-4 (D9A)-
nucleotide
artificial

MG69-1_site 2 in HEK293T

sequence

spacer

501
A0A2K5RDN7-nMG1-4 (D9A)-
nucleotide
artificial

MG69-1_site 3 in HEK293T

sequence

spacer

502
A0A2K5RDN7-nMG1-4 (D9A)-
nucleotide
artificial

MG69-1_site 4 in HEK293T

sequence

spacer

503
A0A2K5RDN7-nMG3-6 (D13A)-
nucleotide
artificial

MG69-1_site 1 in HEK293T

sequence

spacer

504
A0A2K5RDN7-nMG3-6 (D13A)-
nucleotide
artificial

MG69-1_site 2 in HEK293T

sequence

spacer

505
A0A2K5RDN7-nMG3-6 (D13A)-
nucleotide
artificial

MG69-1_site 3 in HEK293T

sequence

spacer

506
A0A2K5RDN7-nMG3-6 (D13A)-
nucleotide
artificial

MG69-1_site 4 in HEK293T

sequence

spacer

507
A0A2K5RDN7-nMG3-6 (D13A)-
nucleotide
artificial

MG69-1_site 5 in HEK293T

sequence

spacer

508
A0A2K5RDN7-nMG3-6 (D13A)-
nucleotide
artificial

MG69-1_site 6 in HEK293T

sequence

spacer

509
A0A2K5RDN7-nMG3-6 (D13A)-
nucleotide
artificial

MG69-1_site 7 in HEK293T

sequence

spacer

510
A0A2K5RDN7-nMG4-2 (D28A)-
nucleotide
artificial

MG69-1_site 1 in HEK293T

sequence

spacer

511
A0A2K5RDN7-nMG4-2 (D28A)-
nucleotide
artificial

MG69-1_site 2 in HEK293T

sequence

spacer

512
A0A2K5RDN7-nMG4-2 (D28A)-
nucleotide
artificial

MG69-1_site 3 in HEK293T

sequence

spacer

513
A0A2K5RDN7-nMG4-2 (D28A)-
nucleotide
artificial

MG69-1_site 4 in HEK293T

sequence

spacer

514
A0A2K5RDN7-nMG18-1 (D12A)-
nucleotide
artificial

MG69-1_site 1 in HEK293T

sequence

spacer

515
A0A2K5RDN7-nMG18-1 (D12A)-
nucleotide
artificial

MG69-1_site 2 in HEK293T

sequence

spacer

516
A0A2K5RDN7-nMG18-1 (D12A)-
nucleotide
artificial

MG69-1_site 3 in HEK293T

sequence

spacer

517
A0A2K5RDN7-nMG18-1 (D12A)-
nucleotide
artificial

MG69-1_site 4 in HEK293T

sequence

spacer

518
A0A2K5RDN7-nSpCas9 (D10A)-
nucleotide
artificial

MG69-1_site 1 in HEK293T

sequence

spacer

519
A0A2K5RDN7-nSpCas9 (D10A)-
nucleotide
artificial

MG69-1_site 2 in HEK293T

sequence

spacer

520
A0A2K5RDN7-nSpCas9 (D10A)-
nucleotide
artificial

MG69-1_site 3 in HEK293T

sequence

spacer

521
A0A2K5RDN7-nSpCas9 (D10A)-
nucleotide
artificial

MG69-1_site 4 in HEK293T

sequence

spacer

522
A0A2K5RDN7-nSpCas9 (D10A)-
nucleotide
artificial

MG69-1_site 5 in HEK293T

sequence

primer

523
Forward primer used to amplify lacZ of
nucleotide
artificial

E. coli and Sanger sequencing

sequence

primer

524
Reverse primer used to amplify lacZ of
nucleotide
artificial

E. coli and Sanger sequencing

sequence

primer

525
Sanger sequencing of base edit of lacZ
nucleotide
artificial

of E. coli

sequence

primer

526
Sanger sequencing of base edit of lacZ
nucleotide
artificial

of E. coli

sequence

primer

527
Sanger sequencing of base edit of lacZ
nucleotide
artificial

of E. coli

sequence

primer

528
Sanger sequencing of base edit of lacZ
nucleotide
artificial

of E. coli

sequence

primer

529
Sanger sequencing of base edit of lacZ
nucleotide
artificial

of E. coli

sequence

primer

530
Sanger sequencing of base edit of lacZ
nucleotide
artificial

of E. coli

sequence

primer

531
Sanger sequencing of base edit of lacZ
nucleotide
artificial

of E. coli

sequence

primer

532
Forward primer used to amplify CAT
nucleotide
artificial

(H193Y) of CAT (H193Y)-sgRNA-

sequence

MG68-4 variant-nSpCas9 (D10A)

primer

533
Reverse primer used to amplify CAT
nucleotide
artificial

(H193Y) of CAT (H193Y)-sgRNA-

sequence

MG68-4 variant-nSpCas9

primer

534
Forward primer used to amplify CAT
nucleotide
artificial

(H193Y) of CAT (H193Y)-sgRNA-

sequence

MG68-4 variant-nMG34-1 (D10A)

primer

535
Sanger sequencing primer of CAT
nucleotide
artificial

(H193Y)

sequence

primer

536
Forward primer used to amplify BE3
nucleotide
artificial

target site in HEK293T cells and

sequence

Sanger sequencing

primer

537
Reverse primer used to amplify BE3
nucleotide
artificial

target site in HEK293T cells for Sanger

sequence

sequencing

primer

538
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

539
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

540
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

541
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

542
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

543
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

544
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

545
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

546
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 5 in HEK293T cells

primer

547
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nSpCas9 (D10A)-

sequence

MG69-1_site 5 in HEK293T cells

primer

548
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

549
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

550
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

551
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

552
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

553
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

554
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

555
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG1-4 (D9A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

556
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

557
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

558
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

559
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

560
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

561
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

562
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

563
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

564
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 5 in HEK293T cells

primer

565
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 5 in HEK293T cells

primer

566
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 6 in HEK293T cells

primer

567
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 6 in HEK293T cells

primer

568
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 7 in HEK293T cells

primer

569
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG3-6 (D13A)-

sequence

MG69-1_site 7 in HEK293T cells

primer

570
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

571
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

572
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

573
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

574
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

575
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

576
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

577
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG4-2 (D28A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

578
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

579
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 1 in HEK293T cells

primer

580
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

581
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 2 in HEK293T cells

primer

582
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

583
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 3 in HEK293T cells

primer

584
Forward primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 4 in HEK293T cells

primer

585
Reverse primer used to amplify
nucleotide
artificial

A0A2K5RDN7-nMG18-1 (D12A)-

sequence

MG69-1_site 4 in HEK293T cells

adenine

586
TadA (ABE8.17m)-nMG34-1 (D10A)
protein
artificial

base

sequence

editor

cytosine

587
rAPOBEC1-nMG34-1 (D10A)-UGI
protein
artificial

base

(PBS1)

sequence

editor

adenine

588
MG68-3-nSpCas9 (D10A)
protein
artificial

base

sequence

editor

adenine

589
MG68-8-nSpCas9 (D10A)
protein
artificial

base

sequence

editor

Linker

590

protein
artificial

sequence

Linker

591

protein
artificial

sequence

Linker

592

protein
artificial

sequence

Linker

593

protein
artificial

sequence

Cytosine

594
CMP/dCMP-type deaminase domain-
protein
Cebus
unknown

Deaminase

containing protein (uniprot accession

imitator

A0A2K5RDN7)

Adenosine

595
TadA* (ABE8.17m)
protein
unknown
unknown

Deaminase

MG34

596
MG34-1 effector
protein
unknown
uncultivated

active

organism

effectors

nickase

597
MG34-1 (D10A)
protein
unknown
uncultivated

organism

PAM
598

MG34-1 PAM
nucleotide
unknown
NGG

MG138

599
MG138-1
protein
unknown
Aves Class

cytidine

deaminase

MG138

600
MG138-2
protein
unknown
Aves Class

cytidine

deaminase

MG138

601
MG138-3
protein
unknown
Aves Class

cytidine

deaminase

MG138

602
MG138-4
protein
unknown
Aves Class

cytidine

deaminase

MG138

603
MG138-5
protein
unknown
Aves Class

cytidine

deaminase

MG138

604
MG138-6
protein
unknown
Aves Class

cytidine

deaminase

MG138

605
MG138-7
protein
unknown
Aves Class

cytidine

deaminase

MG138

606
MG138-8
protein
unknown
Aves Class

cytidine

deaminase

MG138

607
MG138-9
protein
unknown
Aves Class

cytidine

deaminase

MG138

608
MG138-10
protein
unknown
Aves Class

cytidine

deaminase

MG138

609
MG138-11
protein
unknown
Aves Class

cytidine

deaminase

MG138

610
MG138-12
protein
unknown
Aves Class

cytidine

deaminase

MG138

611
MG138-13
protein
unknown
Aves Class

cytidine

deaminase

MG138

612
MG138-14
protein
unknown
Aves Class

cytidine

deaminase

MG138

613
MG138-15
protein
unknown
Aves Class

cytidine

deaminase

MG138

614
MG138-16
protein
unknown
Aves Class

cytidine

deaminase

MG138

615
MG138-17
protein
unknown
Aves Class

cytidine

deaminase

MG138

616
MG138-18
protein
unknown
Aves Class

cytidine

deaminase

MG138

617
MG138-19
protein
unknown
Aves Class

cytidine

deaminase

MG138

618
MG138-20
protein
unknown
Aves Class

cytidine

deaminase

MG138

619
MG138-21
protein
unknown
Aves Class

cytidine

deaminase

MG138

620
MG138-22
protein
unknown
Aves Class

cytidine

deaminase

MG138

621
MG138-23
protein
unknown
Aves Class

cytidine

deaminase

MG138

622
MG138-24
protein
unknown
Aves Class

cytidine

deaminase

MG138

623
MG138-25
protein
unknown
Aves Class

cytidine

deaminase

MG138

624
MG138-26
protein
unknown
Aves Class

cytidine

deaminase

MG138

625
MG138-27
protein
unknown
Aves Class

cytidine

deaminase

MG138

626
MG138-28
protein
unknown
Aves Class

cytidine

deaminase

MG138

627
MG138-29
protein
unknown
Aves Class

cytidine

deaminase

MG138

628
MG138-30
protein
unknown
Aves Class

cytidine

deaminase

MG138

629
MG138-31
protein
unknown
Aves Class

cytidine

deaminase

MG138

630
MG138-32
protein
unknown
Aves Class

cytidine

deaminase

MG138

631
MG138-33
protein
unknown
Aves Class

cytidine

deaminase

MG138

632
MG138-34
protein
unknown
Aves Class

cytidine

deaminase

MG138

633
MG138-35
protein
unknown
Aves Class

cytidine

deaminase

MG138

634
MG138-36
protein
unknown
Aves Class

cytidine

deaminase

MG138

635
MG138-37
protein
unknown
Aves Class

cytidine

deaminase

MG138

636
MG138-38
protein
unknown
Aves Class

cytidine

deaminase

MG138

637
MG138-39
protein
unknown
Aves Class

cytidine

deaminase

MG138

638
MG138-40
protein
unknown
Aves Class

cytidine

deaminase

MG139

639
MG139-1
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

640
MG139-2
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

641
MG139-3
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

642
MG139-4
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

643
MG139-5
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

644
MG139-6
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

645
MG139-7
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

646
MG139-8
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

647
MG139-9
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

648
MG139-10
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

649
MG139-11
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

650
MG139-12
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

651
MG139-13
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

652
MG139-14
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

653
MG139-15
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

654
MG139-16
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

655
MG139-17
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

656
MG139-18
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

657
MG139-19
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

658
MG139-20
protein
unknown
uncultivated

cytidine

organism

deaminase

MG139

659
MG139-21
protein
unknown
uncultivated

cytidine

organism

deaminase

MG141

660
MG141-1
protein
unknown
Aves class

cytidine

deaminase

MG141

661
MG141-2
protein
unknown
Aves class

cytidine

deaminase

MG141

662
MG141-3
protein
unknown
Aves class

cytidine

deaminase

MG142

663
MG142-1
protein
unknown
Rodent class

cytidine

deaminase

MG142

664
MG142-2
protein
unknown
Rodent class

cytidine

deaminase

MG93

665
MG93-1
protein
unknown
Rodent class

cytidine

deaminase

MG93

666
MG93-2
protein
unknown
Rodent class

cytidine

deaminase

MG93

667
MG93-3
protein
unknown
Rodent class

cytidine

deaminase

MG93

668
MG93-4
protein
unknown
Rodent class

cytidine

deaminase

MG93

669
MG93-5
protein
unknown
Rodent class

cytidine

deaminase

MG93

670
MG93-6
protein
unknown
Rodent class

cytidine

deaminase

MG93

671
MG93-7
protein
unknown
Rodent class

cytidine

deaminase

MG93

672
MG93-8
protein
unknown
Rodent class

cytidine

deaminase

MG93

673
MG93-9
protein
unknown
Rodent class

cytidine

deaminase

MG93

674
MG93-10
protein
unknown
Rodent class

cytidine

deaminase

MG93

675
MG93-11
protein
unknown
Rodent class

cytidine

deaminase

adenine

676
MG68-4v1-nMG34-1
Protein
artificial

base

sequence

editor

adenine

677
TadA*(8.8m)-nMG34-1
Protein
artificial

base

sequence

editor

adenine

678
MG68-4v1-nSpCas9
Protein
artificial

base

sequence

editor

sgRNA

679
MG34-1
nucleotide
artificial

scaffold

sequence

sequence

sgRNA

680
SpCas9
nucleotide
artificial

scaffold

sequence

sequence

spacer

681
Spacer targeting site 1
nucleotide
artificial

sequence

spacer

682
Spacer targeting site 2
nucleotide
artificial

sequence

spacer

683
Spacer targeting site 3
nucleotide
artificial

sequence

spacer

684
Spacer targeting site 4
nucleotide
artificial

sequence

spacer

685
Spacer targeting site 5
nucleotide
artificial

sequence

spacer

686
Spacer targeting site 6
nucleotide
artificial

sequence

spacer

687
Spacer targeting site 7
nucleotide
artificial

sequence

spacer

688
Spacer targeting site 8
nucleotide
artificial

sequence

spacer

689
Spacer targeting site 9
nucleotide
artificial

sequence

primer

690
NGS primer for ABE site 1
nucleotide
artificial

sequence

primer

691
NGS primer for ABE site 1
nucleotide
artificial

sequence

primer

692
NGS primer for ABE site 2
nucleotide
artificial

sequence

primer

693
NGS primer for ABE site 2
nucleotide
artificial

sequence

primer

694
NGS primer for ABE site 3
nucleotide
artificial

sequence

primer

695
NGS primer for ABE site 3
nucleotide
artificial

sequence

primer

696
NGS primer for ABE site 4
nucleotide
artificial

sequence

primer

697
NGS primer for ABE site 4
nucleotide
artificial

sequence

primer

698
NGS primer for ABE site 5
nucleotide
artificial

sequence

primer

699
NGS primer for ABE site 5
nucleotide
artificial

sequence

primer

700
NGS primer for ABE site 6
nucleotide
artificial

sequence

primer

701
NGS primer for ABE site 6
nucleotide
artificial

sequence

primer

702
NGS primer for ABE site 7
nucleotide
artificial

sequence

primer

703
NGS primer for ABE site 7
nucleotide
artificial

sequence

primer

704
NGS primer for ABE site 8
nucleotide
artificial

sequence

primer

705
NGS primer for ABE site 8
nucleotide
artificial

sequence

primer

706
NGS primer for ABE site 9
nucleotide
artificial

sequence

primer

707
NGS primer for ABE site 9
nucleotide
artificial

sequence

BSD

708
Blasticidin engineered sequence for
nucleotide
artificial

resistance

selection purposes

sequence

cassette

spacer

709
Spacer_MG3-6_g5
nucleotide
artificial

sequence

spacer

710
Spacer_MG3-6_g4
nucleotide
artificial

sequence

spacer

711
Spacer_MG3-6_g3
nucleotide
artificial

sequence

spacer

712
Spacer_MG3-6_g2
nucleotide
artificial

sequence

spacer

713
Spacer_MG3-6_g1
nucleotide
artificial

sequence

spacer

714
Spacer_Cas9_g6
nucleotide
artificial

sequence

spacer

715
Spacer_Cas9_g5
nucleotide
artificial

sequence

spacer

716
Spacer_Cas9_g4
nucleotide
artificial

sequence

spacer

717
Spacer_Cas9_g3
nucleotide
artificial

sequence

spacer

718
Spacer_Cas9_g2
nucleotide
artificial

sequence

spacer

719
Spacer_Cas9_g1
nucleotide
artificial

sequence

plasmid

720
pCMV
nucleotide
artificial

sequence

plasmid

721
pCMV-MG68-4v1-nMG34-1
nucleotide
artificial

sequence

plasmid

722
pCMV-TadA*(8.8m)-nMG34-1
nucleotide
artificial

sequence

plasmid

723
pCMV-MG68-4v1-nSpCas9
nucleotide
artificial

sequence

plasmid

724
pCMV-MG68-4v1-nMG34-1_sgRNA
nucleotide
artificial

1

sequence

plasmid

725
pCMV-TadA*(8.8m)-nMG34-
nucleotide
artificial

1_sgRNA 1

sequence

plasmid

726
pCMV-MG68-4v1-nSpCas9_sgRNA 1
nucleotide
artificial

sequence

adenine

727
TadA*(8.17m)-nMG34-1
Protein
artificial

base

editor

sequence

adenine

728
TadA*(8.17m)-nSpCas9
Protein
artificial

base

editor

sequence

spacer

729
Spacer 1 for TadA*(8.17m)-nMG34-1
nucleotide
artificial

targeting in E. coli

sequence

spacer

730
Spacer 2 for TadA*(8.17m)-nMG34-1
nucleotide
artificial

targeting in E. coli

sequence

spacer

731
Spacer 3 for TadA*(8.17m)-nMG34-1
nucleotide
artificial

targeting in E. coli

sequence

spacer

732
Spacer 4 for TadA*(8.17m)-nMG34-1
nucleotide
artificial

targeting in E. coli

sequence

spacer

733
Spacer 1 for TadA*(8.17m)-nSpCas9
nucleotide
artificial

targeting in E. coli

sequence

spacer

734
Spacer 2 for TadA*(8.17m)-nSpCas9
nucleotide
artificial

targeting in E. coli

sequence

spacer

735
Spacer 3 for TadA*(8.17m)-nSpCas9
nucleotide
artificial

targeting in E. coli

sequence

spacer

736
Spacer 4 for TadA*(8.17m)-nSpCas9
nucleotide
artificial

targeting in E. coli

sequence

plasmid

737
pCMV-TadA*(8.17m)-nMG34-
nucleotide
artificial

1_sgRNA 1

sequence

plasmid

738
pCMV-TadA*(8.17m)-
nucleotide
artificial

nSpCas9_sgRNA 1

sequence

cytidine

739
rAPOBEC1-nMG34-1-UGI (PBS)
Protein
artificial

base

sequence

editor

cytidine

740
rAPOBEC1-nSpCas9-UGI (PBS)
Protein
artificial

base

sequence

editor

plasmid

741
plasmid, prepared by Twist, that
nucleotide
human

contains the A1CF gene, a cofactor for

APOBEC activity on RNA

oligonucleotide

742
RNA sequence used to test CDAs for RNA activity. From Wolfe et al., NAR Cancer, 2020, Vol. 2, No. 4
nucleotide

oligonucleotide

743
Labelled primer for poisoned primer extension assay used to test CDAs for RNA activity. From Wolfe et al., NAR Cancer, 2020, Vol. 2, No. 4. 5′ FAM label
nucleotide

MG139

744
MG139-22
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

745
MG139-23
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

746
MG139-24
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

747
MG139-25
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

748
MG139-26
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

749
MG139-27
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

750
MG139-28
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

751
MG139-29
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

752
MG139-30
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

753
MG139-31
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

754
MG139-32
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

755
MG139-33
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

756
MG139-34
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

757
MG139-35
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

758
MG139-36
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

759
MG139-37
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

760
MG139-38
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

761
MG139-39
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

762
MG139-40
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

763
MG139-41
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

764
MG139-42
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

765
MG139-43
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

766
MG139-44
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

767
MG139-45
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

768
MG139-46
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

769
MG139-47
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

770
MG139-48
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

771
MG139-49
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

772
MG139-50
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

773
MG139-51
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

774
MG139-52
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

775
MG139-53
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

776
MG139-54
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

777
MG139-55
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

778
MG139-56
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

779
MG139-57
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

780
MG139-58
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

781
MG139-59
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

782
MG139-60
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

783
MG139-61
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

784
MG139-62
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

785
MG139-63
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

786
MG139-64
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

787
MG139-65
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

788
MG139-66
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

789
MG139-67
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

790
MG139-68
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

791
MG139-69
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

792
MG139-70
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

793
MG139-71
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

794
MG139-72
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

795
MG139-73
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

796
MG139-74-1
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

797
MG139-74-2
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

798
MG139-75
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

799
MG139-76
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

800
MG139-77-1
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

801
MG139-77-2
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

802
MG139-78
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

803
MG139-79
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

804
MG139-80
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

805
MG139-81
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

806
MG139-82
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

807
MG139-83
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

808
MG139-84
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

809
MG139-85
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

810
MG139-86
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

811
MG139-87
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

812
MG139-88
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

813
MG139-89
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

814
MG139-90
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

815
MG139-91
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

816
MG139-92
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

817
MG139-93
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

818
MG139-94
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

819
MG139-95
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

820
MG139-96
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

821
MG139-97
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

822
MG139-98
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

823
MG139-99
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

824
MG139-100
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

825
MG139-101
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

826
MG139-102
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG139

827
MG139-103
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

828
MG93-12
Protein
Unknown
Rodent class

cytidine

deaminase

MG142

829
MG142-3
Protein
Unknown
Rodent class

cytidine

deaminase

MG152

830
MG152-1
Protein
Unknown
Bivalvia class

cytidine

deaminase

MG152

831
MG152-2
Protein
Unknown
Bivalvia class

cytidine

deaminase

MG152

832
MG152-3
Protein
Unknown
Bivalvia class

cytidine

deaminase

MG152

833
MG152-4
Protein
Unknown
Bivalvia class

cytidine

deaminase

MG152

834
MG152-5
Protein
Unknown
Bivalvia class

cytidine

deaminase

MG152

835
MG152-6
Protein
Unknown
Bivalvia class

cytidine

deaminase

adenine

836
MG68-4_r1v1_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

837
MG68-4_r2v1_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

838
MG68-4_r2v2_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

839
MG68-4_r2v3_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

840
MG68-4_r2v4_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

841
MG68-4_r2v5_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

842
MG68-4_r2v6_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

843
MG68-4_r2v7_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

844
MG68-4_r2v8_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

845
MG68-4_r2v9_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

846
MG68-4_r2v10_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

847
MG68-4_r2v11_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

848
MG68-4_r2v12_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

849
MG68-4_r2v13_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

850
MG68-4_r2v14_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

851
MG68-4_r2v15_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

852
MG68-4_r2v16_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

853
MG68-4_r2v17_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

854
MG68-4_r2v18_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

855
MG68-4_r2v19_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

856
MG68-4_r2v20_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

857
MG68-4_r2v21_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

858
MG68-4_r2v22_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

859
MG68-4_r2v23_nMG34-1
Protein
Artificial

base

sequence

editor

adenine

860
MG68-4_r2v24_nMG34-1
Protein
Artificial

base

sequence

editor

spacer

861
guide 1 for ABE using MG34-1
nucleotide
Artificial

sequence

spacer

862
guide 2 for ABE using MG34-1
nucleotide
Artificial

sequence

spacer

863
guide 3 for ABE using MG34-1
nucleotide
Artificial

sequence

spacer

864
guide 4 for ABE using MG34-1
nucleotide
Artificial

sequence

primer

865
NGS primer for guide 1 of ABE using
nucleotide
Artificial

MG34-1

sequence

primer

866
NGS primer for guide 1 of ABE using
nucleotide
Artificial

MG34-1

sequence

primer

867
NGS primer for guide 2 of ABE using
nucleotide
Artificial

MG34-1

sequence

primer

868
NGS primer for guide 2 of ABE using
nucleotide
Artificial

MG34-1

sequence

primer

869
NGS primer for guide 3 of ABE using
nucleotide
Artificial

MG34-1

sequence

primer

870
NGS primer for guide 3 of ABE using
nucleotide
Artificial

MG34-1

sequence

primer

871
NGS primer for guide 4 of ABE using
nucleotide
Artificial

MG34-1

sequence

primer

872
NGS primer for guide 4 of ABE using
nucleotide
Artificial

MG34-1

sequence

Plasmid

873
pCMV-MG68-4_r1v1_nMG34-1
nucleotide
Artificial

sequence

Plasmid

874
pCMV-U6p-spacer (guide 1)-MG34-1
nucleotide
Artificial

sgRNA scaffold

sequence

Plasmid

875
pAL478
nucleotide
Artificial

sequence

sgRNA

876
MG34-1
nucleotide
artificial

scaffold

sequence

sequence

Cytosine

877
spCAS9 + MG139-12 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

878
spCAS9 + MG93-4 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

879
spCAS9 + MG93-3 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

880
spCAS9 + MG93-5 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

881
spCAS9 + MG93-6 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

882
spCAS9 + MG93-7 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

883
spCAS9 + MG93-9 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

884
spCAS9 + MG93-11 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

885
spCAS9 + MG138-17 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

886
spCAS9 + MG138-20 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

887
spCAS9 + MG138-23 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

888
spCAS9 + MG138-32 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

889
spCAS9 + MG142-1 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

890
MG3-6 + MG139-12 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

891
MG3-6 + MG93-4 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

892
MG3-6 + MG93-3 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

893
MG3-6 + MG93-5 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

894
MG3-6 + MG93-6 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

895
MG3-6 + MG93-7 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

896
MG3-6 + MG93-9 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

897
MG3-6 + MG93-11 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

898
MG3-6 + MG138-17 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

899
MG3-6 + MG138-20 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

900
MG3-6 + MG138-23 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

901
MG3-6 + MG138-32 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

902
MG3-6 + MG142-1 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

903
MG34-1 + MG139-12 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

904
MG34-1 + MG93-4 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

905
MG34-1 + MG93-3 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

906
MG34-1 + MG93-5 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

907
MG34-1 + MG93-6 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

908
MG34-1 + MG93-7 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

909
MG34-1 + MG93-9 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

910
MG34-1 + MG93-11 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

911
MG34-1 + MG138-17 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

912
MG34-1 + MG138-20 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

913
MG34-1 + MG138-23 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

914
MG34-1 + MG138-32 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

915
MG34-1 + MG142-1 + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

916
MG34-1 + A0A2K5RDN7(APOBEC3A) + MG69-1
Protein
Artificial

Base

sequence

Editor

sgRNA

917
sgRNA266
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

918
sgRNA691
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

919
sgRNA692
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

920
sgRNA693
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

921
sgRNA694
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

922
sgRNA708
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

923
sgRNA709
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

924
sgRNA710
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

925
sgRNA711
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

926
sgRNA712
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

927
sgRNA633
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

928
sgRNA634
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

929
sgRNA635
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

930
sgRNA636
nucleotide
Artificial

(spacer

sequence

and

scaffold)

sgRNA

931
sgRNA641
nucleotide
Artificial

(spacer

sequence

and

scaffold)

primer

932
NGS primer for sgRNA266
nucleotide
Artificial

sequence

primer

933
NGS primer for sgRNA266
nucleotide
Artificial

sequence

primer

934
NGS primer for sgRNA691
nucleotide
Artificial

sequence

primer

935
NGS primer for sgRNA691
nucleotide
Artificial

sequence

primer

936
NGS primer for sgRNA692
nucleotide
Artificial

sequence

primer

937
NGS primer for sgRNA692
nucleotide
Artificial

sequence

primer

938
NGS primer for sgRNA693
nucleotide
Artificial

sequence

primer

939
NGS primer for sgRNA693
nucleotide
Artificial

sequence

primer

940
NGS primer for sgRNA694
nucleotide
Artificial

sequence

primer

941
NGS primer for sgRNA694
nucleotide
Artificial

sequence

primer

942
NGS primer for sgRNA708
nucleotide
Artificial

sequence

primer

943
NGS primer for sgRNA708
nucleotide
Artificial

sequence

primer

944
NGS primer for sgRNA709
nucleotide
Artificial

sequence

primer

945
NGS primer for sgRNA709
nucleotide
Artificial

sequence

primer

946
NGS primer for sgRNA710
nucleotide
Artificial

sequence

primer

947
NGS primer for sgRNA710
nucleotide
Artificial

sequence

primer

948
NGS primer for sgRNA711
nucleotide
Artificial

sequence

primer

949
NGS primer for sgRNA711
nucleotide
Artificial

sequence

primer

950
NGS primer for sgRNA712
nucleotide
Artificial

sequence

primer

951
NGS primer for sgRNA712
nucleotide
Artificial

sequence

primer

952
NGS primer for sgRNA633
nucleotide
Artificial

sequence

primer

953
NGS primer for sgRNA633
nucleotide
Artificial

sequence

primer

954
NGS primer for sgRNA634
nucleotide
Artificial

sequence

primer

955
NGS primer for sgRNA634
nucleotide
Artificial

sequence

primer

956
NGS primer for sgRNA635
nucleotide
Artificial

sequence

primer

957
NGS primer for sgRNA635
nucleotide
Artificial

sequence

primer

958
NGS primer for sgRNA636
nucleotide
Artificial

sequence

primer

959
NGS primer for sgRNA636
nucleotide
Artificial

sequence

primer

960
NGS primer for sgRNA641
nucleotide
Artificial

sequence

primer

961
NGS primer for sgRNA641
nucleotide
Artificial

sequence

Engineered

962
Site engineered in mammalian cell line with 5 PAMs compatible with Cas9 and MG3-6 editing
nucleotide
Artificial

sequence

sequence

in

mammalian

cells

sgRNA

963
Spacer targeting engineered site #1
nucleotide
Artificial

sequence

sgRNA

964
Spacer targeting engineered site #2
nucleotide
Artificial

sequence

sgRNA

965
Spacer targeting engineered site #3
nucleotide
Artificial

sequence

sgRNA

966
Spacer targeting engineered site #4
nucleotide
Artificial

sequence

sgRNA

967
Spacer targeting engineered site #5
nucleotide
Artificial

sequence

Cytosine

968
spCas9 + A0A2K5RDN7(APOBEC3A) + MG69-1
Protein
Artificial

Base

sequence

Editor

Cytosine

969
MG3-6 + A0A2K5RDN7(APOBEC3A) + MG69-1
Protein
Artificial

Base

sequence

Editor

MG139

970
MG139-12
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

971
MG93-3
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

972
MG93-4
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

973
MG93-5
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

974
MG93-6
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

975
MG93-7
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

976
MG93-9
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG93

977
MG93-11
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG138

978
MG138-17
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG138

979
MG138-20
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG138

980
MG138-23
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG138

981
MG138-32
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG142

982
MG142-1
Protein
Unknown
uncultivated

cytidine

organism

deaminase

MG128

983
MG128-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

984
MG128-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

985
MG128-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

986
MG128-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

987
MG128-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

988
MG128-6 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

989
MG128-7 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

990
MG128-8 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

991
MG128-9 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

992
MG128-10 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

993
MG128-11 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

994
MG128-12 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

995
MG128-13 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

996
MG128-14 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

997
MG128-15 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

998
MG128-16 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

999
MG128-17 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1000
MG128-18 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1001
MG128-19 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1002
MG128-20 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1003
MG128-21 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1004
MG128-22 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1005
MG128-23 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1006
MG128-24 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1007
MG128-25 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1008
MG128-26 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1009
MG128-27 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1010
MG128-28 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1011
MG128-29 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1012
MG128-30 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1013
MG128-31 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG128

1014
MG128-32 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1015
MG129-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1016
MG129-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1017
MG129-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1018
MG129-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1019
MG129-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1020
MG129-6 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1021
MG129-7 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1022
MG129-8 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1023
MG129-9 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1024
MG129-10 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1025
MG129-11 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG129

1026
MG129-12 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG130

1027
MG130-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG130

1028
MG130-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG130

1029
MG130-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG130

1030
MG130-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG130

1031
MG130-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1032
MG131-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1033
MG131-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1034
MG131-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1035
MG131-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1036
MG131-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1037
MG131-6 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1038
MG131-7 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1039
MG131-8 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG131

1040
MG131-9 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG132

1041
MG132-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG132

1042
MG132-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG132

1043
MG132-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1044
MG133-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1045
MG133-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1046
MG133-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1047
MG133-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1048
MG133-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1049
MG133-6 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1050
MG133-7 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1051
MG133-8 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1052
MG133-9 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1053
MG133-10 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1054
MG133-11 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1055
MG133-12 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1056
MG133-13 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG133

1057
MG133-14 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG134

1058
MG134-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG134

1059
MG134-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG134

1060
MG134-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG134

1061
MG134-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1062
MG135-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1063
MG135-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1064
MG135-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1065
MG135-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1066
MG135-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1067
MG135-6 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1068
MG135-7 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG135

1069
MG135-8 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1070
MG136-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1071
MG136-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1072
MG136-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1073
MG136-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1074
MG136-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1075
MG136-6 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1076
MG136-7 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1077
MG136-8 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1078
MG136-9 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1079
MG136-10 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1080
MG136-11 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG136

1081
MG136-12 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1082
MG137-1 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1083
MG137-2 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1084
MG137-3 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1085
MG137-4 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1086
MG137-5 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1087
MG137-6 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1088
MG137-7 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1089
MG137-8 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1090
MG137-9 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1091
MG137-10 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1092
MG137-11 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1093
MG137-12 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1094
MG137-13 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1095
MG137-14 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1096
MG137-15 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1097
MG137-16 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG137

1098
MG137-17 Deaminase
Protein
Unknown
Uncultivated

Deaminase

Organism

MG35

1099
MG35-1 active effectors sgRNA
nucleotide
artificial
N/A

active

sequence

effectors

sgRNA

MG35

1100
MG35-2 active effectors sgRNA
nucleotide
artificial
N/A

active

sequence

effectors

sgRNA

MG35

1101
MG35-3 active effectors sgRNA
nucleotide
artificial
N/A

active

sequence

effectors

sgRNA

MG35

1102
MG35-4 active effectors sgRNA
nucleotide
artificial
N/A

active

sequence

effectors

sgRNA

MG35

1103
MG35-5 active effectors sgRNA
nucleotide
artificial
N/A

active

sequence

effectors

sgRNA

MG35

1104
MG35-6 active effectors sgRNA
nucleotide
artificial
N/A

active

sequence

effectors

sgRNA

MG35

1105
MG35-102 active effectors sgRNA
nucleotide
artificial
N/A

active

sequence

effectors

sgRNA

MG35

1106
MG35-1 active effectors PAM
nucleotide
artificial
AnGg

active

sequence

effectors

PAM

MG35

1107
MG35-2 active effectors PAM
nucleotide
artificial
nARAA

active

sequence

effectors

PAM

MG35

1108
MG35-3 active effectors PAM
nucleotide
artificial
ATGaaa

active

sequence

effectors

PAM

MG35

1109
MG35-4 active effectors PAM
nucleotide
artificial
ATGA

active

sequence

effectors

PAM

MG35

1110
MG35-5 active effectors PAM
nucleotide
artificial
WTGG

active

sequence

effectors

PAM

MG35

1111
MG35-102 active effectors PAM
nucleotide
artificial
RTGA

active

sequence

effectors

PAM

ABE-MG35

1112
ABE-MG35-1 active adenine base editor gene
nucleotide
artificial
N/A

sequence

active

adenine

base

editor

genes

ABE-MG35

1113
ABE-MG35-1 active adenine base editor
protein
artificial
N/A

sequence

active

adenine

base

editors

Cas9-CBE

1114
pMG3078
Nucleotide

Fam72a

1115
pMG3072
Nucleotide

Cas9-CBE

1116
PE266
Nucleotide

target

site

Cas9-CBE

1117
PE691
Nucleotide

target

site

NGS

1118
PE266 NGS Amplicon
Nucleotide

Amplicon

NGS

1119
PE691 NGS Amplicon
Nucleotide

Amplicon

MG35

1120
MG35-1 active effector amino acid
Polypeptide

active

sequence

effector

FAM72A

1121
Fam72A peptide sequence
Polypeptide

MG35

1122
MG35-2 active effector amino acid
Polypeptide

active

sequence

effector

MG35

1123
MG35-3 active effector amino acid
Polypeptide

active

sequence

effector

MG35

1124
MG35-4 active effector amino acid
Polypeptide

active

sequence

effector

MG35

1125
MG35-5 active effector amino acid
Polypeptide

active

sequence

effector

MG35

1126
MG35-6 active effector amino acid
Polypeptide

active

sequence

effector

MG35

1127
MG35-102 active effector amino acid
Polypeptide

active

sequence

effector

MG3-

1128
3-68_DIV1_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1129
3-68_DIV2_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1130
3-68_DIV3_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1131
3-68_DIV4_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1132
3-68_DIV5_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1133
3-68_DIV6_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1134
3-68_DIV7_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1135
3-68_DIV8_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1136
3-68_DIV9_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1137
3-68_DIV10_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1138
3-68_DIV11_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1139
3-68_DIV12_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1140
3-68_DIV13_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1141
3-68_DIV14_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1142
3-68_DIV15_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1143
3-68_DIV16_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1144
3-68_DIV17_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1145
3-68_DIV18_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1146
3-68_DIV19_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1147
3-68_DIV20_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1148
3-68_DIV21_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1149
3-68_DIV22_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1150
3-68_DIV23_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1151
3-68_DIV24_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1152
3-68_DIV25_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1153
3-68_DIV26_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1154
3-68_DIV27_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1155
3-68_DIV28_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1156
3-68_DIV29_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1157
3-68_DIV30_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1158
3-68_DIV31_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1159
3-68_DIV32_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1160
3-68_DIV33_M_RDr1v1_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG34-1

1161
MG68-4
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1162
MGA1.1RD1
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1163
MGA1.1RD2
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1164
MGA1.1RD3
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1165
MGA1.1RD4
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1166
MGA1.1RD5
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1167
MGA1.1RD6
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1168
MGA1.1RD7
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1169
MGA1.1RD8
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1170
MGA1.1RD9
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1171
MGA1.1RD10
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1172
MGA1.1RD11
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1173
MGA1.1RD12
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1174
MGA1.1RD13
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1175
MGA1.1RD14
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1176
MGA1.1RD15
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1177
MGA1.1RD16
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1178
MGA1.1RD17
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1179
MGA1.1RD18
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1180
MGA1.1RD19
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1181
MGA1.1RD20
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1182
MGA1.1RD21
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1183
MGA1.1RD22
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1184
MAG0.1_2NLS
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1185
MAG1.1 2NLS
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1186
MAG2.1 2NLS
Protein
artificial
MG34-1 sequence

adenine

sequence
is included

base

editor

MG34-1

1187
guide 2 for ABE using MG34-1
Nucleotide
artificial

adenine

sequence

base

editor

sgRNA6

8

sequence

MG3-

1188
sgRNA68
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG3-

1189
sgRNA46
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG3-

1190
sgRNA49
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG3-

1191
sgRNA51
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG3-

1192
sgRNA53
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG3-

1193
sgRNA54
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG3-

1194
sgRNA55
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG3-

1195
sgRNA62
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

DNA

1196
guide 2 for ABE using MG34-1
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1197
sgRNA68
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1198
sgRNA46
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1199
sgRNA49
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1200
sgRNA51
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1201
sgRNA53
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1202
sgRNA54
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1203
sgRNA55
Nucleotide
artificial

Sequence

sequence

of Target

Site

DNA

1204
sgRNA62
Nucleotide
artificial

Sequence

sequence

of Target

Site

Plasmid

1205
Expression of MG3-6_3-8 adenine base editor
Nucleotide
artificial

sequence

Plasmid

1206
Expression of sgRNA for MG3-6_3-8 adenine base editor
Nucleotide
artificial

sequence

Plasmid

1207
Expression of MG34-1 adenine base editor
Nucleotide
artificial

sequence

MG93

1208
W90A
MG93_4v1
Protein
Rodent class

cytidine

deaminase

variant

MG93

1209
W90F
MG93_4v2
Protein
Rodent class

cytidine

deaminase

variant

MG93

1210
W90H
MG93_4v3
Protein
Rodent class

cytidine

deaminase

variant

MG93

1211
W90Y
MG93_4v4
Protein
Rodent class

cytidine

deaminase

variant

MG93

1212
Y120F
MG93_4v5
Protein
Rodent class

cytidine

deaminase

variant

MG93

1213
Y120H
MG93_4v6
Protein
Rodent class

cytidine

deaminase

variant

MG93

1214
Y121F
MG93_4v7
Protein
Rodent class

cytidine

deaminase

variant

MG93

1215
Y121H
MG93_4v8
Protein
Rodent class

cytidine

deaminase

variant

MG93

1216
Y121Q
MG93_4v9
Protein
Rodent class

cytidine

deaminase

variant

MG93

1217
Y121A
MG93_4v10
Protein
Rodent class

cytidine

deaminase

variant

MG93

1218
Y121D
MG93_4v11
Protein
Rodent class

cytidine

deaminase

variant

MG93

1219
Y121W
MG93_4v12
Protein
Rodent class

cytidine

deaminase

variant

MG93

1220
H122Y
MG93_4v13
Protein
Rodent class

cytidine

deaminase

variant

MG93

1221
H122F
MG93_4v14
Protein
Rodent class

cytidine

deaminase

variant

MG93

1222
H122I
MG93_4v15
Protein
Rodent class

cytidine

deaminase

variant

MG93

1223
H122A
MG93_4v16
Protein
Rodent class

cytidine

deaminase

variant

MG93

1224
H122W
MG93_4v17
Protein
Rodent class

cytidine

deaminase

variant

MG93

1225
H122D
MG93_4v18
Protein
Rodent class

cytidine

deaminase

variant

MG93

1226
Replace with hAID loop7
MG93_4v19
Protein
Rodent class

cytidine

deaminase

variant

MG93

1227
Replace with 139_86 loop 7
MG93_4v20
Protein
Rodent class

cytidine

deaminase

variant

MG93

1228
Truncate from 188 to end
MG93_4v21
Protein
Rodent class

cytidine

deaminase

variant

MG93

1229
Y121T
MG93_4v22
Protein
Rodent class

cytidine

deaminase

variant

MG93

1230
Replace with a smaller section of hAID loop7
MG93_4v23
Protein
Rodent class

cytidine

deaminase

variant

MG93

1231
Replace with a smaller section of hAID loop7
MG93_4v24
Protein
Rodent class

cytidine

deaminase

variant

MG93

1232
R33A
MG93_4v25
Protein
Rodent class

cytidine

deaminase

variant

MG93

1233
R34A
MG93_4v26
Protein
Rodent class

cytidine

deaminase

variant

MG93

1234
R34K
MG93_4v27
Protein
Rodent class

cytidine

deaminase

variant

MG93

1235
H122A R33A
MG93_4v28
Protein
Rodent class

cytidine

deaminase

variant

MG93

1236
H122A R34A
MG93_4v29
Protein
Rodent class

cytidine

deaminase

variant

MG93

1237
R52A
MG93_4v30
Protein
Rodent class

cytidine

deaminase

variant

MG93

1238
H122A R52A
MG93_4v31
Protein
Rodent class

cytidine

deaminase

variant

MG93

1239
N57G (shown to have lower off-target activity in A3A)
MG93_4v32
Protein
Rodent class

cytidine

deaminase

variant

MG93

1240
N57G H122A
MG93_4v33
Protein
Rodent class

cytidine

deaminase

variant

MG93

1241
Replace with A3A loop7
MG139_86v1
Protein
Rodent class

cytidine

deaminase

variant

MG93

1242
E123A
MG139_95v1
Protein
Rodent class

cytidine

deaminase

variant

MG93

1243
E123Q
MG139_95v2
Protein
Rodent class

cytidine

deaminase

variant

MG93

1244
Replace with hAID loop7
MG93_3v1
Protein
Rodent class

cytidine

deaminase

variant

MG93

1245
Replace with 139_86 loop 7
MG93_3v2
Protein
Rodent class

cytidine

deaminase

variant

MG93

1246
W127F
MG93_3v3
Protein
Rodent class

cytidine

deaminase

variant

MG93

1247
W127H
MG93_3v4
Protein
Rodent class

cytidine

deaminase

variant

MG93

1248
W127Q
MG93_3v5
Protein
Rodent class

cytidine

deaminase

variant

MG93

1249
W127A
MG93_3v6
Protein
Rodent class

cytidine

deaminase

variant

MG93

1250
W127D
MG93_3v7
Protein
Rodent class

cytidine

deaminase

variant

MG93

1251
R39A
MG93_3v8
Protein
Rodent class

cytidine

deaminase

variant

MG93

1252
K40A
MG93_3v9
Protein
Rodent class

cytidine

deaminase

variant

MG93

1253
H128A
MG93_3v10
Protein
Rodent class

cytidine

deaminase

variant

MG93

1254
N63G
MG93_3v11
Protein
Rodent class

cytidine

deaminase

variant

MG93

1255
R58A
MG93_3v12
Protein
Rodent class

cytidine

deaminase

variant

MG93

1256
Replace with hAID loop7
MG93_11v1
Protein
Rodent class

cytidine

deaminase

variant

MG93

1257
Replace with 139_86 loop 7
MG93_11v2
Protein
Rodent class

cytidine

deaminase

variant

MG93

1258
H121F
MG93_11v3
Protein
Rodent class

cytidine

deaminase

variant

MG93

1259
H121Y
MG93_11v4
Protein
Rodent class

cytidine

deaminase

variant

MG93

1260
H121Q
MG93_11v5
Protein
Rodent class

cytidine

deaminase

variant

MG93

1261
H121A
MG93_11v6
Protein
Rodent class

cytidine

deaminase

variant

MG93

1262
H121D
MG93_11v7
Protein
Rodent class

cytidine

deaminase

variant

MG93

1263
H121W
MG93_11v8
Protein
Rodent class

cytidine

deaminase

variant

MG93

1264
N57G (shown to have lower off-target activity in A3A)
MG93_11v9
Protein
Rodent class

cytidine

deaminase

variant

MG93

1265
R33A
MG93_11v10
Protein
Rodent class

cytidine

deaminase

variant

MG93

1266
K34A
MG93_11v11
Protein
Rodent class

cytidine

deaminase

variant

MG93

1267
H122A
MG93_11v12
Protein
Rodent class

cytidine

deaminase

variant

MG93

1268
H121A
MG93_11v13
Protein
Rodent class

cytidine

deaminase

variant

MG93

1269
R52A
MG93_11v14
Protein
Rodent class

cytidine

deaminase

variant

MG139

1270
K16 through P25 of pgtA3H replaces G20 through P26
139_52v1
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1271
S170 through D138 of pgtA3H replaces K196 to V215
139_52v2
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1272
P26R
139_52v3
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1273
P26A
139_52v4
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1274
N27R
139_52v5
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1275
N27A
139_52v6
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1276
W44A (equivalent to R52A)
139_52v7
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1277
W45A (equivalent to R52A)
139_52v8
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1278
K49G (equivalent to N57G)
139_52v9
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1279
S50G (equivalent to N57G)
139_52v10
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1280
R51G (equivalent to N57G)
139_52v11
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1281
R121A (equivalent to H121A)
139_52v12
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1282
I122A (equivalent to H122A)
139_52v13
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1283
N123A (equivalent to H122A)
139_52v14
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1284
Y88F (equivalent to W90F)
139_52v15
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1285
Y120F (equivalent to Y120F)
139_52v16
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1286
P22R
139_86v2
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1287
P22A
139_86v3
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1288
K23A
139_86v4
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1289
K41R
139_86v5
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1290
K41A
139_86v6
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1291
truncate K179 and onwards
139_86v7
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1292
Insert hAID loop 7 and truncate K179 onwards
139_86v8
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1293
E54D and truncation
139_86v9
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1294
E54A Mutate catalytic E residue
139_86v10
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1295
Mutate neighboring E residue
139_86v11
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1296
E54AE55A Mutate both catalytic E residues
139_86v12
Protein
uncultivated

cytidine

organism

deaminase

variant

MG152

1297
K30A
152_6v1
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1298
K30R
152_6v2
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1299
M32A
152_6v3
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1300
M32K
152_6v4
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1301
Y117A
152_6v5
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1302
K118A
152_6v6
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1303
I119A
152_6v7
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1304
I119H
152_6v8
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1305
R120A
152_6v9
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1306
R121A
152_6v10
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1307
P46A
152_6v11
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1308
P46R
152_6v12
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1309
N29A
152_6v13
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1310
Loop 7 from MG138-20
152_6v14
Protein
Bivalvia class

cytidine

deaminase

variant

MG152

1311
Loop 7 from MG139-12
152_6v15
Protein
Bivalvia class

cytidine

deaminase

variant

MG138

1312
R27A
138_20v1
Protein
Aves Class

cytidine

deaminase

variant

MG138

1313
N50G
138_20v2
Protein
Aves Class

cytidine

deaminase

variant

MG139

1314
Loop 7 from MG138-20
139_52v17
Protein
uncultivated

cytidine

organism

deaminase

variant

MG139

1315
Loop 7 from MG139-12
139_52v18
Protein
uncultivated

cytidine

organism

deaminase

variant

RF148

1316

SSDNA
DNA
artificial

substrate

RF149

1317

SSDNA
DNA
artificial

substrate

RF150

1318

SSDNA
DNA
artificial

substrate

RF151

1319

SSDNA
DNA
artificial

substrate

RF253

1320
AC vs GC Substrate
Dual
DNA
artificial

DNA

substrate

RF220

1321
TC v CC substrate
Dual
DNA
artificial

DNA

substrate

152-

1322

CDA
Protein
artificial

6_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1323
N27A
CDA
Protein
artificial

52v6_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1324

CDA
Protein
artificial

4_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1325

CDA
Protein
artificial

52_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1326

CDA
Protein
artificial

94_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1327

CDA
Protein
artificial

7_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1328

CDA
Protein
artificial

3_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1329

CDA
Protein
artificial

92_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1330

CDA
Protein
artificial

12_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1331

CDA
Protein
artificial

103_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1332

CDA
Protein
artificial

95_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1333

CDA
Protein
artificial

99_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1334

CDA
Protein
artificial

90_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1335

CDA
Protein
artificial

89_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1336

CDA
Protein
artificial

93_CBE

fused

linker,

MG3-6,

UGI

and

NLS

138-

1337

CDA
Protein
artificial

30_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1338

CDA
Protein
artificial

102_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1339
H122A
CDA
Protein
artificial

4v16_CBE

fused

linker,

MG3-6,

UGI

and

NLS

152-

1340

CDA
Protein
artificial

5_CBE

fused

linker,

MG3-6,

UGI

and

NLS

138-

1341

CDA
Protein
artificial

20_CBE

fused

linker,

MG3-6,

UGI

and

NLS

138-

1342

CDA
Protein
artificial

23_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1343

CDA
Protein
artificial

5_CBE

fused

linker,

MG3-6,

UGI

and

NLS

152-

1344

CDA
Protein
artificial

4_CBE

fused

linker,

MG3-6,

UGI

and

NLS

152-

1345

CDA
Protein
artificial

1_CBE

fused

linker,

MG3-6,

UGI

and

NLS

152-

1346

CDA
Protein
artificial

3_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1347

CDA
Protein
artificial

56_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1348

CDA
Protein
artificial

11_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1349

CDA
Protein
artificial

6_CBE

fused

linker,

MG3-6,

UGI

and

NLS

93-

1350

CDA
Protein
artificial

9_CBE

fused

linker,

MG3-6,

UGI

and

NLS

142-

1351

CDA
Protein
artificial

1_CBE

fused

linker,

MG3-6,

UGI

and

NLS

138-

1352

CDA
Protein
artificial

32_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1353

CDA
Protein
artificial

101_CBE

fused

linker,

MG3-6,

UGI

and

NLS

138-

1354

CDA
Protein
artificial

17_CBE

fused

linker,

MG3-6,

UGI

and

NLS

139-

1355

CDA
Protein
artificial

91_CBE

fused

linker,

MG3-6,

UGI

and

NLS

MG34-1

1356
MG68-4_MG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1357
MG68-4 (D109N)_MG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1358
MG68-4 (D109N homodimer_32aa linker)_MG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1359
MG68-4_(D109N homodimer_52aa linker)_MG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1360
MG68-4_(D109N homodimer_64aa linker)_MG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1361
MG68-4_(D109N homodimer_5aa linker)_MG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1362
TadA*8.8m_MG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG3-

1363
3-68_DIV30M_CMCL1
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1364
3-68_DIV30M_CMCL2
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1365
3-68_DIV30M_CMCL3
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1366
3-68_DIV30M_CMCL4
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1367
3-68_DIV30M_CMCL5
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1368
3-68_DIV30M_CMCL6
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1369
3-68_DIV30M_CMCL7
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1370
3-68_DIV30M_CMCL9
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1371
3-68_DIV30M_CMCL10
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1372
3-68_DIV30M_CMCL11
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1373
3-68_DIV30M_CMCL12
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1374
3-68_DIV30M_CMCL13
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1375
3-68_DIV30M_CMCL14
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1376
3-68_DIV30M_CMCL15
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1377
3-68_DIV30M_CMCL16
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1378
3-68_DIV30M_CMCL17
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1379
3-68_DIV30M_CMCL18
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1380
3-68_DIV30M_CMCL20
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1381
3-68_DIV30M_CMCL22
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1382
3-68_DIV30M_CMCL23
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1383
3-68_DIV30M_CMCL25
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1384
3-68_DIV30M_CMCL28
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1385
3-68_DIV30M_CMCL29
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1386
3-68_DIV30M_CMCL30
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1387
3-68_DIV30M_CMCL34
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1388
3-68_DIV30M_CMCL35
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1389
3-68_DIV30M_CMCL40
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1390
3-68_DIV30M_CMCL56
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1391
3-68_DIV30M_CMCL57
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1392
3-68_DIV30M_CMCL58
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1393
3-68_DIV30M_CMCL59
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1394
3-68_DIV30M_CMCL60
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1395
3-68_DIV30M_CMCL61
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1396
3-68_DIV30M_CMCL62
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1397
3-68_DIV30M_CMCL63
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1398
3-68_DIV30M_CMCL64
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1399
3-68_DIV30M_CMCL65
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1400
3-68_DIV30M_CMCL66
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1401
3-68_DIV30M_CMCL67
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1402
3-68_DIV30M_CMCL68
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1403
3-68_DIV30M_CMCL69
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1404
3-68_DIV30M_CMCL70
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1405
3-68_DIV30M_CMCL71
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1406
3-68_DIV30M_CMCL72
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1407
3-68_DIV30M_CMCL73
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1408
3-68_DIV30M_CMCL74
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1409
3-68_DIV30M_CMCL75
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1410
3-68_DIV30M
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1411
3-68_DIV30D
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1412
3-68_DIV30_M_EPMG68-4_D7G_D10G_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1413
3-68_DIV30_M_EPMG68-4_H129N_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1414
3-68_DIV30_HT_EPMG68-4_D109N + D7G-D10G_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG3-

1415
3-68_DIV30_HT_EPMG68-4_D109N + H129N_B
Protein
artificial

6_3-8

sequence

adenine

base

editor

MG34-1

1416
MG34-1_633 guide
Nucleotide
artificial

adenine

sequence

base

editor

sgRNA

sequence

MG34-1

1417
MG34-1_634 guide
Nucleotide
artificial

adenine

sequence

base

editor

sgRNA

sequence

MG3-

1418
sgRNA68
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

sgRNA

sequence

MG34-1

1419
MG34-1_633 target sequence
Nucleotide
artificial

adenine

sequence

base

editor

target

sequence

MG34-1

1420
MG34-1_634 target sequence
Nucleotide
artificial

adenine

sequence

base

editor

target

sequence

MG3-

1421
sgRNA68 target sequence
Nucleotide
artificial

6_3-8

sequence

adenine

base

editor

target

sequence

Plasmid

1422
Expression of MG34-1 adenine base editor, pPE798
Nucleotide
artificial

sequence

Plasmid

1423
Expression of MG3-6_3-8 adenine base editor, pPE1159
Nucleotide
artificial

sequence

MG35-1

1424
MG35-1 ABE
Protein
artificial

adenine

sequence

base

editor

Plasmid,

1425
Expression of MG35-1 ABE and sgRNA targeting the CAT gene
Nucleotide
artificial

MG35-1

sequence

adenine

base

editor

construct

with

sgRNA

and CAT

gene

Plasmid,

1426
Expression of MG35-1 ABE and sgRNA with a scrambled spacer that cannot target the CAT gene
Nucleotide
artificial

MG35-1

sequence

adenine

base

editor

construct

with

sgRNA

and CAT

gene

MG35-1

1427
MG35-1 sgRNA with spacer targeting CAT gene
Nucleotide
artificial

sgRNA

sequence

MG35-1

1428
MG35-1 sgRNA with scrambled version of spacer targeting CAT gene
Nucleotide
artificial

sgRNA

sequence

MG35-1

1429
MG35-1 CAT gene target sequence
Nucleotide
artificial

target

sequence

sequence

MG35-1

1430
MG35-1 CAT gene scrambled target sequence
Nucleotide
artificial

target

sequence

sequence

MG3-

1431
MG3-6/3-8 mApoa1 BE F12

N.A.

6/3-8

APOA1

sgRNA

MG3-

1432
MG3-6/3-8 mApoa1 BE D11

N.A.

6/3-8

APOA1

sgRNA

MG3-

1433
MG3-6/3-8 mApoa1 BE C5

N.A.

6/3-8

APOA1

sgRNA

MG3-

1434
MG3-6/3-8 mApoa1 BE A4

N.A.

6/3-8

APOA1

sgRNA

MG3-

1435
MG3-6/3-8 mApoa1 BE F4

N.A.

6/3-8

APOA1

sgRNA

MG3-

1436
MG3-6/3-8 mApoa1 BE A5

N.A.

6/3-8

APOA1

sgRNA

MG3-

1437
MG3-6/3-8 mApoa1 BE E12

N.A.

6/3-8

APOA1

sgRNA

MG3-

1438
MG3-6/3-8 mApoa1 BE A11

N.A.

6/3-8

APOA1

sgRNA

MG3-

1439
MG3-6/3-8 mApoa1 BE B4

N.A.

6/3-8

APOA1

sgRNA

MG3-

1440
MG3-6/3-8 mApoa1 BE G4

N.A.

6/3-8

APOA1

sgRNA

MG3-

1441
MG3-6/3-8 mApoa1 BE B2

N.A.

6/3-8

APOA1

sgRNA

MG3-

1442
MG3-6/3-8 mApoa1 BE D7

N.A.

6/3-8

APOA1

sgRNA

MG3-

1443
MG3-6/3-8 mApoa1 BE B5

N.A.

6/3-8

APOA1

sgRNA

MG3-

1444
MG3-6/3-8 mApoa1 BE G6

N.A.

6/3-8

APOA1

sgRNA

MG3-

1445
MG3-6/3-8 mApoa1 BE A8

N.A.

6/3-8

APOA1

sgRNA

MG3-

1446
MG3-6/3-8 mApoa1 BE F2

N.A.

6/3-8

APOA1

sgRNA

MG3-

1447
MG3-6/3-8 mApoa1 BE E1

N.A.

6/3-8

APOA1

sgRNA

MG3-

1448
MG3-6/3-8 mApoa1 BE B8

N.A.

6/3-8

APOA1

sgRNA

MG3-

1449
MG3-6/3-8 mApoa1 BE H8

N.A.

6/3-8

APOA1

sgRNA

MG3-

1450
MG3-6/3-8 mApoa1 BE H6

N.A.

6/3-8

APOA1

sgRNA

MG3-

1451
MG3-6/3-8 mApoa1 BE F5

N.A.

6/3-8

APOA1

sgRNA

MG3-

1452
MG3-6/3-8 mApoa1 BE H3

N.A.

6/3-8

APOA1

sgRNA

MG3-

1453
MG3-6/3-8 mApoa1 BE H4

N.A.

6/3-8

APOA1

sgRNA

MG3-

1454
MG3-6/3-8 mApoa1 BE E8

N.A.

6/3-8

APOA1

sgRNA

MG3-

1455
MG3-6/3-8 mApoa1 BE F12

N.A.

6/3-8

APOA1

target

sequence

MG3-

1456
MG3-6/3-8 mApoa1 BE D11

N.A.

6/3-8

APOA1

target

sequence

MG3-

1457
MG3-6/3-8 mApoa1 BE C5

N.A.

6/3-8

APOA1

target

sequence

MG3-

1458
MG3-6/3-8 mApoa1 BE A4

N.A.

6/3-8

APOA1

target

sequence

MG3-

1459
MG3-6/3-8 mApoa1 BE F4

N.A.

6/3-8

APOA1

target

sequence

MG3-

1460
MG3-6/3-8 mApoa1 BE A5

N.A.

6/3-8

APOA1

target

sequence

MG3-

1461
MG3-6/3-8 mApoa1 BE E12

N.A.

6/3-8

APOA1

target

sequence

MG3-

1462
MG3-6/3-8 mApoa1 BE A11

N.A.

6/3-8

APOA1

target

sequence

MG3-

1463
MG3-6/3-8 mApoa1 BE B4

N.A.

6/3-8

APOA1

target

sequence

MG3-

1464
MG3-6/3-8 mApoa1 BE G4

N.A.

6/3-8

APOA1

target

sequence

MG3-

1465
MG3-6/3-8 mApoa1 BE B2

N.A.

6/3-8

APOA1

target

sequence

MG3-

1466
MG3-6/3-8 mApoa1 BE D7

N.A.

6/3-8

APOA1

target

sequence

MG3-

1467
MG3-6/3-8 mApoa1 BE B5

N.A.

6/3-8

APOA1

target

sequence

MG3-

1468
MG3-6/3-8 mApoa1 BE G6

N.A.

6/3-8

APOA1

target

sequence

MG3-

1469
MG3-6/3-8 mApoa1 BE A8

N.A.

6/3-8

APOA1

target

sequence

MG3-

1470
MG3-6/3-8 mApoa1 BE F2

N.A.

6/3-8

APOA1

target

sequence

MG3-

1471
MG3-6/3-8 mApoa1 BE E1

N.A.

6/3-8

APOA1

target

sequence

MG3-

1472
MG3-6/3-8 mApoa1 BE B8

N.A.

6/3-8

APOA1

target

sequence

MG3-

1473
MG3-6/3-8 mApoa1 BE H8

N.A.

6/3-8

APOA1

target

sequence

MG3-

1474
MG3-6/3-8 mApoa1 BE H6

N.A.

6/3-8

APOA1

target

sequence

MG3-

1475
MG3-6/3-8 mApoa1 BE F5

N.A.

6/3-8

APOA1

target

sequence

MG3-

1476
MG3-6/3-8 mApoa1 BE H3

N.A.

6/3-8

APOA1

target

sequence

MG3-

1477
MG3-6/3-8 mApoa1 BE H4

N.A.

6/3-8

APOA1

target

sequence

MG3-

1478
MG3-6/3-8 mApoa1 BE E8

N.A.

6/3-8

APOA1

target

sequence

MG3-

1479
MG3-6/3-8 mAngptl3 BE C12

N.A.

6/3-8

ANGPTL3

sgRNA

MG3-

1480
MG3-6/3-8 mAngptl3 BE B2

N.A.

6/3-8

ANGPTL3

sgRNA

MG3-

1481
MG3-6/3-8 mAngptl3 BE C1

N.A.

6/3-8

ANGPTL3

sgRNA

MG3-

1482
MG3-6/3-8 mAngptl3 BE F3

N.A.

6/3-8

ANGPTL3

sgRNA

MG3-

1483
MG3-6/3-8 mAngptl3 BE G1

N.A.

6/3-8

ANGPTL3

sgRNA

MG3-

1484
MG3-6/3-8 mAngptl3 BE C12

N.A.

6/3-8

ANGPTL3

target

sequence

MG3-

1485
MG3-6/3-8 mAngptl3 BE B2

N.A.

6/3-8

ANGPTL3

target

sequence

MG3-

1486
MG3-6/3-8 mAngptl3 BE C1

N.A.

6/3-8

ANGPTL3

target

sequence

MG3-

1487
MG3-6/3-8 mAngptl3 BE F3

N.A.

6/3-8

ANGPTL3

target

sequence

MG3-

1488
MG3-6/3-8 mAngptl3 BE G1

N.A.

6/3-8

ANGPTL3

target

sequence

MG3-

1489
MG3-6/3-8 mTrac BE E1

N.A.

6/3-8

TRAC

sgRNA

MG3-

1490
MG3-6/3-8 mTrac BE D10

N.A.

6/3-8

TRAC

sgRNA

MG3-

1491
MG3-6/3-8 mTrac BE E1

N.A.

6/3-8

TRAC

target

sequence

MG3-

1492
MG3-6/3-8 mTrac BE D10

N.A.

6/3-8

TRAC

target

sequence

NGS

1493
mApoa1 BE F12F

N.A.

primers

for

mApoa1

BE F12

NGS

1494
mApoa1 BE D11F

N.A.

primers

for

mApoa1

BE D11

NGS

1495
mApoa1 BE C5F

N.A.

primers

for

mApoa1

BE C5

NGS

1496
mApoa1 BE A4F

N.A.

primers

for

mApoa1

BE A4

NGS

1497
mApoa1 BE F4F

N.A.

primers

for

mApoa1

BE F4

NGS

1498
mApoa1 BE A5F

N.A.

primers

for

mApoa1

BE A5

NGS

1499
mApoa1 BE E12F

N.A.

primers

for

mApoa1

BE E12

NGS

1500
mApoa1 BE A11F

N.A.

primers

for

mApoa1

BE A11

NGS

1501
mApoa1 BE B4F

N.A.

primers

for

mApoa1

BE B4

NGS

1502
mApoa1 BE G4F

N.A.

primers

for

mApoa1

BE G4

NGS

1503
mApoa1 BE B2F

N.A.

primers

for

mApoa1

BE B2

NGS

1504
mApoa1 BE D7F

N.A.

primers

for

mApoa1

BE D7

NGS

1505
mApoa1 BE B5F

N.A.

primers

for

mApoa1

BE B5

NGS

1506
mApoa1 BE G6F

N.A.

primers

for

mApoa1

BE G6

NGS

1507
mApoa1 BE A8F

N.A.

primers

for

mApoa1

BE A8

NGS

1508
mApoa1 BE F2F

N.A.

primers

for

mApoa1

BE F2

NGS

1509
mApoa1 BE E1F

N.A.

primers

for

mApoa1

BE E1

NGS

1510
mApoa1 BE B8F

N.A.

primers

for

mApoa1

BE B8

NGS

1511
mApoa1 BE H8F

N.A.

primers

for

mApoa1

BE H8

NGS

1512
mApoa1 BE H6F

N.A.

primers

for

mApoa1

BE H6

NGS

1513
mApoa1 BE F5F

N.A.

primers

for

mApoa1

BE F5

NGS

1514
mApoa1 BE H3F

N.A.

primers

for

mApoa1

BE H3

NGS

1515
mApoa1 BE H4F

N.A.

primers

for

mApoa1

BE H4

NGS

1516
mApoa1 BE E8F

N.A.

primers

for

mApoa1

BE E8

NGS

1517
mAngptl3 BE C12F

N.A.

primers

for

mAngptl3

BE C12

NGS

1518
mAngptl3 BE B2F

N.A.

primers

for

mAngptl3

BE B2

NGS

1519
mAngptl3 BE C1F

N.A.

primers

for

mAngptl3

BE C1

NGS

1520
mAngptl3 BE F3F

N.A.

primers

for

mAngptl3

BE F3

NGS

1521
mAngptl3 BE G1F

N.A.

primers

for

mAngptl3

BE G1

NGS

1522
mTrac BE E1F

N.A.

primers

for

mTrac

BE E1

NGS

1523
mTrac BE D10F

N.A.

primers

for

mTrac

BE D10

NGS

1524
mApoa1 BE F12R

N.A.

primers

for

mApoa1

BE F12

NGS

1525
mApoa1 BE D11R

N.A.

primers

for

mApoa1

BE D11

NGS

1526
mApoa1 BE C5R

N.A.

primers

for

mApoa1

BE C5

NGS

1527
mApoa1 BE A4R

N.A.

primers

for

mApoa1

BE A4

NGS

1528
mApoa1 BE F4R

N.A.

primers

for

mApoa1

BE F4

NGS

1529
mApoa1 BE A5R

N.A.

primers

for

mApoa1

BE A5

NGS

1530
mApoa1 BE E12R

N.A.

primers

for

mApoa1

BE E12

NGS

1531
mApoa1 BE A11R

N.A.

primers

for

mApoa1

BE A11

NGS

1532
mApoa1 BE B4R

N.A.

primers

for

mApoa1

BE B4

NGS

1533
mApoa1 BE G4R

N.A.

primers

for

mApoa1

BE G4

NGS

1534
mApoa1 BE B2R

N.A.

primers

for

mApoa1

BE B2

NGS

1535
mApoa1 BE D7R

N.A.

primers

for

mApoa1

BE D7

NGS

1536
mApoa1 BE B5R

N.A.

primers

for

mApoa1

BE B5

NGS

1537
mApoa1 BE G6R

N.A.

primers

for

mApoa1

BE G6

NGS

1538
mApoa1 BE A8R

N.A.

primers

for

mApoa1

BE A8

NGS

1539
mApoa1 BE F2R

N.A.

primers

for

mApoa1

BE F2

NGS

1540
mApoa1 BE E1R

N.A.

primers

for

mApoa1

BE E1

NGS

1541
mApoa1 BE B8R

N.A.

primers

for

mApoa1

BE B8

NGS

1542
mApoa1 BE H8R

N.A.

primers

for

mApoa1

BE H8

NGS

1543
mApoa1 BE H6R

N.A.

primers

for

mApoa1

BE H6

NGS

1544
mApoa1 BE F5R

N.A.

primers

for

mApoa1

BE F5

NGS

1545
mApoa1 BE H3R

N.A.

primers

for

mApoa1

BE H3

NGS

1546
mApoa1 BE H4R

N.A.

primers

for

mApoa1

BE H4

NGS

1547
mApoa1 BE E8R

N.A.

primers

for

mApoa1

BE E8

NGS

1548
mAngptl3 BE C12R

N.A.

primers

for

mAngptl3

BE C12

NGS

1549
mAngptl3 BE B2R

N.A.

primers

for

mAngptl3

BE B2

NGS

1550
mAngptl3 BE C1R

N.A.

primers

for

mAngptl3

BE C1

NGS

1551
mAngptl3 BE F3R

N.A.

primers

for

mAngptl3

BE F3

NGS

1552
mAngptl3 BE G1R

N.A.

primers

for

mAngptl3

BE G1

NGS

1553
mTrac BE E1R

N.A.

primers

for

mTrac

BE E1

NGS

1554
mTrac BE D10R

N.A.

primers

for

mTrac

BE D10

Plasmid

1555
mRNA production
nucleotide
artificial

sequence

MG131

1556
mutated adenine deaminase
protein
uncultivated
MG131-1v1

adenine

organism

deaminase

variant

MG131

1557
mutated adenine deaminase
protein
uncultivated
MG131-2v2

adenine

organism

deaminase

variant

MG131

1558
mutated adenine deaminase
protein
uncultivated
MG131-5v3

adenine

organism

deaminase

variant

MG131

1559
mutated adenine deaminase
protein
uncultivated
MG131-6v4

adenine

organism

deaminase

variant

MG131

1560
mutated adenine deaminase
protein
uncultivated
MG131-9v5

adenine

organism

deaminase

variant

MG131

1561
mutated adenine deaminase
protein
uncultivated
MG131-7v6

adenine

organism

deaminase

variant

MG131

1562
mutated adenine deaminase
protein
uncultivated
MG131-3v7

adenine

organism

deaminase

variant

MG134

1563
mutated adenine deaminase
protein
uncultivated
MG134-1v1

adenine

organism

deaminase

variant

MG134

1564
mutated adenine deaminase
protein
uncultivated
MG134-2v2

adenine

organism

deaminase

variant

MG134

1565
mutated adenine deaminase
protein
uncultivated
MG134-3v3

adenine

organism

deaminase

variant

MG134

1566
mutated adenine deaminase
protein
uncultivated
MG134-4v4

adenine

organism

deaminase

variant

MG135

1567
mutated adenine deaminase
protein
uncultivated
MG135-1v1

adenine

organism

deaminase

variant

MG135

1568
mutated adenine deaminase
protein
uncultivated
MG135-2v2

adenine

organism

deaminase

variant

MG135

1569
mutated adenine deaminase
protein
uncultivated
MG135-4v3

adenine

organism

deaminase

variant

MG135

1570
mutated adenine deaminase
protein
uncultivated
MG135-5v4

adenine

organism

deaminase

variant

MG135

1571
mutated adenine deaminase
protein
uncultivated
MG135-6v5

adenine

organism

deaminase

variant

MG135

1572
mutated adenine deaminase
protein
uncultivated
MG135-8v6

adenine

organism

deaminase

variant

MG135

1573
mutated adenine deaminase
protein
uncultivated
MG135-7v7

adenine

organism

deaminase

variant

MG135

1574
mutated adenine deaminase
protein
uncultivated
MG135-3v8

adenine

organism

deaminase

variant

MG137

1575
mutated adenine deaminase
protein
uncultivated
MG137-1v1

adenine

organism

deaminase

variant

MG137

1576
mutated adenine deaminase
protein
uncultivated
MG137-2v2

adenine

organism

deaminase

variant

MG137

1577
mutated adenine deaminase
protein
uncultivated
MG137-4v3

adenine

organism

deaminase

variant

MG137

1578
mutated adenine deaminase
protein
uncultivated
MG137-6v4

adenine

organism

deaminase

variant

MG137

1579
mutated adenine deaminase
protein
uncultivated
MG137-17v5

adenine

organism

deaminase

variant

MG137

1580
mutated adenine deaminase
protein
uncultivated
MG137-9v6

adenine

organism

deaminase

variant

MG137

1581
mutated adenine deaminase
protein
uncultivated
MG137-11v7

adenine

organism

deaminase

variant

MG137

1582
mutated adenine deaminase
protein
uncultivated
MG137-12v8

adenine

organism

deaminase

variant

MG137

1583
mutated adenine deaminase
protein
uncultivated
MG137-13v9

adenine

organism

deaminase

variant

MG137

1584
mutated adenine deaminase
protein
uncultivated
MG137-15v10

adenine

organism

deaminase

variant

MG137

1585
mutated adenine deaminase
protein
uncultivated
MG137-5v11

adenine

organism

deaminase

variant

MG137

1586
mutated adenine deaminase
protein
uncultivated
MG137-14v12

adenine

organism

deaminase

variant

MG137

1587
mutated adenine deaminase
protein
uncultivated
MG137-16v13

adenine

organism

deaminase

variant

MG137

1588
mutated adenine deaminase
protein
uncultivated
MG137-8v14

adenine

organism

deaminase

variant

MG137

1589
mutated adenine deaminase
protein
uncultivated
MG137-3v15

adenine

organism

deaminase

variant

MG68

1590
mutated adenine deaminase
protein
uncultivated
MG68-55v1

adenine

organism

deaminase

variant

MG68

1591
mutated adenine deaminase
protein
uncultivated
MG68-27v2

adenine

organism

deaminase

variant

MG68

1592
mutated adenine deaminase
protein
uncultivated
MG68-52v3

adenine

organism

deaminase

variant

MG68

1593
mutated adenine deaminase
protein
uncultivated
MG68-15v4

adenine

organism

deaminase

variant

MG68

1594
mutated adenine deaminase
protein
uncultivated
MG68-58v5

adenine

organism

deaminase

variant

MG68

1595
mutated adenine deaminase
protein
uncultivated
MG68-25v6

adenine

organism

deaminase

variant

MG68

1596
mutated adenine deaminase
protein
uncultivated
MG68-18v7

adenine

organism

deaminase

variant

MG68

1597
mutated adenine deaminase
protein
uncultivated
MG68-45v8

adenine

organism

deaminase

variant

MG68

1598
mutated adenine deaminase
protein
uncultivated
MG68-13v9

adenine

organism

deaminase

variant

MG68

1599
mutated adenine deaminase
protein
uncultivated
MG68-4v10

adenine

organism

deaminase

variant

MG132

1600
mutated adenine deaminase
protein
uncultivated
MG132-1v1

adenine

organism

deaminase

variant

MG132

1601
mutated adenine deaminase
protein
uncultivated
MG132-1v2

adenine

organism

deaminase

variant

MG132

1602
mutated adenine deaminase
protein
uncultivated
MG132-1v3

adenine

organism

deaminase

variant

MG133

1603
mutated adenine deaminase
protein
uncultivated
MG133-1v1

adenine

organism

deaminase

variant

MG133

1604
mutated adenine deaminase
protein
uncultivated
MG133-2v2

adenine

organism

deaminase

variant

MG133

1605
mutated adenine deaminase
protein
uncultivated
MG133-7v3

adenine

organism

deaminase

variant

MG133

1606
mutated adenine deaminase
protein
uncultivated
MG133-4v4

adenine

organism

deaminase

variant

MG133

1607
mutated adenine deaminase
protein
uncultivated
MG133-12v5

adenine

organism

deaminase

variant

MG133

1608
mutated adenine deaminase
protein
uncultivated
MG133-5v6

adenine

organism

deaminase

variant

MG133

1609
mutated adenine deaminase
protein
uncultivated
MG133-9v7

adenine

organism

deaminase

variant

MG133

1610
mutated adenine deaminase
protein
uncultivated
MG133-14v8

adenine

organism

deaminase

variant

MG133

1611
mutated adenine deaminase
protein
uncultivated
MG133-8v9

adenine

organism

deaminase

variant

MG133

1612
mutated adenine deaminase
protein
uncultivated
MG133-10v10

adenine

organism

deaminase

variant

MG133

1613
mutated adenine deaminase
protein
uncultivated
MG133-13v11

adenine

organism

deaminase

variant

MG133

1614
mutated adenine deaminase
protein
uncultivated
MG133-3v12

adenine

organism

deaminase

variant

MG133

1615
mutated adenine deaminase
protein
uncultivated
MG133-6v13

adenine

organism

deaminase

variant

MG133

1616
mutated adenine deaminase
protein
uncultivated
MG133-11v14

adenine

organism

deaminase

variant

MG136

1617
mutated adenine deaminase
protein
uncultivated
MG136-1v1

adenine

organism

deaminase

variant

MG136

1618
mutated adenine deaminase
protein
uncultivated
MG136-6v2

adenine

organism

deaminase

variant

MG136

1619
mutated adenine deaminase
protein
uncultivated
MG136-12v3

adenine

organism

deaminase

variant

MG136

1620
mutated adenine deaminase
protein
uncultivated
MG136-2v4

adenine

organism

deaminase

variant

MG136

1621
mutated adenine deaminase
protein
uncultivated
MG136-3v5

adenine

organism

deaminase

variant

MG136

1622
mutated adenine deaminase
protein
uncultivated
MG136-9v6

adenine

organism

deaminase

variant

MG136

1623
mutated adenine deaminase
protein
uncultivated
MG136-10v7

adenine

organism

deaminase

variant

MG136

1624
mutated adenine deaminase
protein
uncultivated
MG136-11v8

adenine

organism

deaminase

variant

MG129

1625
mutated adenine deaminase
protein
uncultivated
MG129-1v1

adenine

organism

deaminase

variant

MG129

1626
mutated adenine deaminase
protein
uncultivated
MG129-2v2

adenine

organism

deaminase

variant

MG129

1627
mutated adenine deaminase
protein
uncultivated
MG129-11v3

adenine

organism

deaminase

variant

MG129

1628
mutated adenine deaminase
protein
uncultivated
MG129-3v4

adenine

organism

deaminase

variant

MG129

1629
mutated adenine deaminase
protein
uncultivated
MG129-7v5

adenine

organism

deaminase

variant

MG129

1630
mutated adenine deaminase
protein
uncultivated
MG129-4v6

adenine

organism

deaminase

variant

MG129

1631
mutated adenine deaminase
protein
uncultivated
MG129-9v7

adenine

organism

deaminase

variant

MG129

1632
mutated adenine deaminase
protein
uncultivated
MG129-10v8

adenine

organism

deaminase

variant

MG129

1633
mutated adenine deaminase
protein
uncultivated
MG129-12v9

adenine

organism

deaminase

variant

MG130

1634
mutated adenine deaminase
protein
uncultivated
MG130-3v1

adenine

organism

deaminase

variant

MG130

1635
mutated adenine deaminase
protein
uncultivated
MG130-1v2

adenine

organism

deaminase

variant

MG130

1636
mutated adenine deaminase
protein
uncultivated
MG130-5v3

adenine

organism

deaminase

variant

MG130

1637
mutated adenine deaminase
protein
uncultivated
MG130-2v4

adenine

organism

deaminase

variant

MG130

1638
mutated adenine deaminase
protein
uncultivated
MG130-4v5

adenine

organism

deaminase

variant

MG34-1

1639
MG68-4_nMG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1640
MG68-4 (D109Q)_nMG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1641
MG68-4 (D109N/H129N)_nMG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1642
MG68-4 (D109Q/H129N)_nMG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1643
MG68-4 (D7G/E10G/D109N)_nMG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

MG34-1

1644
MG68-4 (D7G/E10G/D109Q)_nMG34-1 (D10A)
Protein
artificial

adenine

sequence

base

editor

RF253

1645
ssDNA substrate for testing ADA in vitro
DNA
artificial

sequence

RF278

1646
ssDNA substrate for testing ADA in vitro
DNA
artificial

sequence

MG effectors

1647
MG3-6/3-8 effector
protein
unknown
MSTDMKNYRIGVDVGDRSVGLAAIEFDDDGLPIQKLALVTFRHDGGLDPTKNKTPMSRKETRGIARRTMRMNRERKRRLRNLDNVLENLGYSVPEGPEPETYEAWTSRALLASIKLASADELNEHLVRAVRHMARHRGWANPWWSLDQLEKASQEPSETFEIILARARELFGEKVPANPTLGMLGALAANNEVLLRPRDEKKRKTGYVRGTPLMFAQVRQGDQLAELRRICEVQGIEDQYEALRLGVFDHKHPYVPKERVGKDPLNPSTNRTIRASLEFQEFRILDSVANLRVRIGSRAKRELTEAEYDAAVEFLMDYADKEQPSWADVAEKIGVPGNRLVAPVLEDVQQKTAPYDRSSAAFEKAMGKKTEARQWWESTDDDQLRSLLIAFLVDATNDTEEAAAEAGLSELYKSWPAEEREALSNIDFEKGRVAYSQETLSKLSEYMHEYRVGLHEARKAVFGVDDTWRPPLDKLEEPTGQPAVDRVLTILRRFVLDCERQWGRPRAITVEHTRTGLMGPTQRQKILNEQKKNRADNERIRDELRESGVDNPSRAEVRRHLIVQEQECQCLYCGTMITTTTSELDHIVPRAGGGSSRRENLAAVCRACNAKKKRELFYAWAGPVKSQETIERVRQLKAFKDSKKAKMFKNQIRRLNQTEADEPIDERSLASTSYAAVAVRERLEQHFNEGLALDDKSRVVLDVYAGAVTRESRRAGGIDERILLRGERDKNRFDVRHHAVDAAVMTLLNRSVALTLEQRSQLRRAFYEQGLDKLDRDQLKPEEDWRNFIGLSLASQEKFLEWKKVTTVLGDLLAEAIEDDSIAVVSPLRLRPQNGRVHKDTIAAVKKQTLGSAWSADAVKRIVDPEIYLAMKDALGKSKVLPEDSARTLELSDGRYLEADDEVLFFPKNAASILTPRGVAEIGGSIHHARLYSWLTKKGELKIGMLRVYGAEFPWLMRESGSHDVLRMPIHPGSQSFRDMQDTTRKAVESSEAVEFAWITQNDELEFEPEDYIAHGGKDELRQFLEFMPECRWRVDGFKKNYQIRIRPAMLSREQLPSDIQRRLESKTLTENESLLLKALDTGLVVAIGGLLPLGTLKVIRRNNLGFPRWRGNGNLPTSFEVRSSALRALGVEG

MG

1648
MG3-6/3-8 effector sgRNA
RNA
synthetic
NNNNNNNNNN

effectors

NNNNNNNNNN

sgRNA

NNGTTGAGAA

TCGAAAGATTC

TTAATAAGGCA

TCCTTCCGATG

CTGACTTCTCA

CCGTCCGTTTT

CCAATAGGAG

CGGGCGGTAT

GTTTT

EXAMPLES
Example 1—Plasmid Construction for Base Editors

To create base editing enzymes that utilize CRISPR functionality to target their base editing, effector enzymes were fused in various configurations to the exemplary deaminases described herein. This process involved a first stage of constructing vectors suitable for generating the fusion enzymes. Two entry plasmid vectors, MGA and MGC, were first constructed.

To construct the MGA (Metagenomi adenine base editor) entry plasmid containing T7 promoter-His tag-TadA*(ABE8.17m)-SV40 NLS, three DNA fragments were amplified from pAL6. To construct the MGC (Metagenomi cytosine base editor) entry plasmid containing T7 promoter-His tag-APOBEC1(BE3)-UGI-SV40 NLS, APOBEC1 and UGI-SV40 NLS were amplified from pAL9 and two pieces of vector backbones were amplified from pAL6 (see FIG. 3).

To introduce mutations into the effectors, source plasmids containing MG1-4, MG1-6, MG3-6, MG3-7, MG3-8, MG4-5, MG14-1, MG15-1, or MG18-1 effector gene sequences were amplified by Q5 DNA polymerase with forward primers incorporating appropriate mutations and reverse primers. The linear DNA fragments were then phosphorylated and ligated. The DNA templates were digested with DpnI using KLD Enzyme Mix (New England Biolabs) per the manufacturer's instructions.

To generate the pMGA and pMGC expression plasmids, genes were amplified from plasmids carrying mutated effectors and cloned into MGA and MGC entry plasmids via XhoI and SacII sites, respectively. To clone sgRNA expression cassettes comprising T7 promoter-sgRNA-bidirectional terminator into BE expression plasmids, one set of primers (P366 as the forward primer) was used to amplify a T7 promoter-spacer sequence while another set of primers (P367 as the reverse primer) was used to amplify spacer sequence-sgRNA scaffold-bidirectional terminator, in which pTCM plasmids were used as templates (see FIG. 2). The two fragments were assembled into pMGA and pMGC via XbaI sites, resulting in pMGA-sgRNA and pMGC-sgRNA, respectively.

TABLE 3

Summary of constructs made for ABE screening systems described herein

# | Application | Candidate
1 | ABE | MGA1-4-sgRNA1
2 | ABE | MGA1-4-sgRNA2
3 | ABE | MGA1-4-sgRNA3
4 | ABE | MGA1-6-sgRNA1
5 | ABE | MGA1-6-sgRNA2
6 | ABE | MGA1-6-sgRNA3
7 | ABE | MGA3-6-sgRNA1
8 | ABE | MGA3-6-sgRNA2
9 | ABE | MGA3-6-sgRNA3
10 | ABE | MGA3-7-sgRNA1
11 | ABE | MGA3-7-sgRNA2
12 | ABE | MGA3-7-sgRNA3
13 | ABE | MGA3-8-sgRNA1
14 | ABE | MGA3-8-sgRNA2
15 | ABE | MGA3-8-sgRNA3
16 | ABE | MGA14-1-sgRNA1
17 | ABE | MGA14-1-sgRNA2
18 | ABE | MGA14-1-sgRNA3
19 | ABE | MGA15-1-sgRNA1
20 | ABE | MGA15-1-sgRNA2
21 | ABE | MGA15-1-sgRNA3
22 | ABE | MGA18-1-sgRNA1
23 | ABE | MGA18-1-sgRNA2
24 | ABE | MGA18-1-sgRNA3
25 | ABE | ABE8.17m-sgRNA1
26 | ABE | ABE8.17m-sgRNA2
27 | ABE | ABE8.17m-sgRNA3
28 | CBE | MGC1-4-sgRNA1
29 | CBE | MGC1-4-sgRNA2
30 | CBE | MGC1-4-sgRNA3
31 | CBE | MGC1-6-sgRNA1
32 | CBE | MGC1-6-sgRNA2
33 | CBE | MGC1-6-sgRNA3
34 | CBE | MGC3-6-sgRNA1
35 | CBE | MGC3-6-sgRNA2
36 | CBE | MGC3-6-sgRNA3
37 | CBE | MGC3-7-sgRNA1
38 | CBE | MGC3-7-sgRNA2
39 | CBE | MGC3-7-sgRNA3
40 | CBE | MGC3-8-sgRNA1
41 | CBE | MGC3-8-sgRNA2
42 | CBE | MGC3-8-sgRNA3
43 | CBE | MGC4-5-sgRNA1
44 | CBE | MGC4-5-sgRNA2
45 | CBE | MGC4-5-sgRNA3
46 | CBE | MGC14-1-sgRNA1
47 | CBE | MGC14-1-sgRNA2
48 | CBE | MGC14-1-sgRNA3
49 | CBE | MGC15-1-sgRNA1
50 | CBE | MGC15-1-sgRNA2
51 | CBE | MGC15-1-sgRNA3
52 | CBE | MGC18-1-sgRNA1
53 | CBE | MGC18-1-sgRNA2
54 | CBE | MGC18-1-sgRNA3
55 | CBE | BE3-sgRNA1
56 | CBE | BE3-sgRNA2
57 | CBE | BE3-sgRNA3

All amplified DNA fragments were purified by QIAquick Gel Extraction Kit (Qiagen), assembled via NEBuilder HiFi DNA Assembly (New England Biolabs), and the resulting assemblies were propagated via Endura Electrocompetent cells (Lucigen) per the manufacturer's instructions (see FIGS. 4 & 5). The DNA sequences of all cloned genes were confirmed at ELIM BIOPHARM.

TABLE 4

Conserved catalytic residues parsed out for selected systems described herein

Nickase Candidate | Length | Associated Full-length Protein Sequence
nMG1-4 (D9A) | 1025 | SEQ ID NO: 70
nMG1-6 (D13A) | 1059 | SEQ ID NO: 71
nMG3-6 (D13A) | 1134 | SEQ ID NO: 72
nMG3-7 (D12A) | 1131 | SEQ ID NO: 73
nMG3-8 (D13A) | 1132 | SEQ ID NO: 74
nMG4-5 (D17A) | 1055 | SEQ ID NO: 75
nMG14-1 (D23A) | 1003 | SEQ ID NO: 76
nMG15-1 (D8A) | 1082 | SEQ ID NO: 77
nMG18-1 (D12A) | 1348 | SEQ ID NO: 78

Example 2—Protein Expression and Purification

The T7 promoter driven mutated effector genes in the pMGA and pMGC plasmids were expressed in E. coli BL21 (DE3) cells in Magic Media per manufacturer's instructions (Thermo) by transformation with each of the respective plasmids described in Example 1 above. After a 40 hour incubation at 16° C. the transformed cells were harvested, suspended in lysis buffer (HisTrap equilibration buffer: 20 mM Tris (Sigma T2319-100_ML), 300 mM sodium chloride (VWR VWRVE529-500_ML), 5% glycerol, 10 mM MgCl₂, with 10 mM imidazole (Sigma 68268-100 ML-F); pH 7.5) and EDTA-free protease inhibitor (Pierce), and frozen in the −80° C. freezer. The cells were then thawed on ice, sonicated, clarified, and filtered before affinity purification. The protein was applied to Cytiva 5 ml HisTrap FF column on the Akta Avant FPLC per the manufacturer's specifications and the protein was eluted in an isocratic elution of 20 mM Tris (Sigma T2319-100_ML), 300 mM sodium chloride (VWR VWRVE529-500_ML), 5% glycerol, 10 mM MgCl₂, with 250 mM imidazole (Sigma 68268-100_ML-F); pH 7.5. Eluted fractions containing the His-tagged effector proteins were concentrated and buffer exchanged into 50 mM Tris-HCl, 300 mM NaCl, 1 mM TCEP, 5% glycerol; pH 7.5. The protein concentration was determined by bicinchoninic acid assay (Thermo) and adjusted after determining the relative purity by SDS PAGE densitometry in Image Lab (Bio-Rad) (see FIG. 7).

Example 3—In Vitro Nickase Assay

6-carboxyfluorescein (6-FAM) labeled primers P141 and P146 (SEQ ID NOs: 179 and 180) synthesized by IDT were used to amplify linear fragments of LacZ containing targeting sequences of effectors using Q5 DNA polymerase. DNA fragments containing the T7 promoter followed by sgRNAs containing 20-bp or 22-bp spacer sequences were transcribed in vitro using HiScribe T7 High Yield RNA Synthesis Kit (New England Biolabs) per manufacturer's instructions. Synthetic sgRNAs with the sequences corresponding to the named sgRNAs in the sequence listing were purified by Monarch RNA Cleanup Kit (New England Biolabs) according to the user's manual and concentrations were measured by Nanodrop.

To determine DNA nickase activity, each of the purified mutated effectors was first supplemented with its cognate sgRNA. Reactions were initiated by adding the linear DNA substrate in a 15 μL reaction mixture containing 10 mM Tris pH 7.5, 10 mM MgCl₂, and 100 mM NaCl, 150 nM enzyme, 150 nM RNA, and 15 nM DNA. The reaction was incubated at 37° C. for 2 h. Digested DNA was purified using AMPure XP SPRI paramagnetic beads (Beckman Coulter) and eluted with 6 μL TE buffer (10 mM Tris, 1 mM EDTA; pH 8.0). The nicked DNA was resolved on a 10% TBE-Urea denaturing gel (Bio-Rad) and imaged by ChemiDoc (Bio-Rad) (see FIG. 7, which shows that the depicted enzymes display nickase activity by production of bands 600 and 200 bases versus 400 and 200 bases in the case of the wild-type enzyme). The results indicated that all the tested nickase mutants in FIG. 7 displayed their expected nickase activity instead of wild-type cleavage activity, with the exception of MG4-5 (D17A), which was inconclusive.

Example 4—Base Editor Introduction into E. coli

Plasmids were transformed into Lucigen's electrocompetent BL21(DE3) cells according to the manufacturer's instructions. After electroporation, cells were recovered with expression recovery media at 37° C. for 1 h and spread on LB plates containing 100 μg/mL ampicillin and 0.1 mM IPTG. After overnight growth at 37° C., colonies were picked and the lacZ gene was amplified by Q5 DNA polymerase (New England Biolabs) with primers P137 and P360. The resulting PCR products were purified and sequenced by Sanger sequencing at ELIM BIOPHARM. Base edits were determined by examining whether a C to T conversion or an A to G conversion was present in the targeted protospacer regions for cytosine base editors or adenine base editors, respectively.

To evaluate editing efficiency in E. coli, plasmids were transformed into electrocompetent BL21(DE3) (Lucigen) and the electroporated cells were recovered with expression recovery media at 37° C. for 1 h. 10 μL of recovered cells were then inoculated into 990 μL SOB containing 100 μg/mL ampicillin and 0.1 mM IPTG in a 96-well deep well plate, and grown at 37° C. for 20 h. 1 μL of cells induced for base editor expression was used for amplification of the lacZ gene in a 20 μL PCR reaction (Q5 DNA polymerase) with primers P137 and P360. The resulting PCR products were purified and sequenced by Sanger sequencing at ELIM BIOPHARM. Editing efficiency was quantified with EditR as described in Example 12.

TABLE 5

The MG base editors described herein with associated PAM and deaminases

Candidate | Type | PAM | Deaminase | Linker (Deaminase-Nickase) | Nickase | UGI | Linker (Nickase-UGI)
MGA1-4 (Sequence Number: A360) | II | nRRR | TadA* (ABE8.17m), SEQ ID NO: 595 | SGGSSGGSSGSETPGTSESATPESSGGSSGGS | nMG1-4 (D9A), SEQ ID NO: 70 | |
MGA3-7 (Sequence Number: A363) | II | nnRnYAY | TadA* (ABE8.17m), SEQ ID NO: 595 | SGGSSGGSSGSETPGTSESATPESSGGSSGGS | nMG3-7 (D12A), SEQ ID NO: 73 | |
MGA18-1 (Sequence Number: A368) | II | nRWART | TadA* (ABE8.17m), SEQ ID NO: 595 | SGGSSGGSSGSETPGTSESATPESSGGSSGGS | nMG18-1 (D12A), SEQ ID NO: 78 | |
MGC1-6 (Sequence Number: A361) | II | nnRRAY | APOBEC1 (BE3), SEQ ID NO: 58 | SGSETPGTSESATPESA | nMG1-6 (D13A), SEQ ID NO: 71 | UGI (BE3), SEQ ID NO: 67 | GSGGS
MGC3-7 (Sequence Number: A363) | II | nnRnYAY | APOBEC1 (BE3), SEQ ID NO: 58 | SGSETPGTSESATPESA | nMG3-7 (D12A), SEQ ID NO: 73 | UGI (BE3), SEQ ID NO: 67 | GSGGS
MGC4-5 (Sequence Number: A365) | II | nRCCV | APOBEC1 (BE3), SEQ ID NO: 58 | SGSETPGTSESATPESA | nMG4-5 (D17A), SEQ ID NO: 74 | UGI (BE3), SEQ ID NO: 67 | GSGGS
MGC14-1 (Sequence Number: A366) | II | nRnnGRKA | APOBEC1 (BE3), SEQ ID NO: 58 | SGSETPGTSESATPESA | nMG14-1 (D23A), SEQ ID NO: 76 | UGI (BE3), SEQ ID NO: 67 | GSGGS
MGC15-1 (Sequence Number: A367) | II | nnnnC | APOBEC1 (BE3), SEQ ID NO: 58 | SGSETPGTSESATPESA | nMG15-1 (D8A), SEQ ID NO: 77 | UGI (BE3), SEQ ID NO: 67 | GSGGS
MGC18-1 (Sequence Number: A368) | II | nRWART | APOBEC1 (BE3), SEQ ID NO: 58 | SGSETPGTSESATPESA | nMG18-1 (D12A), SEQ ID NO: 78 | UGI (BE3) | GSGGS
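
The PAM entries in TABLE 5 use degenerate nucleotide codes (n, R, Y, W, K, V). As an illustration only, the following Python sketch shows one way such a PAM string could be expanded and used to scan a DNA sequence. The function names are hypothetical, only one strand is scanned, and the sketch makes no assumption about where the protospacer lies relative to the PAM.

```python
import re

# IUPAC degenerate codes needed for the PAMs listed in TABLE 5.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]",
    "K": "[GT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam):
    """Compile a PAM such as 'nRRR' into a regex that finds overlapping matches."""
    body = "".join(IUPAC[code] for code in pam.upper())
    # A lookahead lets finditer report matches that overlap each other.
    return re.compile("(?=(" + body + "))")


def find_pam_sites(sequence, pam):
    """Return 0-based start positions of every PAM match on the given strand."""
    return [m.start() for m in pam_to_regex(pam).finditer(sequence.upper())]
```

For example, find_pam_sites("TTAGGATT", "nRRR") reports the two overlapping windows TAGG and AGGA.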

Example 5—Protein Nucleofection and Amplicon Seq in Mammalian Cells (Prophetic)

Nucleofection is conducted in mammalian cells (e.g. K-562, Neuro-2A or RAW264.7) according to the manufacturer's recommendations using a Lonza 4D nucleofector and the Lonza SF Cell Line 4D-Nucleofector X Kit S (cat. no. V4XC-2032). After formulating the SF nucleofection buffer, 200,000 cells are resuspended in 5 μl of buffer per nucleofection. In the remaining 15 μl of buffer per nucleofection, 20 pmol of chemically modified sgRNA from Synthego is combined with 18 pmol of base editor enzymes (e.g. ABE8e) and incubated for 5 min at room temperature to complex. Cells are added to the 20 μl nucleofection cuvettes, followed by protein solution, and the mixture is triturated to mix. Cells are nucleofected with program CM-130, immediately after which 80 μl of warmed media is added to each well for recovery. After 5 min, 25 μl from each sample is added to 250 μl of fresh media in a 48-well poly-d-lysine plate (Corning). Cells are then treated in the same way as lipofected cells above for genomic DNA extraction after three more days of culture.

Following Illumina barcoding, PCR products are pooled and purified by electrophoresis with a 2% agarose gel using a Monarch DNA Gel Extraction Kit (New England Biolabs), eluting with 30 μl H2O. DNA concentration is quantified with a Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific) and sequenced on an Illumina MiSeq instrument (paired-end read, R1: 250-280 cycles, R2: 0 cycles) according to the manufacturer's protocols.

Sequencing reads are demultiplexed using the MiSeq Reporter (Illumina) and FASTQ files are analyzed using CRISPResso2. Dual editing in individual alleles is analyzed by a Python script. Base editing values are representative of n=3 independent biological replicates collected by different researchers, with the mean±s.d. shown. Base editing values are reported as a percentage of the number of reads with adenine mutagenesis over the total aligned reads.
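
By way of illustration, the reporting convention described above (edited reads over total aligned reads, summarized as mean±s.d. across replicates) can be sketched in Python as follows. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the form of the s.d. is not specified herein.

```python
from statistics import mean, stdev


def editing_percentage(edited_reads, total_aligned_reads):
    """Percentage of aligned reads that carry the edit of interest."""
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")
    return 100.0 * edited_reads / total_aligned_reads


def summarize_replicates(counts):
    """Summarize replicates given as (edited_reads, total_aligned_reads) pairs.

    Returns (mean, sample standard deviation) of the per-replicate
    percentages; at least two replicates are needed for the s.d.
    """
    values = [editing_percentage(edited, total) for edited, total in counts]
    return mean(values), stdev(values)
```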

Example 6—Plasmid Nucleofection and Whole Genome Seq in Mammalian Cells (Prophetic)

All plasmids are assembled by the uracil-specific excision reagent (USER) cloning method. Guide RNA plasmids for SpCas9, SaCas9 and all engineered variants are assembled. Plasmids for mammalian cell transfections are prepared using the ZymoPURE Plasmid Midiprep kit (Zymo Research Corporation). HEK293T cells (ATCC CRL-3216) are cultured in Dulbecco's modified Eagle's medium (Corning) supplemented with 10% fetal bovine serum (ThermoFisher Scientific) and maintained at 37° C. with 5% CO2.

HEK293T cells are seeded on 48-well poly-d-lysine plates (Corning) in the same culture medium. Cells are transfected 12-16 h after plating with 1.5 μl Lipofectamine 2000 (ThermoFisher Scientific) using 750 ng base editor plasmid, 250 ng guide RNA plasmid and 10 ng green fluorescent protein as a transfection control. Cells are cultured for 3 d with media exchanged following the first day, then washed with 1× PBS (ThermoFisher Scientific), followed by genomic DNA extraction by addition of 100 μl freshly prepared lysis buffer (10 mM Tris-HCl, pH 7.5, 0.05% SDS, 25 μg/ml proteinase K (ThermoFisher Scientific)) directly into each transfected well. The mixture is incubated at 37° C. for 1 h then heat inactivated at 80° C. for 30 min. Genomic DNA lysate is subsequently used immediately for high-throughput sequencing (HTS).

HTS of genomic DNA from HEK293T cells is performed. Following Illumina barcoding, PCR products are pooled and purified by electrophoresis with a 2% agarose gel using a Monarch DNA Gel Extraction Kit (NEB), eluting with 30 μl H2O. DNA concentration is quantified with Qubit dsDNA High Sensitivity Assay Kit (ThermoFisher Scientific) and sequenced on an Illumina MiSeq instrument (paired end read, R1: 250-280 cycles, R2: 0 cycles) according to the manufacturer's protocols.

Example 7—Determining Editing Window (Prophetic)

To examine the editing window regions, the cytosine showing the highest C-T conversion frequency in a specified sgRNA is normalized to 1, and other cytosines at positions spanning from 30 nt upstream to 10 nt downstream of the PAM sequence (total 43 bp) of the same sgRNA are normalized subsequently. Then normalized C-T conversion frequencies are classified and compared according to their positions for all tested sgRNAs of a specified base editor. A comprehensive editing window (CEW) is defined to span positions with an average C-T conversion efficiency exceeding 0.6 after normalization.
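
The normalization and the CEW definition above can be illustrated with the following minimal Python sketch. The data layout (one dictionary per sgRNA mapping a position to a raw C-T conversion frequency) and the function names are hypothetical, and averaging only over the sgRNAs that have a cytosine at a given position is an assumption.

```python
def normalize_conversion(freq_by_position):
    """Scale the C-T frequencies of one sgRNA so that the highest value is 1."""
    peak = max(freq_by_position.values())
    if peak <= 0:
        raise ValueError("no C-T conversion observed for this sgRNA")
    return {pos: freq / peak for pos, freq in freq_by_position.items()}


def comprehensive_editing_window(per_sgrna_freqs, threshold=0.6):
    """Positions whose mean normalized C-T frequency exceeds the threshold.

    per_sgrna_freqs: one {position: raw_frequency} dictionary per sgRNA,
    with positions expressed in a shared coordinate system.
    """
    by_position = {}
    for freqs in per_sgrna_freqs:
        for pos, value in normalize_conversion(freqs).items():
            by_position.setdefault(pos, []).append(value)
    return sorted(
        pos for pos, values in by_position.items()
        if sum(values) / len(values) > threshold
    )
```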

To examine the substrate preference for each cytidine deaminase, C sites are initially classified according to their positions in sgRNA targeting regions and those positions containing at least one C site with ≥0.8 normalized C-T conversion frequency are included in subsequent analysis. Selected C sites are then compared depending on base types upstream or downstream of the edited cytosine (NC or CN). For cytidine deaminases showing efficient C-T conversion at both N-terminus and C-terminus of the endonuclease, the substrate preference is evaluated by integrating respective NT- and CT-CBEs together. For statistical analysis, one-way ANOVA is used and p<0.05 is considered significant.
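
As a sketch of the site selection and grouping described above, the following Python example keeps the positions that contain at least one C site at or above the 0.8 cutoff and then groups the normalized frequencies of the C sites at those positions by the neighboring base. The record layout (dictionaries with position, upstream, downstream, and norm_freq keys) and the function names are hypothetical. The resulting groups could then be passed to a one-way ANOVA routine, for example scipy.stats.f_oneway, which is not invoked here.

```python
from collections import defaultdict


def positions_passing(sites, min_freq=0.8):
    """Positions with at least one C site at or above min_freq."""
    return {s["position"] for s in sites if s["norm_freq"] >= min_freq}


def group_by_neighbor(sites, neighbor="upstream", min_freq=0.8):
    """Group normalized C-T frequencies by the base next to the edited C.

    neighbor is 'upstream' for the NC context or 'downstream' for CN.
    Only C sites located at positions that pass the cutoff are kept.
    """
    keep = positions_passing(sites, min_freq)
    groups = defaultdict(list)
    for site in sites:
        if site["position"] in keep:
            groups[site[neighbor]].append(site["norm_freq"])
    return dict(groups)
```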

Example 8a—Testing Off-Target Analysis with Whole Genome Sequencing and Transcriptomics in Mammalian Cells (Prophetic)

HEK293T cells are plated on 48-well poly-d-lysine-coated plates 16 to 20 h before lipofection at a density of 3×10⁴ cells per well in DMEM+GlutaMAX medium (Thermo Fisher Scientific) without antibiotics. 750 ng nickase or base editor expression plasmid DNA is combined with 250 ng of sgRNA expression plasmid DNA in 15 μl Opti-MEM+GlutaMAX. This is combined with 10 μl of lipid mixture, comprising 1.5 μl Lipofectamine 2000 and 8.5 μl Opti-MEM+GlutaMAX per well. Cells are harvested 3 d after transfection and either DNA or RNA is harvested. For DNA analysis, cells are washed once in PBS, and then lysed in 100 μl QuickExtract Buffer (Lucigen) according to the manufacturer's instructions. For RNA harvest, the MagMAX mirVana Total RNA Isolation Kit (Thermo Fisher Scientific) is used with the KingFisher Flex.

Genomic DNA from mammalian cells is fragmented and adapter-ligated using the Nextera DNA Flex Library Prep Kit (Illumina) using 96-well plate Nextera indexing primers (Illumina), according to the manufacturer's instructions. Library size and concentration are confirmed by Fragment Analyzer (Agilent) and DNA is sent to Novogene for WGS using an Illumina HiSeq system.

All targeted NGS data is analyzed by performing four general operations: (1) alignment; (2) duplicate marking; (3) variant calling; and (4) background filtration of variants to remove artifacts and germline mutations. The mutation reference and alternate alleles are reported relative to the plus strand of the reference genome.

For whole-transcriptome sequencing, mRNA selection is performed using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England BioLabs). RNA library preparation is performed using NEBNext Ultra II RNA Library Prep Kit for Illumina (New England BioLabs). Based on the RNA input amount, a cycle number of 12 is used for the PCR enrichment of adapter-ligated DNA. NEBNext Sample Purification Beads (New England BioLabs) are used throughout for all of the size selection performed by this method. NEBNext Multiplex Oligos for Illumina (New England BioLabs) are used for the multiplex indexes in accordance with the PCR recipe outlined in the protocol. Before sequencing, samples are quality checked using the High Sensitivity D1000 ScreenTape on the 4200 TapeStation System (Agilent). The libraries are pooled and sequenced using a NovaSeq (Novogene). Targeted RNA sequencing is then performed. Complementary DNA is generated by PCR with reverse transcription (RT-PCR) from the isolated RNA using the SuperScript IV One-Step RT-PCR System with ezDNase (Thermo Fisher Scientific) according to the manufacturer's instructions.

The following program is used: 58° C. for 12 min; 98° C. for 2 min; followed by PCR cycles that vary by amplicon: for CTNNB1 and IP90, 32 cycles of (98° C. for 10 s; 60° C. for 10 s; 72° C. for 30 s). Following the combined RT-PCR, amplicons are barcoded and sequenced using an Illumina MiSeq sequencer as described above. The first 125 nucleotides in each amplicon, beginning at the first base after the end of the forward primer in each amplicon, are aligned to a reference sequence and used for analysis of maximum A-to-I frequencies in each amplicon. Off-target DNA sequencing is performed using primers, using a two-stage PCR and barcoding method to prepare samples for sequencing using Illumina MiSeq sequencers as above.
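
The maximum A-to-I frequency analysis described above can be sketched in Python as follows, relying on the fact that inosine is read as guanosine in cDNA, so an A-to-I edit appears as an A-to-G change. The function name is hypothetical, and the reads are assumed to be gap-free and to start at the first base after the forward primer, in register with the reference.

```python
def max_a_to_i_frequency(reference, reads, window=125):
    """Highest per-position A-to-G fraction over the first `window` bases.

    reference and reads are plain strings that start at the same position.
    Calls of 'N' are ignored when computing the fraction at a position.
    """
    ref = reference[:window].upper()
    best = 0.0
    for i, base in enumerate(ref):
        if base != "A":
            continue  # only reference adenines can show A-to-I editing
        calls = [r[i].upper() for r in reads if len(r) > i and r[i].upper() != "N"]
        if calls:
            best = max(best, calls.count("G") / len(calls))
    return best
```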

Example 8b—Analysis of Off-Target Edits by Whole Genome Sequencing and Transcriptomics (Prophetic)

Transfected cells prepared as in Example 8a are harvested after 3 days and the genomic DNA is isolated using the Agencourt DNAdvance Genomic DNA Isolation Kit (Beckman Coulter) according to the manufacturer's instructions. On-target and off-target genomic regions of interest are amplified by PCR with flanking HTS primer pairs. PCR amplification is carried out with Phusion high-fidelity DNA polymerase (ThermoFisher) according to the manufacturer's instructions using 5 ng of genomic DNA as a template. Cycle numbers are determined separately for each primer pair so as to ensure the reaction is stopped in the linear range of amplification (30, 28, 28, 28, 32, and 32 cycles for EMX1, FANCF, HEK293 site 2, HEK293 site 3, HEK293 site 4, and RNF2 primers, respectively). PCR products are purified using RapidTips (Diffinity Genomics). Purified DNA is amplified by PCR with primers containing sequencing adaptors. The products are gel-purified and quantified using the Quant-iT™ PicoGreen dsDNA Assay Kit (ThermoFisher) and KAPA Library Quantification Kit-Illumina (KAPA Biosystems). Samples are sequenced on an Illumina MiSeq as previously described.

Sequencing reads are automatically demultiplexed using MiSeq Reporter (Illumina), and individual FASTQ files are analyzed with a custom Matlab script. Each read is pairwise aligned to the appropriate reference sequence using the Smith-Waterman algorithm. Base calls with a Q-score below 31 are replaced with N's and are thus excluded in calculating nucleotide frequencies. This treatment yields an expected MiSeq base-calling error rate of approximately 1 in 1,000. Aligned sequences in which the read and reference sequence contain no gaps are stored in an alignment table from which base frequencies are tabulated for each locus. Indel frequencies are quantified with a custom Matlab script.
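
The quality masking and base frequency tabulation described above can be illustrated in Python (rather than Matlab) as follows. This is a sketch only: the function names are hypothetical, the quality strings are assumed to use the Phred+33 encoding, and the reads are assumed to be already aligned without gaps. The last helper shows the standard Phred relationship, under which a score of 30 corresponds to an error probability of 1 in 1,000.

```python
from collections import Counter


def mask_low_quality(seq, qual, min_q=31, offset=33):
    """Replace base calls whose Phred score is below min_q with 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def base_frequencies(masked_reads, length):
    """Per-position base fractions over gap-free reads, ignoring 'N' calls."""
    table = []
    for i in range(length):
        counts = Counter(
            r[i] for r in masked_reads if len(r) > i and r[i] != "N"
        )
        total = sum(counts.values())
        table.append({b: n / total for b, n in counts.items()} if total else {})
    return table


def phred_error_probability(q):
    """Expected base-calling error probability for Phred score q."""
    return 10 ** (-q / 10)
```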

Sequencing reads are scanned for exact matches to two 10-bp sequences that flank both sides of a window in which indels might occur. If no exact matches are located, the read is excluded from analysis. If the length of this indel window exactly matches the reference sequence, the read is classified as not containing an indel. If the indel window is two or more bases longer or shorter than the reference sequence, then the sequencing read is classified as an insertion or deletion, respectively.
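
The indel classification rule above can be sketched in Python as follows. The function name and the returned labels are hypothetical. A window that differs from the reference by exactly one base is not addressed by the rule as stated, so the sketch labels it "unclassified" as an assumption.

```python
def classify_indel(read, left_flank, right_flank, ref_window_len):
    """Classify one read by the length of its window between two flanks.

    left_flank and right_flank are the exact sequences on either side of
    the indel window; ref_window_len is the window length in the reference.
    """
    i = read.find(left_flank)
    if i < 0:
        return "excluded"  # left flank not found exactly
    start = i + len(left_flank)
    j = read.find(right_flank, start)
    if j < 0:
        return "excluded"  # right flank not found exactly
    diff = (j - start) - ref_window_len
    if diff == 0:
        return "no_indel"
    if diff >= 2:
        return "insertion"
    if diff <= -2:
        return "deletion"
    return "unclassified"  # one-base difference: not covered by the rule
```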

Example 9—Mouse Editing Experiments (Prophetic)

It is envisaged that a base editor comprising a novel DNA targeting nuclease domain fused to a novel deaminase domain can be validated as a therapeutic candidate by testing in appropriate mouse models of disease.

One example of an appropriate model comprises mice that have been engineered to express the human PCSK9 protein, for example, as described by Herbert et al (10.1161/ATVBAHA.110.204040). The PCSK9 protein regulates LDL receptor (LDLR) levels and influences serum cholesterol levels. Mice expressing the human PCSK9 protein exhibit elevated levels of cholesterol and more rapid development of atherosclerosis. PCSK9 is a validated drug target for the reduction of lipid levels in people at increased risk of cardiovascular disease due to abnormally high plasma lipid levels (https://doi.org/10.1038/s41569-018-0107-8). Reducing the levels of PCSK9 via genome editing is expected to permanently lower lipid levels for the life-time of the individual, thus providing a life-long reduction in cardiovascular disease risk. One genome editing approach can involve targeting the coding sequence of the PCSK9 gene with the goal of editing a sequence to create a premature stop codon and thus prevent the translation of the PCSK9 mRNA into a functional protein. Targeting a region close to the 5′ end of the coding sequence is useful in order to block translation of the majority of the protein. Creating a stop codon (TGA, TAA, TAG) with high efficiency and specificity will require targeting a region of the PCSK9 coding sequence wherein the editing window is placed over an appropriate sequence such that the highest frequency editing event results in a stop codon. Therefore, the availability of multiple base editing systems with a wide range of PAMs, or a base editing system with a degenerate PAM, is useful to access a larger number of potential target sites in the PCSK9 gene. In addition, editing systems wherein the frequency of off-target editing is low (e.g. in the range of 1% or less of the on-target editing events) are also useful to perform gene editing in this context.

The efficiency of base editing required for a therapeutic effect is in the range of 50% or higher in order to achieve a significant reduction in plasma lipid levels. An example of the use of a base editor to create a stop codon in the PCSK9 gene is that of Carreras et al (https://doi.org/10.1186/s12915-018-0624-2) in which between 10% and 34% of the PCSK9 alleles were edited to create a stop codon. While this level of editing was sufficient to result in a measurable reduction in plasma lipid levels in the mice, a higher editing efficiency will be required for therapeutic use in humans.

To identify a base-editing (BE) system and a guide that are optimal for introducing the stop codons in the PCSK9 gene, a screen may be performed in a mouse liver cell line such as Hepa1-6 cells. In silico screening may first be used to identify guides that target the PCSK9 gene with the various BE systems available. To select among the large number of possible guides an in-silico analysis may be performed to determine which guides have an editing window that encompasses a sequence that when edited may create a stop codon. Preference may then be given to those guides that are closer to the 5′ end of the coding sequence. The resulting set of guides and BE proteins may be combined to form a ribonucleoprotein complex (RNP) and may be nucleofected into Hepa1-6 cells. After 72 h the efficiency of editing at the target site may be determined by NGS analysis. Based on these in vitro results the one or more BE/guide combinations that resulted in the highest frequency of stop codon formation may be selected for in vivo testing.
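
As one illustration of the in silico step described above, the following Python sketch lists in-frame codons of a coding sequence that a single C-to-T edit would convert into a stop codon. An edit on the coding strand appears as C-to-T, while an edit on the opposite strand appears as G-to-A when read on the coding strand. The function name is hypothetical, and the sketch deliberately omits PAM availability and editing window placement, which a full screen would also need to consider.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_creating_edits(cds):
    """Find single C-to-T edits that turn an in-frame codon into a stop codon.

    Returns (codon_index, codon, edited_codon, strand) tuples ordered from
    the 5' end, where strand is 'coding' for a C-to-T change and 'template'
    for a G-to-A change as read on the coding strand.
    """
    cds = cds.upper()
    hits = []
    for k in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[k:k + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for i, base in enumerate(codon):
            if base == "C":
                edited, strand = codon[:i] + "T" + codon[i + 1:], "coding"
            elif base == "G":
                edited, strand = codon[:i] + "A" + codon[i + 1:], "template"
            else:
                continue
            if edited in STOP_CODONS:
                hits.append((k // 3, codon, edited, strand))
    return hits
```

Because the hits are ordered by codon index, the entries nearest the 5′ end of the coding sequence appear first, in keeping with the preference stated above.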

For application in a human therapeutic setting, a safe and effective method of delivering the base editing components comprising the base editor and the guide RNA is required. In vivo delivery methods can be divided into viral and non-viral methods. Among viral vectors, the Adeno Associated Virus (AAV) is the virus of choice for clinical use due to its safety record, efficient delivery to multiple tissues and cell types, and established manufacturing processes. The large size of base editors (BE) exceeds the packaging capacity of AAV, which interferes with packaging in a single AAV vector. While approaches that package BE into two AAV using split intein technology have been demonstrated to be successful in mice (https://doi.org/10.1038/s41551-019-0501-5), the requirement for two viruses can complicate development and manufacture. An additional disadvantage of AAV is that while the virus does not have a mechanism for promoting integration into the genome of host cells, and most of the AAV genomes remain episomal, a fraction of the AAV genomes do become integrated at random double strand breaks that occur naturally in cells (Curr Opin Mol Ther. 2009 August; 11(4): 442-447). This may lead to the persistence of gene sequences expressing the BE for the life-time of the organism. Moreover, AAV genomes persist as episomes inside the nucleus of transduced cells and can be maintained for years, which may result in the long-term expression of BE in these cells and thus an increased risk of off-target effects, because the risk of an off-target event occurring is a function of the time over which the editing enzyme is active. Adenovirus (Ad) such as Ad5 can efficiently deliver DNA payloads to the liver of mammals and can package up to 45 kb of DNA. However, adenoviruses are understood to induce a strong immune response in mammals (http://dx.doi.org/10.1136/gut.48.5.733), including in patients, which can result in serious adverse events including death (https://doi.org/10.1016/j.ymthe.2020.02.010).

Non-viral delivery vectors (reviewed in doi:10.1038/mt.2012.79), which include lipid nanoparticles and polymeric nanoparticles, have several advantages compared to viral delivery vectors, including lower immunogenicity and transient expression of the nucleic acid cargo. The transient expression elicited by non-viral delivery vectors is particularly suited to genome editing applications because it is expected to minimize off-target events. In addition, non-viral delivery, unlike viral vectors, has the potential for repeat administration to achieve the therapeutic effect. There is also no theoretical limit to the size of the nucleic acid molecules that can be packaged in non-viral vectors, although in practice the packaging becomes less efficient as the size of the nucleic acid increases and the particle size may increase.

A BE may be delivered in vivo using a non-viral vector such as a lipid nanoparticle (LNP) by encapsulating a synthetic mRNA encoding the BE together with the guide RNA into the LNP. This can be performed using any suitable methodology, for example as described by Finn et al (DOI: 10.1016/j.celrep.2018.02.014) or Yin et al (doi:10.1038/nbt.3471). LNP can deliver their cargo with a bias to the hepatocytes of the liver, which is also a target organ/cell type when attempting to interfere with the expression of the PCSK9 gene. In order to demonstrate proof of concept for this approach, we envisage that a BE comprised of a novel genome editing protein fused to a deaminase domain may be encoded in a synthetic mRNA and packaged in an LNP together with an appropriate guide RNA that targets the selected site in the PCSK9 gene of the mouse. In the case of mice that were engineered to express the human PCSK9 gene, the guide may be designed to target selectively the human PCSK9 gene or both the human and mouse PCSK9 genes. Following injection of these LNP, the editing efficiency at the on-target site in the genome of the liver cells may be analyzed by amplicon sequencing or other methods such as tracking of indels by decomposition (doi: 10.1093/nar/gku936). The physiologic impact may be determined by measuring lipid levels in the blood of the mice, including total cholesterol and triglyceride levels, using standard methods.

Another example of a disease that may be modeled in mice to evaluate a novel BE is Primary Hyperoxaluria type I. Primary Hyperoxaluria type I (PH1) is a rare autosomal recessive disease caused by defects in the AGXT gene that encodes the enzyme alanine-glyoxylate aminotransferase. This results in a defect in glyoxylate metabolism and the accumulation of the toxic metabolite oxalate. One approach to treating this disease is to reduce the expression of the enzyme glycolate oxidase (GO) that produces glyoxylate from glycolate and thereby reducing the amount of substrate (glyoxylate) available for the formation of oxalate. PH1 can be modeled in mice in which both copies of the AGXT gene have been knocked out (agxt−/− mice) resulting in a significant 3-fold increase in oxalate levels in the urine compared to wild type controls. The agxt−/− mice can therefore be used to assess the efficacy of a novel base editor designed to create a stop codon in the coding sequence of the endogenous mouse GO gene. To identify a BE system and a guide that is optimal for introducing stop codons in the GO gene, a screen may be performed in a mouse liver cell line such as Hepa1-6 cells. In silico screening may first be used to identify guides that target the GO gene with the various BE systems available. To select among the large number of possible guides an in-silico analysis may be performed to determine which guides have an editing window that encompasses a sequence that when edited may create a stop codon. In some instances, guides closer to the 5′ end of the coding sequence may be utilized. The resulting set of guides and BE proteins may be combined to form a ribonucleoprotein complex (RNP) and may be nucleofected in to Hepa1-6 cells. After 72 h, the efficiency of editing at the target site may be determined by NGS analysis. 
Based on these in vitro results the one or more BE/guide combinations that resulted in the highest frequency of stop codon formation may be selected for in vivo testing in mice.
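The in silico step described above, identifying positions where a base edit may create a stop codon, can be sketched as follows. This is a minimal illustration only: the function name and the toy coding sequence are hypothetical, a cytosine base editor (C to T on the edited strand) is assumed, and PAM requirements and editing windows of any particular BE system are not modeled.

```python
# Sketch: list single-base C->T edits (on either strand) that would turn a
# sense codon into a stop codon. All names here are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_creating_edits(cds):
    """Return (codon_index, codon, edited_codon, edited_strand) tuples.

    cds is assumed to be an in-frame coding sequence (coding strand, 5'->3').
    A C->T edit on the coding strand changes C to T in the codon; a C->T edit
    on the template strand reads as G->A on the coding strand.
    """
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for pos, base in enumerate(codon):
            if base == "C":
                edited = codon[:pos] + "T" + codon[pos + 1:]
                strand = "coding"
            elif base == "G":
                edited = codon[:pos] + "A" + codon[pos + 1:]
                strand = "template"
            else:
                continue
            if edited in STOP_CODONS:
                hits.append((i // 3, codon, edited, strand))
    return hits


# Toy example (not a real GO sequence): ATG CAA GGC TGG CGA TAA
toy_hits = stop_creating_edits("ATGCAAGGCTGGCGATAA")
for hit in toy_hits:
    print(hit)
```

Because the hits are returned in codon order, candidates nearer the 5′ end of the coding sequence appear first, which matches the preference noted above for guides closer to the 5′ end.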

The BE and guide may be delivered to the mice using AAV vectors, with a split intein system to express the BE and a third AAV to deliver the guide. Alternatively, an Adenovirus type 5 may be used to deliver the BE and guide in a single virus because of the >40 kb packaging capacity of Adenovirus. Further, the BE may be delivered as an mRNA together with the guide RNA packaged in an appropriate LNP. After intravenous injection of the LNP into the agxt−/− mice, the oxalate levels in the urine may be monitored over time to determine whether oxalate levels were reduced, which may indicate that the BE was active and had the expected therapeutic effect. To determine whether the BE had introduced the stop codons, the appropriate region of the GO gene can be PCR amplified from the genomic DNA extracted from livers of treated and control mice. The resultant PCR product can be sequenced using Next Generation Sequencing to determine the frequency of the sequence changes.

Example 10—Gene Discovery of New Deaminases

4 Tbp (tera base pairs) of proprietary and public assembled metagenomic sequencing data from diverse environments (soil, sediments, groundwater, thermophilic, human, and non-human microbiomes) were mined to discover novel deaminases. HMM profiles of documented deaminases were built and searched against all predicted proteins using HMMER3 (hmmer.org) to identify deaminases from our databases. Predicted and reference (e.g., eukaryotic APOBEC1, bacterial TadA) deaminases were aligned with MAFFT and a phylogenetic tree was inferred using FastTree2. Novel families and subfamilies were defined by identifying clades composed of sequences disclosed herein. Candidates were selected based on the presence of critical catalytic residues indicative of enzymatic function (see e.g. SEQ ID NOs: 1-51, 385-386, 387-443, 444-447, 488-475, 599-675, 744-835, or 970-982).
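The final filtering step above, selecting candidates by the presence of catalytic residues, can be illustrated with a simple pattern screen. The sketch below is an assumption-laden illustration, not the pipeline actually used: the text does not state which residues were checked, so the pattern here is modeled on the zinc-binding motif commonly described for the cytidine deaminase superfamily (H/C-x-E followed at a distance by P-C-x(2-4)-C), and the spacing limits, function names, and toy sequences are all hypothetical.

```python
import re

# Hypothetical motif: [HC]-x-E, a 20-40 residue spacer, then P-C-x(2-4)-C.
# The spacer bounds are an illustrative assumption, not values from the text.
CDA_LIKE_MOTIF = re.compile(r"[HC].E.{20,40}PC.{2,4}C")


def has_cda_like_motif(protein):
    """Return True if the protein sequence contains the assumed motif."""
    return CDA_LIKE_MOTIF.search(protein.upper()) is not None


def filter_candidates(candidates):
    """Keep only the named sequences that contain the assumed motif."""
    return {name: seq for name, seq in candidates.items()
            if has_cda_like_motif(seq)}


# Toy sequences for illustration only (not sequences from the listing).
toy = {
    "with_motif": "MKT" + "HAE" + "G" * 25 + "PCVMC" + "AGK",
    "without_motif": "MKTAAAGGGPLLK",
}
kept = filter_candidates(toy)
print(sorted(kept))
```

In practice such a screen would more likely be applied to columns of the MAFFT alignment than to raw sequences, since alignment positions make it possible to check specific residues rather than a loose pattern.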

Example 11—Plasmid Construction

DNA fragments of genes were synthesized at either Twist Bioscience or Integrated DNA Technologies (IDT). Plasmid DNA was amplified in Endura electrocompetent cells (Lucigen) and isolated by QIAprep Spin Miniprep Kit (Qiagen). Vector backbones were prepared by restriction enzyme digestion of plasmids. Inserts were amplified by Q5 High-Fidelity DNA polymerase (New England Biolabs) using primers (SEQ ID NOs: 690-707) ordered either from Elim BIOPHARM or IDT. Both vector backbones and inserts were purified by gel extraction using the Gel DNA Recovery Kit (Zymo Research). One or multiple DNA fragments were assembled into the vectors through NEBuilder HiFi DNA assembly (New England Biolabs) (SEQ ID NOs. 483-487, 720-726, or 737-738).

Example 12—Assessment of Base Edit Efficiency in E. coli by Sequencing

5 ng of extracted DNA prepared as in Example 4 was used as the template with primers P137 and P360 for PCR amplification, and the resulting products were submitted for Sanger sequencing at ELIM BIOPHARM. Primers used for sequencing are shown in Tables 6 and 7 (SEQ ID NOs: 523-531).

TABLE 6

Primers used for base editing analysis of lacZ gene in E. coli

Name | SEQ ID NO. | Description | Sequence (5′→3′)
P137 | 523 | Forward primer used to amplify lacZ | CCAGGCTTTACACTTTATGCT
P360 | 524 | Reverse primer used to amplify lacZ | CGAACATCCAAAAGTTTGTGTTTTT
P137 | 523 | Sanger sequencing primer of MGA1-4_site 1 | CCAGGCTTTACACTTTATGCT
P137 | 523 | Sanger sequencing primer of MGA1-4_site 2 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGA1-4_site 3 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA1-6_site 1 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA1-6_site 2 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA1-6_site 3 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA3-6_site 1 | TGAGCGCATTTTTACGCGC
P363 | 529 | Sanger sequencing primer of MGA3-6_site 2 | GAAAACGGCAACCCGTGG
P360 | 524 | Sanger sequencing primer of MGA3-6_site 3 | CGAACATCCAAAAGTTTGTGTTTTT
P137 | 523 | Sanger sequencing primer of MGA3-7_site 1 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGA3-7_site 2 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA3-7_site 3 | TGAGCGCATTTTTACGCGC
P137 | 523 | Sanger sequencing primer of MGA3-8_site 1 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGA3-8_site 2 | TGAGCGCATTTTTACGCGC
P363 | 529 | Sanger sequencing primer of MGA3-8_site 3 | GAAAACGGCAACCCGTGG
P139 | 526 | Sanger sequencing primer of MGA4-2_site 1 | GTATGTGGTGGATGAAGCC
P363 | 529 | Sanger sequencing primer of MGA4-2_site 2 | GAAAACGGCAACCCGTGG
P360 | 524 | Sanger sequencing primer of MGA4-2_site 3 | CGAACATCCAAAAGTTTGTGTTTTT
P361 | 528 | Sanger sequencing primer of MGA4-5 Site 1 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA4-5 Site 2 | TGAGCGCATTTTTACGCGC
P461 | 530 | Sanger sequencing primer of MGA4-5_Site 3 | GGATTGAAAATGGTCTGCTG
P137 | 523 | Sanger sequencing primer of MGA7-1_site 1 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGA7-1_site 2 | TGAGCGCATTTTTACGCGC
P461 | 530 | Sanger sequencing primer of MGA7-1_site 3 | GGATTGAAAATGGTCTGCTG
P139 | 526 | Sanger sequencing primer of MGA14-1_site 1 | GTATGTGGTGGATGAAGCC
P363 | 529 | Sanger sequencing primer of MGA14-1_site 2 | GAAAACGGCAACCCGTGG
P360 | 524 | Sanger sequencing primer of MGA14-1_site 3 | CGAACATCCAAAAGTTTGTGTTTTT
P137 | 523 | Sanger sequencing primer of MGA15-1_site 1 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGA15-1_site 2 | TGAGCGCATTTTTACGCGC
P140 | 527 | Sanger sequencing primer of MGA15-1_site 3 | TTGTGGAGCGACATCCAG
P137 | 523 | Sanger sequencing primer of MGA16-1_site 1 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGA16-1_site 2 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA16-1_site 3 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGA18-1_site 1 | TGAGCGCATTTTTACGCGC
P363 | 529 | Sanger sequencing primer of MGA18-1_site 2 | GAAAACGGCAACCCGTGG
P462 | 531 | Sanger sequencing primer of MGA18-1_site 3 | ACTGCTGACGCCGCTGCG
P363 | 529 | Sanger sequencing primer of ABE8.17_site 1 | GAAAACGGCAACCCGTGG
P137 | 523 | Sanger sequencing primer of ABE8.17_site 2 | CCAGGCTTTACACTTTATGCT
P139 | 526 | Sanger sequencing primer of ABE8.17_site 3 | GTATGTGGTGGATGAAGCC
P137 | 523 | Sanger sequencing primer of MGC1-4_site 1 | CCAGGCTTTACACTTTATGCT
P137 | 523 | Sanger sequencing primer of MGC1-4_site 2 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGC1-4_site 3 | TGAGCGCATTTTTACGCGC
P137 | 523 | Sanger sequencing primer of MGC1-6_site 1 | CCAGGCTTTACACTTTATGCT
P137 | 523 | Sanger sequencing primer of MGC1-6_site 2 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGC1-6_site 3 | TGAGCGCATTTTTACGCGC
P138 | 525 | Sanger sequencing primer of MGC3-6_site 1 | CCGAAAGGCGCGGTGCCG
P361 | 528 | Sanger sequencing primer of MGC3-6_site 2 | TGAGCGCATTTTTACGCGC
P360 | 524 | Sanger sequencing primer of MGC3-6_site 3 | CGAACATCCAAAAGTTTGTGTTTTT
P137 | 523 | Sanger sequencing primer of MGC3-7_site 1 | CCAGGCTTTACACTTTATGCT
P137 | 523 | Sanger sequencing primer of MGC3-7_site 2 | CCAGGCTTTACACTTTATGCT
P137 | 523 | Sanger sequencing primer of MGC3-7_site 3 | CCAGGCTTTACACTTTATGCT
P137 | 523 | Sanger sequencing primer of MGC3-8_site 1 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGC3-8_site 2 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGC3-8_site 3 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGC4-2_site 1 | TGAGCGCATTTTTACGCGC
P139 | 526 | Sanger sequencing primer of MGC4-2_site 2 | GTATGTGGTGGATGAAGCC
P363 | 529 | Sanger sequencing primer of MGC4-2_site 3 | GAAAACGGCAACCCGTGG
P137 | 523 | Sanger sequencing primer of MGC4-5_site 1 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGC4-5_site 2 | TGAGCGCATTTTTACGCGC
P139 | 526 | Sanger sequencing primer of MGC4-5_site 3 | GTATGTGGTGGATGAAGCC
P361 | 528 | Sanger sequencing primer of MGC7-1_site 1 | TGAGCGCATTTTTACGCGC
P461 | 530 | Sanger sequencing primer of MGC7-1_site 2 | GGATTGAAAATGGTCTGCTG
P139 | 526 | Sanger sequencing primer of MGC7-1_site 3 | GTATGTGGTGGATGAAGCC
P137 | 523 | Sanger sequencing primer of MGC14-1_site 1 | CCAGGCTTTACACTTTATGCT
P139 | 526 | Sanger sequencing primer of MGC14-1_site 2 | GTATGTGGTGGATGAAGCC
P139 | 526 | Sanger sequencing primer of MGC14-1_site 3 | GTATGTGGTGGATGAAGCC
P361 | 528 | Sanger sequencing primer of MGC15-1_site 1 | TGAGCGCATTTTTACGCGC
P461 | 530 | Sanger sequencing primer of MGC15-1_site 2 | GGATTGAAAATGGTCTGCTG
P139 | 526 | Sanger sequencing primer of MGC15-1_site 3 | GTATGTGGTGGATGAAGCC
P137 | 523 | Sanger sequencing primer of MGC16-1_site 1 | CCAGGCTTTACACTTTATGCT
P137 | 523 | Sanger sequencing primer of MGC16-1_site 2 | CCAGGCTTTACACTTTATGCT
P361 | 528 | Sanger sequencing primer of MGC16-1_site 3 | TGAGCGCATTTTTACGCGC
P361 | 528 | Sanger sequencing primer of MGC18-1_site 1 | TGAGCGCATTTTTACGCGC
P139 | 526 | Sanger sequencing primer of MGC18-1_site 2 | GTATGTGGTGGATGAAGCC
P363 | 529 | Sanger sequencing primer of MGC18-1_site 3 | GAAAACGGCAACCCGTGG
P363 | 529 | Sanger sequencing primer of BE3_site 1 | GAAAACGGCAACCCGTGG
P360 | 524 | Sanger sequencing primer of BE3_site 2 | CGAACATCCAAAAGTTTGTGTTTTT
P137 | 523 | Sanger sequencing primer of BE3_site 3 | CCAGGCTTTACACTTTATGCT

TABLE 7

Primers used for base editing analysis of the effect of uracil glycosylase inhibitor (UGI) in E. coli

Name | SEQ ID NO. | Description | Sequence (5′→3′)
P137 | 523 | Forward primer used to amplify lacZ | CCAGGCTTTACACTTTATGCT
P360 | 524 | Reverse primer used to amplify lacZ | CGAACATCCAAAAGTTTGTGTTTTT
P461 | 530 | Sanger sequencing primer of lacZ site | GGATTGAAAATGGTCTGCTG

FIGS. 8A-8C show example base edits by enzymes interrogated by this experiment, as assessed by Sanger sequencing.

FIGS. 10A-10B show base editing efficiencies of adenine base editors (ABEs) using TadA (ABE8.17m) (SEQ ID NO: 596) and MG nickases according to Table 3. TadA is a tRNA adenine deaminase; TadA (ABE8.17m) is an engineered variant of E. coli TadA. Twelve MG nickases fused with TadA (ABE8.17m) were constructed and tested in E. coli. Three guides were designed to target lacZ. Numbers shown in boxes indicate percentages of A to G conversion quantified by EditR at each position. ABE8.17m was used as the positive control for the experiment.

FIGS. 11A-11B show base editing efficiencies of cytosine base editors (CBEs) comprising rat APOBEC1, MG nickases, and uracil glycosylase inhibitor of Bacillus subtilis bacteriophage (UGI (PBS1)). APOBEC1 is a cytosine deaminase. Twelve MG nickases fused with rAPOBEC1 on the N-terminus and UGI on the C-terminus were constructed and tested in E. coli. Three guides were designed to target lacZ. Numbers shown in boxes indicate percentages of C to T conversion quantified by EditR. BE3 was used as the positive control in the experiment.

FIGS. 12A-B show effects of MG uracil glycosylase inhibitors (UGIs) on base editing activity when added to CBEs. In FIG. 12A, MGC15-1 comprises N-terminal APOBEC1, MG15-1 nickase, and C-terminal UGI. Three MG UGIs were tested for improvements of cytosine base editing activities in E. coli. In FIG. 12B, BE3 comprises N-terminal rAPOBEC1, SpCas9 nickase, and C-terminal UGI. Two MG UGIs were tested for improvements of cytosine base editing activities in HEK293T cells. Editing efficiencies were quantified by EditR.

Example 13—Cell Culture, Transfections, Next Generation Sequencing, and Base Edit Analysis

HEK293T cells were grown and passaged in Dulbecco's Modified Eagle's Medium plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco) at 37° C. with 5% CO₂. 5×10⁴ cells were seeded on 96-well cell culture plates treated for cell attachment (Costar) and grown for 20 to 24 h, and the spent media were replaced with fresh media immediately before transfection. 200 ng of expression plasmid and 1 μL of Lipofectamine 2000 (ThermoFisher Scientific) were used for transfection per well per manufacturer's instructions. Transfected cells were grown for 3 days and harvested, and gDNA was extracted with QuickExtract (Lucigen) per manufacturer's instructions. Targeted regions for base edits were amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs) with the primers listed in Tables 8 and 9 (SEQ ID NOs. 538-585) and extracted DNA as the template.

TABLE 8

Primers used for base edit analysis of the effect of UGI in HEK293T

Name | SEQ ID NO. | Description | Sequence (5′→3′)
P577 | 536 | Forward primer used to amplify the targeted region | GAGGCTGGAGAGGCCCGT
P578 | 537 | Reverse primer used to amplify the targeted region | GATTTTCATGCAGGTGCTGAAA
P577 | 536 | Sanger sequencing primer | GAGGCTGGAGAGGCCCGT

TABLE 9a

Primers used to amplify targeted regions in HEK293T cells transfected with A0A2K5RDN7-MG nickase-MG69-1

Name | SEQ ID NO. | Description | Sequence (5′→3′)
P969 | 538 | Forward primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNAGGAGGAAGGGCCTGAGT
P970 | 539 | Reverse primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNTCTGCCCTCGTGGGTTTG
P971 | 540 | Forward primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNCTCTGGCCACTCCCTGGC
P972 | 541 | Reverse primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNGGCAGGCTCTCCGAGGAG
P973 | 542 | Forward primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNGGGAATAATAAAAGTCTCTCTCTTAA
P974 | 543 | Reverse primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNCCCCCTCCACCAGTACCC
P975 | 544 | Forward primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNCCTGTCCTTGGAGAACCG
P976 | 545 | Reverse primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNGCAGGTGAACACAAGAGCT
P977 | 546 | Forward primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 5 | GCTCTTCCGATCTNNNNNGAAGGTGTGGTTCCAGAAC
P978 | 547 | Reverse primer used to amplify A0A2K5RDN7-nSpCas9 (D10A)-MG69-1_site 5 | GCTCTTCCGATCTNNNNNTCGATGTCCTCCCCATTG
P979 | 548 | Forward primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNAAACAGGCTAGACATAGGGA
P980 | 549 | Reverse primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNGAAGCCACCAGAGTCTCTA
P981 | 550 | Forward primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNGCCGCCATTGACAGAGGG
P982 | 551 | Reverse primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNGCATCAAAACAAAAGGGAGATTG
P983 | 552 | Forward primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNCCTCTGCCCACCTCACTT
P984 | 553 | Reverse primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNGCCATGTGGGTTAATCTGG
P985 | 554 | Forward primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNCCGGACGCACCTACCCAT
P986 | 555 | Reverse primer used to amplify A0A2K5RDN7-nMG1-4 (D9A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNCTAGATGGGAATGGATGGG
P987 | 556 | Forward primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNAACCACAAACCCACGAGG
P988 | 557 | Reverse primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNTCAATGGCGGCCCCGGGC
P989 | 558 | Forward primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNAGTGATCCCCAGTGTCCC
P990 | 559 | Reverse primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNGCCCTGAACGCGTTTGCT
P991 | 560 | Forward primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNTGGGAATAATAAAAGTCTCTCTCT
P992 | 561 | Reverse primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNCCCCTCCACCAGTACCCC
P993 | 562 | Forward primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNCAGGGCCTCCTCAGCCCA
P994 | 563 | Reverse primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNGTCTGGATGTCGTAAGGGAA
P995 | 564 | Forward primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 5 | GCTCTTCCGATCTNNNNNGGGGTGTAACTCAGAATGTTTT
P996 | 565 | Reverse primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 5 | GCTCTTCCGATCTNNNNNGGGAGTGAGACTCAGAGA
P997 | 566 | Forward primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 6 | GCTCTTCCGATCTNNNNNGCAAAGAGGGAAATGAGATCA
P998 | 567 | Reverse primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 6 | GCTCTTCCGATCTNNNNNGTGACACATTTGTTTGAGAATCA
P999 | 568 | Forward primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 7 | GCTCTTCCGATCTNNNNNCTTTATCCCCGCACAGAG
P1000 | 569 | Reverse primer used to amplify A0A2K5RDN7-nMG3-6 (D13A)-MG69-1_site 7 | GCTCTTCCGATCTNNNNNCTTGGCCCATGGGAAATC
P1001 | 570 | Forward primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNGTCCCATCCCAACACCCC
P1002 | 571 | Reverse primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNTGGGCATGTGTGCTCCCA
P1003 | 572 | Forward primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNCTATGGGAATAATAAAAGTCTCTC
P1004 | 573 | Reverse primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNCTCCACCAGTACCCCACC
P1005 | 574 | Forward primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNGGACCCTGGTCTCTACCT
P1006 | 575 | Reverse primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNCCTCTCCCATTGAACTACC
P1007 | 576 | Forward primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNCCCCAGTGACTCAGGGCC
P1008 | 577 | Reverse primer used to amplify A0A2K5RDN7-nMG4-2 (D28A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNTCGTAAGGGAAAGACTTAGGAA
P1009 | 578 | Forward primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNTCTCCCTTTTGTTTTGATGCATTT
P1010 | 579 | Reverse primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 1 | GCTCTTCCGATCTNNNNNCCACCCCAGGCTCTGGGG
P1011 | 580 | Forward primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNCCTTTTGTTTTGATGCATTTCTGTTT
P1012 | 581 | Reverse primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 2 | GCTCTTCCGATCTNNNNNAATCTACCACCCCAGGCT
P1013 | 582 | Forward primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNATCCCCAGTGTCCCCCTT
P1014 | 583 | Reverse primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 3 | GCTCTTCCGATCTNNNNNCCAGGCCCTGAACGCGTT
P1015 | 584 | Forward primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNAGGCCAGGCCTGCGGGGG
P1016 | 585 | Reverse primer used to amplify A0A2K5RDN7-nMG18-1 (D12A)-MG69-1_site 4 | GCTCTTCCGATCTNNNNNCCAAAAACTCCCAAATTAGCAAA

PCR products were purified using the HighPrep PCR Clean-up System (MAGBIO) per manufacturer's instructions. The effect of uracil glycosylase inhibitor (UGI) on base editing of candidate enzymes was analyzed by submitting PCR products to Elim BIOPHARM for Sanger sequencing, and the efficiency was quantified by EditR. To analyze base editing of A0A2K5RDN7-MG nickase-MG69-1, adapters used for next generation sequencing (NGS) were appended to PCR products by subsequent PCR reactions using the KAPA HiFi HotStart ReadyMix PCR Kit (Roche) and primers compatible with TruSeq DNA Library Prep Kits (Illumina). DNA concentrations of the resulting products were quantified by TapeStation (Agilent), and samples were pooled together to prepare the library for NGS analysis. The resulting library was quantified by qPCR with the Aria Real-time PCR System (Agilent), and high-throughput sequencing was performed with an Illumina MiSeq instrument per manufacturer's instructions. Sequencing data were analyzed for base edits by CRISPResso2.
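Every primer in Table 9a begins with the same 13-nt prefix, GCTCTTCCGATCT, followed by five N positions and then a locus-specific sequence. The sketch below splits a primer into those three parts. Interpreting the prefix as an adapter tail for the subsequent adapter-appending PCR, and the N run as a diversity stagger, is an assumption made here for illustration; the function and field names are hypothetical.

```python
import re

# Shared prefix observed on every Table 9a primer, then a run of N, then
# the locus-specific portion (A/C/G/T only).
PRIMER_PATTERN = re.compile(r"^(GCTCTTCCGATCT)(N+)([ACGT]+)$")


def split_primer(primer):
    """Split a Table 9a style primer into prefix, N-run length, and locus part."""
    match = PRIMER_PATTERN.match(primer.upper())
    if match is None:
        raise ValueError("primer does not follow the Table 9a layout")
    prefix, n_run, locus = match.groups()
    return {"prefix": prefix, "n_count": len(n_run), "locus_specific": locus}


# P969 (SEQ ID NO: 538), with the two wrapped halves of the table cell joined.
p969 = split_primer("GCTCTTCCGATCTNNNNNAGGAGGAAGGGCCTGAGT")
print(p969)
```

Only the locus-specific portion anneals to the genomic target, so that portion is the relevant one when checking primer placement around an edited site.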

FIGS. 13A-13B show maps of sites targeted by base editors and base editing efficiencies of cytosine base editors comprising CMP/dCMP-type deaminase domain-containing protein (UniProt accession A0A2K5RDN7), MG nickases, and MG UGI. The constructs comprise N-terminal A0A2K5RDN7, MG nickases, and C-terminal MG69-1. For simplicity, the identities of MG nickases are shown in the figure. BE3 (APOBEC1) was used as a positive control for base editing. An empty vector was used for the negative control. Three independent experiments were performed on different days. Abbreviations: R, repeat; NEG, negative control.

TABLE 9b

Protein Domains used in constructs in Example 13

Candidate | Sequence Number | Type | PAM | Deaminase | Linker (Deaminase-Nickase) | Nickase | UGI | Linker (Nickase-UGI)
A0A2K5RDN7-nMG3-6-MG69-1 | A362 | II | nnRGGnT | A0A2K5RDN7 (SEQ ID NO: 594) | SGSETPGTSESATPES | nMG3-6 (D13A) (SEQ ID NO: 71) | MG69-1 (SEQ ID NO: 52) | SGGSS
A0A2K5RDN7-nMG1-4-MG69-1 | A360 | II | nRRR | A0A2K5RDN7 (SEQ ID NO: 594) | SGSETPGTSESATPES | nMG1-4 (SEQ ID NO: 70) | MG69-1 (SEQ ID NO: 52) | SGGSS
A0A2K5RDN7-nMG18-1-MG69-1 | A368 | II | nRWART | A0A2K5RDN7 (SEQ ID NO: 594) | SGSETPGTSESATPES | nMG18-1 (SEQ ID NO: 78) | MG69-1 (SEQ ID NO: 52) | SGGSS

Example 14—Positive Selection of Base Editor Mutants in E. coli

FIGS. 14A-B show a positive selection method for TadA characterization in E. coli. FIG. 14A shows a map of one plasmid system used for TadA selection. The vector comprises CAT (H193Y), a sgRNA expression cassette targeting CAT, and an ABE expression cassette. In this figure, N-terminal TadA from E. coli and a C-terminal SpCas9 (D10A) from Streptococcus pyogenes are shown. FIG. 14B shows sequencing traces demonstrating that when introduced/transformed into E. coli cells, the A2 position of CAT (H193Y)'s template strand is edited, reverting the H193Y mutant to wild type and restoring its activity. Abbreviations: CAT, chloramphenicol acetyltransferase.

1 μL of plasmid solution at a concentration of 10 ng/μL was transformed into 25 μL of BL21 (DE3) electrocompetent cells (Lucigen), and the cells were recovered with 975 μL of expression recovery medium at 37° C. for 1 h. 50 μL of the resulting cells were spread on an LB agar plate containing 100 μg/mL carbenicillin, 0.1 mM IPTG, and an appropriate amount of chloramphenicol. The plate was incubated at 37° C. until colonies were pickable. Colony PCR was used to amplify the genomic region containing base edits, and the resulting products were submitted for Sanger sequencing at ELIM BIOPHARM. Primers used for PCR and sequencing are listed in Table 10 (SEQ ID NOs. 532-537).

TABLE 10

Primers used for base edit analysis of CAT (H193Y)

Name | SEQ ID NO. | Description | Sequence (5′→3′)
P570 | 532 | Forward primer used to amplify CAT (H193Y) of CAT (H193Y)-sgRNA-MG68-4 variant-nSpCas9 (D10A) | CCGCCGCCGCAAGGAATGGTTTAATTAATTTGATCGGCACGTAAGAGG
P1050 | 534 | Forward primer used to amplify CAT (H193Y) of CAT (H193Y)-sgRNA-MG68-4 variant-nMG34-1 (D10A) | AAGGAATGGTTTAATTAATTCTAGATTAATTAATTTGATCGGCACGTAAG
P571 | 533 | Reverse primer used to amplify CAT (H193Y) of CAT (H193Y)-sgRNA-MG68-4 variant-nSpCas9 | GGACTGTTGGGCGCCATCTCCTTGCATGCTTCACTTATTCAGGCGTAGCA
P571 | 535 | Sanger sequencing primer of CAT (H193Y) | GGACTGTTGGGCGCCATCTCCTTGCATGCTTCACTTATTCAGGCGTAGCA

FIGS. 15A-B show that mutations caused by TadA enable high tolerance of chloramphenicol (Cm). FIG. 15A shows photographs of growth plates where different concentrations of chloramphenicol were used to select for antibiotic resistance of E. coli. In this example, wild type and two variants of TadA from E. coli (EcTadA) were tested. FIG. 15B shows a results summary table demonstrating that ABEs carrying mutated TadA show higher editing efficiencies than the wild type. In these experiments, colonies were picked from the plates with greater than or equal to 0.5 μg/mL Cm. For simplicity, identities of deaminases are shown in the table, but effectors (SpCas9) and construct organization are shown in the figures above.

FIGS. 16A-16B show an investigation of MG TadA activity in positive selection. FIG. 16A shows photographs of growth plates from an experiment where eight MG68 TadA candidates were tested against 0 to 2 μg/mL of chloramphenicol (ABEs comprised N-terminal TadA variants and C-terminal SpCas9 (D10A) nickase). For simplicity, identities of deaminases are shown. FIG. 16B shows a summary table depicting editing efficiencies of MG TadA candidates and demonstrates that MG68-3 and MG68-4 drove base edits of adenine. In this experiment, colonies were picked from the plates with greater than or equal to 0.5 μg/mL Cm.

FIG. 17 shows an improvement in base editing efficiency of MG68-4-nSpCas9 via a D109N mutation in MG68-4. Panel (a) shows photographs of growth plates where wild type MG68-4 and its variant were tested against 0 to 4 μg/mL of chloramphenicol. For simplicity, identities of deaminases are shown. Adenine base editors in this experiment comprise N-terminal TadA variants and C-terminal SpCas9 (D10A) nickase. Panel (b) shows a summary table depicting editing efficiencies of MG TadA candidates and demonstrates that MG68-4 and MG68-4 (D109N) showed base edits of adenine, with the D109N mutant showing increased activity. In this experiment, colonies were picked from the plates with greater than or equal to 0.5 μg/mL Cm.

FIG. 18 shows base editing of MG68-4 (D109N)_nMG34-1. Panel (a) shows photographs of growth plates of an experiment where an ABE comprising N-terminal MG68-4 (D109N) and C-terminal SpCas9 (D10A) nickase was tested against 0 to 2 μg/mL of chloramphenicol. Panel (b) shows a summary table depicting editing efficiencies with and without sgRNA. In this experiment, colonies were picked from the plates with greater than or equal to 1 μg/mL Cm.

FIG. 19 shows 28 MG68-4 variants designed to improve MG68-4-nMG34-1 base editing activity. Twelve residues were selected for targeted mutagenesis to improve editing by the enzymes.

Example 15—Plasmid Construction for E. coli Optimized Constructs

All plasmids for cytidine deaminase expression were prepared by Twist Bioscience. Each construct was codon optimized for E. coli expression and inserted into the XhoI and BamHI restriction sites of the pET-21(+) vector. Sequences were designed to exclude BsaI restriction sites. The following sequence was appended to the beginning of each construct: 5′-GAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGGGCAGCAGTCATCATCATCACCATCAC-3′. This sequence encodes a ribosomal binding site and an N-terminal hexahistidine tag. At the end of each CDA sequence, a stop codon was added to prevent incorporation of the C-terminal His-tag encoded by pET-21(+).
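As a quick check of the statement that the appended sequence encodes a ribosomal binding site and an N-terminal hexahistidine tag, the leader may be translated from its first ATG. The sketch below uses the standard genetic code; treating AAGGAG as the ribosomal binding site core is an assumption made for illustration, and the variable and function names are hypothetical.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Leader sequence from Example 15 (the stray space in the listing removed).
LEADER = ("GAAATAATTTTGTTTAACTTTAAGAAGGAGATATACAT"
          "ATGGGCAGCAGTCATCATCATCACCATCAC")

rbs_index = LEADER.find("AAGGAG")   # assumed ribosomal binding site core
start_index = LEADER.find("ATG")    # first start codon
peptide = translate(LEADER[start_index:])
print(rbs_index, start_index, peptide)
```

The translated leader reads Met-Gly-Ser-Ser followed by six histidines, so a deaminase coding sequence placed in frame directly after it would carry the tag at its N-terminus.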

Example 16—Plasmid Construction for Mammalian Optimized Constructs

All plasmids for cytidine deaminase expression in mammalian cells were codon optimized and ordered from Twist Bioscience. Each construct was codon optimized for H. sapiens expression. Restriction sites avoided were: BsaI, SphI, EcoRI, BmtI, BstX, BlpI and BamHI. The following sequence was appended 5′ of the codon optimized sequences: ACCGGTGCTAGCCCACC. This sequence contains a BmtI restriction site to be used for downstream cloning and a Kozak sequence for maximum translation. The following sequence was appended 3′ of the codon optimized CDA: AGCGCATGC. This sequence contains a SphI restriction site to allow for downstream cloning. The stop codon was removed in all constructs.
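A codon optimized sequence can be screened for the avoided restriction sites with a short scan of both strands. The sketch below is illustrative only: the recognition sequences are the commonly cited ones for these enzymes and are not given in this example, "BstX" is assumed to mean BstXI, and the function names are hypothetical.

```python
import re

# Commonly cited recognition sequences (assumed; N = any base).
AVOIDED_SITES = {
    "BsaI": "GGTCTC",
    "SphI": "GCATGC",
    "EcoRI": "GAATTC",
    "BmtI": "GCTAGC",
    "BstXI": "CCANNNNNNTGG",  # assumed identity of "BstX" in the text
    "BlpI": "GCTNAGC",
    "BamHI": "GGATCC",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """Return sorted (enzyme, 0-based start) hits on either strand of seq."""
    seq = seq.upper()
    hits = set()
    for enzyme, site in AVOIDED_SITES.items():
        # A site on the reverse strand appears as its reverse complement
        # on the forward strand; palindromic sites collapse to one pattern.
        for pattern in {site, reverse_complement(site)}:
            regex = "(?=" + pattern.replace("N", "[ACGT]") + ")"
            for match in re.finditer(regex, seq):
                hits.add((enzyme, match.start()))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# The 5' and 3' appended sequences from this example.
five_prime_hits = find_sites("ACCGGTGCTAGCCCACC")
three_prime_hits = find_sites("AGCGCATGC")
print(five_prime_hits, three_prime_hits)
```

Run on the two appended sequences, the scan reports only the BmtI and SphI sites that the text describes as intended for downstream cloning; a codon optimized insert would be expected to return no hits.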

Example 17—Cell Culture, Transfections, Next Generation Sequencing, and Base Edit Analysis

HEK293T cells were grown and passaged in Dulbecco's Modified Eagle's Medium plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco) at 37° C. with 5% CO₂. 2.5×10⁴ cells were seeded on 96-well cell culture plates treated for cell attachment (Costar) and grown for 20 to 24 h, and the spent media were replaced with fresh media immediately before transfection. 300 ng of expression plasmid and 1 μL of Lipofectamine 2000 (ThermoFisher Scientific) were used for transfection per well per manufacturer's instructions. Transfected cells were grown for 3 days and harvested, and gDNA was extracted with QuickExtract (Lucigen) per manufacturer's instructions. Targeted regions for base edits were amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs) with primers (SEQ ID NOs: 690-707, 865-872, and 932-961) and extracted DNA as the template. PCR products were purified by HighPrep PCR Clean-up System (MAGBIO) per manufacturer's instructions. To analyze base substitutions of adenine base editors, adapters used for next generation sequencing (NGS) were appended to PCR products by subsequent PCR reactions using the KAPA HiFi HotStart ReadyMix PCR Kit (Roche) and primers compatible with TruSeq DNA Library Prep Kits (Illumina). DNA concentrations of the resulting products were quantified by TapeStation (Agilent), and samples were pooled together to prepare the library for NGS analysis. The resulting library was quantified by qPCR with the Aria Real-time PCR System (Agilent), and high-throughput sequencing was performed with an Illumina MiSeq instrument per manufacturer's instructions. Sequencing data were analyzed for base edits by CRISPResso2.
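The quantity reported from such an analysis, the percentage of reads carrying a substitution at each editable position, can be illustrated with a toy calculation. The sketch below is not CRISPResso2 and does not reproduce its alignment or filtering; it assumes reads that are already aligned to the reference amplicon without gaps, and all names and sequences are hypothetical.

```python
from collections import Counter


def conversion_by_position(reference, reads, ref_base="A", edited_base="G"):
    """Percent of reads showing edited_base at each ref_base position.

    Reads are assumed to be pre-aligned, ungapped, and the same length as
    the reference; reads of any other length are skipped.
    """
    reference = reference.upper()
    usable = [r.upper() for r in reads if len(r) == len(reference)]
    result = {}
    for pos, base in enumerate(reference):
        if base != ref_base:
            continue
        counts = Counter(read[pos] for read in usable)
        total = sum(counts.values())
        if total:
            result[pos] = 100.0 * counts[edited_base] / total
    return result


# Toy amplicon and reads (illustrative only).
toy_reference = "GATTACA"
toy_reads = ["GATTACA", "GGTTACA", "GGTTACA", "GATTACG"]
a_to_g = conversion_by_position(toy_reference, toy_reads)
print(a_to_g)
```

Passing ref_base="C" and edited_base="T" gives the corresponding per-position summary for a cytosine base editor.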

Example 18—In Vitro Deaminase In-Gel Assay

Linear DNA constructs containing the cytidine deaminases were amplified via PCR from the previously mentioned plasmids from Twist. All constructs were cleaned via SPRI Cleanup (Lucigen) and eluted in a 10 mM Tris buffer. Enzymes were expressed from the PCR templates in an in vitro transcription-translation system, PURExpress (NEB), at 37° C. for 2 hours. Deamination reactions were prepared by mixing 2 μL of the PURExpress reaction with 2 μM 5′-FAM labeled ssDNA (IDT) and 1 U USER Enzyme (NEB) in 1× CutSmart Buffer (NEB). The reactions were incubated at 37° C. for 2 hours and then quenched by adding 4 units of proteinase K (NEB) and incubating at 55° C. for 10 minutes. The reactions were further treated by adding 11 μL of 2× RNA loading dye and incubating at 75° C. for 10 minutes. All reaction conditions were analyzed by gel electrophoresis in a 10% denaturing gel (Bio-Rad). DNA bands were visualized with a ChemiDoc imager (Bio-Rad) and band intensities were quantified using Bio-Rad Image Lab v6.0. Successful deamination is observed as a 10 bp fluorescently labeled band in the gel (FIG. 20). The results indicated that MG93-3 through MG93-7, MG93-11, MG138-17, MG138-20, MG138-23, MG139-12, and MG139-19 through MG139-21 were capable of deaminating cytidine-containing substrates.
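Because USER Enzyme cleaves the labeled strand only where deamination has produced a uracil, the share of signal in the shorter product band gives an estimate of the fraction of substrate deaminated. The sketch below shows one plausible way to compute this from two band intensities; the formula, function name, and example numbers are illustrative assumptions, and this example does not state how the Image Lab values were converted to percentages.

```python
def percent_deaminated(product_intensity, substrate_intensity):
    """Percent of labeled DNA in the cleaved product band.

    Both inputs are background-subtracted band intensities from the same
    lane (arbitrary units). Assumes the FAM label stays on the product.
    """
    total = product_intensity + substrate_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * product_intensity / total


# Hypothetical lane: 30 units in the product band, 70 in the uncut substrate.
example = percent_deaminated(30, 70)
print(example)
```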

The in vitro activity of more than 90 novel cytidine deaminases on a ssDNA substrate containing cytosine in all four possible 5′-NC contexts was measured (FIG. 23). 38 of these cytidine deaminases displayed ssDNA deamination activity, including five that are capable of substantially total deamination of the target cytidine (MG139-84/SEQ ID NO:808, MG139-86/SEQ ID NO:810, MG139-87/SEQ ID NO:811, MG139-95/SEQ ID NO:819, and MG139-102/SEQ ID NO:826, see e.g. FIG. 23). Additionally, some of the deaminases showed greater than 50% deamination of the target cytosine (MG139-30/SEQ ID NO:752, MG139-55/SEQ ID NO:777, MG139-99/SEQ ID NO:823). While most of the reported DNA cytidine deaminases operate predominantly on ssDNA, often with a preference for the base immediately 5′ of the substrate C, a related dsDNA substrate was also included as a control (FIG. 24), verifying that MG139-86 and MG139-87 are also capable of deaminating dsDNA substrates.

Example 19—NGS-Based Deep Deamination In Vitro Assay

We created an ssDNA library with a single target C to determine cytosine deaminase activity and binding location preference. Briefly, an ssDNA substrate oligonucleotide comprising 5′-NNNCNNN flanked by two 21-nt regions comprising adenine, an upstream 20-nt randomized barcode, and two conserved primer binding sites was synthesized (Integrated DNA Technologies).

This yielded an oligonucleotide pool with 4096 unique substrate sequences (4⁶, from the six randomized positions flanking the target C). Unique barcodes were included on each oligo to determine the original variable region post-sequencing in case of non-target C deamination events. First, deaminases were expressed from the PCR templates in an in vitro transcription-translation system, PURExpress (NEB), at 37° C. for 2 hours. The PURExpress reaction was then incubated with 0.5 pmol of the substrate oligonucleotide pool for 1 h at 37° C. in 50 mM Tris, pH 7.5, 75 mM NaCl.

A. Half of the treated pool was amplified using the Accel-NGS 1S Plus kit (Swift) to create a dsDNA pool. This pool was then further amplified with unique dual indexes and sequenced on a MiSeq for >15,000 reads per sample.

B. Half of the treated pool was annealed to an appropriate 3′-barcoded adaptor (IDT) and treated with T4 DNA polymerase at 12° C. for 20 min to create a dsDNA pool. Using the conserved regions, this pool was amplified with unique dual indexes (IDT) and sequenced on a MiSeq for >15,000 reads per sample.

Example 20—Lentivirus Production and Transduction

HEK293T cells were grown and passaged in Dulbecco's Modified Eagle's Medium plus GlutaMAX (gibco) supplemented with 10% (v/v) fetal bovine serum (gibco) at 37° C. with 5% CO₂. The day before transfection, cells were seeded at 5×10⁶ per dish. The day of transfection, 8 μg of PsPax, 1 μg of pMD2-G, and 9 μg of plasmid containing the cytidine deaminase fused with MG3-6 or Cas9 were mixed together and packaged into Mirus LT1 transfection reagent (Mirus Bio). The mixture was transfected into HEK293T cells. Lentiviruses were collected 3 days post-transfection, filtered through a 0.4 μm filter, and immediately used for transducing cells. Transduction occurs by adding 12 volume of virus containing supernatant to cells with 8 μg/mL of polybrene.

Example 21—Adenine and Cytidine Base Editors in E. coli and Mammalian Cells

To demonstrate that MG34-1, a small type II CRISPR nuclease, can be used as a base editor, a construct comprising TadA*(8.17m)-nMG34-1 (ABE-MG34-1, SEQ ID NO: 727), where TadA*(8.17m) is an engineered TadA from E. coli, and a construct comprising rAPOBEC1-nMG34-1-UGI (PBS) (CBE-MG34-1, SEQ ID NO: 739), where rAPOBEC1 is rat APOBEC1 and UGI (PBS) is the uracil glycosylase inhibitor of Bacillus subtilis bacteriophage, were generated. TadA*(8.17m)-nSpCas9 (SEQ ID NO: 728) and rAPOBEC1-nSpCas9-UGI (PBS) (SEQ ID NO: 740) were generated as positive controls for editing profile analysis. Four guides that target the lacZ gene in E. coli (SEQ ID NOs: 729-736) were designed and prepared for each base editor construct. Plasmids were transformed into BL21(DE3), recovered in recovery media at 37° C. for 1 h, and cells were plated on LB agar plates containing 100 μg/mL carbenicillin and 0.1 mM IPTG. After growing cells at 37° C. for 16 to 20 h, colony PCR was used to amplify the targeted regions in the E. coli genome, and the resulting products were analyzed by Sanger sequencing at Elim BIOPHARM (FIGS. 22A-22C). Sequencing results indicated that both ABE-MG34-1 and CBE-MG34-1 edited target loci in the E. coli genome at levels and within editing windows comparable to the positive control SpCas9 base editors (FIGS. 22A and 22B). Further, TadA*(8.17m)-nMG34-1 showed higher base substitution on two targeted loci. ABE-MG34-1 also displayed base editing in human cells with up to 22% editing efficiency across three different genomic targets (FIG. 22C).

To determine whether the SMART HNH endonuclease-associated RNA and ORF (HEARO) enzymes can be used as base editors, an ABE was constructed by fusing a TadA*-(7.10) deaminase monomer to the C-terminus of an engineered MG35-1 containing a D59A mutation (FIG. 22E). The A to G editing of this ABE was tested in a positive selection single-plasmid E. coli system in which the ABE is required to revert a chloramphenicol acetyltransferase (CAT) gene containing a Y193 mutation back to H193 to survive chloramphenicol selection (FIG. 22D). This plasmid contains a sgRNA with a spacer either targeting the mutant CAT gene or a scrambled, non-targeting spacer region (control). An enrichment of colonies was detected with E. coli transformed with the ABE-MG35-1 targeting the CAT gene when grown on plates containing 2, 3, and 4 μg/mL of chloramphenicol, while no colonies grew on the plate containing 8 μg/mL of chloramphenicol (FIG. 22E). Sanger sequencing confirmed that 26 of 30 colonies picked from the 2, 3, and 4 μg/mL plates transformed with the target spacer contained the expected Y193H reversion (Table 11 and FIG. 31).

TABLE 11

E. coli survival assay with ABE-MG35-1

| Chloramphenicol (μg/mL) | Edited colonies, target spacer | Edited colonies, non-target spacer |
|---|---|---|
| 0 | 1/10 | 0/10 |
| 2-4 | 26/30 | No colonies |
| 8 | No colonies | No colonies |

Colonies grown on plates containing chloramphenicol concentrations of 0, 2, 3, and 4 μg/mL were sequenced to confirm reversion of the CAT gene function. Experiments were performed as n = 2.

It is understood that the four colonies without the reverted CAT sequence contain more unedited than edited copies of the selection construct, as a single reverted CAT gene is sufficient to confer colony survival. No colonies were seen on the 2, 3, 4, and 8 μg/mL plates for E. coli cells transformed with the non-targeting spacer. While the 0 μg/mL condition was used as a transformation control, 1 of 10 colonies picked from the 0 μg/mL plate for cells transformed with the targeting spacer contained the Y193H reversion, indicating a detectable level of editing without chloramphenicol selection. However, the colony growth enrichment under chloramphenicol selection for the targeting ABE-MG35-1 condition confirmed that the MG35-1 nickase is a successful component for base editing. At 623 aa long, the ABE-MG35-1 represents the smallest, nickase-based adenine base editor to date (Table 12).

TABLE 12

Size comparison of SMART nucleases vs. references

| Enzyme | Length (aa) | ABE length* (aa) | CBE length (aa) |
|---|---|---|---|
| MG34-1 | 748 | 969 | 1104 |
| MG35-1 | 429 | 623 | - |
| SpCas9 | 1376 | 1588 | 1723 |
| CasMINI (type V) | 529 | - | - |

Base editor (ABE and CBE) size is approximated based on linkers and number of NLS signals added.

*For ABE, size was estimated with one TadA monomer.

Example 22—Adenine Base Editor in Mammalian Cells

In a previous experiment, MG68-4v1 (predicted as a tRNA adenosine deaminase) was able to convert adenine to guanine, resulting in bacterial survival under chloramphenicol selection. Next, two base editors fusing the deaminase to a nickase, MG68-4v1-nMG34-1 and MG68-4v1-nSpCas9, were constructed. As a positive control for deaminase activity, TadA*(8.8m), an active variant engineered by Gaudelli et al., was used to create TadA*(8.8m)-nMG34-1. To ensure that the genomic loci are accessible to base editors, we selected guides that have shown activity for SpCas9 in mammalian cells. Out of 9 sites tested, MG68-4v1-nMG34-1 showed 11.3% editing efficiency at position 8 of site 2. When MG68-4v1 was fused to nSpCas9, the base editor exhibited 22.3% efficiency at position 5 of site 1 and 4.4% efficiency at position 6 of site 8. Replacing MG68-4v1 with TadA*(8.8m) in MG68-4v1-nMG34-1 yielded 7.3% and 9.7% editing at positions 5 and 7 of site 1, respectively. The efficiencies increased to 16.5% and 19.5% at positions 6 and 8 of site 2, respectively. In addition, 4.1% and 3.4% editing were observed at positions 7 and 8, respectively, when targeting site 7. Taken together, these results indicate that MG68-4v1 and nMG34-1 demonstrate base editing activity in mammalian cells (FIG. 21).

Example 23—Activity in Mammalian Cells (Cytidine Deaminase Assay in Tissue Culture Cells) (Prophetic)

The cytidine deaminase assay in cells is designed so that when the mutated start codon ACG is converted to ATG by a cytidine deaminase, cells can translate the blasticidin resistance gene and therefore acquire resistance to this antibiotic. Upon transducing a reporter cell line (ACG-containing cell) with a library of cytidine deaminases fused to Cas9 or MG3-6, it is expected that a fraction of cells will mutate the ACG to ATG and therefore gain resistance to blasticidin. Cells that have acquired such resistance and thus survive the selection assay are later subjected to next generation sequencing (NGS) to reveal the identity of the successful cytidine deaminases displaying cytidine base editor activity.
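The ACG-to-ATG conversion that the reporter relies on can be illustrated with a minimal sketch. It assumes a simplified model in which deamination of C is read out directly as T; the function name is hypothetical.

```python
def deaminate(seq, position):
    """Model C-to-T editing at a 0-based position (C -> U, read as T)."""
    if seq[position] != "C":
        raise ValueError("target base is not a cytosine")
    return seq[:position] + "T" + seq[position + 1:]

reporter_codon = "ACG"                    # codon present in the reporter before editing
edited_codon = deaminate(reporter_codon, 1)
# Only the edited codon permits translation of the resistance gene in this model.
print(edited_codon)
```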

Example 24—Mammalian Constructs for Cytosine Base Editors (CBEs)

Plasmids for CBEs using the nickase forms of spCas9, MG3-6, and MG34-1 were constructed using NEB HiFi assembly mix and DNA fragments containing the novel cytidine deaminases, the nuclease enzymes, and UNG sequence. For constructs containing spCas9, pAL318 was digested with the NotI and XmaI restriction enzymes. For constructs containing MG3-6, pAL320 was digested with the NcoI restriction enzyme. For constructs containing MG34-1, pAL226 was digested with the NotI and BamHI restriction enzymes.

For experiments targeting the engineered cell line (SEQ ID NO. 962), CDAs were fused with MG3-6 nickase. For cloning CDA constructs in the MG3-6 nickase backbone, CDAs were ordered as gene fragments from Twist and digested with SphI and BmtI. The plasmid backbone containing MG3-6 was digested with SphI and BmtI, and the gene fragments were ligated using T4 DNA ligase. The plasmid backbone contains a mU6 promoter for cloning gRNAs targeting the engineered sites. The spacers targeting the engineered sites using MG3-6 are shown in SEQ ID NOs. 963-967.

CBEs were constructed using various combinations of cytidine deaminases, nickase effectors, and uracil glycosylase inhibitors (FIGS. 25A-25C). Overall, 14 cytidine deaminases (13 novel cytidine deaminases (MG139-12 (SEQ ID NO. 970), MG93-3 (SEQ ID NO. 971), MG93-4 (SEQ ID NO. 972), MG93-5 (SEQ ID NO. 973), MG93-6 (SEQ ID NO. 974), MG93-7 (SEQ ID NO. 975), MG93-9 (SEQ ID NO. 976), MG93-11 (SEQ ID NO. 977), MG138-17 (SEQ ID NO. 978), MG138-20 (SEQ ID NO. 979), MG138-23 (SEQ ID NO. 980), MG138-32 (SEQ ID NO. 981), and MG142-1 (SEQ ID NO. 982)) that were shown to be active in vitro, plus the A0A2K5RDN7 cytidine deaminase) were each fused with 3 effectors (spCas9 (SEQ ID NOs. 877-889 and 968), MG3-6 (SEQ ID NOs. 890-902 and 969), or MG34-1 (SEQ ID NOs. 903-916)) to generate 42 distinct CBEs. Fusions containing spCas9 were fused with a C-terminal UGI, and fusions containing MG3-6 or MG34-1 were fused with a C-terminal MG69-1 UGI. Each CBE was tested with 5 sgRNAs (spCas9 (SEQ ID NOs. 917-921), MG3-6 (SEQ ID NOs. 922-926), or MG34-1 (SEQ ID NOs. 927-931)) targeting the HEK293 genome. Editing levels (C to T (%)) are shown for all cytosines within 5 bp of the spacer region. Numerous CBEs showed detectable editing levels when transiently transfected into HEK293 cells. When fused to spCas9, both MG93-4 and MG138-20 exceeded 5% editing at certain sites, with MG93-3, MG93-7, and A0A2K5RDN7 exceeding 10% editing. When fused to MG3-6, MG93-4 and A0A2K5RDN7 exceeded 5% editing at certain sites. When fused to MG34-1, MG93-4, MG93-6, and MG93-9 exceeded 5% editing at certain sites, MG93-3, MG93-7, and MG139-12 exceeded 10% editing, and MG93-11 and A0A2K5RDN7 exceeded 20% editing. Numerous novel cytidine deaminases have thus been identified that are compatible with spCas9, MG3-6, and MG34-1 and are able to deaminate cytosines in mammalian cells.
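The combinatorial design described above (14 deaminases crossed with 3 effectors, each tested with 5 sgRNAs) can be sketched as follows. This is an illustrative enumeration only; the variable names and tuple layout are assumptions, and the UGI assignment mirrors the pairing stated in the text.

```python
from itertools import product

deaminases = [
    "MG139-12", "MG93-3", "MG93-4", "MG93-5", "MG93-6", "MG93-7", "MG93-9",
    "MG93-11", "MG138-17", "MG138-20", "MG138-23", "MG138-32", "MG142-1",
    "A0A2K5RDN7",
]
effectors = ["spCas9", "MG3-6", "MG34-1"]

# C-terminal inhibitor paired with each effector, as stated in the text.
ugi_for_effector = {"spCas9": "UGI", "MG3-6": "MG69-1 UGI", "MG34-1": "MG69-1 UGI"}

designs = [(cda, eff, ugi_for_effector[eff]) for cda, eff in product(deaminases, effectors)]
sgrnas_per_cbe = 5
conditions = len(designs) * sgrnas_per_cbe  # one transfection per CBE/sgRNA pair

print(len(designs), conditions)
```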

In order to test the novel CDAs and assay for −1 nucleotide preferences, the CDAs were fused to MG3-6 and targeted to a reporter cell line with 5 engineered PAMs in tandem (SEQ ID NO. 962). 14 CDAs were tested using this system, and many showed >1% editing (Panel (a) of FIG. 26). The highest activity observed for a novel CDA fused to MG3-6 was 38.4% for MG152-6, with the second highest being 17.6% for MG139-52. Their relative activity in comparison to A0A2K5RDN7 is shown in Panel (b) of FIG. 26. Interestingly, it was also observed that the highly active MG139-52 might deaminate the DNA strand that is part of the DNA/RNA heteroduplex in the R-loop (as well as the ssDNA); an example of this is shown in Panel (c) of FIG. 26. This activity (DNA deamination when the DNA is in a DNA/RNA heteroduplex) may substantially improve off-target effects as well as the editing window, both of which may be beneficial for cytotoxicity.

Example 25—Cytosine Base Editors Toxicity in Mammalian Cells

HEK293T cells were transduced with lentiviruses carrying newly discovered CDAs fused to MG3-6. Successful transformants were selected using 2 μg/mL of puromycin for 3 days. Dead cells were washed away with PBS, and surviving cells were fixed and stained with 50% methanol and 1% crystal violet (Panel (a) of FIG. 27). Cells were then photographed in a Chemi-Doc imager, and the absorbance was measured by dissolving the crystal violet in 1% SDS and taking measurements at 570 nm (Panel (b) of FIG. 27).

The highly active CDA A0A2K5RDN7 shows high editing efficiency, but it also exhibits a high degree of cell toxicity (Panel (a) of FIG. 27). The deaminases were assayed as base editors (fused to MG3-6) and stably expressed in HEK293T cells. MG93-3 and MG93-4 both showed much less cellular toxicity than A0A2K5RDN7. Quantification of the toxicity assay (Panel (b) of FIG. 27) shows that MG93-3 and MG93-4 are less toxic than rAPOBEC.

Example 26—Directed Evolution of Adenosine Deaminase in E. coli

MG68-4 harboring a D109N mutation can improve DNA editing efficiency in E. coli. For simplicity, this variant was designated r1v1. To further improve the efficiency for editing in mammalian cells, the deaminase portion of MG68-4 (D109N)-nMG34-1 was randomly mutagenized by error prone PCR. The resulting library was tested for the editing activity of variants by an E. coli positive selection using chloramphenicol acetyltransferase with H193Y mutation.

To perform this experiment, the gene fragment of MG68-4 (D109N) was mutagenized using the GeneMorph II Random Mutagenesis kit according to the manufacturer's instructions. In general, 500 ng of DNA template was used, and 20 cycles of PCR were carried out to obtain a mutation frequency ranging from 0 to 4.5 mutations/kb. The vector pAL478 carrying nMG34-1, CAT (H193Y), and a single guide expression cassette was linearized by SacII and KpnI digestion. PCR products from random mutagenesis were then cloned into the linearized vector using the NEBuilder HiFi DNA assembly kit. The assembled product was transformed into BL21(DE3) (Lucigen), recovered with recovery media, and plated on LB agar plates containing 100 μg/mL carbenicillin, 0.1 mM IPTG, and chloramphenicol at concentrations of 2, 4, and 8 μg/mL. After bacterial selection, 260 colonies from plates of 4 and 8 μg/mL chloramphenicol were picked and sequenced by Sanger sequencing at Elim Biopharmaceuticals. Colonies carrying point mutations on MG68-4 (D109N) were grown in 96-well deep well plates and pooled together. Plasmids of these cells were isolated using QIAprep Spin Miniprep Kit (Qiagen), and MG68-4 variants were subcloned into pAL478 by digestion and ligation using restriction enzymes (SacII and KpnI) and T4 DNA ligase, respectively. The resulting library was transformed into Endura electrocompetent cells (Lucigen), amplified, and isolated by miniprep. Collected DNA was transformed into BL21(DE3) and tested for deaminase activity using chloramphenicol selection at concentrations of 2, 16, 32, 64, and 128 μg/mL. 128 colonies (which were understood to contain mutations that facilitated deaminase activity of the MG68 enzyme and survival under chloramphenicol selection) from plates of 32, 64, and 128 μg/mL chloramphenicol were picked and sequenced by Sanger sequencing.

A total of 25 variants (r2v1 to r2v24 (SEQ ID NOs. 837-860)) were uncovered, and mutations were confirmed by Sanger sequencing. Through this evolution process, 24 residues were identified that were mutated to other amino acids (FIG. 28). These mutants contained mutations at T2 (e.g. T2A), D7 (e.g. D7G), E10 (e.g. E10G), M13 (e.g. M13R), W24 (e.g. W24G), G32 (e.g. G32A), K38 (e.g. K38E), G45 (e.g. G45D), G51 (e.g. G51V), A63 (e.g. A63S), E66 (e.g. E66V or E66D), R75 (e.g. R75H), C91 (e.g. C91R), G93 (e.g. G93W), H97 (e.g. H97Y or H97L), A107 (e.g. A107V), E108 (e.g. E108D), D109 (e.g. D109N), P110 (e.g. P110H), H124 (e.g. H124Y), A126 (e.g. A126D), H129 (e.g. H129R or H129N), F150 (e.g. F150P or F150S), and S165 (e.g. S165L).

Example 27—Adenine Base Editors in Mammalian Cells

Variants of adenine base editors identified from the E. coli selection in Example 26 were codon-optimized for mammalian cell expression and tested in HEK293T cells. Four guides were designed to test A to G conversion in cells (SEQ ID NOs. 861-864 for spacers and SEQ ID NO. 876 for the MG34-1 guide scaffold). 11 variants (r2v3, r2v5, r2v7, r2v8, r2v11, r2v12, r2v13, r2v14, r2v15, r2v16, and r2v23 (SEQ ID NOs. 839, 841, 843, 844, 847, 848, 849, 850, 851, 852, and 859)) outperformed r1v1 in the first three guides screened. When the mutations were displayed on the predicted structure of MG68-4, it was found that five residues (W24, G51, E108, P110, and F150) surrounding the active site were changed. Notably, r2v7 (D7G and E10G (SEQ ID NO. 843)) and r2v16 (H129N (SEQ ID NO. 852)), while containing mutations away from the active site, displayed greater improvement of editing efficiencies than other mutations (FIG. 29). With this round of screening, editing efficiency increased from 2.8% for r1v1 to 7.9% for r2v7 and to 9.09% for r2v16 when guide 2 was used (FIG. 30).
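The guide 2 improvement over r1v1 can be expressed as a fold change. The short sketch below uses only the three efficiencies quoted above; the dictionary layout and names are illustrative assumptions.

```python
# Guide 2 A-to-G editing efficiencies (%) quoted in the text.
efficiency = {"r1v1": 2.8, "r2v7": 7.9, "r2v16": 9.09}

baseline = efficiency["r1v1"]
fold_change = {
    name: value / baseline
    for name, value in efficiency.items()
    if name != "r1v1"
}

for name, fold in fold_change.items():
    print(f"{name}: {fold:.2f}-fold over r1v1")
```

Under these numbers, r2v7 and r2v16 correspond to roughly 2.8-fold and 3.2-fold improvements, respectively.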

Example 28—Deaminase Activity on ssRNA (Prophetic)

This protocol was adapted from Wolfe et al. (NAR Cancer, 2020, Vol. 2, No. 4, doi: 10.1093/narcan/zcaa027). Linear DNA constructs containing the CDA and A1CF, a cofactor, are amplified from constructs prepared by Twist (SEQ ID NO. 741) using the same primers developed for the in-gel assay on ssDNA. Constructs are cleaned by PCR Spin Column Cleanup (Qiagen) and analyzed by gel electrophoresis. Enzymes are expressed from the PCR templates in an in vitro transcription-translation system, PURExpress (NEB), at 37° C. for 2.5 hours. Deamination reactions are prepared by mixing 2 μL of the PURExpress reaction (CDA and A1CF) with 2 μM ssRNA substrate (IDT, SEQ ID NO. 742) in the presence of an RNAse inhibitor and incubating at 37° C. for 2 hours. 5′ FAM labeled DNA primer (IDT, SEQ ID NO. 743) is then added to a concentration of 1.3 μM. The reaction is heated at 95° C. for 10 minutes and then allowed to cool gradually to room temperature for at least 30 minutes. Then, a reverse transcription mastermix comprising 5 mM DTT, Protoscript II RT (NEB) (5 U/μL), Protoscript II Buffer (NEB) (1×), RNAseOut (ThermoFisher) (0.4 U/μL), dTTP (0.25 mM), dCTP (0.25 mM), dATP (0.25 mM), and ddGTP (5 mM) is added. A full-length reverse transcription product is produced when the RNA substrate is deaminated. In contrast, when there is no deamination, a “C” will remain in the RNA substrate, and the reverse transcription reaction will terminate upon incorporation of ddGTP opposite this C. The reaction is incubated at 42° C. for one hour, and then at 65° C. for 10 minutes. Aliquots are then mixed with 2×RNA loading dye (NEB) and heated at 75° C. for 10 minutes, then cooled on ice for two minutes. Samples are loaded onto 10% or 15% Urea-TBE denaturing gels (Biorad). DNA bands are visualized by a Chemi-Doc imager (Biorad). Successful deamination is observed by the visualization of a full-length (55 bp) fluorescently labeled band in the gel. Non-deaminated products appear as shorter (43 bp) fluorescently labeled bands.
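The ddGTP chain-termination readout can be illustrated with a toy model. The sketch below assumes a short hypothetical RNA template (not SEQ ID NO. 742) with a single C upstream of the primer site, so the product lengths differ from the 55 and 43 bp bands described above; all names are illustrative.

```python
# Nucleotide incorporated by the reverse transcriptase opposite each RNA base.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def extension_product_length(template_5to3, primer_len):
    """Length of the primer-extension product when G is supplied only as ddGTP."""
    # The primer anneals to the 3' end; the enzyme copies the rest 3'->5'.
    to_copy = template_5to3[: len(template_5to3) - primer_len][::-1]
    length = primer_len
    for base in to_copy:
        length += 1
        if COMPLEMENT[base] == "G":
            # ddGTP is incorporated opposite an undeaminated C and terminates the chain.
            break
    return length

PRIMER_LEN = 8                              # hypothetical primer-binding site length
unedited = "AUAAGCAUUA" + "GGAUUAGG"        # hypothetical substrate with one target C
edited = unedited.replace("C", "U", 1)      # deaminated substrate (C -> U)

print(extension_product_length(unedited, PRIMER_LEN))
print(extension_product_length(edited, PRIMER_LEN))
```

In this model the deaminated template yields a full-length product, while the unedited template yields a truncated product ending at the target position.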

Example 29—Increased Cytosine Base Editing Efficiency Upon Fam72a Expression

Fam72a has been documented as opposing uracil DNA glycosylase (UDG) during B cell somatic hypermutation and class-switch recombination to prevent mismatch-repair-based correction of mutated immunoglobulin alleles. Expression of Fam72a during engineered cytosine base editing may suppress UDG activity and thereby increase the targeted conversion of C into T.

HEK293 cells (150,000) were lipofected using JetOptimus according to the manufacturer's instructions with a plasmid encoding a Cas9-CBE fusion (pMG3078; 500 ng) and a plasmid encoding either sgRNA PE266 or PE691 (250 ng), with or without a plasmid encoding Fam72a (pMG3072; 500 ng). Cells were harvested 72 hours post-transfection, genomic DNA was prepared, and the degree of base editing was determined via computational analysis of next-generation sequencing reads (FIG. 32). CBE activity increased at two loci when the CMV-driven Fam72a expression construct was co-expressed with a Cas9-based cytosine base editor. It was determined that Fam72a can be useful to improve cytosine base editing (CBE) with any type of cytosine base editor, not just Cas9-based constructs.

Example 30—Structural Optimization of Adenine Base Editors

33 rationally-designed ABE variants were constructed for use in mammalian cells under control of a CMV promoter (SEQ ID NOs: 1128-1160). Eight constructs contained ABEs with an MG68-4 (D109N) adenine deaminase fused to either the N- or C-terminus of an MG3-6/3-8 nickase enzyme (D13A) with linker lengths of 20, 36, 48, and 62 amino acid residues. Additionally, 25 constructs contained ABEs with an MG68-4 (D109N) adenine deaminase inlaid within the RUVC-I, REC, HNH, RUVC-III, or WED domains with 18 amino acid linkers fused to either end. These constructs are summarized in Table 12A.

TABLE 12A

Rationally-designed ABE Variants from Example 30

| SEQ ID NO: | Description | Fusion/Inlaid position* | MG3-6/3-8 Domain Containing Inlaid MG68-4 |
|---|---|---|---|
| 1128 | 3-68_DIV1_M_RDr1v1_B | N-term 36AA linker | N-terminal fusion |
| 1129 | 3-68_DIV2_M_RDr1v1_B | N-term 48AA linker | N-terminal fusion |
| 1130 | 3-68_DIV3_M_RDr1v1_B | N-term 62AA linker | N-terminal fusion |
| 1131 | 3-68_DIV4_M_RDr1v1_B | N-term 20AA linker | N-terminal fusion |
| 1132 | 3-68_DIV5_M_RDr1v1_B | C-term 36AA linker | C-terminal fusion |
| 1133 | 3-68_DIV6_M_RDr1v1_B | C-term 48AA linker | C-terminal fusion |
| 1134 | 3-68_DIV7_M_RDr1v1_B | C-term 62AA linker | C-terminal fusion |
| 1135 | 3-68_DIV8_M_RDr1v1_B | C-term 20AA linker | C-terminal fusion |
| 1136 | 3-68_DIV9_M_RDr1v1_B | Inlaid 26AA | RUVC-I |
| 1137 | 3-68_DIV10_M_RDr1v1_B | Inlaid 202AA | REC |
| 1138 | 3-68_DIV11_M_RDr1v1_B | Inlaid 262AA | REC |
| 1139 | 3-68_DIV12_M_RDr1v1_B | Inlaid 297AA | REC |
| 1140 | 3-68_DIV13_M_RDr1v1_B | Inlaid 335AA | REC |
| 1141 | 3-68_DIV14_M_RDr1v1_B | Inlaid 409AA | REC |
| 1142 | 3-68_DIV15_M_RDr1v1_B | Inlaid 537AA | Between Linker 1 and HNH |
| 1143 | 3-68_DIV16_M_RDr1v1_B | Inlaid 550AA | HNH |
| 1144 | 3-68_DIV17_M_RDr1v1_B | Inlaid 575AA | HNH |
| 1145 | 3-68_DIV18_M_RDr1v1_B | Inlaid 591AA | HNH |
| 1146 | 3-68_DIV19_M_RDr1v1_B | Inlaid 615AA | HNH |
| 1147 | 3-68_DIV20_M_RDr1v1_B | Inlaid 657AA | HNH |
| 1148 | 3-68_DIV21_M_RDr1v1_B | Inlaid 661AA | HNH |
| 1149 | 3-68_DIV22_M_RDr1v1_B | Inlaid 688AA | Between Linker 2 and RUVC-III |
| 1150 | 3-68_DIV23_M_RDr1v1_B | Inlaid 696AA | RUVC-III |
| 1151 | 3-68_DIV24_M_RDr1v1_B | Inlaid 717AA | RUVC-III |
| 1152 | 3-68_DIV25_M_RDr1v1_B | Inlaid 768AA | RUVC-III |
| 1153 | 3-68_DIV26_M_RDr1v1_B | Inlaid 771AA | RUVC-III |
| 1154 | 3-68_DIV27_M_RDr1v1_B | Inlaid 775AA | RUVC-III |
| 1155 | 3-68_DIV28_M_RDr1v1_B | Inlaid 782AA | RUVC-III |
| 1156 | 3-68_DIV29_M_RDr1v1_B | Inlaid 788AA | RUVC-III |
| 1157 | 3-68_DIV30_M_RDr1v1_B | Inlaid 791AA | RUVC-III |
| 1158 | 3-68_DIV31_M_RDr1v1_B | Inlaid 836AA | Between RUVC-III and WED |
| 1159 | 3-68_DIV32_M_RDr1v1_B | Inlaid 866AA | WED |
| 1160 | 3-68_DIV33_M_RDr1v1_B | Inlaid 887AA | WED |

*Inlaid denotes the upstream native residue after which the deaminase is inserted. For example, “Inlaid 887AA” indicates that the deaminase is inlaid between amino acids 887 and 888.
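The inlaid numbering convention can be sketched in code: the deaminase, flanked by a linker on each side, is placed immediately after the stated native residue. This is a minimal illustration with hypothetical toy sequences and a hypothetical two-residue linker; the actual nickase, deaminase, and 18 amino acid linker sequences are those of the referenced SEQ ID NOs and are not reproduced here.

```python
def inlay(host, insert, after_residue, linker):
    """Insert linker + insert + linker after the 1-based residue `after_residue`."""
    if not 0 < after_residue < len(host):
        raise ValueError("insertion point must lie inside the host sequence")
    return host[:after_residue] + linker + insert + linker + host[after_residue:]

# Hypothetical toy sequences for illustration only.
host = "MKRTAVLQEG"      # stands in for the nickase
deaminase = "DDD"        # stands in for the deaminase
linker = "GS"            # stands in for the linker on each side

chimera = inlay(host, deaminase, after_residue=4, linker=linker)
print(chimera)
```

With `after_residue=4`, the insert sits between native residues 4 and 5, mirroring the way "Inlaid 887AA" places the deaminase between residues 887 and 888.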

Plasmids expressing the 33 ABE variants were separately transiently co-transfected into HEK293 cells with plasmids expressing 8 sgRNAs (SEQ ID NOs: 1188-1195) targeting a specific locus in the human genome. After 72 hours, cells were harvested and analyzed for on-target editing (FIG. 36 and Table 12B).

TABLE 12B

Frequency of base editing detected for the HEK293T editing experiment of Example 30

| Construct | Insertion Site | Linker Length | A1 (A to G %) | A3 (A to G %) | A5 (A to G %) | A7 (A to G %) | A8 (A to G %) | A9 (A to G %) | A10 (A to G %) | A18 (A to G %) | A20 (A to G %) | A22 (A to G %) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3-68_DIV1_M_RDr1v1_B | N-terminal insertion | 36AA linker | 0.1 | 0.005 | 0.655 | 0.05 | 0.465 | 0.24 | 0.65 | 0.03 | 0.1 | 0.03 |
| 3-68_DIV2_M_RDr1v1_B | N-terminal insertion | 48AA linker | 0.045 | 0.01 | 1.185 | 0.325 | 0.76 | 0.5 | 1.325 | 0.035 | 0.085 | 0.01 |
| 3-68_DIV3_M_RDr1v1_B | N-terminal insertion | 62AA linker | 0.03 | 0.02 | 1.315 | 0.22 | 0.575 | 0.19 | 1.56 | 0.05 | 0.09 | 0.03 |
| 3-68_DIV4_M_RDr1v1_B | N-terminal insertion | 20AA linker | | | | | | | | | | |
| 3-68_DIV5_M_RDr1v1_B | C-terminal insertion | 36AA linker | 0.04 | 0.015 | 0.08 | 0.045 | 0.095 | 0.32 | 1.86 | 0.035 | 0.075 | 0.025 |
| 3-68_DIV6_M_RDr1v1_B | C-terminal insertion | 48AA linker | 0.03 | 0.015 | 0.39 | 0.05 | 0.215 | 0.655 | 4.065 | 0.04 | 0.095 | 0.025 |
| 3-68_DIV7_M_RDr1v1_B | C-terminal insertion | 62AA linker | 0.015 | 0.02 | 0.205 | 0.535 | 0.555 | 0.905 | 5.45 | 0.025 | 0.095 | 0.02 |
| 3-68_DIV8_M_RDr1v1_B | C-terminal insertion | 20AA linker | 0.025 | 0.015 | 0.29 | 0.125 | 0.14 | 0.14 | 1.16 | 0.05 | 0.12 | 0.03 |
| 3-68_DIV9_M_RDr1v1_B | Inlaid 26AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV10_M_RDr1v1_B | Inlaid 202AA | 18AA linker | 0.025 | 0.035 | 0.14 | 0.05 | 0.03 | 0.46 | 4.26 | 0.105 | 0.08 | 0.025 |
| 3-68_DIV11_M_RDr1v1_B | Inlaid 262AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV12_M_RDr1v1_B | Inlaid 297AA | 18AA linker | 0.01 | 0.015 | 5.86 | 2.14 | 1.635 | 2.495 | 6.18 | 0.085 | 0.123 | 0.02 |
| 3-68_DIV13_M_RDr1v1_B | Inlaid 335AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV14_M_RDr1v1_B | Inlaid 409AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV15_M_RDr1v1_B | Inlaid 537AA | 18AA linker | 0.02 | 0.015 | 0.165 | 0.04 | 0.08 | 0.1 | 0.805 | 0.03 | 0.06 | 0.06 |
| 3-68_DIV16_M_RDr1v1_B | Inlaid 550AA | 18AA linker | 0.03 | 0.015 | 0.26 | 0.12 | 0.345 | 0.345 | 2.62 | 0.025 | 0.09 | 0.015 |
| 3-68_DIV17_M_RDr1v1_B | Inlaid 575AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV18_M_RDr1v1_B | Inlaid 591AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV19_M_RDr1v1_B | Inlaid 615AA | 18AA linker | 0.04 | 0.01 | 0.075 | 0.015 | 0.075 | 0.16 | 1.05 | 0.025 | 0.095 | 0.015 |
| 3-68_DIV20_M_RDr1v1_B | Inlaid 657AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV21_M_RDr1v1_B | Inlaid 661AA | 18AA linker | 0.045 | 0.025 | 0.43 | 0.065 | 0.315 | 0.4 | 3.305 | 0.03 | 0.04 | 0.015 |
| 3-68_DIV22_M_RDr1v1_B | Inlaid 688AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV23_M_RDr1v1_B | Inlaid 696AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV24_M_RDr1v1_B | Inlaid 717AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV25_M_RDr1v1_B | Inlaid 768AA | 18AA linker | 0.135 | 0.015 | 6.395 | 1.52 | 3.595 | 4.615 | 12.8 | 0.025 | 0.045 | 0.025 |
| 3-68_DIV26_M_RDr1v1_B | Inlaid 771AA | 18AA linker | 0.275 | 0.11 | 6.855 | 1.67 | 3.81 | 4.285 | 12.98 | 0.015 | 0.035 | 0.01 |
| 3-68_DIV27_M_RDr1v1_B | Inlaid 775AA | 18AA linker | 0.09 | 0.04 | 5.87 | 1.515 | 3.245 | 4.54 | 11.65 | 0.015 | 0.075 | 0.02 |
| 3-68_DIV28_M_RDr1v1_B | Inlaid 782AA | 18AA linker | 0.105 | 0.125 | 5.84 | 1.98 | 3.68 | 4.315 | 12.705 | 0.035 | 0.08 | 0.01 |
| 3-68_DIV29_M_RDr1v1_B | Inlaid 788AA | 18AA linker | 0.15 | 0.045 | 4.57 | 1.475 | 2.07 | 2.85 | 8.215 | 0.015 | 0.065 | 0.025 |
| 3-68_DIV30_M_RDr1v1_B | Inlaid 791AA | 18AA linker | 0.32 | 0.18 | 6.545 | 2.99 | 3.44 | 4.25 | 13.295 | 0.02 | 0.07 | 0.04 |
| 3-68_DIV31_M_RDr1v1_B | Inlaid 836AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV32_M_RDr1v1_B | Inlaid 866AA | 18AA linker | | | | | | | | | | |
| 3-68_DIV33_M_RDr1v1_B | Inlaid 887AA | 18AA linker | | | | | | | | | | |
| Background | N/A | N/A | 0.015 | 0.015 | 0.005 | 0.03 | 0.025 | 0.035 | 0.245 | 0.03 | 0.075 | 0.025 |

Sequencing results showed that 19 of the 33 ABEs were capable of on-target editing at a level of at least 1% editing when co-expressed with an sgRNA targeting the TRAC locus (FIG. 33). Constructs used in this experiment included 3-68_DIV1_M_RDr1v1_B, 3-68_DIV2_M_RDr1v1_B, 3-68_DIV3_M_RDr1v1_B, 3-68_DIV4_M_RDr1v1_B, 3-68_DIV5_M_RDr1v1_B, 3-68_DIV6_M_RDr1v1_B, 3-68_DIV7_M_RDr1v1_B, 3-68_DIV8_M_RDr1v1_B, 3-68_DIV9_M_RDr1v1_B, 3-68_DIV10_M_RDr1v1_B, 3-68_DIV11_M_RDr1v1_B, 3-68_DIV12_M_RDr1v1_B, 3-68_DIV13_M_RDr1v1_B, 3-68_DIV14_M_RDr1v1_B, 3-68_DIV15_M_RDr1v1_B, 3-68_DIV16_M_RDr1v1_B, 3-68_DIV17_M_RDr1v1_B, 3-68_DIV18_M_RDr1v1_B, 3-68_DIV19_M_RDr1v1_B, 3-68_DIV20_M_RDr1v1_B, 3-68_DIV21_M_RDr1v1_B, 3-68_DIV22_M_RDr1v1_B, 3-68_DIV23_M_RDr1v1_B, 3-68_DIV24_M_RDr1v1_B, 3-68_DIV25_M_RDr1v1_B, 3-68_DIV26_M_RDr1v1_B, 3-68_DIV27_M_RDr1v1_B, 3-68_DIV28_M_RDr1v1_B, 3-68_DIV29_M_RDr1v1_B, 3-68_DIV30_M_RDr1v1_B, 3-68_DIV31_M_RDr1v1_B, 3-68_DIV32_M_RDr1v1_B, and 3-68_DIV33_M_RDr1v1_B (FIG. 36). The construct with the highest level of editing of any A residue within the spacer region was 3-68_DIV30_M_RDr1v1_B, with a maximum on-target editing rate of 13.3% (n=2) (FIG. 33). Also of note was 3-68_DIV12_M_RDr1v1_B, which displayed similar editing levels at A5 (5.86%) and A10 (6.18%), indicating that this variant may have an altered base editing window within the spacer region relative to the other active ABEs. In addition to evaluating on-target editing, the cell viability of each base editor/sgRNA co-transfection was visually assessed. Cells transfected with numerous constructs, including 3-68_DIV30_M_RDr1v1_B and 3-68_DIV12_M_RDr1v1_B, had high cell viability, whereas many cells transfected with the N- or C-terminally fused constructs had low cell viability.
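The two observations above (the peak editing rate of 3-68_DIV30_M_RDr1v1_B and the flatter A5/A10 profile of 3-68_DIV12_M_RDr1v1_B) can be reproduced from a few rows of Table 12B. The sketch below is illustrative only: it uses a hand-copied subset of the table, and the data layout and function names are assumptions.

```python
POSITIONS = ["A1", "A3", "A5", "A7", "A8", "A9", "A10", "A18", "A20", "A22"]

# A-to-G editing (%) for a subset of constructs, copied from Table 12B.
editing = {
    "3-68_DIV12_M_RDr1v1_B": [0.01, 0.015, 5.86, 2.14, 1.635, 2.495, 6.18, 0.085, 0.123, 0.02],
    "3-68_DIV26_M_RDr1v1_B": [0.275, 0.11, 6.855, 1.67, 3.81, 4.285, 12.98, 0.015, 0.035, 0.01],
    "3-68_DIV30_M_RDr1v1_B": [0.32, 0.18, 6.545, 2.99, 3.44, 4.25, 13.295, 0.02, 0.07, 0.04],
}

def peak(values):
    """Return (position, rate) for the most highly edited adenine."""
    rate = max(values)
    return POSITIONS[values.index(rate)], rate

def a5_to_a10_ratio(values):
    """Ratio of A5 to A10 editing; values near 1 suggest a flatter window."""
    return values[POSITIONS.index("A5")] / values[POSITIONS.index("A10")]

best = max(editing, key=lambda name: peak(editing[name])[1])
print(best, peak(editing[best]))
for name, values in editing.items():
    print(name, round(a5_to_a10_ratio(values), 2))
```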

Example 31—Engineering of the Adenosine Deaminase

As tRNA adenosine deaminase (TadA) from E. coli has been engineered to target DNA and improve base editing activity in mammalian cells, it was postulated that porting analogous mutations documented to improve editing in EcTadA to MG68-4 (D109N) may improve the deaminase activity. By surveying the literature, mutations of EcTadA from ABE7.10, ABE8.8m, ABE8.17m, and ABE8e were collected. The equivalent residues on MG68-4 were identified through multiple sequence alignment and structural alignment. 22 rationally designed variants on top of MG68-4 (D109N) were generated and fused to the N-terminus of MG34-1 (D10A) (SEQ ID NOs: 1161-1183). To import base editors into the nucleus, a nuclear localization signal (NLS) was incorporated at the C-terminus of the enzyme. The effect of a dual NLS system (e.g. on both N- and C-termini) on editing efficiency was evaluated (FIGS. 34A and 34B) (SEQ ID NOs: 1184-1186). Genes of base editors and guide RNAs were expressed from CMV and U6 promoters, respectively. In this experiment, single plasmids carrying the required editing components (SEQ ID NOs: 1187 and 1207) were transfected into HEK293T cells, and editing efficiencies were evaluated through NGS. The results showed that the top three performers (RD9, RD18, and RD5) achieved 27.4%, 26.6%, and 23.8% A to G conversion on A8, respectively. A 45% increase in editing efficiency was obtained when comparing RD9 (MG68-4 (D109N/T112R)) to MGA1.1 (MG68-4 (D109N)). The two-NLS design had comparable activity to the one-NLS design. MGA1.1_2NLS achieved 11.4% conversion, which is lower than the 19.2% achieved by MGA1.1 (FIG. 35).

Example 32—Engineered CBEs to Relax Sequence Selectivity of CDA at −1 Position of the Target Cytosine and Improved On-Target Activity on DNA

Two approaches were taken toward mutagenesis to improve the editing activity and selectivity of cytosine base editors (CBEs). First, as it was hypothesized that low or mid-editing efficiency and nickase-independent deamination events of wild-type CBEs may be caused by the intrinsic DNA/RNA binding affinities of the cytidine deaminase(s), mutagenesis (point mutation) of cytidine deaminases to alter intrinsic DNA/RNA affinity was considered. Second, as a loop adjacent to the active site has been identified as important for determining selectivity at the −1 position relative to the targeted cytosine in related families of base editors (loop 7, Kohli et al., J. Biol. Chem. 2009, 284, 22898-22904), experiments to swap loop 7 sequences among cytosine base editors were considered.

Utilizing structure-based homology models of APOBEC1 (Wolfe et al., NAR Cancer 2020, 2, 1-15), AID (Kohli et al., J. Biol. Chem. 2009, 284, 22898-22904), and APOBEC3A (Shi et al., Nat Struct Mol Biol. 2017, 24, 131-139), the putative loop 7 of each novel cytidine deaminase described herein was predicted and identified in order to develop a loop 7 swapping experiment to relax the sequence selectivity of these candidates. Several residues were also targeted for mutation to increase activity on DNA and reduce RNA activity (Yu et al., Nature Communications 2020, 11, 2052). A total of 108 CDA variants (within the MG93, MG139, and MG152 families) were designed with either a point mutation or a loop 7 swap with the AID deaminase, which is documented to have a 5′RC selectivity (SEQ ID NOs: 1208-1315).

TABLE 12C

Cytosine Base Editor Mutants Investigated in Example 32

SEQ ID NO: | Description | Nomenclature in Experiments | Background enzyme for mutation
1208 | W90A | MG93_4v1 | MG93-4
1209 | W90F | MG93_4v2 | MG93-4
1210 | W90H | MG93_4v3 | MG93-4
1211 | W90Y | MG93_4v4 | MG93-4
1212 | Y120F | MG93_4v5 | MG93-4
1213 | Y120H | MG93_4v6 | MG93-4
1214 | Y121F | MG93_4v7 | MG93-4
1215 | Y121H | MG93_4v8 | MG93-4
1216 | Y121Q | MG93_4v9 | MG93-4
1217 | Y121A | MG93_4v10 | MG93-4
1218 | Y121D | MG93_4v11 | MG93-4
1219 | Y121W | MG93_4v12 | MG93-4
1220 | H122Y | MG93_4v13 | MG93-4
1221 | H122F | MG93_4v14 | MG93-4
1222 | H122I | MG93_4v15 | MG93-4
1223 | H122A | MG93_4v16 | MG93-4
1224 | H122W | MG93_4v17 | MG93-4
1225 | H122D | MG93_4v18 | MG93-4
1226 | Replace with hAID loop7 | MG93_4v19 | MG93-4
1227 | Replace with 139_86 loop 7 | MG93_4v20 | MG93-4
1228 | Truncate from 188 to end | MG93_4v21 | MG93-4
1229 | Y121T | MG93_4v22 | MG93-4
1230 | Replace with a smaller section of hAID loop7 | MG93_4v23 | MG93-4
1231 | Replace with a smaller section of hAID loop7 | MG93_4v24 | MG93-4
1232 | R33A | MG93_4v25 | MG93-4
1233 | R34A | MG93_4v26 | MG93-4
1234 | R34K | MG93_4v27 | MG93-4
1235 | H122A R33A | MG93_4v28 | MG93-4
1236 | H122A R34A | MG93_4v29 | MG93-4
1237 | R52A | MG93_4v30 | MG93-4
1238 | H122A R52A | MG93_4v31 | MG93-4
1239 | N57G (Shown to have lower off target activity in A3A) | MG93_4v32 | MG93-4
1240 | N57G H122A | MG93_4v33 | MG93-4
1241 | Replace with A3A loop7 | MG139_86v1 | MG139-86
1242 | E123A | MG139_95v1 | MG139-95
1243 | E123Q | MG139_95v2 | MG139-95
1244 | Replace with hAID loop7 | MG93_3v1 | MG93-3
1245 | Replace with 139_86 loop 7 | MG93_3v2 | MG93-3
1246 | W127F | MG93_3v3 | MG93-3
1247 | W127H | MG93_3v4 | MG93-3
1248 | W127Q | MG93_3v5 | MG93-3
1249 | W127A | MG93_3v6 | MG93-3
1250 | W127D | MG93_3v7 | MG93-3
1251 | R39A | MG93_3v8 | MG93-3
1252 | K40A | MG93_3v9 | MG93-3
1253 | H128A | MG93_3v10 | MG93-3
1254 | N63G | MG93_3v11 | MG93-3
1255 | R58A | MG93_3v12 | MG93-3
1256 | Replace with hAID loop7 | MG93_11v1 | MG93-11
1257 | Replace with 139_86 loop 7 | MG93_11v2 | MG93-11
1258 | H121F | MG93_11v3 | MG93-11
1259 | H121Y | MG93_11v4 | MG93-11
1260 | H121Q | MG93_11v5 | MG93-11
1261 | H121A | MG93_11v6 | MG93-11
1262 | H121D | MG93_11v7 | MG93-11
1263 | H121W | MG93_11v8 | MG93-11
1264 | N57G (Shown to have lower off target activity in A3A) | MG93_11v9 | MG93-11
1265 | R33A | MG93_11v10 | MG93-11
1266 | K34A | MG93_11v11 | MG93-11
1267 | H122A | MG93_11v12 | MG93-11
1268 | H121A | MG93_11v13 | MG93-11
1269 | R52A | MG93_11v14 | MG93-11
1270 | K16 through P25 of pgtA3H replaces G20 through P26 | 139_52v1 | MG139-52
1271 | S170 through D138 of pgtA3H replaces K196 to V215 | 139_52v2 | MG139-52
1272 | P26R | 139_52v3 | MG139-52
1273 | P26A | 139_52v4 | MG139-52
1274 | N27R | 139_52v5 | MG139-52
1275 | N27A | 139_52v6 | MG139-52
1276 | W44A (equivalent to R52A) | 139_52v7 | MG139-52
1277 | W45A (equivalent to R52A) | 139_52v8 | MG139-52
1278 | K49G (equivalent to N57G) | 139_52v9 | MG139-52
1279 | S50G (equivalent to N57G) | 139_52v10 | MG139-52
1280 | R51G (equivalent to N57G) | 139_52v11 | MG139-52
1281 | R121A (equivalent to H121A) | 139_52v12 | MG139-52
1282 | I122A (equivalent to H122A) | 139_52v13 | MG139-52
1283 | N123A (equivalent to H122A) | 139_52v14 | MG139-52
1284 | Y88F (equivalent to W90F) | 139_52v15 | MG139-52
1285 | Y120F (equivalent to Y120F) | 139_52v16 | MG139-52
1286 | P22R | 139_86v2 | MG139-86
1287 | P22A | 139_86v3 | MG139-86
1288 | K23A | 139_86v4 | MG139-86
1289 | K41R | 139_86v5 | MG139-86
1290 | K41A | 139_86v6 | MG139-86
1291 | truncate K179 and onwards | 139_86v7 | MG139-86
1292 | Insert hAID loop 7 and truncate K179 onwards | 139_86v8 | MG139-86
1293 | E54D and truncation | 139_86v9 | MG139-86
1294 | E54A Mutate catalytic E residue | 139_86v10 | MG139-86
1295 | Mutate neighboring E residue | 139_86v11 | MG139-86
1296 | E54AE55A Mutate both catalytic E residues | 139_86v12 | MG139-86
1297 | K30A | 152_6v1 | MG152-6
1298 | K30R | 152_6v2 | MG152-6
1299 | M32A | 152_6v3 | MG152-6
1300 | M32K | 152_6v4 | MG152-6
1301 | Y117A | 152_6v5 | MG152-6
1302 | K118A | 152_6v6 | MG152-6
1303 | I119A | 152_6v7 | MG152-6
1304 | I119H | 152_6v8 | MG152-6
1305 | R120A | 152_6v9 | MG152-6
1306 | R121A | 152_6v10 | MG152-6
1307 | P46A | 152_6v11 | MG152-6
1308 | P46R | 152_6v12 | MG152-6
1309 | N29A | 152_6v13 | MG152-6
1310 | Loop 7 from MG138-20 | 152_6v14 | MG152-6
1311 | Loop 7 from MG139-12 | 152_6v15 | MG152-6
1312 | R27A | 138_20v1 | MG138-20
1313 | N50G | 138_20v2 | MG138-20
1314 | Loop 7 from MG138-20 | 139_52v17 | MG139-52
1315 | Loop 7 from MG139-12 | 139_52v18 | MG139-52

Example 33—In Vitro Activity of Novel CDA Variants from the MG93, MG139, and MG152 Families
In Vitro Deaminase In-Gel Assay

Linear DNA constructs containing the CDA were amplified from the previously mentioned plasmids from Twist via PCR. All constructs were cleaned via SPRI Cleanup (Lucigen) and eluted in a 10 mM Tris buffer. Enzymes were expressed from the PCR templates in an in vitro transcription-translation system, PURExpress (NEB), at 37° C. for 2 hours. Deamination reactions were prepared by mixing 2 μL of the PURExpress reaction with either 2 μM 5′-FAM labeled ssDNA (IDT; 4 different ssDNA substrates were used, each with a different −1 nucleobase (A, C, T, or G) next to the target cytidine; SEQ ID NOs: 1316-1319; FIG. 37) or 0.5 μM Cy3 and Cy5.5 labeled ssDNA (IDT; 2 different substrates with either AC vs GC or CC vs TC; SEQ ID NOs: 1320-1321; FIG. 38), together with 1 U USER Enzyme (NEB) in 1× Cutsmart Buffer (NEB). The reactions were incubated at 37° C. for 2 hours and then quenched by adding 4 units of proteinase K (NEB) and incubating at 55° C. for 10 minutes. The reaction was further treated by addition of 11 μL of 2× RNA loading dye and incubation at 75° C. for 10 minutes. All reaction conditions were analyzed by gel electrophoresis in a 10% denaturing gel (Biorad). DNA bands were visualized by a Chemi-Doc imager (Biorad) and band intensities were quantified using BioRad Image Lab v6.0 (FIG. 39). Successful deamination is observed by the visualization of a 10 bp fluorescently labeled band in the gel.
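The conversion of quantified band intensities into a percent deamination value can be sketched as follows. This is a minimal illustration, assuming that percent deamination is taken as the intensity of the cleaved product band divided by the summed intensity of the product and uncleaved substrate bands; the function name and the example intensities are hypothetical and are not taken from the Image Lab workflow described above.

```python
def percent_deamination(product_intensity, substrate_intensity):
    """Return the share of labeled DNA found in the cleaved product band, in percent.

    Assumption: both intensities are background-subtracted values for the
    same lane, in the same arbitrary units.
    """
    total = product_intensity + substrate_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * product_intensity / total


# Hypothetical lane: 25 units in the product band, 75 units left uncleaved.
example = percent_deamination(25.0, 75.0)
```

Under this assumption a lane with no product band yields 0% and a fully converted lane yields 100%.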

The deamination of cytosine (C) is catalyzed by cytidine deaminases and results in uracil (U), which has the base-pairing properties of thymine (T). Most documented cytidine deaminases operate on RNA, and the few examples that are documented to accept DNA require single-stranded DNA (ssDNA). The in vitro activity of 108 CDAs on 4 ssDNA substrates containing cytosine in all four possible 5′-NC contexts was measured (FIGS. 37 and 38). The percentage of deamination for each nucleobase at the −1 nt position was also calculated to evaluate whether the selected mutations altered the sequence selectivity of the designed variants in vitro (FIGS. 39 and 40). Notably, several variants from the MG93 and MG139 families display a more relaxed sequence selectivity (FIGS. 39 and 40) and were selected for downstream testing of in vivo mammalian cell activity as full CBEs.

Example 34—Mammalian Editing Activity of Novel and Engineered CDAs as CBEs

In order to test the activity of novel CDAs as well as engineered variants, an engineered cell line was devised with 5 consecutive PAMs compatible with MG3-6 and Cas9. This cell line allows for gRNA tiling to test editing efficiency and find −1 nt selectivity.

In order to test the novel and engineered CDAs, the CDAs were cloned into a plasmid backbone containing MG3-6. The CDAs were cloned at the N terminus. Once the cloning of novel and variant CDAs was confirmed, they were transiently transfected into the engineered HEK293T cells using lipofectamine 2000. A total of 32 novel CDAs and 2 engineered variants (139-52-V6 and 93-4-V16) were tested in the gRNA tiling experiment described above (SEQ ID NOs: 1322-1355). Out of the 34 tested CDAs, 22 showed editing activity higher than 1% (FIG. 41A). The top performers were MG152-6, MG139-52v6, MG93-4, MG139-52, MG139-94, MG93-7, MG93-3, MG139-12, MG139-103, MG139-95, MG139-99, MG139-90, MG139-89, MG139-93, MG138-30, MG139-102, MG93-4v16, MG152-5, MG138-20, MG138-23, MG93-5, MG152-4, and MG152-1. When the editing activity was normalized per experimental condition relative to a positive control (documented high activity CDA: A0A2K5RDN7), it was observed that 9 candidates showed at least 20% of the activity of the A0A2K5RDN7 positive control (FIG. 41B). Amongst these 9 candidates, 3 showed at least 50% of the activity of A0A2K5RDN7: 139-52-V6, 152-6, and 139-52 showed 95%, 65%, and 60% of the activity, respectively. FIG. 41C shows a side-by-side comparison of 2 targeting spacers. 139-52-V6 shows essentially the same editing activity as A0A2K5RDN7, as observed in FIG. 41C.
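The normalization to the positive control described above may be illustrated with a short sketch. It assumes that each candidate's editing percentage is simply divided by the editing percentage of the control measured under the same experimental condition; the function name, the candidate labels, and the numbers are hypothetical placeholders rather than measured values.

```python
def normalize_to_control(editing_by_candidate, control_name):
    """Express each candidate's editing as a percentage of the control's editing.

    editing_by_candidate maps a candidate label to its percent editing under
    one experimental condition; control_name must be one of its keys.
    """
    control = editing_by_candidate[control_name]
    if control <= 0:
        raise ValueError("control editing must be positive")
    return {name: 100.0 * value / control
            for name, value in editing_by_candidate.items()}


# Hypothetical readings for one guide (not data from the disclosure).
raw = {"control": 40.0, "candidate_a": 20.0, "candidate_b": 4.0}
relative = normalize_to_control(raw, "control")
# Candidates reaching at least 20% of the control's activity.
passing = sorted(n for n, v in relative.items() if n != "control" and v >= 20.0)
```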

To characterize the −1 nt selectivity, 16 candidates of interest were selected. The −1 nt mammalian cell selectivity was calculated by selecting the top 4 modified cytosines per guide RNA and calculating the ratio per −1 position. The analysis was restricted to cytosines with >1% editing. The average ratio for all 5 guides was plotted. The −1 nt in vitro selectivity was plotted by calculating the sum of percentage cleavages (percent cleavage measures percent deamination) per −1 nt and then calculating the ratio per −1 nucleotide. The mammalian cell and in vitro −1 nt selectivities are shown in FIG. 42. Notably, different CDA families are documented as having different −1 nt selectivities, and their selectivities tend to be conserved amongst proteins belonging to the same family. For example, the MG93 family is documented to be selective for T at −1, while the MG139 family is documented to be selective for C at −1. Importantly, the active candidates are documented to have different −1 nt selectivities: 152-6 is selective for T in the −1 position, whereas 139-52 (WT and engineered variant) has a strong selectivity for C at the −1 position. Having candidates with strong −1 nt selectivities is advantageous, since a tighter nt selectivity reduces off-target activity. Candidates with different and strong −1 nt selectivities allow for targeting of different loci with minimal off-target activity. Notably, candidates with unusual −1 selectivities were identified. Candidates with purine selectivities include 139-12 and 138-20, with A and G selectivities. These properties may be used to generate variants with G and/or A −1 selectivities and high editing efficiencies.
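The mammalian cell selectivity calculation described above can be sketched in a few lines. This is an illustration under stated assumptions: the "ratio per −1 position" is interpreted here as each −1 nucleotide's share of the summed editing among the retained cytosines, which is one plausible reading and not necessarily the exact formula used. The function name and the example values are hypothetical.

```python
def minus1_selectivity(edits_by_guide, top_n=4, min_editing=1.0):
    """Average, over guides, the share of editing attributed to each -1 base.

    edits_by_guide maps a guide label to a list of (minus1_base, percent_editing)
    tuples, one per edited cytosine. Cytosines at or below min_editing are
    dropped and only the top_n most edited cytosines per guide are kept.
    """
    per_guide = []
    for edits in edits_by_guide.values():
        kept = sorted((e for e in edits if e[1] > min_editing),
                      key=lambda e: e[1], reverse=True)[:top_n]
        total = sum(pct for _, pct in kept)
        if total == 0:
            continue  # guide had no cytosine above the threshold
        ratios = dict.fromkeys("ACGT", 0.0)
        for base, pct in kept:
            ratios[base] += pct / total
        per_guide.append(ratios)
    if not per_guide:
        return dict.fromkeys("ACGT", 0.0)
    return {b: sum(r[b] for r in per_guide) / len(per_guide) for b in "ACGT"}


# Hypothetical editing values for two guides.
demo = minus1_selectivity({
    "g1": [("T", 30.0), ("C", 10.0), ("A", 0.5)],
    "g2": [("T", 20.0)],
})
```

In this toy input the cytosine preceded by A is excluded by the 1% cutoff, so the averaged ratios are split between T and C only.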

The candidate 139-52 was documented as having deaminase activity on both ssDNA and the DNA strand of a DNA/RNA heteroduplex (also shown in FIG. 43B). Having activity exclusively on the DNA strand of a DNA/RNA heteroduplex may be advantageous in terms of guide-independent off-target activity and a smaller editing window; as such, engineering for this feature is an important avenue. Interestingly, when the 139-52-V6 mutant was generated, it was noted that the mutation abolished the deaminase activity on the DNA/RNA heteroduplex, thus shedding light on the potential importance of this residue for such activity.

The 139-52-V6, 152-6, and 139-52 candidates have high editing efficiencies (FIGS. 41A, 41B, and 41C) and different −1 nt selectivities (FIG. 42). To characterize them further, the width of their editing window in relation to the R-loop formation (spacer targeting) was analyzed. Two of the three candidates (152-6 and 139-52-V6) show a tighter editing window than the high-editing positive control A0A2K5RDN7 (FIG. 44). Having a tighter editing window may help to prevent off-target activities. The engineered candidate 139-52-V6 has a smaller editing window than its WT counterpart (FIG. 44), shedding light on the importance of this mutation. The mutation improved the on-target editing efficiency (FIGS. 41A and 41B) while narrowing the editing window (FIG. 44).
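One simple way to summarize an editing window from per-position editing values is sketched below. The disclosure does not state how the window was defined, so this sketch assumes the window spans the first through last position whose editing exceeds a cutoff (1% is borrowed from the selectivity analysis as an assumption). The function name, the position numbering, and the values are hypothetical.

```python
def editing_window(editing_by_position, min_editing=1.0):
    """Return (first, last, width) of positions edited above min_editing.

    editing_by_position maps a position within the targeted region to the
    percent editing observed there. Returns None when nothing passes the cutoff.
    """
    edited = [pos for pos, pct in editing_by_position.items() if pct > min_editing]
    if not edited:
        return None
    first, last = min(edited), max(edited)
    return first, last, last - first + 1


# Hypothetical profile: positions 5 through 9 carry editing above 1%.
window = editing_window({3: 0.2, 5: 12.0, 6: 30.0, 9: 4.0, 12: 0.8})
```

A narrower window under this definition means edits are confined to fewer positions of the targeted region.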

Moreover, the cytotoxicity of all CDA candidates was measured by stably expressing the candidates in mammalian cells through lentiviral transduction. Each CDA candidate was cloned as a CBE (using MG3-6 as partner), lentiviruses were produced, and cells were transduced. Three days post-transduction, cells were selected for viral integration and CBE expression by puromycin selection. The puromycin cassette was downstream of the CBEs with a 2A peptide; thus, cells surviving selection expressed the CBEs. Surviving cells were stained with crystal violet, the crystal violet was then solubilized with SDS, and absorbance was measured in a plate reader. It was determined that different CDAs have various levels of cytotoxicity (FIG. 45). The 139-52-V6, 152-6, and 139-52 candidates show a promising cytotoxicity profile under these conditions. It is expected that when the candidates are expressed transiently, this effect may diminish greatly.

Example 35—Using Low Activity CDAs with Nickases with Improved Target Binding Affinity (Prophetic)

Analyzing the editing windows and cytotoxic profiles demonstrated that it may be advantageous to use CDAs with slower deamination kinetics in conjunction with effector enzymes with higher residency time on their targets. In order to create such systems, a long-form tracrRNA (see e.g. Workman et al. Cell 2021, 184, 675-688, which is incorporated by reference herein in its entirety) is used in the gRNA in conjunction with CDAs with various kinetics (low, medium, and high). These systems may improve the on-target editing efficiencies of low and medium activity CDAs, while generating a narrower editing window and a more favorable cytotoxic profile.

Example 36—Adenine Deaminase Engineering (Prophetic)

To improve on-target activity on ssDNA and minimize cellular RNA-unguided deamination, all beneficial mutations previously identified from rational design and directed evolution in the literature were used to design new adenine deaminase (ADA) variants from novel deaminase families (MG129-MG137 and MG68 families, SEQ ID NOs: 1556-1638).

TABLE 12D

Adenosine Deaminase Mutants Designed in Example 36

SEQ ID NO: | Mutation (relative to background enzyme) | Name in Experiments (numbers before “v” denote background enzyme)
1556 | A20R, A34L, R46A, E49L, V80S, L82F, C104V, D106N, P107S, A109T, T117N, A120N, D121Y, R144C, F147Y, L150P, Q153V, G154F, K155N | MG131-1v1
1557 | A12R, A26L, R38A, T41L, V72S, L74F, C96V, D98N, P99S, G101T, A109N, V112N, D113Y, R136C, F139Y, L142P, L145V, G146F, K147N | MG131-2v2
1558 | A21R, V34L, R46A, A49L, V80S, L82F, C104V, D106N, P107S, A109T, S117N, D121Y, Q144C, F147Y, L150P, Q153V, G154F, K155N | MG131-5v3
1559 | T43R, A56L, R68A, G71L, V102S, M104F, C126V, D128N, P129S, A131T, R139N, D142N, D143Y, R166C, F169Y, L172P, ins175V | MG131-6v4
1560 | T36R, R61A, N64L, V95S, M97F, C119V, D121N, P122S, A124T, Q132N, D135N, D136Y, K159C, F162Y, L165P, R168V | MG131-9v5
1561 | G41R, V54L, R66A, G69L, V100S, M102F, C124V, D126N, P127S, A129T, S137N, E140N, D141Y, R164C, F167Y, L170P, P173V, E174F, A175N | MG131-7v6
1562 | G19R, R32L, R44A, W47L, V78S, L80F, A102V, D104N, P105S, A107T, A115N, E118N, D119Y, T141C, F144Y, L147P, G150del, R151del, A153F, R154N, G156Q, R157K, P158K, G160Q, E162S, E163I, E164N | MG131-3v7
1563 | A20R, D33L, R46A, E49L, V80S, L82F, C104V, D106N, P107S, A109R, D117N, R120N, D121Y, Q144C, F147Y, K153V, N154F, R155N | MG134-1v1
1564 | A19R, R32L, R44A, E47L, V78S, L80F, A102V, D104N, P105S, A107R, E115N, T118N, D119Y, R142C, F145Y, R151V, A152F, K153N | MG134-2v2
1565 | A25R, R50A, D53L, V84S, L86F, A108V, D110N, A111S, A113R, Q121N, S124N, D125Y, R148C, F151Y, R157V, R158F, R159N | MG134-3v3
1566 | G19R, R32L, R44A, E47L, V78S, L80F, A102V, D104N, P105S, A107R, Q115N, E118N, D119Y, K142C, F145Y, A148P, R151V, A152F, R153N | MG134-4v4
1567 | S20R, R33L, P45A, A48L, V79S, V81F, A103V, D105N, P106S, A108T, Q116N, H120Y, Q143C, F146Y, K149P | MG135-1v1
1568 | L32R, S45L, P57A, A60L, V91S, V93F, A115V, D117N, A118S, A120T, Q128N, H132Y, Q155C, F158Y, R161P, E164V, P165F, D166N | MG135v-2v2
1569 | L12R, H25L, S37A, D40L, A71S, I73F, A95V, SD97N, P98S, A100T, Q108N, H112Y, Q135C, F138Y, R141P | MG135-4v3
1570 | L25R, C38L, N50A, D53L, A84S, I86F, A108V, D110N, L111S, A113T, Q121N, H125Y, Q148C, F151Y, R154P | MG135-5v4
1571 | L44R, H57L, N69A, D72L, V103S, I105F, S127V, D139N, P130S, A132T, P140N, H144Y, Q167C, F170Y, R173P | MG135-6v5
1572 | L12R, H25L, N37A, E40L, V71S, I73F, A95V, D97N, P98S, A100T, Q108N, H112Y, Q135C, F138Y, R141P | MG135-8v6
1573 | A20R, C33L, N45A, D48L, V79S, I81F, A103V, D105N, P106S, A108T, T116N, H120Y, R143C, F146Y, K149P | MG135-7v7
1574 | Q20R, C33L, N45A, D48L, V79S, I81F, A103V, D105N, P106S, A108T, G116N, H120Y, Q143C, F146Y, K149P | MG135-3v8
1575 | E30R, S43L, P55A, V80S, T114V, E116N, P117S, A119R, Q127N, K130N, N131Y, S155C, F158Y, R161P | MG137-1v1
1576 | A30R, M43L, P55A, V89S, T113V, E115N, P116S, A118R, Q126N, Q129N, D130Y, Q153C, F156Y, R159P, K173I, E174N | MG137-2v2
1577 | A23R, R36L, P48A, V82S, A106V, E108N, P109S, A111R, C119N, D122N, E123Y, S146C, F149Y, R152P, K166I, E167N | MG137-4v3
1578 | A23R, P48A, V82S, A106V, E108N, P109S, A111R, R119N, E122N, E123Y, S146C, F149Y, R152P, K166I, E167N | MG137-6v4
1579 | A22R, P47A, V81S, A105V, E107N, P108S, A110R, R118N, D121N, E122Y, S145C, F148Y, R151P, K166I, E167N | MG137-17v5
1580 | A28R, R41L, P53A, V87S, A111V, E113N, P114S, A116R, S124N, D127N, E128Y, S151C, F154Y, R157P, S172I, E173N | MG137-9v6
1581 | E12R, P37A, V71S, A95V, E97N, P98S, S100R, R108N, D111N, A112Y, S135C, F138Y, R141P, R156I, E157N | MG137-11v7
1582 | A29R, R42L, P54A, V88S, A112V, E114N, P115S, A117R, R125N, D128N, A129Y, Q152C, F155Y, R158P | MG137-12v8
1583 | A20R, P45A, V79S, T103V, E105N, P106S, R116N, D119N, T120Y, S144C, F147Y, P150R | MG137-13v9
1584 | A22R, R35L, V47A, V81S, A105V, E107N, P108S, A110R, A118N, D121N, Q122Y, Q145C, F148Y, R151P | MG137-15v10
1585 | A27R, R40L, P52A, V86S, T110V, E112N, P113S, A115R, R123N, E126N, Q127Y, S150C, F153Y, R156P | MG137-5v11
1586 | A29R, R42L, P54A, V88S, A112V, E114N, P115S, A117R, R125N, E128N, Q129Y, Q152C, F155Y, R158P | MG137-14v12
1587 | A21R, R34L, P46A, V80S, A104V, E106N, P107S, Y109R, R117N, D120N, S121Y, R144C, F147Y, R150P | MG137-16v13
1588 | A26R, P51A, V85S, A109V, E111N, P112S, S114R, K122N, D125N, N126Y, S149C, F152Y, R155P, G167I, P168N | MG137-8v14
1589 | F20R, A34L, P46A, V80S, A104V, E106N, P107S, T109R, A120N, Q121Y, K144C, F147Y, K150P | MG137-3v15
1590 | K21R, G34L, V46A, V80S, L82F, A104V, D106N, P107S, A109R, Q117N, T120N, L121Y, T144C, F147Y, K150P, A153V, K154F, H155N | MG68-55v1
1591 | W21R, G35L, S47A, V81S, L83F, A105V, D107N, P108S, N110R, P120N, L121Y, K144C, F147Y, R150P, E153V, T154F, E163I, E164N | MG68-27v2
1592 | Y12R, A26L, S38A, D41L, V72S, L74F, A96V, D98N, L99S, T101R, S112N, D113Y, S136C, F139Y, R142P, Q145V, K146F, K147N | MG68-52v3
1593 | Y22R, S36L, P48A, S51L, V82S, L84F, A106V, D108N, P109S, T111R, D119N, S122N, V123Y, R146C, F149Y, R152P, E155V, G156F, K157N, R167I, P168N | MG68-15v4
1594 | Y22R, S36L, T48A, D51L, V82S, L84F, A106V, D108N, P109S, T111R, C119N, A122N, N123Y, R146C, F149Y, R152P, G155V, S156F, K157N | MG68-58v5
1595 | A18R, I31L, P43A, T46L, V77S, L79F, A101V, D103N, P104S, A106R, D114N, S118N, D119Y, R142C, F145Y, K148P, S151V, P152F, R153N, D167I, N168N | MG68-25v6
1596 | G47R, G60L, P72A, V106S, L108F, A130V, D132N, P133S, T135R, A143N, T146N, D147Y, K170C, F173Y, R176P, H179V, S180F, P181N, T190I, P191N | MG68-18v7
1597 | Y26R, E40L, T52A, D55L, V86S, L88F, A110V, D112N, L113S, T115R, D127Y, S150C, F153Y, R156P, M159V, Q160F, K161N, K179I, D180N | MG68-45v8
1598 | W40R, H53L, P65A, D68L, V99S, L101F, A123V, D125N, P126S, T128R, D136N, A139N, Q140Y, Q163C, F166Y, R169P, R172V, A173F, R174N, D204A, E205N | MG68-13v9
1599 | W24R, R37L, S52L, V83S, L85F, A107B, D109N, P110S, T112R, D120N, R123N, H124Y, S147C, F150Y, R153P, G166I | MG68-4v10
1600 | F23R, H36L, R49A, V83S, L85F, A107V, D109N, A110S, A112R, E120N, D124Y, G147C, F150Y, K153P | MG132-1v1
1601 | D35R, S48L, R61A, V95S, L97F, C119V, D121N, P122S, A124R, Q132N, S135N, D136Y, S159C, F162Y, K165P | MG132-1v2
1602 | L12R, H25L, R39A, D42L, V73S, L75F, C97V, D99N, P100S, A102R, Q110N, S113N, D114Y, T137C, F140Y, K143P | MG132-1v3
1603 | L25R, R38L, R50A, D53L, V84S, L86F, A108V, D110N, G121N, A124N, D125Y, R149C, L155P, R158V, G159F, D160N | MG133-1v1
1604 | A13R, Q28L, R40A, D43L, V74S, L76F, A98V, D100N, E111N, S114N, D115Y, R138C, L144P | MG133-2v2
1605 | A37R, E52L, R64A, D67L, V98S, L100F, A122V, D124N, E135N, D138N, D139Y, R162C, L168P | MG133-7v3
1606 | A28R, Q43L, R55A, H58L, V89S, L91F, A113V, D115N, E126N, D129N, D130Y, R153C, L159P, Q162V, R163F, K164N | MG133-4v4
1607 | E27R, E42L, R54A, D57L, V88S, L90F, A112V, D114N, A125N, S128N, D129Y, R152C, R158P | MG133-12v5
1608 | A43R, G58L, R70A, D73L, V104S, L106F, A128V, D130N, R141N, S144N, D145Y, K168C, L174P, G177V, G178F, R179N | MG133-5v6
1609 | M25R, A40L, R52A, D55L, V86S, L88F, A110V, D112N, R123N, Q126N, D127Y, R150C, K156P, R159V, T160F, D161N | MG133-9v7
1610 | G36R, A51L, R63A, D66L, V97S, L99F, A121V, D123N, A134N, Q137N, D138Y, R161C, R167P | MG133-14v8
1611 | A24R, S39L, R51A, D54L, V85S, L87F, A109V, D111N, G122N, T125N, D126Y, S149C, R155P, A158V, D159F, K160N | MG133-8v9
1612 | A13R, C26L, R38A, D41L, V72S, L74F, A96V, D98N, Q109N, S112N, E113Y, K136C, R142P, G145V, G146F | MG133-10v10
1613 | A41R, H54L, R66A, E69L, V100S, L102F, A124V, D126N, Q137N, S140N, D141Y, R164C, L170P, R173V, R174F, R175N | MG133-13v11
1614 | A33R, K46L, R58A, A60L, V92S, L94F, A116V, D118N, E129N, I132N, D133Y, R156C, R162P, I165V, N166F, R167N | MG133-3v12
1615 | A33R, R46L, R58A, N61L, V92S, L94F, A116V, D118N, E129N, S132N, D133Y, K156C, R162P, I165V, N166F, R167N | MG133-6v13
1616 | S22R, R35L, R47A, W50L, V81S, L83F, I105V, D107N, R118N, D121N, T122Y, Q154C, R151P, K154V, D155F, K156N | MG133-11v14
1617 | E31R, I44L, P56A, R59L, L92F, A114V, D116N, I117S, F119R, R127N, D130N, S131Y, R154C, L157Y, A160P | MG136-1v1
1618 | E18R, I31L, P43A, L79F, A101V, D103N, L104S, F106R, R114N, D117N, S118Y, K141C, F144Y, R147P | MG136-6v2
1619 | A27R, A41L, P53A, M56L, V87S, L89F, A111V, D113N, L114S, F116R, R124N, D127N, S128Y, E151C, F154Y, R157P | MG136-12v3
1620 | G12R, A25L, T37A, D40L, I72F, A94V, D96N, E97S, A99R, R107N, D110N, T111Y, Q134C, F137Y, R140P | MG136-2v4
1621 | D38L, T50A, D53L, I86F, A108V, D110N, E111S, A113R, S121N, T125Y, Q148C, F151Y, R154P | MG136-3v5
1622 | A22R, A36L, P48A, N51L, I84F, S106V, D108N, E109S, F111R, R119N, D122N, S123Y, Q146C, F149Y, R152P | MG136-9v6
1623 | E20R, A34L, T46A, A49L, T80S, I82F, A104V, D106N, E107S, F109R, D120N, N121Y, Q144C, F147Y, K150P, F153V, Q154F, K155N | MG136-10v7
1624 | E12R, G26L, T38A, D41L, I74F, A96V, D98N, E99S, F101R, K109N, S112N, G113Y, T135C, R141P | MG136-11v8
1625 | S23R, Y37L, R51A, D54L, V85S, L87F, A109V, D111N, P112S, A114R, D122N, R149C, F152Y, L155P | MG129-1v1
1626 | E18R, H31L, R43A, D46L, V77S, L79F, A101V, D103N, P104S, A106R, E117N, E118Y, K141C, F144Y, L147P | MG129-2v2
1627 | G21R, F34L, R46A, D49L, V80S, L82F, T104V, D106N, P107S, A109R, E120N, E121Y, S144C, F147Y, L150P | MG129-11v3
1628 | A22R, H35L, R47A, D50L, V81S, L83F, A105V, D107N, P108S, A110R, D118N, S121N, D122Y, R145C, F148Y, R151P | MG129-3v4
1629 | A25R, R50A, D53L, V84S, L86F, A108V, D110N, P111S, A113R, D121N, A124N, D125Y, R148C, F151Y, R154P | MG129-7v5
1630 | G12R, R37A, G40L, V71S, L73F, A95V, D97N, P98S, A100R, D108N, Q111N, D112Y, R135C, F138Y, R141P | MG129-4v6
1631 | A20R, F33L, R45A, A48L, V79S, L81F, A103V, D105N, P105S, A108R, A116N, T119N, D120Y, K143C, F146Y, K149P | MG129-9v7
1632 | A12R, R25L, R37A, D40L, V71S, L73F, C95V, D97N, P98S, G100R, D108N, Q111N, V112Y, K135C, F138Y, L141P | MG129-10v8
1633 | G15R, S28L, R40A, D43L, V74S, L76F, A98V, D100N, A101S, Q103R, G111N, K132C, F135Y, L138P | MG129-12v9
1634 | A19R, H32L, R46A, D49L, V80S, L82F, P107S, Q117N, D121Y, K144C, V147Y, Q150P, L153V, G154F, K155N | MG130-3v1
1635 | G32R, H45L, R57A, D60L, V90S, Q92F, C114V, P117S, A119R, Q127N, T130N, D131Y, F157Y, L160P, G163V, P164F, E165N | MG130-1v2
1636 | A59R, A92L, R105A, D108L, V138S, Q140F, C162V, P165S, A167R, S175N, S178N, D179Y, F205Y, L208P, G211V, P212F, T213N | MG130-5v3
1637 | G36R, I49L, R61A, S64L, V94S, Q96F, C118V, P121S, A123R, E131N, T134N, D135Y, F161Y, L164P, N167V, G168F, R169N | MG130-2v4
1638 | L18R, H31L, R45A, A48L, V79S, L81F, C103V, D105N, P106S, A108R, E119N, D120Y, V145Y, R158P, S161V, T162F, T163N | MG130-4v5
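The substitution notation used in Table 12D (wild-type residue, 1-based position, new residue, e.g. A20R) can be applied to a background sequence programmatically. The sketch below is illustrative only: the helper name is hypothetical, the toy sequence is not one of the disclosed enzymes, and only simple substitutions are handled (entries such as G150del or ins175V would need separate treatment).

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, mutations):
    """Apply point substitutions such as 'A20R' to a protein sequence.

    Each mutation is checked against the original sequence so that a
    mismatch between the stated wild-type residue and the background
    enzyme is reported instead of being silently applied.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = _SUBSTITUTION.match(mutation)
        if match is None:
            raise ValueError(f"unsupported mutation format: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position out of range: {mutation}")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"wild-type residue mismatch: {mutation}")
        residues[position - 1] = new
    return "".join(residues)


# Toy five-residue sequence (hypothetical, for illustration only).
mutant = apply_substitutions("MADKR", ["A2R", "K4N"])
```

The wild-type check is a convenient way to catch numbering errors when a mutation list is transferred between homologous background enzymes.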

In Vitro Activity of Novel ADA Variants from MG129-MG137 and MG68 families

In Vitro Deaminase In-Gel Assay

Linear templates for candidate deaminases are amplified using plasmids from TWIST via PCR. Products are cleaned using SPRI beads (Lucigen) and eluted in 10 mM Tris. Enzymes are then expressed in PURExpress (NEB) at 37° C. for 2 hours. Deamination reactions are prepared by mixing PURExpress reactions (2 μL) with a 10 μM DNA substrate (IDT, SEQ ID NO: 1645) labeled with Cy5.5, 1 U EndoV (NEB), and 10× NEB4 Buffer. Reactions are incubated at 37° C. for 20 hours. Samples are quenched by adding 4 units of proteinase K (NEB) and incubated at 55° C. for 10 minutes. The reaction is further treated by addition of 11 μL of 2× RNA loading dye and incubated at 75° C. for 10 minutes. All reaction conditions are analyzed by gel electrophoresis in a 10% (TBE-urea) denaturing gel (Biorad). DNA bands are visualized by a Chemi-Doc imager (Biorad) and band intensities are quantified using BioRad Image Lab v6.0. Successful deamination is observed by the visualization of an intermediate fluorescently labeled band in the gel.

In Vitro NGS-Based Screening for In Vitro Deamination

Linear templates for candidate deaminases are amplified using plasmids from TWIST via PCR. Products are cleaned using SPRI beads (Lucigen) and eluted in 10 mM Tris. Enzymes are then expressed in PURExpress (NEB) at 37° C. for 2 hours. Deamination reactions are prepared by mixing PURExpress reactions (2 μL) with a 250 nM single-stranded DNA substrate (IDT, SEQ ID NO: 1646) in 1× NEB4 buffer. Reactions are incubated at 37° C. for 2 hours. Reactions are quenched by incubating at 95° C. for 10 minutes, adding 90 μL of water at 95° C., and placing on ice for 2 minutes. 1 μL of the quenched reaction is used per PCR reaction (oligos from IDT). Reactions are then cleaned using column purification (Zymo), eluted in 10 mM Tris, and sequenced.

Example 37—Engineering of ABE Using nMG34-1 (D10A) Nickase
Plasmid Construction

DNA fragments of genes were synthesized at either Twist Bioscience or Integrated DNA Technologies (IDT). Plasmid DNA was amplified in Endura electrocompetent cells (Lucigen) and isolated by QIAprep Spin Miniprep Kit (Qiagen). Vector backbones were prepared by restriction enzyme digestion of plasmids. Inserts were amplified by Q5 High-Fidelity DNA polymerase (New England Biolabs) using primers ordered from either Elim BIOPHARM or IDT. Both vector backbones and inserts were purified by gel extraction using the Gel DNA Recovery Kit (Zymo Research). One or multiple DNA fragments were assembled into the vectors through NEBuilder HiFi DNA assembly (New England Biolabs). The plasmid sequence used for expression of the nMG34-1 (D10A) adenine base editor and sgRNA is shown in SEQ ID NO: 1422.

Cell Culture, Transfections, Next Generation Sequencing, and Base Edit Analysis

HEK293T cells were grown and passaged in Dulbecco's Modified Eagle's Medium plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco) at 37° C. with 5% CO₂. 2.5×10⁴ cells (passage 3-8) were seeded on 96-well cell culture plates treated for cell attachment (Costar), grown for 20 to 24 h, and the spent media were refreshed with new media right before transfection. For the dual plasmid system, 300 ng expression plasmid along with 100 ng guide plasmid were transfected using 1 μL lipofectamine 2000 (ThermoFisher Scientific) per well according to the manufacturer's instructions. For the single plasmid system, 300 ng plasmid carrying the base editor gene and guide RNA was transfected using 1 μL lipofectamine. Transfected cells were grown for 3 days, harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. Targeted regions for base edits were amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs) with primers and extracted DNA as the templates. PCR products were purified by HighPrep PCR Clean-up System (MAGBIO) according to the manufacturer's instructions. After 72 hours, individual wells were visually assessed for cell viability based on cell growth and presence of floating cells in media. Following the visual assessment of cell viability, cells were harvested and genomic DNA was extracted. PCR primers appropriate for use in NGS-based DNA sequencing were generated, optimized, and used to amplify the individual target sequences for each guide RNA. The amplicons were sequenced on an Illumina MiSeq machine and analyzed with a proprietary Python script to measure gene editing.
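The proprietary analysis script is not disclosed. Purely as an illustration of the kind of calculation involved, the following sketch counts the fraction of reads carrying an edited base at one amplicon position. It assumes the reads have already been quality-filtered, aligned, and trimmed so that every read shares the same coordinates; the function name and the toy reads are hypothetical.

```python
def base_conversion_percent(reads, position, edited_base="G"):
    """Percent of reads showing edited_base at a 0-based amplicon position.

    Reads too short to cover the position are ignored.
    """
    covering = [read[position] for read in reads if len(read) > position]
    if not covering:
        raise ValueError("no read covers the requested position")
    return 100.0 * covering.count(edited_base) / len(covering)


# Toy reads in which the first base is the target adenine (A-to-G editing).
toy_reads = ["ACGT", "GCGT", "GCGT", "ACGT"]
a_to_g = base_conversion_percent(toy_reads, 0)
```

A production pipeline would additionally handle indels, sequencing errors, and untreated control samples, none of which is modeled here.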

Results

MG68-4 is predicted to be a tRNA adenosine deaminase. As the natural enzymes E. coli TadA (EcTadA) and S. aureus TadA (SaTadA) are both dimers, MG68-4 was suspected to be a dimer as well. It has been shown that using a protein fusion of an engineered EcTadA homodimer can increase the editing efficiency (Gaudelli, N. M. et al. Programmable base editing of AT to GC in genomic DNA without DNA cleavage. Nature 2017, 551, 464-471). As such, a series of MG68-4 (D109N) homodimers was designed and fused with nMG34-1 (D10A). To design the linkers between two monomers, the length between the N-terminus of the first monomer and the C-terminus of the second monomer was estimated using Visual Molecular Dynamics (VMD) (Humphrey, W. et al. VMD—Visual Molecular Dynamics, J Mol. Graph. 1996, 14, 33-38), and the model suggested 5.2 nm (FIG. 46A). The fusions were optimized by varying linker lengths ranging from 32 to 64 amino acids, and a negative control with 5 amino acids was included (SEQ ID NOs: 1356-1362). The result indicated that the best linker length was 64 amino acids, which might provide enough flexibility to accommodate the distance between monomers. With this optimized linker, an increase of 87% editing was obtained compared to the monomeric design of MG68-4 (D109N) fused with nMG34-1 (D10A) (FIG. 46B).
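As a rough cross-check of the linker design reasoning, the minimum number of residues able to span the modeled distance can be estimated. The sketch assumes a fully extended chain contributes at most about 0.38 nm per residue (the approximate spacing of consecutive Cα atoms in a trans peptide); this figure and the function name are assumptions introduced here, and flexible linkers in solution are generally far less extended, so the result is only a lower bound.

```python
import math


def min_linker_residues(distance_nm, nm_per_residue=0.38):
    """Smallest residue count whose fully extended length covers distance_nm."""
    if distance_nm <= 0 or nm_per_residue <= 0:
        raise ValueError("distance and spacing must be positive")
    return math.ceil(distance_nm / nm_per_residue)


# Modeled distance between the monomer termini reported above: 5.2 nm.
lower_bound = min_linker_residues(5.2)
```

Under this assumption, a 5-residue linker cannot bridge 5.2 nm even when fully extended, while linkers of 32 to 64 residues leave considerable slack for a flexible, non-extended conformation.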

Previously, MG68-4 (D109N)-nMG34-1 (D10A) was observed to produce a C to G edit at the sixth position when using guide 633 (SEQ ID NO: 1416). To reduce the promiscuous activity toward cytosine, the approach used by Jeong (Jeong, Y. K. et al. Adenine base editor engineering reduces editing of bystander cytosines. Nat. Biotechnol. 2021, 39, 1426-1433) was applied, in which Q was installed at the D108 position of EcTadA. By incorporating Q into the D109 position of MG68-4, the ABE showed a 64% reduction of the C to G edit at the C6 position using guide 633 while maintaining a comparable A to G edit at the A8 position using guide 634 (SEQ ID NO: 1417). To increase editing efficiency, two beneficial mutations (H129N and D7G/E10G) were incorporated along with D109Q. The results showed that the editing efficiencies of the new mutants were reduced, suggesting incompatibility of the mutations (SEQ ID NOs: 1639-1644) (FIG. 47).

Example 38—Engineering of ABE Using nMG3-6/3-8 (D13A) Nickase
Plasmid Construction

DNA fragments of genes were synthesized at either Twist Bioscience or Integrated DNA Technologies (IDT). Plasmid DNA was amplified in Endura electrocompetent cells (Lucigen) and isolated by QIAprep Spin Miniprep Kit (Qiagen). Vector backbones were prepared by restriction enzyme digestion of plasmids. Inserts were amplified by Q5 High-Fidelity DNA polymerase (New England Biolabs) using primers ordered from either Elim BIOPHARM or IDT. Both vector backbones and inserts were purified by gel extraction using the Gel DNA Recovery Kit (Zymo Research). One or multiple DNA fragments were assembled into the vectors through NEBuilder HiFi DNA assembly (New England Biolabs). The plasmid sequences used for expression of the nMG3-6/3-8 adenine base editor and sgRNA are shown in SEQ ID NO: 1423.

Cell Culture, Transfections, Next Generation Sequencing, and Base Edit Analysis

HEK293T cells were grown and passaged in Dulbecco's Modified Eagle's Medium plus GlutaMAX (Gibco) supplemented with 10% (v/v) fetal bovine serum (Gibco) at 37° C. with 5% CO₂. 2.5×10⁴ cells (passage 3-8) were seeded on 96-well cell culture plates treated for cell attachment (Costar), grown for 20 to 24 h, and the spent media were refreshed with new media right before transfection. For the dual plasmid system, 300 ng expression plasmid along with 100 ng guide plasmid were transfected using 1 μL lipofectamine 2000 (ThermoFisher Scientific) per well according to the manufacturer's instructions. For the single plasmid system, 300 ng plasmid carrying the base editor gene and guide RNA was transfected using 1 μL lipofectamine. Transfected cells were grown for 3 days, harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. Targeted regions for base edits were amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs) with primers and extracted DNA as the templates. PCR products were purified by HighPrep PCR Clean-up System (MAGBIO) according to the manufacturer's instructions. After 72 hours, individual wells were visually assessed for cell viability based on cell growth and presence of floating cells in media. Following the visual assessment of cell viability, cells were harvested and genomic DNA was extracted. PCR primers appropriate for use in NGS-based DNA sequencing were generated, optimized, and used to amplify the individual target sequences for each guide RNA. The amplicons were sequenced on an Illumina MiSeq machine and analyzed with a proprietary Python script to measure gene editing.

Results

Through directed evolution of the predicted tRNA adenosine deaminase of MG68-4 (D109N)-nMG34-1 (D10A) in E. coli, two mutants (D109N/D7G/E10G and D109N/H129N) were observed to outperform the D109N mutant, with higher A to G editing efficiency in HEK293T cells. Through rational design based on the reported mutations of EcTadA (Gaudelli, N. M. et al. Programmable base editing of AT to GC in genomic DNA without DNA cleavage. Nature 2017, 551, 464-471; Gaudelli N. M. et al. Directed evolution of adenine base editors with increased activity and therapeutic application. Nat. Biotechnol. 2020, 38, 892-900; and Richter M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 2020, 38, 883-891) for MG68-4, five mutants (V83S, L85F, T112R, D148R, and A155R) fused with nMG34-1 (D10A) were observed to be beneficial on top of the D109N mutation. All identified mutations were combined, and a combinatorial library was designed to interrogate enzymatic performance of the adenosine deaminase (Table 13) (SEQ ID NOs: 1363-1409).

TABLE 13

Mutations installed in the combinatorial library of MG68-4. All MG68-4 variants are inserted into 3-68_DIV30_M_RDr1v1_B

Variant
Mutation

CL1
WT

CL2
D109N

CL3
D7G/E10G/D109N

CL4
V83S/D109N

CL5
L85F/D109N

CL6
D109N/T112R

CL7
D109N/H129N

CL9
D109N/A155R

CL10
D7G/E10G/V83S/D109N

CL11
D7G/E10G/L85F/D109N

CL12
D7G/E10G/T112R/D109N

CL13
D7G/E10G/H129N/D109N

CL14
D7G/E10G/D148R/D109N

CL15
D7G/E10G/A155R/D109N

CL16
V83S/L85F/D109N

CL17
V83S/D109N/T112R

CL18
V83S/D109N/H129N

CL20
V83S/D109N/A155R

CL22
L85F/D109N/T112R

CL23
L85F/D109N/H129N

CL25
L85F/D109N/A155R

CL28
D109N/T112R/H129N

CL29
D109N/T112R/D148R

CL30
D109N/T112R/A155R

CL34
D109N/H129N/D148R

CL35
D109N/H129N/A155R

CL40
D109N/D148R/A155R

CL56
V83S/L85F/D109N/T112R

CL57
V83S/L85F/D109N/H129N

CL58
V83S/D109N/T112R/H129N

CL59
V83S/L85F/D109N/H129N

CL60
V83S/L85F/D109N/T112R/H129N

CL61
D7G/E10G/V83S/L85F/D109N/T112R/H129N/D148R/A155R

CL62
E10G/V83S/L85F/D109N/T112R/H129N/D148R/A155R

CL63
D7G/V83S/L85F/D109N/T112R/H129N/D148R/A155R

CL64
D7G/E10G/L85F/D109N/T112R/H129N/D148R/A155R

CL65
D7G/E10G/V83S/D109N/T112R/H129N/D148R/A155R

CL66
D7G/E10G/V83S/L85F/D109N/H129N/D148R/A155R

CL67
D7G/E10G/V83S/L85F/D109N/T112R/D148R/A155R

CL68
D7G/E10G/V83S/L85F/D109N/T112R/H129N/A155R

CL69
D7G/E10G/V83S/L85F/D109N/T112R/H129N/D148R

CL70
L85F/D109N/T112R/H129N/D148R/A155R

CL71
V83S/D109N/T112R/H129N/D148R/A155R

CL72
V83S/L85F/D109N/H129N/D148R/A155R

CL73
V83S/L85F/D109N/T112R/D148R/A155R

CL74
V83S/L85F/D109N/T112R/H129N/A155R

CL75
V83S/L85F/D109N/T112R/H129N/D148R

All variants were inserted into the 3-68_DIV30_M nickase chassis, where 3-68, DIV30, and M stand for the MG3-6/3-8 nickase, domain-inlaid version 30, and monomer, respectively. The screening of the resulting ABEs revealed that 27 variants outperformed CL2 (MG68-4 (D109N)). The highest editing efficiency was observed when V83S, L85F, and D109N were combined (CL16), and the contribution of each mutation was supported by the increased activities of V83S/D109N and L85F/D109N observed in CL4 and CL5, respectively. In addition to CL16, CL22 also demonstrated high editing efficiency. In this variant, the V83S mutation of the V83S/L85F/D109N triple mutant was replaced by T112R (FIG. 48).
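For illustration, the slash-separated mutation labels of Table 13 can be parsed into sets so that two variants are compared directly. The following is a minimal sketch only; the helper names `parse_variant` and `format_mutations` are hypothetical and are not part of the disclosed methods.

```python
import re

# One substitution token, e.g. "D109N": wild-type residue, position, mutant residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(label):
    """Split a label such as 'V83S/L85F/D109N' into a set of (wt, position, mutant)."""
    mutations = set()
    for token in label.split("/"):
        if not token:  # tolerate empty tokens produced by a doubled slash
            continue
        match = MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        wild_type, position, mutant = match.groups()
        mutations.add((wild_type, int(position), mutant))
    return mutations


def format_mutations(mutations):
    """Render a set of mutations back to a label ordered by residue position."""
    ordered = sorted(mutations, key=lambda m: m[1])
    return "/".join(f"{wt}{pos}{mut}" for wt, pos, mut in ordered)


cl16 = parse_variant("V83S/L85F/D109N")   # CL16 as listed in Table 13
cl22 = parse_variant("L85F/D109N/T112R")  # CL22 as listed in Table 13
only_in_cl16 = format_mutations(cl16 - cl22)
only_in_cl22 = format_mutations(cl22 - cl16)
```

The set differences show that CL16 and CL22 share L85F/D109N and differ only in carrying V83S or T112R, respectively.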

In order to increase the A to G base editing percentage of the 3-68_DIV30_M adenine base editor, a 3-68_DIV30_D ABE was designed in which two MG68-4 (D109N) monomers are connected by a 65AA linker and inlaid within the 3-68 scaffold at the same V30 insertion site as 3-68_DIV30_M (SEQ ID NOs: 1410-1411). This dimeric form of the 3-68 ABE increased editing at position A10 of a site within the TRAC gene, when co-transfected with a plasmid expressing sgRNA68 (SEQ ID NO: 1421), from 8% (3-68_DIV30_M) to 18% (3-68_DIV30_D). The influence of two different MG68-4 variants (H129N or D7G/E10G) was also tested on 3-68_DIV30_M and 3-68_DIV30_D already containing D109N (SEQ ID NOs: 1412-1415). For 3-68_DIV30_D, the H129N or D7G/E10G mutation was installed within the second MG68-4 D109N, and the first deaminase remained MG68-4 D109N. The H129N and D7G/E10G variants were identified using an error-prone PCR library of MG68-4 fused to MG34-1 and selecting for A to G conversion in E. coli. After addition of either the H129N or D7G/E10G variants, in both the monomeric and dimeric MG68-4 D109N, editing was slightly lower as compared to the 3-68_DIV30_MG68-4 D109N ABE in the equivalent monomeric/dimeric form (FIG. 49).

Example 39—Engineering of nMG35-1 as a Base Editor

E. coli Selection

A nickase MG35-1 containing a D59A mutation with a C-terminally fused TadA*-(7.10) monomer along with a C-terminus SV40 NLS was constructed to test MG35-1 adenine base editor (ABE) activity (SEQ ID NOs: 1424-1426). This ABE was tested with its compatible sgRNA containing either a 20 nucleotide spacer sequence targeting the chloramphenicol acetyltransferase (CAT) gene or a non-targeting spacer sequence of the same 20 nucleotides in a scrambled order (SEQ ID NOs: 1429-1430). The CAT gene contains an H193Y mutation that renders the CAT gene nonfunctional against chloramphenicol selection. The ABE, sgRNA, and non-functional CAT gene were cloned into a pET-21 backbone containing ampicillin resistance. For both constructs, 10 ng of the plasmid was transformed into 25 μL of BL21(DE3) (Lucigen) E. coli cells and the cells were left shaking at 37° C. in 450 μL of recovery media for 90 minutes. Next, 70 μL of recovery media containing transformed cells was plated onto plates containing chloramphenicol concentrations of 0, 2, 3, 4, and 8 μg/mL. The 0 μg/mL plate was used as a transformation control. Plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. Plates were left at 37° C. for 40 hours. Colonies were sequenced by Elim Biopharmaceuticals, Inc.

Results

In order to determine whether the SMART II enzymes can be used as base editors, an adenine base editor (ABE) was constructed by fusing a TadA*-(7.10) monomer to the C-terminus of a nickase form of MG35-1 containing a D59A mutation (SEQ ID NO: 1424). The A to G editing of this ABE was tested in a positive selection single-plasmid E. coli system in which the ABE is required to revert a chloramphenicol acetyltransferase (CAT) gene containing a Y193 mutation back to H193 in order for the E. coli cell to survive chloramphenicol selection. This plasmid contained an sgRNA with a spacer either targeting the mutant CAT gene or a scrambled, non-targeting spacer region. An enrichment of colonies was detected with E. coli transformed with the MG35-1 ABE targeting the CAT gene when plated on plates containing 2, 3, and 4 μg/mL of chloramphenicol, while no colonies grew on the plate containing 8 μg/mL of chloramphenicol. Sanger sequencing confirmed that 26/30 colonies picked from the 2, 3, and 4 μg/mL plates transformed with the targeting MG35-1 ABE contained the expected Y193H reversion. It is likely that the 4 colonies without the reverted CAT sequence contain more unedited than edited copies of the selection construct, as one reverted CAT gene is sufficient to confer colony survival. No colonies were seen on the 2, 3, 4, and 8 μg/mL plates plated with E. coli transformed with the non-targeting MG35-1 ABE. While the 0 μg/mL condition was used as a transformation control, Sanger sequencing found that 1/10 colonies picked from the 0 μg/mL plate transformed with the targeting MG35-1 ABE contained the Y193H reversion, indicating a detectable level of editing even without chloramphenicol selection. The colony growth enrichment from chloramphenicol selection of the targeting MG35-1 ABE condition from the CAT gene Y193H reversion confirms that the MG35-1 nickase can function as an ABE in E. coli cells (FIG. 50).

Example 40—Guide Screening for the nMG3-6/3-8 ABE in Mouse Hepatocytes
Cell Culture, Transfections, Next Generation Sequencing, and Base Edit Analysis for Screens

Hepa1-6 cells were grown and passaged in Dulbecco's Modified Eagle's Medium plus 1×NEAA (gibco) supplemented with 10% (v/v) fetal bovine serum (gibco) and 1% pen-strep at 37° C. with 5% CO₂. 1×10⁴ cells were nucleofected with 500 ng IVT mRNA and 150 pmol chemically-synthesized sgRNA (IDT) using a Lonza-4D nucleofector (program EH-100). Cells were grown for 3 days, harvested, and gDNA was extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. Targeted regions for base edits were amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs) with primers appropriate for use with NGS-based DNA sequencing (SEQ ID NOs: 1493-1554) and extracted DNA as the templates. PCR products were purified by HighPrep PCR Clean-up System (MAGBIO) according to the manufacturer's instructions. Amplicons were sequenced on an Illumina MiSeq machine and analyzed with a proprietary Python script to measure gene editing.

mRNA Production

Sequences for base editor mRNA were codon optimized for human expression (GeneArt), then synthesized and cloned into a high copy ampicillin plasmid (Twist Bioscience). Synthesized constructs encoding T7 promoter, UTRs, base editor ORF, and NLS sequences were digested from the Twist backbone with HindII and BamHI (NEB), and ligated into a pUC19 plasmid backbone (SEQ ID NO: 1555) with T4 DNA ligase and 1× reaction buffer (NEB). The complete base editor mRNA plasmid comprised an origin of replication, ampicillin resistance cassette, the synthesized construct, and an encoded polyA tail. Base editor mRNA was synthesized via in vitro transcription (IVT) using the linearized base editor mRNA plasmid. This plasmid was linearized by incubation at 37° C. for 16 hours with SapI (NEB) enzyme. The linearization reaction comprised a 50 μL reaction containing 10 μg pDNA, 50 units SapI, and 1× reaction buffer. The linearized plasmid was purified with Phenol:Chloroform:Isoamyl Alcohol (25:24:1, v/v), precipitated in EtOH, and resuspended in nuclease-free water at an adjusted concentration of 500 ng/μL. The IVT reaction to generate base editor mRNA was performed at 50° C. for 1 hr under the following conditions: 1 μg linearized plasmid; 5 mM ATP, CTP, GTP (NEB), and N1-methyl pseudo-UTP (TriLink); 18750 U/mL Hi-T7 RNA Polymerase (NEB); 4 mM CleanCap AG (TriLink); 2.5 U/mL Inorganic E. coli pyrophosphatase (NEB); 1000 U/mL murine RNase Inhibitor (NEB); and 1× transcription buffer. After 1 hr, IVT was stopped, and plasmid DNA was digested with the addition of 250 U/mL DNase I (NEB) and incubated for 10 min at 37° C. Purification of base editor mRNA was performed using an RNeasy Maxi Kit (Qiagen) using the standard manufacturer's protocol. Transcript concentration was determined by UV (NanoDrop) and further analyzed by capillary gel electrophoresis on a Fragment Analyzer (Agilent).
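As a worked illustration of the quantities stated above, the template volume and enzyme amounts for a single IVT reaction can be computed as follows. The reaction volume is not stated in this disclosure; the 20 μL value below is a hypothetical assumption chosen only to make the arithmetic concrete, and the helper names are likewise illustrative.

```python
def volume_ul_for_mass(mass_ng, stock_ng_per_ul):
    """Volume of stock (in microliters) that contains the requested mass."""
    return mass_ng / stock_ng_per_ul


def units_for_reaction(units_per_ml, reaction_volume_ul):
    """Enzyme units required to reach a U/mL concentration in a given volume."""
    return units_per_ml * reaction_volume_ul / 1000.0


# Stated above: 1 ug linearized plasmid, resuspended at 500 ng/uL.
template_volume_ul = volume_ul_for_mass(mass_ng=1000, stock_ng_per_ul=500)

# ASSUMPTION: a 20 uL IVT reaction volume (not given in the text).
ASSUMED_IVT_VOLUME_UL = 20.0
hi_t7_units = units_for_reaction(18750, ASSUMED_IVT_VOLUME_UL)  # Hi-T7 RNA Polymerase
dnase_units = units_for_reaction(250, ASSUMED_IVT_VOLUME_UL)    # DNase I digest

# Stated above: 50 units SapI for 10 ug plasmid in the linearization reaction.
sapi_units_per_ug = 50 / 10
```

Under the assumed 20 μL volume, 2 μL of linearized template, 375 units of Hi-T7 RNA Polymerase, and 5 units of DNase I would be used; the SapI ratio of 5 units per μg of plasmid follows directly from the stated linearization reaction.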

Results

To test the activity of the engineered dimeric form of the 3-68 ABE described above, 527 MG3-6/3-8 chemically-synthesized guides targeting four therapeutically relevant loci in the mouse genome were designed and purchased from IDT. These guides were co-transfected with in vitro synthesized mRNA in Hepa1-6 (an immortalized mouse hepatocyte cell line) via nucleofection, and A to G conversion was assayed three days post-nucleofection. Guides were rank-ordered by percent total deamination within the spacer region, and deeper analysis of active guides was restricted to guides with >80% in-spacer deamination and with a high number of NGS reads. Altogether, total spacer A to G deamination above 1000 was observed at 31 distinct guides across three loci (SEQ ID NOs: 1431-1492; FIGS. 51-53), with two guides showing conversion rates of 89% and 95% (Apoa1 D11 and Apoa1 F12, respectively).
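The rank-ordering and filtering step described above can be sketched as follows, by way of example only. The guide names, deamination values, read counts, and threshold values below are hypothetical placeholders rather than data or cutoffs from the screen, and the record type `GuideResult` is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class GuideResult:
    name: str
    total_spacer_deamination: float  # percent A to G conversion within the spacer
    reads: int                       # number of NGS reads supporting the measurement


def rank_active_guides(results, min_deamination, min_reads):
    """Keep guides passing both thresholds, ordered from most to least active."""
    kept = [
        r for r in results
        if r.total_spacer_deamination > min_deamination and r.reads >= min_reads
    ]
    return sorted(kept, key=lambda r: r.total_spacer_deamination, reverse=True)


# Hypothetical screen output, not data from this disclosure.
results = [
    GuideResult("guide_A", 12.5, 8000),
    GuideResult("guide_B", 91.0, 150),    # active, but too few reads
    GuideResult("guide_C", 88.0, 12000),
    GuideResult("guide_D", 93.5, 9500),
]
ranked = rank_active_guides(results, min_deamination=80.0, min_reads=1000)
ranked_names = [r.name for r in ranked]
```

Both thresholds are passed as parameters so that the read-depth cutoff, which is not specified numerically above, can be set by the practitioner.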

TABLE 13A

Guide sequences used in Example 40

SEQ

ID

NO:
sgRNA name
Sequence

1431
MG3-6/3-8
mC*mU*mG*rGrUrGrUrGrGrUrArCrUrCrGrUrUrCrArArGrG

mApoa1 BE F12
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1432
MG3-6/3-8
mA*mC*mU*rArUrGrGrCrGrCrArGrGrUrCrCrUrCrCrArGrCr

mApoa1 BE D11
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1433
MG3-6/3-8
mU*mU*mG*rGrGrUrGrArGrArCrArGrGrArGrArUrGrArArC

mApoa1 BE C5
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1434
MG3-6/3-8
mU*mC*mU*rCrCrUrGrGrArArArArCrUrGrGrGrArCrArCrUr

mApoa1 BE A4
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1435
MG3-6/3-8
mA*mG*mG*rArArCrGrGrCrUrGrGrGrCrCrCrArUrUrGrArCr

mApoa1 BE F4
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1436
MG3-6/3-8
mC*mU*mG*rGrGrArUrArArCrCrUrGrGrArGrArArArGrArA

mApoa1 BE A5
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1437
MG3-6/3-8
mC*mC*mU*rGrGrUrGrUrGrGrUrArCrUrCrGrUrUrCrArArG

mApoa1 BE E12
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1438
MG3-6/3-8
mA*mG*mC*rArUrGrGrGrCrArUrCrArGrArCrUrArUrGrGrC

mApoa1 BE A11
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1439
MG3-6/3-8
mC*mU*mC*rCrUrGrGrArArArArCrUrGrGrGrArCrArCrUrCr

mApoa1 BE B4
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1440
MG3-6/3-8
mG*mG*mA*rArCrGrGrCrUrGrGrGrCrCrCrArUrUrGrArCrUr

mApoa1 BE G4
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1441
MG3-6/3-8
mG*mC*mC*rArCrArGrGrGrGrArCrArGrUrCrUrCrCrCrUrUr

mApoa1 BE B2
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1442
MG3-6/3-8
mC*mA*mG*rCrGrArArCrArGrArUrGrCrGrCrGrArGrArGrCr

mApoa1 BE D7
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1443
MG3-6/3-8
mA*mU*mU*rGrGrGrUrGrArGrArCrArGrGrArGrArUrGrArA

mApoa1 BE B5
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1444
MG3-6/3-8
mA*mG*mG*rGrArGrArCrUrGrUrCrCrCrCrUrGrUrGrGrCrUr

mApoa1 BE G6
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1445
MG3-6/3-8
mC*mC*mU*rArCrCrUrUrGrArArCrGrArGrUrArCrCrArCrAr

mApoa1 BE A8
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1446
MG3-6/3-8
mG*mG*mC*rCrCrArArGrGrArGrGrArGrGrArUrUrCrArArA

mApoa1 BE F2
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1447
MG3-6/3-8
mA*mG*mC*rArArGrArUrGrArArCrCrCrCrArGrUrCrCrCrAr

mApoa1 BE E1
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1448
MG3-6/3-8
mC*mU*mA*rCrCrUrUrGrArArCrGrArGrUrArCrCrArCrArCr

mApoa1 BE B8
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1449
MG3-6/3-8
mC*mA*mU*rGrCrUrGrGrArGrArCrGrCrUrUrArArGrArCrCr

mApoa1 BE H8
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1450
MG3-6/3-8
mU*mC*mG*rCrGrArCrCrGrCrArUrGrCrGrCrArCrArCrArCr

mApoa1 BE H6
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1451
MG3-6/3-8
mA*mC*mG*rArArUrUrCrCrArGrArArGrArArArUrGrGrArA

mApoa1 BE F5
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1452
MG3-6/3-8
mC*mU*mA*rGrCrCrUrGrArArUrCrUrCrCrUrGrGrArArArAr

mApoa1 BE H3
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1453
MG3-6/3-8
mU*mG*mG*rGrCrCrCrArUrUrGrArCrUrCrGrGrGrArCrUrUr

mApoa1 BE H4
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1454
MG3-6/3-8
mC*mG*mA*rGrArArArGrCrCrArGrArCrCrUrGrCrGrCrUrGr

mApoa1 BE E8
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1455
MG3-6/3-8
CTGGTGTGGTACTCGTTCAAGG

mApoa1 BE F12

1456
MG3-6/3-8
ACTATGGCGCAGGTCCTCCAGC

mApoa1 BE D11

1457
MG3-6/3-8
TTGGGTGAGACAGGAGATGAAC

mApoa1 BE C5

1458
MG3-6/3-8
TCTCCTGGAAAACTGGGACACT

mApoa1 BE A4

1459
MG3-6/3-8
AGGAACGGCTGGGCCCATTGAC

mApoa1 BE F4

1460
MG3-6/3-8
CTGGGATAACCTGGAGAAAGAA

mApoa1 BE A5

1461
MG3-6/3-8
CCTGGTGTGGTACTCGTTCAAG

mApoa1 BE E12

1462
MG3-6/3-8
AGCATGGGCATCAGACTATGGC

mApoa1 BE A11

1463
MG3-6/3-8
CTCCTGGAAAACTGGGACACTC

mApoa1 BE B4

1464
MG3-6/3-8
GGAACGGCTGGGCCCATTGACT

mApoa1 BE G4

1465
MG3-6/3-8
GCCACAGGGGACAGTCTCCCTT

mApoa1 BE B2

1466
MG3-6/3-8
CAGCGAACAGATGCGCGAGAGC

mApoa1 BE D7

1467
MG3-6/3-8
ATTGGGTGAGACAGGAGATGAA

mApoa1 BE B5

1468
MG3-6/3-8
AGGGAGACTGTCCCCTGTGGCT

mApoa1 BE G6

1469
MG3-6/3-8
CCTACCTTGAACGAGTACCACA

mApoa1 BE A8

1470
MG3-6/3-8
GGCCCAAGGAGGAGGATTCAAA

mApoa1 BE F2

1471
MG3-6/3-8
AGCAAGATGAACCCCAGTCCCA

mApoa1 BE E1

1472
MG3-6/3-8
CTACCTTGAACGAGTACCACAC

mApoa1 BE B8

1473
MG3-6/3-8
CATGCTGGAGACGCTTAAGACC

mApoa1 BE H8

1474
MG3-6/3-8
TCGCGACCGCATGCGCACACAC

mApoa1 BE H6

1475
MG3-6/3-8
ACGAATTCCAGAAGAAATGGAA

mApoa1 BE F5

1476
MG3-6/3-8
CTAGCCTGAATCTCCTGGAAAA

mApoa1 BE H3

1477
MG3-6/3-8
TGGGCCCATTGACTCGGGACTT

mApoa1 BE H4

1478
MG3-6/3-8
CGAGAAAGCCAGACCTGCGCTG

mApoa1 BE E8

1479
MG3-6/3-8
mA*mC*mU*rArUrUrArArArCrCrArArGrArArArCrUrCrCrCr

mAngptl3 BE
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

C12
rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1480
MG3-6/3-8
mC*mG*mA*rArArCrArUrGrGrGrArArArArCrUrArCrGrArA

mAngptl3 BE B2
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1481
MG3-6/3-8
mA*mG*mU*rArArUrUrGrCrArUrCrCrArGrArGrUrGrGrArU

mAngptl3 BE C1
rGrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUr

ArArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCr

UrCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCr

GrGrGrCrGrGrUrArUrGrU*mU*mU*mU

1482
MG3-6/3-8
mA*mA*mG*rArGrArArGrArCrArGrCrCrCrUrUrCrArArCrAr

mAngptl3 BE F3
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1483
MG3-6/3-8
mU*mU*mU*rArGrCrGrArArUrGrGrCrCrUrCrCrUrGrCrArGr

mAngptl3 BE G1
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1484
MG3-6/3-8
ACTATTAAACCAAGAAACTCCC

mAngptl3 BE

C12

1485
MG3-6/3-8
CGAAACATGGGAAAACTACGAA

mAngptl3 BE B2

1486
MG3-6/3-8
AGTAATTGCATCCAGAGTGGAT

mAngptl3 BE C1

1487
MG3-6/3-8
AAGAGAAGACAGCCCTTCAACA

mAngptl3 BE F3

1488
MG3-6/3-8
TTTAGCGAATGGCCTCCTGCAG

mAngptl3 BE G1

1489
MG3-6/3-8
mA*mC*mC*rArGrUrUrArArArArGrArUrCrCrUrCrGrGrUrCr

mTrac BE E1
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1490
MG3-6/3-8
mU*mU*mC*rArCrArArUrCrCrCrArCrCrUrGrGrArUrCrUrCr

mTrac BE D10
GrUrUrGrArGrArArUrCrGrArArArGrArUrUrCrUrUrArArUrA

rArGrGrCrArUrCrCrUrUrCrCrGrArUrGrCrUrGrArCrUrUrCrU

rCrArCrCrGrUrCrCrGrUrUrUrUrCrCrArArUrArGrGrArGrCrG

rGrGrCrGrGrUrArUrGrU*mU*mU*mU

1491
MG3-6/3-8
ACCAGTTAAAAGATCCTCGGTC

mTrac BE E1

1492
MG3-6/3-8
TTCACAATCCCACCTGGATCTC

mTrac BE D10

r = native ribose base, m = 2′-O methyl modified base, F = 2′-fluoro modified base, * = phosphorothioate bond
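For illustration, the modification notation used in Table 13A can be parsed into a plain nucleotide sequence and compared against the corresponding DNA spacer. This is a minimal sketch; the function names are hypothetical, and it assumes each nucleotide is written as a one-letter prefix (r, m, or F per the legend above) followed by the base and an optional asterisk.

```python
import re

# One nucleotide token: modification prefix, base, optional phosphorothioate mark.
TOKEN = re.compile(r"([rmF])([ACGU])(\*?)")


def parse_modified_rna(notation):
    """Return a list of (prefix, base, has_phosphorothioate) tuples."""
    tokens = []
    position = 0
    while position < len(notation):
        match = TOKEN.match(notation, position)
        if match is None:
            raise ValueError(f"unrecognized token at offset {position}")
        prefix, base, star = match.groups()
        tokens.append((prefix, base, star == "*"))
        position = match.end()
    return tokens


def plain_rna(notation):
    """Strip all modification marks, leaving the RNA sequence."""
    return "".join(base for _, base, _ in parse_modified_rna(notation))


# First line of SEQ ID NO: 1431 as listed above (the spacer portion of the guide).
spacer_notation = "mC*mU*mG*rGrUrGrUrGrGrUrArCrUrCrGrUrUrCrArArGrG"
tokens = parse_modified_rna(spacer_notation)
spacer_rna = plain_rna(spacer_notation)
spacer_as_dna = spacer_rna.replace("U", "T")
methylated = sum(1 for prefix, _, _ in tokens if prefix == "m")
phosphorothioates = sum(1 for _, _, ps in tokens if ps)
```

The recovered 22-nucleotide spacer, written as DNA, matches SEQ ID NO: 1455 (CTGGTGTGGTACTCGTTCAAGG), and the three 5′ nucleotides carry both the 2′-O-methyl and phosphorothioate marks.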

While the pattern of base conversion varied across spacers, detectable conversion was observed across an editing window of A4 to A15. To assess background at these genomic regions, NGS primer pairs used for the experimental samples were used in mock nucleofected samples and showed low to undetectable background conversion (0-0.12%) (FIG. 54). In summary, the engineered dimeric 3-68 ABE exhibits high editing activity in mammalian cells at three independent loci and across a large panel of guides.

Example 41—mRNA Cytidine Base Editors

To test the activity of the engineered cytidine deaminases at scale, 527 chemically-synthesized guides suitable for use with MG3-6/3-8 to target four therapeutically relevant loci in the mouse genome were designed and purchased from IDT. These guides were co-transfected with in vitro synthesized mRNA in Hepa1-6 (an immortalized mouse hepatocyte cell line) via nucleofection, and C to T conversion was assayed three days post-nucleofection. Prior to harvesting, individual wells were visually assessed for cell viability based on cell growth and presence of floating cells in media. The 3-68 152-6 CBE did not show appreciable cytotoxicity compared to mock samples.

Cell Culture, Transfections, Next Generation Sequencing, and Base Edit Analysis for Screens (Prophetic)

Hepa1-6 cells are grown and passaged in Dulbecco's Modified Eagle's Medium plus 1×NEAA (gibco) supplemented with 10% (v/v) fetal bovine serum (gibco) and 1% pen-strep at 37° C. with 5% CO₂. 1×10⁴ cells are nucleofected with 500 ng IVT mRNA and 150 pmol chemically synthesized sgRNA (IDT) using a Lonza-4D nucleofector (program EH-100). Cells are grown for 3 days, visually assessed for viability, harvested, and gDNA is extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. Targeted regions for base edits are amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs) with primers appropriate for use with NGS-based DNA sequencing and extracted DNA as the templates. PCR products are purified by HighPrep PCR Clean-up System (MAGBIO) according to the manufacturer's instructions. Amplicons are sequenced on an Illumina MiSeq machine and analyzed with a proprietary Python script to measure gene editing.

Example 42—Base Editing Preferences for nMG35-1 ABE

As described in Example 39, E. coli was transformed with a plasmid containing the nMG35-1-ABE, a non-functional chloramphenicol acetyltransferase (CAT Y193) gene, and an sgRNA that either targets the CAT gene (targeting spacer) or not (scrambled spacer). Cell growth is dependent on the ABE base editing the non-functional CAT gene (A at position 17 from the TAM) (FIG. 55A) to its wild-type variant (H193) and restoring activity. Multiple linkers were evaluated for nMG35-1 fusions to the TadA deaminase monomer (Table 14).

TABLE 14

Linkers evaluated for nMG35-1 fusions with a TadA deaminase.

Length
Sequence
SEQ ID NO

7 AAs
PAPAPAP
1654

14 AAs
KLGGGAPAVGGGPK
1655

15 AAs
GGGGSGGGGSGGGGS
1649

XTEN (17 AAs)
SGSETPGTSEASTPESA
1650

26 AAs
GGGGSGGGGSEAAAAKGGGGSGGGGS
1651

32 AAs
GGGGSGGGGSEAAAAKEAAAAKGGGGSGGGGS
1652

44 AAs
KGKGKGMGAGTLSTDKGESLGIKYEEGQSHRPTNPNASRMAQKV
1653

Results

Base editing was tested in an E. coli positive selection assay targeting the chloramphenicol acetyltransferase (CAT) gene that was expressed from the same plasmid co-expressing the MG35-1 ABE containing various linkers. The nMG35-1 ABE construct with the 17 amino acid linker (XTEN) outperformed the other linkers in base editing experiments (FIGS. 55B-55E). In addition, when analyzing the adenine positions across the targeting spacer that were edited by the nMG35-1 ABE, the A at the 9th position (in the middle of the spacer region) showed the highest editing levels in E. coli (FIG. 55D).

Example 43—the nMG35-1 ABE Edits Additional Target Sites in E. coli

E. coli Positive Selection

As described in Example 39, a single plasmid construct encompassing a nickase MG35-1 (D59A mutation), a C-terminally fused TadA*-(7.10) monomer, and a C-terminus SV40 NLS (SEQ ID NO: 369) was tested as a base editor with its compatible sgRNA containing a 20 bp spacer sequence targeting the chloramphenicol acetyltransferase (CAT) gene. A non-targeting sgRNA lacking a spacer sequence was used as a negative control. The CAT gene contained either an engineered stop codon (at amino acid positions 98 or 122) or an H193Y mutation that renders the CAT gene nonfunctional (FIGS. 56A and 56B). The ABE construct, sgRNA, and non-functional CAT gene were cloned into a pET-21 backbone containing ampicillin resistance. Ten ng of the plasmid was transformed into 25 μL of BL21(DE3) (Lucigen) E. coli cells and incubated at 37° C. in 450 μL of recovery media for 90 minutes. Next, 70 μL of recovery media containing transformed cells was plated onto plates containing chloramphenicol concentrations of 0, 2, 3, 4, and 8 μg/mL. The 0 μg/mL plate was used as a transformation control. Plates also contained 100 μg/mL carbenicillin and 0.1 mM IPTG. Plates were left at 37° C. for 40 hours. CAT mutations were verified in the resulting colonies by Sanger sequencing (Elim Biopharmaceuticals, Inc).

Results

The A to G editing of the nMG35-1 ABE was tested in a positive selection single-plasmid E. coli system in which the ABE is required to revert a chloramphenicol acetyltransferase (CAT) gene stop codon mutation back to glutamine or a tyrosine mutation back to histidine (FIGS. 56A and 56B) in order for E. coli to survive growth under chloramphenicol selection. Four distinct non-functional CAT genes were tested for reversion by the nMG35-1 ABE: three single mutations (a stop codon at residue 98 reversion to Q; a stop codon at residue 122 reversion to Q; and Y at residue 193 reversion to H) and a double mutation in which a CAT gene contains stop codons at both residues 98 and 122 (both need to be reverted to Q simultaneously to restore CAT gene functionality). These four conditions were tested alongside paired negative controls in which the non-functional CAT genes were co-expressed with sgRNAs missing a spacer sequence. The nMG35-1 ABE successfully edited the four conditions, including the double mutant reversion, as shown by an enrichment of E. coli colonies when grown on plates containing 2 and 4 μg/mL of chloramphenicol (FIG. 56C, “targeting” row). A few colonies also grew on the plate containing 8 μg/mL of chloramphenicol for reversion of the individual stop codon mutations at residues 98 and 122 (FIG. 56C, “targeting” row). Sanger sequencing of the colonies growing on the 2 μg/mL plate from the CAT double mutant reversion determined that 17 of 18 colonies showed the expected A to G edit at both target sites (FIG. 56D). No colonies were seen on the 2, 4, and 8 μg/mL plates plated with E. coli transformed with the non-targeting guide (FIG. 56C, “no spacer” row), confirming that the nMG35-1-ABE is a successful base editor in E. coli.
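The codon logic behind these reversions can be sketched as follows, for illustration only. Under the standard genetic code, converting a stop codon to glutamine or tyrosine to histidine requires a T to C change at the first codon position of the coding strand, which corresponds to an A to G edit on the opposite strand. The specific codons present in the CAT constructs are not stated here, so the codons used below (TAG, TAA, TAT, TAC) are assumptions, and the helper names are hypothetical.

```python
# Partial codon table covering only the codons relevant to this illustration.
CODON_TABLE = {
    "TAA": "*", "TAG": "*", "TGA": "*",
    "CAA": "Q", "CAG": "Q",
    "TAT": "Y", "TAC": "Y",
    "CAT": "H", "CAC": "H",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def edit_opposite_strand(codon, coding_index):
    """Apply an A to G edit on the strand opposite the coding strand.

    `coding_index` is the 0-based position within the coding-strand codon whose
    base-pairing partner is deaminated. Returns the resulting coding-strand codon.
    """
    opposite = list(reverse_complement(codon))
    opposite_index = len(codon) - 1 - coding_index
    if opposite[opposite_index] != "A":
        raise ValueError("no adenine on the opposite strand at this position")
    opposite[opposite_index] = "G"
    return reverse_complement("".join(opposite))


# ASSUMED starting codons; the actual CAT codons are not given in the text.
stop_to_gln = edit_opposite_strand("TAG", 0)
tyr_to_his = edit_opposite_strand("TAT", 0)
```

With these assumed codons, TAG becomes CAG (stop to Q) and TAT becomes CAT (Y to H), consistent with the reversions required for survival in the selection.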

When the predicted 3D structure of MG35-1 is aligned to the cryoEM structure of an IscB nuclease (PDB: 7UTN), the PLMP domain of IscB aligns with amino acid (AA) positions 1-53 of MG35-1. A nickase nMG35-1 ABE with a deletion of AAs 1-53 was tested in the bacterial positive selection assay in which the ABE needs to revert a Y193 mutation to H within the CAT gene to restore CAT functionality (FIG. 57). When these AAs were truncated from the nMG35-1 ABE, E. coli was unable to survive chloramphenicol selection at the minimum inhibitory chloramphenicol concentration of 2 μg/mL. These results suggest that AAs 1-53 of MG35-1 are required for efficient base editing by the MG35-1 ABE in E. coli cells.

Example 44—Base Editing in Human Cells with nMG35-1-ABE (Prophetic)

In order to demonstrate that an nMG35-1-ABE system is capable of base editing in human cells, a fusion system comprising a nickase MG35-1 (D59A mutation), a C-terminally fused TadA(8.8m) deaminase monomer, and a C-terminus SV40 NLS is constructed. HEK293T cells are grown and passaged in Dulbecco's Modified Eagle's Medium plus GlutaMAX (gibco) supplemented with 10% (v/v) fetal bovine serum (gibco) at 37° C. with 5% CO₂. About 2.5×10⁴ cells are seeded on 96-well cell culture plates treated for cell attachment (Costar), and grown for 20 to 24 h (spent media are refreshed with new media before transfection). Each plate well receives 300 ng expression plasmid and 1 μL Lipofectamine 2000 (ThermoFisher Scientific) for transfection according to the manufacturer's instructions. Transfected cells are grown for three days, harvested, and genomic DNA is extracted with QuickExtract (Lucigen) according to the manufacturer's instructions. Targeted regions for base edits are amplified using Q5 High-Fidelity DNA polymerase (New England Biolabs) with target-specific primers, and PCR products are purified with the HighPrep PCR Clean-up System (MAGBIO) according to the manufacturer's instructions. To analyze nMG35-1-ABE base editing in human cells, adapters used for next generation sequencing (NGS) are appended to PCR products by subsequent PCR reactions using the KAPA HiFi HotStart ReadyMix PCR Kit (Roche) and primers compatible with TruSeq DNA Library Prep Kits (Illumina). DNA concentrations of the resulting products are quantified by TapeStation (Agilent), and samples are pooled to prepare the library for NGS analysis. The resulting library is quantified by qPCR with the Aria Real-time PCR System (Agilent), and high throughput sequencing is performed with an Illumina MiSeq instrument per the manufacturer's instructions.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

TABLE 15

Listing of PAM sequences referred to herein

| Sequence Number | Sequence |
|---|---|
| A360 | nRRR |
| A361 | nnRRAY |
| A362 | nnRGGnT |
| A363 | nnRnYAY |
| A364 | — |
| A365 | nRCCV |
| A366 | nRnnGRKA |
| A367 | nnnnC |
| A368 | nRWART |
| A598 | NGG |
| A1106 | AnGg |
| A1107 | nARAA |
| A1108 | ATGaaa |
| A1109 | ATGA |
| A1110 | WTGG |
| A1111 | RTGA |
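The PAM sequences listed in Table 15 use IUPAC degenerate nucleotide codes (for example, R is A or G, Y is C or T, and n is any base). As a minimal, non-limiting sketch, such a motif can be expanded into a regular expression and searched against a DNA string. The helper names `pam_to_regex` and `find_pam_sites` are hypothetical. The sketch also assumes that lowercase letters in the table are matched in the same way as uppercase letters, which may not reflect any weighting intended by the lowercase notation.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam):
    """Compile a degenerate PAM such as 'nRRR' into a regular expression."""
    # Assumption: case is ignored, so 'n' and 'N' both mean any base.
    return re.compile("".join(IUPAC[ch] for ch in pam.upper()))


def find_pam_sites(pam, dna):
    """Return 0-based start positions of every PAM match, overlaps included."""
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + pam_to_regex(pam).pattern + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]


# Toy example (hypothetical sequence): scan for the nRRR motif.
sites = find_pam_sites("nRRR", "TTAGGATC")
```

The search covers only the strand supplied; scanning the complementary strand would require reverse-complementing the input first.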

Embodiments

The following embodiments are not intended to be limiting in any way.

Embodiment 1. An engineered nucleic acid editing system, comprising:

- (a) an endonuclease comprising a RuvC domain and an HNH domain, wherein said endonuclease is derived from an uncultivated microorganism, wherein said endonuclease is a class 2, type II endonuclease, wherein said endonuclease is configured to be deficient in nuclease activity;
- (b) a base editor coupled to said endonuclease; and
- (c) an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising:
  - i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
  - ii. a ribonucleic acid sequence configured to bind to said endonuclease.

Embodiment 2. The engineered nucleic acid editing system of Embodiment 1, wherein said RuvC domain lacks nuclease activity.

Embodiment 3. The engineered nucleic acid editing system of Embodiment 1, wherein said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

Embodiment 4. The engineered nucleic acid editing system of Embodiment 1 or Embodiment 2, wherein said class 2, type II endonuclease comprises a nickase mutation.

Embodiment 5. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 4, wherein said endonuclease comprises a sequence with at least 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof.

Embodiment 6. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 5, wherein said class 2, type II endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597 when optimally aligned.

Embodiment 7. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 5, wherein said endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NO: 72, or residue 17 relative to SEQ ID NO: 75 when optimally aligned.

Embodiment 8. An engineered nucleic acid editing system comprising:

- (a) an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs: 70-78, 596, or 597, Sequence Number: A598, or a variant thereof;
- (b) a base editor coupled to said endonuclease; and
- (c) an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising:
  - i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
  - ii. a ribonucleic acid sequence configured to bind to said endonuclease.

Embodiment 9. An engineered nucleic acid editing system comprising:

- (a) an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of Sequence Numbers: A360-A368 or A598, or a variant thereof,
- wherein said endonuclease is a class 2, type II endonuclease, and
- wherein said endonuclease is configured to be deficient in nuclease activity;
- (b) a base editor coupled to said endonuclease; and
- (c) an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising:
  - i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
  - ii. a ribonucleic acid sequence configured to bind to said endonuclease.

Embodiment 10. The engineered nucleic acid editing system of Embodiment 9, wherein said endonuclease comprises a nickase mutation.

Embodiment 11. The engineered nucleic acid editing system of Embodiment 9, wherein said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

Embodiment 12. The engineered nucleic acid editing system of Embodiment 9, wherein said class 2, type II endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597 when optimally aligned.

Embodiment 13. The engineered nucleic acid editing system of Embodiment 9, wherein said base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof.

Embodiment 14. The engineered nucleic acid editing system of Embodiment 9, wherein said base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51 or 385-390.

Embodiment 15. The engineered nucleic acid editing system of any one of Embodiment 8-Embodiment 14, wherein said endonuclease comprises a RuvC domain lacking nuclease activity.

Embodiment 16. The engineered nucleic acid editing system of any one of Embodiment 8-Embodiment 15, wherein said endonuclease is derived from an uncultivated microorganism.

Embodiment 17. The engineered nucleic acid editing system of any one of Embodiment 8-Embodiment 16, wherein said endonuclease has less than 80% identity to a Cas9 endonuclease.

Embodiment 18. The engineered nucleic acid editing system of any one of Embodiment 8-Embodiment 17, wherein said endonuclease further comprises an HNH domain.

Embodiment 19. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 18, wherein said engineered guide ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680, or a variant thereof.

Embodiment 20. An engineered nucleic acid editing system comprising:

- (a) an engineered guide ribonucleic acid structure comprising:
  - (i) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
  - (ii) a ribonucleic acid sequence configured to bind to an endonuclease,
    - wherein said engineered ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680, or a variant thereof; and
- (b) a class 2, type II endonuclease configured to bind to said engineered guide ribonucleic acid; and
- (c) a base editor coupled to said endonuclease.

Embodiment 21. The engineered nucleic acid editing system of Embodiment 20, wherein said endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence selected from the group consisting of Sequence Numbers: A360-A368 or A598.

Embodiment 22. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 21, wherein said base editor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof.

Embodiment 23. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 22, wherein said base editor comprises a sequence having at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51 or 385-390.

Embodiment 24. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 22, wherein said base editor is an adenine deaminase.

Embodiment 25. The engineered nucleic acid editing system of Embodiment 24, wherein said adenine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof.

Embodiment 26. The engineered nucleic acid editing system of any of Embodiment 1-Embodiment 22, wherein said base editor is a cytosine deaminase.

Embodiment 27. The engineered nucleic acid editing system of Embodiment 26, wherein said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-49, 444-447, 594, 58-66, or 599-675, or a variant thereof.

Embodiment 28. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 27, comprising a uracil DNA glycosylase inhibitor (UGI) coupled to said endonuclease or said base editor.

Embodiment 29. The engineered nucleic acid editing system of Embodiment 28, wherein said uracil DNA glycosylase inhibitor (UGI) comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67.

Embodiment 30. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 29, wherein said engineered guide ribonucleic acid structure comprises at least two ribonucleic acid polynucleotides.

Embodiment 31. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 29, wherein said engineered guide ribonucleic acid structure comprises one ribonucleic acid polynucleotide comprising said guide ribonucleic acid sequence and said ribonucleic acid sequence configured to bind to an endonuclease.

Embodiment 32. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 31, wherein said guide ribonucleic acid sequence is complementary to a prokaryotic, bacterial, archaeal, eukaryotic, fungal, plant, mammalian, or human genomic sequence.

Embodiment 33. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 32, wherein said guide ribonucleic acid sequence is 15-24 nucleotides in length.

Embodiment 34. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 33, further comprising one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease.

Embodiment 35. The engineered nucleic acid editing system of Embodiment 34, wherein said NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs: 369-384, or a variant thereof.

Embodiment 36. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 35, wherein said endonuclease is covalently coupled directly to said base editor or covalently coupled to said base editor through a linker.

Embodiment 37. The engineered nucleic acid editing system of Embodiment 36, wherein a polypeptide comprises said endonuclease and said base editor.

Embodiment 38. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 37, wherein said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

Embodiment 39. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 38, wherein said system further comprises a source of Mg²⁺.

Embodiment 40. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 39, wherein:

- a) said endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs: 70, 71, 73, 74, 76, 78, 77, or 78, or a variant thereof;
- b) said guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to non-degenerate nucleotides of any one of SEQ ID NOs: 88, 89, 91, 92, 94, 96, 95, or 488;
- c) said endonuclease is configured to bind to a PAM comprising any one of Sequence Numbers: A360, A361, A363, A365, A367, or A368; or
- d) said base editor comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NOs: 58 or 595, or a variant thereof.

Embodiment 41. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 39, wherein:

- a) said endonuclease comprises a sequence at least 70%, at least 80%, or at least 90% identical to any one of SEQ ID NOs: 70, 71, or 78, or a variant thereof;
- b) said guide RNA structure comprises a sequence at least 70%, at least 80%, or at least 90% identical to non-degenerate nucleotides of at least one of SEQ ID NOs: 88, 89, or 96;
- c) said endonuclease is configured to bind to a PAM comprising any one of Sequence Numbers: A360, A362, or A368; or
- d) said base editor comprises a sequence at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 594, or a variant thereof.

Embodiment 42. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 41, wherein said sequence identity is determined by a BLASTP, CLUSTALW, MUSCLE, MAFFT, or Smith-Waterman homology search algorithm.

Embodiment 43. The engineered nucleic acid editing system of Embodiment 42, wherein said sequence identity is determined by said BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
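By way of non-limiting illustration, the BLASTP parameters recited in Embodiment 43 correspond to command-line options of the NCBI BLAST+ `blastp` program. The sketch below only assembles such a command; it does not run BLAST. The function name `blastp_identity_command`, the FASTA file names, and the choice of tabular output columns are assumptions made for illustration.

```python
def blastp_identity_command(query_fasta, subject_fasta):
    """Build a blastp argument list using the parameters recited above.

    Assumption: NCBI BLAST+ is installed and both inputs are protein FASTA files.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "3",          # wordlength (W) of 3
        "-evalue", "10",            # expectation (E) of 10
        "-matrix", "BLOSUM62",      # BLOSUM62 scoring matrix
        "-gapopen", "11",           # gap existence cost of 11
        "-gapextend", "1",          # gap extension cost of 1
        "-comp_based_stats", "2",   # conditional compositional score matrix adjustment
        "-outfmt", "6 qseqid sseqid pident length",  # tabular output with percent identity
    ]


# Hypothetical file names, for illustration only.
cmd = blastp_identity_command("query.faa", "subject.faa")
```

Where BLAST+ is available, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, and the `pident` column of the output gives the percent identity of each reported alignment.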

Embodiment 44. The engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 43, wherein said endonuclease is configured to be catalytically dead.

Embodiment 45. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein said nucleic acid encodes a class 2, type II endonuclease coupled to a base editor, and wherein said endonuclease is derived from an uncultivated microorganism.

Embodiment 46. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein said nucleic acid encodes an endonuclease having at least 70% sequence identity to any one of SEQ ID NOs: 70-78 coupled to a base editor.

Embodiment 47. The nucleic acid of any one of Embodiment 44-Embodiment 46, wherein said endonuclease comprises a sequence encoding one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease.

Embodiment 48. The nucleic acid of Embodiment 47, wherein said NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs: 369-384, or a variant thereof.

Embodiment 49. The nucleic acid of any one of Embodiment 44-Embodiment 48, wherein said organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.

Embodiment 50. A vector comprising a nucleic acid sequence encoding a class 2, type II endonuclease coupled to a base editor, wherein said endonuclease is derived from an uncultivated microorganism.

Embodiment 51. A vector comprising the nucleic acid of any of Embodiment 44-Embodiment 49.

Embodiment 52. The vector of any of Embodiment 50-Embodiment 51, further comprising a nucleic acid encoding an engineered guide ribonucleic acid structure configured to form a complex with said endonuclease comprising:

- a) a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
- b) a ribonucleic acid sequence configured to bind to said endonuclease.

Embodiment 53. The vector of any of Embodiment 50-Embodiment 52, wherein the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.

Embodiment 54. A cell comprising the vector of any of Embodiment 50-Embodiment 53.

Embodiment 55. A method of manufacturing an endonuclease, comprising cultivating said cell of Embodiment 54.

Embodiment 56. A method for modifying a double-stranded deoxyribonucleic acid polynucleotide comprising contacting said double-stranded deoxyribonucleic acid polynucleotide with a complex comprising:

- a) an endonuclease comprising a RuvC domain and an HNH domain, wherein said endonuclease is derived from an uncultivated microorganism, wherein said endonuclease is a class 2, type II endonuclease, and wherein said RuvC domain lacks nuclease activity;
- b) a base editor coupled to said endonuclease; and
- c) an engineered guide ribonucleic acid structure configured to bind to said endonuclease and said double-stranded deoxyribonucleic acid polynucleotide;
  
  wherein said double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM).

Embodiment 57. The method of Embodiment 56, wherein said endonuclease comprising a RuvC domain and an HNH domain is covalently coupled directly to said base editor or covalently coupled to said base editor through a linker.

Embodiment 58. The method of Embodiment 56 or Embodiment 57, wherein said endonuclease comprising a RuvC domain and an HNH domain comprises a sequence with at least 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof.

Embodiment 59. The method of any one of Embodiment 56-Embodiment 57, wherein said endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73 or 78, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, residue 8 relative to SEQ ID NO: 77, or residue 10 relative to SEQ ID NO: 597 when optimally aligned.

Embodiment 60. The method of any one of Embodiment 56-Embodiment 57, wherein said endonuclease comprises an aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NO: 72, or residue 17 relative to SEQ ID NO: 75 when optimally aligned.

Embodiment 61. A method for modifying a double-stranded deoxyribonucleic acid polynucleotide, comprising contacting said double-stranded deoxyribonucleic acid polynucleotide with a complex comprising:

- a class 2, type II endonuclease,
- a base editor coupled to said endonuclease, and
- an engineered guide ribonucleic acid structure configured to bind to said endonuclease and said double-stranded deoxyribonucleic acid polynucleotide;
  
  wherein said double-stranded deoxyribonucleic acid polynucleotide comprises a protospacer adjacent motif (PAM); and
  
  wherein said PAM comprises a sequence selected from the group consisting of SEQ ID NOs:70-78 or 597.

Embodiment 62. The method of Embodiment 61, wherein said class 2, type II endonuclease is covalently coupled to said base editor or coupled to said base editor through a linker.

Embodiment 63. The method of Embodiment 61 or Embodiment 62, wherein said base editor comprises a sequence with at least 70%, at least 80%, at least 90% or at least 95% identity to a sequence selected from SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof.

Embodiment 64. The method of any one of Embodiment 61-Embodiment 63, wherein

- said base editor comprises an adenine deaminase;
- said double-stranded deoxyribonucleic acid polynucleotide comprises an adenine; and
- modifying said double-stranded deoxyribonucleic acid polynucleotide comprises converting said adenine to guanine.

Embodiment 65. The method of Embodiment 64, wherein said adenine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof.

Embodiment 66. The method of any one of Embodiment 61-Embodiment 63, wherein

- said base editor comprises a cytosine deaminase;
- said double-stranded deoxyribonucleic acid polynucleotide comprises a cytosine; and
- modifying said double-stranded deoxyribonucleic acid polynucleotide comprises converting said cytosine to uracil.

Embodiment 67. The method of Embodiment 66, wherein said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 594, 58-66, or 599-675, or a variant thereof.

Embodiment 68. The method of any one of Embodiment 61-Embodiment 67, wherein said complex further comprises a uracil DNA glycosylase inhibitor coupled to said endonuclease or said base editor.

Embodiment 69. The method of Embodiment 68, wherein said uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof.

Embodiment 70. The method of any one of Embodiment 61-Embodiment 69, wherein said double-stranded deoxyribonucleic acid polynucleotide comprises a first strand comprising a sequence complementary to a sequence of said engineered guide ribonucleic acid structure and a second strand comprising said PAM.

Embodiment 71. The method of Embodiment 70, wherein said PAM is directly adjacent to the 3′ end of said sequence complementary to said sequence of said engineered guide ribonucleic acid structure.

Embodiment 72. The method of any one of Embodiment 61-Embodiment 71, wherein said class 2, type II endonuclease is not a Cas9 endonuclease, a Cas14 endonuclease, a Cas12a endonuclease, a Cas12b endonuclease, a Cas12c endonuclease, a Cas12d endonuclease, a Cas12e endonuclease, a Cas13a endonuclease, a Cas13b endonuclease, a Cas13c endonuclease, or a Cas13d endonuclease.

Embodiment 73. The method of any one of Embodiment 61-Embodiment 72, wherein said class 2, type II endonuclease is derived from an uncultivated microorganism.

Embodiment 74. The method of any one of Embodiment 61-Embodiment 73, wherein said double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.

Embodiment 75. A method of modifying a target nucleic acid locus, said method comprising delivering to said target nucleic acid locus said engineered nucleic acid editing system of any one of Embodiment 1-Embodiment 44, wherein said endonuclease is configured to form a complex with said engineered guide ribonucleic acid structure, and wherein said complex is configured such that upon binding of said complex to said target nucleic acid locus, said complex modifies a nucleotide of said target nucleic acid locus.

Embodiment 76. The method of Embodiment 75, wherein said engineered nucleic acid editing system comprises an adenine deaminase, said nucleotide is an adenine, and modifying said target nucleic acid locus comprises converting said adenine to a guanine.

Embodiment 77. The method of Embodiment 75, wherein said engineered nucleic acid editing system comprises a cytidine deaminase and a uracil DNA glycosylase inhibitor, said nucleotide is a cytosine, and modifying said target nucleic acid locus comprises converting said cytosine to a uracil.

Embodiment 78. The method of any one of Embodiment 75-Embodiment 77, wherein said target nucleic acid locus comprises genomic DNA, viral DNA, or bacterial DNA.

Embodiment 79. The method of any one of Embodiment 75-Embodiment 78, wherein said target nucleic acid locus is in vitro.

Embodiment 80. The method of any one of Embodiment 75-Embodiment 78, wherein said target nucleic acid locus is within a cell.

Embodiment 81. The method of Embodiment 80, wherein said cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell.

Embodiment 82. The method of any one of Embodiment 80-Embodiment 81, wherein said cell is within an animal.

Embodiment 83. The method of Embodiment 82, wherein said cell is within a cochlea.

Embodiment 84. The method of any one of Embodiment 80-Embodiment 81, wherein said cell is within an embryo.

Embodiment 85. The method of Embodiment 84, wherein said embryo is a two-cell embryo.

Embodiment 86. The method of Embodiment 84, wherein said embryo is a mouse embryo.

Embodiment 87. The method of any one of Embodiment 75-Embodiment 86, wherein delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering the nucleic acid of any of Embodiment 46-Embodiment 49 or the vector of any of Embodiment 50-Embodiment 53.

Embodiment 88. The method of any one of Embodiment 75-Embodiment 87, wherein delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a nucleic acid comprising an open reading frame encoding said endonuclease.

Embodiment 89. The method of Embodiment 88, wherein said nucleic acid comprises a promoter to which said open reading frame encoding said endonuclease is operably linked.

Embodiment 90. The method of any one of Embodiment 75-Embodiment 89, wherein delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a capped mRNA comprising said open reading frame encoding said endonuclease.

Embodiment 91. The method of any one of Embodiment 75-Embodiment 86, wherein delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a polypeptide.

Embodiment 92. The method of any one of Embodiment 75-Embodiment 86, wherein delivering said engineered nucleic acid editing system to said target nucleic acid locus comprises delivering a deoxyribonucleic acid (DNA) encoding said engineered guide ribonucleic acid structure operably linked to a ribonucleic acid (RNA) pol III promoter.

Embodiment 93. An engineered nucleic acid editing polypeptide, comprising:

- an endonuclease comprising a RuvC domain and an HNH domain, wherein said endonuclease is derived from an uncultivated microorganism, wherein said endonuclease is a class 2, type II endonuclease, and wherein said RuvC domain lacks nuclease activity; and
- a base editor coupled to said endonuclease.

Embodiment 94. The engineered nucleic acid editing polypeptide of Embodiment 93, wherein said endonuclease comprises a sequence with at least 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof.

Embodiment 95. An engineered nucleic acid editing polypeptide, comprising:

- an endonuclease having at least 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof,
  - wherein said endonuclease comprises a RuvC domain lacking nuclease activity; and
- a base editor coupled to said endonuclease.

Embodiment 96. An engineered nucleic acid editing polypeptide, comprising:

- an endonuclease configured to bind to a protospacer adjacent motif (PAM) sequence comprising any one of Sequence Numbers: A360-A368 or A598,
  - wherein said endonuclease is a class 2, type II endonuclease, and
  - wherein said endonuclease comprises a RuvC domain lacking nuclease activity; and
- a base editor coupled to said endonuclease.

Embodiment 97. The engineered nucleic acid editing polypeptide of Embodiment 95 or Embodiment 96, wherein said endonuclease is derived from an uncultivated microorganism.

Embodiment 98. The engineered nucleic acid editing polypeptide of any one of Embodiment 93-Embodiment 97, wherein said endonuclease has less than 80% identity to a Cas9 endonuclease.

Embodiment 99. The engineered nucleic acid editing polypeptide of any one of Embodiment 95-Embodiment 98, wherein said endonuclease further comprises an HNH domain.

Embodiment 100. The engineered nucleic acid editing polypeptide of any one of Embodiment 95-Embodiment 99, wherein said tracr ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to about 60 to 90 consecutive nucleotides selected from any one of SEQ ID NOs: 88-96, 488, 489, and 679-680.

Embodiment 101. The engineered nucleic acid editing polypeptide of any one of Embodiment 93-Embodiment 100, wherein said base editor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-51, 57-66, 385-443, 444-475, 594-595, or 599-675, or a variant thereof.

Embodiment 102. The engineered nucleic acid editing polypeptide of any one of Embodiment 93-Embodiment 101, wherein said base editor is an adenine deaminase.

Embodiment 103. The engineered nucleic acid editing polypeptide of Embodiment 102, wherein said adenine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 50-51, 57, 385-443, 448-475, or 595, or a variant thereof.

Embodiment 104. The engineered nucleic acid editing polypeptide of any one of Embodiment 93-Embodiment 101, wherein said base editor is a cytosine deaminase.

Embodiment 105. The engineered nucleic acid editing polypeptide of Embodiment 104, wherein said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 1-49, 444-447, 594, or 58-66, or a variant thereof.

Embodiment 106. An engineered nucleic acid editing polypeptide, comprising:

- an endonuclease, wherein said endonuclease is configured to be deficient in endonuclease activity; and
- a base editor coupled to said endonuclease,
- wherein said base editor comprises a sequence with at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 1-51, 385-386, 387-443, 444-447, 448-475, 595, or 599-675, or a variant thereof.

Embodiment 107. The engineered nucleic acid editing polypeptide of Embodiment 106, wherein said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

Embodiment 108. The engineered nucleic acid editing polypeptide of Embodiment 106, wherein said endonuclease is configured to be catalytically dead.

Embodiment 109. The engineered nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 108, wherein said endonuclease is an endonuclease.

Embodiment 110. The engineered nucleic acid editing polypeptide of Embodiment 109, wherein said endonuclease is a class 2, type II endonuclease or a class 2, type V endonuclease.

Embodiment 111. The engineered nucleic acid editing polypeptide of Embodiment 106, wherein said endonuclease comprises a sequence having at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof.

Embodiment 112. The engineered nucleic acid editing polypeptide of any one of Embodiment 109-Embodiment 111, wherein said endonuclease comprises a nickase mutation.

Embodiment 113. The engineered nucleic acid editing polypeptide of Embodiment 112, wherein said endonuclease comprises the aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597.

Embodiment 114. The engineered nucleic acid editing polypeptide of any one of Embodiment 109-Embodiment 113, wherein said endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence selected from the group consisting of Sequence Numbers: A360-A368 or A598.

Embodiment 115. The engineered nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 114, wherein said base editor is an adenine deaminase.

Embodiment 116. The engineered nucleic acid editing polypeptide of Embodiment 115, wherein said adenine deaminase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 50-51, 385-443, 448-475, or 595, or a variant thereof.

Embodiment 117. The engineered nucleic acid editing polypeptide of Embodiment 116, wherein said adenine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 50-51, 385-390, or 595, or a variant thereof.

Embodiment 118. The engineered nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 114, wherein said base editor is a cytosine deaminase.

Embodiment 119. The engineered nucleic acid editing polypeptide of Embodiment 118, wherein said cytosine deaminase comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 1-49, 444-447, or a variant thereof.

Embodiment 120. The engineered nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 119, further comprising a uracil DNA glycosylase inhibitor (UGI) coupled to said endonuclease or said base editor.

Embodiment 121. The engineered nucleic acid editing polypeptide of Embodiment 120, wherein said uracil DNA glycosylase inhibitor comprises a sequence with at least 70%, 80%, 90% or 95% identity to any one of SEQ ID NOs: 52-56 or SEQ ID NO: 67, or a variant thereof.

Embodiment 122. The engineered nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 121, wherein a polypeptide comprising said endonuclease comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of said endonuclease.

Embodiment 123. The engineered nucleic acid editing polypeptide of Embodiment 122, wherein said NLS comprises a sequence with at least 90% identity to a sequence selected from SEQ ID NOs: 369-384, or a variant thereof.

Embodiment 124. The engineered nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 123, wherein said endonuclease is covalently coupled directly to said base editor or covalently coupled to said base editor through a linker.

Embodiment 125. A nucleic acid comprising an engineered nucleic acid sequence optimized for expression in an organism, wherein said nucleic acid encodes a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-51, 385-386, 387-443, 444-447, 448-475, or 595, or a variant thereof.

Embodiment 126. The nucleic acid of Embodiment 125, wherein said organism is prokaryotic, bacterial, eukaryotic, fungal, plant, mammalian, rodent, or human.

Embodiment 127. A vector comprising the nucleic acid of any one of Embodiment 125-Embodiment 126.

Embodiment 128. The vector of Embodiment 127, wherein the vector is a plasmid, a minicircle, a CELiD, an adeno-associated virus (AAV) derived virion, or a lentivirus.

Embodiment 129. A cell comprising the vector of any one of Embodiment 127-Embodiment 128.

Embodiment 130. A method of manufacturing a base editor, comprising cultivating said cell of Embodiment 129.

Embodiment 131. A system comprising:

- (a) the nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 124; and
- (b) an engineered guide ribonucleic acid structure configured to form a complex with said nucleic acid editing polypeptide comprising:
  - i. a guide ribonucleic acid sequence configured to hybridize to a target deoxyribonucleic acid sequence; and
  - ii. a ribonucleic acid sequence configured to bind to said endonuclease.

Embodiment 132. The system of Embodiment 131, wherein said engineered guide ribonucleic acid sequence comprises a sequence with at least 80% sequence identity to non-degenerate nucleotides of any one of SEQ ID NOs: 88-96, 488-489, or 679-680.
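One way to read "sequence identity to non-degenerate nucleotides" is sketched below, under several loudly stated assumptions: the reference guide sequence marks degenerate positions with "N" (or any symbol other than A, C, G, U), the candidate is already aligned to the reference at equal length, and identity is scored only at the reference's non-degenerate positions. The function name and the toy sequences are hypothetical and are not SEQ ID NOs: 88-96, 488-489, or 679-680.

```python
NON_DEGENERATE = set("ACGU")  # assumption: every other symbol is treated as degenerate


def identity_over_non_degenerate(reference: str, candidate: str) -> float:
    """Percent identity scored only where the reference has A, C, G, or U."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Keep only columns whose reference nucleotide is non-degenerate.
    scored = [
        (r, c)
        for r, c in zip(reference.upper(), candidate.upper())
        if r in NON_DEGENERATE
    ]
    if not scored:
        raise ValueError("reference has no non-degenerate positions")
    matches = sum(1 for r, c in scored if r == c)
    return 100.0 * matches / len(scored)


# Hypothetical toy guide: four degenerate spacer positions, then a fixed scaffold stub.
reference = "NNNNGUUUGAGA"
candidate = "ACGUGUUUGAGC"  # differs from the reference at the last scored position
print(identity_over_non_degenerate(reference, candidate))
```

Under this reading, the variable spacer portion (the "N" positions) does not count for or against the 80% threshold; only the fixed portion of the guide is compared.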

Embodiment 133. A method of modifying a target nucleic acid locus, said method comprising delivering to said target nucleic acid locus said engineered nucleic acid editing polypeptide of any one of Embodiment 106-Embodiment 124 or said system of any one of Embodiment 131-Embodiment 132, wherein said complex is configured such that upon binding of said complex to said target nucleic acid locus, said complex modifies a nucleotide of said target nucleic acid locus.

Embodiment 134. A nucleic acid editing polypeptide, comprising:

- an adenosine deaminase, comprising a polypeptide sequence comprising a substitution at at least one residue selected from the group consisting of residue 24, residue 83, residue 85, residue 107, residue 109, residue 112, residue 124, residue 143, residue 147, residue 148, residue 154, and residue 158 relative to SEQ ID NO: 386 when optimally aligned.

Embodiment 135. The nucleic acid editing polypeptide of Embodiment 134, wherein said residue substituted is selected from W24, V83, L85, A107, D109, T112, H124, A143, S147, D148, R154, and K158.

Embodiment 136. The nucleic acid editing polypeptide of Embodiment 134 or Embodiment 135, wherein said substitution is a conservative substitution.

Embodiment 137. The nucleic acid editing polypeptide of Embodiment 134 or Embodiment 135, wherein said substitution is a non-conservative substitution.

Embodiment 138. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 137, comprising a substitution at W24, wherein said substitution is W24R.

Embodiment 139. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 138, comprising a substitution at V83, wherein said substitution is V83S.

Embodiment 140. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 139, comprising a substitution at L85, wherein said substitution is L85F.

Embodiment 141. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 140, comprising a substitution at A107, wherein said substitution is A107V.

Embodiment 142. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 141, comprising a substitution at D109, wherein said substitution is D109N.

Embodiment 143. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 142, comprising a substitution at T112, wherein said substitution is T112R.

Embodiment 144. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 143, comprising a substitution at H124, wherein said substitution is H124Y.

Embodiment 145. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 144, comprising a substitution at A143, wherein said substitution is A143N.

Embodiment 146. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 145, comprising a substitution at S147, wherein said substitution is S147C.

Embodiment 147. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 146, comprising a substitution at D148, wherein said substitution is D148Y or D148R.

Embodiment 148. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 147, comprising a substitution at R154, wherein said substitution is R154P.

Embodiment 149. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 148, comprising a substitution at K158, wherein said substitution is K158N.
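The substitution notation used in Embodiments 138-149 (wild-type residue, 1-based position, replacement residue) can be illustrated with a short sketch. The table below transcribes the recited substitutions; everything else is an assumption made for illustration: the function names are hypothetical, and the reference sequence is a placeholder built from glycines with the recited wild-type residues placed at the recited positions, not SEQ ID NO: 386 itself.

```python
import re

# Substitutions recited in Embodiments 138-149 (positions relative to SEQ ID NO: 386).
SUBSTITUTIONS = {
    "W24": ["R"], "V83": ["S"], "L85": ["F"], "A107": ["V"],
    "D109": ["N"], "T112": ["R"], "H124": ["Y"], "A143": ["N"],
    "S147": ["C"], "D148": ["Y", "R"], "R154": ["P"], "K158": ["N"],
}


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'D148Y' into (wild type, 1-based position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_mutation(sequence: str, label: str) -> str:
    """Return a copy of sequence with the substitution applied, checking the wild type."""
    wild_type, position, replacement = parse_mutation(label)
    index = position - 1  # convert the 1-based residue number to a 0-based index
    if index >= len(sequence) or sequence[index] != wild_type:
        raise ValueError(f"{label}: residue {position} is not {wild_type}")
    return sequence[:index] + replacement + sequence[index + 1:]


def placeholder_reference(length: int = 160) -> str:
    """Hypothetical stand-in for the reference: glycines plus the recited wild-type residues."""
    residues = ["G"] * length
    for site in SUBSTITUTIONS:
        residues[int(site[1:]) - 1] = site[0]
    return "".join(residues)


reference = placeholder_reference()
variant = apply_mutation(apply_mutation(reference, "W24R"), "D148Y")
print(variant[23], variant[147])
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which matter here because the positions are defined relative to SEQ ID NO: 386 "when optimally aligned" rather than by raw index in an arbitrary homolog.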

Embodiment 150. The nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 149, wherein said adenosine deaminase comprises a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to any one of SEQ ID NOs: 50-51 or 385-443.

Embodiment 151. The engineered nucleic acid editing polypeptide of any one of Embodiment 134-Embodiment 150, further comprising an endonuclease, wherein said endonuclease is configured to be deficient in endonuclease activity.

Embodiment 152. The engineered nucleic acid editing polypeptide of Embodiment 151, wherein said endonuclease is configured to cleave one strand of a double-stranded target deoxyribonucleic acid.

Embodiment 153. The engineered nucleic acid editing polypeptide of Embodiment 151, wherein said endonuclease is configured to be catalytically dead.

Embodiment 154. The engineered nucleic acid editing polypeptide of any one of Embodiment 151-Embodiment 153, wherein said endonuclease is a Cas endonuclease.

Embodiment 155. The engineered nucleic acid editing polypeptide of Embodiment 154, wherein said endonuclease is a Class II, type II endonuclease or a Class II, type V endonuclease.

Embodiment 156. The engineered nucleic acid editing polypeptide of Embodiment 155, wherein said endonuclease comprises a sequence having at least 70%, 80%, 90% or 95% sequence identity to any one of SEQ ID NOs: 70-78 or 597, or a variant thereof.

Embodiment 157. The engineered nucleic acid editing polypeptide of any one of Embodiment 151-Embodiment 156, wherein said endonuclease comprises a nickase mutation.

Embodiment 158. The engineered nucleic acid editing polypeptide of Embodiment 157, wherein said endonuclease comprises the aspartate to alanine mutation at residue 9 relative to SEQ ID NO: 70, residue 13 relative to SEQ ID NOs: 71, 72, or 74, residue 12 relative to SEQ ID NO: 73, residue 17 relative to SEQ ID NO: 75, residue 23 relative to SEQ ID NO: 76, or residue 10 relative to SEQ ID NO: 597.
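The residue positions recited for the aspartate to alanine nickase mutation can be kept as a lookup table keyed by SEQ ID NO, as in the minimal sketch below. The positions are transcribed from the embodiment; the function name and the toy sequence are hypothetical, and the sketch assumes the input sequence uses the same residue numbering as the referenced SEQ ID NO.

```python
# 1-based aspartate positions recited for the D-to-A nickase mutation, keyed by SEQ ID NO.
NICKASE_ASP_POSITION = {
    70: 9,
    71: 13, 72: 13, 74: 13,
    73: 12,
    75: 17,
    76: 23,
    597: 10,
}


def introduce_nickase_mutation(sequence: str, seq_id_no: int) -> str:
    """Replace the recited aspartate (D) with alanine (A) for the given SEQ ID NO."""
    position = NICKASE_ASP_POSITION[seq_id_no]  # KeyError if the SEQ ID NO is not recited
    index = position - 1  # 1-based residue number to 0-based index
    if index >= len(sequence) or sequence[index] != "D":
        raise ValueError(f"residue {position} is not an aspartate")
    return sequence[:index] + "A" + sequence[index + 1:]


# Hypothetical placeholder: 30 residues with an aspartate at position 9 (not SEQ ID NO: 70).
toy = "G" * 8 + "D" + "G" * 21
mutant = introduce_nickase_mutation(toy, 70)
print(mutant[8])
```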

Embodiment 159. The engineered nucleic acid editing polypeptide of any one of Embodiment 151-Embodiment 156, wherein said endonuclease is configured to bind to a protospacer adjacent motif (PAM) sequence selected from the group consisting of Sequence Numbers: A360-A368 or A598.

Provisional Applications (5)

Application Number	Date	Country
63276461	Nov 2021	US
63289998	Dec 2021	US
63342824	May 2022	US
63356888	Jun 2022	US
63378171	Oct 2022	US

Continuations (1)

Relation	Application Number	Date	Country
Parent	PCT/US2022/079345	Nov 2022	WO
Child	18653454		US
