SITE-SPECIFIC GENOME MODIFICATION TECHNOLOGY

SEQUENCE LISTING

The text of the computer readable sequence listing filed herewith, titled “39212-601_SEQUENCE_LISTING_ST25”, created Feb. 14, 2022, having a file size of 144,908 bytes, is hereby incorporated by reference in its entirety.

FIELD

The present disclosure provides compositions, methods, and systems related to template-mediated genome modification. In particular, the present disclosure provides novel genome modification technology involving site-specific chemical modification of a nucleotide to introduce a replication-blocking lesion. The compositions, methods, and systems described herein facilitate efficient site-specific genome modification of a DNA target, while minimizing the unintended edits and cellular toxicity associated with current genome editing approaches.

BACKGROUND

CRISPR-based genome editing tools have found widespread application, relying on their easily programmable targeting and robust activity. Early use of these CRISPR-based tools has focused on the ability of Cas nucleases to cleave DNA. In the process of repairing the cleaved DNA, a genomic edit is introduced through homologous recombination with a supplied DNA repair template. DNA cleavage is, however, among the most toxic cellular events; DNA cleavage sets off cellular alarm systems which lead to mutations, DNA re-arrangements, or loss of cellular viability. Subsequent CRISPR-Cas genome editing tools have sought alternative approaches through targeted modification of individual bases or integration of a short template encoded within the guide RNA. Still, these methods are restricted in the range of edits that can be generated and can produce undesired edits. Therefore, there is a need for efficient genome editing and modification platforms that overcome the limitations of current systems.

SUMMARY

Embodiments of the present disclosure include a composition for targeted genome modification. In accordance with these embodiments, the composition includes a gap editor complex comprising a DNA-recognition domain and a DNA-modifying domain, wherein the DNA-recognition domain binds a DNA target sequence in the genome, and wherein the DNA-modifying domain induces formation of a replication blocking moiety on at least one nucleotide in the genome.

In some embodiments, the composition further comprises a donor nucleic acid template. In some embodiments, the donor nucleic acid template comprises a polynucleotide from an endogenous homologous sequence corresponding to the DNA target sequence. In some embodiments, the donor nucleic acid template comprises an exogenous single-stranded DNA (ssDNA) molecule or double-stranded DNA (dsDNA) molecule. In some embodiments, the donor nucleic acid template is an RNA molecule. In some embodiments, the presence of the donor nucleic acid template facilitates homology-directed gap repair and/or recombination, wherein the donor nucleic acid template or a fragment thereof is recombined into the genome of the DNA target sequence.

In some embodiments, the DNA-recognition domain comprises at least one Cas protein or fragment thereof lacking deoxyribonuclease activity. In some embodiments, the DNA-recognition domain comprises a complex of Cas proteins lacking deoxyribonuclease activity. In some embodiments, the DNA-recognition domain comprises a Cas protein or fragment thereof having nickase activity. In some embodiments, the Cas protein or Cas protein complex comprises a Type I Cascade, a Type II Cas9, a Type IV effector module, a Type V Cas12, a Cas9-related IscB, a Cas9-related TnpB, and combinations thereof.

In some embodiments, the DNA-recognition domain and the DNA-modifying domain are functionally coupled. In some embodiments, functionally coupled comprises polypeptide fusions, peptide tags, peptide linkers, RNA tags, and any combinations thereof.

In some embodiments, the DNA-modifying domain blocks DNA replication by adding the replication blocking moiety to: (i) at least one nucleotide in the DNA strand complementary to the DNA target sequence; (ii) at least one nucleotide in the DNA strand containing the DNA target sequence; or (iii) both at least one nucleotide in the DNA strand complementary to the DNA target sequence and at least one nucleotide in the DNA strand containing the DNA target sequence.

In some embodiments, the DNA-recognition domain induces a single-stranded break in the DNA target strand, and the DNA-modifying domain adds the replication blocking moiety to at least one nucleotide in the DNA strand complementary to the DNA target sequence.

In some embodiments, the DNA-modifying domain has been engineered to have reduced DNA binding, increased specificity to single-stranded DNA, and/or decreased enzymatic activity.

In some embodiments, the DNA-modifying domain catalyzes addition of ADP ribose to a thymine or guanine nucleotide. In some embodiments, the DNA-modifying domain comprises a DarT enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the DNA-modifying domain comprises a catalytic domain having at least 70% amino acid sequence identity with any of SEQ ID NOs: 18-21. In some embodiments, the DarT enzyme comprises one or more of the following amino acid substitutions: G49D, K56A, M86L, R92A, and/or R193A.

In some embodiments, the DNA-modifying domain comprises a Scabin enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the DNA-modifying domain comprises a catalytic domain having at least 70% amino acid sequence identity with any of SEQ ID NOs: 22-24. In some embodiments, the Scabin enzyme comprises an amino acid substitution that is K130A.

In some embodiments, the DNA-modifying domain catalyzes methylcarbamoylation of an adenine nucleotide. In some embodiments, the DNA-modifying domain comprises a Mom enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the DNA-modifying domain comprises a catalytic domain having at least 70% amino acid sequence identity with any of SEQ ID NOs: 25-27. In some embodiments, the Mom enzyme comprises an amino acid substitution that is D149A.

In some embodiments, the DNA-modifying domain catalyzes addition of a replication blocking moiety selected from the group consisting of: glucose, threonyl carbamoyl adenosine, acetate, glyceryl, L-ascorbic acid, uridine, adenosine mono-phosphate, a lipid, an amino acid, agmatine, L-threonylcarbamoyladenylate, L-threonylcarbamoyl, methylthiolate, sulfur, a methyl group, S-adenosyl-L-methionine or a subgroup of S-adenosyl-L-methionine, and dimethylallyl diphosphate or a subgroup thereof.

In some embodiments, the DNA-modifying enzyme domain comprises an enzyme or functional fragment, derivative, or variant thereof, selected from the group consisting of: Pierisin, Scabin, Cell cycle and apoptosis regulator 1 (CARP-1), SCO5461 protein (ScARP), adenine modification enzyme, acetyltransferase, amino acid transferase, nucleotidyl transferase, uridyltransferase, acyltransferase, ADP-ribosyltransferase, methylthiotransferase, N-acetyl transferase 10, tRNA(Met) cytidine acetyltransferase (TmcA), tRNA cytidine acetyltransferase, GCN5-related N-acetyltransferase, lysidine synthase, m⁷G methyltransferase, N6 carbamoylmethyltransferase (Mom), N6-adenosine threonylcarbamoyltransferase, threonyl carbamoyl transferase or threonyl carbamoyl transferase complex, TsaB-TsaE-TsaD (TsaBDE) complex, tRNA N6-adenosine threonylcarbamoyltransferase (Qri7, Tcs4), methyltransferase, ATrm5a, tRNA:m¹G/imG2 methyltransferase, tRNA (adenosine(37)-N6)-dimethylallyltransferase, tRNA dimethylallyltransferase (MiaA), and isopentenyltransferase.

In some embodiments, the composition comprises at least one guide RNA molecule. In some embodiments, the at least one guide RNA comprises gRNA, sgRNA, crRNA, or any combinations thereof. In some embodiments, the at least one guide RNA comprises a handle sequence and a targeting sequence. In some embodiments, the at least one guide RNA is complementary to the DNA target sequence.

In some embodiments, the composition further comprises at least one gap editor accessory factor. In some embodiments, the at least one gap editor accessory factor comprises a protein that augments at least one step in a genome modification process. In some embodiments, the at least one gap editor accessory factor is recruited to the gap editor complex via interaction with the DNA-modifying domain, the DNA-recognition domain, and/or the at least one guide RNA. In some embodiments, the recruitment of the at least one gap editor accessory factor to the gap editor complex comprises a peptide tag, a peptide linker, an RNA tag, and any combinations thereof. In some embodiments, the at least one gap editor accessory factor comprises Rap, DarG, Orf, ExoI, Exonuclease III, PrimPol, RecJ, RecQ1, Rad51, Rad52, CtIP, Rad18, and any combinations thereof.

Embodiments of the present disclosure also include a kit for targeted genome modification. In accordance with these embodiments, the kit includes a gap editor complex comprising a DNA-recognition domain and a DNA-modifying domain, wherein the DNA-recognition domain binds a DNA target sequence in the genome, and wherein the DNA-modifying domain induces formation of a replication blocking moiety on at least one nucleotide in the genome.

In some embodiments, the kit further comprises a donor nucleic acid template. In some embodiments, the presence of the donor nucleic acid template facilitates homology-directed gap repair and/or recombination.

In some embodiments, the kit further comprises a guide RNA molecule.

In some embodiments of the kit, the DNA-recognition domain comprises at least one Cas protein or fragment thereof lacking deoxyribonuclease activity. In some embodiments, the DNA-recognition domain comprises at least one Cas protein or fragment thereof having nickase activity. In some embodiments, the Cas protein or Cas protein complex comprises a Type I Cascade, a Type II Cas9, a Type IV effector module, a Type V Cas12, a Cas9-related IscB, a Cas9-related TnpB, and combinations thereof.

In some embodiments of the kit, the DNA-recognition domain and the DNA-modifying domain are functionally coupled. In some embodiments, the DNA-recognition domain induces a single-stranded break in the DNA target strand, and the DNA-modifying domain adds the replication blocking moiety to at least one nucleotide in the DNA strand complementary to the DNA target sequence.

In some embodiments of the kit, the DNA-modifying domain catalyzes addition of ADP ribose to a thymine or guanine nucleotide. In some embodiments, the DNA-modifying domain comprises a DarT enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the DNA-modifying domain comprises a Scabin enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the DarT enzyme has been engineered to have reduced DNA binding, increased specificity to single-stranded DNA, and/or decreased enzymatic activity.

In some embodiments of the kit, the DNA-modifying domain catalyzes methylcarbamoylation of an adenine nucleotide. In some embodiments, the DNA-modifying domain comprises a Mom enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the Mom enzyme has been engineered to have reduced DNA binding, increased specificity to single-stranded DNA, and/or decreased enzymatic activity.

In some embodiments of the kit, the DNA-modifying domain catalyzes addition of a replication blocking moiety selected from the group consisting of: glucose, threonyl carbamoyl adenosine, acetate, glyceryl, L-ascorbic acid, uridine, adenosine mono-phosphate, a lipid, an amino acid, agmatine, L-threonylcarbamoyladenylate, L-threonylcarbamoyl, methylthiolate, sulfur, a methyl group, S-adenosyl-L-methionine or a subgroup of S-adenosyl-L-methionine, and dimethylallyl diphosphate or a subgroup thereof.

In some embodiments of the kit, the DNA-modifying enzyme domain comprises an enzyme or functional fragment, derivative, or variant thereof, selected from the group consisting of: Pierisin, Scabin, Cell cycle and apoptosis regulator 1 (CARP-1), SCO5461 protein (ScARP), adenine modification enzyme, acetyltransferase, amino acid transferase, nucleotidyl transferase, uridyltransferase, acyltransferase, ADP-ribosyltransferase, methylthiotransferase, N-acetyl transferase 10, tRNA(Met) cytidine acetyltransferase (TmcA), tRNA cytidine acetyltransferase, GCN5-related N-acetyltransferase, lysidine synthase, m⁷G methyltransferase, N6 carbamoylmethyltransferase (Mom), N6-adenosine threonylcarbamoyltransferase, threonyl carbamoyl transferase or threonyl carbamoyl transferase complex, TsaB-TsaE-TsaD (TsaBDE) complex, tRNA N6-adenosine threonylcarbamoyltransferase (Qri7, Tcs4), methyltransferase, ATrm5a, tRNA:m¹G/imG2 methyltransferase, tRNA (adenosine(37)-N6)-dimethylallyltransferase, tRNA dimethylallyltransferase (MiaA), and isopentenyltransferase.

In some embodiments of the kit, the at least one guide RNA comprises gRNA, sgRNA, crRNA, or any combinations thereof. In some embodiments, the at least one guide RNA comprises a handle sequence and a targeting sequence. In some embodiments, the targeting sequence in the at least one guide RNA is complementary to the DNA target sequence.

In some embodiments, the kit further comprises at least one gap editor accessory factor.

Embodiments of the present disclosure also include a method for targeted genome modification. In accordance with these embodiments, the method includes introducing any of the compositions of the present disclosure into a cell, and assessing the cell for presence of a desired genome alteration.

In some embodiments, a gap editor complex and/or at least one guide RNA molecule are introduced into the cell as a polypeptide(s), mRNA(s), and/or DNA expression construct(s). In some embodiments, the gap editor complex and/or the guide RNA are introduced into the cell as part of a gene drive system.

In some embodiments, the cell is a prokaryotic cell or a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a plant cell.

In some embodiments, the method leads to a reduced degree of indel formation, chromosomal rearrangements, and/or DNA duplications.

In some embodiments, cell viability is enhanced and/or cell toxicity is reduced.

Other aspects and embodiments of the disclosure will be apparent in light of the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1B: FIG. 1A provides a representative illustration of the general mechanism of gap editing. A bulky chemical group appended to one strand of DNA by a gap editor blocks DNA replication, resulting in a single-stranded DNA gap. That gap is then repaired through homologous recombination that can integrate a homologous repair template. The opposite strand can also be nicked or chemically modified to block recombination with the sister chromatid and enhance editing. FIG. 1B includes representative results of experiments demonstrating efficient lacZ gene repair with significantly reduced cytotoxic effects using gap editor complexes comprising a DNA-modifying enzyme (DarT) engineered to have reduced DNA binding.

FIG. 2 includes representative results of experiments demonstrating efficient lacZ gene repair with significantly reduced cytotoxic effects using gap editor complexes comprising a DNA-recognition domain (DarT_G49D_K56A-ScnCas9 or GE2n) engineered to have nickase activity.

FIG. 3 includes representative results of experiments demonstrating the attenuation of lacZ gene repair by gap editor complexes when a gap editor accessory factor is used (DarG) to counteract the function of the DNA-modifying domain (DarT) of the gap editor complex.

FIG. 4 includes representative results of experiments demonstrating successful genome modification through increased frequency of kanamycin gene repair using gap editor complexes comprising a DNA-modifying domain (Scabin) in combination with a Cas9 DNA-recognition domain (Scabin-K130A-ScdCas9).

FIG. 5 includes representative results of experiments demonstrating successful genome modification through increased frequency of kanamycin gene repair using gap editor complexes comprising a DNA-modifying domain (Mom) in combination with a Cas9 DNA-recognition domain (Mom-D149A-ScdCas9).

FIG. 6 includes representative results of experiments demonstrating that successful genome modification (e.g., through increased frequency of kanamycin gene repair) using gap editor complexes relies on a DNA-modifying domain (DarT) in combination with a Cas9 DNA-recognition domain (DarT-G49D-ScdCas9) and active RNA-directed targeting. (ScdCas9 alone did not lead to kanamycin gene repair.)

FIG. 7 includes representative results of experiments using a gap editor complex with a DarT DNA-modifying domain comprising a specific mutation (R193A) that significantly reduces toxicity (DarT-G49D-R193A-ScdCas9).

FIG. 8 includes representative results of experiments using a gap editor complex with a DarT DNA-modifying domain comprising mutations (G49D, R193A, M86L, and R92A) that significantly reduce background editing while maintaining on-target editing, as demonstrated through reduced and maintained frequency of kanamycin gene repair, respectively.

FIG. 9 includes representative results of experiments demonstrating successful genome modification through increased frequency of kanamycin gene repair using gap editor complexes comprising a DNA-modifying domain (DarT) with mutations (G49D and/or R193A) that significantly reduce toxicity in combination with a Cas9 DNA-recognition domain having nickase activity (ScdCas9). Adding the R193A mutation to the G49D mutation further reduced toxicity without compromising modification. Site-specific genome modification was nearly 100% effective.

FIG. 10 includes representative results of experiments demonstrating that gene knockout of fcy1 confers resistance to 5-Fluorocytosine (5-FC). Targeting the fcy1 gene in Saccharomyces cerevisiae with a Cas9 nickase (ScnCas9) or the fusion of an engineered DarT gene to a Cas9 nickase and providing a repair template resulted in genome modification at fcy1. For all mutations, the fusion of DarT provides a >10-fold increase in the rate of genome editing, demonstrating the utility of the introduction of replication blocking moieties in a eukaryotic cell.

FIG. 11 includes representative results of experiments demonstrating that gene knockout of fcy1 confers resistance to 5-Fluorocytosine (5-FC). Targeting the fcy1 gene in Saccharomyces cerevisiae with a Cas9 nickase (ScnCas9) or the fusion of an engineered DarT gene to a Cas9 nickase and providing a repair template resulted in genome modification at fcy1. The repair template encodes 6 mutations introducing two or three stop codons in fcy1, which results in a loss of fcy1 function after genome modification, and resistance to 5-FC. The use of an engineered DarT variant including the G49D, R193A, M86L and R92A mutations improves cell viability up to approximately 50-fold over DarT with the G49D and R193A mutations alone. This gap editor complex effectuates efficient and low toxicity genome modification using two separate single guide RNAs and repair templates targeting fcy1 in yeast.

FIG. 12 includes representative chromatograms providing confirmation of fcy1 genome modification and gene knockout by Sanger sequencing. Two or three stop codons were introduced by targeting a gap editor complex to the fcy1 gene and providing a DNA repair template. The edited nucleotides are highlighted in red. Genomic edits for two separate targets within fcy1 are shown.

FIG. 13 includes representative results of experiments demonstrating that gene knockout of lacZ results in a white colony color in the presence of the lactose analog IPTG and the colorimetric indicator X-gal. Targeting the lacZ gene in E. coli with a nuclease-inactive Cas12a protein (dLbCas12a) fused to an engineered DarT gene and providing a repair template resulted in genome modification at lacZ. No genome modification was observed without targeting of the gap editor complex to the lacZ gene.

FIG. 14 includes representative chromatograms demonstrating successful introduction of one or more stop codons into the lacZ gene, eliminating beta-galactosidase expression and thereby resulting in a white colored colony when plated in the presence of the inducer IPTG and the colorimetric indicator X-gal using DarT(G49D/R193A)-dLbCas12a associated with different crRNAs.

FIG. 15 includes representative results of experiments demonstrating that introduction of the D516G mutation into the rpoB gene confers resistance to the antibiotic rifampicin, and thus serves as a readout of genome modification. Targeting the rpoB gene in E. coli with an engineered DarT variant fused to a Cas9 nickase (ScnCas9) and co-expression of an RNA repair template and a reverse transcriptase resulted in site-specific RNA templated genome modification.

FIG. 16 includes representative results of experiments demonstrating that introduction of the D516G mutation into the rpoB gene confers resistance to the antibiotic rifampicin, and thus serves as a readout of genome modification. Targeting the rpoB gene in E. coli with an engineered DarT variant fused to a Cas9 nickase (ScnCas9) and providing a linear single-stranded DNA repair template resulted in genome modification at rpoB. Targeting of the gap editor complex to rpoB results in a 100 to 6,000-fold increase in genome modification rates, demonstrating the effect of the gap editors.

FIG. 17 includes representative chromatograms of the RNA-templated mutations in the rpoB gene introduced by the targeting of a gap editor complex to the rpoB gene, expression of the RNA repair template, and expression of the reverse transcriptase Ec86. Mutations include the AC>GT mutation required for D516G mediated rifampicin resistance.

FIG. 18 includes an image of a consensus sequence for a DarT catalytic domain (SEQ ID NO: 18) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 19 includes an image of a consensus sequence for a DarT catalytic domain (SEQ ID NO: 19) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 20 includes an image of a consensus sequence for a DarT catalytic domain (SEQ ID NO: 20) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 21 includes an image of a consensus sequence for a DarT catalytic domain (SEQ ID NO: 21) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 22 includes an image of a consensus sequence for a Scabin catalytic domain (SEQ ID NO: 22) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 23 includes an image of a consensus sequence for a Scabin catalytic domain (SEQ ID NO: 23) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 24 includes an image of a consensus sequence for a Scabin catalytic domain (SEQ ID NO: 24) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 25 includes an image of a consensus sequence for a Mom catalytic domain (SEQ ID NO: 25) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 26 includes an image of a consensus sequence for a Mom catalytic domain (SEQ ID NO: 26) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

FIG. 27 includes an image of a consensus sequence for a Mom catalytic domain (SEQ ID NO: 27) of the DNA-modifying domains of the gap editor complexes of the present disclosure.

DETAILED DESCRIPTION

Nucleotide modifications can take the form of functional modifications, such as DNA methylation at certain positions, or damaging modification (DNA lesions), such as cross-linking, oxidation, and nitrosylation. These DNA lesions need to be repaired to maintain information fidelity and DNA functionality. Commonly occurring lesions are directly repaired through base excision, mismatch, and nucleotide excision repair processes. However, if these lesions are not repaired before DNA replication, then they can become locked into the genome as mutated DNA or stifle cellular division altogether. To avoid this, replication-dependent repair processes have evolved. One such process, translesion synthesis, can directly bypass some DNA lesions; however, this can introduce DNA mutations across some DNA lesions. Alternatively, replicating the DNA near the lesion can be skipped altogether by re-priming synthesis downstream of the lesion. This re-priming can occur via a lagging strand primase, or in higher eukaryotes by the leading strand primase-polymerase, PRIMPOL. This re-priming action enables replication to continue but leaves an unreplicated region complementary to the DNA lesion and surrounding DNA. The cell still needs to determine the appropriate sequence complementary to the DNA lesion, and to do this, cells employ a mechanism called homology-dependent gap repair (a subset of homologous recombination).

Homology-dependent gap repair (HDGR) is a highly accurate repair process in which a sister chromatid is used as a template to copy DNA complementary to the lesion-containing strand. Because HDGR is a subset of homologous recombination, experiments were conducted, as described further herein, to investigate whether this pathway could be co-opted to use an ectopic repair template instead of (or in addition to) the sister chromatid, generating synthetic genomic edits. Previous results demonstrated that site-specific introduction of abasic DNA could trigger HDGR and be completed using a plasmid-borne DNA template for repair, generating accurately edited genomic DNA. However, in some cases, this approach can be somewhat dependent on the stability of the abasic site. For example, an abasic site can be stabilized through inhibition of a cell's AP endonuclease activity, but AP endonuclease inhibition can negatively affect cell viability and genomic stability and may not be feasible for some applications. Therefore, as described further herein, an alternative class of DNA lesions was identified that are not as susceptible to base excision or similar repair processes. Embodiments of the present disclosure include a class of lesions involving the addition of chemical groups to DNA that block DNA replication (replication blocking moiety) and facilitate HDGR.

For example, experiments were conducted to investigate whether the addition of adenosine-diphosphate ribose (ADPr) might be a promising DNA lesion candidate and act as a replication blocking moiety. ADPr transferases, which catalyze ADPr addition to nucleotides, are cytotoxic. Therefore, methods were developed to limit ADPr activity to the R-loop exposed after CRISPR-Cas binding to the genome, in an effort to trigger HDGR without loss of cell viability. Extracted dsDNA binding ADPr-transferases were shown to be lethal when electroporated into eukaryotic cells. Separately, dsDNA binding DNA modifying enzymes have been fused to DNA binding proteins to localize their activity, but they retain high rates of off-target modification, which necessitates additional mitigating steps to control activity. Single-stranded DNA binding enzymes can have their activity localized to the DNA R-loop exposed after target binding by a Cas effector to the DNA.

Previous work has described a class of single-stranded binding ADPr-transferase enzymes, including DarT and the DarT mutant DarT_G49D, which acts as a bacterial toxin. DarT expression is lethal in E. coli, and the resulting DNA modification seems to be repaired primarily through recombination and, more weakly, through nucleotide excision repair. Therefore, experiments were conducted to investigate whether DarT could be used to trigger site-specific HDGR templated not by the genome, but by a recombinant DNA sequence. Experiments sought to understand whether DarT could be sufficiently controlled to localize ADPr modification to the Cas target site, avoiding cytotoxicity and allowing for efficient genome modification.

Section headings as used in this section and the entire disclosure herein are merely for organizational purposes and are not intended to be limiting.

1. DEFINITIONS

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. In case of conflict, the present document, including definitions, will control. Preferred methods and materials are described below, although methods and materials similar or equivalent to those described herein can be used in practice or testing of the present disclosure. The phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment, though it may. Furthermore, the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments of the invention may be readily combined, without departing from the scope or spirit of the invention. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.

For the recitation of numeric ranges herein, each intervening number there between with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
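As a purely illustrative sketch of the range convention above (not part of the disclosure; the helper name `contemplated_values` is hypothetical), the numbers covered by a recited range at its stated precision could be enumerated as follows. Endpoints are passed as strings on the assumption that the written form, such as "6.0", carries the intended degree of precision.

```python
from decimal import Decimal


def contemplated_values(low: str, high: str) -> list[str]:
    """List every number from low to high at the precision of the endpoints.

    Hypothetical helper: endpoints are strings so that "6.0" keeps its one
    decimal place, which parsing as a float would discard.
    """
    lo, hi = Decimal(low), Decimal(high)
    # The step is one unit in the last stated decimal place. A more negative
    # exponent means more decimal places; the cap at 0 keeps whole-number
    # ranges stepping by 1.
    exponent = min(lo.as_tuple().exponent, hi.as_tuple().exponent, 0)
    step = Decimal(1).scaleb(exponent)
    values = []
    current = lo.quantize(step)
    while current <= hi:
        values.append(str(current))
        current += step
    return values
```

Under these assumptions, `contemplated_values("6", "9")` yields the four whole numbers 6 through 9, and `contemplated_values("6.0", "7.0")` yields the eleven values 6.0 through 7.0 in steps of 0.1.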

“Correlated to” as used herein refers to compared to.

As used herein, the term “nucleic acid molecule” refers to any nucleic acid containing molecule, including but not limited to, DNA or RNA. The term encompasses sequences that include any of the known base analogs of DNA and RNA including, but not limited to, 4-acetylcytosine, 8-hydroxy-N6-methyladenosine, aziridinylcytosine, pseudoisocytosine, 5-(carboxyhydroxylmethyl) uracil, 5-fluorouracil, 5-bromouracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethylaminomethyluracil, dihydrouracil, inosine, N6-isopentenyladenine, 1-methyladenine, 1-methylpseudouracil, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarbonylmethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, oxybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, N-uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid, pseudouracil, queosine, 2-thiocytosine, and 2,6-diaminopurine.

The term “gene” refers to a nucleic acid (e.g., DNA) sequence that comprises coding sequences for the production of a polypeptide, precursor, or RNA (e.g., rRNA, tRNA, sRNA, microRNA, lincRNA). The polypeptide can be encoded by a full-length coding sequence or by any portion of the coding sequence so long as the desired activity or functional properties (e.g., enzymatic activity, ligand binding, signal transduction, immunogenicity, etc.) of the full-length or fragment are retained. The term also encompasses the coding region of a structural gene and the sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of about 1 kb or more on either end such that the gene corresponds to the length of the full-length mRNA. Sequences located 5′ of the coding region and present on the mRNA are referred to as 5′ non-translated sequences. Sequences located 3′ or downstream of the coding region and present on the mRNA are referred to as 3′ non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene that are transcribed into nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

As used herein, the term “heterologous gene” refers to a gene that is not in its natural environment. For example, a heterologous gene includes a gene from one species introduced into another species. A heterologous gene also includes a gene native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to non-native regulatory sequences, etc.). Heterologous genes are distinguished from endogenous genes in that the heterologous gene sequences are typically joined to DNA sequences that are not found naturally associated with the gene sequences in the chromosome or are associated with portions of the chromosome not found in nature (e.g., genes expressed in loci where the gene is not normally expressed).

As used herein, the term “oligonucleotide” refers to a short length of single-stranded polynucleotide chain. Oligonucleotides are typically less than about 300 residues long (e.g., between 15 and 100); however, as used herein, the term is also intended to encompass longer polynucleotide chains. Oligonucleotides are often referred to by their length. For example, a 24-residue oligonucleotide is referred to as a “24-mer.” Oligonucleotides can form secondary and tertiary structures by self-hybridizing or by hybridizing to other polynucleotides. Such structures can include, but are not limited to, duplexes, hairpins, cruciforms, bends, and triplexes.

The terms “homology” and “homologous” refer to a degree of identity. There may be partial homology or complete homology. A partially homologous sequence is one that is less than 100% identical to another sequence.

As used herein, the terms “complementary” or “complementarity” are used in reference to polynucleotides (e.g., a sequence of nucleotides such as an oligonucleotide or a target nucleic acid) related by the base-pairing rules. For example, the sequence “5′-A-G-T-3′” is complementary to the sequence “3′-T-C-A-5′.” Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base pairing rules. Or, there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods that depend upon binding between nucleic acids. Either term may also be used in reference to individual nucleotides, especially within the context of polynucleotides. For example, a particular nucleotide within an oligonucleotide may be noted for its complementarity, or lack thereof, to a nucleotide within another nucleic acid strand, in contrast or comparison to the complementarity between the rest of the oligonucleotide and the nucleic acid strand.

In some contexts, the term “complementarity” and related terms (e.g., “complementary”, “complement”) refer to the nucleotides of a nucleic acid sequence that can bind to another nucleic acid sequence through hydrogen bonds, e.g., nucleotides that are capable of base pairing, e.g., by Watson-Crick base pairing or other base pairing. Nucleotides that can form base pairs, e.g., that are complementary to one another, are the pairs: cytosine and guanine, thymine and adenine, adenine and uracil, and guanine and uracil. The percentage complementarity need not be calculated over the entire length of a nucleic acid sequence. The percentage of complementarity may be limited to a specific region over which the nucleic acid sequences are base-paired, e.g., starting from a first base-paired nucleotide and ending at a last base-paired nucleotide. The complement of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5′ end of one sequence is paired with the 3′ end of the other, is in “antiparallel association.” Certain bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength, and incidence of mismatched base pairs.

Thus, in some embodiments, “complementary” refers to a first nucleobase sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% identical to the complement of a second nucleobase sequence over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more nucleobases, or that the two sequences hybridize under stringent hybridization conditions. “Fully complementary” means each nucleobase of a first nucleic acid is capable of pairing with each nucleobase at a corresponding position in a second nucleic acid. For example, in certain embodiments, an oligonucleotide wherein each nucleobase has complementarity to a nucleic acid has a nucleobase sequence that is identical to the complement of the nucleic acid over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more nucleobases.
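By way of non-limiting illustration, the percentage-based definition above can be sketched computationally. The following Python sketch is an assumption-laden example rather than a prescribed method: the function names are hypothetical, only the four canonical DNA bases are handled, the two sequences are compared over a region of equal length in antiparallel orientation, and the default threshold (60%) and minimum region (8 nucleobases) are simply the smallest values recited above.

```python
# Illustrative sketch only: function names and defaults are hypothetical.
# Handles the canonical DNA bases A, C, G, T and nothing else.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complement of `seq`, read 5' to 3' (antiparallel)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(first, second):
    """Percent identity of `first` to the complement of `second`.

    Both sequences are written 5' to 3' and must span the same region length.
    """
    if len(first) != len(second):
        raise ValueError("sequences must cover a region of equal length")
    target = reverse_complement(second)
    matches = sum(a == b for a, b in zip(first.upper(), target))
    return 100.0 * matches / len(first)


def is_complementary(first, second, threshold=60.0, min_region=8):
    """Apply a percentage threshold over a minimum region length."""
    return (len(first) >= min_region
            and percent_complementarity(first, second) >= threshold)
```

For instance, under these assumptions the sequence 5′-AGT-3′ scores 100% against 5′-ACT-3′ (i.e., 3′-TCA-5′), consistent with the base-pairing example given earlier, although a three-nucleobase region would fall below the illustrative eight-nucleobase minimum.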

As used herein, a “double-stranded nucleic acid” may be a portion of a nucleic acid, a region of a longer nucleic acid, or an entire nucleic acid. A “double-stranded nucleic acid” may be, e.g., without limitation, a double-stranded DNA, a double-stranded RNA, a double-stranded DNA/RNA hybrid, etc. A single-stranded nucleic acid having secondary structure (e.g., base-paired secondary structure) and/or higher order structure comprises a “double-stranded nucleic acid”. For example, triplex structures are considered to be “double-stranded”. In some embodiments, any base-paired nucleic acid is a “double-stranded nucleic acid”.

The term “isolated” when used in relation to a nucleic acid, as in “an isolated oligonucleotide” or “isolated polynucleotide” refers to a nucleic acid sequence that is identified and separated from at least one component or contaminant with which it is ordinarily associated in its natural source. As such, isolated nucleic acid is present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids, such as DNA and RNA, found in the state in which they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs that encode a multitude of proteins. However, isolated nucleic acid encoding a given protein includes, by way of example, such nucleic acid in cells ordinarily expressing the given protein where the nucleic acid is in a chromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid, oligonucleotide, or polynucleotide may be present in single-stranded or double-stranded form. When an isolated nucleic acid, oligonucleotide or polynucleotide is to be utilized to express a protein, the oligonucleotide or polynucleotide will contain at a minimum the sense or coding strand (i.e., the oligonucleotide or polynucleotide may be single-stranded), but may contain both the sense and anti-sense strands (i.e., the oligonucleotide or polynucleotide may be double-stranded).

As used herein, the term “purified” or “to purify” refers to the removal of components (e.g., contaminants) from a sample. For example, antibodies are purified by removal of contaminating non-immunoglobulin proteins; they are also purified by the removal of immunoglobulin that does not bind to the target molecule. The removal of non-immunoglobulin proteins and/or the removal of immunoglobulins that do not bind to the target molecule results in an increase in the percent of target-reactive immunoglobulins in the sample. In another example, recombinant polypeptides are expressed in bacterial host cells and the polypeptides are purified by the removal of host cell proteins; the percent of recombinant polypeptides is thereby increased in the sample.

Preferred methods and materials are described below, although methods and materials similar or equivalent to those described herein can be used in practice or testing of the present disclosure. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety. The materials, methods, and examples disclosed herein are illustrative only and not intended to be limiting.

2. GAP EDITORS

CRISPR-based genome editing tools have found widespread application, relying on their easily programmable targeting and robust activity. Early use of these CRISPR-based tools has focused on the ability of Cas nucleases to cleave DNA. In the process of repairing the cleaved DNA, a genomic edit is introduced. DNA cleavage is, however, among the most toxic events a cell can endure. DNA cleavage sets off cellular alarm systems which lead to mutations, DNA rearrangements, or loss of cellular viability. Subsequent CRISPR-Cas genome editing tools have sought to minimize these toxic effects by instead introducing single-stranded nicks or directly modifying DNA via an enzyme. Still, these newer methods exhibit a limited range of edits that can be introduced and can suffer from undesired insertions, deletions, and mutations.

Embodiments of the present disclosure demonstrate that efficient non-toxic genome modification can be performed through the introduction and repair of single-stranded DNA gaps. Previous work has demonstrated that site-specific introduction of abasic sites into DNA drives homology-dependent gap recombination. By introducing an ectopic DNA repair template, genome modification can be achieved at DNA sequences adjacent to the introduced abasic site. However, in some cases, this approach can be dependent on the stabilization of the abasic sites. Therefore, embodiments of the present disclosure include the development of a system to induce homology-dependent gap repair with the addition of stable chemical groups onto DNA. This modified DNA is not recognized or repaired by cellular glycosylases, which increases lesion stability, and drives homology-dependent gap repair. Site specific DNA targeting is achieved by fusion of the modification enzyme to a Cas effector, and in some cases, the rate of genome modification can be increased using a Cas effector to nick the target DNA strand. As described further herein, the combination of nicking and DNA modification can have synergistic effects on genome modification because they mutually abrogate sister chromatid repair.

As would be recognized by one of ordinary skill in the art, the original and most widely used CRISPR-Cas genome editing technology relies on Cas nucleases introducing a double strand break which is then repaired through homologous recombination via an editing template, similar to gap editors. While broadly applied, the toxicity of double-stranded breaks and their tendency to drive mutations or chromosomal rearrangements is a consistent challenge for therapeutic applications. These DNA breaks are highly toxic (particularly in bacteria) and often lead to error prone repair via non-homologous end joining pathways. Cleave and repair is potentially the best known way to insert large segments of DNA, which is important for many scientific and industrial applications.

Additionally, base editors can be used in an effort to avoid toxicity by enzymatically converting nucleotides from one to another. For example, cytosine can be converted to thymine and adenine can be converted to guanine. However, these base editors can only change one or a few nucleotides at a time, and they have to be carefully targeted to avoid undesired editing. Furthermore, base editors are mutagenic, meaning that untargeted nucleotides are more likely to be incorrectly replicated while the base editors are being used. Base editors are also constrained by the availability of target sequences. Compared to other techniques, base editors are relatively efficient and only rely on nicking a single strand of DNA, as opposed to cutting both strands.

Prime editors have only recently been described. Based on recent publications, it seems that prime editors are relatively efficient, and they have a major advantage in that they use a very small repair template which is encoded on the backbone of the Cas9 single guide RNA. While touted as a double-strand break-free technique, efficient prime editing still involves nicking both strands of DNA in relatively close (<200 bp) proximity. This dual nicking is only moderately less toxic than the cleave-and-repair approach. Error-prone insertions and deletions still occur in mammalian cells as a result of dual nicking. It is unclear to what degree prime editors will function in prokaryotes. It also is unclear whether any mutagenic side effects might occur in their application, though their CRISPR-dependent off-target activity is muted.

As compared to other techniques, gap editors have the least amount of data pertaining to their use. Regardless, gap editors seem to have minimal toxic effects, as described further herein; and some experiments show no detectable toxicity. The lack of toxicity may be especially advantageous for therapeutic applications, as low toxicity typically indicates a low rate of undesired mutations, DNA insertions, or DNA rearrangements. Also, multiplex engineering is commonly hampered by toxicity (particularly in bacteria). For in vivo therapeutics, gap editors would likely suffer from the same DNA and protein delivery issues as all of the other CRISPR-Cas methods, although there are newer delivery platforms that allow co-delivery of RNPs with repair templates.

Embodiments of the present disclosure include compositions, systems, kits, and methods for targeted modification of a nucleic acid in a genome. In accordance with these embodiments, the present disclosure provides gap editors and gap editor complexes that generally include a DNA-recognition domain and a DNA-modifying domain. As described further in the Examples provided herein, gap editors and gap editor complexes facilitate programmable DNA targeting with a DNA-recognition domain that is functionally coupled to a DNA-modifying domain to drive genome modification via homology-directed gap repair. In some embodiments, the DNA-recognition domain binds a DNA target sequence in the genome, and the DNA-modifying domain induces formation of a replication blocking moiety on at least one nucleotide in the genome. Targeting of gap editors in a specific orientation generates persistent DNA gaps, thereby improving gap editor efficiency.

In some embodiments, the DNA-recognition domain and the DNA-modifying domain are functionally coupled. Functionally coupled includes any means for integrating the DNA-recognition domain and the DNA-modifying domain at a specific target site for the purposes of functioning as genome editors. In some embodiments, “functionally coupled,” includes but is not limited to polypeptide fusions, peptide tags, peptide linkers, RNA tags, and any combinations thereof. For example, a gap editor or gap editor complex can include a DNA-recognition domain that is fused to a DNA-modifying domain (e.g., a fusion polypeptide). The DNA-recognition domain of the gap editor fusion protein recognizes a specific site (e.g., nucleic acid sequence in a genome) in a target nucleic acid, and the DNA-modifying domain is then capable of modifying one or more nucleic acids in or around the target site to facilitate genome modification.

As would be recognized by one of ordinary skill in the art based on the present disclosure, the gap editor complexes described herein can be used to modify any part of a genome of an organism or cell. For example, the gap editor complexes of the present disclosure can be used to target a specific site in a genome to generate a desired site-specific modification, and/or the gap editor complexes of the present disclosure can be used to target one or more specific sites in a genome to generate a modification that results in the addition, exchange, and/or removal of a portion of the genome. Additionally, the gap editor complexes of the present disclosure can be used to target any region of a gene, including but not limited to, an open reading frame, an intron, an exon, an intron-exon boundary, a functional non-coding region, and any upstream and/or downstream DNA/gene regulatory sequences. The terms “DNA/gene regulatory sequences,” “control elements,” and “regulatory elements,” used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence or a coding sequence and/or regulate translation of an encoded polypeptide. Thus, the gap editor complexes of the present disclosure can be used to generate modifications in the genome that result in altered gene expression patterns and/or activity (e.g., upregulation or downregulation).

In some embodiments, the DNA-recognition domain and the DNA-modifying domain do not comprise a fusion polypeptide (e.g., do not form a single fusion polypeptide or protein). In some embodiments, the DNA-modifying domain is recruited to the gap editor or gap editor complex by the DNA-recognition domain. For example, the DNA-recognition domain of the gap editor can recruit the DNA-modifying domain via a protein-protein interaction. In some embodiments, this recruitment is facilitated by a tag or linker that serves to recruit and functionally couple the DNA-modifying domain to the DNA-recognition domain at a specific site of a target nucleic acid. Other means for recruiting and functionally coupling the DNA-modifying domain to the DNA-recognition domain based on protein-protein interactions can also be used, including but not limited to, antigen-antibody interactions (e.g., the DNA-modifying domain fused to an antigen binding domain and the DNA-recognition domain fused to the corresponding antigen), protein tags (e.g., a streptavidin-biotin interaction), a peptide and single chain variable antibody fragment, a split-protein system, or any ligand-receptor interaction. In other embodiments, the DNA-modification domain can be integrated into the DNA-recognition domain, such as, for example, by replacing the HNH domain of Cas9 with the DNA-modification domain, or inserting the DNA-modification domain into the PAM-interacting domain.

In other embodiments, the DNA-modifying domain is recruited to the gap editor or gap editor complex by an interaction with a nucleic acid. For example, a guide RNA molecule that interacts with the DNA-recognition domain to bind a site in a target nucleic acid can include a sequence and/or structure that binds the DNA-modifying domain (e.g., a scaffold domain). In some embodiments, the sequence and/or structure on the guide RNA includes domains that are recognized by RNA-binding proteins. In some embodiments, the DNA-modifying domain is fused to an RNA-binding protein that is recruited to the gap editor or gap editor complex via binding to the domain on the guide RNA. Other means for recruiting and functionally coupling the DNA-modifying domain to the DNA-recognition domain based on RNA-binding interactions can also be used. In some embodiments, the guide RNA is extended to encode an RNA aptamer that recognizes different proteins or protein domains, such as the MS2 coat protein, Tat, or Rev. The recognized protein or protein domain is then fused to the DNA-modifying domain. The guide RNA can encode multiple copies of the same protein-binding domain or different protein-binding domains. These protein-binding domains can be incorporated into different parts of the gRNA, such as through the loop of the gRNA or sgRNA or at the 3′ end of the sgRNA.

As described further herein, the gap editor complexes of the present disclosure can be used to generate various modifications in the genome of an organism or cell, such as through the mechanism of homology directed repair. In some embodiments, genome modifications using the gap editors of the present disclosure can generate specific nucleotide modifications ranging from a single nucleotide change to large insertions or deletions. In some embodiments, the gap editor complexes of the present disclosure can be used to add or remove large sequences of DNA through the use of more than one guide RNA sequence to target distinct sites in the genome (e.g., generate large genomic deletions by removing the sequence between two gRNA target sites and/or inserting an exogenous DNA sequence). In some embodiments, multiple gRNAs can be used to target multiple sites in a genome to generate any number of desired modifications in a genome (e.g., multiplexing). As would be recognized by one of ordinary skill in the art based on the present disclosure, any type of genetic modification can be achieved using the gap editor complexes of the present disclosure in any cell type and/or organism, regardless of how the gap editor complexes are delivered to the cell (e.g., transformation), including in vitro, ex vivo, or in vivo methods of delivery. A general discussion of these methods can be found in Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.

DNA-Recognition Domains. In accordance with these embodiments, the DNA-recognition domains of the gap editors or gap editor complexes of the present disclosure include use of a sequence-specific nucleic acid binding component (e.g., molecule, biomolecule, or complex of one or more molecules and/or biomolecules) to target a specific nucleic acid target site. In some embodiments, the DNA-recognition domain includes at least one Cas protein or fragment thereof lacking nuclease or deoxyribonuclease activity. In some embodiments, the DNA-recognition domain comprises a complex of Cas proteins lacking nuclease or deoxyribonuclease activity. In some embodiments, the DNA-recognition domain includes at least one Cas protein or a complex of Cas proteins that exhibit nickase activity, including but not limited to, a Cas9 or a Cas12a with nickase activity.

In some embodiments, the Cas protein or Cas protein complex comprises a Type I Cascade, a Type II Cas9, a Type IV effector module, a Type V Cas12, a Cas9-related IscB, a Cas9-related TnpB, and combinations thereof. Cascade is a set of Cas proteins that form a stable complex in different proportions with the guide RNA. The gRNA is normally encoded within a CRISPR array, where the Cas6 protein of the complex cleaves a hairpin in the transcribed repeat. The other proteins then form around the freed RNA. The fully-formed complex binds target DNA flanked by a protospacer-adjacent motif (PAM) encoded on the 5′ end of the non-target strand. Upon target recognition, the complex then recruits the Type I endonuclease Cas3 to nick and processively degrade the non-target strand in the 3′-to-5′ direction, although the complex will stably bind target DNA in the absence of Cas3. The specific number and stoichiometry of the proteins in Cascade vary between CRISPR-Cas sub-types, such as Cas8c(1):Cas5c(1):Cas7(7) for the I-C sub-type and Cse1(1):Cse2(2):Cas5e(1):Cas7(6):Cas6e(1) for the I-E sub-type. Furthermore, these proteins can be fused to recapitulate the complex with fewer expressed polypeptides, and the Cas6 protein is dispensable if the guide RNA is expressed as a processed CRISPR RNA. Varying the length of the guide sequence within the gRNA can further alter the protein stoichiometry of Cascade and can change the length of the R-loop and displaced DNA strand. Cas9 is a single-effector nuclease that binds target DNA with a PAM encoded on the 3′ end of the non-target strand. Bound DNA is then nicked on opposite strands through the HNH and RuvC domains of Cas9, resulting in a double-stranded break. The gRNA utilized by Cas9 is normally encoded within a CRISPR array, where a trans-activating crRNA (tracrRNA) pairs with the transcribed repeat, and the RNA duplex is cleaved by the endoribonuclease RNase III. The resulting processed crRNA:tracrRNA duplex is bound by Cas9 and directs DNA targeting. The crRNA:tracrRNA duplex can be fused to form a single guide RNA (sgRNA). Cas12 represents a diverse family of Cas nucleases that are designated by their sub-type (e.g., Cas12a, Cas12e) and have been given alternative names such as Cpf1, C2c1, CasX, or Cas14a. Cas12 nucleases target DNA with a PAM encoded on the 5′ end of the non-target strand, with the nuclease's RuvC domain nicking both the target and non-target strands to create a staggered double-stranded break with a 5′ overhang. The gRNA is encoded within a CRISPR array and can be processed from the transcribed CRISPR array through one of two mechanisms depending on the nuclease: cleavage of a hairpin within the repeat by a riboendonucleolytic domain within the Cas12 nuclease (e.g., Cas12a), or pairing of the transcribed repeat with a tracrRNA that is subsequently cleaved by RNase III. As a result, the gRNA can be readily expressed in its processed form when the nuclease alone is responsible for crRNA processing, and the gRNA can be expressed as an sgRNA when a tracrRNA is involved in crRNA processing.
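By way of non-limiting illustration, the difference in PAM orientation described above (a PAM on the 3′ end of the non-target strand for Cas9, and on the 5′ end for Cascade and Cas12) can be sketched as a simple sequence scan. The following Python sketch rests on assumptions that are not taken from the present disclosure: the PAM motifs “NGG” and “TTTV” are commonly cited motifs for certain Cas9 and Cas12a nucleases, the 20-nucleotide protospacer length is an example value, the function names are hypothetical, and only the strand supplied (treated as the non-target strand, written 5′ to 3′) is scanned.

```python
import re

# Subset of IUPAC nucleotide codes needed for the example PAM motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}


def find_sites(non_target_strand, pam, pam_side, guide_len=20):
    """List (protospacer_start, protospacer, pam_sequence) on one strand.

    pam_side is "3prime" when the PAM follows the protospacer and
    "5prime" when the PAM precedes it (both on the non-target strand).
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    seq = non_target_strand.upper()
    pattern = re.compile("".join(IUPAC[code] for code in pam.upper()))
    pam_len = len(pam)
    sites = []
    for i in range(len(seq) - guide_len - pam_len + 1):
        if pam_side == "3prime":
            start = i
            pam_seq = seq[i + guide_len:i + guide_len + pam_len]
        else:
            start = i + pam_len
            pam_seq = seq[i:i + pam_len]
        if pattern.fullmatch(pam_seq):
            sites.append((start, seq[start:start + guide_len], pam_seq))
    return sites


# Hypothetical example sequence: a 5' TTTA, a 20-nt protospacer, a 3' TGG.
example = "TTTA" + "ACGT" * 5 + "TGG"
cas9_like_sites = find_sites(example, "NGG", "3prime")
cas12a_like_sites = find_sites(example, "TTTV", "5prime")
```

In this hypothetical example, the same 20-nucleotide protospacer is reported by both scans because it is flanked by a matching motif on each side. A practical design tool would also scan the opposite strand and account for nuclease-specific guide lengths.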

In some embodiments, the DNA-recognition domain comprises a deoxyribonuclease-inactivated Cas9 (“dCas9”), which can be generated by introducing deactivating mutations within the HNH domain and the RuvC domain of the protein. In some embodiments, the DNA-recognition domain comprises a deoxyribonuclease-inactivated Cas12a (“dCas12a”), which can be generated by introducing deactivating mutations within at least one of the RuvC domains, such as RuvC-I. Alternatively, a guide RNA that is truncated on the PAM-distal end or contains mismatches with the target can allow DNA binding but not DNA nicking or cleavage by an otherwise catalytically active Cas nuclease.

In some embodiments, various other DNA-recognition domains can also be used in the gap editor complexes of the present disclosure. For example, certain embodiments of the compositions and methods described herein do not require guide RNAs to effectuate efficient genome editing and modification. The DNA-recognition domains of these gap editor complexes include, but are not limited to, meganucleases, zinc-fingers (ZFs), and transcription activator-like effectors (TALEs). In some embodiments, the DNA-recognition domains of the present disclosure can include a meganuclease. Meganucleases can be used to replace, eliminate or modify sequences in a targeted manner and their recognition target sequence can be altered through protein engineering. Meganucleases can be used to modify all genome types, whether bacterial, plant or animal, and they are amenable to in vivo delivery due to their relatively small sizes. The high degree of target specificity of meganucleases allows for a concomitantly high degree of precision and much lower cell toxicity. However, targeting novel sequences is challenging due to the limited number of meganucleases available.

In some embodiments, the DNA-recognition domains of the present disclosure can include zinc-fingers (ZFs). Zinc-finger nucleases (ZFNs) are fusions of the nonspecific DNA cleavage domain from a restriction endonuclease (e.g., FokI) with zinc-finger proteins. ZFNs can target specific DNA sequences, which allows a ZFN to address and accurately change unique sequences inside a target organism. A single zinc-finger is made up of around 30 amino acids in a conserved ββα configuration. Several amino acids on the surface of the α-helix typically contact three base pairs within the major groove of the DNA. Zinc-finger proteins have become an important framework for the design of custom DNA-binding proteins, as unnatural arrays with more than three domains have become available, along with a highly-conserved linker sequence that allows the construction of synthetic zinc-finger proteins that recognize DNA sequences 9 to 18 bps in length.
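By way of non-limiting illustration, the relationship between target length and array size described above (roughly three base pairs per finger, so that targets of 9 to 18 bps correspond to arrays of three to six fingers) can be sketched as follows. The function name is hypothetical, and the sketch assumes a strictly modular three-base-pair footprint per finger, ignoring any context effects between neighboring fingers.

```python
def zinc_finger_subsites(target, bp_per_finger=3):
    """Split a target site into consecutive per-finger subsites.

    Assumes each finger reads exactly `bp_per_finger` base pairs.
    """
    target = target.upper()
    if len(target) == 0 or len(target) % bp_per_finger != 0:
        raise ValueError("target length must be a positive multiple of "
                         f"{bp_per_finger} bp")
    return [target[i:i + bp_per_finger]
            for i in range(0, len(target), bp_per_finger)]


# Hypothetical 9-bp target: three subsites, hence a three-finger array.
subsites = zinc_finger_subsites("GCGTGGGCG")
fingers_needed = len(subsites)
```

Under the same assumption, an 18-bp target would call for a six-finger array.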

In some embodiments, the DNA-recognition domains of the present disclosure can include transcription activator-like effectors (TALEs). TALEs are very versatile and can be combined with numerous effector domains to affect genomic structure and function, including nucleases, transcriptional activators and repressors, recombinases, transposases, DNA and histone methyltransferases, and histone acetyltransferases. TALENs are transcription activator-like effector nucleases, which are fusions of the FokI cleavage domain and DNA-binding domains. TALEs are naturally occurring proteins from bacteria of the genus Xanthomonas and contain DNA-binding domains made up of a series of 33-35 amino acid repeat domains that each recognize a single base pair. TALE specificity is determined by two hypervariable amino acids that are known as repeat-variable di-residues (RVDs). Numerous effector domains have been made available to fuse to TALE repeats for targeted genetic modifications, including nucleases, transcriptional activators, and site-specific recombinases. While the single base recognition of TALE-DNA binding repeats affords greater design flexibility than triplet-confined zinc-fingers, the cloning of repeat TALE arrays presents an elevated technical challenge due to extensive identical repeat sequences.
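By way of non-limiting illustration, the one-repeat-per-base-pair recognition described above can be sketched as a lookup from target bases to RVDs. The RVD assignments used below (NI for A, HD for C, NN for G, NG for T) are the commonly cited TALE code and are an assumption of this sketch rather than a feature recited in the present disclosure; the function names are hypothetical, and known ambiguities (e.g., NN may also tolerate A) are ignored.

```python
# Commonly cited RVD-to-base code; an assumption of this sketch.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASE_FOR_RVD = {rvd: base for base, rvd in RVD_FOR_BASE.items()}


def design_rvds(target):
    """Return one RVD per base of the target, in 5'-to-3' order."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def bases_from_rvds(rvds):
    """Return the target sequence implied by an ordered list of RVDs."""
    return "".join(BASE_FOR_RVD[rvd] for rvd in rvds)


# Hypothetical four-base target, one repeat per base.
rvds = design_rvds("TACG")
```

Because each repeat differs from its neighbors mainly at the RVD positions, an array designed this way consists of near-identical repeats, which is the source of the cloning difficulty noted above.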

DNA-Modifying Domains. In some embodiments, the DNA-modifying domain catalyzes the formation or addition of at least one replication blocking moiety to at least one nucleotide in the DNA target sequence. In some embodiments, the DNA-modifying domain blocks DNA replication by adding the replication blocking moiety to at least one nucleotide in the DNA strand complementary to the DNA target sequence. In some embodiments, the DNA-modifying domain blocks DNA replication by adding the replication blocking moiety to at least one nucleotide in the DNA strand containing the DNA target sequence. In some embodiments, the DNA-modifying domain blocks DNA replication by adding the replication blocking moiety to both a nucleotide in the DNA strand complementary to the DNA target sequence and a nucleotide in the DNA strand containing the DNA target sequence.

In some embodiments, the DNA-recognition domain induces a single-stranded break in the DNA target strand (via nickase activity), and the DNA-modifying domain adds the replication blocking moiety to at least one nucleotide in the DNA strand complementary to the DNA target sequence. In some embodiments, the DNA-modifying domain catalyzes addition of ADP ribose to a thymine or guanine nucleotide. In some embodiments, the DNA-modifying domain comprises a DarT enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the DarT enzyme has been engineered to have reduced DNA binding, increased specificity to single-stranded DNA, and/or decreased enzymatic activity. DarT homologs (and any fragments, derivatives, or variants thereof) that can be used in the various embodiments disclosed herein include, but are not limited to, those provided in Table 1 below. In some embodiments, the DNA-modifying domain comprises a Scabin enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the Scabin enzyme has been engineered to have reduced DNA binding, increased specificity to single-stranded DNA, and/or decreased enzymatic activity. Scabin homologs (and any fragments, derivatives, or variants thereof) that can be used in the various embodiments disclosed herein include, but are not limited to, those provided in Table 1 below. In some embodiments, the DNA-modifying domain comprises a Mom enzyme or a functional fragment, derivative, or variant thereof. In some embodiments, the Mom enzyme has been engineered to have reduced DNA binding, increased specificity to single-stranded DNA, and/or decreased enzymatic activity. Mom homologs (and any fragments, derivatives, or variants thereof) that can be used in the various embodiments disclosed herein include, but are not limited to, those provided in Table 1 below.

TABLE 1
DarT, Scabin, and Mom homologs and their corresponding UniProt reference numbers.

DarT Homologs (UniProt Ref. No.) | Scabin Homologs (UniProt Ref. No.) | Mom Homologs (UniProt Ref. No.)
A0A3Y1AXM4 | P06018 | A0A7G7C6V3
A0A0M9E739 | P08794 | A0A6G3TAN8
A0A6H3DQB7 | A0A0A6ZQD1 | A0A4Q4DBR5
A0A2D5FEV0 | A0A747H2I6 | A0A7K2MJA2
A0A009QG24 | F3WIW6 | A0A1I5DGQ6
A0A1Y1QH60 | A0A5Y2Q823 | A0A0N1NCQ4
A0A1H2WEE3 | A0A5T7EP05 | A0A117EGR9
A0A365SDE9 | A0A5X5CI68 | A0A7K3F6T9
A0A2T2YIK3 | A0A736I828 | A0A7K3QWB6
U7P928 | Q32F84 | A0A4Z1DI83
A0A0B7IUM8 | Q53980 | A0A3N6FY95
A0A1C4E3X9 | A0A0A6ZUU6 | A0A7K2GZ37
UPI0009FFBBAF | A0A090NAC5 | A0A1X1N6K7
UPI0011835755 | A0A734N076 | A0A286EGA2
UPI000A066936 | A0A5Z9VNA9 | A0A1H1REA6
G7TGB0 | A0A0E1SZ91 | L8PML2
A0A109CYV8 | A0A718VE50 | A0A401MBD2
A0A1J1EN49 | A0A3V2P1F8 | A0A505DEP0
A0A6N8HLA1 | F4ST91 | A0A5C4V5D6
A0A0F9A3N8 | A0A0L1BX31 | A0A6G2X7S2
A0A0F9ID55 | A0A6N8K5P2 | A0A231PCB5
UPI00146D40AF | A0A2X2IFR7 | A0A117RXM5
UPI0015EC5998 | Q32I99 | A0A854W491
X0U0F3 | A0A398TE36 | A0A7K2M2S6
A0A1F2WQI4 | A0A366YZA8 | A0A845VQ73
A0A4Q9B657 | A0A2X3K063 | A0A444QU29
A0A1A6KRV4 | A0A6C9HIT1 | A0A126Y4C7
A0A2W0FJ31 | F3WLY8 | A0A3Q9KV10
UPI00131E585C | A0A4D9HQK3 | A0A8B0F419
A0A521GSZ3 | A0A7B2BKV1 | A0A1B1MHN6
A0A3C0UL77 | A0A659GZW5 | A0A0M8WMD9
A0A128EDT6 | A0A376P4X4 | A0A3S9MED3
A0A0S4KU33 | A0A829JC85 | A0A7G1P3D5
A0A0K8QWE7 | A0A8A5HYQ3 | L7FDM7
A0A1I2BV64 | A0A2Y0KN27 | A0A7H0IBA3
A0A074JDH1 | A0A6C8GMD6 | A0A1V4ECW4
S6GJD4 | A0A855SJL4 | A0A7K2GG48
UPI0003A70E4B | A0A1X3JSV2 | A0A6B3CTN6
A0A1G7QJ47 | F3WRA7 | A0A5J6EZ40
A0A1G7XXY4 | A0A0L1BYZ7 | A0A3N6F8E7
A0A077F777 | A0A2X9WZ16 | A0A2C8XEE2
A1WMK8 | A0A5T6ITA7 | A0A0M4DAA4
M5AN74 | A0A5Z9MRI6 | A0A7M3P2N8
A0A0X1T5G3 | A0A774N8E0 | A0A6B3QVN7
A0A2A9FUD7 | A0A653FTS2 | A0A6G4V177
UPI000BE34E2B | A0A7D7IKR8 | A0A7D8B5M0
A0A021VVM8 | A0A793PNZ0 | A0A7Y6CBB1
UPI0009EEB1C1 | A0A3Y6RE47 | A0A542HUQ5
A0A212J8X1 | A0A7U8TEQ3 | A0A1Q5GYR2
A0A143XZK3 | A0A7T2JHL6 | A0A7K2JG06
A0A2D8CA1 | A0A2X2K6P7 | A0A0N1FX41
A0A2M6ZMD7 | A0A828BG22 | A0A1Q5KVP4
D4ZX17 | A0A243UWN1 | A0A421LHY3
A0A1V2YE96 | A0A7D3UWA8 | A0A1C4SR45
UPI0004795285 | A0A7D3QJ09 | A0A7H8P376
A0A2I1RLA3 | A0A6I4LGA3 | A0A4V2U6X2
A0A069DSZ4 | A0A833L0X9 | A0A2A3GZG2
A0A1B1TKQ4 | A0A844VV27 | D6K1C1
A0A1M5YS26 | A0A2X3A730 | A0A7H0HXY6
UPI001081FF81 | A0A7D3UWP6 | A0A7K2VU35
UPI00058ECA86 | A0A7D3QJ52 | A0A6I6RSN3
A0A439F9A2 | A0A789M987 | A0A6H1NCH2
A0A0K6IM62 | A0A479J9Y1 | A0A2N3K2V7
A0A3M1TMP6 | A0A1X3J0Y0 | A0A7K2ULE5
A0A4Z0LYH6 | A0A6L7FCA8 | V4I776
UPI000CEA333A | A0A398QB61 | A0A5J6IH58
A0A0E9M297 | E7STE3 | A0A2Z5K877
A0A4R4QZG6 | A0A4Z0T8W4 | A0A3N4ZXP2
A0A5C4P404 | A0A7G6K9Y2 | A0A2P8A6J8
A0A2E5CCR5 | A0A2Y4XYF1 | A0A3R9UHD1
A0A0F9FER9 | F3WJW5 | A0A6B3DTW3
A0A6L6K3W2 | F5NRV4 | A0A7K3E8Z7
A0A2N0GBR2 | A0A2S8JPX1 | A0A5P8KCS9
A0A3D0ST31 | B3X6Z6 | A0A6G3W7K4
A0A086DYY8 | A0A826W5G8 | A0A7S7X9R1
UPI00138FF367 | A0A656BX08 | A0A5Q4TE11
UPI0009E9D184 | A0A2T3SJ22 | A0A2G7F715
A0A0Q4H114 | A0A5E8GB30 | A0A2P8PUY9
A0A1C6SGK0 | F3WQG1 | A0A7H8H741
A0A2W5HPA9 | A0A376FNN0 | A0A6I5D8I2
A0A2P8KB33 | A0A3U8JEK9 | A0A1I6W4M7
UPI0009C0D9CF | I6CWT9 | A0A6A0BTB8
A0A4S5BBM9 | A0A3P6KJV4 | A0A1V9KFP9
A0A2G6E1H5 | A0A3U5WED1 | A0A4Q7Z2V3
A0A2V4F7G0 | B3X4P5 | A0A0T1UEA6
UPI000C6F263C | E7SSY4 | A0A5N6A8S8
UPI0004B149FA | E0J798 | A0A6G3ABW5
UPI000BF71297 | A0A1X0YFM5 | A0A0B5DFX2
A0A0S8HVY0 | A0A854VRL6 | A0A540PEE8
A0A081BFQ8 | A0A379ZXH3 | A0A2M9I3D9
A0A2T3K4E8 | A0A6D0FK22 | A0A086GVM1
UPI00140B28F9 | A0A193LSI7 | A0A250VCC4
A0A450ZNU6 | A0A746IF37 | A0A7K2WAZ7
A0A434FTJ1 | A0A6X7AJ78 | A0A7K2WPB2
UPI001575F606 | A0A826N5K3 | A0A6G9GX41
UPI00131CDEC9 | A0A6D0FPQ2 | A0A5R9FQN8
UPI000E34E22D | |
A0A380MTQ1 | |
UPI001575232E | |
A0A2A3J625 | |
A0A2V5QXN0 | |
A0A1D8SUV6 | |
A0A1H3GAX0 | |
A0A1S2P573 | |
A0A1G6MG07 | |
A0A2A5E1Y0 | |
A0A662P7C8 | |
A0A6L7A0Y8 | |
A0A1I2KC92 | |
A0A5Q4HAE6 | |
A0A0G3UZG3 | |
A0A1V3SKR4 | |
A0A0D5M555 | |
UPI0003F90624 | |
X0QNL7 | |
UPI0009DA5757 | |
UPI0002EF3C8F | |
A0A399YQF2 | |
A0A2D3M0N6 | |
A0A087MEL2 | |
A0A1JSTVU6 | |
UPI00143CD06E | |
A0A3G6X2L4 | |
A0A369I9T2 | |
UPI0015935B35 | |
A0A699RGA3 | |
A0A0Q8DZI6 | |
A0A1T4V1K5 | |
UPI00081C8979 | |
A0A0F9B5C2 | |
A0A6I7PSY2 | |
UPI000C7E3428 | |
UPI00066E6B23 | |
A0A0K8QWM3 | |
A0A1F7S2E1 | |
UPI00106D6FED | |
A0A0N7A0X9 | |
A0A3B0TNW4 | |
A0A1B3LKQ8 | |
A0A1V0QE61 | |
UPI000A33B150 | |
UPI00145C4C23 | |
A0A654U036 | |
UPI000BB413AC | |
A0A2J6NE32 | |
A0A4P5X2M7 | |
J1H157 | |
A0A562Y4W9 | |
A0A222SFK8 | |
A0A3L7NYM4 | |
A0A3B8NG16 | |
UPI0014451E71 | |
A0A398DRP6 | |
A0A1H3ZRX1 | |
U6H3Z0 | |
A0A2E0XMC9 | |
A0A3Q2ZTE2 | |
A0A1Q5T734 | |
J1Y9X6 | |
A0A1X9SM09 | |
A0A4U0XTT2 | |
A0A151NT80 | |
A0A2E6Y7V9 | |
A0A0F9A8D5 | |
A0A562XL28 | |
UPI000A32FC88 | |
UPI001295C460 | |
A0A059ZR15 | |
A0A2K1Z809 | |
A0A4R4IBZ9 | |
A0A193FXT9 | |
A0A328V872 | |
F9FTA7 | |
A0A2A4PLD2 | |
A0A6B1F5X5 | |
A0A0N1D5X2 | |
UPI00114F1E30 | |
A0A6A4SK98 | |
A0A416G6Z1 | |
A0A2D8R8I3 | |
A0A0F9S1T0 | |
A0A2H3U3T0 | |
A0A0J6SV50 | |
A0A3M1HEV7 | |
A0A1Q4RC56 | |
A0A1H9ZTD0 | |
M5XRC1 | |
A0A4P8RI99 | |
A0A287ISE0 | |
A0A3M1HHN8 | |
A0A1I8FRJ7 | |
A0A1Q9P5U5 | |
U2QX64 | |
UPI000B773353 | |
UPI0004140561 | |
A0A0K2R4T0 | |
A0A1Z4JP41 | |
A0A2W6XRC8 | |
A0A1B7W4E5 | |
A0A367V7P0 | |
A0A1U8LNE6 | |
A0A165DJ89 | |
A0A0U1M3L7 | |
A0A109CYU7 | |
A0A3C1G1M6 | |
A0A6A6P153 | |
A0A078K042 | |
A0A0F9E1N9 | |
A0A6L2M8A9 | |
A0A384DPW3 | |
UPI0006B07CD7 | |
UPI0012B63E61 | |
A0A679F6I9 | |
M4EQE8 | |
A0A2N2MUF5 | |
A0A1I8J2P8 | |
A0A699GHG3 | |
A0A061RT73 | |
A0A4Q5Z9M4 | |
A0A0C3CY40 | |
A0A562LHY2 | |
A0A1H2WEE3 | |
A0A1F9LMB0 | |
A0A6B0VHE9 | |
A0A1W9IKF6 | |
A0A1J4WMX2 | |
A0A4Q6DQE0 | |
UPI00131D0A3D | |
A0A5Q0PIV9 | |
UPI0014767B89 | |
A0A0D9YA74 | |
UPI0003C8CEDA | |
A0A4P7QDQ0 | |
A0A1I3L2R8 | |
A0A060SSG3 | |
UPI0011DDD910 | |
A0A2V9JXV7 | |
A0A0D0ARU6 | |
T1EWK1 | |
A0A1G8HQU1 | |
A0A1C6SGK0 | |
A0A238YN77 | |
A0A0C4ETD4 | |
UPI0015A92654 | |
A0A218WZU7 | |
L9L887 | |
A0A0T9QHP2 | |
A0A1H4B661 | |
A0A4D9EGJ1 | |
UPI00145515B0 | |
A0A1V2LC08 | |
A0A6F9DHT9 | |
A0A1E3NPN8 | |
A0A1X6MJD8 | |
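The reference numbers in Table 1 can be checked for well-formedness before they are used to retrieve sequences. The following is a minimal sketch, not part of the disclosed methods, that assumes the entries follow either the published UniProtKB accession pattern (6 or 10 characters) or the UniParc identifier format ("UPI" followed by 10 hexadecimal characters). The function names classify_identifier and flag_unrecognized are hypothetical and introduced only for illustration.

```python
import re

# Published UniProtKB accession pattern (6 or 10 characters).
UNIPROTKB = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: UniParc identifiers are "UPI" plus 10 hexadecimal characters.
UNIPARC = re.compile(r"UPI[0-9A-F]{10}")


def classify_identifier(identifier: str) -> str:
    """Return 'UniProtKB', 'UniParc', or 'unrecognized' for one table entry."""
    if UNIPROTKB.fullmatch(identifier):
        return "UniProtKB"
    if UNIPARC.fullmatch(identifier):
        return "UniParc"
    return "unrecognized"


def flag_unrecognized(identifiers):
    """Return the entries that match neither pattern, preserving input order."""
    return [i for i in identifiers if classify_identifier(i) == "unrecognized"]


# A few entries copied from Table 1 for illustration.
sample = ["A0A3Y1AXM4", "P06018", "UPI0009FFBBAF", "U7P928", "A0A2D8CA1"]
suspect = flag_unrecognized(sample)
```

Under these patterns, an entry that matches neither format (for example, the nine-character entry A0A2D8CA1) is returned as unrecognized and may warrant checking against the source database before use.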

As would be recognized by one of ordinary skill in the art based on the present disclosure, other DNA-modifying domains/enzymes can be used in the gap editors and gap editor complexes of the present disclosure to induce formation of a replication blocking moiety at a given target site. For example, in some embodiments, the DNA-modifying domain/enzyme can include, but is not limited to, any of the following enzymes (or functional fragments, derivatives, or variants thereof): Pierisin, Scabin, Cell cycle and apoptosis regulator 1 (CARP-1), SCO5461 protein (ScARP), adenine modification enzyme, acetyltransferase, amino acid transferase, nucleotidyl transferase, uridyltransferase, acyltransferase, ADP-ribosyltransferase, methylthiotransferase, N-acetyl transferase 10, tRNA(Met) cytidine acetyltransferase (TmcA), tRNA cytidine acetyltransferase, GCN5-related N-acetyltransferase, lysidine synthase, m⁷G methyltransferase, N6C carbamoylmethyltransferase (Mom), N6-adenosine threonylcarbamoyltransferase, threonyl carbamoyl transferase or threonyl carbamoyl transferase complex, TsaB-TsaE-TsaD (TsaBDE) complex, tRNA N6-adenosine threonylcarbamoyltransferase (Qri7, Tcs4), methyltransferase, ATrm5a, tRNA:m¹G/imG2 methyltransferase, tRNA (adenosine(37)-N6)-dimethylallyltransferase, tRNA dimethylallyltransferase (MiaA), and isopentenyltransferase.

In some embodiments, the DNA-modifying domain used in the gap editor complexes of the present disclosure includes a catalytic domain (or a functional fragment, derivative, or variant thereof) that induces formation of a replication blocking moiety on at least one nucleotide in a genome. In some embodiments, the catalytic domain includes a portion of a DarT enzyme that is sufficient to carry out ADP-ribosylation of a target nucleic acid, as described further herein. In some embodiments, the catalytic domain includes a portion of a Scabin enzyme that is sufficient to carry out ADP-ribosylation of a target nucleic acid, as described further herein.

For example, the catalytic domain of the DNA-modifying domain that can be used in the gap editor complexes of the present disclosure includes, but is not limited to, any sequence having at least 70% amino acid identity with any of SEQ ID NOs: 18-21. In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 18.

In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 19.

In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 20.

In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 21.

In some embodiments, the catalytic domain of the DNA-modifying domain that can be used in the gap editor complexes of the present disclosure includes, but is not limited to, any sequence having at least 70% amino acid identity with any of SEQ ID NOs: 22-24. In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 22.

In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 23.

In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 24.

In some embodiments, the DNA-modifying domain used in the gap editor complexes of the present disclosure includes a catalytic domain (or a functional fragment, derivative, or variant thereof) of a Mom enzyme (also referred to as methylcarbamoyltransferase, methylcarbamoylase, or acetyltransferase). The catalytic domain can include the portion of a methylcarbamoylase enzyme that is sufficient to carry out methylcarbamoylation of adenine in a target nucleic acid using acetyl CoA as a donor substrate, as described further herein. For example, the catalytic domain of a Mom enzyme that can be used as the DNA-modifying domain in the gap editor complexes of the present disclosure includes, but is not limited to, any sequence that has at least 70% amino acid identity with any of SEQ ID NOs: 25-27. In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 25.

In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 26.

In some embodiments, the DNA-modifying domain includes a catalytic domain having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% amino acid sequence identity with SEQ ID NO: 27.
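The percent identity thresholds recited above can be evaluated computationally once a candidate catalytic domain has been aligned to a reference sequence. The following is a minimal sketch under stated assumptions: the two sequences are already aligned (gaps written as "-"), identity is counted over all alignment columns in which at least one sequence has a residue, and the example sequences are hypothetical placeholders rather than any of SEQ ID NOs: 18-27. Other conventions for the denominator exist and can give different values.

```python
# Identity thresholds recited in the description (percent).
THRESHOLDS = (70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)


def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Columns in which both sequences carry a gap are skipped; columns with a
    gap in only one sequence count as mismatches (an assumed convention).
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if q == "-" and r == "-":
            continue
        columns += 1
        if q == r:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def highest_threshold_met(identity: float):
    """Return the largest recited threshold satisfied, or None if below 70%."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


# Hypothetical 10-residue sequences differing at one position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQA")
threshold = highest_threshold_met(identity)
```

In practice, the alignment itself would typically be produced by an established alignment tool, and the choice of tool and scoring parameters is outside the scope of this sketch.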

Replication Blocking Moieties. One of ordinary skill in the art would recognize, based on the present disclosure, that a replication blocking moiety can include, but is not limited to, glucose, threonyl carbamoyl adenosine, acetate, glyceryl, L-ascorbic acid, uridine, adenosine monophosphate, adenosine diphosphate ribose, methylcarbamoyl, a lipid, an amino acid, agmatine, L-threonylcarbamoyladenylate, L-threonylcarbamoyl, methylthiolate, sulfur, a methyl group, S-adenosyl-L-methionine or a subgroup of S-adenosyl-L-methionine, and dimethylallyl diphosphate or a subgroup thereof. These and other replication blocking moieties have the general feature of being able to functionalize a nucleotide in a target sequence such that DNA replication is blocked and homology-directed gap repair is induced. This can occur by enzymatic means or by enzyme-independent means.

Guide RNA. Embodiments of the present disclosure also include gap editors and gap editor complexes that can include at least one guide RNA molecule. In accordance with these embodiments, the guide RNA molecule comprises a handle sequence and a targeting sequence. The targeting sequence interacts with a sequence in the target nucleic acid, and the handle sequence facilitates binding of the gap editor or gap editor complex. As would be recognized by one of ordinary skill in the art based on the present disclosure, a single chimeric guide RNA (sgRNA) can mimic the structure of an annealed crRNA/tracrRNA; this type of guide RNA has become more widely used than crRNA/tracrRNA because the sgRNA approach provides a simplified system with only two components (e.g., the Cas9 and the sgRNA). Thus, sequence-specific binding to a nucleic acid target can be guided by a natural dual-RNA complex (e.g., comprising a crRNA, a tracrRNA, and Cas9) or a chimeric single-guide RNA (e.g., a sgRNA and Cas9) (see, e.g., Jinek et al. (2012) “A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity” Science 337:816-821). Multiple gRNAs can further be expressed using CRISPR arrays that naturally encode the crRNAs utilized by the nucleases. The gRNAs can also be expressed separately, each being operably linked to a promoter and a terminator. The gRNAs can also be fused in a single transcript by including intervening RNA cleavage sites, such as ribozymes or sites recognized by RNA-cleaving enzymes such as RNase P, RNase Z, RNase III, or Csy4. The gRNAs or sgRNAs may include RNA templates for reverse transcription into cDNA repair templates. The sgRNAs may include aptamer sequences, for example, RNA-binding protein recognition sites, so as to recruit accessory genome editing factors to the gap editor complex or gap editor target site.
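The two-part structure of a guide RNA, and the fusion of multiple guide RNAs into a single transcript separated by cleavage sites, can be illustrated with a short sketch. The class GuideRNA, the function fuse_guides, and all sequences shown are hypothetical placeholders introduced for illustration only; they are not functional handle, targeting, or cleavage-site sequences. The relative order of the handle and targeting sequences is treated as a configurable assumption because it can differ between nucleases.

```python
from dataclasses import dataclass

RNA_BASES = frozenset("ACGU")


@dataclass(frozen=True)
class GuideRNA:
    targeting: str              # interacts with the target nucleic acid
    handle: str                 # facilitates binding of the gap editor complex
    handle_first: bool = False  # assumption: orientation depends on the nuclease

    def __post_init__(self):
        for name in ("targeting", "handle"):
            seq = getattr(self, name)
            if not seq or set(seq) - RNA_BASES:
                raise ValueError(f"{name} must be a non-empty RNA sequence (A, C, G, U)")

    def sequence(self) -> str:
        """Return the full guide RNA sequence, 5' to 3'."""
        if self.handle_first:
            return self.handle + self.targeting
        return self.targeting + self.handle


def fuse_guides(guides, cleavage_site: str) -> str:
    """Join guides into one transcript with a cleavage site between neighbors."""
    if not guides:
        raise ValueError("at least one guide RNA is required")
    return cleavage_site.join(g.sequence() for g in guides)


# Placeholder sequences only; not real handle or cleavage-site sequences.
guides = [GuideRNA("ACGUACGU", "GGGAAACCC"), GuideRNA("UUGGCCAA", "GGGAAACCC")]
transcript = fuse_guides(guides, cleavage_site="AUAUAU")
```

In this sketch the cleavage site stands in for any of the intervening elements named above (e.g., a ribozyme or a site recognized by an RNA-cleaving enzyme); processing of the transcript into individual guides is not modeled.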

As described further herein, genome modifications using the gap editors of the present disclosure can generate specific nucleotide modifications ranging from a single nucleotide change to large insertions or deletions. In some embodiments, the gap editor complexes of the present disclosure can be used to add, exchange, and/or remove large sequences of DNA through the use of more than one guide RNA sequence to target distinct sites in the genome. For example, large genomic deletions can be generated by removing the sequence between two gRNA target sites and/or inserting an exogenous DNA sequence (e.g., by virtue of the endogenous repair/recombination mechanisms in a cell or organism). In some embodiments, multiple gRNAs can be used to target multiple sites in a genome to generate any number of desired modifications in a genome (e.g., multiplexing).
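The outcome of targeting two distinct sites, with removal of the intervening sequence and optional insertion of an exogenous sequence, can be sketched as a simple string operation. The function replace_between_sites and the example sequences are hypothetical. The sketch assumes that both target sites lie on the same strand, that the first site precedes the second, and that the target sites themselves are retained; the actual boundaries of a modification depend on the cellular repair and recombination mechanisms and are not modeled here.

```python
def replace_between_sites(genome: str, site_a: str, site_b: str, insert: str = "") -> str:
    """Remove the sequence between two target sites, optionally inserting DNA.

    Simplified model: the first occurrence of site_a and the first occurrence
    of site_b downstream of it are kept, and everything between them is
    replaced by `insert` (empty by default, giving a deletion).
    """
    start = genome.find(site_a)
    if start == -1:
        raise ValueError("site_a not found in genome")
    left_end = start + len(site_a)
    right_start = genome.find(site_b, left_end)
    if right_start == -1:
        raise ValueError("site_b not found downstream of site_a")
    return genome[:left_end] + insert + genome[right_start:]


# Hypothetical sequence: two target sites flanking an 8-nucleotide region.
genome = "TTT" + "ACGTAC" + "GGGGGGGG" + "CATGCA" + "AAA"
deleted = replace_between_sites(genome, "ACGTAC", "CATGCA")
exchanged = replace_between_sites(genome, "ACGTAC", "CATGCA", insert="TTAA")
```

The same function covers deletion (empty insert) and exchange (non-empty insert), which mirrors the add, exchange, and remove operations described above.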

In some embodiments, guide RNA molecules are not required in the gap editor complexes of the present disclosure. For example, certain embodiments of the compositions and methods described herein do not require guide RNAs to effectuate efficient genome editing and modification. As described above, these gap editor complexes include, but are not limited to, meganucleases, zinc-fingers (ZFs), and transcription activator-like effectors (TALEs).

Donor Template. In some embodiments, the presence of a donor nucleic acid template facilitates homology-directed gap recombination and/or repair, which includes the donor nucleic acid template or a fragment thereof being recombined into the double-stranded target DNA molecule. In some embodiments, the donor nucleic acid template can serve as a replication template, resulting in the sequence encoded by the exogenous DNA or RNA being copied into the genome, but the exogenous DNA or RNA polynucleotide molecule itself is not directly transferred into the genome. The donor nucleic acid template can be single-stranded or double-stranded. In some embodiments, the donor template is a cDNA that has been reverse transcribed from an endogenous, expressed, synthetic, or delivered RNA. The donor nucleic acid may be delivered into a cell as a plasmid or as linear DNA. A donor nucleic acid may also be generated in vivo from a template ribonucleic acid by a reverse transcriptase. In other embodiments, the donor nucleic acid may itself be a ribonucleic acid. The donor nucleic acid can also contain chemical modifications, including chemical modifications or sequences that are specifically recruited to the gap editor complex or gap editor target site.

In some embodiments, the donor nucleic acid template comprises a polynucleotide from an endogenous homologous sequence corresponding to the DNA target sequence. In some embodiments, the donor nucleic acid template comprises a polynucleotide from an endogenous allele (e.g., to facilitate loss of heterozygosity). In some embodiments, the donor nucleic acid template comprises an exogenous single-stranded DNA (ssDNA) molecule or double-stranded DNA (dsDNA) molecule. In some embodiments, the presence of the donor nucleic acid template facilitates homology-directed gap repair and/or recombination, wherein the donor nucleic acid template or a fragment thereof is recombined into the genome of the DNA target sequence. In accordance with these embodiments, the gap editors of the present disclosure can be particularly advantageous for inserting large donor DNA sequences, replacing large segments of DNA, and/or removing large DNA sequences in a genome. In some embodiments, the gap editor complexes of the present disclosure can be used to add, exchange, and/or remove large sequences of DNA through the use of more than one guide RNA sequence to target distinct sites in the genome. For example, large genomic deletions can be generated by removing the sequence between two gRNA target sites and/or inserting an exogenous DNA sequence (e.g., by virtue of the endogenous repair/recombination mechanisms in a cell or organism). In some embodiments, multiple gRNAs can be used to target multiple sites in a genome to generate any number of desired modifications in a genome (e.g., multiplexing).

Accessory Factors. In some embodiments, the compositions and systems of the present disclosure further comprise at least one gap editor accessory factor. In some embodiments, the at least one gap editor accessory factor comprises a protein that augments at least one step in a genome modification process. In some embodiments, the at least one gap editor accessory factor is recruited to the gap editor complex via interaction with the DNA-modifying domain, the DNA-recognition domain, and/or the at least one guide RNA. In some embodiments, the recruitment of the at least one gap editor accessory factor to the gap editor complex comprises a peptide tag, a peptide linker, an RNA tag, and any combinations thereof. In some embodiments, the at least one gap editor accessory factor comprises Rap, DarG, Orf, ExoI, Exonuclease III, PrimPol, RecJ, RecQ1, Rad51, Rad52, CtIP, Rad18, and any combinations thereof. In some embodiments, and as described further herein, the present disclosure can include gap editor complexes in which the DNA-modifying domain comprises DarT. In accordance with these embodiments, DarG, TARG1, or another glycohydrolase domain can be included as a gap editor accessory factor to modulate off-target editing (e.g., by attenuating DarT activity) or to remove the added ADPr after HDGR occurs.

As would be recognized by one of ordinary skill in the art based on the present disclosure, methods for delivering gap editors and gap editor complexes into a cell include any currently known methods and systems for delivering polynucleotides and/or polypeptides/proteins. For example, gap editors and gap editor complexes can be delivered using plasmid DNA, ssDNA, RNA, or other means for delivering polynucleotide molecules, including but not limited to, lipid-based delivery systems (e.g., using cationic lipids), conjugation from a donor cell, viral/bacteriophage-based delivery systems, and chemical-based systems (e.g., calcium phosphate precipitation, DEAE-dextran, polybrene). In some embodiments, the delivery system can include mechanical and/or electrical devices and methods for delivering the gap editors and gap editor complexes of the present disclosure as polynucleotides and/or as polypeptides/proteins (or any combinations thereof). In some embodiments, gap editors and gap editor complexes are delivered using a gene gun (e.g., bombardment and Agrobacterium transformation as used for plant cells), and electroporation-based methods, as well as any other physical methods (e.g., mechanical, electrical, thermal, optical, chemical stimulation, and the like) that use membrane disruption as a means for delivering polynucleotides and polypeptides/proteins (see, e.g., Sun et al., Recent advances in micro/nanoscale intracellular delivery, Nanotechnology and Precision Engineering 3, 18 (2020)).

3. KITS, SYSTEMS, AND METHODS

Embodiments of the present disclosure also include kits and systems for targeted modification of a nucleic acid. In accordance with these embodiments, the kit includes a gap editor complex comprising a DNA-recognition domain and a DNA-modifying domain. In some embodiments, the kit also includes at least one guide RNA molecule. In some embodiments, the DNA-recognition domain binds a DNA target sequence in the genome, and the DNA-modifying domain induces formation of a replication blocking moiety on at least one nucleotide in the genome. As would be recognized by one of ordinary skill based on the present disclosure, the kits and systems can also include one or more of the other components of the gene modification compositions described herein (e.g., gap editor accessory factors). In some embodiments of the kit, the composition further comprises a donor nucleic acid template. In some embodiments of the kit, the presence of the donor nucleic acid template facilitates homology-directed gap repair and/or recombination.

In some embodiments of the kit, the DNA-recognition domain comprises at least one Cas protein or fragment thereof lacking deoxyribonuclease activity. In some embodiments of the kit, the DNA-recognition domain comprises at least one Cas protein or fragment thereof having nickase activity. In some embodiments, the Cas protein or Cas protein complex comprises a Type I Cascade, a Type II Cas9, a Type IV effector module, a Type V Cas12, a Cas9-related IscB, a Cas9-related TnpB, and combinations thereof.

In some embodiments of the kit, the DNA-recognition domain and the DNA-modifying domain are functionally coupled. In some embodiments of the kit, the DNA-recognition domain induces a single-stranded break in the DNA target strand, and the DNA-modifying domain adds the replication blocking moiety to at least one nucleotide in the DNA strand complementary to the DNA target sequence. In some embodiments of the kit, the DNA-modifying domain catalyzes addition of ADP ribose to a thymine or guanine nucleotide. In some embodiments, the DNA-modifying domain comprises a DarT enzyme or a functional fragment, derivative, or variant thereof. In some embodiments of the kit, the DarT enzyme has been engineered to have reduced DNA binding, increased specificity to single-stranded DNA, and/or decreased enzymatic activity.

In some embodiments of the kit, the DNA-modifying domain catalyzes addition of a replication blocking moiety selected from the group consisting of: glucose, threonyl carbamoyl adenosine, acetate, glyceryl, L-ascorbic acid, uridine, adenosine mono-phosphate, a lipid, an amino acid, agmatine, L-threonylcarbamoyladenylate, L-threonylcarbamoyl, methylthiolate, sulfur, a methyl group, S-adenosyl-L-methionine or a subgroup of S-adenosyl-L-methionine, and dimethylallyl diphosphate or a subgroup thereof. In some embodiments of the kit, the DNA-modifying enzyme domain comprises an enzyme or functional fragment, derivative, or variant thereof, selected from the group consisting of: Pierisin, Scabin, Cell cycle and apoptosis regulator 1 (CARP-1), SCO5461 protein (ScARP), adenine modification enzyme, acetyltransferase, amino acid transferase, nucleotidyl transferase, uridyltransferase, acyltransferase, ADP-ribosyltransferase, methylthiotransferase, N-acetyl transferase 10, tRNA(Met) cytidine acetyltransferase (TmcA), tRNA cytidine acetyltransferase, GCN5-related N-acetyltransferase, lysidine synthase, m⁷G methyltransferase, N6 carbamoylmethyltransferase (Mom), N6-adenosine threonylcarbamoyltransferase, threonyl carbamoyl transferase or threonyl carbamoyl transferase complex, TsaB-TsaE-TsaD (TsaBDE) complex, tRNA N6-adenosine threonylcarbamoyltransferase (Qri7, Tcs4), methyltransferase, ATrm5a, tRNA:m¹G/imG2 methyltransferase, tRNA (adenosine(37)-N6)-dimethylallyltransferase, tRNA dimethylallyltransferase (MiaA), and isopentenyltransferase.

In some embodiments of the kit, the at least one guide RNA comprises gRNA, sgRNA, crRNA, or any combinations thereof. In some embodiments of the kit, the at least one guide RNA comprises a handle sequence and a targeting sequence. In some embodiments of the kit, the targeting sequence in the at least one guide RNA is complementary to the DNA target sequence. In some embodiments, the gap editor complexes of the present disclosure can be used to add, exchange, and/or remove large sequences of DNA through the use of more than one guide RNA sequence to target distinct sites in the genome. For example, large genomic deletions can be generated by removing the sequence between two gRNA target sites and/or inserting an exogenous DNA sequence (e.g., by virtue of the endogenous repair/recombination mechanisms in a cell or organism). In some embodiments, multiple gRNAs can be used to target multiple sites in a genome to generate any number of desired modifications in a genome (e.g., multiplexing).

Embodiments of the present disclosure also include methods for targeted modification of a nucleic acid. In accordance with these embodiments, the methods include introducing any of the components of the genome modification compositions described herein, and assessing the cell for presence of a desired genetic alteration using techniques known in the art. In some embodiments of the method, the components include gap editors and gap editor complexes comprising a DNA-recognition domain and a DNA-modifying domain, at least one guide RNA molecule, and a donor nucleic acid template. In some embodiments, one or more gap editor accessory factors can also be included. One or more of these factors can be introduced into a cell or organism as a polypeptide(s), mRNA(s), and/or DNA expression construct(s), or any combination thereof, by means known in the art. As would be recognized by one of ordinary skill in the art based on the present disclosure, the gap editor compositions, systems, and methods can be used to facilitate the modification of whole organisms, including but not limited to, humans, plants, livestock, and the like.

In some embodiments of the method, at least one of these components is introduced into the cell as part of a gene drive system. In a gene drive system, all or some of the genome modification components, such as the DNA-recognition domain, DNA-modifying domain, gRNA, and accessory factors, are encoded within the donor nucleic acid sequence present in one copy of a chromosome. The gRNA directs the DNA-modifying domain to the sister chromosome in the region where the donor nucleic acid sequence would reside. Upon targeting by the gap editor proteins or complexes, the donor nucleic acid (which also encodes the gap editor system) is copied over to a new chromosome. Thus, the gap editor system becomes self-propagating, efficiently forming homozygously edited organisms. Example organisms in which gene drives can be implemented include fungi, flatworms, mosquitos, and mice.

In some embodiments, the compositions, systems, and methods of the present disclosure include one or more components that enhance or improve one or more aspects of gene modification. In some embodiments, improving or enhancing one or more aspects of genome modification includes the use of a gap editor accessory factor(s), as described above. In some embodiments, methods that enhance or improve one or more aspects of genome modification include reducing or attenuating nuclease activity in a cell in which genome modification is desired. Reducing nuclease activity in a cell can lead to enhanced or improved modification frequency and/or efficiency. In some embodiments, reducing nuclease activity in a cell includes reducing activity of an endogenous AP endonuclease (e.g., encoded by xthA) by any means known in the art. In some embodiments, nuclease activity in a cell can be reduced via genetic means and/or by pharmacological means (e.g., treatment with endonuclease inhibitors including but not limited to AJAY-4, CRT0044876, aurintricarboxylic acid, 6-hydroxy-DL-DOPA, Reactive Blue 2, myricetin, mitoxantrone, methyl-3,4-dephostatin, thiolactomycin, and (2E)-3-[5-(2,3-dimethoxy-6-methyl-1,4-benzoquinoyl)]-2-nonyl-2-propenoic acid (E3330)).

Embodiments of the compositions, systems, and methods provided herein can be used to edit the genome of a cell. The cell can be a prokaryotic cell, a eukaryotic cell, or a plant cell. In some embodiments, the cell is a mammalian cell. The present disclosure also provides an isolated cell comprising any of the components or systems described herein. Exemplary cells can include those that can be easily and reliably grown, have reasonably fast growth rates, have well characterized expression systems, and can be transformed or transfected easily and efficiently. Examples of suitable prokaryotic cells include, but are not limited to, cells from the genera Bacillus (such as Bacillus subtilis and Bacillus brevis), Clostridia (such as Clostridium difficile or Clostridium autoethanogenum), Escherichia (such as E. coli), Lactobacilli, Klebsiella, Myxobacteria, Pseudomonas, Streptomyces, Salmonella, Vibrio (such as Vibrio cholerae or Vibrio nutrifaciens) and Erwinia. Suitable eukaryotic cells are known in the art and include, for example, yeast cells, insect cells, and mammalian cells. Examples of suitable yeast cells include those from the genera Kluyveromyces, Pichia, Rhinosporidium, Saccharomyces, and Schizosaccharomyces. Exemplary insect cells include Sf-9 and HI5 (Invitrogen, Carlsbad, Calif.) and are described in, for example, Kitts et al., Biotechniques, 14: 810-817 (1993); Lucklow, Curr. Opin. Biotechnol., 4: 564-572 (1993); and Lucklow et al., J. Virol., 67: 4566-4579 (1993).

In some embodiments, the compositions and methods of the present disclosure can be employed to induce DNA modification and/or transcriptional modulation in mitotic or post-mitotic cells in vivo and/or ex vivo and/or in vitro (e.g., to produce genetically modified cells that can be reintroduced into an individual). Because the gap editors of the present disclosure include site-specific DNA-targeting, a mitotic and/or post-mitotic cell-of-interest can include a cell from any organism (e.g., a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like, a fungal cell (e.g., a yeast cell), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal, a cell from a rodent, a cell from a human, etc.). Any type of cell may be of interest (e.g., a stem cell, e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.). Cells may be from established cell lines or they may be primary cells, where “primary cells”, “primary cell lines”, and “primary cultures” are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages of the culture. Target cells can include any unicellular organisms, multicellular organisms, or any cells grown in culture.

In some embodiments, the cell can also be a cell that is used for therapeutic purposes. The cell can be a mammalian cell, and in some embodiments, the cell is a human cell. A number of suitable mammalian and human cells are known in the art, and many are available from the American Type Culture Collection (ATCC, Manassas, Va.). Examples of suitable mammalian cells include, but are not limited to, Chinese hamster ovary cells (CHO) (ATCC No. CCL61), CHO DHFR- cells (Urlaub et al., Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)), human embryonic kidney (HEK) 293 or 293T cells (ATCC No. CRL1573), and 3T3 cells (ATCC No. CCL92). Other suitable mammalian cell lines are the monkey COS-1 (ATCC No. CRL1650) and COS-7 cell lines (ATCC No. CRL1651), as well as the CV-1 cell line (ATCC No. CCL70). Further exemplary mammalian cells include primate, rodent, and human cell lines, including transformed cell lines. Normal diploid cells, cell strains derived from in vitro culture of primary tissue, as well as primary explants, are also suitable. Other suitable mammalian cell lines include, but are not limited to, mouse neuroblastoma N2A cells, HeLa, HEK, A549, HepG2, mouse L-929 cells, and BHK or HaK hamster cell lines. Methods for selecting suitable cells and methods for transformation, culture, amplification, screening, and purification of cells are known in the art. Examples of suitable plant cell lines are derived from plants such as Arabidopsis (such as the Landsberg erecta cell line), sugarcane, tomato, pea, rice, wheat, and tobacco (such as the BY-2 cell line).

In accordance with the above-described embodiments of the methods, the compositions and systems of the present disclosure can be used to edit a genome of a cell in a manner that reduces the degree of indel formation, chromosomal rearrangements, or DNA duplications. In some embodiments, the compositions, systems, and methods described herein reduce cell toxicity as compared to currently available methods, at least in part due to the lack of double-stranded breaks in the target nucleic acid.

4. MATERIALS AND METHODS

Measurement of gap editing in E. coli by a colorimetric assay was performed by co-transforming the DNA-modifying domain fused to a DNA-binding domain such as Cas9 (e.g., DarT-ScdCas9), an sgRNA, and a nucleic acid donor into E. coli by electroporation, followed by plating on LB agar plus the appropriate antibiotic(s). The resulting colonies were picked and inoculated into 750 µL of liquid LB media in a deep well plate and grown overnight for 12 to 16 hours with shaking at 900 rpm and 37° C. Gap editor expression was induced by diluting the overnight culture 1:500 into 750 µL of liquid LB media with antibiotics, 1 mM IPTG, and 33 mM arabinose, with shaking at 900 rpm for 8 hours. After 8 hours, samples were removed for spot plating on LB agar with antibiotics, IPTG, and X-gal. The next day, white and blue colonies were counted to determine the frequency of lacZ recombination and repair. Repair was confirmed by Sanger sequencing.

Measurement of gap editing in E. coli by antibiotic resistance assays was performed by co-transforming a DNA-modifying domain fused to a DNA-binding domain such as Cas9 or Cas12a, together with an sgRNA and a nucleic acid donor, by electroporation. The transformation mixture was plated on LB agar plus the appropriate antibiotics. The resulting colonies were picked and inoculated into 750 µL of liquid LB media in a deep well plate and grown overnight for 12 to 16 hours with shaking at 900 rpm and 30° C. Gap editor cultures were first back-diluted 1:100 into liquid LB with antibiotics and shaken at 37° C. for 1 hour. Gap editor expression was then induced by further diluting this culture 1:100 into 750 µL of liquid LB media with antibiotics and 33 mM arabinose, with shaking at 900 rpm for 5 hours. After 5 hours of induction, samples were removed for spot plating on two separate LB agar plates. One plate contained antibiotics to select only for the gap editor, sgRNA, and repair template (typically chloramphenicol and ampicillin), and the other plate also included either rifampicin or kanamycin to select for edited cells. The next day, colonies were counted. Genome editing efficiency was calculated as the number of colonies on the plates with rifampicin or kanamycin divided by the number of colonies on the plates without rifampicin or kanamycin.
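The efficiency calculation described above can be sketched in Python as follows. This is a minimal illustration only: the function name `editing_efficiency` and the optional dilution-factor parameters are assumptions introduced here (the protocol above states only the ratio of colony counts), and the colony counts in the example are made up.

```python
def editing_efficiency(
    selective_colonies: int,
    nonselective_colonies: int,
    selective_dilution: float = 1.0,      # assumed: fold-dilution of the spot on the selective plate
    nonselective_dilution: float = 1.0,   # assumed: fold-dilution of the spot on the non-selective plate
) -> float:
    """Colonies on the selective plate divided by colonies on the non-selective plate.

    When the two spots were counted at different dilutions, each count is
    first scaled by its fold-dilution so the ratio compares like with like.
    """
    if selective_colonies < 0 or nonselective_colonies <= 0:
        raise ValueError("need a non-negative selective count and a positive non-selective count")
    edited = selective_colonies * selective_dilution
    total = nonselective_colonies * nonselective_dilution
    return edited / total


# Made-up counts: same dilution on both plates.
same_dilution = editing_efficiency(15, 60)
# Made-up counts: selective plate counted at 10-fold, non-selective at 10,000-fold dilution.
mixed_dilution = editing_efficiency(12, 30, selective_dilution=10, nonselective_dilution=10_000)
```

With equal dilutions the function reduces to the plain ratio of colony counts given in the protocol.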

The measurement of gap editor toxicity in FIG. 7 was performed by transforming DarT-ScdCas9 gap editors into an E. coli strain lacking recA, a key factor in homologous recombination. These bacteria lack the capability for lesion bypass by homologous recombination and are thus highly sensitive to replication blocking lesions on the DNA. DNA modification domains are therefore expected to be especially toxic in these strains unless their latent DNA binding activity is contained. In this fashion, gap editor complexes can be more easily assessed for undesirable off-target DNA modification. After transforming and plating, single colonies were selected and inoculated into 750 µL of LB Chloramphenicol in a deep well plate and shaken at 37° C. overnight. The next day, cultures were back-diluted 1:500 into LB Chloramphenicol with glucose to maintain gap editor repression, or with arabinose to induce expression of the gap editor. Cultures were incubated with shaking at 900 rpm in a deep well plate at 37° C. for 5 hours. Cultures were then spot plated on LB Chloramphenicol. The next day, colonies were counted to assess the final cell density, and therefore the rate of off-target DNA modification.
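One way to turn the colony counts from this toxicity assay into a comparable number is sketched below in Python. The helper names `cfu_per_ml` and `relative_viability`, the spot volume, the dilution factors, and the colony counts are all hypothetical; the protocol above states only that colonies were counted to assess final cell density under repressing (glucose) and inducing (arabinose) conditions.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, spot_volume_ul: float) -> float:
    """Estimate colony-forming units per mL from one counted spot.

    dilution_factor is the fold-dilution of the culture before spotting
    (assumed parameter); spot_volume_ul is the spotted volume in microliters.
    """
    if spot_volume_ul <= 0 or dilution_factor <= 0:
        raise ValueError("dilution factor and spot volume must be positive")
    return colonies * dilution_factor / (spot_volume_ul / 1000.0)


def relative_viability(induced_cfu: float, repressed_cfu: float) -> float:
    """Final cell density with arabinose relative to the glucose-repressed culture."""
    if repressed_cfu <= 0:
        raise ValueError("repressed culture density must be positive")
    return induced_cfu / repressed_cfu


# Made-up example values.
repressed = cfu_per_ml(colonies=20, dilution_factor=1e4, spot_volume_ul=5)
induced = cfu_per_ml(colonies=8, dilution_factor=1e2, spot_volume_ul=5)
ratio = relative_viability(induced, repressed)
```

Under this sketch, a ratio near 1 would indicate little loss of viability on induction, while a ratio well below 1 would be consistent with toxicity from off-target DNA modification.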

Measurement of ssDNA-templated gap editing in E. coli by rifampicin resistance was performed by first co-transforming the strand annealing beta recombinase plasmid and a DNA-modifying domain fused to a DNA-binding domain such as Cas9. The resulting clones were inoculated into LB with antibiotics and anhydrotetracycline for induction of beta recombinase expression. These cultures were prepared for electroporation, transformed with the sgRNA plasmid, and cultured for 3 hours in a rich medium at 37° C. with shaking at 250 rpm prior to spot plating on two separate LB agar plates. One plate contained antibiotics to select only for the gap editor, sgRNA, and recombinase. The other plate additionally included rifampicin to select for edited cells. The next day, colonies were counted. Genome editing efficiency was calculated as the number of colonies on the plates with rifampicin divided by the number of colonies on the plates without rifampicin.

TABLE 2

Strain information corresponding to gap editors and gap editor complexes used in the present disclosure.

| DNA or Strain Name | Composition | Function | Appears in |
|---|---|---|---|
| SPC1879 or dTd-ScdC9 | darT G49D-ScdCas9 pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR | FIG. 1 |
| SPC1881 or GE2 | darT G49D_K56A-ScdCas9 pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding | FIGS. 1-3 |
| SPC1883 or dTd-ScnC9 | darT G49D-ScnCas9 pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR | FIG. 9 |
| SPC1884 or GE2n | darT G49D_K56A-ScnCas9 pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, with target strand nicking | FIG. 16 |
| SPC1466 | lacZ_sg705- araF_pCON ΔaraBAD | E. coli with defective lacZ gene | FIGS. 1-3 |
| SPC1911 | ScdCas9 pBAD araC CmR p15a | DNA binding only | FIG. 1 |
| SPC1912 | ScnCas9 pBAD araC CmR p15a | Nicking of target strand | FIG. 2 |
| SPC1901 | darT_G49D_K56A-ScdCas9-darG pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, with full length DarT inhibitor, DarG | FIG. 3 |
| SPC1902 | darT_G49D_K56A-ScdCas9-darG_Cterminal pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, with C terminal domain of DarT inhibitor, DarG | FIG. 3 |
| SPC1903 | darT_G49D_K56A-ScdCas9-darG_Nterminal pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, with N terminal domain of DarT inhibitor, DarG | FIG. 3 |
| SPC1904 | darT_G49D_K56A-ScnCas9-darG pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, with target strand nicking, with full length DarT inhibitor, DarG | FIG. 3 |
| SPC1905 | darT_G49D_K56A-ScnCas9-darG_Cterminal pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, with target strand nicking, with C terminal domain of DarT inhibitor, DarG | FIG. 3 |
| SPC1906 | darT_G49D_K56A-ScnCas9-darG_Nterminal pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, with target strand nicking, with N terminal domain of DarT inhibitor, DarG | FIG. 3 |
| SPC2503 | Scabin-K130A-ScdCas9 | Site specific replication block (adenosine di-phosphate ribose) transfer onto guanine, induction of HDGR, nuclease-inactive Cas9 | FIG. 4 |
| SPC2548 | Scabin-K130A-E160A-ScdCas9 | Catalytically inactive scabin fused to nuclease inactive Cas9 to serve as a negative control | FIG. 4 |
| SPC2488 | Non-targeting sgRNA SS2 KanR HRT L2/RE AmpR ColE1 | Negative control, non-targeting guide RNA. Includes repair template for kanamycin resistance gene repair, but lacks a guide RNA directing the gap editor to the correct genomic location. | FIGS. 4, 5, 6, 8, 9 |
| SPC2480 | Scabin stop sgRNA SS2 KanR HRT L2/RE AmpR ColE1 | Guide RNA directing the gap editor complex to the target site for scabin gap editor-directed kanamycin gene repair. Includes repair template for kanamycin gene restoration. For use with strain SPC2496. | FIG. 4 |
| SPC2496 | KanR_mut Scabin stop lead_first::SS2 araF_pCON ΔaraBAD ΔlacZ_519 | A mutated kanamycin resistance gene inserted into the E. coli genome with a site for targeting by a scabin gap editor. Targeting this site will trigger HDGR and confer resistance to kanamycin. | FIG. 4 |
| SPC2642 | MOM-D149A-ScdCas9 | Site specific replication block (carbamoyl group) transfer onto adenine, induction of HDGR, nuclease-inactive Cas9 | FIG. 5 |
| SPC2490 | Mom sgRNA SS2 KanR HRT L2/RE AmpR ColE1 | Guide RNA directing the gap editor complex to the target site for mom gap editor-directed kanamycin gene repair. Includes repair template for kanamycin gene restoration. For use with strain SPC2514. | FIG. 5 |
| SPC2514 | KanR_mut mom stop lead_first::SS2 araF_pCON ΔaraBAD ΔlacZ_519 | A mutated kanamycin resistance gene inserted into the E. coli genome with a site for targeting by a mom gap editor. Targeting this site will trigger HDGR and confer resistance to kanamycin. | FIG. 5 |
| SPC2495 | KanR_mut DarT stop lead_first::SS2 araF_pCON ΔaraBAD ΔlacZ_519 | A mutated kanamycin resistance gene inserted into the E. coli genome with a site for targeting by a DarT gap editor. Targeting this site will trigger HDGR and confer resistance to kanamycin. | FIGS. 6, 8, 9 |
| SPC1134 | MG1655 ΔrecA | An E. coli strain defective for the homologous recombination factor recA. Sensitizes E. coli to off-target DNA modifications. Allows for easier measurement of off-target DNA modifications. | FIG. 7 |
| SPC2716 | DarT-G49D-R193A-ScdCas9 | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, nuclease-inactive Cas9. | FIGS. 7, 8, 9 |
| SPC2690 | DarT-G49D-M86L-R92A-R193A-ScdCas9 | Site specific replication block onto thymine, induction of HDGR, with further reduced DarT DNA binding, nuclease-inactive Cas9. | FIG. 8 |
| SPC2189 | DarT_G49D_R193A-ScnCas9 pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, nicking Cas9. | FIG. 9 |
| SPC2530 | DarT_G49D_R193A-ScnCas9 huOpt pGAL Leu CEN AmpR | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, nicking Cas9. Yeast expression. | FIG. 10 |
| SPC2525 | ScnCas9 D10A huOpt pGAL Leu CEN AmpR | Cas9 nickase, yeast expression. | FIG. 10 |
| SPC2435 | FCY1 KO HRT sgRNA 5 pSNR52 sgRNA TRP1 2 micron LS/R1 AmpR | Guide RNA directing the DarT gap editor complex to a genomic site in the fcy1 gene. Includes a repair template encoding stop codons to edit and disrupt the translation of fcy1, resulting in 5-FC resistance and colony growth. | FIG. 10 |
| SPC2467 | FCY1 KO HRT Non-Targeting sgRNA TRP1 2 micron LS/R1 | Negative control, non-targeting guide RNA. Includes a repair template for disruption of the fcy1 gene, but lacks the guide RNA directing the gap editor to the correct genomic site. | FIG. 10 |
| SPC2629 | FCY1 US1 KO HRT sgRNA 5 pSNR52 sgRNA TRP1 2 micron LS/R1 | Guide RNA directing the DarT gap editor complex to a genomic site in the fcy1 gene. Includes a repair template encoding stop codons to edit and disrupt the translation of fcy1, resulting in 5-FC resistance and colony growth. | FIG. 10 |
| SPC2631 | FCY1 DS1 KO HRT sgRNA 5 pSNR52 sgRNA TRP1 2 micron LS/R1 | Guide RNA directing the DarT gap editor complex to a genomic site in the fcy1 gene. Includes a repair template encoding stop codons to edit and disrupt the translation of fcy1, resulting in 5-FC resistance and colony growth. | FIGS. 10, 11 |
| SPC2635 | FCY1 US2 KO HRT Non-Targeting sgRNA TRP1 2 micron LS/R1 | Guide RNA directing the DarT gap editor complex to a genomic site in the fcy1 gene. Includes a repair template encoding stop codons to edit and disrupt the translation of fcy1, resulting in 5-FC resistance and colony growth. | FIG. 10 |
| SPC2637 | FCY1 DS2 KO HRT Non-Targeting sgRNA TRP1 2 micron LS/R1 | Guide RNA directing the DarT gap editor complex to a genomic site in the fcy1 gene. Includes a repair template encoding stop codons to edit and disrupt the translation of fcy1, resulting in 5-FC resistance and colony growth. | FIG. 10 |
| SPC2722 | DarT_G49D_R193A_M86L_R92A-ScnCas9 huOpt pGAL Leu CEN AmpR | Site specific replication block onto thymine, induction of HDGR, with further reduced DarT DNA binding, nicking Cas9. Yeast expression. | FIG. 11 |
| SPC2777 | DarT_G49D_R193A-dLbCas12a pBAD CmR p15a | Site specific replication block onto thymine, induction of HDGR, with reduced DarT DNA binding, nuclease-inactive Cas12a fusion. | FIG. 13 |
| SPC2795 | LbCas12a Non-targeting crRNA mut short lacZ HRT AmpR ColE1 | Negative control, non-targeting gRNA with lacZ repair template encoding a stop codon. | FIG. 13 |
| SPC2796 | LbCas12a crRNA 1 mut short lacZ HRT AmpR ColE1 | gRNA directing LbCas12a gap editor complex to lacZ gene and repair template encoding a stop codon as a genome editing template. | FIG. 13 |
| SPC2797 | LbCas12a crRNA 2 mut short lacZ HRT AmpR ColE1 | gRNA directing LbCas12a gap editor complex to lacZ gene and repair template encoding a stop codon as a genome editing template. | FIG. 13 |
| SPC2798 | LbCas12a crRNA 3 mut short lacZ HRT AmpR ColE1 | gRNA directing LbCas12a gap editor complex to lacZ gene and repair template encoding a stop codon as a genome editing template. | FIG. 13 |
| SPC2799 | LbCas12a crRNA 4 mut short lacZ HRT AmpR ColE1 | gRNA directing LbCas12a gap editor complex to lacZ gene and repair template encoding a stop codon as a genome editing template. | FIG. 13 |
| SPC2800 | LbCas12a crRNA 5 mut short lacZ HRT AmpR ColE1 | gRNA directing LbCas12a gap editor complex to lacZ gene and repair template encoding a stop codon as a genome editing template. | FIG. 13 |
| SPC2801 | LbCas12a crRNA 6 mut short lacZ HRT AmpR ColE1 | gRNA directing LbCas12a gap editor complex to lacZ gene and repair template encoding a stop codon as a genome editing template. | FIG. 13 |
| SPC2802 | LbCas12a crRNA 7 mut short lacZ HRT AmpR ColE1 | gRNA directing LbCas12a gap editor complex to lacZ gene and repair template encoding a stop codon as a genome editing template. | FIG. 13 |
| SPC1895 | DarT_G49D-ScnCas9 Ec86 RT pBAD araC CmR p15a | Site specific replication block onto thymine, induction of HDGR, fusion with nicking Cas9. Co-expression of Ec86 reverse transcriptase for use of RNA repair templates. | FIG. 15 |
| SPC2132 | rpoB GE2n retron FWD ld1 D516 sgRNA AmpR ColE1 | Guide RNA targeting the DarT gap editor complex to the rpoB gene at residue D516 for genome editing and rifampicin resistance. Includes an RNA repair template with flanking sequences for reverse transcription by Ec86 reverse transcriptase. | FIG. 15 |
| SPC2133 | Non-Targeting DarT D516 rpoB retron FWD sgRNA AmpR ColE1 | Negative control for D516 rpoB editing with RNA repair template. Includes RNA repair template expression, but lacks a guide RNA targeting the DarT gap editor complex to the rpoB gene. | FIG. 16 |
| SPC2095 | rpoB ld1 sgRNA AmpR ColE1 | Guide RNA targeting rpoB gene at residue D516 for genome editing and rifampicin resistance | FIG. 16 |
| SPC2026 | lambda beta pTet 4.6k TIR tetR kanR sc 101 | Beta recombinase under an anhydrotetracycline inducible promoter. Used for gap editing using ssDNA and RNA templates. | FIGS. 15, 16 |

5. EXAMPLES

It will be readily apparent to those skilled in the art that other suitable modifications and adaptations of the methods of the present disclosure described herein are readily applicable and appreciable, and may be made using suitable equivalents without departing from the scope of the present disclosure or the aspects and embodiments disclosed herein. Having now described the present disclosure in detail, the same will be more clearly understood by reference to the following examples, which are intended only to illustrate some aspects and embodiments of the disclosure, and should not be viewed as limiting the scope of the disclosure. The disclosures of all journal references, U.S. patents, and publications referred to herein are hereby incorporated by reference in their entireties.

The present disclosure has multiple aspects, illustrated by the following non-limiting examples.

Example 1

Experiments were conducted to assess the efficiency and toxicity of the gap editor complexes of the present disclosure. In one set of experiments, the DarT enzyme from E. coli EPEC with the attenuating mutation G49D was fused, via a long flexible linker, to the N-terminus of a fully or partially catalytically-dead version of ScCas9 (ScdCas9, or ScCas9 D10A, also known as ScnCas9). It was hypothesized that if chemical modifications occurred, they would be made to the non-target strand exposed by ScdCas9 binding to its DNA target. Previous work indicated that DarT modifies thymine within a sequence motif possibly as wide as TYTN. Accordingly, genome editing in E. coli was assessed using these gap editor complexes.

The DarT-ScdCas9 fusion protein (gap editor complex) was targeted to four sites containing an NGG or NAG PAM and a TTTC motif on the non-target strand. The four sites surrounded a premature stop codon in the lacZ gene, which was the desired site of genome modification. The targets were chosen such that if a replication blocking lesion was introduced, a DNA gap would form that overlapped the premature stop codon. The four sites included two lagging strand targets and two leading strand targets. A plasmid encoding an arabinose inducible DarT-ScdCas9 was co-transformed with a plasmid containing a 1.5 kb repair template encoding mutations to block ScdCas9 re-targeting while repairing the lacZ stop codon. After culturing these colonies overnight, the cells were back-diluted into inducing medium, cultured for 8 hours, and then plated onto selective media with the β-galactosidase (lacZ gene product) indicator dye X-gal with the inducer IPTG.
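The target selection criteria described above (an NGG or NAG PAM, and a TTTC motif on the non-target strand) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions and not the procedure used in the experiments: the 20-nt protospacer length, the choice to accept the motif anywhere within the protospacer, and the function names are all hypothetical. The PAM-containing strand is treated as the non-target strand.

```python
# Illustrative sketch only; spacer length and motif placement are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_sites(seq: str, motif: str = "TTTC", spacer_len: int = 20):
    """Scan both strands for protospacers followed by an NGG or NAG PAM.

    A site is reported only if the protospacer (read on the PAM-containing,
    non-target strand) contains the motif. Returned tuples are
    (strand, start, protospacer, pam), where start is a 0-based index into
    the scanned strand ("+" is the input, "-" is its reverse complement).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The last valid start leaves room for the protospacer plus a 3-nt PAM.
        for i in range(len(s) - spacer_len - 2):
            protospacer = s[i:i + spacer_len]
            pam = s[i + spacer_len:i + spacer_len + 3]
            pam_ok = pam[1] in "GA" and pam[2] == "G"  # NGG or NAG
            if pam_ok and motif in protospacer:
                hits.append((strand, i, protospacer, pam))
    return hits
```

A call such as `find_candidate_sites(locus_sequence)` on a hypothetical `locus_sequence` string returns candidate sites on both strands; classifying each as a leading or lagging strand target would additionally require the direction of replication, which this sketch does not model.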

When targeting only one site, the lacZ gene was efficiently repaired, as demonstrated by the results in FIG. 1. However, targeting this site resulted in a 10-fold drop in CFUs compared to the non-targeting condition, and a 50-fold drop in CFUs compared to the ScdCas9 control. This observed cytotoxicity could be due to ScdCas9-independent binding of DarT to ssDNA, which would introduce widespread DNA replication blocks. It was hypothesized that attenuating DNA binding within DarT could make DarT more dependent on ScdCas9 for DNA binding. Computational prediction tools were used to identify potential DNA binding sites. To improve prediction accuracy, a set of DarT homologs with some sequence divergence was identified, and DNA binding sites were predicted for all of these homologs. By aligning the proteins and the DNA binding predictions, some DNA binding site predictions were found to be conserved across these DarT homologs. Based on this, alanine mutations were installed at these predicted sites. In one example, a K56A mutation substantially reduced the cytotoxic effects of DarT-ScdCas9, while maintaining efficient genome modification activity (FIG. 1). This new DarT-ScdCas9 fusion protein was referred to as gap editor 2 (GE2).
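The idea of keeping only those DNA binding site predictions that are conserved across aligned homologs can be sketched in Python as follows. The disclosure does not name the prediction tools or the alignment format, so the inputs here (gapped alignment strings and per-homolog sets of predicted residue numbers), the `min_fraction` threshold, and all function names are assumptions made purely for illustration.

```python
# Illustrative sketch only; input formats and names are assumptions.
def column_to_residue(aligned: str):
    """Map each alignment column to a 1-based residue number (None at gaps)."""
    mapping = []
    pos = 0
    for ch in aligned:
        if ch == "-":
            mapping.append(None)
        else:
            pos += 1
            mapping.append(pos)
    return mapping


def conserved_binding_positions(alignment, predictions, reference,
                                min_fraction=1.0):
    """Return (residue_number, residue) pairs in the reference protein whose
    alignment column is predicted to bind DNA in at least min_fraction of
    the aligned homologs.

    alignment:   dict of name -> gapped sequence (all of equal length)
    predictions: dict of name -> set of 1-based predicted binding residues
    """
    maps = {name: column_to_residue(seq) for name, seq in alignment.items()}
    conserved = []
    for col, ref_pos in enumerate(maps[reference]):
        if ref_pos is None:
            continue  # the reference has a gap in this column
        votes = sum(
            1 for name in alignment
            if maps[name][col] is not None
            and maps[name][col] in predictions[name]
        )
        if votes / len(alignment) >= min_fraction:
            conserved.append((ref_pos, alignment[reference][col]))
    return conserved
```

Residues returned by such a filter would be candidates for alanine substitution; whether a given substitution reduces toxicity while retaining activity still has to be determined experimentally, as was done for K56A.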

Example 2

Because a single replication block was being introduced into the DNA, it was expected that the dominant repair template would be the sister chromatid and not an ectopic repair template. Previous work has demonstrated that targeting two sites on either side of a DNA sequence-of-interest can boost genome modification, possibly by creating overlapping DNA gaps and interfering with sister chromatid repair. Therefore, it was hypothesized that the combination of DNA nicking and DNA modification/gap formation might similarly prevent sister chromatid repair, leaving the plasmid repair template as the preferred template for repair.

Cas9 nicking can drive low rates of genome editing in prokaryotes and eukaryotes. These nicks form single-ended double-strand breaks (seDSB) when encountered by the replisome. This typically involves replisome dissociation. These single-ended breaks are repaired by homologous recombination, most frequently with the sister chromatid. Importantly, in eukaryotic cells, Cas9 nicking can generate precise edits while minimizing indels presumably caused by non-homologous end-joining (NHEJ) machinery. There is no natural end joining partner at seDSBs, so NHEJ is inhibited at these breaks.

In accordance with the embodiments of the present disclosure, it was hypothesized that an overlapping DNA gap and seDSB could mutually exclude sister chromatid repair (e.g., exert synergistic effects). Where the seDSB end would typically look for homology on the sister chromatid, there would instead be a ssDNA gap. Similarly, where the DNA gap would typically find a homologous DNA template, there would be a seDSB, possibly resected to ssDNA. Therefore, the H848A mutation in ScdCas9 was reverted to re-activate nicking, creating the target-strand nickase ScnCas9.

This nicking DarT-ScnCas9 fusion was tested in the lacZ repair assay described above using the most efficient target. As shown in FIG. 2, the nickase alone produced low levels of gene repair and a substantial drop in CFUs when expressed with the targeting sgRNA. DarT-ScdCas9 and the engineered DarT_K56A-ScdCas9 (GE2) produced modest levels of gene repair. After reactivating the nicking capacity, DarT-ScnCas9 proved to be cytotoxic, but DarT_K56A-ScnCas9 did not exhibit cytotoxicity and successfully edited nearly 80% of cells after 8 hours of induction. This nicking version of GE2 was referred to as GE2n.

Experiments were also conducted to investigate the use of DarT's antitoxin partner, DarG, to determine whether it would eliminate the genome modification capacity of GE2. The N-terminal domain of DarG contains a glycohydrolase which can directly repair ADPr modified thymine. The C-terminal domain of DarG contains a DarT inhibitor. GE2 and GE2n were each co-expressed with full length DarG, the C-terminal domain of DarG, or the N-terminal domain of DarG in an operon in the lacZ gene repair assay (FIG. 3). As shown in FIG. 3, GE2 and GE2n genome modification capacity was attenuated when both the N-terminal and C-terminal domains of DarG were expressed. This provides a means to mitigate potential off-target modification effects and toxicity without compromising on-target modification.

Additionally, as would be recognized by one of ordinary skill in the art based on the present disclosure, either the N-terminal or C-terminal domain of DarG can be used to counteract DarT activity. The N-terminal domain can remove ADP ribose, reverting the nucleotide to its original state. The C-terminal domain can directly inhibit DarT activity. Thus, single domains of DarG can be expressed at a low level, and in some cases, randomly distributed through the cell, to help counteract off-target effects of the DarT-Cas protein. In some embodiments, a single DarG domain can be used to reduce off-target effects without affecting on-target genome modification activity.

Example 3

Experiments were conducted to test the ability of a gap editing complex comprising a Scabin DNA-modifying domain in combination with a Cas9 DNA-recognition domain (Scabin-K130A-ScdCas9) to induce successful genome modification, measured based on the frequency of kanamycin gene repair in E. coli. In this exemplary set of experiments, expression of a Scabin-dCas9 fusion protein increased the frequency of kanamycin gene repair dependent on Scabin's DNA modification catalytic activity. Scabin is known to modify guanine within single and double-stranded DNA with an adenosine diphosphate ribose group, but it is structurally and evolutionarily divergent from DarT outside of a single shared catalytic motif. Recombination between the plasmid repair template and the targeted defective kanamycin gene in the E. coli genome results in repair of the targeted gene, and consequently, kanamycin resistance. Therefore, the fraction of kanamycin resistance serves as a readout for the rate of genome modification. The K130A mutation in Scabin attenuated Scabin's activity, which is otherwise toxic to the cells. The E160A mutation catalytically inactivates Scabin, removing all DNA modification activity (negative control). As shown in FIG. 4, the Scabin-K130A-ScdCas9 gap editor complex resulted in successful genome modification through increased frequency of kanamycin gene repair.
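Because the fraction of kanamycin-resistant cells is used as the readout, the underlying arithmetic can be written out as a short Python sketch. The plating scheme assumed here (colony counts converted to CFU/mL from a dilution factor and a plated volume, with selective and non-selective plates compared) and the function names are illustrative assumptions, not details taken from the experiments.

```python
# Illustrative sketch only; the plating scheme and names are assumptions.
def cfu_per_ml(colonies: int, dilution_factor: float,
               plated_volume_ml: float) -> float:
    """Convert a colony count on one plate to CFU per mL of culture."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def modification_frequency(resistant_cfu_per_ml: float,
                           total_cfu_per_ml: float) -> float:
    """Fraction of viable cells that became kanamycin resistant."""
    if total_cfu_per_ml <= 0:
        raise ValueError("total CFU/mL must be positive")
    return resistant_cfu_per_ml / total_cfu_per_ml
```

For example, counts from a kanamycin plate and a non-selective plate of the same culture would each be passed through `cfu_per_ml`, and the ratio of the two values would give the repair frequency for that condition.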

In another set of exemplary experiments, the ability of a gap editing complex comprising a Mom DNA-modifying domain in combination with a Cas9 DNA-recognition domain (Mom-D149A-ScdCas9) to induce successful genome modification, measured based on the frequency of kanamycin gene repair in E. coli, was also tested. Fusion of Mom to dCas9 and targeting of a defective kanamycin gene resulted in recombination, genome modification, and thereby kanamycin resistant cells. The Mom protein is known to modify adenine with a methylcarbamoyl group, which is known to block DNA replication, triggering gap repair recombination. The D149A mutation in Mom attenuated the catalytic activity, which is otherwise lethal to the cells. As shown in FIG. 5, the Mom-D149A-ScdCas9 gap editor complex resulted in successful genome modification through increased frequency of kanamycin gene repair.

Example 4

Experiments were also conducted to assess the DNA-modifying domain in the gap editing complexes of the present disclosure. Firstly, FIG. 6 includes representative results of experiments demonstrating successful genome modification (e.g., through increased frequency of kanamycin gene repair) using gap editor complexes reliant on a DNA-modifying domain (DarT) in combination with a Cas9 DNA-recognition domain (DarT-G49D-ScdCas9). ScdCas9 alone did not lead to kanamycin gene repair. DarT was used as an exemplary DNA-modifying domain in these experiments.

Additionally, experiments were conducted to investigate whether DarT could be improved by reducing its toxic effects on cells. As shown in FIG. 7, introduction of the R193A mutation into DarT (DarT-G49D-R193A-ScdCas9) significantly reduced the toxicity of DarT when expression was induced by the addition of arabinose to the culture media. As shown in FIG. 8, the M86L and R92A mutations further reduced the toxicity of DarT, and also reduced CRISPR independent off-target modification, over and above that of the R193A mutation (FIG. 7). Furthermore, FIG. 9 shows successful genome modification using gap editor complexes comprising a DarT DNA-modifying domain with mutations (G49D and/or R193A) that significantly reduced toxicity in combination with a Cas9 DNA-recognition domain having nickase activity (ScnCas9). Site-specific genome modification was nearly 100% effective.

Thus, these results demonstrate the novel CRISPR-based genome modification technology of the present disclosure, which facilitates efficient site-specific genome modification while minimizing the unintended modification and cellular toxicity associated with current genome editing approaches.

Example 5

As shown in FIG. 10, experiments were conducted to assess the efficacy of genome modification in eukaryotic cells using the gap editor complexes of the present disclosure by assessing whether gene knockout of fcy1 is able to confer resistance to 5-Fluorocytosine (5-FC). The fcy1 gene was targeted in Saccharomyces cerevisiae with a Cas9 nickase (ScnCas9) or the fusion of an engineered DarT gene to a Cas9 nickase, and a repair template was provided. As shown, this resulted in successful genome modification at fcy1. The repair template encoded 6 mutations introducing two or three stop codons in fcy1, which resulted in a loss of fcy1 function after genome modification, and resistance to 5-FC. Additionally, as shown, one single guide RNA was combined with 5 different repair templates. For all mutations, the fusion of DarT provided a >10-fold increase in the rate of genome modification, demonstrating the utility of the introduction of replication blocking moieties in a eukaryotic cell.

As shown in FIG. 11, experiments were conducted to assess the efficacy of genome modification using the gap editor complexes of the present disclosure by assessing whether gene knockout of fcy1 is able to confer resistance to 5-Fluorocytosine (5-FC). The fcy1 gene was targeted in Saccharomyces cerevisiae with a Cas9 nickase (ScnCas9) or the fusion of an engineered DarT gene to a Cas9 nickase, and a repair template was provided. As shown, this resulted in successful genome modification at fcy1. The repair template encoded 6 mutations introducing two or three stop codons in fcy1, which resulted in a loss of fcy1 function after genome modification, and resistance to 5-FC. The use of an engineered DarT variant including the G49D, R193A, M86L and R92A mutations improved cell viability up to approximately 50-fold over DarT with the G49D and R193A mutations alone. This gap editor complex effectuates efficient and low toxicity genome modification using two separate single guide RNAs and repair templates targeting fcy1 in yeast.

Example 6

As shown in FIG. 13, experiments were conducted to assess the efficacy of genome modification using the gap editor complexes of the present disclosure by assessing gene knockout of lacZ. Gene knockout of lacZ results in a white colony color in the presence of the lactose analog IPTG and the colorimetric indicator X-gal. The lacZ gene was targeted in E. coli with a nuclease-inactive Cas12a protein (dLbCas12a) fused to an engineered DarT gene, and a repair template was provided. As shown, this resulted in genome modification at lacZ. The repair template encoded lacZ DNA with a stop codon, which resulted in a loss of lacZ function after genome modification, and a white colony color. No genome modification was observed without targeting of the gap editor complex to the lacZ gene.

FIG. 14 includes representative chromatograms demonstrating successful introduction of one or more stop codons into the lacZ gene using DarT(G49D/R193A)-dLbCas12a associated with different crRNAs. The lacZ gene from white colored colonies was amplified and sent for Sanger sequencing. Highlighted in red are mutations which introduce one or more stop codons into the lacZ gene, eliminating beta-galactosidase expression and thereby resulting in a white colored colony when plated in the presence of the inducer IPTG and the colorimetric indicator X-gal.
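Checking whether a sequenced edit introduces an in-frame stop codon, as described for the lacZ edits, can be sketched in Python. The sketch assumes that the edited and original coding sequences are the same length (substitutions only), that the reading frame is supplied by the caller, and that the standard stop codons TAA, TAG and TGA apply; the function names are hypothetical.

```python
# Illustrative sketch only; assumes substitution-only edits and a known frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(coding_dna: str, frame: int = 0):
    """Return 0-based nucleotide offsets of in-frame stop codons."""
    seq = coding_dna.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def new_stops(original: str, edited: str, frame: int = 0):
    """Return offsets of stop codons present in edited but not in original."""
    before = set(in_frame_stops(original, frame))
    return [i for i in in_frame_stops(edited, frame) if i not in before]
```

With a toy pair of sequences such as `"ATGGAACAGTTC"` and `"ATGTAACAGTTC"`, `new_stops` reports the offset of the codon that was changed from GAA to TAA.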

Example 7

As shown in FIG. 15, experiments were conducted to assess the efficacy of genome modification using the gap editor complexes of the present disclosure by assessing whether the introduction of the D516G mutation into the rpoB gene is able to confer resistance to the antibiotic rifampicin. The rpoB gene was targeted in E. coli with an engineered DarT variant fused to a Cas9 nickase (ScnCas9), and an RNA repair template and a reverse transcriptase were co-expressed. This resulted in successful site-specific RNA templated genome modification. A recT type recombinase was co-expressed to accelerate strand annealing. The RNA repair template encoded the D516G mutation, and was successfully integrated into the genome after targeting by the gap editor complex.

As shown in FIG. 16, experiments were conducted to assess the efficacy of genome modification using the gap editor complexes of the present disclosure by assessing whether the introduction of the D516G mutation into the rpoB gene is able to confer resistance to the antibiotic rifampicin. The rpoB gene was targeted in E. coli with an engineered DarT variant fused to a Cas9 nickase (ScnCas9) and a linear single-stranded DNA repair template was provided. As shown, this resulted in successful genome modification at rpoB. A recT type recombinase was co-expressed to accelerate annealing of the single-stranded DNA repair template. The repair template encoded the D516G mutation conferring rifampicin resistance. Two guides and repair templates were tested, targeting opposite DNA strands at the rpoB D516 genomic locus. Targeting of the gap editor complex to rpoB resulted in a 100 to 6,000 fold increase in genome modification rates, demonstrating the effect of the gap editors.

Sequences.

Sequences of exemplary gap editors as described herein are provided below.

SPC1879 darT G49D-ScdCas9 pBAD araC CmR p15a:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVRLQLEQWLFQRNLTMSVHTRSGWYFSSGGSS

GGSSGSETPGTSESATPESSGGSSGGSEKKYSIGLAIGTNSVGWAVITDDYKVPSKKF

KVLGNTNRKSIKKNLMGALLFDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANE

MAKLDDSFFQRLEESFLVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSP

EKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIE

VDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKL

QLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV

KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLAT

QEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQ

EEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKG

ASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKII

KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRHYTG

WGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQ

GDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQ

QSRERKKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR

LSDYDVDAIVPQSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDK

NDKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYP

KLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKLANGEIRK

RPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRES

AKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKG

SYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASATELQKANELVLPQ

HLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFD

EQFAVSDSILLSNSFVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQ

SITGLYETRTDLSQLGGD* (SEQ ID NO: 1)

SPC1881 GE2 darT G49D-K56A-ScdCas9 pBAD araC CmR p15a:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGARAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVRLQLEQWLFQRNLTMSVHTRSGWYFSSGGSS

GGSSGSETPGTSESATPESSGGSSGGSEKKYSIGLAIGTNSVGWAVITDDYKVPSKKF

KVLGNTNRKSIKKNLMGALLFDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANE

MAKLDDSFFQRLEESFLVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSP

EKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIE

VDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKL

QLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV

KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLAT

QEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQ

EEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKG

ASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKII

KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRHYTG

WGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQ

GDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQ

QSRERKKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR

LSDYDVDAIVPQSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDK

NDKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYP

KLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKLANGEIRK

RPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRES

AKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKG

SYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASATELQKANELVLPQ

HLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFD

EQFAVSDSILLSNSFVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQ

SITGLYETRTDLSQLGGD* (SEQ ID NO: 2)
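The relationship between a substitution label such as K56A and the listed sequences can be checked with a short Python sketch. The 60-residue fragment used in the usage note is the N-terminal portion of SEQ ID NO: 1 as listed above; the function name and the label parsing are hypothetical and offered only as an illustration.

```python
# Illustrative sketch only; the function name is hypothetical.
import re


def apply_point_mutation(seq: str, mutation: str) -> str:
    """Apply a substitution written like 'K56A' (wild-type residue,
    1-based position, replacement residue) and return the new sequence.

    Raises ValueError if the label is malformed, the position is out of
    range, or the residue at that position is not the stated wild type.
    """
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    wild_type, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if pos < 1 or pos > len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {pos}, found {seq[pos - 1]}"
        )
    return seq[:pos - 1] + new + seq[pos:]
```

Applied to the fragment `MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDNPELIGKRAGH` with the label `K56A`, the function replaces the lysine in `PELIGKRAGH` with alanine, which matches the corresponding stretch `PELIGARAGH` of SEQ ID NO: 2. The same fragment has aspartate at position 49, consistent with the G49D label.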

SPC1883 darT G49D-ScnCas9 pBAD araC CmR p15a:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVRLQLEQWLFQRNLTMSVHTRSGWYFSSGGSS

GGSSGSETPGTSESATPESSGGSSGGSEKKYSIGLAIGTNSVGWAVITDDYKVPSKKF

KVLGNTNRKSIKKNLMGALLFDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANE

MAKLDDSFFQRLEESFLVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSP

EKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIE

VDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKL

QLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV

KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLAT

QEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQ

EEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKG

ASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKII

KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRHYTG

WGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQ

GDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQ

QSRERKKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR

LSDYDVDHIVPQSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDK

NDKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYP

KLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKLANGEIRK

RPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRES

AKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKG

SYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASATELQKANELVLPQ

HLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFD

EQFAVSDSILLSNSFVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQ

SITGLYETRTDLSQLGGD* (SEQ ID NO: 3)

SPC1884 GE2n darT G49D-K56A-ScnCas9 pBAD araC CmR p15a:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGARAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVRLQLEQWLFQRNLTMSVHTRSGWYFSSGGSS

GGSSGSETPGTSESATPESSGGSSGGSEKKYSIGLAIGTNSVGWAVITDDYKVPSKKF

KVLGNTNRKSIKKNLMGALLFDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANE

MAKLDDSFFQRLEESFLVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSP

EKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIE

VDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKL

QLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV

KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLAT

QEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQ

EEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKG

ASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKII

KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRHYTG

WGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQ

GDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQ

QSRERKKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR

LSDYDVDHIVPQSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDK

NDKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYP

KLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKLANGEIRK

RPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRES

AKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKG

SYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASATELQKANELVLPQ

HLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFD

EQFAVSDSILLSNSFVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQ

SITGLYETRTDLSQLGGD* (SEQ ID NO: 4)

DarG:

MITYTQGNLLDAPVEALVNTVNTVGVMGKGIALMFKERFPENMKVYALA

CKQKQVITGKMFITETGELMGPRWIVNFPTKQHWRADSRMEWIEDGLQDLRRFLIEE

NVQSIAIPPLGAGNGGLNWPDVRAQIESALGDLQDVDILIYQPTEKYQNVAKSTGVK

KLTPARAAIAELVRRYWVLGMECSLLEIQKLAWLLQRAIEQHQQDDILKLRFEAHYY

GPYAPNLNHLLNALDGTYLKAEKRIPDSQPLDVIWFNDQKKEHVNAYLNNEAREWL

PALEQVSQLIDGFESPFGLELLATVDWLLSRGECQPTLDSVKEGLHQWPAGERWASR

KLRLFDNNNLQFAINRVMEFHC* (SEQ ID NO: 5)

DarG_C-terminal:

MDVRAQIESALGDLQDVDILIYQPTEKYQNVAKSTGVKKLTPARAAIAELV

RRYWVLGMECSLLEIQKLAWLLQRAIEQHQQDDILKLRFEAHYYGPYAPNLNHLLN

ALDGTYLKAEKRIPDSQPLDVIWFNDQKKEHVNAYLNNEAREWLPALEQVSQLIDG

FESPFGLELLATVDWLLSRGECQPTLDSVKEGLHQWPAGERWASRKLRLFDNNNLQ

FAINRVMEFHC* (SEQ ID NO: 6)

DarG_N-terminal:

MITYTQGNLLDAPVEALVNTVNTVGVMGKGIALMFKERFPENMKVYALA

CKQKQVITGKMFITETGELMGPRWIVNFPTKQHWRADSRMEWIEDGLQDLRRFLIEE

NVQSIAIPPLGAGNGGLNWP* (SEQ ID NO: 7)

Mom:

MPASIPRRNIVGKEKKSRILTKPCVIEYEGQIVGYGSKELRVETISCWLARTI

IQTKHYSRRFVNNSYLHLGVFSGRDLVGVLQWGYALNPNSGRRVVLETDNRGYME

LNRMWLHDDMPRNSESRAISYALKVIRLLYPSVEWVQSFADERCGRAGVVYQASNF

DFIGSHESTFYELDGEWYHEITMNAIKRGGQRGVYLRANKERAVVHKFNQYRYIRFL

NKRARKRLNTKLFKVQPYPK (SEQ ID NO: 8)

Mom_D149A:

MPASIPRRNIVGKEKKSRILTKPCVIEYEGQIVGYGSKELRVETISCWLARTI

IQTKHYSRRFVNNSYLHLGVFSGRDLVGVLQWGYALNPNSGRRVVLETDNRGYME

LNRMWLHDDMPRNSESRAISYALKVIRLLYPSVEWVQSFAAERCGRAGVVYQASNF

DFIGSHESTFYELDGEWYHEITMNAIKRGGQRGVYLRANKERAVVHKFNQYRYIRFL

NKRARKRLNTKLFKVQPYPK (SEQ ID NO: 9)

Mom_D149A-ScdCas9:

MPASIPRRNIVGKEKKSRILTKPCVIEYEGQIVGYGSKELRVETISCWLARTI

IQTKHYSRRFVNNSYLHLGVFSGRDLVGVLQWGYALNPNSGRRVVLETDNRGYME

LNRMWLHDDMPRNSESRAISYALKVIRLLYPSVEWVQSFAAERCGRAGVVYQASNF

DFIGSHESTFYELDGEWYHEITMNAIKRGGQRGVYLRANKERAVVHKFNQYRYIRFL

NKRARKRLNTKLFKVQPYPKSGGSSGGSSGSETPGTSESATPESSGGSSGGSEKKYSI

GLAIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNLMGALLFDSGETAEATR

LKRTARRRYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEESFLVEEDKKNERHPIFGN

LADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALAHIIKFRGHFLIEGKLNAENS

DVAKLFYQLIQTYNQLFEESPLDEIEVDAKGILSARLSKSKRLEKLIAVFPNEKKNGLF

GNIIALALGLTPNFKSNFDLTEDAKLQLSKDTYDDDLDELLGQIGDQYADLFSAAKN

LSDAILLSDILRSNSEVTKAPLSASMVKRYDEHHQDLALLKTLVRQQFPEKYAEIFKD

DTKNGYAGYVGIGIKHRKRTTKLATQEEFYKFIKPILEKMDGAEELLAKLNRDDLLR

KQRTFDNGSIPHQIHLKELHAILRRQEEFYPFLKENREKIEKILTFRIPYYVGPLARGNS

RFAWLTRKSEEAITPWNFEEVVDKGASAQSFIERMTNFDEQLPNKKVLPKHSLLYEY

FTVYNELTKVKYVTERMRKPEFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIEC

FDSVEIIGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEE

RLKTYAHLFDDKVMKQLKRRHYTGWGRLSRKMINGIRDKQSGKTILDFLKSDGFSN

RNFMQLIHDDSLTFKEEIEKAQVSGQGDSLHEQIADLAGSPAIKKGILQTVKIVDELV

KVMGHKPENIVIEMARENQTTTKGLQQSRERKKRIEEGIKELESQILKENPVENTQLQ

NEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFIKDDSIDNKVLTRSVENR

GKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSEADKAGFIKRQ

LVETRQITKHVARILDSRMNTKRDKNDKPIREVKVITLKSKLVSDFRKDFQLYKVRDI

NNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKAT

AKRFFYSNIMNFFKTEVKLANGEIRKRPLIETNGETGEVVWNKEKDFATVRKVLAMP

QVNIVKKTEVQTGGFSKESILSKRESAKLIPRKKGWDTRKYGGFGSPTVAYSILVVAK

VEKGKAKKLKSVKVLVGITIMEKGSYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFEL

ENGRRRMLASATELQKANELVLPQHLVRLLYYTQNISATTGSNNLGYIEQHREEFKE

IFEKIIDFSEKYILKNKVNSNLKSSFDEQFAVSDSILLSNSFVSLLKYTSFGASGGFTFL

DLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQLGGD (SEQ ID NO: 10)

Scabin:

MRRRAAAVVLSLSAVLATSAATAPAQTPTATATSAKAAAPACPRFDDPVH

AAADPRVDVERITPDPVWRTTCGTLYRSDSRGPAVVFEQGFLPKDVIDGQYDIESYV

LVNQPSPYVSTTYDHDLYKTWYKSGYNYYIDAPGGVDVNKTIGDRHKWADQVEVA

FPGGIRTEFVIGVCPVDKKTRTEKMSECVGNPHYEPWH (SEQ ID NO: 11)

Scabin_K130A:

MRRRAAAVVLSLSAVLATSAATAPAQTPTATATSAKAAAPACPRFDDPVH

AAADPRVDVERITPDPVWRTTCGTLYRSDSRGPAVVFEQGFLPKDVIDGQYDIESYV

LVNQPSPYVSTTYDHDLYKTWYASGYNYYIDAPGGVDVNKTIGDRHKWADQVEVA

FPGGIRTEFVIGVCPVDKKTRTEKMSECVGNPHYEPWH (SEQ ID NO: 12)

Scabin_K130A-ScdCas9:

MRRRAAAVVLSLSAVLATSAATAPAQTPTATATSAKAAAPACPRFDDPVH

AAADPRVDVERITPDPVWRTTCGTLYRSDSRGPAVVFEQGFLPKDVIDGQYDIESYV

LVNQPSPYVSTTYDHDLYKTWYASGYNYYIDAPGGVDVNKTIGDRHKWADQVEVA

FPGGIRTEFVIGVCPVDKKTRTEKMSECVGNPHYEPWHSGGSSGGSSGSETPGTSESA

TPESSGGSSGGSEKKYSIGLAIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNL

MGALLFDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANEMAKLDDSFFQRLEES

FLVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALAHII

KFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIEVDAKGILSARLSKSKR

LEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKLQLSKDTYDDDLDELL

GQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMVKRYDEHHQDLALLK

TLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLATQEEFYKFIKPILEKMD

GAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQEEFYPFLKENREKIEKI

LTFRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKGASAQSFIERMTNFDE

QLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSGEQKKAIVDLLFKTNR

KVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDIL

EDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRHYTGWGRLSRKMINGIRD

KQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQGDSLHEQIADLAGS

PAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQQSRERKKRIEEGIK

ELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSF

IKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK

AERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDKNDKPIREVKVITLKS

KLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKV

YDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKLANGEIRKRPLIETNGETGEVV

WNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRESAKLIPRKKGWDTR

KYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKGSYEKDPIGFLEAKG

YKDIKKELIFKLPKYSLFELENGRRRMLASATELQKANELVLPQHLVRLLYYTQNISA

TTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFDEQFAVSDSILLSNS

FVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQ

LGGD (SEQ ID NO: 13)

DarT_G49D_R193A:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVALQLEQWLFQRNLTMSVHTRSGWYFS (SEQ ID NO: 14)

DarT_G49D_R193A-ScdCas9:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVALQLEQWLFQRNLTMSVHTRSGWYFSSGGSS

GGSSGSETPGTSESATPESSGGSSGGSEKKYSIGLAIGTNSVGWAVITDDYKVPSKKF

KVLGNTNRKSIKKNLMGALLFDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANE

MAKLDDSFFQRLEESFLVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSP

EKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIE

VDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKL

QLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV

KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLAT

QEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQ

EEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKG

ASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKII

KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRHYTG

WGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQ

GDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQ

QSRERKKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR

LSDYDVDAIVPQSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDK

NDKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYP

KLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKLANGEIRK

RPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRES

AKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKG

SYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASATELQKANELVLPQ

HLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFD

EQFAVSDSILLSNSFVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQ

SITGLYETRTDLSQLGGD (SEQ ID NO: 15)

DarT_G49D_R193A_M86L_R92A:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLLNIHSGAGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVALQLEQWLFQRNLTMSVHTRSGWYFS (SEQ ID NO: 16)
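The construct labels above name each substitution as original residue, 1-based position, replacement residue (for example, M86L). As a minimal sketch of how such labels can be checked against the listed sequences, the following Python snippet compares the first 109 residues of SEQ ID NO: 14 and SEQ ID NO: 16 as printed above. The function name `substitutions` and the truncation to the first two printed lines are illustrative assumptions, not part of the disclosure. Both sequences already carry G49D, so that substitution does not appear as a difference between them.

```python
from typing import List

def substitutions(parent: str, variant: str) -> List[str]:
    """List point substitutions as <parent residue><1-based position><variant residue>."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have equal length for a position-wise comparison")
    return [
        f"{p}{pos}{v}"
        for pos, (p, v) in enumerate(zip(parent, variant), start=1)
        if p != v
    ]

# First 109 residues of SEQ ID NO: 14 (DarT_G49D_R193A), copied from the listing above.
SEQ14_PREFIX = (
    "MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN"
    "PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN"
)
# First 109 residues of SEQ ID NO: 16 (DarT_G49D_R193A_M86L_R92A), copied from the listing above.
SEQ16_PREFIX = (
    "MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN"
    "PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLLNIHSGAGGIKRRPNEEIVILVSN"
)

print(substitutions(SEQ14_PREFIX, SEQ16_PREFIX))  # ['M86L', 'R92A']
print(SEQ14_PREFIX[48])  # residue 49, the site of G49D in both sequences: D
```

The comparison assumes residue numbering starts at the initiating methionine, which is consistent with D at position 49 in both listed sequences.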

DarT_G49D_R193A_M86L_R92A-ScdCas9:

MAYDYSASLNPQKALIWRIVHRDNIPWILDNGLHCGNSLVQAENWINIDN

PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLLNIHSGAGGIKRRPNEEIVILVSN

LRNVAAHDVPFVFTDSHAYYNWTNYYTSLNSLDQIDWPILQARDFRRDPDDPAKFE

RYQAEALIWQHCPISLLDGIICYSEEVALQLEQWLFQRNLTMSVHTRSGWYFSSGGSS

GGSSGSETPGTSESATPESSGGSSGGSEKKYSIGLAIGTNSVGWAVITDDYKVPSKKF

KVLGNTNRKSIKKNLMGALLFDSGETAEATRLKRTARRRYTRRKNRIRYLQEIFANE

MAKLDDSFFQRLEESFLVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSP

EKADLRLIYLALAHIIKFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIE

VDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIALALGLTPNFKSNFDLTEDAKL

QLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMV

KRYDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLAT

QEEFYKFIKPILEKMDGAEELLAKLNRDDLLRKQRTFDNGSIPHQIHLKELHAILRRQ

EEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAITPWNFEEVVDKG

ASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKII

KDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRHYTG

WGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNFMQLIHDDSLTFKEEIEKAQVSGQ

GDSLHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQ

QSRERKKRIEEGIKELESQILKENPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR

LSDYDVDAIVPQSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDK

NDKPIREVKVITLKSKLVSDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYP

KLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMNFFKTEVKLANGEIRK

RPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRES

AKLIPRKKGWDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKG

SYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFELENGRRRMLASATELQKANELVLPQ

HLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKSSFD

EQFAVSDSILLSNSFVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQ

SITGLYETRTDLSQLGGD (SEQ ID NO: 17)

DarT catalytic domain motif: X₁X₂X₃X₃R (SEQ ID NO: 18), wherein X₁ is L, I, V, or A; X₂ is I, Q, K, T, or N; and X₃ is any amino acid (FIG. 18).

DarT catalytic domain motif: X₁X₁X₁X₁X₂X₃X₄X₅X₆PFYFX₇X₁X₁X₈X₉MX₁₀X₁ (SEQ ID NO: 19), wherein X₁ is any amino acid; X₂ is L, V, or I; X₃ is H, G, N, S, or A; X₄ is D or E; X₅ is Y or F; X₆ is V, I, or A; X₇ is T, A, G, K, N, or W; X₈ is S, T, N, M, or K; X₉ is P, V, M, I, or A; and X₁₀ is L, M, or F (FIG. 19).

DarT catalytic domain motif: X₁X₂X₃X₄X₅X₆X₇X₈ (SEQ ID NO: 20), wherein X₁ is F, Y, W, V, or C; X₂ is V, L, I, A, C, or F; X₃ is F, Y, or A; X₄ is T, S, Y, or F; X₅ is D, N, or S; X₆ is G, R, S, A, M, or Q; X₇ is H, N, S, or Q; and X₈ is A, G, C, H, or K (FIG. 20).

DarT catalytic domain motif: X₁X₂X₃X₄X₅X₆X₇X₈X₉ (SEQ ID NO: 21), wherein X₁ is any amino acid; X₂ is R, K, H, E, F, L, T, or M; X₃ is Y, R, K, D, E, or H; X₄ is Q, M, E, Y, A, R, or H; X₅ is A, Q, S, or Y; X₆ is E, A, or Q; X₇ is F, A, L, E, V, or C; X₈ is L, A, E, or M; and X₉ is V, I, L, or A (FIG. 21).

Scabin catalytic domain motif: X₁X₁X₁X₁X₂X₁EX₃X₄X₅X₆GGX₇ (SEQ ID NO: 22), wherein X₁ is any amino acid; X₂ is Q, E, or R; X₃ is V or I; X₄ is A, L, V, S, or T; X₅ is F, I, V, or L; X₆ is P, A, or I; and X₇ is I, V, or L (FIG. 22). The DarT catalytic motif of SEQ ID NO: 21 and the Scabin catalytic motif of SEQ ID NO: 22 are structural and functional analogs, with the conserved glutamate (E) being the catalytic residue.

Scabin catalytic domain motif: X₁X₂X₃X₄X₅X₆X₇ (SEQ ID NO: 23), wherein X₁ is S, T, or G; X₂ is any amino acid; X₃ is F, Y, or L; X₄ is V, I, A, or L; X₅ is S, G, or A; X₆ is T or A; and X₇ is T, S, or A (FIG. 23).

Scabin catalytic domain motif: X₁X₂X₃X₂X₄X₂X₅ (SEQ ID NO: 24), wherein X₁ is L or V; X₂ is any amino acid; X₃ is R, H, or K; X₄ is D, S, or A; and X₅ is R or D (FIG. 24).

Mom catalytic domain motif: X₁HYX₂X₃ (SEQ ID NO: 25), wherein X₁ is any amino acid; X₂ is S or L; and X₃ is H, G, K, R, N, D, or A (FIG. 25).

Mom catalytic domain motif: EX₁X₂X₃X₄X₅X₆X₇X₈X₇X₉X₁₀X₁₁X₁₂X₁₃EX₁₄ (SEQ ID NO: 26), wherein X₁ is L, I, or F; X₂ is N, G, S, or T; X₃ is R or K; X₄ is M, L, or A; X₅ is W, A, C, V, F, or Y; X₆ is L, I, F, M, V, C, or T; X₇ is any amino acid; X₈ is D or E; X₉ is L, A, M, C, V, Q, or T; X₁₀ is P, G, A, or L; X₁₁ is R, K, H, T, or M; X₁₂ is N or F; X₁₃ is S, A, T, or G; and X₁₄ is S or T (FIG. 26).

Mom catalytic domain motif: X₁X₂DX₃X₄X₄X₅X₄X₄GX₆X₇YX₈AX₉X₁₀X₁₁ (SEQ ID NO: 27), wherein X₁ is F, W, Y, or M; X₂ is A or S; X₃ is E, G, P, A, or T; X₄ is any amino acid; X₅ is G, C, or Q; X₆ is T, V, Y, or I; X₇ is V or I; X₈ is Q, K, or R; X₉ is A, S, C, T, or N; X₁₀ is N, G, or A; and X₁₁ is F, W, or Y (FIG. 27).
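Each motif above specifies, position by position, which residues are permitted. As a minimal illustrative sketch, such a motif can be written as a regular expression and searched against a protein sequence. The Python snippet below encodes SEQ ID NO: 18 and SEQ ID NO: 25 this way. The helper name `motif_to_regex`, the list-based encoding, and the restriction of "any amino acid" to the 20 standard residues are assumptions made for illustration and are not part of the disclosure. The test fragment is the second printed line of SEQ ID NO: 14.

```python
import re
from typing import List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: "any amino acid" = the 20 standard residues

# One entry per motif position: a string of allowed residues, or None for "any amino acid".
SEQ_ID_NO_18: List[Optional[str]] = ["LIVA", "IQKTN", None, None, "R"]   # X1 X2 X3 X3 R
SEQ_ID_NO_25: List[Optional[str]] = [None, "H", "Y", "SL", "HGKRNDA"]    # X1 H Y X2 X3

def motif_to_regex(positions: List[Optional[str]]) -> "re.Pattern[str]":
    """Translate a position-wise motif definition into a compiled regular expression."""
    parts = []
    for allowed in positions:
        if allowed is None:
            parts.append(f"[{AMINO_ACIDS}]")   # unconstrained position
        elif len(allowed) == 1:
            parts.append(allowed)              # fixed residue
        else:
            parts.append(f"[{allowed}]")       # one of several residues
    return re.compile("".join(parts))

# Fragment copied from the listing of SEQ ID NO: 14 above (residues 51-109).
fragment = "PELIGKRAGHPVPVGTGGTLHDYVPFYFTPFSPMLMNIHSGRGGIKRRPNEEIVILVSN"

match = motif_to_regex(SEQ_ID_NO_18).search(fragment)
print(match.group(), match.start())  # LIGKR 2
```

A textual match of this kind only shows that a sequence satisfies the stated residue constraints. It does not by itself establish that the matched residues are catalytically active.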

It is understood that the foregoing detailed description and accompanying examples are merely illustrative and are not to be taken as limitations upon the scope of the disclosure, which is defined solely by the appended claims and their equivalents.

All publications and patents mentioned in the above specification are herein incorporated by reference as if expressly set forth herein. Various changes and modifications to the disclosed embodiments will be apparent to those skilled in the art and may be made without departing from the spirit and scope thereof.
