METHODS AND COMPOSITIONS FOR PRIME EDITING RNA

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing (B119570087US01-SUBSEQ-TNG.txt; Size: 305,640 bytes; and Date of Creation: Jul. 29, 2024) are herein incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

A variety of nucleic acid-editing technologies have been developed to carry out RNA editing as a means to correct disease-relevant mutations. For example, RNA interference (RNAi)-based therapies use synthetic, small interfering RNAs (siRNAs) to achieve the targeted knockdown of specific RNA targets.^{1, 2} However, this approach only enables knockdown of the targeted gene and cannot install therapeutic mutations, severely limiting its applicability in the treatment of genetic diseases. In another example, trans-splicing ribozymes enable the removal of diseased exons and their replacement with non-diseased versions.³ However, these enzymes are inefficient and must be targeted to a specific site on the RNA that may or may not be occluded. In addition, trans-splicing ribozymes can result in non-specific editing of a target site. These enzymes can also result in significant off-target effects owing to a small guide sequence. Trans-splicing ribozymes also are not catalytic, meaning that: (i) large amounts of ribozyme are necessary to enable editing; and (ii) highly-transcribed RNA targets are unlikely to be effectively edited by the ribozyme. RNA editing has also been described in the context of base editing, which converts one base to another in a target RNA (e.g., see Cox et al., “RNA editing with CRISPR-Cas13,” Science Nov. 24, 2017, Vol. 358(6366), pp. 1019-1027).

Despite these developments of approaches to edit RNA molecules, technologies which are more flexible and which can introduce a wider range of edits directly in RNA are desired in the art. The present disclosure provides a novel approach for editing RNA.

SUMMARY OF THE INVENTION

The present disclosure provides a novel approach to editing RNA molecules. In certain aspects, the disclosure provides RNA-editing fusion proteins that combine (a) a nucleic acid-programmable RNA binding protein (napRNAbp), such as Cas13, and (b) an RNA-dependent RNA polymerase (RDRP). In still other aspects, the disclosure provides complexes comprising (a) napRNAbp-RDRP fusion proteins, and (b) an RNA prime editing guide RNA (“RpegRNA”) that comprises an extension arm containing a desired edit template to be integrated into a target RNA molecule. The RpegRNA associates with the napRNAbp:RDRP fusion protein (through its interaction with the napRNAbp component) and directs the enzyme to bind to an RNA molecule having complementarity with the RpegRNA. The RpegRNA comprises an extension arm on the 3′ end of the RpegRNA that comprises a prime sequence that binds to the 3′ end of a target RNA to create an RNA/RNA hybrid that provides the substrate for RDRP to polymerize a new RNA sequence at the 3′ end of the RNA molecule, templated by the extension arm of the RpegRNA.
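The templated 3′ extension described above can be illustrated with a minimal Python sketch. It is a simplified model and not the disclosed method: it assumes that all sequences are written 5′ to 3′, that the extension arm is laid out as 5′-[edit template][primer-binding region]-3′, and that only Watson-Crick pairing occurs. The function names, the arm layout, and the example sequences are hypothetical.

```python
# Simplified model: every sequence is written 5'->3' using A, C, G, U.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def extend_target(target: str, extension_arm: str, primer_len: int) -> str:
    """Append RNA to the 3' end of `target`, templated by `extension_arm`.

    Assumed (hypothetical) arm layout:
    5'-[edit template][primer-binding region]-3'.
    """
    if not 0 < primer_len < len(extension_arm):
        raise ValueError("primer_len must leave a non-empty template")
    primer_binding = extension_arm[-primer_len:]
    template = extension_arm[:-primer_len]
    # The 3' end of the target must anneal to the primer-binding region.
    if reverse_complement(primer_binding) != target[-primer_len:]:
        raise ValueError("target 3' end does not anneal to the extension arm")
    # The polymerase reads the template 3'->5' and synthesizes its
    # complement 5'->3', extending the 3' end of the target.
    return target + reverse_complement(template)


# Hypothetical example: the last 4 nt of the target pair with the arm.
edited = extend_target("GGAUCCGAAUCG", "ACGGUCGAU", primer_len=4)
print(edited)
```

In this model the appended sequence is the reverse complement of the edit template, so the template must encode the complement of the desired edit.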

The present invention relates in part to the discovery that the mechanism of target-primed reverse transcription (TPRT) or “prime editing” can be leveraged or adapted for conducting precision CRISPR/Cas-based nucleic acid editing of RNA with high efficiency and genetic flexibility, as depicted in various embodiments of FIGS. 1-4.

As shown herein, the inventors have used a Cas protein:RNA-dependent RNA polymerase (RDRP) fusion protein to target a specific RNA sequence with a specialized guide RNA, i.e., a RpegRNA.

Accordingly, in aspects, the disclosure relates to a fusion protein comprising a nucleic acid-programmable RNA binding protein (napRNAbp) and an RNA-dependent RNA polymerase (RDRP). In some embodiments, the fusion protein, when complexed to an RNA prime editing guide RNA (rpegRNA), is capable of appending a single-strand RNA sequence to a target RNA.

In some embodiments, the single-strand RNA sequence is appended to the 3′ terminus of the target RNA or to a 3′ terminus which is formed upon cleavage of the target RNA by the fusion protein at a cut site. In some embodiments, the single-strand RNA sequence is polymerized by the RDRP using the rpegRNA as a template.

In some embodiments, the napRNAbp is a Cas13 protein. In some embodiments, the Cas13 protein is a Cas13a, Cas13b, or Cas13d protein. In some embodiments, the Cas13 protein is nuclease inactive. In some embodiments, the Cas13 protein has an amino acid sequence of SEQ ID NO: 1, or an amino acid sequence having at least 70% sequence identity to SEQ ID NO: 1.

In some embodiments, the RDRP is capable of polymerizing a single-strand RNA sequence using rpegRNA as a template.

In some embodiments, the RDRP comprises an amino acid sequence selected from the group consisting of: SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, and SEQ ID NO: 8. In some embodiments, the RDRP comprises an amino acid sequence with at least 70% sequence identity to a sequence selected from the group consisting of: SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, and SEQ ID NO: 8.

In some embodiments, the fusion protein has one of the following structures: N-[RNA-dependent RNA polymerase]-[nucleic acid-programmable RNA binding protein]-C; or N-[nucleic acid-programmable RNA binding protein]-[RNA-dependent RNA polymerase]-C, wherein “]-[” represents a linker sequence.

In some embodiments, the linker sequence has an amino acid sequence selected from the group consisting of SEQ ID NOs: 14-23.

In an aspect, the disclosure relates to an RNA prime editor complex for appending a single-strand RNA sequence to a target RNA comprising any of the fusion proteins disclosed herein and a rpegRNA. In some embodiments, the rpegRNA is capable of programming the fusion protein to bind to the target RNA. In some embodiments, the rpegRNA comprises the following structure: 5′-[spacer sequence]-[scaffold sequence]-[template sequence]-3′, wherein the spacer sequence anneals to the target RNA at a complementary protospacer sequence, the scaffold sequence binds the rpegRNA to the nucleic acid-programmable RNA binding protein of the fusion protein, and the template sequence provides an RNA template for synthesis of the single-strand RNA sequence by the RNA-dependent RNA polymerase of the fusion protein. In some embodiments, the napRNAbp of the fusion protein comprises a nuclease activity which cleaves the target RNA at a cut site upon binding of the complex thereto. In some embodiments, the napRNAbp of the fusion protein is catalytically inactive.
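The 5′-[spacer sequence]-[scaffold sequence]-[template sequence]-3′ arrangement can be represented as a small data structure. The following Python sketch is illustrative only: the class name, the helper `spacer_for`, and all sequences (including the placeholder scaffold) are hypothetical and are not taken from the sequence listing.

```python
from dataclasses import dataclass

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def spacer_for(protospacer: str) -> str:
    """Return a spacer (5'->3') that anneals to the given protospacer (5'->3')."""
    return protospacer.translate(RNA_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class RpegRNA:
    """Hypothetical container for the three rpegRNA regions, each written 5'->3'."""
    spacer: str    # anneals to the protospacer in the target RNA
    scaffold: str  # bound by the napRNAbp of the fusion protein
    template: str  # copied by the RNA-dependent RNA polymerase

    def sequence(self) -> str:
        """Concatenate the regions in 5'-[spacer]-[scaffold]-[template]-3' order."""
        return self.spacer + self.scaffold + self.template


# Placeholder sequences for illustration only.
guide = RpegRNA(
    spacer=spacer_for("AUGGCU"),
    scaffold="GGGG",
    template="ACGGUCGAU",
)
print(guide.sequence())
```

Keeping the regions as separate fields makes it straightforward to swap the spacer for a different target while reusing the same scaffold and template.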

In an aspect, the disclosure relates to an RNA prime editor complex for appending a single-strand RNA sequence to a target RNA comprising: (i) a first fusion protein comprising a catalytically inactive nucleic acid-programmable RNA binding protein and an RNA-dependent RNA polymerase; (ii) a second fusion protein comprising a catalytically active nucleic acid-programmable RNA binding protein that is capable of cleaving the target RNA to generate a free 3′ terminus; (iii) an rpegRNA that directs the first fusion protein to a first locus in the target RNA; and (iv) a guide RNA that directs the second fusion protein to a second locus in the target RNA. In some embodiments, the second fusion protein cleaves the target RNA at the second locus to produce a 3′ terminus, and the first fusion protein appends a single-strand RNA sequence to the target RNA using the rpegRNA as a template.

In an aspect, the disclosure relates to a method for appending a desired single-strand RNA sequence to the 3′ end of a target RNA, the method comprising contacting the target RNA with an RNA prime editor complex, said complex comprising a rpegRNA and a fusion protein that comprises an RNA-dependent RNA polymerase and a nucleic acid-programmable RNA binding protein.

In some embodiments, the rpegRNA comprises a spacer sequence, a scaffold sequence, and a template sequence.

In some embodiments, the spacer sequence directs the fusion protein to bind at the complementary protospacer in the target RNA.

In some embodiments, the scaffold sequence binds to the nucleic acid-programmable RNA binding protein of the fusion protein.

In some embodiments, the template sequence is used by the RNA-dependent RNA polymerase in the synthesis of the desired single-strand RNA.

In some embodiments, the napRNAbp comprises a nuclease activity which cleaves the target RNA to generate an available 3′ terminus.

In some embodiments, the nucleic acid-programmable RNA binding protein comprises an inactive nuclease activity.

In some embodiments, the method is used for appending the desired RNA sequence to an internal 3′ terminus of the target RNA. In some embodiments, the method is used for appending the desired RNA sequence to the endogenous 3′ terminus of the target RNA.

In some embodiments, the method further comprises contacting the target RNA with a second fusion protein comprising a nucleic acid-programmable RNA binding protein with a nuclease activity and a second guide RNA for introducing a 3′ terminus at a second RNA locus in the target RNA.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.

FIG. 1 shows an illustration of Cas13 fused to an RNA-dependent RNA polymerase (RDRP) (Cas13:RDRP) enabling RNA Prime Editing (RPE) at the 3′ terminus of an RNA substrate. A rpegRNA enables recruitment of the RDRP to the 3′ end of the RNA and subsequent programmed installation of new sequence at the 3′ end (red).

FIG. 2 shows an illustration of wild-type Cas13:RDRP fusion targeting an internal site within an RNA substrate to enable RPE.

FIG. 3 shows an illustration of a tandem dCas13:RDRP wtCas13 strategy for effecting RPE at an internal site within an RNA substrate.

FIG. 4 shows an illustration of a Cas13:MS2 fusion protein recruiting a trans-splicing ribozyme to a messenger RNA (mRNA) transcript to effect RNA editing.

DEFINITIONS

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

Antisense Strand

In genetics, the “antisense” strand of a segment within double-stranded DNA is the template strand, and which is considered to run in the 3′ to 5′ orientation. By contrast, the “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. In the case of a DNA segment that encodes a protein, the sense strand is the strand of DNA that has the same sequence as the mRNA, which takes the antisense strand as its template during transcription, and eventually undergoes (typically, not always) translation into a protein. The antisense strand is thus responsible for the RNA that is later translated to protein, while the sense strand possesses a nearly identical makeup to that of the mRNA. Note that for each segment of dsDNA, there will possibly be two sets of sense and antisense, depending on which direction one reads (since sense and antisense is relative to perspective). It is ultimately the gene product, or mRNA, that dictates which strand of one segment of dsDNA is referred to as sense or antisense.

Aptamer

An “aptamer” refers to an oligonucleotide or peptide molecule that binds to a specific target molecule. Aptamers include DNA or RNA aptamers that are short single-stranded DNA- or RNA-based oligonucleotides that can selectively bind to small molecular ligands or protein targets with high affinity and specificity, when folded into their unique three-dimensional structures. On the molecular level, aptamers bind to their cognate targets through various non-covalent interactions, electrostatic interactions, hydrophobic interactions, and induced fitting.

Further reference can be made to Ku et al., “Nucleic Acid Aptamers: An Emerging Tool for Biotechnology and Biomedical Sensing,” Sensors, 2015, 15(7): 16281-16313. The present disclosure contemplates the use of any aptamer, including those obtained from commercial sources. For example, numerous aptamers may be obtained from APTAGEN (www.aptagen.com) and include, but are not limited to, thrombin (15mer), HIV-1 TAR RNA hairpin loop (B22-19), human immunoglobulin G (IgG) (Apt 8), reactive green 19 (GR-30), abrin toxin (TA6), malachite green (MG-4), PSMA aptamer (A10-3), tenascin-C (GBI-10), and methylenedianiline (M1). Another example is the prequeosine₁-1 riboswitch aptamer (also known as evopreQ₁-1), one of the smallest natural tertiary RNA structures.

Cas9

The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A “Cas9 domain” as used herein, is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full length Cas9 protein. A Cas9 nuclease is also referred to sometimes as a casn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.

A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell. 28; 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science. 337:816-821(2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild type Cas9. 
In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9. In some embodiments, the Cas9 variant comprises a fragment of SEQ ID NO: 18 Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9.
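The percent identity thresholds recited above can be illustrated with a short calculation. The following Python sketch assumes that the two sequences have already been aligned to equal length (with “-” marking gaps) and that identity is computed over all alignment columns; other conventions for the denominator exist, and in practice an alignment program would produce the alignment. The function name and the example peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap, and a gap never counts as a match.
    The denominator is the number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-residue peptides that differ at two positions.
identity = percent_identity("MDKKYSIGLD", "MDKRYSIGLA")
print(identity)                      # 80.0
print(identity >= 70.0)              # meets an "at least 70%" threshold
```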

Cas13

The term “Cas13” or “Cas13 domain” embraces any naturally occurring Cas13 from any organism, any naturally-occurring Cas13 equivalent or functional fragment thereof, any Cas13 homolog, ortholog, or paralog from any organism, and any mutant or variant of a Cas13, naturally-occurring or engineered. The term Cas13 is not meant to be particularly limiting and may be referred to as a “Cas13 or equivalent.” Exemplary Cas13 proteins are further described herein and/or are described in the art and are incorporated herein by reference. The present disclosure is unlimited with regard to the particular napRNAbp that is employed in the RNA prime editors of the disclosure.

Complementarity

As used herein, the term “complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick or other non-traditional types. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
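As a worked example of the percent complementarity calculation, the following Python sketch counts Watson-Crick pairs between two RNA strands of equal length that are aligned antiparallel. It is a simplification: non-traditional pairing (for example, G-U wobble) is not counted, and the function name and example sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percentage of residues in `strand_a` that pair with `strand_b`.

    Both strands are written 5'->3' and are assumed to have equal length;
    they are compared antiparallel, so position i of `strand_a` faces
    position -(i + 1) of `strand_b`. Only Watson-Crick pairs are counted.
    """
    if len(strand_a) != len(strand_b) or not strand_a:
        raise ValueError("strands must have the same non-zero length")
    paired = sum(
        1 for x, y in zip(strand_a, reversed(strand_b)) if (x, y) in WATSON_CRICK
    )
    return 100.0 * paired / len(strand_a)


# 8 of 10 residues pair, i.e., 80% complementary.
print(percent_complementarity("AAGGCCUUAG", "AUAAGGCCUA"))  # 80.0
```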

CRISPR

CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by viruses that have invaded the prokaryote. The snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively compose, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. CRISPR biology, as well as Cas9 nuclease sequences and structures, are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.

In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular nucleic acid target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate embodiments of both the crRNA and tracrRNA into a single RNA species—the guide RNA.

In general, a “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.

RNA Synthesis Template

As used herein, the term “RNA synthesis template” refers to the region or portion of the extension arm of a rpegRNA that is utilized as a template strand by a polymerase of a RNA prime editor to encode a 3′ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site. In various embodiments, the DNA synthesis template is shown in FIG. 3A (in the context of a pegRNA comprising a 5′ extension arm), FIG. 3B (in the context of a pegRNA comprising a 3′ extension arm), FIG. 3C (in the context of an internal extension arm), FIG. 3D (in the context of a 3′ extension arm), and FIG. 3E (in the context of a 5′ extension arm). The extension arm, including the DNA synthesis template, may be comprised of DNA or RNA. In the case of RNA, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the prime editor can be a DNA-dependent DNA polymerase. In various embodiments (e.g., as depicted in FIGS. 3D-3E), the DNA synthesis template (4) may comprise the “edit template” and the “homology arm”, and all or a portion of the optional 5′ end modifier region, e2. That is, depending on the nature of the e2 region (e.g., whether it includes a hairpin, toeloop, or stem/loop secondary structure), the polymerase may encode none, some, or all of the e2 region, as well. Said another way, in the case of a 3′ extension arm, the DNA synthesis template (3) can include the portion of the extension arm (3) that spans from the 5′ end of the primer binding site (PBS) to 3′ end of the gRNA core that may operate as a template for the synthesis of a single-strand of DNA by a polymerase (e.g., a reverse transcriptase). 
In the case of a 5′ extension arm, the DNA synthesis template (3) can include the portion of the extension arm (3) that spans from the 5′ end of the pegRNA molecule to the 3′ end of the edit template. Preferably, the DNA synthesis template excludes the primer binding site (PBS) of pegRNAs either having a 3′ extension arm or a 5′ extension arm. Certain embodiments described here (e.g., FIG. 71A) refer to an “RT template,” which is inclusive of the edit template and the homology arm, i.e., the sequence of the pegRNA extension arm which is actually used as a template during DNA synthesis. The term “RT template” is equivalent to the term “DNA synthesis template.”

In the case of trans prime editing (e.g., FIG. 3G and FIG. 3H), the primer binding site (PBS) and the DNA synthesis template can be engineered into a separate molecule referred to as a trans prime editor RNA template (tPERT).

Downstream

As used herein, the terms “upstream” and “downstream” are terms of relativity that define the linear position of at least two elements located in a nucleic acid molecule (whether single or double-stranded) that is orientated in a 5′-to-3′ direction. In particular, a first element is upstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 5′ to the second element. For example, a SNP is upstream of a Cas9-induced nick site if the SNP is on the 5′ side of the nick site. Conversely, a first element is downstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 3′ to the second element. For example, a SNP is downstream of a Cas9-induced nick site if the SNP is on the 3′ side of the nick site. The nucleic acid molecule can be a DNA (double or single stranded), RNA (double or single stranded), or a hybrid of DNA and RNA. The analysis is the same for a single-stranded nucleic acid molecule and a double-stranded molecule since the terms upstream and downstream are in reference to only a single strand of a nucleic acid molecule, except that one needs to select which strand of the double stranded molecule is being considered. Often, the strand of a double stranded DNA which can be used to determine the positional relativity of at least two elements is the “sense” or “coding” strand. In genetics, a “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. Thus, as an example, a SNP nucleobase is “downstream” of a promoter sequence in a genomic DNA (which is double-stranded) if the SNP nucleobase is on the 3′ side of the promoter on the sense or coding strand.
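The upstream/downstream convention reduces to a comparison of coordinates along a single strand. A minimal Python sketch follows, assuming that positions are 0-based indices counted from the 5′ end of the selected strand; the function name and the example coordinates are hypothetical.

```python
def relative_position(first: int, second: int) -> str:
    """Describe where `first` lies relative to `second` on one strand.

    Assumption: both arguments are 0-based indices counted from the
    5' end of the same strand, so a smaller index is closer to the 5' end.
    """
    if first < second:
        return "upstream"      # first element is 5' of the second
    if first > second:
        return "downstream"    # first element is 3' of the second
    return "same position"


# Hypothetical coordinates: a SNP at index 10 and a cut site at index 25.
print(relative_position(10, 25))  # upstream
print(relative_position(40, 25))  # downstream
```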

Edit Template

The term “edit template” refers to a portion of the extension arm that encodes the desired edit in the single strand 3′ DNA flap that is synthesized by the polymerase, e.g., a DNA-dependent DNA polymerase, RNA-dependent DNA polymerase (e.g., a reverse transcriptase). Certain embodiments described here (e.g., FIG. 71A) refer to “an RT template,” which refers to both the edit template and the homology arm together, i.e., the sequence of the pegRNA extension arm which is actually used as a template during DNA synthesis. The term “RT edit template” is also equivalent to the term “DNA synthesis template,” but wherein the RT edit template reflects the use of a prime editor having a polymerase that is a reverse transcriptase, and wherein the DNA synthesis template reflects more broadly the use of a prime editor having any polymerase.

Effective Amount

The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a prime editor (PE) may refer to the amount of the editor that is sufficient to edit a target site nucleotide sequence, e.g., a genome. In some embodiments, an effective amount of a prime editor (PE) provided herein, e.g., of a fusion protein comprising a nickase Cas9 domain and a reverse transcriptase, may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nuclease, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, for example, on the desired biological response, e.g., on the specific allele, genome, or target site to be edited, on the cell or tissue being targeted, and on the agent being used.

Error-Prone Reverse Transcriptase

As used herein, the term “error-prone” reverse transcriptase (or more broadly, any polymerase) refers to a reverse transcriptase (or more broadly, any polymerase) that occurs naturally or which has been derived from another reverse transcriptase (e.g., a wild type M-MLV reverse transcriptase) which has an error rate that is greater than the error rate of wild type M-MLV reverse transcriptase. The error rate of wild type M-MLV reverse transcriptase is reported to be in the range of one error in 15,000 (higher) to 27,000 (lower). An error rate of 1 in 15,000 corresponds with an error rate of 6.7×10⁻⁵. An error rate of 1 in 27,000 corresponds with an error rate of 3.7×10⁻⁵. See Boutabout et al. (2001) “DNA synthesis fidelity by the reverse transcriptase of the yeast retrotransposon Ty1,” Nucleic Acids Res 29(11):2217-2222, which is incorporated herein by reference. Thus, for purposes of this application, the term “error prone” refers to those RT that have an error rate that is greater than one error in 15,000 nucleobase incorporations (6.7×10⁻⁵ or higher), e.g., 1 error in 14,000 nucleobases or fewer (7.14×10⁻⁵ or higher), 1 error in 13,000 nucleobases or fewer (7.7×10⁻⁵ or higher), 1 error in 12,000 nucleobases or fewer (8.3×10⁻⁵ or higher), 1 error in 11,000 nucleobases or fewer (9.1×10⁻⁵ or higher), 1 error in 10,000 nucleobases or fewer (1×10⁻⁴ or 0.0001 or higher), 1 error in 9,000 nucleobases or fewer (0.00011 or higher), 1 error in 8,000 nucleobases or fewer (0.00013 or higher), 1 error in 7,000 nucleobases or fewer (0.00014 or higher), 1 error in 6,000 nucleobases or fewer (0.00017 or higher), 1 error in 5,000 nucleobases or fewer (0.0002 or higher), 1 error in 4,000 nucleobases or fewer (0.00025 or higher), 1 error in 3,000 nucleobases or fewer (0.00033 or higher), 1 error in 2,000 nucleobases or fewer (0.00050 or higher), or 1 error in 1,000 nucleobases or fewer (0.001 or higher), or 1 error in 500 nucleobases or fewer (0.002 or higher), or 1 error in 250 nucleobases or fewer (0.004 or higher).
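The conversion between a “1 error in N nucleobases” figure and a per-nucleobase error rate is simple arithmetic, and may be sketched as follows. The names `error_rate` and `is_error_prone` are hypothetical and chosen only for illustration; the threshold of one error in 15,000 nucleobases is taken from the definition above.

```python
# Threshold from the definition above: the higher end of the range
# reported for wild type M-MLV reverse transcriptase.
WILD_TYPE_BASES_PER_ERROR = 15_000


def error_rate(bases_per_error: int) -> float:
    """Convert '1 error in N nucleobases' into a per-nucleobase error rate."""
    return 1 / bases_per_error


def is_error_prone(bases_per_error: int) -> bool:
    """True if the error rate is greater than 1 error in 15,000 nucleobases."""
    # Fewer nucleobases per error means a higher error rate.
    return bases_per_error < WILD_TYPE_BASES_PER_ERROR


for n in (27_000, 15_000, 14_000, 12_000, 10_000, 1_000, 250):
    print(f"1 error in {n:>6,}: rate = {error_rate(n):.2e}, "
          f"error-prone = {is_error_prone(n)}")
```

The comparison is made on the integer N rather than on the floating-point rate so that the boundary case of exactly one error in 15,000 is not affected by rounding.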

Extein

The term “extein,” as used herein, refers to a polypeptide sequence that is flanked by an intein and is ligated to another extein during the process of protein splicing to form a mature, spliced protein. Typically, an intein is flanked by two extein sequences that are ligated together when the intein catalyzes its own excision. Exteins, accordingly, are the protein analog to exons found in mRNA. For example, a polypeptide comprising an intein may be of the structure extein(N)-intein-extein(C). After excision of the intein and splicing of the two exteins, the resulting structures are extein(N)-extein(C) and a free intein. In various configurations, the exteins may be separate proteins (e.g., half of a Cas9 or Prime editor), each fused to a split-intein, wherein the excision of the split inteins causes the splicing together of the extein sequences.
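As a purely schematic illustration of the extein(N)-intein-extein(C) arrangement, the outcome of splicing can be modeled as a string operation. The function name `splice` and the toy sequences below are hypothetical placeholders, not sequences from this disclosure, and the sketch does not model the chemistry of protein splicing.

```python
from typing import Tuple


def splice(precursor: str, intein: str) -> Tuple[str, str]:
    """Excise `intein` from `precursor` and join the flanking exteins.

    Returns (spliced_protein, free_intein), mirroring the products
    extein(N)-extein(C) and a free intein.
    """
    start = precursor.find(intein)
    if start == -1:
        raise ValueError("intein not found in precursor")
    extein_n = precursor[:start]                  # extein(N)
    extein_c = precursor[start + len(intein):]    # extein(C)
    return extein_n + extein_c, intein


# Toy placeholder sequences (hypothetical, for illustration only).
precursor = "MSTART" + "CINTEINN" + "SFINISH"
spliced, free_intein = splice(precursor, "CINTEINN")
print(spliced)      # extein(N) joined directly to extein(C)
print(free_intein)  # the excised intein
```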

Extension Arm

The term “extension arm” refers to a nucleotide sequence component of a pegRNA which provides several functions, including a primer binding site and an edit template for reverse transcriptase. In some embodiments, e.g., FIG. 3D, the extension arm is located at the 3′ end of the guide RNA. In other embodiments, e.g., FIG. 3E, the extension arm is located at the 5′ end of the guide RNA. In some embodiments, the extension arm also includes a homology arm. In various embodiments, the extension arm comprises the following components in a 5′ to 3′ direction: the homology arm, the edit template, and the primer binding site. Since polymerization activity of the reverse transcriptase is in the 5′ to 3′ direction, the preferred arrangement of the homology arm, edit template, and primer binding site is in the 5′ to 3′ direction such that the reverse transcriptase, once primed by an annealed primer sequence, polymerizes a single strand of DNA using the edit template as a complementary template strand. Further details, such as the length of the extension arm, are described elsewhere herein.

The extension arm may also be described as comprising generally two regions: a primer binding site (PBS) and a DNA synthesis template, as shown in FIG. 3G (top), for instance. The primer binding site binds to the primer sequence that is formed from the endogenous DNA strand of the target site when it becomes nicked by the prime editor complex, thereby exposing a 3′ end on the endogenous nicked strand. As explained herein, the binding of the primer sequence to the primer binding site on the extension arm of the pegRNA creates a duplex region with an exposed 3′ end (i.e., the 3′ of the primer sequence), which then provides a substrate for a polymerase to begin polymerizing a single strand of DNA from the exposed 3′ end along the length of the DNA synthesis template. The sequence of the single strand DNA product is the complement of the DNA synthesis template. Polymerization continues towards the 5′ of the DNA synthesis template (or extension arm) until polymerization terminates. Thus, the DNA synthesis template represents the portion of the extension arm that is encoded into a single strand DNA product (i.e., the 3′ single strand DNA flap containing the desired genetic edit information) by the polymerase of the prime editor complex and which ultimately replaces the corresponding endogenous DNA strand of the target site that sits immediately downstream of the PE-induced nick site. Without being bound by theory, polymerization of the DNA synthesis template continues towards the 5′ end of the extension arm until a termination event. Polymerization may terminate in a variety of ways, including, but not limited to (a) reaching a 5′ terminus of the pegRNA (e.g., in the case of the 5′ extension arm wherein the DNA polymerase simply runs out of template), (b) reaching an impassable RNA secondary structure (e.g., hairpin or stem/loop), or (c) reaching a replication termination signal, e.g., a specific nucleotide sequence that blocks or inhibits the polymerase, or a nucleic acid topological signal, such as, supercoiled DNA or RNA.
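The relationship between the DNA synthesis template and the single strand DNA product can be illustrated with a minimal sketch. It assumes, for illustration only, that the extension arm is written 5′ to 3′ with the primer binding site at its 3′ end, and that polymerization runs to the 5′ terminus of the template (termination case (a) above). The function names and the toy sequences are hypothetical.

```python
# Complement of each RNA template base as incorporated into the DNA product.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def split_extension_arm(extension_arm: str, pbs_length: int):
    """Split a 5'->3' extension arm into (DNA synthesis template, PBS)."""
    if not 0 < pbs_length < len(extension_arm):
        raise ValueError("pbs_length must be between 1 and len(extension_arm) - 1")
    return extension_arm[:-pbs_length], extension_arm[-pbs_length:]


def synthesize_flap(dna_synthesis_template: str) -> str:
    """Return the 3' DNA flap (written 5'->3') templated by the RNA template.

    The polymerase reads the template 3'->5', so the product is the
    reverse complement of the template, written as DNA.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base]
                   for base in reversed(dna_synthesis_template))


# Toy extension arm (hypothetical): 8-nt template followed by an 8-nt PBS.
template, pbs = split_extension_arm("AUGGCUUC" + "GACUGACU", pbs_length=8)
print(template, pbs)
print(synthesize_flap(template))
```

The sketch omits the other termination modes described above (secondary structure or termination signals), which would truncate the product before the 5′ terminus of the template is reached.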

Flap Endonuclease (e.g., FEN1)

As used herein, the term “flap endonuclease” refers to an enzyme that catalyzes the removal of 5′ single strand DNA flaps. These are naturally occurring enzymes that process the removal of 5′ flaps formed during cellular processes, including DNA replication. The prime editing methods herein described may utilize endogenously supplied flap endonucleases or those provided in trans to remove the 5′ flap of endogenous DNA formed at the target site during prime editing. Flap endonucleases are known in the art and can be found described in Patel et al., “Flap endonucleases pass 5′-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5′-ends,” Nucleic Acids Research, 2012, 40(10): 4507-4519, Tsutakawa et al., “Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily,” Cell, 2011, 145(2): 198-211, and Balakrishnan et al., “Flap Endonuclease 1,” Annu Rev Biochem, 2013, Vol 82: 119-138 (each of which is incorporated herein by reference). An exemplary flap endonuclease is FEN1, which can be represented by the following amino acid sequence:

FEN1 WILD TYPE (SEQ ID NO: 53):

MGIQGLAKLIADVAPSAIRENDIKSYFGRKVAIDASMSI

YQFLIAVRQGGDVLQNEEGETTSHLMGMFYRTIRMME

NGIKPVYVFDGKPPQLKSGELAKRSERRAEAEKQLQQ

AQAAGAEQEVEKFTKRLVKVTKQHNDECKHLLSLMGI

PYLDAPSEAEASCAALVKAGKVYAAATEDMDCLTFGS

PVLMRHLTASEAKKLPIQEFHLSRILQELGLNQEQFVDL

CILLGSDYCESIRGIGPKRAVDLIQKHKSIEEIVRRLDPN

KYPVPENWLHKEAHQLFLEPEVLDPESVELKWSEPNE

EELIKFMCGEKQFSEERIRSGVKRLSKSRQGSTQGRLD

DFFKVTGSLSSAKRKEPEPKGSTKKKAKTGAAGKFKR

GK

Functional Equivalent

The term “functional equivalent” refers to a second biomolecule that is equivalent in function, but not necessarily equivalent in structure to a first biomolecule. For example, a “Cas9 equivalent” refers to a protein that has the same or substantially the same functions as Cas9, but not necessarily the same amino acid sequence. In the context of the disclosure, the specification refers throughout to “a protein X, or a functional equivalent thereof.” In this context, a “functional equivalent” of protein X embraces any homolog, paralog, fragment, naturally occurring, engineered, mutated, or synthetic version of protein X which bears an equivalent function.

Fusion Protein

The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. Another example includes a Cas9 or equivalent thereof fused to a reverse transcriptase. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

Gene of Interest (GOI)

The term “gene of interest” or “GOI” refers to a gene that encodes a biomolecule of interest (e.g., a protein or an RNA molecule). A protein of interest can include any intracellular protein, membrane protein, or extracellular protein, e.g., a nuclear protein, transcription factor, nuclear membrane transporter, intracellular organelle associated protein, a membrane receptor, a catalytic protein, an enzyme, a therapeutic protein, a membrane protein, a membrane transport protein, a signal transduction protein, or an immunological protein (e.g., an IgG or other antibody protein), etc. The gene of interest may also encode an RNA molecule, including, but not limited to, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), antisense RNA, guide RNA, microRNA (miRNA), small interfering RNA (siRNA), and cell-free RNA (cfRNA).

Guide RNA (“gRNA”)

As used herein, the term “guide RNA” is a particular type of guide nucleic acid which is most commonly associated with a Cas protein of a CRISPR-Cas9 system and which associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence of the guide RNA. However, this term also embraces the equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas9 equivalent to localize to a specific target nucleotide sequence. The Cas9 equivalents may include other napDNAbp from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system) and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing appropriate guide RNA sequences are provided herein. As used herein, the “guide RNA” may also be referred to as a “traditional guide RNA” to contrast it with the modified forms of guide RNA termed “prime editing guide RNAs” (or “pegRNAs”) which have been invented for the prime editing methods and compositions disclosed herein.

Guide RNAs or pegRNAs may comprise various structural elements that include, but are not limited to:

Spacer sequence—the sequence in the guide RNA or pegRNA (having about 20 nts in length) which binds to the protospacer in the target DNA.

gRNA core (or gRNA scaffold or backbone sequence)—refers to the sequence within the gRNA that is responsible for Cas9 binding; it does not include the 20 bp spacer/targeting sequence that is used to guide Cas9 to target DNA.

Extension arm—a single strand extension at the 3′ end or the 5′ end of the pegRNA which comprises a primer binding site and a DNA synthesis template sequence that encodes via a polymerase (e.g., a reverse transcriptase) a single stranded DNA flap containing the genetic change of interest, which then integrates into the endogenous DNA by replacing the corresponding endogenous strand, thereby installing the desired genetic change.

Transcription terminator—the guide RNA or pegRNA may comprise a transcriptional termination sequence at the 3′ of the molecule.
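The arrangement of these structural elements can be sketched as a simple data structure. The class name `PegRNA`, its field names, and the placeholder sequences are hypothetical and used only for illustration; in particular, the ordering used for a 5′ extension arm (extension arm, then spacer, then gRNA core) is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PegRNA:
    """Schematic pegRNA built from the elements listed above (all 5'->3')."""
    spacer: str
    scaffold: str                      # gRNA core / backbone sequence
    extension_arm: str                 # PBS plus DNA synthesis template
    terminator: str = ""               # optional, at the 3' end of the molecule
    extension_at_3_prime: bool = True  # False places the arm at the 5' end

    def sequence(self) -> str:
        """Concatenate the elements in 5'->3' order."""
        if self.extension_at_3_prime:
            parts = [self.spacer, self.scaffold, self.extension_arm, self.terminator]
        else:
            parts = [self.extension_arm, self.spacer, self.scaffold, self.terminator]
        return "".join(parts)


# Placeholder sequences (hypothetical), chosen only to make the order visible.
peg = PegRNA(spacer="AAAA", scaffold="CCCC", extension_arm="GGGG", terminator="UUUU")
print(peg.sequence())
```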

G-Quadruplex

The term “G-quadruplex” refers to its ordinary and customary meaning. A G-quadruplex is a complex three-dimensional nucleic acid moiety formed in nucleic acid sequences that are rich in guanine (G). They are helical in shape and formed from interconnected stacks of guanine tetrads (or “G-tetrads”), which individually are flat, ring-shaped structures formed from four guanines, and which can be stabilized by the presence of a cation (e.g., potassium) which sits in a central channel between pairs of G-tetrads. G-quadruplexes are a diverse collection of structures and not a single structure. Further reference to G-quadruplexes can be found in (1) Kwok et al., “G-Quadruplexes: Prediction, Characterization, and Biological Application,” Trends in Biotechnology, 2017, Vol. 35(10), pp. 997-1013; (2) Hansel-Hertsch R. et al., “DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential,” Nat. Rev. Mol. Cell Biol., 2017; 18: 279-284; and (3) Millevoi S. et al., “G-quadruplexes in RNA biology,” Wiley Interdiscip. Rev. RNA., 2012; 3: 495-507, each of which is incorporated herein by reference.
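Guanine-rich sequences that might form a G-quadruplex are often screened with a simple sequence heuristic. The sketch below assumes one commonly used motif, four runs of at least three guanines separated by loops of one to seven nucleotides; this motif, the pattern name `G4_PATTERN`, and the function name `find_putative_g4` are assumptions for illustration and are not defined by this disclosure. A sequence match does not establish that a G-quadruplex actually forms.

```python
import re

# Assumed heuristic motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}
G4_PATTERN = re.compile(r"G{3,}(?:[ACGTU]{1,7}G{3,}){3}")


def find_putative_g4(sequence: str):
    """Return (start_index, matched_sequence) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(sequence.upper())]


# Toy input containing four GGG runs separated by three-nucleotide loops.
print(find_putative_g4("AAGGGTTAGGGTTAGGGTTAGGGAA"))
print(find_putative_g4("ACGTACGTACGT"))  # no guanine runs, so no match
```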

Homology Arm

The term “homology arm” refers to a portion of the extension arm that encodes a portion of the resulting reverse transcriptase-encoded single strand DNA flap that is to be integrated into the target DNA site by replacing the endogenous strand. The portion of the single strand DNA flap encoded by the homology arm is complementary to the non-edited strand of the target DNA sequence, which facilitates the displacement of the endogenous strand and annealing of the single strand DNA flap in its place, thereby installing the edit. This component is further defined elsewhere. The homology arm is part of the DNA synthesis template since it is by definition encoded by the polymerase of the prime editors described herein.

Host Cell

The term “host cell,” as used herein, refers to a cell that can host, replicate, and express a vector described herein, e.g., a vector comprising a nucleic acid molecule encoding a fusion protein comprising a Cas9 or Cas9 equivalent and a reverse transcriptase.

Inteins

As used herein, the term “intein” refers to auto-processing polypeptide domains found in organisms from all domains of life. An intein (intervening protein) carries out a unique auto-processing event known as protein splicing in which it excises itself out from a larger precursor polypeptide through the cleavage of two peptide bonds and, in the process, ligates the flanking extein (external protein) sequences through the formation of a new peptide bond. This rearrangement occurs post-translationally (or possibly co-translationally), as intein genes are found embedded in frame within other protein-coding genes. Furthermore, intein-mediated protein splicing is spontaneous; it requires no external factor or energy source, only the folding of the intein domain. This process is also known as cis-protein splicing, as opposed to the natural process of trans-protein splicing with “split inteins.” Inteins are the protein equivalent of the self-splicing RNA introns (see Perler et al., Nucleic Acids Res. 22:1125-1127 (1994)), which catalyze their own excision from a precursor protein with the concomitant fusion of the flanking protein sequences, known as exteins (reviewed in Perler et al., Curr. Opin. Chem. Biol. 1:292-299 (1997); Perler, F. B. Cell 92(1):1-4 (1998); Xu et al., EMBO J. 15(19):5146-5153 (1996)).

As used herein, the term “protein splicing” refers to a process in which an interior region of a precursor protein (an intein) is excised and the flanking regions of the protein (exteins) are ligated to form the mature protein. This natural process has been observed in numerous proteins from both prokaryotes and eukaryotes (Perler, F. B., Xu, M. Q., Paulus, H. Current Opinion in Chemical Biology 1997, 1, 292-299; Perler, F. B. Nucleic Acids Research 1999, 27, 346-347). The intein unit contains the necessary components needed to catalyze protein splicing and often contains an endonuclease domain that participates in intein mobility (Perler, F. B., Davis, E. O., Dean, G. E., Gimble, F. S., Jack, W. E., Neff, N., Noren, C. J., Thomer, J., Belfort, M. Nucleic Acids Research 1994, 22, 1125-1127). The resulting proteins are linked, however, and are not expressed as separate proteins. Protein splicing may also be conducted in trans with split inteins expressed on separate polypeptides, which spontaneously combine to form a single intein that then undergoes the protein splicing process to join the separate proteins.

The elucidation of the mechanism of protein splicing has led to a number of intein-based applications (Comb, et al., U.S. Pat. No. 5,496,714; Comb, et al., U.S. Pat. No. 5,834,247; Camarero and Muir, J. Amer. Chem. Soc., 121:5597-5598 (1999); Chong, et al., Gene, 192:271-281 (1997), Chong, et al., Nucleic Acids Res., 26:5109-5115 (1998); Chong, et al., J. Biol. Chem., 273:10567-10577 (1998); Cotton, et al. J. Am. Chem. Soc., 121:1100-1101 (1999); Evans, et al., J. Biol. Chem., 274:18359-18363 (1999); Evans, et al., J. Biol. Chem., 274:3923-3926 (1999); Evans, et al., Protein Sci., 7:2256-2264 (1998); Evans, et al., J. Biol. Chem., 275:9091-9094 (2000); Iwai and Pluckthun, FEBS Lett. 459:166-172 (1999); Mathys, et al., Gene, 231:1-13 (1999); Mills, et al., Proc. Natl. Acad. Sci. USA 95:3543-3548 (1998); Muir, et al., Proc. Natl. Acad. Sci. USA 95:6705-6710 (1998); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999); Severinov and Muir, J. Biol. Chem., 273:16205-16209 (1998); Shingledecker, et al., Gene, 207:187-195 (1998); Southworth, et al., EMBO J. 17:918-926 (1998); Southworth, et al., Biotechniques, 27:110-120 (1999); Wood, et al., Nat. Biotechnol., 17:889-892 (1999); Wu, et al., Proc. Natl. Acad. Sci. USA 95:9226-9231 (1998a); Wu, et al., Biochim Biophys Acta 1387:422-432 (1998b); Xu, et al., Proc. Natl. Acad. Sci. USA 96:388-393 (1999); Yamazaki, et al., J. Am. Chem. Soc., 120:5591-5592 (1998)). Each reference is incorporated herein by reference.

Ligand-Dependent Intein

The term “ligand-dependent intein,” as used herein refers to an intein that comprises a ligand-binding domain. Typically, the ligand-binding domain is inserted into the amino acid sequence of the intein, resulting in a structure intein (N)-ligand-binding domain-intein (C). Typically, ligand-dependent inteins exhibit no or only minimal protein splicing activity in the absence of an appropriate ligand, and a marked increase of protein splicing activity in the presence of the ligand. In some embodiments, the ligand-dependent intein does not exhibit observable splicing activity in the absence of ligand but does exhibit splicing activity in the presence of the ligand. In some embodiments, the ligand-dependent intein exhibits an observable protein splicing activity in the absence of the ligand, and a protein splicing activity in the presence of an appropriate ligand that is at least 5 times, at least 10 times, at least 50 times, at least 100 times, at least 150 times, at least 200 times, at least 250 times, at least 500 times, at least 1000 times, at least 1500 times, at least 2000 times, at least 2500 times, at least 5000 times, at least 10000 times, at least 20000 times, at least 25000 times, at least 50000 times, at least 100000 times, at least 500000 times, or at least 1000000 times greater than the activity observed in the absence of the ligand. In some embodiments, the increase in activity is dose dependent over at least 1 order of magnitude, at least 2 orders of magnitude, at least 3 orders of magnitude, at least 4 orders of magnitude, or at least 5 orders of magnitude, allowing for fine-tuning of intein activity by adjusting the concentration of the ligand. Suitable ligand-dependent inteins are known in the art, and include those provided below and those described in published U.S. Patent Application U.S. 2014/0065711 A1; Mootz et al., “Protein splicing triggered by a small molecule.” J. Am. Chem. Soc. 2002; 124, 9044-9045; Mootz et al., “Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo.” J. Am. Chem. Soc. 2003; 125, 10561-10569; Buskirk et al., Proc. Natl. Acad. Sci. USA. 2004; 101, 10505-10510; Skretas & Wood, “Regulation of protein activity with small-molecule-controlled inteins.” Protein Sci. 2005; 14, 523-532; Schwartz, et al., “Post-translational enzyme activation in an animal via optimized conditional protein splicing.” Nat. Chem. Biol. 2007; 3, 50-54; Peck et al., Chem. Biol. 2011; 18 (5), 619-630; the entire contents of each are hereby incorporated by reference. Exemplary sequences are as follows:

NAME
SEQUENCE OF LIGAND-DEPENDENT INTEIN

2-4 INTEIN:
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVV

SWFDQGTRDVIGLRIAGGAIVWATPDHKVLTEYGWRAAGELRKGD

RVAGPGGSGNSLALSLTADQMVSALLDAEPPILYSEYDPTSPFSEAS

MMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLECAWLEI

LMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRA

LDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEH

LYSMKYKNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADALDD

KFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVAEGVVVH

NC (SEQ ID NO: 42)

3-2 INTEIN
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAVAKDGTLLARPVVS

WFDQGTRDVIGLRIAGGAIVWATPDHKVLTEYGWRAAGELRKGDR

VAGPGGSGNSLALSLTADQMVSALLDAEPPILYSEYDPTSPFSEASM

MGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLECAWLEIL

MIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATS

SRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRAL

DKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHL

YSMKYTNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADALDDK

FLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVAEGVVVHN

C (SEQ ID NO: 43)

30R3-1 INTEIN
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVV

SWFDQGTRDVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKG

DRVAGPGGSGNSLALSLTADQMVSALLDAEPPIPYSEYDPTSPFSEA

SMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLECAWL

EILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLL

ATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIH

RALDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGM

EHLYSMKYKNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADAL

DDKFLHDMLAEGLRYSVIREVLPTRRARTFDLEVEELHTLVAEGVV

VHNC (SEQ ID NO: 44)

30R3-2 INTEIN
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVV

SWFDQGTRDVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKG

DRVAGPGGSGNSLALSLTADQMVSALLDAEPPILYSEYDPTSPFSEA

SMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLECAWL

EILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLL

ATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIH

RALDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGM

EHLYSMKYKNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADAL

DDKFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVAEGVV

VHNC (SEQ ID NO: 45)

30R3-3 INTEIN
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVV

SWFDQGTRDVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKG

DRVAGPGGSGNSLALSLTADQMVSALLDAEPPIPYSEYDPTSPFSEA

SMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLECAWL

EILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLL

ATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIH

RALDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGM

EHLYSMKYKNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADAL

DDKFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVAEGVV

VHNC (SEQ ID NO: 46)

37R3-1 INTEIN
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVV

SWFDQGTRDVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKG

DRVAGPGGSGNSLALSLTADQMVSALLDAEPPILYSEYNPTSPFSEA

SMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLERAWL

EILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLL

ATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIH

RALDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGM

EHLYSMKYKNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADAL

DDKFLHDMLAEGLRYSVIREVLPTRRARTFDLEVEELHTLVAEGVV

VHNC (SEQ ID NO: 47)

37R3-2 INTEIN
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVV

SWFDQGTRDVIGLRIAGGAIVWATPDHKVLTEYGWRAAGELRKGD

RVAGPGGSGNSLALSLTADQMVSALLDAEPPILYSEYDPTSPFSEAS

MMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLERAWLEI

LMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRA

LDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEH

LYSMKYKNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADALDD

KFLHDMLAEGLRYSVIREVLPTRRARTFDLEVEELHTLVAEGVVVH

NC (SEQ ID NO: 48)

37R3-3 INTEIN
CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAVAKDGTLLARPVVS

WFDQGTRDVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKGD

RVAGPGGSGNSLALSLTADQMVSALLDAEPPILYSEYDPTSPFSEAS

MMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQAHLLERAWLEI

LMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRA

LDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEH

LYSMKYKNVVPLYDLLLEMLDAHRLHAGGSGASRVQAFADALDD

KFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVAEGVVVH

NC (SEQ ID NO: 49)

Linker

The term “linker,” as used herein, refers to a molecule linking two other molecules or moieties. The linker can be an amino acid sequence in the case of a linker joining two fusion proteins. For example, a Cas9 can be fused to a reverse transcriptase by an amino acid linker sequence. The linker can also be a nucleotide sequence in the case of joining two nucleotide sequences together. For example, in the instant case, the traditional guide RNA is linked via a spacer or linker nucleotide sequence to the RNA extension of a prime editing guide RNA which may comprise a RT template sequence and an RT primer binding site. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.

Isolated

“Isolated” means altered or removed from the natural state. For example, a nucleic acid or a peptide naturally present in a living animal is not “isolated,” but the same nucleic acid or peptide partially or completely separated from the coexisting materials of its natural state is “isolated.” An isolated nucleic acid or protein can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.

In some embodiments, a gene of interest is encoded by an isolated nucleic acid. As used herein, the term “isolated” refers to the characteristic of a material as provided herein being removed from its original or native environment (e.g., the natural environment if it is naturally occurring). Therefore, a naturally-occurring polynucleotide or protein or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated by human intervention from some or all of the coexisting materials in the natural system, is isolated. An artificial or engineered material, for example, a non-naturally occurring nucleic acid construct, such as the expression constructs and vectors described herein, is, accordingly, also referred to as isolated. A material does not have to be purified in order to be isolated. Accordingly, a material may be part of a vector and/or part of a composition, and still be isolated in that such vector or composition is not part of the environment in which the material is found in nature.

MS2 Tagging Technique

In various embodiments (e.g., as depicted in the embodiments of FIGS. 72-73 and in Example 19), the term “MS2 tagging technique” refers to the combination of an “RNA-protein interaction domain” (aka “RNA-protein recruitment domain or protein”) paired up with an RNA-binding protein that specifically recognizes and binds to the RNA-protein interaction domain, e.g., a specific hairpin structure. These types of systems can be leveraged to recruit a variety of functionalities to a prime editor complex that is bound to a target site. The MS2 tagging technique is based on the natural interaction of the MS2 bacteriophage coat protein (“MCP” or “MS2cp”) with a stem-loop or hairpin structure present in the genome of the phage, i.e., the “MS2 hairpin.” In the case of prime editing, the MS2 tagging technique comprises introducing the MS2 hairpin into a desired RNA molecule involved in prime editing (e.g., a pegRNA or a tPERT), which then constitutes a specific interactable binding target for an RNA-binding protein that recognizes and binds to that structure. In the case of the MS2 hairpin, it is recognized and bound by the MS2 bacteriophage coat protein (MCP). And, if MCP is fused to another protein (e.g., a reverse transcriptase or other DNA polymerase), then the MS2 hairpin may be used to “recruit” that other protein in trans to the target site occupied by the prime editing complex.

The prime editors described herein may incorporate as an aspect any known RNA-protein interaction domain to recruit or “co-localize” specific functions of interest to a prime editor complex. Other modular RNA-protein interaction domains are described in the art, for example, in Johansson et al., “RNA recognition by the MS2 phage coat protein,” Sem Virol., 1997, Vol. 8(3): 176-185; Delebecque et al., “Organization of intracellular reactions with rationally designed RNA assemblies,” Science, 2011, Vol. 333: 470-474; Mali et al., “Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering,” Nat. Biotechnol., 2013, Vol. 31: 833-838; and Zalatan et al., “Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds,” Cell, 2015, Vol. 160: 339-350, each of which is incorporated herein by reference in its entirety. Other systems include the PP7 hairpin, which specifically recruits the PCP protein, and the “com” hairpin, which specifically recruits the Com protein. See Zalatan et al.

The nucleotide sequence of the MS2 hairpin (or equivalently referred to as the “MS2 aptamer”) is:

(SEQ ID NO: 51)

GCCAACATGAGGATCACCCATGTCTGCAGGGCC.
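As an illustration of why this sequence can fold into a hairpin, the following minimal Python sketch scans the sequence for a short segment whose reverse complement occurs further downstream, which is the signature of two base-pairing stem arms. The function names, the 5-nucleotide arm length, and the 3-nucleotide minimum loop are assumptions chosen for the example; this is a simple complementarity check, not a structure prediction.

```python
# Watson-Crick complements in the DNA alphabet used to write SEQ ID NO: 51.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# MS2 hairpin as written in the text (SEQ ID NO: 51).
MS2_HAIRPIN_DNA = "GCCAACATGAGGATCACCCATGTCTGCAGGGCC"


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def to_rna(dna: str) -> str:
    """Rewrite a DNA-alphabet sequence in the RNA alphabet (T -> U)."""
    return dna.replace("T", "U")


def find_stem_arms(dna: str, arm_len: int = 5, min_loop: int = 3):
    """Return (i, j): the first arm starting at i whose reverse complement
    starts at j, at least min_loop bases downstream. None if no pair exists.
    arm_len and min_loop are illustrative assumptions."""
    for i in range(len(dna) - arm_len + 1):
        arm = dna[i:i + arm_len]
        j = dna.find(reverse_complement(arm), i + arm_len + min_loop)
        if j != -1:
            return i, j
    return None


arms = find_stem_arms(MS2_HAIRPIN_DNA)
if arms is not None:
    i, j = arms
    print("5' arm:", to_rna(MS2_HAIRPIN_DNA[i:i + 5]))
    print("3' arm:", to_rna(MS2_HAIRPIN_DNA[j:j + 5]))
```

The check ignores bulged and non-canonical pairs, so it only identifies perfectly complementary arm segments.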

The amino acid sequence of the MCP or MS2cp is:

(SEQ ID NO: 52)

GSASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYKVTCS

VRQSSAQNRKYTIKVEVPKVATQTVGGEELPVAGWRSYLNMELTIPIFA

TNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY.

The MS2 hairpin (or “MS2 aptamer”) may also be referred to as a type of “RNA effector recruitment domain” (or equivalently as “RNA-binding protein recruitment domain” or simply as “recruitment domain”) since it is a physical structure (e.g., a hairpin) that is installed into a pegRNA or tPERT that effectively recruits other effector functions (e.g., RNA-binding proteins having various functions, such as DNA polymerases or other DNA-modifying enzymes) to the pegRNA or tPERT that is so modified, thus co-localizing effector functions in trans to the prime editing machinery. This application is not intended to be limited in any way to any particular RNA effector recruitment domains and may include any available such domain, including the MS2 hairpin. Example 19 and FIG. 72(b) depict the use of the MS2 aptamer joined to a DNA synthesis domain (i.e., the tPERT molecule) and a prime editor that comprises an MS2cp protein fused to a PE2 to cause the co-localization of the prime editor complex (MS2cp-PE2:sgRNA complex) bound to the target DNA site and the DNA synthesis domain of the tPERT molecule to effectuate the

napDNAbp

As used herein, the term “nucleic acid programmable DNA binding protein” or “napDNAbp,” of which Cas9 is an example, refers to a protein that uses RNA:DNA hybridization to target and bind to specific sequences in a DNA molecule. Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA). In other words, the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence.

Without being bound by theory, the binding mechanism of a napDNAbp-guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase” (“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”). Exemplary sequences for these and other napDNAbp are provided herein.

Nickase

The term “nickase” refers to a Cas9 with one of the two nuclease domains inactivated. This enzyme is capable of cleaving only one strand of a target DNA.

Nuclear Localization Sequence (NLS)

The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed Nov. 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for its disclosure of exemplary nuclear localization sequences. In some embodiments, a NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 50) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 25).

Nucleic Acid Molecule

The term “nucleic acid,” as used herein, refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7 deazaadenosine, 7 deazaguanosine, 8 oxoadenosine, 8 oxoguanosine, O(6)-methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine, N6-methyl adenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, 2′-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5′ N phosphoramidite linkages).

Nucleotide Structural Motifs (or Nucleic Acid Moiety)

As used herein, the term “nucleotide structural motif” or equivalently, “nucleic acid moiety,” refers to a nucleic acid molecule or a portion thereof, which forms a secondary or tertiary structure due to basepairing interactions within a single nucleic acid polymer or between two or more nucleic acid polymers. Such nucleotide structural motifs can be formed from DNA, RNA, or a hybrid of DNA and RNA. The term is not meant to refer to standard DNA double-helices. Examples of nucleic acid moieties include, but are not limited to, a toe-loop, hairpin, stem-loop, pseudoknot, aptamer, G-quadruplex, tRNA, ribozyme, riboswitch, A-form DNA, B-form DNA, or Z-form DNA.

pegRNA

As used herein, the terms “prime editing guide RNA” or “pegRNA” refer to a specialized form of a guide RNA that has been modified to include one or more additional sequences for implementing the prime editing methods and compositions described herein. As described herein, the prime editing guide RNA comprises one or more “extended regions” of nucleic acid sequence. The extended regions may comprise, but are not limited to, single-stranded RNA or DNA. Further, the extended regions may occur at the 3′ end of a traditional guide RNA. In other arrangements, the extended regions may occur at the 5′ end of a traditional guide RNA. In still other arrangements, the extended region may occur at an intramolecular region of the traditional guide RNA, for example, in the gRNA core region which associates and/or binds to the napDNAbp. The extended region comprises a “DNA synthesis template” which encodes (by the polymerase of the prime editor) a single-stranded DNA which, in turn, has been designed to be (a) homologous with the endogenous target DNA to be edited, and (b) which comprises at least one desired nucleotide change (e.g., a transition, a transversion, a deletion, or an insertion) to be introduced or integrated into the endogenous target DNA. The extended region may also comprise other functional sequence elements, such as, but not limited to, a “primer binding site” and a “spacer or linker” sequence, or other structural elements, such as, but not limited to, aptamers, stem loops, hairpins, toe loops (e.g., a 3′ toeloop), or an RNA-protein recruitment domain (e.g., MS2 hairpin). As used herein, the “primer binding site” comprises a sequence that hybridizes to a single-strand DNA sequence having a 3′ end generated from the nicked DNA of the R-loop.

In certain embodiments, the pegRNAs are represented by FIG. 3A, which shows a pegRNA having a 5′ extension arm, a spacer, and a gRNA core. The 5′ extension further comprises in the 5′ to 3′ direction a reverse transcriptase template, a primer binding site, and a linker. As shown, the reverse transcriptase template may also be referred to more broadly as the “DNA synthesis template” where the polymerase of a prime editor described herein is not an RT, but another type of polymerase.

In certain other embodiments, the pegRNAs are represented by FIG. 3B, which shows a pegRNA having a 5′ extension arm, a spacer, and a gRNA core. The 5′ extension further comprises in the 5′ to 3′ direction a reverse transcriptase template, a primer binding site, and a linker. As shown, the reverse transcriptase template may also be referred to more broadly as the “DNA synthesis template” where the polymerase of a prime editor described herein is not an RT, but another type of polymerase.

In still other embodiments, the pegRNAs are represented by FIG. 3D, which shows a pegRNA having in the 5′ to 3′ direction a spacer (1), a gRNA core (2), and an extension arm (3). The extension arm (3) is at the 3′ end of the pegRNA. The extension arm (3) further comprises in the 5′ to 3′ direction a “primer binding site” (A), an “edit template” (B), and a “homology arm” (C). The extension arm (3) may also comprise an optional modifier region at the 3′ and 5′ ends, which may be the same sequences or different sequences. In addition, the 3′ end of the pegRNA may comprise a transcriptional terminator sequence. These sequence elements of the pegRNAs are further described and defined herein.

In still other embodiments, the pegRNAs are represented by FIG. 3E, which shows a pegRNA having in the 5′ to 3′ direction an extension arm (3), a spacer (1), and a gRNA core (2). The extension arm (3) is at the 5′ end of the pegRNA. The extension arm (3) further comprises in the 3′ to 5′ direction a “primer binding site” (A), an “edit template” (B), and a “homology arm” (C). The extension arm (3) may also comprise an optional modifier region at the 3′ and 5′ ends, which may be the same sequences or different sequences. The pegRNAs may also comprise a transcriptional terminator sequence at the 3′ end. These sequence elements of the pegRNAs are further described and defined herein.
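The 3′-extension and 5′-extension arrangements described above differ only in where the extension arm sits relative to the spacer and gRNA core. The following minimal Python sketch makes that ordering explicit. The class name, field names, and the short placeholder sequences are hypothetical and chosen only for illustration; the extension arm is treated as an opaque string, and optional modifier regions and terminator sequences are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    """Illustrative container for the three pegRNA regions named in the text.

    All sequences are written 5'->3'. The extension arm is kept as one opaque
    string; its internal layout is not modeled in this sketch.
    """
    spacer: str
    grna_core: str
    extension_arm: str
    arm_at_3prime: bool = True  # True: 3' extension arm; False: 5' extension arm

    def sequence(self) -> str:
        """Concatenate the regions in 5'->3' order for the chosen arrangement."""
        if self.arm_at_3prime:
            # 5'-[spacer]-[gRNA core]-[extension arm]-3'
            return self.spacer + self.grna_core + self.extension_arm
        # 5'-[extension arm]-[spacer]-[gRNA core]-3'
        return self.extension_arm + self.spacer + self.grna_core


# Placeholder sequences (not real pegRNA elements).
three_prime = PegRNA(spacer="GGCC", grna_core="AAUU", extension_arm="CCGG")
five_prime = PegRNA(spacer="GGCC", grna_core="AAUU", extension_arm="CCGG",
                    arm_at_3prime=False)
print(three_prime.sequence())
print(five_prime.sequence())
```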

PE1

As used herein, “PE1” refers to a PE complex comprising a fusion protein comprising Cas9(H840A) and a wild type MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)]+a desired pegRNA, wherein the PE fusion has the amino acid sequence of SEQ ID NO: 54, which is shown as follows:

(SEQ ID NO: 54)

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTD

RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS

FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI

YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAI

LSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSK

DTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYD

EHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEK

MDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE

KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM

TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL

LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLD

NEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSR

KLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLH

EHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNS

RERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL

SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKY

PKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIR

KRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKR

NSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME

RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL

ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILA

DANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK

EVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGG

SS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRL

PQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLG

YRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLW

IPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGY

AKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPH

AVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAE

AHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELI

ALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKR

LSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP
SGGSKRTADGSEFEP

KKKRKV

KEY:

NUCLEAR LOCALIZATION SEQUENCE (NLS) TOP:(SEQ ID NO: 24), BOTTOM: (SEQ ID NO: 33)

CAS9(H840A) (SEQ ID NO: 55)

33-AMINO ACID LINKER
(SEQ ID NO: 56)

M-MLV reverse transcriptase (SEQ ID NO: 57).

PE2

As used herein, “PE2” refers to a PE complex comprising a fusion protein comprising Cas9(H840A) and a variant MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)]+a desired pegRNA, wherein the PE fusion has the amino acid sequence of SEQ ID NO: 58, which is shown as follows:

(SEQ ID NO: 58)

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTD

RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS

FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI

YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAI

LSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSK

DTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYD

EHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEK

MDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE

KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM

TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL

LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLD

NEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSR

KLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLH

EHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNS

RERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL

SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLN

AKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKY

PKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIR

KRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKR

NSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME

RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL

ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILA

DANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK

EVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGG

SS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRL

PQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLG

YRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLF

IPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGY

AKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPH

AVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAE

AHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELI

ALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPK

RLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP
SGGSKRTADGSEFEP

KKKRKV

KEY:

NUCLEAR LOCALIZATION SEQUENCE (NLS) TOP:(SEQ ID NO: 24), BOTTOM: (SEQ ID NO: 33)

CAS9(H840A) (SEQ ID NO: 55)

33-AMINO ACID LINKER
(SEQ ID NO: 56)

M-MLV reverse transcriptase (SEQ ID NO: 59).
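The substitution labels used for the PE2 reverse transcriptase (e.g., D200N) follow the usual convention of wild-type residue, position, and variant residue. The following minimal Python sketch derives such labels by comparing two aligned sequences of equal length. The function name is hypothetical; the two fragments are excerpted from the PE1 and PE2 sequences shown above, and the start offset of 189 is an assumption obtained by counting residues from the first residue of the reverse transcriptase domain as printed.

```python
def point_substitutions(reference: str, variant: str, start: int = 1) -> list[str]:
    """List substitutions between two equal-length, pre-aligned sequences.

    Each label is <reference residue><position><variant residue>, with
    positions numbered from `start`. Insertions and deletions are out of scope.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=start)
        if ref != var
    ]


# Fragments of the reverse transcriptase domain as printed for PE1 and PE2.
PE1_FRAGMENT = "PQGFKNSPTLFDEALHRD"
PE2_FRAGMENT = "PQGFKNSPTLFNEALHRD"

# start=189 is an assumed offset (see the paragraph above).
print(point_substitutions(PE1_FRAGMENT, PE2_FRAGMENT, start=189))
```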

PE3

As used herein, “PE3” refers to PE2 plus a second-strand nicking guide RNA that complexes with the PE2 and introduces a nick in the non-edited DNA strand in order to induce preferential replacement of the edited strand.

PE3b

As used herein, “PE3b” refers to PE3 but wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing a gRNA with a spacer sequence that matches only the edited strand, but not the original allele. Using this strategy, referred to hereafter as PE3b, mismatches between the protospacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand takes place.
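The PE3b design criterion can be sketched as a simple mismatch count: the nicking sgRNA spacer should match the edited sequence exactly and mismatch the unedited allele at one or more positions. The following Python example is a simplified illustration under stated assumptions. The function names and the 20-nucleotide sequences are hypothetical, all sequences are compared in the same orientation, and PAM requirements and mismatch-position effects are not modeled.

```python
def count_mismatches(spacer: str, protospacer: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have equal length")
    return sum(a != b for a, b in zip(spacer, protospacer))


def is_edit_specific(spacer: str, unedited: str, edited: str) -> bool:
    """True if the spacer matches the edited sequence perfectly and
    mismatches the unedited allele at one or more positions."""
    return (count_mismatches(spacer, edited) == 0
            and count_mismatches(spacer, unedited) > 0)


# Hypothetical protospacers differing by a single C-to-A edit.
UNEDITED = "GACCTGAAGTCCATGGCTAA"
EDITED = "GACCTGAAGTACATGGCTAA"

pe3b_spacer = EDITED    # designed against the edited strand
pe3_spacer = UNEDITED   # matches the original allele

print(is_edit_specific(pe3b_spacer, UNEDITED, EDITED))
print(is_edit_specific(pe3_spacer, UNEDITED, EDITED))
```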

PE-Short

As used herein, “PE-short” refers to a PE construct that is fused to a C-terminally truncated reverse transcriptase, and has the following amino acid sequence:

(SEQ ID NO: 60)

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNT

DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVD

DSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADL

RLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA

KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQ

LSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK

RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPI

LEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD

NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFI

ERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKA

IVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDK

DFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW

GRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQ

GDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQK

GQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQE

LDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW

RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRM

NTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGT

ALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITL

ANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKE

SILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKEL

LGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGEL

QKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFS

KRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRK

RYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESS

GGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKAT

STPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV

NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGIS

GQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRA

LLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFL

GKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPF

ELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLT

MGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEE

GLQHNCLDNSRLIN
SGGSKRTADGSEFEPKKKRKV

KEY:

NUCLEAR LOCALIZATION SEQUENCE (NLS) TOP:(SEQ ID NO: 24), BOTTOM: (SEQ ID NO: 33)

CAS9(H840A) (SEQ ID NO: 55)

33-AMINO ACID LINKER 1
(SEQ ID NO: 56)

M-MLV TRUNCATED REVERSE TRANSCRIPTASE (SEQ ID NO: 61)

Peptide Tag

The term “peptide tag” refers to a peptide amino acid sequence that is genetically fused to a protein sequence to impart one or more functions onto the protein that facilitate the manipulation of the protein for various purposes, such as visualization, purification, solubilization, and separation. Peptide tags can include various types of tags categorized by purpose or function, which may include “affinity tags” (to facilitate protein purification), “solubilization tags” (to assist in proper folding of proteins), “chromatography tags” (to alter chromatographic properties of proteins), “epitope tags” (to bind to high affinity antibodies), and “fluorescence tags” (to facilitate visualization of proteins in a cell or in vitro).

Polymerase

As used herein, the term “polymerase” refers to an enzyme that synthesizes a nucleotide strand and which may be used in connection with the prime editor systems described herein. The polymerase can be a “template-dependent” polymerase (i.e., a polymerase which synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). The polymerase can also be a “template-independent” polymerase (i.e., a polymerase which synthesizes a nucleotide strand without the requirement of a template strand). A polymerase may also be further categorized as a “DNA polymerase” or an “RNA polymerase.” In various embodiments, the prime editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase can be a “DNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of DNA). In such cases, the DNA template molecule can be a pegRNA, wherein the extension arm comprises a strand of DNA. In such cases, the pegRNA may be referred to as a chimeric or hybrid pegRNA which comprises an RNA portion (i.e., the guide RNA components, including the spacer and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase can be an “RNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of RNA). In such cases, the pegRNA is RNA, i.e., including an RNA extension. The term “polymerase” may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., the polymerase activity). Generally, the enzyme will initiate synthesis at the 3′-end of a primer annealed to a polynucleotide template sequence (e.g., a primer sequence annealed to the primer binding site of a pegRNA), and will proceed toward the 5′ end of the template strand. A “DNA polymerase” catalyzes the polymerization of deoxynucleotides. As used herein in reference to a DNA polymerase, the term DNA polymerase includes a “functional fragment thereof.” A “functional fragment thereof” refers to any portion of a wild-type or mutant DNA polymerase that encompasses less than the entire amino acid sequence of the polymerase and which retains the ability, under at least one set of conditions, to catalyze the polymerization of a polynucleotide. Such a functional fragment may exist as a separate entity, or it may be a constituent of a larger polypeptide, such as a fusion protein.
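The relationship between the chemistry of the extension arm and the class of DNA polymerase described above can be summarized in a small lookup. The following Python sketch is illustrative only; the enum and function names are hypothetical and are not terms used elsewhere in this disclosure.

```python
from enum import Enum


class ArmChemistry(Enum):
    """Nucleic acid type of the pegRNA extension arm (the template)."""
    RNA = "RNA"
    DNA = "DNA"


def polymerase_class(arm: ArmChemistry) -> str:
    """Return the DNA polymerase class that reads the given template type."""
    if arm is ArmChemistry.RNA:
        # RNA template, e.g., a reverse transcriptase
        return "RNA-dependent DNA polymerase"
    # DNA template
    return "DNA-dependent DNA polymerase"


def is_chimeric_pegrna(arm: ArmChemistry) -> bool:
    """A pegRNA with a DNA extension arm has both an RNA and a DNA portion."""
    return arm is ArmChemistry.DNA


for chemistry in ArmChemistry:
    print(chemistry.value, "->", polymerase_class(chemistry),
          "| chimeric pegRNA:", is_chimeric_pegrna(chemistry))
```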

Prime Editing

As used herein, the term “prime editing” refers to a novel approach for gene editing using napDNAbps, a polymerase (e.g., a reverse transcriptase), and specialized guide RNAs that include a DNA synthesis template for encoding desired new genetic information (or deleting genetic information) that is then incorporated into a target DNA sequence. Certain embodiments of prime editing are described in the embodiments of FIGS. 1A-1H and FIG. 72(a)-72(c), among other figures.

Prime editing is a new, versatile, and precise genome editing platform that directly writes new genetic information into a specified DNA site using a nucleic acid programmable DNA binding protein (“napDNAbp”) working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp). The prime editing system is programmed with a prime editing (PE) guide RNA (“pegRNA”) that both specifies the target site and templates the synthesis of the desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide RNA (e.g., at the 5′ or 3′ end, or at an internal portion of a guide RNA). The replacement strand containing the desired edit (e.g., a single nucleobase substitution) shares the same sequence as (or is homologous to) the endogenous strand (immediately downstream of the nick site) of the target site to be edited (with the exception that it includes the desired edit). Through DNA repair and/or replication machinery, the endogenous strand downstream of the nick site is replaced by the newly synthesized replacement strand containing the desired edit. In some cases, prime editing may be thought of as a “search-and-replace” genome editing technology since the prime editors, as described herein, not only search and locate the desired target site to be edited, but at the same time, encode a replacement strand containing a desired edit which is installed in place of the corresponding target site endogenous DNA strand. The prime editors of the present disclosure relate, in part, to the discovery that the mechanism of target-primed reverse transcription (TPRT) or “prime editing” can be leveraged or adapted for conducting precision CRISPR/Cas-based genome editing with high efficiency and genetic flexibility (e.g., as depicted in various embodiments of FIGS. 1A-1F). 
TPRT is naturally used by mobile DNA elements, such as mammalian non-LTR retrotransposons and bacterial Group II introns^{28, 29}. The inventors have herein used Cas protein-reverse transcriptase fusions or related systems to target a specific DNA sequence with a guide RNA, generate a single strand nick at the target site, and use the nicked DNA as a primer for reverse transcription of an engineered reverse transcriptase template that is integrated with the guide RNA. However, while the concept begins with prime editors that use reverse transcriptase as the DNA polymerase component, the prime editors described herein are not limited to reverse transcriptases but may include the use of virtually any DNA polymerase. Indeed, while the application throughout may refer to prime editors with “reverse transcriptases,” it is set forth here that reverse transcriptases are only one type of DNA polymerase that may work with prime editing. Thus, wherever the specification mentions a “reverse transcriptase,” the person having ordinary skill in the art should appreciate that any suitable DNA polymerase may be used in place of the reverse transcriptase. Thus, in one aspect, the prime editors may comprise Cas9 (or an equivalent napDNAbp) which is programmed to target a DNA sequence by associating it with a specialized guide RNA (i.e., pegRNA) containing a spacer sequence that anneals to a complementary protospacer in the target DNA. The specialized guide RNA also contains new genetic information in the form of an extension that encodes a replacement strand of DNA containing a desired genetic alteration which is used to replace a corresponding endogenous DNA strand at the target site. To transfer information from the pegRNA to the target DNA, the mechanism of prime editing involves nicking the target site in one strand of the DNA to expose a 3′-hydroxyl group. 
The exposed 3′-hydroxyl group can then be used to prime the DNA polymerization of the edit-encoding extension on pegRNA directly into the target site. In various embodiments, the extension which provides the template for polymerization of the replacement strand containing the edit can be formed from RNA or DNA. In the case of an RNA extension, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (such as, a reverse transcriptase). In the case of a DNA extension, the polymerase of the prime editor may be a DNA-dependent DNA polymerase. The newly synthesized strand (i.e., the replacement DNA strand containing the desired edit) that is formed by the herein disclosed prime editors would be homologous to the genomic target sequence (i.e., have the same sequence as) except for the inclusion of a desired nucleotide change (e.g., a single nucleotide change, a deletion, or an insertion, or a combination thereof). The newly synthesized (or replacement) strand of DNA may also be referred to as a single strand DNA flap, which would compete for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. In certain embodiments, the system can be combined with the use of an error-prone reverse transcriptase enzyme (e.g., provided as a fusion protein with the Cas9 domain, or provided in trans to the Cas9 domain). The error-prone reverse transcriptase enzyme can introduce alterations during synthesis of the single strand DNA flap. Thus, in certain embodiments, error-prone reverse transcriptase can be utilized to introduce nucleotide changes to the target DNA. Depending on the error-prone reverse transcriptase that is used with the system, the changes can be random or non-random. 
Resolution of the hybridized intermediate (comprising the single strand DNA flap synthesized by the reverse transcriptase hybridized to the endogenous DNA strand) can include removal of the resulting displaced flap of endogenous DNA (e.g., with a 5′ end DNA flap endonuclease, FEN1), ligation of the synthesized single strand DNA flap to the target DNA, and assimilation of the desired nucleotide change as a result of cellular DNA repair and/or replication processes. Because templated DNA synthesis offers single nucleotide precision for the modification of any nucleotide, including insertions and deletions, the scope of this approach is very broad and could foreseeably be used for myriad applications in basic science and therapeutics.

In various embodiments, prime editing operates by contacting a target DNA molecule (for which a change in the nucleotide sequence is desired to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editing guide RNA (pegRNA). In reference to FIG. 1G, the prime editing guide RNA (pegRNA) comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA and encodes the desired nucleotide change (e.g., single nucleotide change, insertion, or deletion). In step (a), the napDNAbp/pegRNA complex contacts the DNA molecule and the extended pegRNA guides the napDNAbp to bind to a target locus. In step (b), a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. In certain embodiments, the nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence, i.e., the “non-target strand.” The nick, however, could be introduced in either of the strands. That is, the nick could be introduced into the R-loop “target strand” (i.e., the strand hybridized to the protospacer of the extended pegRNA) or the “non-target strand” (i.e., the strand forming the single-stranded portion of the R-loop and which is complementary to the target strand). In step (c), the 3′ end of the DNA strand (formed by the nick) interacts with the extended portion of the guide RNA in order to prime reverse transcription (i.e., “target-primed RT”). In certain embodiments, the 3′ end DNA strand hybridizes to a specific RT priming sequence on the extended portion of the guide RNA, i.e., the “reverse transcriptase priming sequence” or “primer binding site” on the pegRNA. 
In step (d), a reverse transcriptase (or other suitable DNA polymerase) is introduced which synthesizes a single strand of DNA from the 3′ end of the primed site towards the 5′ end of the prime editing guide RNA. The DNA polymerase (e.g., reverse transcriptase) can be fused to the napDNAbp or alternatively can be provided in trans to the napDNAbp. This forms a single-strand DNA flap comprising the desired nucleotide change (e.g., the single base change, insertion, or deletion, or a combination thereof) and which is otherwise homologous to the endogenous DNA at or adjacent to the nick site. In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to the resolution of the single strand DNA flap such that the desired nucleotide change becomes incorporated into the target locus. This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the endogenous DNA sequence. Without being bound by theory, the cell's endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change(s) to form the desired altered product. The process can also be driven towards product formation with “second strand nicking,” as exemplified in FIG. 1F. This process may introduce one or more of the following genetic changes: transversions, transitions, deletions, and insertions.
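Steps (c) and (d) above, in which the nicked 3′ end anneals to the primer binding site and is extended along the template toward the 5′ end of the pegRNA, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and toy sequences are hypothetical, annealing is modeled as exact complementarity, and flap resolution in steps (f) and (g) is not modeled.

```python
# DNA base incorporated opposite each RNA template base.
DNA_OPPOSITE_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5to3: str) -> str:
    """Return the DNA strand (5'->3') complementary to an RNA written 5'->3'.

    The template is read from its 3' end toward its 5' end, so the product
    is the reverse complement of the RNA.
    """
    return "".join(DNA_OPPOSITE_RNA[base] for base in reversed(rna_5to3))


def prime_and_extend(nicked_strand: str, pbs_rna: str, template_rna: str) -> str:
    """Extend the nicked DNA strand (5'->3', ending at the nick).

    Step (c): the 3' end of the nicked strand must pair with the primer
    binding site (pbs_rna). Step (d): the polymerase then copies template_rna
    onto that 3' end, producing the single-strand DNA flap.
    """
    primer = reverse_transcribe(pbs_rna)  # DNA sequence that pairs with the PBS
    if not nicked_strand.endswith(primer):
        raise ValueError("3' end of nicked strand does not anneal to the PBS")
    flap = reverse_transcribe(template_rna)
    return nicked_strand + flap


# Toy sequences (hypothetical, for illustration only).
NICKED_STRAND = "GATTACAGGCT"   # DNA, 3' end created by the nick
PBS_RNA = "AGCCU"               # pairs with the last 5 nt of the nicked strand
TEMPLATE_RNA = "CAUCGA"         # encodes the flap carrying the edit

print(prime_and_extend(NICKED_STRAND, PBS_RNA, TEMPLATE_RNA))
```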

The term “prime editor (PE) system” or “prime editor (PE)” or “PE system” or “PE editing system” refers to the compositions involved in the method of genome editing using prime editing described herein, including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editing guide RNAs, and complexes comprising fusion proteins and prime editing guide RNAs, as well as accessory elements, such as second strand nicking components (e.g., second strand sgRNAs) and 5′ endogenous DNA flap removal endonucleases (e.g., FEN1) for helping to drive the prime editing process towards the edited product formation.

Although in the embodiments described thus far the pegRNA constitutes a single molecule comprising a guide RNA (which itself comprises a spacer sequence and a gRNA core or scaffold) and a 5′ or 3′ extension arm comprising the primer binding site and a DNA synthesis template (e.g., see FIG. 3D), the pegRNA may also take the form of two individual molecules comprised of a guide RNA and a trans prime editor RNA template (tPERT), which essentially houses the extension arm (including, in particular, the primer binding site and the DNA synthesis domain) and an RNA-protein recruitment domain (e.g., MS2 aptamer or hairpin) in the same molecule which becomes co-localized or recruited to a modified prime editor complex that comprises a tPERT recruiting protein (e.g., MS2cp protein, which binds to the MS2 aptamer). See FIG. 3G and FIG. 3H as an example of a tPERT that may be used with prime editing.

Prime Editor

The term “prime editor” refers to the herein described fusion constructs comprising a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase, which are capable of carrying out prime editing on a target nucleotide sequence in the presence of a pegRNA. The term “prime editor” may refer to the fusion protein or to the fusion protein complexed with a pegRNA, and/or further complexed with a second-strand nicking sgRNA. In some embodiments, the prime editor may also refer to the complex comprising a fusion protein (reverse transcriptase fused to a napDNAbp), a pegRNA, and a regular guide RNA capable of directing the second-site nicking step of the non-edited strand as described herein. In other embodiments, the reverse transcriptase component of the “prime editor” may be provided in trans.

Primer Binding Site

The term “primer binding site” or “the PBS” refers to the nucleotide sequence located on a pegRNA as a component of the extension arm (typically at the 3′ end of the extension arm) and serves to bind to the primer sequence that is formed after Cas9 nicking of the target sequence by the prime editor. As detailed elsewhere, when the Cas9 nickase component of a prime editor nicks one strand of the target DNA sequence, a 3′-ended ssDNA flap is formed, which serves as a primer sequence that anneals to the primer binding site on the pegRNA to prime reverse transcription. FIGS. 27 and 28 show embodiments of the primer binding site located on a 3′ and 5′ extension arm, respectively.
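By way of illustration only, the annealing relationship between the 3′-ended ssDNA primer and the primer binding site can be sketched at the sequence level. The following is a minimal sketch, assuming perfect Watson-Crick complementarity and that both sequences are written 5′ to 3′; the function names and the example sequence are hypothetical and are not part of the disclosure.

```python
# Hypothetical helpers: names and example sequences are illustrative assumptions.
# Maps each DNA base of the primer to the RNA base that pairs with it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def expected_pbs(primer_dna: str) -> str:
    """Return the RNA sequence (5'->3') that is perfectly complementary
    to a DNA primer written 5'->3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(primer_dna.upper()))


def pbs_anneals(primer_dna: str, pbs_rna: str) -> bool:
    """True if the PBS is the exact reverse complement of the primer.
    Real annealing tolerates mismatches; this sketch does not model that."""
    return expected_pbs(primer_dna) == pbs_rna.upper()
```

For a hypothetical primer 5′-GATTACA-3′, the sketch expects a PBS of 5′-UGUAAUC-3′.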

Promoter

The term “promoter” is art-recognized and refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene. A promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition. For example, a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule. A subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule “inducer” for activity. Examples of inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters. A variety of constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.

Protospacer

As used herein, the term “protospacer” refers to the sequence (˜20 bp) in DNA adjacent to the PAM (protospacer adjacent motif) sequence. The protospacer shares the same sequence as the spacer sequence of the guide RNA. The guide RNA anneals to the complement of the protospacer sequence on the target DNA (specifically, one strand thereof, i.e., the “target strand” versus the “non-target strand” of the target DNA sequence). In order for Cas9 to function it also requires a specific protospacer adjacent motif (PAM) that varies depending on the bacterial species of the Cas9 gene. The most commonly used Cas9 nuclease, derived from S. pyogenes, recognizes a PAM sequence of NGG that is found directly downstream of the target sequence in the genomic DNA, on the non-target strand. The skilled person will appreciate that the literature in the state of the art sometimes refers to the “protospacer” as the ˜20-nt target-specific guide sequence on the guide RNA itself, rather than referring to it as a “spacer.” Thus, in some cases, the term “protospacer” as used herein may be used interchangeably with the term “spacer.” The context of the description surrounding the appearance of either “protospacer” or “spacer” will help inform the reader as to whether the term is in reference to the gRNA or the DNA target.
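For illustration, the relationship between the protospacer, the PAM, and the spacer can be sketched as a simple scan of the non-target strand. This is a minimal sketch, assuming the canonical SpCas9 NGG PAM located immediately 3′ of a 20-nt protospacer; the function names and the toy sequence are hypothetical.

```python
# Hypothetical helpers: a string-level sketch, not a guide design tool.
def find_protospacers(non_target_strand: str, length: int = 20):
    """Return (start, protospacer, pam) for every NGG PAM that has a full
    protospacer of the given length directly upstream on the same strand."""
    seq = non_target_strand.upper()
    hits = []
    # i is the index of the first PAM base; the protospacer is seq[i-length:i].
    for i in range(length, len(seq) - 2):
        pam = seq[i:i + 3]
        if pam[1:] == "GG":
            hits.append((i - length, seq[i - length:i], pam))
    return hits


def spacer_for(protospacer: str) -> str:
    """The spacer shares the protospacer sequence, written as RNA (T -> U)."""
    return protospacer.upper().replace("T", "U")
```

Only the given strand is scanned; a fuller treatment would also scan the reverse complement.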

Protospacer Adjacent Motif (PAM)

As used herein, the term “protospacer adjacent motif” or “PAM” refers to an approximately 2-6 base pair DNA sequence that is an important targeting component of a Cas9 nuclease. Typically, the PAM sequence is on either strand, and is downstream in the 5′ to 3′ direction of the Cas9 cut site. The canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes or SpCas9) is 5′-NGG-3′ wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases. Different PAM sequences can be associated with different Cas9 nucleases or equivalent proteins from different organisms. In addition, any given Cas9 nuclease, e.g., SpCas9, may be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes an alternative PAM sequence.

For example, with reference to the canonical SpCas9 amino acid sequence of SEQ ID NO: 18, the PAM specificity can be modified by introducing one or more mutations, including (a) D1135V, R1335Q, and T1337R (“the VQR variant”), which alters the PAM specificity to NGAN or NGNG, (b) D1135E, R1335Q, and T1337R (“the EQR variant”), which alters the PAM specificity to NGAG, and (c) D1135V, G1218R, R1335E, and T1337R (“the VRER variant”), which alters the PAM specificity to NGCG. In addition, the D1135E variant of canonical SpCas9 still recognizes NGG, but it is more selective compared to the wild type SpCas9 protein.

It will also be appreciated that Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) can have varying PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. In addition, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In still another example, Cas9 from Treponema denticola (TdCas) recognizes NAAAAC. These examples are not meant to be limiting. It will be further appreciated that non-SpCas9s bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site. Furthermore, non-SpCas9s may have other characteristics that make them more useful than SpCas9. For example, Cas9 from Staphylococcus aureus (SaCas9) is about 1 kilobase smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV). Further reference may be made to Shah et al., “Protospacer recognition motifs: mixed identities and functional diversity,” RNA Biology, 10(5): 891-899 (which is incorporated herein by reference).
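The PAM specificities listed above use IUPAC degenerate nucleotide codes (N = any base, R = A or G, W = A or T). As a non-limiting illustration, a candidate PAM can be checked against each listed motif with a small lookup. This is a minimal sketch; the dictionary keys follow the abbreviations used above, only one motif per ortholog is included, and the function names are hypothetical.

```python
import re

# IUPAC codes needed for the motifs listed above (not the full IUPAC table).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

# One representative PAM motif per ortholog, as recited in the text.
PAMS = {
    "SpCas9": "NGG",
    "SaCas9": "NGRRT",
    "NmCas": "NNNNGATT",
    "StCas9": "NNAGAAW",
    "TdCas": "NAAAAC",
}


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Translate a degenerate PAM motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam))


def matching_cas9s(candidate: str) -> list:
    """Return the sorted names of orthologs whose PAM motif matches the
    candidate sequence exactly (same length, every position allowed)."""
    candidate = candidate.upper()
    return sorted(name for name, pam in PAMS.items()
                  if pam_regex(pam).fullmatch(candidate))
```

For example, the hypothetical candidate AGAAT satisfies NGRRT and is therefore reported for SaCas9 only.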

Reverse Transcriptase

The term “reverse transcriptase” describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5′-3′ RNA-directed DNA polymerase activity, 5′-3′ DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5′ and 3′ ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3′-5′ exonuclease activity necessary for proofreading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983). Another reverse transcriptase which is used extensively in molecular biology is reverse transcriptase originating from Moloney murine leukemia virus (M-MLV). See, e.g., Gerard, G. R., DNA 5:271-279 (1986) and Kotewicz, M. L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking in RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797. The invention contemplates the use of any such reverse transcriptases, or variants or mutants thereof.

In addition, the invention contemplates the use of reverse transcriptases which are error-prone, i.e., which may be referred to as error-prone reverse transcriptases or reverse transcriptases which do not support high fidelity incorporation of nucleotides during polymerization. During synthesis of the single-strand DNA flap based on the RT template integrated with the guide RNA, the error-prone reverse transcriptase can introduce one or more nucleotides which are mismatched with the RT template sequence, thereby introducing changes to the nucleotide sequence through erroneous polymerization of the single-strand DNA flap. These errors introduced during synthesis of the single strand DNA flap then become integrated into the double strand molecule through hybridization to the corresponding endogenous target strand, removal of the endogenous displaced strand, ligation, and then through one or more rounds of endogenous DNA repair and/or replication processes.

Reverse Transcription

As used herein, the term “reverse transcription” indicates the capability of an enzyme to synthesize a DNA strand (that is, complementary DNA or cDNA) using RNA as a template. In some embodiments, the reverse transcription can be “error-prone reverse transcription,” which refers to the properties of certain reverse transcriptase enzymes which are error-prone in their DNA polymerization activity.
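At the sequence level, reverse transcription produces a DNA strand that is the reverse complement of the RNA template. The following is a minimal sketch, assuming error-free synthesis across the full template and sequences written 5′ to 3′; the function name is hypothetical, and neither priming nor error-prone incorporation is modeled.

```python
# Pairing rule used when DNA is synthesized against an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template: str) -> str:
    """Return the cDNA (5'->3') templated by an RNA written 5'->3'.

    The polymerase reads the template 3'->5', so the product is the
    complement of the template read in reverse.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(rna_template.upper()))
```

For a hypothetical template 5′-AUGC-3′, the sketch yields the cDNA 5′-GCAT-3′.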

Protein, Peptide, and Polypeptide

The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein, and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

Protein Splicing

The term “protein splicing,” as used herein, refers to a process in which a sequence, an intein (or split inteins, as the case may be), is excised from within an amino acid sequence, and the remaining fragments of the amino acid sequence, the exteins, are ligated via an amide bond to form a continuous amino acid sequence. The term “trans” protein splicing refers to the specific case where the inteins are split inteins and they are located on different proteins.

Second-Strand Nicking

The resolution of heteroduplex DNA (i.e., containing one edited and one non-edited strand) formed as a result of prime editing determines long-term editing outcomes. In other words, a goal of prime editing is to resolve the heteroduplex DNA (the edited strand paired with the endogenous non-edited strand) formed as an intermediate of PE by permanently integrating the edited strand into the complementary, endogenous strand. The approach of “second-strand nicking” can be used herein to help drive the resolution of heteroduplex DNA in favor of permanent integration of the edited strand into the DNA molecule. As used herein, the concept of “second-strand nicking” refers to the introduction of a second nick at a location downstream of the first nick (i.e., the initial nick site that provides the free 3′ end for use in priming of the reverse transcriptase on the extended portion of the guide RNA), preferably on the unedited strand. In certain embodiments, the first nick and the second nick are on opposite strands. In yet another embodiment, the first nick is on the non-target strand (i.e., the strand that forms the single strand portion of the R-loop), and the second nick is on the target strand. In still other embodiments, the first nick is on the edited strand, and the second nick is on the unedited strand. The second nick can be positioned at least 5 nucleotides downstream of the first nick, or at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 or more nucleotides downstream of the first nick.
The second nick, in certain embodiments, can be introduced between about 5-150 nucleotides on the unedited strand away from the site of the pegRNA-induced nick, or between about 5-140, or between about 5-130, or between about 5-120, or between about 5-110, or between about 5-100, or between about 5-90, or between about 5-80, or between about 5-70, or between about 5-60, or between about 5-50, or between about 5-40, or between about 5-30, or between about 5-20, or between about 5-10. In one embodiment, the second nick is introduced between 14-116 nucleotides away from the pegRNA-induced nick. Without being bound by theory, the second nick induces the cell's endogenous DNA repair and replication processes towards replacement or editing of the unedited strand, thereby permanently installing the edited sequence on both strands and resolving the heteroduplex that is formed as a result of PE. In some embodiments, the edited strand is the non-target strand and the unedited strand is the target strand. In other embodiments, the edited strand is the target strand, and the unedited strand is the non-target strand.
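The distance ranges recited above can be expressed as a simple window check. The following is a minimal sketch, assuming both nick positions are given as integer coordinates on a common reference and that distance is measured as an absolute offset; the function name and the default bounds (taken from the broadest recited range of about 5-150 nucleotides) are illustrative assumptions.

```python
def second_nick_in_window(first_nick: int, second_nick: int,
                          min_dist: int = 5, max_dist: int = 150) -> bool:
    """True if the second nick lies within [min_dist, max_dist] nucleotides
    of the first (pegRNA-induced) nick. Strand identity is not modeled."""
    distance = abs(second_nick - first_nick)
    return min_dist <= distance <= max_dist
```

Passing min_dist=14 and max_dist=116 corresponds to the narrower embodiment recited above.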

Sense Strand

In genetics, a “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. In the case of a DNA segment that encodes a protein, the sense strand is the strand of DNA that has the same sequence as the mRNA, which takes the antisense strand as its template during transcription, and eventually undergoes (typically, not always) translation into a protein. The antisense strand is thus responsible for the RNA that is later translated to protein, while the sense strand possesses a nearly identical makeup to that of the mRNA. Note that for each segment of dsDNA, there will possibly be two sets of sense and antisense, depending on which direction one reads (since sense and antisense is relative to perspective). It is ultimately the gene product, or mRNA, that dictates which strand of one segment of dsDNA is referred to as sense or antisense.

In the context of a pegRNA, the first step is the synthesis of a single-strand complementary DNA (i.e., the 3′ ssDNA flap, which becomes incorporated) oriented in the 5′ to 3′ direction which is templated off of the pegRNA extension arm. Whether the 3′ ssDNA flap should be regarded as a sense or antisense strand depends on the direction of transcription since it is well accepted that both strands of DNA may serve as a template for transcription (but not at the same time). Thus, in some embodiments, the 3′ ssDNA flap (which overall runs in the 5′ to 3′ direction) will serve as the sense strand because it is the coding strand. In other embodiments, the 3′ ssDNA flap (which overall runs in the 5′ to 3′ direction) will serve as the antisense strand and thus, the template for transcription.
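The sense/antisense relationship described above can be illustrated with a short sequence-level sketch: the mRNA has the same sequence as the sense strand (with U in place of T) and is templated by the antisense strand. The following assumes all sequences are written 5′ to 3′; the function names and example sequences are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(dna.upper()))


def transcribe_from_antisense(antisense: str) -> str:
    """Return the mRNA (5'->3') templated by an antisense strand (5'->3').

    The transcript matches the sense strand, with T replaced by U.
    """
    sense = reverse_complement(antisense)
    return sense.replace("T", "U")
```

For a hypothetical sense strand 5′-ATGGCC-3′, the antisense strand is 5′-GGCCAT-3′ and the transcript is 5′-AUGGCC-3′.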

Spacer Sequence

As used herein, the term “spacer sequence” in connection with a guide RNA or a pegRNA refers to the portion of the guide RNA or pegRNA of about 20 nucleotides which contains a nucleotide sequence that is complementary to the protospacer sequence in the target DNA sequence. The spacer sequence anneals to the protospacer sequence to form a ssRNA/ssDNA hybrid structure at the target site and a corresponding R loop ssDNA structure of the endogenous DNA strand that is complementary to the protospacer sequence.

Subject

The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cattle, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.

Split Intein

Although inteins are most frequently found as a contiguous domain, some exist in a naturally split form. In this case, the two fragments are expressed as separate polypeptides and must associate before splicing takes place, so-called protein trans-splicing.

An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely, DnaE-N and DnaE-C. The two different subunits are encoded by separate genes, namely dnaE-n and dnaE-c, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.

Additional naturally occurring or engineered split-intein sequences are known in the art or can be made from whole-intein sequences described herein or those available in the art. Examples of split-intein sequences can be found in Stevens et al., “A promiscuous split intein with expanded protein engineering applications,” PNAS, 2017, Vol. 114: 8538-8543; Iwai et al., “Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme,” FEBS Lett, 580: 1853-1858, each of which is incorporated herein by reference. Additional split intein sequences can be found, for example, in WO 2013/045632, WO 2014/055782, WO 2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference.

In addition, protein splicing in trans has been described in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998), Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b), Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998), Evans, et al., J. Biol. Chem. 275:9091 (2000); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999)) and provides the opportunity to express a protein as two inactive fragments that subsequently undergo ligation to form a functional product, e.g., as shown in FIGS. 66 and 67 with regard to the formation of a complete prime editor from two separately-expressed halves.

Target Site

The term “target site” refers to a sequence within a nucleic acid molecule that is edited by a prime editor (PE) disclosed herein. The target site further refers to the sequence within a nucleic acid molecule to which a complex of the prime editor (PE) and gRNA binds.

tPERT

See definition for “trans prime editor RNA template (tPERT).”

Temporal Second-Strand Nicking

As used herein, the term “temporal second-strand nicking” refers to a variant of second strand nicking whereby the installation of the second nick in the unedited strand occurs only after the desired edit is installed in the edited strand. This avoids concurrent nicks on both strands that could lead to double-stranded DNA breaks. The second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing a gRNA with a spacer sequence that matches only the edited strand, but not the original allele. Using this strategy, mismatches between the protospacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand takes place.
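The design principle described above (a spacer that matches the edited sequence but not the original allele) can be checked by counting mismatches. The following is a minimal sketch, assuming equal-length sequences written in the same orientation and alphabet; the function names are hypothetical, and the sketch does not predict actual nicking efficiency, which depends on mismatch position and number.

```python
def count_mismatches(spacer: str, protospacer: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(spacer.upper(), protospacer.upper()))


def favors_edited_allele(spacer: str, edited: str, original: str) -> bool:
    """True if the spacer matches the edited sequence perfectly while
    carrying at least one mismatch against the original allele."""
    return (count_mismatches(spacer, edited) == 0
            and count_mismatches(spacer, original) > 0)
```

In this sketch, a spacer copied from the edited sequence is disfavored on the unedited allele by exactly the number of positions changed by the edit.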

Trans Prime Editing

As used herein, the term “trans prime editing” refers to a modified form of prime editing that utilizes a split pegRNA, i.e., wherein the pegRNA is separated into two separate molecules: an sgRNA and a trans prime editing RNA template (tPERT). The sgRNA serves to target the prime editor (or more generally, to target the napDNAbp component of the prime editor) to the desired genomic target site, while the tPERT is used by the polymerase (e.g., a reverse transcriptase) to write new DNA sequence into the target locus once the tPERT is recruited in trans to the prime editor by the interaction of binding domains located on the prime editor and on the tPERT. In one embodiment, the binding domains can include RNA-protein recruitment moieties, such as a MS2 aptamer located on the tPERT and an MS2cp protein fused to the prime editor. An advantage of trans prime editing is that by separating the DNA synthesis template from the guide RNA, one can potentially use longer length templates.

An embodiment of trans prime editing is shown in FIGS. 3G and 3H. FIG. 3G shows the composition of the trans prime editor complex on the left (“RP-PE:gRNA complex”), which comprises an napDNAbp fused to each of a polymerase (e.g., a reverse transcriptase) and a tPERT recruiting protein (e.g., MS2cp), and which is complexed with a guide RNA. FIG. 3G further shows a separate tPERT molecule, which comprises the extension arm features of a pegRNA, including the DNA synthesis template and the primer binding sequence. The tPERT molecule also includes an RNA-protein recruitment domain (which, in this case, is a stem loop structure and can be, for example, MS2 aptamer). As depicted in the process described in FIG. 3H, the RP-PE:gRNA complex binds to and nicks the target DNA sequence. Then, the recruiting protein (RP) recruits a tPERT to co-localize to the prime editor complex bound to the DNA target site, thereby allowing the primer binding site to bind to the primer sequence on the nicked strand, and subsequently, allowing the polymerase (e.g., RT) to synthesize a single strand of DNA against the DNA synthesis template up through the 5′ end of the tPERT.

While the tPERT is shown in FIG. 3G and FIG. 3H as comprising the PBS and DNA synthesis template on the 5′ end of the RNA-protein recruitment domain, the tPERT in other configurations may be designed with the PBS and DNA synthesis template located on the 3′ end of the RNA-protein recruitment domain. However, the tPERT with the 5′ extension has the advantage that synthesis of the single strand of DNA will naturally terminate at the 5′ end of the tPERT and thus, does not risk using any portion of the RNA-protein recruitment domain as a template during the DNA synthesis stage of prime editing.

Transitions

As used herein, “transitions” refer to the interchange of purine nucleobases (A↔G) or the interchange of pyrimidine nucleobases (C↔T). This class of interchanges involves nucleobases of similar shape. These changes involve A↔G, G↔A, C↔T, or T↔C. In the context of a double-strand DNA with Watson-Crick paired nucleobases, transitions refer to the following base pair exchanges: A:T↔G:C, G:C↔A:T, C:G↔T:A, or T:A↔C:G. The compositions and methods disclosed herein are capable of inducing one or more transitions in a target DNA molecule. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions in the same target DNA molecule, as well as other nucleotide changes, including deletions and insertions.

Transversions

As used herein, “transversions” refer to the interchange of purine nucleobases for pyrimidine nucleobases, or the reverse, and thus involve the interchange of nucleobases with dissimilar shape. These changes involve T↔A, T↔G, C↔G, C↔A, A↔T, A↔C, G↔C, and G↔T. In the context of a double-strand DNA with Watson-Crick paired nucleobases, transversions refer to the following base pair exchanges: T:A↔A:T, T:A↔G:C, C:G↔G:C, C:G↔A:T, A:T↔T:A, A:T↔C:G, G:C↔C:G, and G:C↔T:A. The compositions and methods disclosed herein are capable of inducing one or more transversions in a target DNA molecule. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions in the same target DNA molecule, as well as other nucleotide changes, including deletions and insertions.
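The distinction between transitions and transversions follows directly from whether the two nucleobases belong to the same chemical class. As a non-limiting illustration, a single-base substitution can be classified as follows; the function name is hypothetical, and only the four DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition
    (purine<->purine or pyrimidine<->pyrimidine) or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

Of the twelve possible single-base substitutions, this classification yields four transitions and eight transversions, consistent with the lists above.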

Treatment

As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.

Upstream

Variant

As used herein the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Cas9 is a Cas9 comprising one or more changes in amino acid residues as compared to a wild type Cas9 amino acid sequence. The term “variant” encompasses homologous proteins having at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence. The term also encompasses mutants, truncations, or domains of a reference sequence, and which display the same or substantially the same functional activity or activities as the reference sequence.
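The percent identity thresholds recited above can be illustrated with a simple calculation over two pre-aligned sequences. The following is a minimal sketch, assuming equal-length sequences that have already been aligned without gaps; the function names and the example peptides are hypothetical, and sequence alignment itself is not performed.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percent of positions that are identical between two pre-aligned,
    equal-length sequences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference.upper(), candidate.upper()))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(reference: str, candidate: str,
                             threshold: float = 75.0) -> bool:
    """True if the candidate has at least `threshold` percent identity."""
    return percent_identity(reference, candidate) >= threshold
```

For two hypothetical 8-residue peptides differing at one position, the identity is 87.5%, which meets the 85% threshold but not the 90% threshold.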

Vector

The term “vector,” as used herein, refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter into a host cell, mutate and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.

Wild Type

As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.

5′ Endogenous DNA Flap

As used herein, the term “5′ endogenous DNA flap” refers to the strand of DNA situated immediately downstream of the PE-induced nick site in the target DNA. The nicking of the target DNA strand by PE exposes a 3′ hydroxyl group on the upstream side of the nick site and a 5′ hydroxyl group on the downstream side of the nick site. The endogenous strand ending in the 3′ hydroxyl group is used to prime the DNA polymerase of the prime editor (e.g., wherein the DNA polymerase is a reverse transcriptase). The endogenous strand on the downstream side of the nick site and which begins with the exposed 5′ hydroxyl group is referred to as the “5′ endogenous DNA flap” and is ultimately removed and replaced by the newly synthesized replacement strand (i.e., “3′ replacement DNA flap”) encoded by the extension of the pegRNA.

5′ Endogenous DNA Flap Removal

As used herein, the term “5′ endogenous DNA flap removal” or “5′ flap removal” refers to the removal of the 5′ endogenous DNA flap that forms when the RT-synthesized single-strand DNA flap competitively invades and hybridizes to the endogenous DNA, displacing the endogenous strand in the process. Removing this endogenous displaced strand can drive the reaction towards the formation of the desired product comprising the desired nucleotide change. The cell's own DNA repair enzymes may catalyze the removal or excision of the 5′ endogenous flap (e.g., a flap endonuclease, such as EXO1 or FEN1). Also, host cells may be transformed to express one or more enzymes that catalyze the removal of said 5′ endogenous flaps, thereby driving the process toward product formation (e.g., a flap endonuclease). Flap endonucleases are known in the art and can be found described in Patel et al., “Flap endonucleases pass 5′-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5′-ends,” Nucleic Acids Research, 2012, 40(10): 4507-4519 and Tsutakawa et al., “Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily,” Cell, 2011, 145(2): 198-211 (each of which are incorporated herein by reference).

3′ Replacement DNA Flap

As used herein, the term “3′ replacement DNA flap” or simply, “replacement DNA flap,” refers to the strand of DNA that is synthesized by the prime editor and which is encoded by the extension arm of the prime editor pegRNA. More in particular, the 3′ replacement DNA flap is encoded by the polymerase template of the pegRNA. The 3′ replacement DNA flap comprises the same sequence as the 5′ endogenous DNA flap except that it also contains the edited sequence (e.g., a single nucleotide change). The 3′ replacement DNA flap anneals to the target DNA, displacing or replacing the 5′ endogenous DNA flap (which can be excised, for example, by a 5′ flap endonuclease, such as FEN1 or EXO1), and then is ligated to join the 3′ end of the 3′ replacement DNA flap to the exposed 5′ hydroxyl end of the endogenous DNA (exposed after excision of the 5′ endogenous DNA flap), thereby reforming a phosphodiester bond and installing the 3′ replacement DNA flap to form a heteroduplex DNA containing one edited strand and one unedited strand. DNA repair processes resolve the heteroduplex by copying the information in the edited strand to the complementary strand, which permanently installs the edit into the DNA. This resolution process can be driven further to completion by nicking the unedited strand, i.e., by way of “second-strand nicking,” as described herein.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

The disclosure relates to a fusion protein comprising a nucleic acid-programmable RNA binding protein (napRNAbp) and an RNA-dependent RNA polymerase (RDRP). In some embodiments, the fusion protein when complexed to an RNA prime editing guide RNA (RpegRNA) is capable of appending a single-strand RNA sequence to a target RNA (e.g., to the 3′ end of the target RNA, or to the 3′ end of the RNA generated after cutting the RNA at a cut site). In some embodiments, the single-strand RNA sequence is appended to the 3′ terminus of the target RNA or to a 3′ terminus which is formed upon cleavage of the target RNA by the fusion protein at a cut site. In some embodiments, the single-strand RNA sequence is polymerized by the RDRP using the RpegRNA as a template.
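The templated 3′ extension described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the extension arm is written 5′ to 3′ as a template region followed by a primer sequence, and that the primer sequence is the exact reverse complement of the target's 3′ end. The function names and the toy sequences are hypothetical and are not taken from the disclosure.

```python
# Toy model of RNA prime editing at a 3' terminus (illustrative assumption,
# not the disclosed method): the RDRP copies the template region of the
# extension arm onto the 3' end of the target RNA.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def extend_target(target: str, extension_arm: str, primer_len: int) -> str:
    """Append the RNA encoded by the extension arm to the target 3' end.

    extension_arm is assumed to be template + primer sequence (5' to 3').
    """
    if not 0 < primer_len < len(extension_arm):
        raise ValueError("primer_len must leave a non-empty template")
    template = extension_arm[:-primer_len]
    primer = extension_arm[-primer_len:]
    # The primer sequence must base-pair with the 3' end of the target
    # to form the RNA/RNA hybrid that serves as the polymerase substrate.
    if reverse_complement(primer) != target[-primer_len:]:
        raise ValueError("primer sequence does not anneal to the target 3' end")
    # The polymerase reads the template 3' to 5' and synthesizes 5' to 3',
    # so the appended sequence is the reverse complement of the template.
    return target + reverse_complement(template)


if __name__ == "__main__":
    toy_target = "GGAUCCAUGC"   # hypothetical target RNA
    toy_arm = "UUUCCC" + "GCAU"  # hypothetical template + primer sequence
    print(extend_target(toy_target, toy_arm, primer_len=4))
```

The sketch ignores mismatches, secondary structure, and cleavage at an internal cut site; it only shows how the template region determines the appended sequence.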

As shown herein, the inventors have used Cas protein:RNA-dependent RNA polymerase (RDRP) fusion proteins to target a specific RNA sequence with a specialized guide RNA, i.e., an RpegRNA.

RNA Prime Editor Embodiments

The present disclosure provides compositions and methods for the targeted modification of RNA molecules by RNA prime editing. The compositions and methods may be conducted in vitro or in vivo within cells (e.g., human cells) for the therapeutic correction of disease-causing mutations and/or installation of motifs or mutations in RNA molecules of interest as a tool for scientific research. The disclosure provides compositions and methods for conducting RNA prime editing of a target RNA molecule (e.g., an RNA transcript) that enables the incorporation of one or more nucleotide changes and/or targeted mutagenesis of a target RNA molecule. The nucleotide changes can include a single-nucleotide change, an insertion of one or more nucleotides, or a deletion of one or more nucleotides. More in particular, the disclosure provides a variety of configurations of the RNA prime editors, each comprising a nucleic acid programmable RNA binding protein (napRNAbp), such as Cas13, and an RNA-dependent RNA polymerase (RDRP), which are provided as fusion proteins or which can be separately provided in trans. The RNA prime editors are guided to a target RNA site by a guide RNA, which can be an RpegRNA that includes a template region for the synthesis of an RNA sequence to be installed on the RNA molecule at an available 3′ terminus. In other embodiments, the RNA template can be provided in trans. This application throughout describes a variety of amino acid and nucleotide sequences relating to various aspects of the present disclosure, including exemplary Cas13 sequences, RDRP sequences, fusion protein sequences, RpegRNAs, and other sequences.

napRNAbp (e.g., Cas13)

The RPE RNA editing system described herein comprises a nucleic acid programmable RNA binding protein (napRNAbp) domain. The napRNAbp is associated with at least one nucleic acid (e.g., an RPE guide RNA), which localizes the napRNAbp to an RNA sequence that comprises an RNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g. the protospacer of a guide RNA). In other words, the guide nucleic acid “programs” the napRNAbp domain to localize and bind to a complementary sequence of the target strand. Binding of the napRNAbp domain to a complementary sequence enables the RNA-dependent RNA polymerase domain of the RPE to access and enzymatically edit the target strand.

The below description of napRNAbps which can be used in connection with the disclosed nucleobase modification domains is not meant to be limiting in any way. The napRNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). Type VI CRISPR systems utilize a Cas13 protein. In some embodiments, the RPE RNA editing system described herein comprises Cas13, or any variant or equivalent that may be used in place of Cas13 in the RPE editing system. This includes any naturally occurring variant, mutant, or otherwise engineered version of Cas13 that is known or that can be made or evolved through a directed evolution or otherwise mutagenic process. In some embodiments, the napRNAbp has an inactive nuclease, e.g., is a “dead” protein.

As used herein, the term “Cas protein” refers to a full-length Cas protein obtained from nature, a recombinant Cas protein having a sequence that differs from a naturally occurring Cas protein, or any fragment of a Cas protein that nevertheless retains all or a significant amount of the requisite basic functions needed for the disclosed methods, i.e., possession of nucleic-acid programmable binding of the Cas protein to a target RNA. The Cas proteins contemplated herein embrace CRISPR Cas13 proteins, as well as Cas13 equivalents, variants (e.g., nuclease inactive Cas13 (dCas13)), homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant).

Exemplary Cas13 sequences are provided as follows; however, these specific examples are not meant to be limiting. The RNA prime editors of the present disclosure may use any suitable napRNAbp, including any suitable Cas13 or Cas13 equivalent:

SEQ ID NO:
SEQUENCE
DESCRIPTION

1
MKLTRRRISGNSVDQKITAAFYRDMSQGLLYYDSEDNDCTDKVIESMDFER
CAS13A

SWRGRILKNGEDDKNPFYMFVKGLVGSNDKIVCEPIDVDSDPDNLDILINK
CRISPR-

NLTGFGRNLKAPDSNDTLENLIRKIQAGIPEEEVLPELKKIKEMIQKDIVN
ASSOCIATED

RKEQLLKSIKNNRIPFSLEGSKLVPSTKKMKWLFKLIDVPNKTENEKMLEK
ENDORIBONUCLEASE

YWEIYDYDKLKANITNRLDKTDKKARSISRAVSEELREYHKNLRTNYNRFV
CAS13A

SGDRPAAGLDNGGSAKYNPDKEEFLLFLKEVEQYFKKYFPVKSKHSNKSKD
[HERBINIX

KSLVDKYKNYCSYKVVKKEVNRSIINQLVAGLIQQGKLLYYFYYNDTWQED
HEMICELLULOS-

FLNSYGLSYIQVEEAFKKSVMTSLSWGINRLISFFIDDSNTVKEDDITTKK
ILYTICA].

AKEAIESNYFNKLRTCSRMQDHFKEKLAFFYPVYVKDKKDRPDDDIENLIV
WP_103203632

LVKNAIESVSYLRNRTFHFKESSLLELLKELDDKNSGQNKIDYSVAAEFIK

RDIENLYDVFREQIRSLGIAEYYKADMISDCFKTCGLEFALYSPKNSLMPA

FKNVYKRGANLNKAYIRDKGPKETGDQGQNSYKALEEYRELIWYIEVKNND

QSYNAYKNLLQLIYYHAFLPEVRENEALITDFINRTKEWNRKETEERLNTK

NNKKHKNFDENDDITVNTYRYESIPDYQGESLDDYLKVLQRKQMARAKEVN

EKEEGNNNYIQFIRDVVVWAFGAYLENKLKNYKNELQPPLSKENIGLNDTL

KELFPEEKVKSPFNIKCRFSISTFIDNKGKSTDNTSAEAVKTDGKEDEKDK

KNIKRKDLLCFYLFLRLLDENEICKLQHQFIKYRCSLKERRFPGNRTKLEK

ETELLAELEELMELVRFTMPSIPEISAKAESGYDTMIKKYFKDFIEKKVFK

NPKISNLYYHSDSKTPVTRKYMALLMRSAPLHLYKDIFKGYYLITKKECLE

YIKLSNIIKDYQNSLNELHEQLERIKLKSEKQNGKDSLYLDKKDFYKVKEY

VENLEQVARYKHLQHKINFESLYRIFRIHVDIAARMVGYTQDWERDMHFLE

KALVYNGVLEERRFEAIFNNNDDNNDGRIVKKIQNNLNNKNRELVSMLCWN

KKLNKNEFGAIIWKRNPIAHLNHFTQTEQNSKSSLESLINSLRILLAYDRK

RQNAVTKTINDLLLNDYHIRIKWEGRVDEGQIYFNIKEKEDIENEPIIHLK

HLHKKDCYIYKNSYMEDKQKEWICNGIKEEVYDKSILKCIGNLFKFDYEDK

NKSSANPKHT

34
MAKILIAGLG KGIKKDGKYR ETNYSIEKID KENIIYKNES
CAS13 FROM

FITSALEKHF EIDKTIYIGT VGSMWDNLYS YYCNKYNLKE

LEPTOTRICHIA

DEDYTFELLE ASSNATQDSE FSEINIKKFN DIFEGKARII

TREVIASANII

LTKFGMNTNE IFENENLIME IGNMLNDGDE IYLDITHSFR
ACCESSION NO.

SNAMWMFLVI NYITNVLDKN VEVKMISYGM FEAKYKKEVI
BBM57444

ENGETKEIEI SPVVNLKAFF DLMKWIKGAN ELKNYGNSYT

ILEMIDDKDV NKKIRTFSDS LNLNYLGTIK RNLESIKRIM

DKIDSIEGPG KLIIPNIVKD FIEIFGNIEK EYEFLFKIAE

WNFNQKRYAM VAININEGLR EFVAGILEIE DRVSDFNDEN

SGIFKYFKKI RQSIEYKPNN TKGTFDKKEE KIYKIFEHTR

KIRNEIAHSK GEKDTAINDV ESLKNYVKDI DTIISDKGFI

KYLKLKYNV

35
1
MKLTRRRISG NSVDQKITAA FYRDMSQGLL YYDSEDNDCT DKVIESMDFE
CAS13A OF

RSWRGRILKN

HERBINIX

61
GEDDKNPFYM FVKGLVGSND KIVCEPIDVD SDPDNLDILI NKNLTGFGRN

HEMICELLULO-

LKAPDSNDTL

SILYTICA

121
ENLIRKIQAG IPEEEVLPEL KKIKEMIQKD IVNRKEQLLK SIKNNRIPFS
ACCESSION NO.

LEGSKLVPST
WP_103203632

181
KKMKWLFKLI DVPNKTFNEK MLEKYWEIYD YDKLKANITN RLDKTDKKAR

SISRAVSEEL

241
REYHKNLRTN YNRFVSGDRP AAGLDNGGSA KYNPDKEEFL LFLKEVEQYF

KKYFPVKSKH

301
SNKSKDKSLV DKYKNYCSYK VVKKEVNRSI INQLVAGLIQ QGKLLYYFYY

NDTWQEDFLN

361
SYGLSYIQVE EAFKKSVMTS LSWGINRLTS FFIDDSNTVK FDDITTKKAK

EAIESNYFNK

421
LRTCSRMQDH FKEKLAFFYP VYVKDKKDRP DDDIENLIVL VKNAIESVSY

LRNRTFHFKE

481
SSLLELLKEL DDKNSGQNKI DYSVAAEFIK RDIENLYDVF REQIRSLGIA

EYYKADMISD

541
CFKTCGLEFA LYSPKNSLMP AFKNVYKRGA NLNKAYIRDK GPKETGDQGQ

NSYKALEEYR

601
ELTWYIEVKN NDQSYNAYKN LLQLIYYHAF LPEVRENEAL ITDFINRTKE

WNRKETEERL

661
NTKNNKKHKN FDENDDITVN TYRYESIPDY QGESLDDYLK VLQRKQMARA

KEVNEKEEGN

721
NNYIQFIRDV VVWAFGAYLE NKLKNYKNEL QPPLSKENIG LNDTLKELFP

EEKVKSPFNI

781
KCRFSISTFI DNKGKSTDNT SAEAVKTDGK EDEKDKKNIK RKDLLCFYLF

LRLLDENEIC

841
KLQHQFIKYR CSLKERRFPG NRTKLEKETE LLAELEELME LVRFTMPSIP

EISAKAESGY

901
DTMIKKYFKD FIEKKVFKNP KTSNLYYHSD SKTPVTRKYM ALLMRSAPLH

LYKDIFKGYY

961
LITKKECLEY IKLSNIIKDY QNSLNELHEQ LERIKLKSEK QNGKDSLYLD

KKDFYKVKEY

1021
VENLEQVARY KHLQHKINFE SLYRIFRIHV DIAARMVGYT QDWERDMHFL

FKALVYNGVL

1081
EERRFEAIFN NNDDNNDGRI VKKIQNNLNN KNRELVSMLC WNKKLNKNEF

GAIIWKRNPI

1141
AHLNHFTQTE QNSKSSLESL INSLRILLAY DRKRQNAVTK TINDLLLNDY

HIRIKWEGRV

1201
DEGQIYFNIK EKEDIENEPI IHLKHLHKKD CYIYKNSYMF DKQKEWICNG

IKEEVYDKSI

1261
LKCIGNLFKF DYEDKNKSSA NPKHT

36
1
MKITKIDGVS HYKKQDKGIL KKKWKDLDER KQREKIEARY NKQIESKIYK
CAS13A OF

EFFRLKNKKR

LEPTOTRICHIA

61
IEKEEDQNIK SLYFFIKELY LNEKNEEWEL KNINLEILDD KERVIKGYKF

WADEI

KEDVYFFKEG
ACCESSION NO.

121
YKEYYLRILF NNLIEKVQNE NREKVRKNKE FLDLKEIFKK YKNRKIDLLL
WP_036059678

KSINNNKINL

181
EYKKENVNEE IYGINPTNDR EMTFYELLKE IIEKKDEQKS ILEEKLDNED

ITNFLENIEK

241
IFNEETEINI IKGKVLNELR EYIKEKEENN SDNKLKQIYN LELKKYIENN

FSYKKQKSKS

301
KNGKNDYLYL NFLKKIMFIE EVDEKKEINK EKFKNKINSN FKNLFVQHIL

DYGKLLYYKE

361
NDEYIKNTGQ LETKDLEYIK TKETLIRKMA VLVSFAANSY YNLFGRVSGD

ILGTEVVKSS

421
KTNVIKVGSH IFKEKMLNYF FDFEIFDANK IVEILESISY SIYNVRNGVG

HENKLILGKY

481
KKKDININKR IEEDLNNNEE IKGYFIKKRG EIERKVKEKF LSNNLQYYYS

KEKIENYFEV

541
YEFEILKRKI PFAPNFKRII KKGEDLENNK NNKKYEYFKN FDKNSAEEKK

EFLKTRNELL

601
KELYYNNFYK EFLSKKEEFE KIVLEVKEEK KSRGNINNKK SGVSFQSIDD

YDTKINISDY

661
IASIHKKEME RVEKYNEEKQ KDTAKYIRDE VEEIFLTGFI NYLEKDKRLH

FLKEEFSILC

721
NNNNNVVDEN ININEEKIKE FLKENDSKTL NLYLFENMID SKRISEFRNE

LVKYKQFTKK

781
RLDEEKEFLG IKIELYETLI EFVILTREKL DTKKSEEIDA WLVDKLYVKD

SNEYKEYEEI

841
LKLFVDEKIL SSKEAPYYAT DNKTPILLSN FEKTRKYGTQ SFLSEIQSNY

KYSKVEKENI

901
EDYNKKEEIE QKKKSNIEKL QDLKVELHKK WEQNKITEKE IEKYNNTTRK

INEYNYLKNK

961
EELQNVYLLH EMLSDLLARN VAFFNKWERD FKFIVIAIKQ FLRENDKEKV

NEFLNPPDNS

1021
KGKKVYFSVS KYKNIVENID GIHKNEMNLI FLNNKFMNRK IDKMNCAIWV

YFRNYIAHFL

1081
HLHTKNEKIS LISQMNLLIK LFSYDKKVQN HILKSTKILL EKYNIQINFE

ISNDKNEVEK

1141
YKIKNRLYSK KGKMLGKNNK FEILENEFLE NVKAMLEYSE

37
1
MRVSKVKVKD GGKDKMVLVH RKTTGAQLVY SGQPVSNETS NILPEKKRQS
CAS13A OF

FDLSTINKTI

PALUDIBACTER

61
IKFDTAKKQK LNVDQYKIVE KIFKYPKQEL PKQIKAEEIL PFLNHKFQEP

PROPIONICIGENES

VKYWKNGKEE
ACCESSION NO.

121
SENLTLLIVE AVQAQDKRKL QPYYDWKTWY IQTKSDLLKK SIENNRIDLT
WP_013443710

ENLSKRKKAL

181
LAWETEFTAS GSIDLTHYHK VYMTDVLCKM LQDVKPLTDD KGKINTNAYH

RGLKKALQNH

241
QPAIFGTREV PNEANRADNQ LSIYHLEVVK YLEHYFPIKT SKRRNTADDI

AHYLKAQTLK

301
TTIEKQLVNA IRANIIQQGK TNHHELKADT TSNDLIRIKT NEAFVLNLTG

TCAFAANNIR

361
NMVDNEQTND ILGKGDFIKS LLKDNTNSQL YSFFFGEGLS TNKAEKETQL

WGIRGAVQQI

421
RNNVNHYKKD ALKTVFNISN FENPTITDPK QQTNYADTIY KARFINELEK

IPEAFAQQLK

481
TGGAVSYYTI ENLKSLLTTF QFSLCRSTIP FAPGFKKVEN GGINYQNAKQ

DESFYELMLE

541
QYLRKENFAE ESYNARYFML KLIYNNLFLP GFTTDRKAFA DSVGFVQMQN

KKQAEKVNPR

601
KKEAYAFEAV RPMTAADSIA DYMAYVQSEL MQEQNKKEEK VAEETRINFE

KFVLQVFIKG

661
FDSFLRAKEF DFVQMPQPQL TATASNQQKA DKLNQLEASI TADCKLTPQY

AKADDATHIA

721
FYVFCKLLDA AHLSNLRNEL IKFRESVNEF KFHHLLEIIE ICLLSADVVP

TDYRDLYSSE

781
ADCLARLRPF IEQGADITNW SDLFVQSDKH SPVIHANIEL SVKYGTTKLL

EQIINKDTQF

841
KITEANFTAW NTAQKSIEQL IKQREDHHEQ WVKAKNADDK EKQERKREKS

NFAQKFIEKH

901
GDDYLDICDY INTYNWLDNK MHFVHLNRLH GLTIELLGRM AGFVALFDRD

FQFFDEQQIA

961
DEFKLHGFVN LHSIDKKLNE VPTKKIKEIY DIRNKIIQIN GNKINESVRA

NLIQFISSKR

1021
NYYNNAFLHV SNDEIKEKQM YDIRNHIAHF NYLTKDAADF SLIDLINELR

ELLHYDRKLK

1081
NAVSKAFIDL FDKHGMILKL KLNADHKLKV ESLEPKKIYH LGSSAKDKPE

YQYCTNQVMM

1141
AYCNMCRSLL EMKK

38
1
MWISIKTLIH HLGVLFFCDY MYNRREKKII EVKTMRITKV EVDRKKVLIS
CAS13A OF

RDKNGGKLVY

LISTERIA

61
ENEMQDNTEQ IMHHKKSSFY KSVVNKTICR PEQKQMKKLV HGLLQENSQE

SEELIGERI

KIKVSDVTKL
ACCESSION NO:

121
NISNFLNHRF KKSLYYFPEN SPDKSEEYRI EINLSQLLED SLKKQQGTFI
WP_01298577

CWESFSKDME

181
LYINWAENYI SSKTKLIKKS IRNNRIQSTE SRSGQLMDRY MKDILNKNKP

FDIQSVSEKY

241
QLEKLTSALK ATFKEAKKND KEINYKLKST LQNHERQIIE ELKENSELNQ

FNIEIRKHLE

301
TYFPIKKINR KVGDIRNLEI GEIQKIVNHR LKNKIVQRIL QEGKLASYEI

ESTVNSNSLQ

361
KIKIEEAFAL KFINACLFAS NNLRNMVYPV CKKDILMIGE FKNSFKEIKH

KKFIRQWSQF

421
FSQEITVDDI ELASWGLRGA IAPIRNEIIH LKKHSWKKFF NNPTFKVKKS

KIINGKTKDV

481
TSEFLYKETL FKDYFYSELD SVPELIINKM ESSKILDYYS SDQLNQVFTI

PNFELSLLTS

541
AVPFAPSFKR VYLKGEDYQN QDEAQPDYNL KLNIYNEKAF NSEAFQAQYS

LFKMVYYQVE

601
LPQFTINNDL FKSSVDFILT LNKERKGYAK AFQDIRKMNK DEKPSEYMSY

IQSQLMLYQK

661
KQEEKEKINH FEKFINQVFI KGENSFIEKN RLTYICHPTK NTVPENDNIE

IPFHTDMDDS

721
NIAFWLMCKL LDAKQLSELR NEMIKFSCSL QSTEEISTFT KAREVIGLAL

LNGEKGCNDW

781
KELFDDKEAW KKNMSLYVSE ELLQSLPYTQ EDGQTPVINR SIDLVKKYGT

ETILEKLESS

841
SDDYKVSAKD IAKLHEYDVT EKIAQQESLH KQWIEKPGLA RDSAWTKKYQ

NVINDISNYQ

901
WAKTKVELTQ VRHLHQLTID LLSRLAGYMS IADRDFQFSS NYILERENSE

YRVTSWILLS

961
ENKNKNKYND YELYNLKNAS IKVSSKNDPQ LKVDLKQLRL TLEYLELFDN

RLKEKRNNIS

1021
HFNYLNGQLG NSILELEDDA RDVLSYDRKL KNAVSKSLKE ILSSHGMEVT

FKPLYQTNHH

1081
LKIDKLQPKK IHHLGEKSTV SSNQVSNEYC QLVRTLLTMK

39
1
XENKISLGNN IYYNPFKPQD KSYFAGYFNA AXENTDSVFR ELGKRLKGKE
CAS13B (R1177A)

YTSENFFDAI
MUTANT OF

61
FKENISLVEY ERYVKLLSDY FPXARLLDKK EVPIKERKEN FKKNFKGIIK

BEGEYELLA

AVRDLRNFYT

ZOOHELCUM

121
HKEHGEVEIT DEIFGVLDEX LKSTVLTVKK KKVKTDKTKE ILKKSIEKQL
ACCESSION NO.

DILCQKKLEY
6AAY_A

181
LRDTARKIEE KRRNQRERGE KELVAPFKYS DKRDDLIAAI YNDAFDVYID

KKKDSLKESS

241
KAKYNTKSDP QQEEGDLKIP ISKNGVVFLL SLFLTKQEIH AFKSKIAGFK

ATVIDEATVS

301
EATVSHGKNS ICFXATHEIF SHLAYKKLKR KVRTAEINYG EAENAEQLSV

YAKETLXXQX

361
LDELSKVPDV VYQNLSEDVQ KTFIEDWNEY LKENNGDVGT XEEEQVIHPV

IRKRYEDKFN

421
YFAIRFLDEF AQFPTLRFQV HLGNYLHDSR PKENLISDRR IKEKITVFGR

LSELEHKKAL

481
FIKNTETNED REHYWEIFPN PNYDFPKENI SVNDKDFPIA GSILDREKQP

VAGKIGIKVK

541
LLNQQYVSEV DKAVKAHQLK QRKASKPSIQ NIIEEIVPIN ESNPKEAIVF

GGQPTAYLSX

601
NDIHSILYEF FDKWEKKKEK LEKKGEKELR KEIGKELEKK IVGKIQAQIQ

QIIDKDTNAK

661
ILKPYQDGNS TAIDKEKLIK DLKQEQNILQ KLKDEQTVRE KEYNDFIAYQ

DKNREINKVR

721
DRNHKQYLKD NLKRKYPEAP ARKEVLYYRE KGKVAVWLAN DIKRFXPTDF

KNEWKGEQHS

781
LLQKSLAYYE QCKEELKNLL PEKVFQHLPF KLGGYFQQKY LYQFYTCYLD

KRLEYISGLV

841
QQAENFKSEN KVFKKVENEC FKFLKKQNYT HKELDARVQS ILGYPIFLER

GFXDEKPTII

901
KGKTFKGNEA LFADWFRYYK EYQNFQTFYD TENYPLVELE KKQADRKRKT

KIYQQKKNDV

961
FTLLXAKHIF KSVFKQDSID QFSLEDLYQS REERLGNQER ARQTGERNTN

YIWNKTVDLK

1021
LCDGKITVEN VKLKNVGDFI KYEYDQRVQA FLKYEENIEW QAFLIKESKE

EENYPYVVER

1081
EIEQYEKVRR EELLKEVHLI EEYILEKVKD KEILKKGDNQ NFKYYILNGL

LKQLKNEDVE

1141
SYKVENLNTE PEDVNINQLK QEATDLEQKA FVLTYIANKF AHNQLPKKEF

WDYCQEKYGK

1201
IEKEKTYAEY FAEVFKKEKE ALIKLEHHHH HH

40
1
MAKKNKMKPR ELREAQKKAR QLKAAEINNN AAPAIAAMPA AEVIAPVAEK
CAS13D, CHAIN A

KKSSVKAAGM
ACCESSION NO.

61
KSILVSENKM YITSFGKGNS AVLEYEVDNN DYNKTQLSSK DNSNIELGDV
CIV9_A

NEVNITFSSK

121
HGFGSGVEIN TSNPTHRSGE SSPVRGDMLG LKSELEKRFF GKTFDDNIHI

QLIYNILDIE

181
KILAVYVINI VYALNNMLGI KDSESYDDEM GYLSARNTYE VFTHPDKSNL

SDKVKGNIKK

241
SLSKFNDLLK TKRLGYFGLE EPKTKDTRAS EAYKKRVYHM LAIVGQIAQC

VFHDKSGAKR

301
FDLYSFINNI DPEYRDTLDY LVEERLKSIN KDFIEGNKVN ISLLIDMMKG

YEADDIIRLY

361
YDFIVLKSQK NLGFSIKKLR EKMLEEYGER FKDKQYDSVR SKMYKLMDFL

LFCNYYRNDV

421
AAGEALVRKL RFSMIDDEKE GIYADEAAKL WGKERNDFEN IADHMNGDVI

KELGKADMDF

481
DEKILDSEKK NASDLLYFSK MIYMLTYFLD GKEINDLLTT LISKFDNIKE

FLKIMKSSAV

541
DVECELTAGY KLFNDSQRIT NELFIVKNIA SMRKPAASAK LIMFRDALTI

LGIDDNITDD

601
RISEILKLKE KGKGIHGLRN FITNNVIESS RFVYLIKYAN AQKIREVAKN

EKVVMFVLGG

661
IPDTQIERYY KSCVEFPDMN SSLEAKRSEL ARMIKNISFD DFKNVKQQAK

GRENVAKERA

721
KAVIGLYLTV MYLLVKNLVN VNARYVIAIH CLERDFGLYK EIIPELASKN

LKNDYRILSQ

781
TLCELCDDRN ESSNLFLKKN KRLRKCVEVD INNADSSMTR KYANCIAHLT

VVRELKEYIG

841
DIRTVDSYFS IYHYVMQRCI TKRGDDTKQE EKIKYEDDLL KNHGYTKDFV

KALNSPFGYN

901
IPRFKNLSIE QLFDRNEYLT EKLEHHHHHH

41
1
MGKKIHARDL REQRKTDRTE KFADQNKKRE AERAVPKKDA AVSVKSVSSV
ES CAS13D,

SSKKDNVTKS
CHAIN A, FROM

61
MAKAAGVKSV FAVGNTVYMT SFGRGNDAVL EQKIVDTSHE PLNIDDPAYQ

EUBACTERIUM

LNVVTMNGYS

SIRAEUM DSM

121
VTGHRGETVS AVTDNPLRRF NGRKKDEPEQ SVPTDMLCLK PTLEKKFFGK
15702

EFDDNIHIQL
ACCESSION NO.

181
IYNILDIEKI LAVYSTNAIY ALNNMSADEN IENSDFFMKR TTDETFDDFE
6E9F_A

KKKESTNSRE

241
KADFDAFEKF IGNYRLAYFA DAFYVNKKNP KGKAKNVLRE DKELYSVLTL

IGKLAHWCVA

301
SEEGRAEFWL YKLDELKDDF KNVLDVVYNR PVEEINNRFI ENNKVNIQIL

GSVYKNTDIA

361
ELVRSYYEFL ITKKYKNMGF SIKKLRESML EGKGYADKEY DSVRNKLYQM

TDFILYTGYI

421
NEDSDRADDL VNTLRSSLKE DDKTTVYCKE ADYLWKKYRE SIREVADALD

GDNIKKLSKS

481
NIEIQEDKLR KCFISYADSV SEFTKLIYLL TRFLSGKEIN DLVTTLINKF

DNIRSFLEIM

541
DELGLDRIFT AEYSFFEGST KYLAELVELN SFVKSCSFDI NAKRTMYRDA

LDILGIESDK

601
TEEDIEKMID NILQIDANGD KKLKKNNGLR NFIASNVIDS NRFKYLVRYG

NPKKIRETAK

661
CKPAVRFVLN EIPDAQIERY YEACCPKNTA LCSANKRREK LADMIAEIKF

ENFSDAGNYQ

721
KANVTSRISE AEIKRKNQAI IRLYLTVMYI MLKNLVNVNA RYVIAFHCVE

RDTKLYAESG

781
LEVGNIEKNK TNLTMAVMGV KLENGIIKTE FDKSFAENAA NRYLRNARWY

KLILDNLKKS

841
ERAVVNEFAN TVCALNAIRN ININIKEIKE VENYFALYHY LIQKHLENRF

ADKKVERDTG

901
DFISKLEEHK TYCKDFVKAY CTPFGYNLVR YKNLTIDGLF DKNYPGKDDS

DEQK

The present application contemplates any Cas13 homolog (e.g., Cas13a, Cas13b, Cas13c, or Cas13d), variant, or equivalent thereof having an amino acid sequence that is at least 80%, or 85%, or 90%, or 95%, or 99% identical with SEQ ID NO: 1, or with any of the sequences of SEQ ID NOs: 34-41.

Other Cas13 sequences that may be used can include, but are not limited to: (a) Cas13a of Leptotrichia wadei (Ref Seq No. WP_03059678.1); (b) Cas13a of Leptotrichia buccalis (Ref Seq No. WP_015770004.1); (c) any Cas13b sequence known in the art; (d) any Cas13d sequence known in the art; and (e) any Pumby sequence known in the art, or any homolog, variant, or equivalent thereof having an amino acid sequence that is at least 80%, or 85%, or 90%, or 95%, or 99% identical with any of these alternate Cas13 sequences.
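As an illustration of how the recited identity thresholds might be checked, the following sketch computes percent identity over two sequences that have already been aligned. It assumes, for illustration only, that identity is counted as matching columns divided by all alignment columns, with gaps written as "-"; other conventions exist, and the function names are hypothetical.

```python
# Percent identity over a pre-computed pairwise alignment (a sketch; the
# denominator convention used here is an assumption, not a definition
# taken from the disclosure).

THRESHOLDS = (80, 85, 90, 95, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity of two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only if both residues are identical
    # and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list[int]:
    """List the recited thresholds (in percent) that the identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # First ten residues of SEQ ID NO: 1 versus a hypothetical variant.
    pid = percent_identity("MKLTRRRISG", "MKLTRKRISG")
    print(pid, thresholds_met(pid))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the counting step.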

In some embodiments, the disclosed RNA prime editors may comprise a catalytically inactive, or “dead,” napRNAbp domain. In certain embodiments, the RNA prime editors described herein may include a dead Cas13 that has no nuclease activity due to one or more mutations. The nuclease inactivation may be due to one or more mutations that result in one or more substitutions and/or deletions in the amino acid sequence of the encoded protein, or any variants thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto. As used herein, the term “dCas13” refers to a nuclease-inactive Cas13 or nuclease-dead Cas13, or a functional fragment thereof, and embraces any naturally occurring dCas13 from any organism, any naturally occurring dCas13 equivalent or functional fragment thereof, any dCas13 homolog, ortholog, or paralog from any organism, and any mutant or variant of a dCas13, naturally occurring or engineered. The term dCas13 is not meant to be particularly limiting and may be referred to as a “dCas13 or equivalent.”

RNA-Dependent RNA Polymerase (RDRP)

As used herein, the term “polymerase” refers to an enzyme that synthesizes a nucleotide strand and which may be used in connection with the RNA prime editing system described herein. The polymerase may be a wild type polymerase, a functional fragment, a mutant, a variant, or a truncated variant, and the like. The polymerase may include wild type polymerases from eukaryotic, prokaryotic, archaeal, or viral organisms, and/or the polymerase may be modified by genetic engineering, mutagenesis, or directed evolution-based processes. The polymerase can be a “template-dependent” polymerase (i.e., a polymerase which synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). The polymerase can also be a “template-independent” polymerase (i.e., a polymerase which synthesizes a nucleotide strand without the requirement of a template strand). A polymerase may also be further categorized as a “DNA polymerase” or an “RNA polymerase.” In various embodiments, the RPE RNA editing system described herein comprises an RNA polymerase. In various embodiments, the RPE RNA editing system described herein comprises an RNA-dependent RNA polymerase (RDRP), or any variant or equivalent that may be used in place of the RDRP component in the RPE editing system. A list of exemplary RDRP sequences is provided as follows:

SEQ ID NO:
SEQUENCE
DESCRIPTION

2
MHTDDKSCATRLLSLLGVRSVPRERFADVDKVGVWTVLNQRLILLSWHVGM
RNA-DEPENDENT

VIDSTFEHVNSDGHPSKCLEIMIQRSREFAYSAERSGLCGYYIHIFNFRDH
RNA POLYMERASE

VSFAIKTAHLKKYTQTLLACDTTTPIHETVEAFSPPTGRAGGKVFVTPRLL
[TRICHODERMA

KSIGNPTILQCALATKLKENLHRLDWEQIMVTGMILLQYLNVGAGDVLWYL

ATROVIRIDE

ITYEKLWDSKFLDAVKILKDAHVETRIKHKLPNVSIHYARSTFARPWARSL
MYCOVIRUS]

YGLDTLPGRSEKFNMNFQSEQLMRMIDTSKRAWPAIVEDKYGNNYIKFFPD
REFSEQ

KYTEQVKQIARKTAEDLIQSQTNLETFTEFFSNRMFWGASGGAPGASVHWD
YP_009342055-

DTKEKLRVNKRGALLSLKEQKARNILEYMRSHPDPKPIQWSVKAVKYESGK

LRSILNTVLENYVIQGYIFNAVDTNSRRDSWYANTHDNPSRIANAMRRILD

LKQRPGLMWDYADFNLNHTFFLMSQEYLARVEVLLERCDSHLPMDVQNTIR

ADMRAATAYAILARYNTYLHDPETMITTQAVRSLQSGERGTSSINSDSNET

DTTIVRRVCKEMLGIDPIVPVTDHAGDDAFENVISMTYAPLVCSVYNLTGA

AGQAYKIAVSYATENGASGEFLRLSYDAASNHIAGYPIRGMMGFIHGEFFA

EALPQPFDRLASFLNQRNKLQRRGWVAPDSLFNAVCRYNTRLTYTLSDGTK

RHFYPDIETVLTPAAFGGVGVDTVDSQLLSQLSDKQIVTPLHANCPYDAIV

IPSGEGKTTLARKYPDIFVDHDSLVSSINLIALRSRAVSSGNWEPLNAYLR

GEGERYMSVNRGKILLTWSPSTAPSRSRICALLLQQPVGLRANIANRSSIM

NDMNKKYVHMFKNYSERDAYLMSITAGLTGLTYKVYQRTGDAIPKFNWPRV

DSKDLLHRSKTVIKDHATLHRHNLPMDVSITDAIAQSALSGAWPKDALYKS

IAEHARQLAEWARKSSFEYKIIAPLKLCSDTEFVAEAVSTSMYVLGLSGSL

NHSNAGGGLTFVLNDLLRPATKRLKHHYNYIPTLTRLCGCSVNASTQYLIK

CSNGDNLLEKILSLMASERSFSHSKQRATKQFDEVIEFIDRLGLLKQEPLL

AFGKVVSPSLAEAWFTGGLQLLPPAICHQSADLSTFVRDITLNVIETHFLA

RLNRTNDLQQIVVTVHHYERMAQYAINKVLNGLIPGIIMQD

3
MVAITVQGAQLIKRVVERFYPGIAFNINEGACYIYKFSDHIRRIRMKHGTK
RNA-DEPENDENT

YRRQAEEIIRNIGLRKERLYGIPVLDEVEWKCVEDGQTFQSYAFEVYVNSI
RNA POLYMERASE

LPWSELDPEEEFLRNYRVSREMTEVEKFIEFRAKNEMQIYGDIPIKVWCCF
[BLUETONGUE

INELSVELKHIPLGMQVMADFVNREDSPFHQGNRDLSNLEDFQVAYTTPLL
VIRUS].

FEMCCMESILEFNIKMRMREEDISALEFGDMKVDPVGLLREFFILCLPHPK
ADI49523

KINNVLRAPYSWFVKMWGVGADPIVVLQSTAGDDRNSKDVFYDKERTEPNR

YKALFRSSFYNESRRMNEEKILEAVKYSQKLGSHDRRLPLFEKMLKTVYTT

PFYPHKSSNMILASFLLSIQTITGYGRAWVKNVSTEFDKQLKPNPSNLVQD

VSDLTREFFKQAYVEAKERREEIVKPEDLYTSMLRLARNTSSGESTEIYVK

KRFGPRVKDKDLIKINSRIKALVIFTKGHTVFTDEELHKKYNSVELYQTKG

SRDVPIKATRTIYSINLSVLVPQLIVTLPLNEYFSRVGGITSPDYKKIGGK

VIVGDLEATGSRVMDAADCFRNSADRDIFTIAIDYSEYDTHLTRHNFRTGM

LQGIREAMAPYRDLRYEGYTLEQIIDFGYGEGRVANTLWNGKRRLFKTTED

AYIRLDESERDKGSFKVPKGVLPVSSVDVANRIAVDKGFDTLVAATDGSDL

ALIDTHLSGENSTLIANSMHNMAIGTLIQREVGREQPGILTFLSEQYVGDD

TLFYTKLHTTDTKVEDKVAASIFDTVAKCGHEASPSKTMMTPYSVEKTQTH

AKQGCYVPQDRMMIISSERRKDIEDVQGYVRSQVQTMITKVSRGFCHDLAQ

LILMLKTTFIGAWKMKRTIKEDAMYRDRKFDSNDEDGETLIQIRNPLALYV

PIGWNGYGAHPAALNIVMTEEMYVDSIMISKLDEIMAPIRRIVHDIPPCWN

ETQGDKRGLISATKMSFFSKMARPAVQAALSDPQIMNLVEELPLGEFSPGR

ISRTMMHSALLKESSARTLLSSGYELEYQKALNSWIAQVSMRLGEESGVIS

TSYAKLFDVYFEGELDGAPYMEPDQNLSPQFYIQKMMIGPRVSSRVRNSYV

DRIDVILRKDVVMRGFITANTILNVIEKLGTNHSVGDLVTVFILMNIETRV

AEELAEYMTSEKIRFDALKLLKKGIAGDEFTMSLNVATQDFIDTYLAYPYQ

LTKTEVDAISLYCTQMVMLRAALGLPKKKMKIVVTDDAKKRYKIRLQRFRT

HVPKIKVLKKLIDPNRMTVRNLENQFV

4
MEQNAFNGFEFVDYSEELENLNQNHIHKVRRESNTTYVDKFAERELIDLHP
RNA-DEPENDENT

EYHRQFIQGWSRSYYNTERHMEALLNYGTRNIPVDNVDYNLYQGCIDTVKN
RNA POLYMERASE

GLRSLPRVKAFDVLTELNLVSYKSSTAAGYNYMGAKGPFDGYNHKQAIRRA
[PYRUS

RATVGDVSDNGIEGLRRAITTAVPDVGYTRTQLTDLTEKTKIRNVWGRAFH

PYRIFOLIA].

YILIEGTSADPLIRMFSKTKSFYHIGRDPLDSVPDVLSETAGKARWLYAID
BAA34783

WKQFDATVSRFEINAAFDIIMDLIEFPNYPTYVAFELSRQLFIHKKIAAPD

GYIYWSHKGIPSGSYFTSIIGSIINRLRIEYLWRKITGHGPLACYTQGDDS

LSCDDEFTPPEKFAEIANQIGWVLNPEKTEYSTIPSEVHFLGRTMLGGLNT

REIKRCLRLLIYPEYPVDSGRISAYRAKSISEDVGRLSELLNKIERRLQGQ

YGIASDEEVPDYFKRYVL

5
MSDVFNSPQARSTISAAFGIKPTAGQDVEELLIPKVWVPPEDPLASPSRLA
RNA-DEPENDENT

KFLRENGYKVLQPRSLPENEEYETDQILPDLAWMRQIEGAVLKPTLSLPIG
RNA POLYMERASE

DQEYFPKYYPTHRPSKEKPNAYPPDIALLKQMIYLFLQVPEANEGLKDEVT
[INFECTIOUS

LLTQNIRDKAYGSGTYMGQANRLVAMKEVATGRNPNKDPLKLGYTFESIAQ
BURSAL DISEASE

LLDITLPVGPPGEDDKPWVPLTRVPSRMLVLTGDVDGDFEVEDYLPKINLK
VIRUS].

SSSGLPYVGRTKGETIGEMIAISNQFLRELSTLLKQGAGTKGSNKKKLLSM
ABS18957

LSDYWYLSCGLLFPKAERYDKSTWLTKTRNIWSAPSPTHLMISMITWPVMS

NSPNNVLNIEGCPSLYKFNPERGGLNRIVEWILAPEEPKALVYADNIYIVH

SNTWYSIDLEKGEANCTRQHMQAAMYYILTRGWSDNGDPMENQTWATFAMN

IAPALVVDSSCLIMNLQIKTYGQGSGNAATFINNHLLSTLVLDQWNLMRQP

RPDSEEFKSIEDKLGINFKIERSIDDIRGKLRQLVLLAQPGYLSGGVEPEQ

SSPTVELDLLGWSATYSKDLGIYVPVLDKERLFCSAAYPKGVENKSLKSKV

GIEQAYKVVRYEALRLVGGWNYPLLNKACKNNAGAARRHLEAKGFPLDEFL

AEWSELSEFGEAFEGFNIKLTVTSESLAELNKPVPPKPPNVNRPVNTGGLK

AVSNALKTGRYRNEAGLSGLVLLATARSRLQDAVKAKAEAEKLHKSKPDDP

DADWFERSETLSDLLEKADIASKVAHSALVETSDALEAVQSTSVYTPKYPE

VKNPQTASNPVVGLHLPAKRATGVQAALLGAGTSRPMGMEAPTRSKNAVKM

AKRRQRQKESRQ

6
GEIQWMRPSKEVGYPIINAPSKTKLEPSAFHYVFEGVKEPAVLTKNDPRLK
POLIO VIRUS

TDFEEAIFSKYVGNKITEVDEYMKEAVDHYAGQLMSLDINTEQMCLEDAMY
RDRP

GTDGLEALDLSTSAGYPYVAMGKKKRDILNKQTRDTKEMQKLLDTYGINLP

LVTYVKDELRSKTKVEQGKSRLIEASSLNDSVAMRMAFGNLYAAFHKNPGV

ITGSAVGCDPDLFWSKIPVLMEEKLFAFDYTGYDASLSPAWFEALKMVLEK

IGFGDRVDYIDYLNHSHHLYKNKTYCVKGGMPSGCSGTSIFNSMINNLIIR

TLLLKTYKGIDLDHLKMIAYGDDVIASYPHEVDASLLAQSGKDYGLIMTPA

DKSATFETVTWENVTFLKRFFRADEKYPFLIHPVMPMKEIHESIRWTKDPR

NTQDHVRSLCLLAWHNGEEEYNKFLAKIRSVPIGRALLLPEYSTLYRRWLD

SE

7
MHDYLSDLNVYVPYQRPNIVLHGSDFWLRTLDCHTEIPLAAIYFGNIQGGT

C. ELEGANS RDRP

YFNHWQVSFSRENISSRDMLHKIHAEFEFDKTDMITVQFQCFEEKKQKFED

SRKQKVRVNYQLTIRRDSIRRIIVDPRVEGCNTCVHFEVNCPPLIRKGYID

NDKSSFHKPFYERQKRFDCDWRNGNVNHGNPQDAAIADSPFFTIEFHKEIS

TKEMYRVLSRLRSRTKVLIEFANLPSIDVPMGSHYPYNRWNLKKSPTDSNA

PIFREFLKEIFPPKYEIVDDKLIDVNEERKFSITYLIECLLSRGAIVKDQL

LLNEQHWKNFLEIIIWYYRNDNQLCEAALEDLVHLIDGRKRIGSILKCLDK

ICQKREVMKLVNGLTEKESIEGYQRVRKVIFTPTRVIYIAPETIMGNRVLR

KFDKDGTRVLRVTFRDDNNKKMRSNVTGKLLDRTANKYLEHGVRIANREYG

FLGCSNSQMRDNGAYFMMRFTDKQLDRFYKCNPTASNINFKPKIDEVRFQL

GRFSEIENVPKLMARLGQCFTQSRLTGVGLGRDDYCSTYDLIGGRATNGSE

YTFSDGVGMMSYQFAQEVSQAMQFGKAVPSCFQIRERGNKGVIAIEPFLDE

IRKWALVNGVTSMKMAKCLFRPSQIKFQAKAISGDQIEMVKESSAVLVALN

KPFINILDQVSEMQSLDCHKRITSRIEELMDRQILSFAKQMNEETFCRNKL

KEFPRRIDIDNLRTMWGFTLSSEPFERSLIKASIKESITKQLCKEQIQIPS

ELGRSMLGVVDETGRLQYGQIFVQYTKNYKKKLPPRDSNNKVHGSEIVTGT

VLLTKNPCIVPGDVRIFEAVDIPELHHMCDVVVFPQHGPRPHPDEMAGSDL

DGDEYSVIWDQELLLERNEEPFDFAVEKIKVPYDREKLDVLMREFYVTYLK

LDSVGQISNSHLHNSDQYGLNSRVCMDLAKKNCQAVDFTKSGQPPDPLETK

WRADPVTFEVIPPENPERIPDFHMGNERSPMYVSPRLCGKLFREFQAIDNV

IKISEERDEQYNIELDETIFVTGFERYMESAQKQLSSYNGQLRSIMENYGI

RSEGEIMSGCIVEMRNRISDKDQDDMSFYNTNQMIETKMTSLVCKFRETFF

EEFGGFTVKCTLLPNAYDNGNCLNYRCEDPDQEVRKKAVAWYRACYECAQS

TREVRKLSFAWIAYDVIAKVKETNVLNNERMQIGGANPMYTFLEEHRKQYL

IDHDADFKNFCELDHLITGEKSKEAISILKIYLEMIPGLDSVFFMLMRWGE

SLRLFDGKPIKIYHFFLMFILFATRQLASADGNAEPFFKIIEKEEYEKQKR

DSSRGNIDPLTEKKRSDMMVKFFQFMGCRKFRKMSTLSFCPLNFSSIFMRG

EWRIFHESALKTYYNILFNLRFEELPVSSDPTITAETMDRECEPFVIELPE

NINVNDLINNMKKHTNVSTVKMRRQEKNPINDKAKPKTTVRYIVSVSGTLE

SIQMLKKLSAVTIPIKSHWEGEEVSQQMASLCYQKVMNGEF

The present application contemplates any RDRP homolog, variant, or equivalent thereof having an amino acid sequence that is at least 80%, or 85%, or 90%, or 95%, or 99% identical with any of SEQ ID NOs: 2-7.

RpegRNA

As used herein, the terms “RNA prime editing guide RNA” or “RpegRNA” refer to a specialized form of a guide RNA that has been modified to include one or more additional sequences for implementing the RNA prime editing methods and compositions described herein. The RPE RNA editing system described herein comprises an RpegRNA to direct the Cas13 component to the target RNA molecule of interest. In general, RpegRNAs have structures that are similar to the pegRNAs of prime editing systems and comprise (a) a spacer sequence, which comprises a sequence complementary to the target RNA sequence, (b) a core sequence, which allows the RpegRNA to bind to the napRNAbp component, and (c) an extension arm, which comprises (i) a primer sequence that anneals to the 3′ end of the target RNA (or to an internal 3′ end created after cleavage of the target RNA) to create a double-stranded RNA substrate for polymerization by the RDRP, and (ii) a template region that provides the coding template for the RDRP to synthesize new RNA at the natural 3′ end (or at an internal 3′ end created after RNA cleavage) (see FIGS. 1-4). An exemplary RpegRNA sequence is provided as follows:

8
GGCTTCCATCTCTTTGAGCACCTCCAGCGG
GTTGTGGAAGGTCCAGTTTTG
AGGGGCTATTACAAGTNAATTGCAGC
RPEGRNA, BOUND BY PSPCAS13B. UNDERLINED IS SPACER, ITALICIZED IS PSPCAS13B DIRECT REPEAT, FOLLOWING IS TEMPLATE, BOLD IS BINDING SITE.
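The component layout of an RpegRNA can be sketched as a small data structure. The order used here (spacer, direct repeat, template, binding site) follows the description of SEQ ID NO: 8, but the class, its field names, and the placeholder sequences are hypothetical and are offered only as an illustration. Because the typographic marking of SEQ ID NO: 8 is not reproduced in plain text, the sketch does not attempt to split that sequence into its components.

```python
# Illustrative container for the parts of an RpegRNA (hypothetical names).
from dataclasses import dataclass

RNA_ALPHABET = set("ACGUN")  # N allows an unspecified base


@dataclass(frozen=True)
class RpegRNA:
    spacer: str               # complementary to the target RNA
    direct_repeat: str        # core sequence bound by the napRNAbp
    template: str             # template region copied by the RDRP
    primer_binding_site: str  # primer sequence annealing to the target 3' end

    def __post_init__(self) -> None:
        # Reject empty parts and letters outside the RNA alphabet.
        for name in ("spacer", "direct_repeat", "template",
                     "primer_binding_site"):
            value = getattr(self, name)
            if not value or set(value) - RNA_ALPHABET:
                raise ValueError(f"{name} must be a non-empty RNA string")

    @property
    def extension_arm(self) -> str:
        """Template region followed by the primer sequence."""
        return self.template + self.primer_binding_site

    @property
    def sequence(self) -> str:
        """Full guide, 5' to 3', in the assumed component order."""
        return self.spacer + self.direct_repeat + self.extension_arm


if __name__ == "__main__":
    guide = RpegRNA(spacer="ACGUACGU", direct_repeat="GUUG",
                    template="AAA", primer_binding_site="CCC")
    print(guide.sequence)
```

Keeping the parts separate makes it straightforward to swap the template region for a different desired edit while reusing the same spacer and direct repeat.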

Cas13-RDRP Fusion Proteins

The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion of the fusion protein, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas13 that directs the binding of the protein to a target site) and an RNA polymerase. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

The RPE RNA editing system described herein comprises a fusion protein comprising a napRNAbp (e.g., Cas13) and an RNA-dependent RNA polymerase (RDRP), optionally fused by a linker. A non-limiting list of exemplary Cas13-RDRP fusion protein sequences is provided as follows:

9
MKRTADGSEFESPKKKRKVSGSETPGTSESATPES[X]SGSETPGTSESAT
DCAS13B FUSED

PESIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNENNENLW
N-TERMINALLY TO

FHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKY
AN RDRP AND

KQNRVEVNSNDIFEVLKRAFGVLKMYRDLTNAYKTYEEKLNDGCEFLISTE
TARGETED TO THE

QPLSGMINNYYTVALRNMNERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQV
NUCLEUS

NTGFFLSLQDYNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRLPIFSS

YNAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDEL

FTTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRFH

VNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLEEAETMRKQENGTFG

NSGIRIRDFENMKRDDANPANYPYIVDTYTHYILENNKVEMFINDKEDSAP

LLPVIEDDRYVVKTIPSCRMSTLEIPAMAFHMFLFGSKKTEKLIVDVHNRY

KRLFQAMQKEEVTAENIASFGIAESDLPQKILDLISGNAHGKDVDAFIRLT

VDDMLTDTERRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIV

LFQPSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIG

KGTTEPHPFLYKVFARSIPANAVEFYERYLIERKFYLTGLSNEIKKGNRVD

VPFIRRDQNKWKTPAMKTLGRIYSEDLPVELPRQMEDNEIKSHLKSLPQME

GIDENNANVTYLIAEYMKRVLDDDFQTFYQWNRNYRYMDMLKGEYDRKGSL

QHCFTSVEEREGLWKERASRTERYRKQASNKIRSNRQMRNASSEEIETILD

KRLSNSRNEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGERFKLKEI

MPDAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVLASDKRIG

NLLELVGSDIVSKED

SEQ ID NO: 10: DCAS13B FUSED C-TERMINALLY TO AN RDRP AND TARGETED TO THE NUCLEUS

MKRTADGSEFESPKKKRKVNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQ

KVADIEGEQNENNENLWFHPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFP

FLKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKRAFGVLKMYRDLTNAYK

TYEEKLNDGCEFLISTEQPLSGMINNYYTVALRNMNERYGYKTEDLAFIQD

KRFKFVKDAYGKKKSQVNTGFFLSLQDYNGDTQKKLHLSGVGIALLICLEL

DKQYINIFLSRLPIFSSYNAQSEERRIIIRSFGINSIKLPKDRIHSEKSNK

SVAMDMLNEVKRCPDELFTTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPL

LLQYIDYGKLFDHIRFHVNMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFG

RLEEAETMRKQENGTFGNSGIRIRDFENMKRDDANPANYPYIVDTYTHYIL

ENNKVEMFINDKEDSAPLLPVIEDDRYVVKTIPSCRMSTLEIPAMAFHMEL

FGSKKTEKLIVDVHNRYKRLFQAMQKEEVTAENIASFGIAESDLPQKILDL

ISGNAHGKDVDAFIRLTVDDMLTDTERRIKRFKDDRKSIRSADNKMGKRGF

KQISTGKLADFLAKDIVLFQPSVNDGENKITGLNYRIMQSAIAVYDSGDDY

EAKQQFKLMFEKARLIGKGTTEPHPFLYKVFARSIPANAVEFYERYLIERK

FYLTGLSNEIKKGNRVDVPFIRRDQNKWKTPAMKTLGRIYSEDLPVELPRQ

MFDNEIKSHLKSLPQMEGIDENNANVTYLIAEYMKRVLDDDFQTFYQWNRN

YRYMDMLKGEYDRKGSLQHCFTSVEEREGLWKERASRTERYRKQASNKIRS

NRQMRNASSEEIETILDKRLSNSRNEYQKSEKVIRRYRVQDALLELLAKKT

LTELADFDGEREKLKEIMPDAEKGILSEIMPMSFTFEKGGKKYTITSEGMK

LKNYGDFFVLASDKRIGNLLELVGSDIVSKEDGSKRTADGSEFEPKKKRKV

[X]

SEQ ID NO: 11: DCAS13B FUSED N-TERMINALLY TO AN RDRP AND TARGETED TO THE CYTOPLASM

SGSETPGTSESATPES[X]SGSETPGTSESATPESNIPALVENQKKYFGTY

SVMAMLNAQTVLDHIQKVADIEGEQNENNENLWFHPVMSHLYNAKNGYDKQ

PEKTMFIIERLQSYFPFLKIMAENQREYSNGKYKQNRVEVNSNDIFEVLKR

AFGVLKMYRDLTNAYKTYEEKLNDGCEFLISTEQPLSGMINNYYTVALRNM

NERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQVNTGFFLSLQDYNGDTQKK

LHLSGVGIALLICLFLDKQYINIFLSRLPIFSSYNAQSEERRIIIRSFGIN

SIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDELFTTLSAEKQSRFRIISDD

HNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRFHVNMGKLRYLLKADKTCID

GQTRVRVIEQPLNGFGRLEEAETMRKQENGTFGNSGIRIRDFENMKRDDAN

PANYPYIVDTYTHYILENNKVEMFINDKEDSAPLLPVIEDDRYVVKTIPSC

RMSTLEIPAMAFHMFLFGSKKTEKLIVDVHNRYKRLFQAMQKEEVTAENIA

SFGIAESDLPQKILDLISGNAHGKDVDAFIRLTVDDMLTDTERRIKRFKDD

RKSIRSADNKMGKRGFKQISTGKLADFLAKDIVLFQPSVNDGENKITGLNY

RIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIGKGTTEPHPFLYKVFARSI

PANAVEFYERYLIERKFYLTGLSNEIKKGNRVDVPFIRRDQNKWKTPAMKT

LGRIYSEDLPVELPRQMEDNEIKSHLKSLPQMEGIDENNANVTYLIAEYMK

RVLDDDFQTFYQWNRNYRYMDMLKGEYDRKGSLQHCFTSVEEREGLWKERA

SRTERYRKQASNKIRSNRQMRNASSEEIETILDKRLSNSRNEYQKSEKVIR

RYRVQDALLFLLAKKTLTELADFDGEREKLKEIMPDAEKGILSEIMPMSFT

FEKGGKKYTITSEGMKLKNYGDFFVLASDKRIGNLLELVGSDIVSKED

SEQ ID NO: 12: DCAS13B FUSED C-TERMINALLY TO AN RDRP AND TARGETED TO THE CYTOPLASM

MNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNENNENLWE

HPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKYK

QNRVEVNSNDIFEVLKRAFGVLKMYRDLTNAYKTYEEKLNDGCEFLISTEQ

PLSGMINNYYTVALRNMNERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQVN

TGFFLSLQDYNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRLPIFSSY

NAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDELF

TTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRFHV

NMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLEEAETMRKQENGTEGN

SGIRIRDFENMKRDDANPANYPYIVDTYTHYILENNKVEMFINDKEDSAPL

LPVIEDDRYVVKTIPSCRMSTLEIPAMAFHMFLFGSKKTEKLIVDVHNRYK

RLFQAMQKEEVTAENIASFGIAESDLPQKILDLISGNAHGKDVDAFIRLTV

DDMLTDTERRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVL

FQPSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIGK

GTTEPHPFLYKVFARSIPANAVEFYERYLIERKFYLTGLSNEIKKGNRVDV

PFIRRDQNKWKTPAMKTLGRIYSEDLPVELPRQMEDNEIKSHLKSLPQMEG

IDENNANVTYLIAEYMKRVLDDDFQTFYQWNRNYRYMDMLKGEYDRKGSLQ

HCFTSVEEREGLWKERASRTERYRKQASNKIRSNRQMRNASSEEIETILDK

RLSNSRNEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGEREKLKEIM

PDAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVLASDKRIGN

LLELVGSDIVSKEDGSLQLPPLERLTLSGSETPGTSESATPES[X]

SEQ ID NO: 13: WTCAS13B FUSED C-TERMINALLY TO AN RDRP AND TARGETED TO THE CYTOPLASM

MNIPALVENQKKYFGTYSVMAMLNAQTVLDHIQKVADIEGEQNENNENLWE

HPVMSHLYNAKNGYDKQPEKTMFIIERLQSYFPFLKIMAENQREYSNGKYK

QNRVEVNSNDIFEVLKRAFGVLKMYRDLTNHYKTYEEKLNDGCEFLISTEQ

PLSGMINNYYTVALRNMNERYGYKTEDLAFIQDKRFKFVKDAYGKKKSQVN

TGFFLSLQDYNGDTQKKLHLSGVGIALLICLFLDKQYINIFLSRLPIFSSY

NAQSEERRIIIRSFGINSIKLPKDRIHSEKSNKSVAMDMLNEVKRCPDELF

TTLSAEKQSRFRIISDDHNEVLMKRSSDRFVPLLLQYIDYGKLFDHIRFHV

NMGKLRYLLKADKTCIDGQTRVRVIEQPLNGFGRLEEAETMRKQENGTEGN

SGIRIRDFENMKRDDANPANYPYIVDTYTHYILENNKVEMFINDKEDSAPL

LPVIEDDRYVVKTIPSCRMSTLEIPAMAFHMFLFGSKKTEKLIVDVHNRYK

RLFQAMQKEEVTAENIASFGIAESDLPQKILDLISGNAHGKDVDAFIRLTV

DDMLTDTERRIKRFKDDRKSIRSADNKMGKRGFKQISTGKLADFLAKDIVL

FQPSVNDGENKITGLNYRIMQSAIAVYDSGDDYEAKQQFKLMFEKARLIGK

GTTEPHPFLYKVFARSIPANAVEFYERYLIERKFYLTGLSNEIKKGNRVDV

PFIRRDQNKWKTPAMKTLGRIYSEDLPVELPRQMEDNEIKSHLKSLPQMEG

IDENNANVTYLIAEYMKRVLDDDFQTFYQWNRNYRYMDMLKGEYDRKGSLQ

HCFTSVEEREGLWKERASRTERYRKQASNKIRSNRQMRNASSEEIETILDK

RLSNSRNEYQKSEKVIRRYRVQDALLFLLAKKTLTELADFDGEREKLKEIM

PDAEKGILSEIMPMSFTFEKGGKKYTITSEGMKLKNYGDFFVLASDKRIGN

LLELVGSDIVSKEDGSLQLPPLERLTLSGSETPGTSESATPES[X]

The sequences provided herein belong to the following families:

- Nucleic acid-programmable RNA binding protein: SEQ ID NO: 1 and 34-41;
- RNA-dependent RNA polymerase: SEQ ID NO: 2-7;
- rpegRNA sequences: SEQ ID NO: 8;
- Fusion proteins (napRNAbp:RDRP): SEQ ID NO: 9-13, wherein [X] represents an RDRP, examples of which are listed below. Only examples of truncated Cas13b are listed for the fusions. Other Cas13 proteins that are potentially usable include Cas13a, Cas13c, and Cas13d, either truncated or full-length. Examples include either an NLS or an NES to direct the RNA prime editor to the nucleus or cytoplasm, respectively. Other NLSs or NESs are also envisioned.
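
The role of the [X] placeholder in the fusion sequences can be illustrated with a minimal Python sketch. It assumes a template laid out like the N-terminal portion of SEQ ID NO: 9 (NLS, linker, [X], linker, Cas13b). The NLS and linker strings are taken from this disclosure (SEQ ID NO: 24 and SEQ ID NO: 20); the function name `fill_rdrp` and the short `TOY_RDRP` and `TOY_CAS13B` strings are hypothetical stand-ins, not sequences from the sequence listing.

```python
# Sequences taken from this disclosure.
NLS = "MKRTADGSEFESPKKKRKV"    # SEQ ID NO: 24
LINKER = "SGSETPGTSESATPES"    # SEQ ID NO: 20

# Hypothetical placeholders for illustration only (not real sequences).
TOY_RDRP = "MAAARDRP"
TOY_CAS13B = "NIPALVENQKK"


def fill_rdrp(template: str, rdrp: str, placeholder: str = "[X]") -> str:
    """Replace the single [X] placeholder in a fusion template with an RDRP sequence."""
    if template.count(placeholder) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return template.replace(placeholder, rdrp)


# Template mirroring the layout NLS-linker-[X]-linker-Cas13b.
template = NLS + LINKER + "[X]" + LINKER + TOY_CAS13B
fusion = fill_rdrp(template, TOY_RDRP)
print(fusion)
```

In practice, a full-length RDRP sequence (e.g., one of SEQ ID NO: 2-7) would take the place of `TOY_RDRP`.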

Mutants

It should be appreciated that any of the amino acid sequences described herein may also include mutations that result in acceptable substitutions of amino acids. For example, mutation of an amino acid with a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A262T mutation) may also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, serine. As another example, mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, mutation of an amino acid with a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan would recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an arginine may be an amino acid mutation to a lysine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conserved amino acid residues would be recognized by the skilled artisan and any of the amino acid mutations to other conserved amino acid residues are also within the scope of this disclosure.
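
The side-chain groupings described above can be captured in a short Python sketch. It encodes only the hydrophobic, positively charged, and polar groups and the additional similar pairs named in the preceding paragraph; the function name `is_conservative` is hypothetical, and the grouping is a simplification for illustration rather than an exhaustive definition of conservative substitution.

```python
# Side-chain groups as named in the text above (one-letter amino acid codes).
HYDROPHOBIC = set("AVILMFYW")
POSITIVE = set("RHK")
POLAR = set("STNQ")
# Additional similar pairs named in the text.
SIMILAR_PAIRS = [set("FY"), set("NQ"), set("MC"), set("DE"), set("RK")]

GROUPS = [HYDROPHOBIC, POSITIVE, POLAR] + SIMILAR_PAIRS


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if both residues share at least one of the groups above."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in GROUPS)


print(is_conservative("T", "S"))  # threonine to serine
print(is_conservative("A", "T"))  # alanine to threonine
```

Under these groupings, threonine to serine counts as conservative, while alanine to threonine does not, because the two residues fall in different groups.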

In some embodiments, the present disclosure may utilize any variant, mutant, or equivalent of the exemplary Cas13 or RDRP proteins disclosed herein. Any available methods may be utilized to obtain or construct a variant or mutant Cas13 or RDRP protein. The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. Mutations also embrace “gain-of-function” mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, with these tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
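
The substitution notation described above (original residue, position, new residue, as in A262T) can be parsed and applied with a few lines of Python. This is a minimal sketch that handles single amino acid substitutions only, assumes 1-based positions, and uses a hypothetical function name `apply_substitution`; the toy sequence is for illustration and is not taken from the sequence listing.

```python
import re

# Pattern for a single substitution such as "A262T".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a substitution written as <original><position><new>, 1-based."""
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    original, position, new = match.group(1), int(match.group(2)), match.group(3)
    if position < 1 or position > len(sequence):
        raise ValueError("position is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError("original residue does not match the sequence")
    return sequence[: position - 1] + new + sequence[position:]


print(apply_substitution("MKAT", "A3T"))  # toy sequence, not from the listing
```

Checking the stated original residue against the sequence guards against applying a mutation under the wrong numbering scheme.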

Mutations can be introduced into a reference Cas13 or RDRP protein using site-directed mutagenesis. Older methods of site-directed mutagenesis known in the art rely on sub-cloning of the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of single-stranded DNA template. In these methods, one anneals a mutagenic primer (i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at the site to be mutated) to the single-stranded template and then polymerizes the complement of the template starting from the 3′ end of the mutagenic primer. The resulting duplexes are then transformed into host bacteria and plaques are screened for the desired mutation. More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require sub-cloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed in order to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. And fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
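
The idea of a mutagenic primer, one that anneals at the target site but carries a mismatched nucleotide, can be sketched in Python as follows. This is an illustrative simplification only: it assumes a DNA template given 5′ to 3′, a single-base change, equal flanks on both sides, and ignores melting temperature and other primer design criteria. The function names and the toy template are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def mutagenic_primer(template: str, index: int, new_base: str, flank: int) -> str:
    """Design a primer that anneals to `template` around `index` (0-based)
    and encodes `new_base` at that position instead of the template base."""
    if template[index] == new_base:
        raise ValueError("new base equals the template base")
    if index - flank < 0 or index + flank >= len(template):
        raise ValueError("not enough flanking sequence")
    # Desired sequence as read on the template strand, with the base changed.
    window = template[index - flank:index] + new_base + template[index + 1:index + flank + 1]
    # The primer anneals to the template, so it is the reverse complement.
    return reverse_complement(window)


toy_template = "AAACCCGGGTTT"  # toy sequence for illustration
primer = mutagenic_primer(toy_template, 5, "T", 3)
print(primer)
```

The resulting primer differs from a perfectly matched primer at exactly one position, the site of the intended mutation.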

Mutations may also be introduced by directed evolution processes, such as phage-assisted continuous evolution (PACE) or phage-assisted non-continuous evolution (PANCE). The term “phage-assisted continuous evolution (PACE),” as used herein, refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described, for example, in International PCT Application, PCT/US2009/056194, filed Sep. 8, 2009, published as WO 2010/028347 on Mar. 11, 2010; International PCT Application, PCT/US2011/066747, filed Dec. 22, 2011, published as WO 2012/088381 on Jun. 28, 2012; U.S. Pat. No. 9,023,594, issued May 5, 2015; U.S. Pat. No. 9,771,574, issued Sep. 26, 2017; U.S. Pat. No. 9,394,537, issued Jul. 19, 2016; International PCT Application, PCT/US2015/012022, filed Jan. 20, 2015, published as WO 2015/134121 on Sep. 11, 2015; U.S. Pat. No. 10,179,911, issued Jan. 15, 2019; International PCT Application, PCT/US2016/027795, filed Apr. 15, 2016, published as WO 2016/168631 on Oct. 20, 2016, and International Patent Publication WO 2019/023680, published Jan. 31, 2019, the entire contents of each of which are incorporated herein by reference. Variant Cas13 or RDRP proteins may also be obtained by phage-assisted non-continuous evolution (PANCE), which, as used herein, refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution using serial flask transfers of evolving ‘selection phage’ (SP), which contain a gene of interest to be evolved, across fresh E. coli host cells, thereby allowing genes inside the host E. coli to be held constant while genes contained in the SP continuously evolve. Serial flask transfers have long served as a widely-accessible approach for laboratory evolution of microbes, and, more recently, analogous approaches have been developed for bacteriophage evolution. The PANCE system features lower stringency than the PACE system.

Any of the references noted above are hereby incorporated by reference in their entireties, if not already stated so.

In various embodiments, the RNA prime editor fusion proteins contemplated herein may also include any variants of the above-disclosed sequences having an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the above indicated RNA prime editor fusion sequences.
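
A percent-identity threshold such as those recited above can be checked with a short Python sketch. Definitions of sequence identity vary between alignment tools; this sketch assumes the two sequences have already been aligned to equal length with "-" as the gap character and takes the number of alignment columns as the denominator. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


reference = "MKRTADGSEF"  # toy fragment for illustration
variant = "MKRTADGAEF"    # one substitution relative to the reference
identity = percent_identity(reference, variant)
print(identity, identity >= 90.0)
```

For full-length proteins, the alignment itself would typically be produced by a dedicated alignment program before a calculation of this kind is applied.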

The RPE fusion proteins may comprise various other domains besides the Cas13 domain and the RDRP domains. For example, the RPE fusion proteins may comprise one or more linkers that join the Cas13 domain with the RDRP domain. The linkers may also join other functional domains, such as nuclear localization sequences (NLS) to the RPE fusion proteins or a domain thereof.

Linkers

As defined above, the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease and the catalytic domain of a recombinase. In some embodiments, a linker joins a Cas13 and RDRP. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker may comprise a peptide or a non-peptide moiety. In some embodiments, the linker is 5-200 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.

The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide or based on amino acids. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol moiety (PEG). In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a phenyl ring. The linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.

In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 14), (G)n (SEQ ID NO: 15), (EAAAK)n (SEQ ID NO: 16), (GGS)n (SEQ ID NO: 17), (SGGS)n (SEQ ID NO: 18), (XP)n (SEQ ID NO: 19), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 17), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 20). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 21). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 22). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 18). In other embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 23, 60AA).
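
The repeat-based linker formulas above can be expanded into explicit amino acid sequences with a minimal Python sketch. The repeat units and the range of n (1 to 30) come from the preceding paragraph; the function name `repeat_linker` is hypothetical.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Expand a repeat-unit linker such as (GGS)n into its full sequence."""
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return unit * n


# (GGS)n for the values of n named in the text.
for n in (1, 3, 7):
    print(n, repeat_linker("GGS", n))
```

For example, (GGS)n with n = 3 expands to the nine-residue sequence GGSGGSGGS.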

In certain embodiments, linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., a napRNAbp linked or fused to a RDRP).

NLS

In various embodiments, the RPE fusion proteins may comprise one or more nuclear localization sequences (NLS), which help promote translocation of a protein into the cell nucleus. In certain embodiments, the RPE fusion proteins comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same NLSs, or they can be different NLSs. In addition, the NLSs may be expressed as part of a fusion protein with the other portions of the RPEs. The location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of an RPE (e.g., inserted between the napRNAbp domain (e.g., Cas13) and the RNA-dependent RNA polymerase).

The NLSs may be any known NLS in the art. The NLSs may also be any NLSs for nuclear localization discovered in the future. The NLSs also may be any naturally occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).

The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., International PCT application PCT/EP2000/011690, filed Nov. 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference.

A representative nuclear localization signal is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization signals often comprise proline residues. A variety of nuclear localization signals have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7442-46; Moede et al., (1999) FEBS Lett. 461:229-34, each of which is incorporated herein by reference. Translocation is currently thought to involve nuclear pore proteins. Such sequences are well-known in the art and can include the following examples:

DESCRIPTION
SEQUENCE
SEQ ID NO:

NLS OF SV40 LARGE T-AG
PKKKRKV
SEQ ID NO: 50

NLS
MKRTADGSEFESPKKKRKV
SEQ ID NO: 24

NLS
MDSLLMNRRKFLYQFKNVRWAKGRRETYLC
SEQ ID NO: 25

NLS OF NUCLEOPLASMIN
AVKRPAATKKAGQAKKKKLD
SEQ ID NO: 26

NLS OF EGL-13
MSRRRKANPTKLSENAKKLAKEVEN
SEQ ID NO: 27

NLS OF C-MYC
PAAKRVKLD
SEQ ID NO: 28

NLS OF TUS-PROTEIN
KLKIKRPVK
SEQ ID NO: 29

NLS OF POLYOMA LARGE T-AG
VSRKRPRP
SEQ ID NO: 30

NLS OF HEPATITIS D VIRUS ANTIGEN
EGAPPAKRAR
SEQ ID NO: 31

NLS OF MURINE P53
PPQPKKKPLDGE
SEQ ID NO: 32

NLS OF PE1 AND PE2
SGGSKRTADGSEFEPKKKRKV
SEQ ID NO: 33

The NLS examples above are non-limiting. The RPE fusion proteins may comprise any known NLS sequence, including any of those described in Cokol et al., “Finding nuclear localization signals,” EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., “Mechanisms and Signals for the Nuclear Import of Proteins,” Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference.
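
As an illustration of how the exemplary NLS sequences might be located within a protein sequence, the following Python sketch searches for exact matches to three of the NLSs listed above (SEQ ID NO: 50, 26, and 28) and reports the fraction of lysine and arginine residues in a sequence. Exact string matching is an assumption made for simplicity; it will not detect variant or previously uncharacterized NLSs. The function names are hypothetical.

```python
# A subset of the exemplary NLS sequences listed in this disclosure.
KNOWN_NLS = {
    "SV40 large T-Ag": "PKKKRKV",             # SEQ ID NO: 50
    "nucleoplasmin": "AVKRPAATKKAGQAKKKKLD",  # SEQ ID NO: 26
    "c-Myc": "PAAKRVKLD",                     # SEQ ID NO: 28
}


def find_nls(protein: str) -> list:
    """Return (name, 1-based start position) for each exact NLS match."""
    hits = []
    for name, motif in KNOWN_NLS.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((name, start + 1))
            start = protein.find(motif, start + 1)
    return hits


def basic_fraction(sequence: str) -> float:
    """Fraction of residues that are lysine (K) or arginine (R)."""
    return sum(sequence.count(residue) for residue in "KR") / len(sequence)


print(find_nls("MKRTADGSEFESPKKKRKV"))  # SEQ ID NO: 24 contains the SV40 motif
print(basic_fraction("PKKKRKV"))
```

The basic-residue fraction reflects the observation above that nuclear localization signals are typically rich in lysine and arginine.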

The present disclosure contemplates any suitable means by which to modify an RPE to include one or more NLSs. In one aspect, the RPE-encoding nucleotide sequence may be engineered to express an RPE protein that is translationally fused at its N-terminus or its C-terminus (or both) to one or more NLSs, i.e., to form an RPE-NLS fusion construct. In other embodiments, the RPE-encoding nucleotide sequence may be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded RPE. In addition, the NLSs may include various amino acid linkers or spacer regions encoded between the RPE and the N-terminally, C-terminally, or internally-attached NLS amino acid sequence, e.g., in the central region of the protein. Thus, the present disclosure also provides for nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise an RPE and one or more NLSs.

The RPEs described herein may also comprise nuclear localization signals which are linked to an RPE through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element. The linkers within the contemplated scope of the disclosure are not intended to have any limitations and can be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain) and be joined to the RPE by any suitable strategy that effectuates forming a bond (e.g., covalent linkage, hydrogen bonding) between the RPE and the one or more NLSs.

Methods of Treatment

The instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by RNA prime editing of RNA molecules (e.g., mRNA transcripts comprising said mutations). For example, in some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of the RNA prime editing system described herein that corrects the point mutation or introduces a deactivating mutation into a disease-associated RNA. In some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of the RNA prime editing system described herein that corrects the defective RNA molecule. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated RNA will be known to those of skill in the art, and the disclosure is not limited in this respect.

The instant disclosure provides methods for the treatment of additional diseases or disorders, e.g., diseases or disorders that are associated or caused by a point mutation that can be corrected by RNA prime editing. Some such diseases are described herein, and additional suitable diseases that can be treated with the strategies and fusion proteins provided herein will be apparent to those of skill in the art based on the instant disclosure. Exemplary suitable diseases and disorders are listed below. It will be understood that the numbering of the specific positions or residues in the respective sequences depends on the particular protein and numbering scheme used. Numbering might be different, e.g., in precursors of a mature protein and the mature protein itself, and differences in sequences from species to species may affect numbering. One of skill in the art will be able to identify the respective residue in any homologous protein and in the respective encoding nucleic acid by methods well known in the art, e.g., by sequence alignment and determination of homologous residues. 
Exemplary suitable diseases and disorders include, without limitation: 2-methyl-3-hydroxybutyric aciduria; 3 beta-Hydroxysteroid dehydrogenase deficiency; 3-Methylglutaconic aciduria; 3-Oxo-5 alpha-steroid delta 4-dehydrogenase deficiency; 46,XY sex reversal, types 1, 3, and 5; 5-Oxoprolinase deficiency; 6-pyruvoyl-tetrahydropterin synthase deficiency; Aarskog syndrome; Aase syndrome; Achondrogenesis type 2; Achromatopsia 2 and 7; Acquired long QT syndrome; Acrocallosal syndrome, Schinzel type; Acrocapitofemoral dysplasia; Acrodysostosis 2, with or without hormone resistance; Acroerythrokeratoderma; Acromicric dysplasia; ACTH-independent macronodular adrenal hyperplasia 2; Activated PI3K-delta syndrome; Acute intermittent porphyria; deficiency of Acyl-CoA dehydrogenase family, member 9; Adams-Oliver syndrome 5 and 6; Adenine phosphoribosyltransferase deficiency; Adenylate kinase deficiency; hemolytic anemia due to Adenylosuccinate lyase deficiency; Adolescent nephronophthisis; Renal-hepatic-pancreatic dysplasia; Meckel syndrome type 7; Adrenoleukodystrophy; Adult junctional epidermolysis bullosa; Epidermolysis bullosa, junctional, localisata variant; Adult neuronal ceroid lipofuscinosis; Adult onset ataxia with oculomotor apraxia; ADULT syndrome; Afibrinogenemia and congenital Afibrinogenemia; autosomal recessive Agammaglobulinemia 2; Age-related macular degeneration 3, 6, 11, and 12; Aicardi Goutieres syndromes 1, 4, and 5; Chilblain lupus 1; Alagille syndromes 1 and 2; Alexander disease; Alkaptonuria; Allan-Herndon-Dudley syndrome; Alopecia universalis congenita; Alpers encephalopathy; Alpha-1-antitrypsin deficiency; autosomal dominant, autosomal recessive, and X-linked recessive Alport syndromes; Alzheimer disease, familial, 3, with spastic paraparesis and apraxia; Alzheimer disease, types 1, 3, and 4; hypocalcification type and hypomaturation type, IIA1 Amelogenesis imperfecta; Aminoacylase 1 deficiency; Amish infantile 
epilepsy syndrome; Amyloidogenic transthyretin amyloidosis; Amyloid Cardiomyopathy, Transthyretin-related; Cardiomyopathy; Amyotrophic lateral sclerosis types 1, 6, 15 (with or without frontotemporal dementia), 22 (with or without frontotemporal dementia), and 10; Frontotemporal dementia with TDP43 inclusions, TARDBP-related; Andermann syndrome; Andersen Tawil syndrome; Congenital long QT syndrome; Anemia, nonspherocytic hemolytic, due to G6PD deficiency; Angelman syndrome; Severe neonatal-onset encephalopathy with microcephaly; susceptibility to Autism, X-linked 3; Angiopathy, hereditary, with nephropathy, aneurysms, and muscle cramps; Angiotensin i-converting enzyme, benign serum increase; Aniridia, cerebellar ataxia, and mental retardation; Anonychia; Antithrombin III deficiency; Antley-Bixler syndrome with genital anomalies and disordered steroidogenesis; Aortic aneurysm, familial thoracic 4, 6, and 9; Thoracic aortic aneurysms and aortic dissections; Multisystemic smooth muscle dysfunction syndrome; Moyamoya disease 5; Aplastic anemia; Apparent mineralocorticoid excess; Arginase deficiency; Argininosuccinate lyase deficiency; Aromatase deficiency; Arrhythmogenic right ventricular cardiomyopathy types 5, 8, and 10; Primary familial hypertrophic cardiomyopathy; Arthrogryposis multiplex congenita, distal, X-linked; Arthrogryposis renal dysfunction cholestasis syndrome; Arthrogryposis, renal dysfunction, and cholestasis 2; Asparagine synthetase deficiency; Abnormality of neuronal migration; Ataxia with vitamin E deficiency; Ataxia, sensory, autosomal dominant; Ataxia-telangiectasia syndrome; Hereditary cancer-predisposing syndrome; Atransferrinemia; Atrial fibrillation, familial, 11, 12, 13, and 16; Atrial septal defects 2, 4, and 7 (with or without atrioventricular conduction defects); Atrial standstill 2; Atrioventricular septal defect 4; Atrophia bulborum hereditaria; ATR-X syndrome; Auriculocondylar syndrome 2; Autoimmune disease, multisystem, infantile-onset; 
Autoimmune lymphoproliferative syndrome, type 1a; Autosomal dominant hypohidrotic ectodermal dysplasia; Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions 1 and 3; Autosomal dominant torsion dystonia 4; Autosomal recessive centronuclear myopathy; Autosomal recessive congenital ichthyosis 1, 2, 3, 4A, and 4B; Autosomal recessive cutis laxa type IA and 1B; Autosomal recessive hypohidrotic ectodermal dysplasia syndrome; Ectodermal dysplasia 11b; hypohidrotic/hair/tooth type, autosomal recessive; Autosomal recessive hypophosphatemic bone disease; Axenfeld-Rieger syndrome type 3; Bainbridge-Ropers syndrome; Bannayan-Riley-Ruvalcaba syndrome; PTEN hamartoma tumor syndrome; Baraitser-Winter syndromes 1 and 2; Barakat syndrome; Bardet-Biedl syndromes 1, 11, 16, and 19; Bare lymphocyte syndrome type 2, complementation group E; Bartter syndrome antenatal type 2; Bartter syndrome types 3, 3 with hypocalciuria, and 4; Basal ganglia calcification, idiopathic, 4; Beaded hair; Benign familial hematuria; Benign familial neonatal seizures 1 and 2; Seizures, benign familial neonatal, 1, and/or myokymia; Seizures, Early infantile epileptic encephalopathy 7; Benign familial neonatal-infantile seizures; Benign hereditary chorea; Benign scapuloperoneal muscular dystrophy with cardiomyopathy; Bernard-Soulier syndrome, types A1 and A2 (autosomal dominant); Bestrophinopathy, autosomal recessive; beta Thalassemia; Bethlem myopathy and Bethlem myopathy 2; Bietti crystalline corneoretinal dystrophy; Bile acid synthesis defect, congenital, 2; Biotinidase deficiency; Birk Barel mental retardation dysmorphism syndrome; Blepharophimosis, ptosis, and epicanthus inversus; Bloom syndrome; Borjeson-Forssman-Lehmann syndrome; Boucher Neuhauser syndrome; Brachydactyly types A1 and A2; Brachydactyly with hypertension; Brain small vessel disease with hemorrhage; Branched-chain ketoacid dehydrogenase kinase deficiency; Branchiootic syndromes 2 and 3; Breast cancer, 
early-onset; Breast-ovarian cancer, familial 1, 2, and 4; Brittle cornea syndrome 2; Brody myopathy; Bronchiectasis with or without elevated sweat chloride 3; Brown-Vialetto-Van Laere syndrome and Brown-Vialetto-Van Laere syndrome 2; Brugada syndrome; Brugada syndrome 1; Ventricular fibrillation; Paroxysmal familial ventricular fibrillation; Brugada syndrome and Brugada syndrome 4; Long QT syndrome; Sudden cardiac death; Bull eye macular dystrophy; Stargardt disease 4; Cone-rod dystrophy 12; Bullous ichthyosiform erythroderma; Burn-McKeown syndrome; Candidiasis, familial, 2, 5, 6, and 8; Carbohydrate-deficient glycoprotein syndrome type I and II; Carbonic anhydrase VA deficiency, hyperammonemia due to; Carcinoma of colon; Cardiac arrhythmia; Long QT syndrome, LQT1 subtype; Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency; Cardiofaciocutaneous syndrome; Cardiomyopathy; Danon disease; Hypertrophic cardiomyopathy; Left ventricular noncompaction cardiomyopathy; Carnevale syndrome; Carney complex, type 1; Carnitine acylcarnitine translocase deficiency; Carnitine palmitoyltransferase I, II, II (late onset), and II (infantile) deficiency; Cataract 1, 4, autosomal dominant, autosomal dominant, multiple types, with microcornea, coppock-like, juvenile, with microcornea and glucosuria, and nuclear diffuse nonprogressive; Catecholaminergic polymorphic ventricular tachycardia; Caudal regression syndrome; Cd8 deficiency, familial; Central core disease; Centromeric instability of chromosomes 1, 9, and 16 and immunodeficiency; Cerebellar ataxia infantile with progressive external ophthalmoplegia and Cerebellar ataxia, mental retardation, and dysequilibrium syndrome 2; Cerebral amyloid angiopathy, APP-related; Cerebral autosomal dominant and recessive arteriopathy with subcortical infarcts and leukoencephalopathy; Cerebral cavernous malformations 2; Cerebrooculofacioskeletal syndrome 2; Cerebro-oculo-facio-skeletal syndrome; Cerebroretinal 
microangiopathy with calcifications and cysts; Ceroid lipofuscinosis neuronal 2, 6, 7, and 10; Chédiak-Higashi syndrome, Chediak-Higashi syndrome, adult type; Charcot-Marie-Tooth disease types 1B, 2B2, 2C, 2F, 2I, 2U (axonal), 1C (demyelinating), dominant intermediate C, recessive intermediate A, 2A2, 4C, 4D, 4H, IF, IVF, and X; Scapuloperoneal spinal muscular atrophy; Distal spinal muscular atrophy, congenital nonprogressive; Spinal muscular atrophy, distal, autosomal recessive, 5; CHARGE association; Childhood hypophosphatasia; Adult hypophosphatasia; Cholecystitis; Progressive familial intrahepatic cholestasis 3; Cholestasis, intrahepatic, of pregnancy 3; Cholestanol storage disease; Cholesterol monooxygenase (side-chain cleaving) deficiency; Chondrodysplasia Blomstrand type; Chondrodysplasia punctata 1, X-linked recessive and 2 X-linked dominant; CHOPS syndrome; Chronic granulomatous disease, autosomal recessive cytochrome b-positive, types 1 and 2; Chudley-McCullough syndrome; Ciliary dyskinesia, primary, 7, 11, 15, 20 and 22; Citrullinemia type I; Citrullinemia type I and II; Cleidocranial dysostosis; C-like syndrome; Cockayne syndrome type A; Coenzyme Q10 deficiency, primary 1, 4, and 7; Coffin Siris/Intellectual Disability; Coffin-Lowry syndrome; Cohen syndrome; Cold-induced sweating syndrome 1; COLE-CARPENTER SYNDROME 2; Combined cellular and humoral immune defects with granulomas; Combined D-2- and L-2-hydroxyglutaric aciduria; Combined malonic and methylmalonic aciduria; Combined oxidative phosphorylation deficiencies 1, 3, 4, 12, 15, and 25; Combined partial and complete 17-alpha-hydroxylase/17,20-lyase deficiency; Common variable immunodeficiency 9; Complement component 4, partial deficiency of, due to dysfunctional c1 inhibitor; Complement factor B deficiency; Cone monochromatism; Cone-rod dystrophy 2 and 6; Cone-rod dystrophy amelogenesis imperfecta; Congenital adrenal hyperplasia and Congenital adrenal hypoplasia, X-linked; Congenital 
amegakaryocytic thrombocytopenia; Congenital aniridia; Congenital central hypoventilation; Hirschsprung disease 3; Congenital contractural arachnodactyly; Congenital contractures of the limbs and face, hypotonia, and developmental delay; Congenital disorder of glycosylation types 1B, 1D, 1G, 1H, 1J, 1K, 1N, 1P, 2C, 2J, 2K, IIm; Congenital dyserythropoietic anemia, type I and II; Congenital ectodermal dysplasia of face; Congenital erythropoietic porphyria; Congenital generalized lipodystrophy type 2; Congenital heart disease, multiple types, 2; Congenital heart disease; Interrupted aortic arch; Congenital lipomatous overgrowth, vascular malformations, and epidermal nevi; Non-small cell lung cancer; Neoplasm of ovary; Cardiac conduction defect, nonspecific; Congenital microvillous atrophy; Congenital muscular dystrophy; Congenital muscular dystrophy due to partial LAMA2 deficiency; Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, types A2, A7, A8, A11, and A14; Congenital muscular dystrophy-dystroglycanopathy with mental retardation, types B2, B3, B5, and B15; Congenital muscular dystrophy-dystroglycanopathy without mental retardation, type B5; Congenital muscular hypertrophy-cerebral syndrome; Congenital myasthenic syndrome, acetazolamide-responsive; Congenital myopathy with fiber type disproportion; Congenital ocular coloboma; Congenital stationary night blindness, type 1A, 1B, 1C, 1E, 1F, and 2A; Coproporphyria; Cornea plana 2; Corneal dystrophy, Fuchs endothelial, 4; Corneal endothelial dystrophy type 2; Corneal fragility keratoglobus, blue sclerae and joint hypermobility; Cornelia de Lange syndromes 1 and 5; Coronary artery disease, autosomal dominant 2; Coronary heart disease; Hyperalphalipoproteinemia 2; Cortical dysplasia, complex, with other brain malformations 5 and 6; Cortical malformations, occipital; Corticosteroid-binding globulin deficiency; Corticosterone methyloxidase type 2 deficiency; Costello syndrome; Cowden syndrome 
1; Coxa plana; Craniodiaphyseal dysplasia, autosomal dominant; Craniosynostosis 1 and 4; Craniosynostosis and dental anomalies; Creatine deficiency, X-linked; Crouzon syndrome; Cryptophthalmos syndrome; Cryptorchidism, unilateral or bilateral; Cushing symphalangism; Cutaneous malignant melanoma 1; Cutis laxa with osteodystrophy and with severe pulmonary, gastrointestinal, and urinary abnormalities; Cyanosis, transient neonatal and atypical nephropathic; Cystic fibrosis; Cystinuria; Cytochrome c oxidase i deficiency; Cytochrome-c oxidase deficiency; D-2-hydroxyglutaric aciduria 2; Darier disease, segmental; Deafness with labyrinthine aplasia microtia and microdontia (LAMM); Deafness, autosomal dominant 3a, 4, 12, 13, 15, autosomal dominant nonsyndromic sensorineural 17, 20, and 65; Deafness, autosomal recessive 1A, 2, 3, 6, 8, 9, 12, 15, 16, 18b, 22, 28, 31, 44, 49, 63, 77, 86, and 89; Deafness, cochlear, with myopia and intellectual impairment, without vestibular involvement, autosomal dominant, X-linked 2; Deficiency of 2-methylbutyryl-CoA dehydrogenase; Deficiency of 3-hydroxyacyl-CoA dehydrogenase; Deficiency of alpha-mannosidase; Deficiency of aromatic-L-amino-acid decarboxylase; Deficiency of bisphosphoglycerate mutase; Deficiency of butyryl-CoA dehydrogenase; Deficiency of ferroxidase; Deficiency of galactokinase; Deficiency of guanidinoacetate methyltransferase; Deficiency of hyaluronoglucosaminidase; Deficiency of ribose-5-phosphate isomerase; Deficiency of steroid 11-beta-monooxygenase; Deficiency of UDPglucose-hexose-1-phosphate uridylyltransferase; Deficiency of xanthine oxidase; Dejerine-Sottas disease; Charcot-Marie-Tooth disease, types ID and IVF; Dejerine-Sottas syndrome, autosomal dominant; Dendritic cell, monocyte, B lymphocyte, and natural killer lymphocyte deficiency; Desbuquois dysplasia 2; Desbuquois syndrome; DFNA 2 Nonsyndromic Hearing Loss; Diabetes mellitus and insipidus with optic atrophy and deafness; Diabetes mellitus, type 2, and 
insulin-dependent, 20; Diamond-Blackfan anemia 1, 5, 8, and 10; Diarrhea 3 (secretory sodium, congenital, syndromic) and 5 (with tufting enteropathy, congenital); Dicarboxylic aminoaciduria; Diffuse palmoplantar keratoderma, Bothnian type; Digitorenocerebral syndrome; Dihydropteridine reductase deficiency; Dilated cardiomyopathy 1A, 1AA, 1C, 1G, 1BB, 1DD, 1FF, 1HH, 1I, 1KK, 1N, 1S, 1Y, and 3B; Left ventricular noncompaction 3; Disordered steroidogenesis due to cytochrome p450 oxidoreductase deficiency; Distal arthrogryposis type 2B; Distal hereditary motor neuronopathy type 2B; Distal myopathy Markesbery-Griggs type; Distal spinal muscular atrophy, X-linked 3; Distichiasis-lymphedema syndrome; Dominant dystrophic epidermolysis bullosa with absence of skin; Dominant hereditary optic atrophy; Donnai Barrow syndrome; Dopamine beta hydroxylase deficiency; Dopamine receptor d2, reduced brain density of; Dowling-Degos disease 4; Doyne honeycomb retinal dystrophy; Malattia leventinese; Duane syndrome type 2; Dubin-Johnson syndrome; Duchenne muscular dystrophy; Becker muscular dystrophy; Dysfibrinogenemia; Dyskeratosis congenita autosomal dominant and autosomal dominant, 3; Dyskeratosis congenita, autosomal recessive, 1, 3, 4, and 5; Dyskeratosis congenita X-linked; Dyskinesia, familial, with facial myokymia; Dysplasminogenemia; Dystonia 2 (torsion, autosomal recessive), 3 (torsion, X-linked), 5 (Dopa-responsive type), 10, 12, 16, 25, 26 (Myoclonic); Seizures, benign familial infantile, 2; Early infantile epileptic encephalopathy 2, 4, 7, 9, 10, 11, 13, and 14; Atypical Rett syndrome; Early T cell progenitor acute lymphoblastic leukemia; Ectodermal dysplasia skin fragility syndrome; Ectodermal dysplasia-syndactyly syndrome 1; Ectopia lentis, isolated autosomal recessive and dominant; Ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome 3; Ehlers-Danlos syndrome type 7 (autosomal recessive), classic type, type 2 (progeroid), hydroxylysine-deficient, type 4, 
type 4 variant, and due to tenascin-X deficiency; Eichsfeld type congenital muscular dystrophy; Endocrine-cerebroosteodysplasia; Enhanced s-cone syndrome; Enlarged vestibular aqueduct syndrome; Enterokinase deficiency; Epidermodysplasia verruciformis; Epidermolysis bullosa simplex and limb girdle muscular dystrophy, simplex with mottled pigmentation, simplex with pyloric atresia, simplex, autosomal recessive, and with pyloric atresia; Epidermolytic palmoplantar keratoderma; Familial febrile seizures 8; Epilepsy, childhood absence 2, 12 (idiopathic generalized, susceptibility to) 5 (nocturnal frontal lobe), nocturnal frontal lobe type 1, partial, with variable foci, progressive myoclonic 3, and X-linked, with variable learning disabilities and behavior disorders; Epileptic encephalopathy, childhood-onset, early infantile, 1, 19, 23, 25, 30, and 32; Epiphyseal dysplasia, multiple, with myopia and conductive deafness; Episodic ataxia type 2; Episodic pain syndrome, familial, 3; Epstein syndrome; Fechtner syndrome; Erythropoietic protoporphyria; Estrogen resistance; Exudative vitreoretinopathy 6; Fabry disease and Fabry disease, cardiac variant; Factor H, VII, X, v and factor viii, combined deficiency of 2, xiii, a subunit, deficiency; Familial adenomatous polyposis 1 and 3; Familial amyloid nephropathy with urticaria and deafness; Familial cold urticaria; Familial aplasia of the vermis; Familial benign pemphigus; Familial cancer of breast; Breast cancer, susceptibility to; Osteosarcoma; Pancreatic cancer 3; Familial cardiomyopathy; Familial cold autoinflammatory syndrome 2; Familial colorectal cancer; Familial exudative vitreoretinopathy, X-linked; Familial hemiplegic migraine types 1 and 2; Familial hypercholesterolemia; Familial hypertrophic cardiomyopathy 1, 2, 3, 4, 7, 10, 23 and 24; Familial hypokalemia-hypomagnesemia; Familial hypoplastic, glomerulocystic kidney; Familial infantile myasthenia; Familial juvenile gout; Familial Mediterranean fever and Familial 
Mediterranean fever, autosomal dominant; Familial porencephaly; Familial porphyria cutanea tarda; Familial pulmonary capillary hemangiomatosis; Familial renal glucosuria; Familial renal hypouricemia; Familial restrictive cardiomyopathy 1; Familial type 1 and 3 hyperlipoproteinemia; Fanconi anemia, complementation group E, I, N, and O; Fanconi-Bickel syndrome; Favism, susceptibility to; Febrile seizures, familial, 11; Feingold syndrome 1; Fetal hemoglobin quantitative trait locus 1; FG syndrome and FG syndrome 4; Fibrosis of extraocular muscles, congenital, 1, 2, 3a (with or without extraocular involvement), 3b; Fish-eye disease; Fleck corneal dystrophy; Floating-Harbor syndrome; Focal epilepsy with speech disorder with or without mental retardation; Focal segmental glomerulosclerosis 5; Forebrain defects; Frank Ter Haar syndrome; Borrone Di Rocco Crovato syndrome; Frasier syndrome; Wilms tumor 1; Freeman-Sheldon syndrome; Frontometaphyseal dysplasia 1 and 3; Frontotemporal dementia; Frontotemporal dementia and/or amyotrophic lateral sclerosis 3 and 4; Frontotemporal Dementia Chromosome 3-Linked and Frontotemporal dementia ubiquitin-positive; Fructose-biphosphatase deficiency; Fuhrmann syndrome; Gamma-aminobutyric acid transaminase deficiency; Gamstorp-Wohlfart syndrome; Gaucher disease type 1 and Subacute neuronopathic; Gaze palsy, familial horizontal, with progressive scoliosis; Generalized dominant dystrophic epidermolysis bullosa; Generalized epilepsy with febrile seizures plus 3, type 1, type 2; Epileptic encephalopathy Lennox-Gastaut type; Giant axonal neuropathy; Glanzmann thrombasthenia; Glaucoma 1, open angle, e, F, and G; Glaucoma 3, primary congenital, d; Glaucoma, congenital and Glaucoma, congenital, Coloboma; Glaucoma, primary open angle, juvenile-onset; Glioma susceptibility 1; Glucose transporter type 1 deficiency syndrome; Glucose-6-phosphate transport defect; GLUT1 deficiency syndrome 2; Epilepsy, idiopathic generalized, susceptibility to, 12; 
Glutamate formiminotransferase deficiency; Glutaric acidemia IIA and IIB; Glutaric aciduria, type 1; Gluthathione synthetase deficiency; Glycogen storage disease 0 (muscle), II (adult form), IXa2, IXc, type 1A; type II, type IV, IV (combined hepatic and myopathic), type V, and type VI; Goldmann-Favre syndrome; Gordon syndrome; Gorlin syndrome; Holoprosencephaly sequence; Holoprosencephaly 7; Granulomatous disease, chronic, X-linked, variant; Granulosa cell tumor of the ovary; Gray platelet syndrome; Griscelli syndrome type 3; Groenouw corneal dystrophy type I; Growth and mental retardation, mandibulofacial dysostosis, microcephaly, and cleft palate; Growth hormone deficiency with pituitary anomalies; Growth hormone insensitivity with immunodeficiency; GTP cyclohydrolase I deficiency; Hajdu-Cheney syndrome; Hand foot uterus syndrome; Hearing impairment; Hemangioma, capillary infantile; Hematologic neoplasm; Hemochromatosis type 1, 2B, and 3; Microvascular complications of diabetes 7; Transferrin serum level quantitative trait locus 2; Hemoglobin H disease, nondeletional; Hemolytic anemia, nonspherocytic, due to glucose phosphate isomerase deficiency; Hemophagocytic lymphohistiocytosis, familial, 2; Hemophagocytic lymphohistiocytosis, familial, 3; Heparin cofactor II deficiency; Hereditary acrodermatitis enteropathica; Hereditary breast and ovarian cancer syndrome; Ataxia-telangiectasia-like disorder; Hereditary diffuse gastric cancer; Hereditary diffuse leukoencephalopathy with spheroids; Hereditary factors II, IX, VIII deficiency disease; Hereditary hemorrhagic telangiectasia type 2; Hereditary insensitivity to pain with anhidrosis; Hereditary lymphedema type I; Hereditary motor and sensory neuropathy with optic atrophy; Hereditary myopathy with early respiratory failure; Hereditary neuralgic amyotrophy; Hereditary Nonpolyposis Colorectal Neoplasms; Lynch syndrome I and II; Hereditary pancreatitis; Pancreatitis, chronic, susceptibility to; Hereditary sensory and 
autonomic neuropathy type IIB and IIA; Hereditary sideroblastic anemia; Hermansky-Pudlak syndrome 1, 3, 4, and 6; Heterotaxy, visceral, 2, 4, and 6, autosomal; Heterotaxy, visceral, X-linked; Heterotopia; Histiocytic medullary reticulosis; Histiocytosis-lymphadenopathy plus syndrome; Holocarboxylase synthetase deficiency; Holoprosencephaly 2, 3, 7, and 9; Holt-Oram syndrome; Homocysteinemia due to MTHFR deficiency, CBS deficiency, and Homocystinuria, pyridoxine-responsive; Homocystinuria-Megaloblastic anemia due to defect in cobalamin metabolism, cblE complementation type; Howel-Evans syndrome; Hurler syndrome; Hutchinson-Gilford syndrome; Hydrocephalus; Hyperammonemia, type III; Hypercholesterolaemia and Hypercholesterolemia, autosomal recessive; Hyperekplexia 2 and Hyperekplexia hereditary; Hyperferritinemia cataract syndrome; Hyperglycinuria; Hyperimmunoglobulin D with periodic fever; Mevalonic aciduria; Hyperimmunoglobulin E syndrome; Hyperinsulinemic hypoglycemia familial 3, 4, and 5; Hyperinsulinism-hyperammonemia syndrome; Hyperlysinemia; Hypermanganesemia with dystonia, polycythemia and cirrhosis; Hyperornithinemia-hyperammonemia-homocitrullinuria syndrome; Hyperparathyroidism 1 and 2; Hyperparathyroidism, neonatal severe; Hyperphenylalaninemia, bh4-deficient, a, due to partial pts deficiency, BH4-deficient, D, and non-pku; Hyperphosphatasia with mental retardation syndrome 2, 3, and 4; Hypertrichotic osteochondrodysplasia; Hypobetalipoproteinemia, familial, associated with apob32; Hypocalcemia, autosomal dominant 1; Hypocalciuric hypercalcemia, familial, types 1 and 3; Hypochondrogenesis; Hypochromic microcytic anemia with iron overload; Hypoglycemia with deficiency of glycogen synthetase in the liver; Hypogonadotropic hypogonadism 11 with or without anosmia; Hypohidrotic ectodermal dysplasia with immune deficiency; Hypohidrotic X-linked ectodermal dysplasia; Hypokalemic periodic paralysis 1 and 2; Hypomagnesemia 1, intestinal; Hypomagnesemia, seizures, 
and mental retardation; Hypomyelinating leukodystrophy 7; Hypoplastic left heart syndrome; Atrioventricular septal defect and common atrioventricular junction; Hypospadias 1 and 2, X-linked; Hypothyroidism, congenital, nongoitrous, 1; Hypotrichosis 8 and 12; Hypotrichosis-lymphedema-telangiectasia syndrome; I blood group system; Ichthyosis bullosa of Siemens; Ichthyosis exfoliativa; Ichthyosis prematurity syndrome; Idiopathic basal ganglia calcification 5; Idiopathic fibrosing alveolitis, chronic form; Dyskeratosis congenita, autosomal dominant, 2 and 5; Idiopathic hypercalcemia of infancy; Immune dysfunction with T-cell inactivation due to calcium entry defect 2; Immunodeficiency 15, 16, 19, 30, 31C, 38, 40, 8, due to defect in cd3-zeta, with hyper IgM type 1 and 2, and X-Linked, with magnesium defect, Epstein-Barr virus infection, and neoplasia; Immunodeficiency-centromeric instability-facial anomalies syndrome 2; Inclusion body myopathy 2 and 3; Nonaka myopathy; Infantile convulsions and paroxysmal choreoathetosis, familial; Infantile cortical hyperostosis; Infantile GM1 gangliosidosis; Infantile hypophosphatasia; Infantile nephronophthisis; Infantile nystagmus, X-linked; Infantile Parkinsonism-dystonia; Infertility associated with multi-tailed spermatozoa and excessive DNA; Insulin resistance; Insulin-resistant diabetes mellitus and acanthosis nigricans; Insulin-dependent diabetes mellitus secretory diarrhea syndrome; Interstitial nephritis, karyomegalic; Intrauterine growth retardation, metaphyseal dysplasia, adrenal hypoplasia congenita, and genital anomalies; Iodotyrosyl coupling defect; IRAK4 deficiency; Iridogoniodysgenesis dominant type and type 1; Iron accumulation in brain; Ischiopatellar dysplasia; Islet cell hyperplasia; Isolated 17,20-lyase deficiency; Isolated lutropin deficiency; Isovaleryl-CoA dehydrogenase deficiency; Jankovic Rivera syndrome; Jervell and Lange-Nielsen syndrome 2; Joubert syndrome 1, 6, 7, 9/15 (digenic), 14, 16, and 17, and 
Orofaciodigital syndrome xiv; Junctional epidermolysis bullosa gravis of Herlitz; Juvenile GM1 gangliosidosis; Juvenile polyposis syndrome; Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome; Juvenile retinoschisis; Kabuki make-up syndrome; Kallmann syndrome 1, 2, and 6; Delayed puberty; Kanzaki disease; Karak syndrome; Kartagener syndrome; Kenny-Caffey syndrome type 2; Keppen-Lubinsky syndrome; Keratoconus 1; Keratosis follicularis; Keratosis palmoplantaris striata 1; Kindler syndrome; L-2-hydroxyglutaric aciduria; Larsen syndrome, dominant type; Lattice corneal dystrophy Type III; Leber amaurosis; Zellweger syndrome; Peroxisome biogenesis disorders; Zellweger syndrome spectrum; Leber congenital amaurosis 11, 12, 13, 16, 4, 7, and 9; Leber optic atrophy; Aminoglycoside-induced deafness; Deafness, nonsyndromic sensorineural, mitochondrial; Left ventricular noncompaction 5; Left-right axis malformations; Leigh disease; Mitochondrial short-chain Enoyl-CoA Hydratase 1 deficiency; Leigh syndrome due to mitochondrial complex I deficiency; Leiner disease; Leri Weill dyschondrosteosis; Lethal congenital contracture syndrome 6; Leukocyte adhesion deficiency type I and III; Leukodystrophy, Hypomyelinating, 11 and 6; Leukoencephalopathy with ataxia, with Brainstem and Spinal Cord Involvement and Lactate Elevation, with vanishing white matter, and progressive, with ovarian failure; Leukonychia totalis; Lewy body dementia; Lichtenstein-Knorr Syndrome; Li-Fraumeni syndrome 1; Lig4 syndrome; Limb-girdle muscular dystrophy, type 1B, 2A, 2B, 2D, C1, C5, C9, C14; Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A14 and B14; Lipase deficiency combined; Lipid proteinosis; Lipodystrophy, familial partial, type 2 and 3; Lissencephaly 1, 2 (X-linked), 3, 6 (with microcephaly), X-linked; Subcortical laminar heterotopia, X-linked; Liver failure acute infantile; Loeys-Dietz syndrome 1, 2, 3; Long QT syndrome 1, 2, 2/9, 2/5, (digenic), 3, 
5 and 5, acquired, susceptibility to; Lung cancer; Lymphedema, hereditary, id; Lymphedema, primary, with myelodysplasia; Lymphoproliferative syndrome 1, 1 (X-linked), and 2; Lysosomal acid lipase deficiency; Macrocephaly, macrosomia, facial dysmorphism syndrome; Macular dystrophy, vitelliform, adult-onset; Malignant hyperthermia susceptibility type 1; Malignant lymphoma, non-Hodgkin; Malignant melanoma; Malignant tumor of prostate; Mandibuloacral dysostosis; Mandibuloacral dysplasia with type A or B lipodystrophy, atypical; Mandibulofacial dysostosis, Treacher Collins type, autosomal recessive; Mannose-binding protein deficiency; Maple syrup urine disease type 1A and type 3; Marden Walker like syndrome; Marfan syndrome; Marinesco-Sjögren syndrome; Martsolf syndrome; Maturity-onset diabetes of the young, type 1, type 2, type 11, type 3, and type 9; May-Hegglin anomaly; MYH9 related disorders; Sebastian syndrome; McCune-Albright syndrome; Somatotroph adenoma; Sex cord-stromal tumor; Cushing syndrome; McKusick Kaufman syndrome; McLeod neuroacanthocytosis syndrome; Meckel-Gruber syndrome; Medium-chain acyl-coenzyme A dehydrogenase deficiency; Medulloblastoma; Megalencephalic leukoencephalopathy with subcortical cysts 1 and 2a; Megalencephaly cutis marmorata telangiectatica congenita; PIK3CA Related Overgrowth Spectrum; Megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome 2; Megaloblastic anemia, thiamine-responsive, with diabetes mellitus and sensorineural deafness; Meier-Gorlin syndromes 1 and 4; Melnick-Needles syndrome; Meningioma; Mental retardation, X-linked, 3, 21, 30, and 72; Mental retardation and microcephaly with pontine and cerebellar hypoplasia; Mental retardation X-linked syndromic 5; Mental retardation, anterior maxillary protrusion, and strabismus; Mental retardation, autosomal dominant 12, 13, 15, 24, 3, 30, 4, 5, 6, and 9; Mental retardation, autosomal recessive 15, 44, 46, and 5; Mental retardation, stereotypic movements, epilepsy, 
and/or cerebral malformations; Mental retardation, syndromic, Claes-Jensen type, X-linked; Mental retardation, X-linked, nonspecific, syndromic, Hedera type, and syndromic, wu type; Merosin deficient congenital muscular dystrophy; Metachromatic leukodystrophy juvenile, late infantile, and adult types; Metachromatic leukodystrophy; Metatrophic dysplasia; Methemoglobinemia types I and 2; Methionine adenosyltransferase deficiency, autosomal dominant; Methylmalonic acidemia with homocystinuria; Methylmalonic aciduria cblB type; Methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency; METHYLMALONIC ACIDURIA, mut(0) TYPE; Microcephalic osteodysplastic primordial dwarfism type 2; Microcephaly with or without chorioretinopathy, lymphedema, or mental retardation; Microcephaly, hiatal hernia and nephrotic syndrome; Microcephaly; Hypoplasia of the corpus callosum; Spastic paraplegia 50, autosomal recessive; Global developmental delay; CNS hypomyelination; Brain atrophy; Microcephaly, normal intelligence and immunodeficiency; Microcephaly-capillary malformation syndrome; Microcytic anemia; Microphthalmia syndromic 5, 7, and 9; Microphthalmia, isolated 3, 5, 6, 8, and with coloboma 6; Microspherophakia; Migraine, familial basilar; Miller syndrome; Minicore myopathy with external ophthalmoplegia; Myopathy, congenital with cores; Mitchell-Riley syndrome; mitochondrial 3-hydroxy-3-methylglutaryl-CoA synthase deficiency; Mitochondrial complex I, II, III, III (nuclear type 2, 4, or 8) deficiency; Mitochondrial DNA depletion syndrome 11, 12 (cardiomyopathic type), 2, 4B (MNGIE type), 8B (MNGIE type); Mitochondrial DNA-depletion syndrome 3 and 7, hepatocerebral types, and 13 (encephalomyopathic type); Mitochondrial phosphate carrier and pyruvate carrier deficiency; Mitochondrial trifunctional protein deficiency; Long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency; Miyoshi muscular dystrophy 1; Myopathy, distal, with anterior tibial onset; Mohr-Tranebjaerg syndrome; 
Molybdenum cofactor deficiency, complementation group A; Mowat-Wilson syndrome; Mucolipidosis III Gamma; Mucopolysaccharidosis type VI, type VI (severe), and type VII; Mucopolysaccharidosis, MPS-I-H/S, MPS-II, MPS-III-A, MPS-III-B, MPS-III-C, MPS-IV-A, MPS-IV-B; Retinitis Pigmentosa 73; Gangliosidosis GM1 type1 (with cardiac involvement) 3; Multicentric osteolysis nephropathy; Multicentric osteolysis, nodulosis and arthropathy; Multiple congenital anomalies; Atrial septal defect 2; Multiple congenital anomalies-hypotonia-seizures syndrome 3; Multiple Cutaneous and Mucosal Venous Malformations; Multiple endocrine neoplasia, types 1 and 4; Multiple epiphyseal dysplasia 5 or Dominant; Multiple gastrointestinal atresias; Multiple pterygium syndrome Escobar type; Multiple sulfatase deficiency; Multiple synostoses syndrome 3; Muscle AMP guanine oxidase deficiency; Muscle eye brain disease; Muscular dystrophy, congenital, megaconial type; Myasthenia, familial infantile, 1; Myasthenic Syndrome, Congenital, 11, associated with acetylcholine receptor deficiency; Myasthenic Syndrome, Congenital, 17, 2A (slow-channel), 4B (fast-channel), and without tubular aggregates; Myeloperoxidase deficiency; MYH-associated polyposis; Endometrial carcinoma; Myocardial infarction 1; Myoclonic dystonia; Myoclonic-Atonic Epilepsy; Myoclonus with epilepsy with ragged red fibers; Myofibrillar myopathy 1 and ZASP-related; Myoglobinuria, acute recurrent, autosomal recessive; Myoneural gastrointestinal encephalopathy syndrome; Cerebellar ataxia infantile with progressive external ophthalmoplegia; Mitochondrial DNA depletion syndrome 4B, MNGIE type; Myopathy, centronuclear, 1, congenital, with excess of muscle spindles, distal, 1, lactic acidosis, and sideroblastic anemia 1, mitochondrial progressive with congenital cataract, hearing loss, and developmental delay, and tubular aggregate, 2; Myopia 6; Myosclerosis, autosomal recessive; Myotonia congenita; Congenital myotonia, autosomal dominant and 
recessive forms; Nail-patella syndrome; Nance-Horan syndrome; Nanophthalmos 2; Navajo neurohepatopathy; Nemaline myopathy 3 and 9; Neonatal hypotonia; Intellectual disability; Seizures; Delayed speech and language development; Mental retardation, autosomal dominant 31; Neonatal intrahepatic cholestasis caused by citrin deficiency; Nephrogenic diabetes insipidus, Nephrogenic diabetes insipidus, X-linked; Nephrolithiasis/osteoporosis, hypophosphatemic, 2; Nephronophthisis 13, 15 and 4; Infertility; Cerebello-oculo-renal syndrome (nephronophthisis, oculomotor apraxia and cerebellar abnormalities); Nephrotic syndrome, type 3, type 5, with or without ocular abnormalities, type 7, and type 9; Nestor-Guillermo progeria syndrome; Neu-Laxova syndrome 1; Neurodegeneration with brain iron accumulation 4 and 6; Neuroferritinopathy; Neurofibromatosis, type 1 and type 2; Neurofibrosarcoma; Neurohypophyseal diabetes insipidus; Neuropathy, Hereditary Sensory, Type IC; Neutral 1 amino acid transport defect; Neutral lipid storage disease with myopathy; Neutrophil immunodeficiency syndrome; Nicolaides-Baraitser syndrome; Niemann-Pick disease type C1, C2, type A, and type C1, adult form; Non-ketotic hyperglycinemia; Noonan syndrome 1 and 4, LEOPARD syndrome 1; Noonan syndrome-like disorder with or without juvenile myelomonocytic leukemia; Normokalemic periodic paralysis, potassium-sensitive; Norum disease; Epilepsy, Hearing Loss, And Mental Retardation Syndrome; Mental Retardation, X-Linked 102 and syndromic 13; Obesity; Ocular albinism, type I; Oculocutaneous albinism type 1B, type 3, and type 4; Oculodentodigital dysplasia; Odontohypophosphatasia; Odontotrichomelic syndrome; Oguchi disease; Oligodontia-colorectal cancer syndrome; Opitz G/BBB syndrome; Optic atrophy 9; Oral-facial-digital syndrome; Ornithine aminotransferase deficiency; Orofacial cleft 11 and 7, Cleft lip/palate-ectodermal dysplasia syndrome; Orstavik Lindemann Solberg syndrome; Osteoarthritis with mild 
chondrodysplasia; Osteochondritis dissecans; Osteogenesis imperfecta type 12, type 5, type 7, type 8, type I, type III, with normal sclerae, dominant form, recessive perinatal lethal; Osteopathia striata with cranial sclerosis; Osteopetrosis autosomal dominant type 1 and 2, recessive 4, recessive 1, recessive 6; Osteoporosis with pseudoglioma; Oto-palato-digital syndrome, types I and II; Ovarian dysgenesis 1; Ovarioleukodystrophy; Pachyonychia congenita 4 and type 2; Paget disease of bone, familial; Pallister-Hall syndrome; Palmoplantar keratoderma, nonepidermolytic, focal or diffuse; Pancreatic agenesis and congenital heart disease; Papillon-Lefèvre syndrome; Paragangliomas 3; Paramyotonia congenita of von Eulenburg; Parathyroid carcinoma; Parkinson disease 14, 15, 19 (juvenile-onset), 2, 20 (early-onset), 6 (autosomal recessive early-onset), and 9; Partial albinism; Partial hypoxanthine-guanine phosphoribosyltransferase deficiency; Patterned dystrophy of retinal pigment epithelium; PC-K6a; Pelizaeus-Merzbacher disease; Pendred syndrome; Peripheral demyelinating neuropathy, central dysmyelination; Hirschsprung disease; Permanent neonatal diabetes mellitus; Diabetes mellitus, permanent neonatal, with neurologic features; Neonatal insulin-dependent diabetes mellitus; Maturity-onset diabetes of the young, type 2; Peroxisome biogenesis disorder 14B, 2A, 4A, 5B, 6A, 7A, and 7B; Perrault syndrome 4; Perry syndrome; Persistent hyperinsulinemic hypoglycemia of infancy; familial hyperinsulinism; Phenotypes; Phenylketonuria; Pheochromocytoma; Hereditary Paraganglioma-Pheochromocytoma Syndromes; Paragangliomas 1; Carcinoid tumor of intestine; Cowden syndrome 3; Phosphoglycerate dehydrogenase deficiency; Phosphoglycerate kinase 1 deficiency; Photosensitive trichothiodystrophy; Phytanic acid storage disease; Pick disease; Pierson syndrome; Pigmentary retinal dystrophy; Pigmented nodular adrenocortical disease, primary, 1; Pilomatrixoma; Pitt-Hopkins syndrome; Pituitary 
dependent hypercortisolism; Pituitary hormone deficiency, combined 1, 2, 3, and 4; Plasminogen activator inhibitor type 1 deficiency; Plasminogen deficiency, type I; Platelet-type bleeding disorder 15 and 8; Poikiloderma, hereditary fibrosing, with tendon contractures, myopathy, and pulmonary fibrosis; Polycystic kidney disease 2, adult type, and infantile type; Polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy; Polyglucosan body myopathy 1 with or without immunodeficiency; Polymicrogyria, asymmetric, bilateral frontoparietal; Polyneuropathy, hearing loss, ataxia, retinitis pigmentosa, and cataract; Pontocerebellar hypoplasia type 4; Popliteal pterygium syndrome; Porencephaly 2; Porokeratosis 8, disseminated superficial actinic type; Porphobilinogen synthase deficiency; Porphyria cutanea tarda; Posterior column ataxia with retinitis pigmentosa; Posterior polar cataract type 2; Prader-Willi-like syndrome; Premature ovarian failure 4, 5, 7, and 9; Primary autosomal recessive microcephaly 10, 2, 3, and 5; Primary ciliary dyskinesia 24; Primary dilated cardiomyopathy; Left ventricular noncompaction 6; 4, Left ventricular noncompaction 10; Paroxysmal atrial fibrillation; Primary hyperoxaluria, type I, type II, and type III; Primary hypertrophic osteoarthropathy, autosomal recessive 2; Primary hypomagnesemia; Primary open angle glaucoma juvenile onset 1; Primary pulmonary hypertension; Primrose syndrome; Progressive familial heart block type 1B; Progressive familial intrahepatic cholestasis 2 and 3; Progressive intrahepatic cholestasis; Progressive myoclonus epilepsy with ataxia; Progressive pseudorheumatoid dysplasia; Progressive sclerosing poliodystrophy; Prolidase deficiency; Proline dehydrogenase deficiency; Schizophrenia 4; Properdin deficiency, X-linked; Propionic acidemia; Proprotein convertase 1/3 deficiency; Prostate cancer, hereditary, 2; Protan defect; Proteinuria; Finnish congenital nephrotic syndrome; Proteus syndrome; Breast 
adenocarcinoma; Pseudoachondroplastic spondyloepiphyseal dysplasia syndrome; Pseudohypoaldosteronism type 1 autosomal dominant and recessive and type 2; Pseudohypoparathyroidism type 1A, Pseudopseudohypoparathyroidism; Pseudoneonatal adrenoleukodystrophy; Pseudoprimary hyperaldosteronism; Pseudoxanthoma elasticum; Generalized arterial calcification of infancy 2; Pseudoxanthoma elasticum-like disorder with multiple coagulation factor deficiency; Psoriasis susceptibility 2; PTEN hamartoma tumor syndrome; Pulmonary arterial hypertension related to hereditary hemorrhagic telangiectasia; Pulmonary Fibrosis And/Or Bone Marrow Failure, Telomere-Related, 1 and 3; Pulmonary hypertension, primary, 1, with hereditary hemorrhagic telangiectasia; Purine-nucleoside phosphorylase deficiency; Pyruvate carboxylase deficiency; Pyruvate dehydrogenase E1-alpha deficiency; Pyruvate kinase deficiency of red cells; Raine syndrome; Rasopathy; Recessive dystrophic epidermolysis bullosa; Nail disorder, nonsyndromic congenital, 8; Reifenstein syndrome; Renal adysplasia; Renal carnitine transport defect; Renal coloboma syndrome; Renal dysplasia; Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia; Renal tubular acidosis, distal, autosomal recessive, with late-onset sensorineural hearing loss, or with hemolytic anemia; Renal tubular acidosis, proximal, with ocular abnormalities and mental retardation; Retinal cone dystrophy 3B; Retinitis pigmentosa; Retinitis pigmentosa 10, 11, 12, 14, 15, 17, and 19; Retinitis pigmentosa 2, 20, 25, 35, 36, 38, 39, 4, 40, 43, 45, 48, 66, 7, 70, 72; Retinoblastoma; Rett disorder; Rhabdoid tumor predisposition syndrome 2; Rhegmatogenous retinal detachment, autosomal dominant; Rhizomelic chondrodysplasia punctata type 2 and type 3; Roberts-SC phocomelia syndrome; Robinow Sorauf syndrome; Robinow syndrome, autosomal recessive, autosomal recessive, with brachy-syn-polydactyly; Rothmund-Thomson syndrome; Rapadilino syndrome; 
RRM2B-related mitochondrial disease; Rubinstein-Taybi syndrome; Salla disease; Sandhoff disease, adult and infantile types; Sarcoidosis, early-onset; Blau syndrome; Schindler disease, type 1; Schizencephaly; Schizophrenia 15; Schneckenbecken dysplasia; Schwannomatosis 2; Schwartz Jampel syndrome type 1; Sclerocornea, autosomal recessive; Sclerosteosis; Secondary hypothyroidism; Segawa syndrome, autosomal recessive; Senior-Loken syndrome 4 and 5; Sensory ataxic neuropathy, dysarthria, and ophthalmoparesis; Sepiapterin reductase deficiency; SeSAME syndrome; Severe combined immunodeficiency due to ADA deficiency, with microcephaly, growth retardation, and sensitivity to ionizing radiation, atypical, autosomal recessive, T cell-negative, B cell-positive, NK cell-negative or NK-positive; Severe congenital neutropenia; Severe congenital neutropenia 3, autosomal recessive or dominant; Severe congenital neutropenia and 6, autosomal recessive; Severe myoclonic epilepsy in infancy; Generalized epilepsy with febrile seizures plus, types 1 and 2; Severe X-linked myotubular myopathy; Short QT syndrome 3; Short stature with nonspecific skeletal abnormalities; Short stature, auditory canal atresia, mandibular hypoplasia, skeletal abnormalities; Short stature, onychodysplasia, facial dysmorphism, and hypotrichosis; Primordial dwarfism; Short-rib thoracic dysplasia 11 or 3 with or without polydactyly; Sialidosis type I and II; Silver spastic paraplegia syndrome; Slowed nerve conduction velocity, autosomal dominant; Smith-Lemli-Opitz syndrome; Snyder Robinson syndrome; Somatotroph adenoma; Prolactinoma; familial, Pituitary adenoma predisposition; Sotos syndrome 1 or 2; Spastic ataxia 5, autosomal recessive, Charlevoix-Saguenay type, 1, 10, or 11, autosomal recessive; Amyotrophic lateral sclerosis type 5; Spastic paraplegia 15, 2, 3, 35, 39, 4, autosomal dominant, 55, autosomal recessive, and 5A; Bile acid synthesis defect, congenital, 3; Spermatogenic failure 11, 3, and 8; 
Spherocytosis types 4 and 5; Spheroid body myopathy; Spinal muscular atrophy, lower extremity predominant 2, autosomal dominant; Spinal muscular atrophy, type II; Spinocerebellar ataxia 14, 21, 35, 40, and 6; Spinocerebellar ataxia autosomal recessive 1 and 16; Splenic hypoplasia; Spondylocarpotarsal synostosis syndrome; Spondylocheirodysplasia, Ehlers-Danlos syndrome-like, with immune dysregulation, Aggrecan type, with congenital joint dislocations, short limb-hand type, Sedaghatian type, with cone-rod dystrophy, and Kozlowski type; Parastremmatic dwarfism; Stargardt disease 1; Cone-rod dystrophy 3; Stickler syndrome type 1; Kniest dysplasia; Stickler syndrome, types 1 (nonsyndromic ocular) and 4; Sting-associated vasculopathy, infantile-onset; Stormorken syndrome; Sturge-Weber syndrome, Capillary malformations, congenital, 1; Succinyl-CoA acetoacetate transferase deficiency; Sucrase-isomaltase deficiency; Sudden infant death syndrome; Sulfite oxidase deficiency, isolated; Supravalvar aortic stenosis; Surfactant metabolism dysfunction, pulmonary, 2 and 3; Symphalangism, proximal, 1b; Syndactyly Cenani Lenz type; Syndactyly type 3; Syndromic X-linked mental retardation 16; Talipes equinovarus; Tangier disease; TARP syndrome; Tay-Sachs disease, B1 variant, Gm2-gangliosidosis (adult), Gm2-gangliosidosis (adult-onset); Temtamy syndrome; Tenorio Syndrome; Terminal osseous dysplasia; Testosterone 17-beta-dehydrogenase deficiency; Tetraamelia, autosomal recessive; Tetralogy of Fallot; Hypoplastic left heart syndrome 2; Truncus arteriosus; Malformation of the heart and great vessels; Ventricular septal defect 1; Thiel-Behnke corneal dystrophy; Thoracic aortic aneurysms and aortic dissections; Marfanoid habitus; Three M syndrome 2; Thrombocytopenia, platelet dysfunction, hemolysis, and imbalanced globin synthesis; Thrombocytopenia, X-linked; Thrombophilia, hereditary, due to protein C deficiency, autosomal dominant and recessive; Thyroid agenesis; Thyroid cancer, 
follicular; Thyroid hormone metabolism, abnormal; Thyroid hormone resistance, generalized, autosomal dominant; Thyrotoxic periodic paralysis and Thyrotoxic periodic paralysis 2; Thyrotropin-releasing hormone resistance, generalized; Timothy syndrome; TNF receptor-associated periodic fever syndrome (TRAPS); Tooth agenesis, selective, 3 and 4; Torsades de pointes; Townes-Brocks-branchiootorenal-like syndrome; Transient bullous dermolysis of the newborn; Treacher Collins syndrome 1; Trichomegaly with mental retardation, dwarfism and pigmentary degeneration of retina; Trichorhinophalangeal dysplasia type I; Trichorhinophalangeal syndrome type 3; Trimethylaminuria; Tuberous sclerosis syndrome; Lymphangiomyomatosis; Tuberous sclerosis 1 and 2; Tyrosinase-negative oculocutaneous albinism; Tyrosinase-positive oculocutaneous albinism; Tyrosinemia type I; UDPglucose-4-epimerase deficiency; Ullrich congenital muscular dystrophy; Ulna and fibula absence of with severe limb deficiency; Upshaw-Schulman syndrome; Urocanate hydratase deficiency; Usher syndrome, types 1, 1B, 1D, 1G, 2A, 2C, and 2D; Retinitis pigmentosa 39; UV-sensitive syndrome; Van der Woude syndrome; Van Maldergem syndrome 2; Hennekam lymphangiectasia-lymphedema syndrome 2; Variegate porphyria; Ventriculomegaly with cystic kidney disease; Verheij syndrome; Very long chain acyl-CoA dehydrogenase deficiency; Vesicoureteral reflux 8; Visceral heterotaxy 5, autosomal; Visceral myopathy; Vitamin D-dependent rickets, types 1 and 2; Vitelliform dystrophy; von Willebrand disease type 2M and type 3; Waardenburg syndrome type 1, 4C, and 2E (with neurologic involvement); Klein-Waardenburg syndrome; Walker-Warburg congenital muscular dystrophy; Warburg micro syndrome 2 and 4; Warts, hypogammaglobulinemia, infections, and myelokathexis; Weaver syndrome; Weill-Marchesani syndrome 1 and 3; Weill-Marchesani-like syndrome; Weissenbacher-Zweymuller syndrome; Werdnig-Hoffmann disease; Charcot-Marie-Tooth disease; Werner syndrome; 
WFS1-Related Disorders; Wiedemann-Steiner syndrome; Wilson disease; Wolfram-like syndrome, autosomal dominant; Worth disease; Van Buchem disease type 2; Xeroderma pigmentosum, complementation group b, group D, group E, and group G; X-linked agammaglobulinemia; X-linked hereditary motor and sensory neuropathy; X-linked ichthyosis with steryl-sulfatase deficiency; X-linked periventricular heterotopia; Oto-palato-digital syndrome, type I; X-linked severe combined immunodeficiency; Zimmermann-Laband syndrome and Zimmermann-Laband syndrome 2; and Zonular pulverulent cataract 3.

In a particular aspect, the instant disclosure provides TPRT-based methods for the treatment of a subject diagnosed with an expansion repeat disorder (also known as a repeat expansion disorder or a trinucleotide repeat disorder). Expansion repeat disorders occur when microsatellite repeats expand beyond a threshold length. Currently, at least 30 genetic diseases are believed to be caused by repeat expansions. Scientific understanding of this diverse group of disorders came to light in the early 1990s with the discovery that trinucleotide repeats underlie several major inherited conditions, including Fragile X, Spinal and Bulbar Muscular Atrophy, Myotonic Dystrophy, and Huntington's disease (Nelson et al., “The unstable repeats—three evolving faces of neurological disease,” Neuron, Mar. 6, 2013, Vol. 77; 825-843, which is incorporated herein by reference), as well as Haw River Syndrome, Jacobsen Syndrome, Dentatorubral-pallidoluysian atrophy (DRPLA), Machado-Joseph disease, Synpolydactyly (SPD II), Hand-foot-genital syndrome (HFGS), Cleidocranial dysplasia (CCD), Holoprosencephaly disorder (HPE), Congenital central hypoventilation syndrome (CCHS), ARX-nonsyndromic X-linked mental retardation (XLMR), and Oculopharyngeal muscular dystrophy (OPMD). Microsatellite repeat instability was found to be a hallmark of these conditions, as was anticipation, the phenomenon in which repeat expansion can occur with each successive generation, leading to a more severe phenotype and an earlier age of onset in the offspring. Repeat expansions are believed to cause disease via several different mechanisms. Namely, expansions may interfere with cellular functioning at the level of the gene, the mRNA transcript, and/or the encoded protein. In some conditions, mutations act via a loss-of-function mechanism by silencing repeat-containing genes. In others, disease results from gain-of-function mechanisms, whereby either the mRNA transcript or the protein takes on new, aberrant functions.
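
By way of illustration only, the notion of a microsatellite repeat expanding beyond a threshold length can be sketched computationally. The following Python sketch is not part of the disclosed methods; the function names, the example transcript, and the threshold value are hypothetical and are not clinical cutoffs. It counts the longest uninterrupted run of a repeat unit in a DNA or RNA sequence and compares that count to a caller-supplied threshold.

```python
import re


def longest_repeat_run(seq: str, unit: str = "CAG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`.

    RNA (U) and DNA (T) input are treated alike by normalising U to T.
    """
    norm_seq = seq.upper().replace("U", "T")
    norm_unit = unit.upper().replace("U", "T")
    # Each match is one maximal uninterrupted run of the repeat unit.
    runs = re.findall(f"(?:{re.escape(norm_unit)})+", norm_seq)
    return max((len(run) // len(norm_unit) for run in runs), default=0)


def is_expanded(seq: str, unit: str, threshold: int) -> bool:
    """True when the longest run exceeds `threshold` (a hypothetical cutoff)."""
    return longest_repeat_run(seq, unit) > threshold


# Hypothetical transcript fragment carrying 40 consecutive CAG units.
transcript = "AUG" + "CAG" * 40 + "UAA"
print(longest_repeat_run(transcript, "CAG"))         # 40
print(is_expanded(transcript, "CAG", threshold=35))  # True
```

Only uninterrupted runs are counted in this sketch; a repeat tract interrupted by a different codon would be scored as two shorter runs.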

Pharmaceutical Compositions

Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the various components of the prime editing system described herein (e.g., including, but not limited to, the napRNAbps, RDRPs, fusion proteins (e.g., comprising napRNAbp:RDRP fusions), rpegRNAs, and complexes comprising fusion proteins and rpegRNAs, as well as accessory elements).

The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., for specific delivery, increasing half-life, or other therapeutic compounds).

As used herein, the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and 
(25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. Terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein.

In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.

In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a silastic membrane, or a fiber.

In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used. (See, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Ranger and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61. See also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer, supra.

In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the pharmaceutical is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.

A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer's or Hank's solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.

The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE) and low levels (5-10 mol %) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Pat. Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.

The pharmaceutical composition described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.

Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierce-able by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.

Viral Delivery Methods

In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein encoding one or more components of the RNA prime editor (RPE) system described herein, one or more transcripts thereof, and/or one or more proteins translated therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, an RNA prime editor as described herein in combination with (and optionally complexed with) a guide sequence is delivered to a cell. The nucleic acid constructs may be designed in accordance with the particular embodiment of RNA prime editing that is implemented. For example, FIGS. 1-4 depict various exemplary embodiments of RNA prime editors. In some embodiments, the prime editor comprises a fusion protein of a Cas13 (or other napRNAbp) and an RDRP complexed with an rpegRNA, e.g., as shown in FIGS. 1 and 2. In the embodiment of FIG. 3, the RNA prime editing approach involves delivering a second napRNAbp (e.g., a second Cas13) and a traditional guide RNA that binds nearby and installs an internal cut site in the target RNA molecule from which RNA extension may proceed. In the embodiment of FIG. 4, the RNA prime editor does not require an rpegRNA comprising the RNA template sequence. Rather, the RNA template sequence is provided in trans, e.g., by a ribozyme that is co-localized to the target RNA by an MS2 targeting system. Any suitable number and/or arrangement of expression vectors may be prepared that are capable of expressing the protein and guide RNA components of the various embodiments of RNA prime editors envisioned here. Separate nucleic acid constructs may also be provided for separate expression of a napRNAbp (e.g., a Cas13 domain) and an RDRP. 
In addition, the nucleic acid constructs may also include a nucleotide sequence encoding one or more guide RNAs for conducting RNA prime editing, including an rpegRNA which comprises an extended region having a template sequence. The template sequence may also be provided in trans in other embodiments. Each of these components may be configured to be expressed from one or more nucleic acid vectors in any suitable manner utilizing one or more promoters.

Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of an RNA prime editor to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Böhm (eds) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).

Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).

The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems could include retroviral, lentiviral, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.

The tropism of a virus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700). In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). 
Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin, et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).

Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. Reference is made to US 2003/0087817, published May 8, 2003, International Patent Application No. WO 2016/205764, published Dec. 22, 2016, International Patent Application No. WO 2018/071868, published Apr. 19, 2018, U.S. Patent Publication No. 2018/0127780, published May 10, 2018, and International Patent Application No. PCT/US2020/033873, the disclosures of each of which are incorporated herein by reference.

In various embodiments, the disclosed expression constructs may be engineered for delivery in one or more rAAV vectors. An rAAV as related to any of the methods and compositions provided herein may be of any serotype including any derivative or pseudotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 2/1, 2/5, 2/8, 2/9, 3/1, 3/5, 3/8, or 3/9). An rAAV may comprise a genetic load (i.e., a recombinant nucleic acid vector that expresses a gene of interest, such as a whole or split fusion protein that is carried by the rAAV into a cell) that is to be delivered to a cell. An rAAV may be chimeric.

As used herein, the serotype of an rAAV refers to the serotype of the capsid proteins of the recombinant virus. Non-limiting examples of derivatives and pseudotypes include rAAV2/1, rAAV2/5, rAAV2/8, rAAV2/9, AAV2-AAV3 hybrid, AAVrh.10, AAVhu.14, AAV3a/3b, AAVrh32.33, AAV-HSC15, AAV-HSC17, AAVhu.37, AAVrh.8, CHt-P6, AAV2.5, AAV6.2, AAV2i8, AAV-HSC15/17, AAVM41, AAV9.45, AAV6(Y445F/Y731F), AAV2.5T, AAV-HAE1/2, AAV clone 32/83, AAVShH10, AAV2 (Y->F), AAV8 (Y733F), AAV2.15, AAV2.4, AAVM41, and AAVr3.45. A non-limiting example of derivatives and pseudotypes that have chimeric VP1 proteins is rAAV2/5-1VP1u, which has the genome of AAV2, capsid backbone of AAV5 and VP1u of AAV1. Other non-limiting examples of derivatives and pseudotypes that have chimeric VP1 proteins are rAAV2/5-8VP1u, rAAV2/9-1VP1u, and rAAV2/9-8VP1u.
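
The naming convention illustrated by rAAV2/5-1VP1u (genome of AAV2, capsid backbone of AAV5, VP1u of AAV1) can be sketched as a small parser. The following Python sketch is illustrative only; the `Pseudotype` type and `parse_pseudotype` function are hypothetical names, and the pattern handles only names of the form rAAV<genome>/<capsid> with an optional -<n>VP1u suffix, not the other derivative names listed above. Reading the remaining chimeric examples the same way is an assumption made by analogy.

```python
import re
from typing import NamedTuple, Optional


class Pseudotype(NamedTuple):
    genome: str           # serotype supplying the vector genome
    capsid: str           # serotype supplying the capsid backbone
    vp1u: Optional[str]   # serotype supplying VP1u, if chimeric


# Matches e.g. "rAAV2/5" or "rAAV2/5-1VP1u" (hypothetical, simplified pattern).
_PATTERN = re.compile(r"^rAAV(\d+)/(\d+)(?:-(\d+)VP1u)?$")


def parse_pseudotype(name: str) -> Pseudotype:
    """Split a pseudotype name into genome, capsid, and optional VP1u source."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unsupported pseudotype name: {name!r}")
    genome, capsid, vp1u = match.groups()
    return Pseudotype(
        genome=f"AAV{genome}",
        capsid=f"AAV{capsid}",
        vp1u=f"AAV{vp1u}" if vp1u else None,
    )


print(parse_pseudotype("rAAV2/5-1VP1u"))
# Pseudotype(genome='AAV2', capsid='AAV5', vp1u='AAV1')
```

Under the definition given above, the serotype of the rAAV would correspond to the `capsid` field of the parsed result.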

AAV derivatives/pseudotypes, and methods of producing such derivatives/pseudotypes, are known in the art (see, e.g., Asokan A, Schaffer D V, Samulski R J, The AAV vector toolkit: poised at the clinical crossroads. Mol Ther. 2012 April; 20(4):699-708. doi: 10.1038/mt.2011.287. Epub 2012 Jan. 24). Methods for producing and using pseudotyped rAAV vectors are known in the art (see, e.g., Duan et al., J. Virol., 75:7662-7671, 2001; Halbert et al., J. Virol., 74:1524-1532, 2000; Zolotukhin et al., Methods, 28:158-167, 2002; and Auricchio et al., Hum. Molec. Genet., 10:3075-3081, 2001).

Methods of making or packaging rAAV particles are known in the art and reagents are commercially available (see, e.g., Zolotukhin et al. Production and purification of serotype 1, 2, and 5 recombinant adeno-associated viral vectors. Methods 28 (2002) 158-167; and U.S. Patent Publication Numbers US20070015238 and US20120322861, which are incorporated herein by reference; and plasmids and kits available from ATCC and Cell Biolabs, Inc.). For example, a plasmid comprising a gene of interest may be combined with one or more helper plasmids, e.g., that contain a rep gene (e.g., encoding Rep78, Rep68, Rep52 and Rep40) and a cap gene (encoding VP1, VP2, and VP3, including a modified VP2 region as described herein), and transfected into a recombinant cell such that the rAAV particle can be packaged and subsequently purified.

Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US 2003/0087817, incorporated herein by reference.

It should be appreciated that any fusion protein, e.g., any of the fusion proteins provided herein, may be introduced into the cell in any suitable way, either stably or transiently. In some embodiments, a fusion protein may be transfected into the cell. In some embodiments, the cell may be transduced or transfected with a nucleic acid construct that encodes a fusion protein. For example, a cell may be transduced (e.g., with a virus encoding a fusion protein), or transfected (e.g., with a plasmid encoding a fusion protein) with a nucleic acid that encodes a fusion protein, or the translated fusion protein. Such transduction may be a stable or transient transduction. In some embodiments, cells expressing a fusion protein or containing a fusion protein may be transduced or transfected with one or more gRNA molecules, for example when the fusion protein comprises a Cas9 (e.g., nCas9) domain. In some embodiments, a plasmid expressing a fusion protein may be introduced into cells through electroporation, transient (e.g., lipofection) and stable genome integration (e.g., piggybac) and viral transduction or other methods known to those of skill in the art.

In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a base editor as described herein in combination with (and optionally complexed with) a guide sequence is delivered to a cell.

Exemplary delivery strategies are described herein elsewhere, which include vector-based strategies, RPE ribonucleoprotein complex delivery, and delivery of RPE by mRNA methods.

In some embodiments, the method of delivery provided comprises nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.

Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggybac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™ and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery may be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration). Delivery may be achieved through the use of RNP complexes.

The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

In other embodiments, the method of delivery and vector provided herein is an RNP complex. RNP delivery of fusion proteins markedly increases the DNA specificity of base editing. RNP delivery of fusion proteins leads to decoupling of on- and off-target DNA editing. RNP delivery ablates off-target editing at non-repetitive sites while maintaining on-target editing comparable to plasmid delivery, and greatly reduces off-target DNA editing even at the highly repetitive VEGFA site 2. See Rees, H. A. et al., Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery, Nat. Commun. 8, 15790 (2017), U.S. Pat. No. 9,526,784, issued Dec. 27, 2016, and U.S. Pat. No. 9,737,604, issued Aug. 22, 2017, each of which is incorporated by reference herein.

Other aspects of the present disclosure provide methods of delivering the prime editor constructs into a cell to form a complete and functional prime editor within a cell. For example, in some embodiments, a cell is contacted with a composition described herein (e.g., compositions comprising nucleotide sequences encoding the split Cas9 or the split prime editor or AAV particles containing nucleic acid vectors comprising such nucleotide sequences). In some embodiments, the contacting results in the delivery of such nucleotide sequences into a cell, wherein the N-terminal portion of the Cas9 protein or the prime editor and the C-terminal portion of the Cas9 protein or the prime editor are expressed in the cell and are joined to form a complete Cas9 protein or a complete prime editor.

It should be appreciated that any rAAV particle, nucleic acid molecule or composition provided herein may be introduced into the cell in any suitable way, either stably or transiently. In some embodiments, the disclosed proteins may be transfected into the cell. In some embodiments, the cell may be transduced or transfected with a nucleic acid molecule. For example, a cell may be transduced (e.g., with a virus encoding a split protein), or transfected (e.g., with a plasmid encoding a split protein) with a nucleic acid molecule that encodes a split protein, or an rAAV particle containing a viral genome encoding one or more nucleic acid molecules. Such transduction may be a stable or transient transduction. In some embodiments, cells expressing a split protein or containing a split protein may be transduced or transfected with one or more guide RNA sequences, for example in delivery of a split Cas9 (e.g., nCas9) protein. In some embodiments, a plasmid expressing a split protein may be introduced into cells through electroporation, transient (e.g., lipofection) and stable genome integration (e.g., piggybac) and viral transduction or other methods known to those of skill in the art.

In certain embodiments, the compositions provided herein comprise a lipid and/or polymer. In certain embodiments, the lipid and/or polymer is cationic. The preparation of such lipid particles is well known. See, e.g. U.S. Pat. Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; 4,921,757; and 9,737,604, each of which is incorporated herein by reference.

The guide RNAs and/or rpegRNAs used in the present disclosure may be 15-1000 nucleotides in length and comprise a sequence of at least 10, at least 15, or at least 20 contiguous nucleotides that is complementary to a target nucleotide sequence. The guide RNA may comprise a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides that is complementary to a target nucleotide sequence. The guide RNA may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
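As a rough illustration of the length and complementarity ranges recited above, the following Python sketch checks a candidate guide sequence against an RNA target. The function names, the default threshold of 20 contiguous nucleotides, and the example sequences are hypothetical and are not part of the disclosure. Complementarity is evaluated here by simple Watson-Crick pairing only, which ignores wobble pairs and secondary structure.

```python
# Illustrative sketch only; names, thresholds, and sequences are assumptions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def longest_complementary_run(guide: str, target: str) -> int:
    """Longest contiguous stretch of the guide that can pair with the target.

    A stretch of the guide pairs with the target wherever its reverse
    complement occurs in the target, so this is the longest common substring
    of reverse_complement(guide) and the target.
    """
    rc = reverse_complement(guide)
    target = target.upper()
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(rc) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if rc[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def guide_meets_ranges(guide: str, target: str, min_len: int = 15,
                       max_len: int = 1000, min_run: int = 20) -> bool:
    """Check total length (15-1000 nt) and the contiguous complementary run."""
    if not (min_len <= len(guide) <= max_len):
        return False
    return longest_complementary_run(guide, target) >= min_run


# Hypothetical target and a 22-nt guide designed against positions 5-26.
example_target = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"
example_guide = reverse_complement(example_target[5:27])
```

The `min_run` parameter can be set to 10, 15, or 20 to mirror the alternative thresholds in the paragraph above.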

In some embodiments, the target nucleotide sequence is a DNA sequence in a genome, e.g. a eukaryotic genome. In certain embodiments, the target nucleotide sequence is in a mammalian (e.g. a human) genome.

The compositions of this disclosure may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., a carrier or vehicle.

Treatment of a disease or disorder includes delaying the development or progression of the disease, or reducing disease severity. Treating the disease does not necessarily require curative results.

As used herein, “delaying” the development of a disease means to defer, hinder, slow, retard, stabilize, and/or postpone progression of the disease. This delay can be of varying lengths of time, depending on the history of the disease and/or individuals being treated. A method that “delays” or alleviates the development of a disease, or delays the onset of the disease, is a method that reduces probability of developing one or more symptoms of the disease in a given time frame and/or reduces extent of the symptoms in a given time frame, when compared to not using the method. Such comparisons are typically based on clinical studies, using a number of subjects sufficient to give a statistically significant result.

“Development” or “progression” of a disease means initial manifestations and/or ensuing progression of the disease. Development of the disease can be detectable and assessed using standard clinical techniques as well known in the art. However, development also refers to progression that may be undetectable. For purpose of this disclosure, development or progression refers to the biological course of the symptoms. “Development” includes occurrence, recurrence, and onset.

As used herein, “onset” or “occurrence” of a disease includes initial onset and/or recurrence. Conventional methods, known to those of ordinary skill in the art of medicine, can be used to administer the isolated polypeptide or pharmaceutical composition to the subject, depending upon the type of disease to be treated or the site of the disease.

Kits, Vectors, Cells

Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the various components of the RNA prime editing system described herein (e.g., including, but not limited to, the napRNAbps, RDRPs, fusion proteins (e.g., comprising napRNAbps and RDRPs), RpegRNAs, and complexes comprising fusion proteins and the RpegRNAs, as well as accessory elements). In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the prime editing system components.

Some aspects of this disclosure provide kits comprising one or more nucleic acid constructs encoding the various components of the prime editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the prime editing system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the RNA prime editing system components.

Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising (a) a nucleotide sequence encoding a napRNAbp (e.g., a Cas13 domain) and an RDRP (expressed as separate protein products or as a fusion protein) and (b) a heterologous promoter that drives expression of the sequence of (a). Separate nucleic acid constructs may also be provided for separate expression of a napRNAbp (e.g., a Cas13 domain) and an RDRP. In addition, the nucleic acid constructs may also include a nucleotide sequence encoding one or more guide RNAs for conducting RNA prime editing, including an rpegRNA which comprises an extended region having a template sequence. The template sequence may also be provided in trans in other embodiments. Each of these components may be configured to be expressed from one or more nucleic acid vectors in any suitable manner utilizing one or more promoters.

Some aspects of this disclosure provide cells comprising any of the constructs disclosed herein. In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr−/−, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used in assessing one or more test compounds.

In addition, the present disclosure involves targeting an RNA molecule within a cell. Such cells may be manipulated using RNA prime editing under in vitro conditions, i.e., where the cells are provided in culture. In other embodiments, the RNA prime editing may be conducted under ex vivo conditions, i.e., whereby cells are removed from a subject and manipulated outside of the body. In still other embodiments, the RNA prime editing may be conducted in vivo, whereby the components of the RNA prime editor are provided to a subject (e.g., by delivery of expression vectors, or by delivery of particles comprising RNA prime editor) in an effective amount and delivered to one or more cells in which RNA editing is desired. Thus, in such methods the target locus of interest may be comprised in a nucleic acid molecule within a cell, in particular a eukaryotic cell, such as a mammalian cell or a plant cell. The mammalian cell may be a non-human primate, bovine, porcine, rodent or mouse cell. The cell may be a non-mammalian eukaryotic cell such as a poultry, fish or shrimp cell. The plant cell may be of a crop plant such as cassava, corn, sorghum, wheat, or rice. The plant cell may also be of an algae, tree or vegetable. The modification introduced to the cell by the present invention may be such that the cell and progeny of the cell are altered for improved production of biologic products such as an antibody, starch, alcohol or other desired cellular output. The modification introduced to the cell by the present invention may be such that the cell and progeny of the cell include an alteration that changes the biologic product produced.

The mammalian cell may be a cell of a non-human mammal, e.g., a primate, bovine, ovine, porcine, canine, rodent, or Leporidae cell, such as a monkey, cow, sheep, pig, dog, rabbit, rat or mouse cell. The cell may be a non-mammalian eukaryotic cell such as a poultry bird (e.g., chicken), vertebrate fish (e.g., salmon) or shellfish (e.g., oyster, clam, lobster, shrimp) cell. The cell may also be a plant cell. The plant cell may be of a monocot or dicot or of a crop or grain plant such as cassava, corn, sorghum, soybean, wheat, oat or rice. The plant cell may also be of an algae, tree or production plant, fruit or vegetable (e.g., trees such as citrus trees, e.g., orange, grapefruit or lemon trees; peach or nectarine trees; apple or pear trees; nut trees such as almond or walnut or pistachio trees; nightshade plants; plants of the genus Brassica; plants of the genus Lactuca; plants of the genus Spinacia; plants of the genus Capsicum; cotton, tobacco, asparagus, carrot, cabbage, broccoli, cauliflower, tomato, eggplant, pepper, lettuce, spinach, strawberry, blueberry, raspberry, blackberry, grape, coffee, cocoa, etc.).

Vectors

Some aspects of the present disclosure relate to using recombinant virus vectors (e.g., adeno-associated virus vectors, adenovirus vectors, or herpes simplex virus vectors) for the delivery of the prime editors or components thereof described herein, e.g., the split Cas9 protein or a split nucleobase prime editor, into a cell. In the case of a split-PE approach, the N-terminal portion of a PE fusion protein and the C-terminal portion of a PE fusion protein are delivered by separate recombinant virus vectors (e.g., adeno-associated virus vectors, adenovirus vectors, or herpes simplex virus vectors) into the same cell, since the full-length Cas9 protein or prime editor exceeds the packaging limit of various virus vectors, e.g., rAAV (~4.9 kb).
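To illustrate the packaging constraint that motivates the split approach, the Python sketch below compares a construct length against an approximate rAAV capacity and reports whether a single vector or a split, two-vector delivery would be needed. The 4,900-nucleotide figure is taken from the approximately 4.9 kb value stated above. The function name and the example lengths are hypothetical, and the sketch ignores ITRs, intein sequences, promoters, and other elements that consume capacity in practice.

```python
# Illustrative sketch only; the function name and example lengths are assumptions.
RAAV_PACKAGING_LIMIT_NT = 4900  # approximate rAAV limit (~4.9 kb) stated in the text


def plan_delivery(construct_length_nt: int,
                  limit_nt: int = RAAV_PACKAGING_LIMIT_NT) -> str:
    """Classify a construct by whether it fits in one vector, two, or neither."""
    if construct_length_nt <= 0:
        raise ValueError("construct length must be a positive number of nucleotides")
    if construct_length_nt <= limit_nt:
        return "single vector"
    # A construct no longer than twice the limit can be divided into an
    # N-terminal part and a C-terminal part that each fit in one vector.
    if construct_length_nt <= 2 * limit_nt:
        return "split across two vectors"
    return "exceeds two-vector capacity"
```

For instance, a hypothetical 6,300-nucleotide coding sequence would be classified as requiring the split, two-vector route, whereas a 4,000-nucleotide sequence would fit in a single vector under these simplifying assumptions.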

Thus, in one embodiment, the disclosure contemplates vectors capable of delivering split prime editor fusion proteins, or split components thereof. In some embodiments, a composition for delivering the split Cas9 protein or split prime editor into a cell (e.g., a mammalian cell, a human cell) is provided. In some embodiments, the composition of the present disclosure comprises: (i) a first recombinant adeno-associated virus (rAAV) particle comprising a first nucleotide sequence encoding an N-terminal portion of a Cas9 protein or prime editor fused at its C-terminus to an intein-N; and (ii) a second recombinant adeno-associated virus (rAAV) particle comprising a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein or prime editor. The rAAV particles of the present disclosure comprise an rAAV vector (i.e., a recombinant genome of the rAAV) encapsidated in the viral capsid proteins.

In some embodiments, the rAAV vector comprises: (1) a heterologous nucleic acid region comprising the first or second nucleotide sequence encoding the N-terminal portion or C-terminal portion of a split Cas9 protein or a split prime editor in any form as described herein, (2) one or more nucleotide sequences comprising a sequence that facilitates expression of the heterologous nucleic acid region (e.g., a promoter), and (3) one or more nucleic acid regions comprising a sequence that facilitates integration of the heterologous nucleic acid region (optionally with the one or more nucleic acid regions comprising a sequence that facilitates expression) into the genome of a cell. In some embodiments, viral sequences that facilitate integration comprise Inverted Terminal Repeat (ITR) sequences. In some embodiments, the first or second nucleotide sequence encoding the N-terminal portion or C-terminal portion of a split Cas9 protein or a split prime editor is flanked on each side by an ITR sequence. In some embodiments, the nucleic acid vector further comprises a region encoding an AAV Rep protein as described herein, either contained within the region flanked by ITRs or outside the region. The ITR sequences can be derived from any AAV serotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) or can be derived from more than one serotype. In some embodiments, the ITR sequences are derived from AAV2 or AAV6.

Thus, in some embodiments, the rAAV particles disclosed herein comprise at least one rAAV2 particle, rAAV6 particle, rAAV8 particle, rPHP.B particle, rPHP.eB particle, or rAAV9 particle, or a variant thereof. In particular embodiments, the disclosed rAAV particles are rPHP.B particles, rPHP.eB particles, or rAAV9 particles.

ITR sequences and plasmids containing ITR sequences are known in the art and commercially available (see, e.g., products and services available from Vector Biolabs, Philadelphia, PA; Cellbiolabs, San Diego, CA; Agilent Technologies, Santa Clara, CA; and Addgene, Cambridge, MA; and Kessler P D, Podsakoff G M, Chen X, McQuiston S A, Colosi P C, Matelis L A, Kurtzman G J, Byrne B J. Gene delivery to skeletal muscle results in sustained expression and systemic delivery of a therapeutic protein. Proc Natl Acad Sci USA. 1996 Nov. 26; 93(24):14082-7; and Curtis A. Machida. Methods in Molecular Medicine™. Viral Vectors for Gene Therapy Methods and Protocols. 10.1385/1-59259-304-6:201 © Humana Press Inc. 2003. Chapter 10. Targeted Integration by Adeno-Associated Virus. Matthew D. Weitzman, Samuel M. Young Jr., Toni Cathomen and Richard Jude Samulski; U.S. Pat. Nos. 5,139,941 and 5,962,313, all of which are incorporated herein by reference).

In some embodiments, the rAAV vector of the present disclosure comprises one or more regulatory elements to control the expression of the heterologous nucleic acid region (e.g., promoters, transcriptional terminators, and/or other regulatory elements). In some embodiments, the first and/or second nucleotide sequence is operably linked to one or more (e.g., 1, 2, 3, 4, 5, or more) transcriptional terminators. Non-limiting examples of transcriptional terminators that may be used in accordance with the present disclosure include transcription terminators of the bovine growth hormone gene (bGH), human growth hormone gene (hGH), SV40, CW3, ϕ, or combinations thereof. The efficiencies of several transcriptional terminators have been tested to determine their respective effects in the expression level of the split Cas9 protein or the split prime editor. In some embodiments, the transcriptional terminator used in the present disclosure is a bGH transcriptional terminator. In some embodiments, the rAAV vector further comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE). In certain embodiments, the WPRE is a truncated WPRE sequence, such as “W3.” In some embodiments, the WPRE is inserted 5′ of the transcriptional terminator. Such sequences, when transcribed, create a tertiary structure which enhances expression, in particular, from viral vectors.

In some embodiments, the vectors used herein may encode the PE fusion proteins, or any of the components thereof (e.g., napDNAbp, linkers, or polymerases). In addition, the vectors used herein may encode the PEgRNAs, and/or the accessory gRNA for second strand nicking. The vectors may be capable of driving expression of one or more coding sequences in a cell. In some embodiments, the cell may be a prokaryotic cell, such as, e.g., a bacterial cell. In some embodiments, the cell may be a eukaryotic cell, such as, e.g., a yeast, plant, insect, or mammalian cell. In some embodiments, the eukaryotic cell may be a mammalian cell. In some embodiments, the eukaryotic cell may be a rodent cell. In some embodiments, the eukaryotic cell may be a human cell. Suitable promoters to drive expression in different types of cells are known in the art. In some embodiments, the promoter may be wild-type. In other embodiments, the promoter may be modified for more efficient or efficacious expression. In yet other embodiments, the promoter may be truncated yet retain its function. For example, the promoter may have a normal size or a reduced size that is suitable for proper packaging of the vector into a virus.

In some embodiments, the promoters that may be used in the prime editor vectors may be constitutive, inducible, or tissue-specific. In some embodiments, the promoters may be constitutive promoters. Non-limiting exemplary constitutive promoters include cytomegalovirus immediate early promoter (CMV), simian virus (SV40) promoter, adenovirus major late (MLP) promoter, Rous sarcoma virus (RSV) promoter, mouse mammary tumor virus (MMTV) promoter, phosphoglycerate kinase (PGK) promoter, elongation factor-alpha (EF1a) promoter, ubiquitin promoters, actin promoters, tubulin promoters, immunoglobulin promoters, a functional fragment thereof, or a combination of any of the foregoing. In some embodiments, the promoter may be a CMV promoter. In some embodiments, the promoter may be a truncated CMV promoter. In other embodiments, the promoter may be an EF1a promoter. In some embodiments, the promoter may be an inducible promoter. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be one that has a low basal (non-induced) expression level, such as, e.g., the Tet-On® promoter (Clontech). In some embodiments, the promoter may be a tissue-specific promoter. In some embodiments, the tissue-specific promoter is exclusively or predominantly expressed in liver tissue. Non-limiting exemplary tissue-specific promoters include B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

In some embodiments, the prime editor vectors (e.g., including any vectors encoding the prime editor fusion protein and/or the PEgRNAs, and/or the accessory second strand nicking gRNAs) may comprise inducible promoters to start expression only after the vectors are delivered to a target cell. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be one that has a low basal (non-induced) expression level, such as, e.g., the Tet-On® promoter (Clontech).

In additional embodiments, the prime editor vectors (e.g., including any vectors encoding the prime editor fusion protein and/or the PEgRNAs, and/or the accessory second strand nicking gRNAs) may comprise tissue-specific promoters to start expression only after the vectors are delivered into a specific tissue. Non-limiting exemplary tissue-specific promoters include B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

In some embodiments, the nucleotide sequence encoding the PEgRNA (or any guide RNAs used in connection with prime editing) may be operably linked to at least one transcriptional or translational control sequence. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to at least one promoter. In some embodiments, the promoter may be recognized by RNA polymerase III (Pol III). Non-limiting examples of Pol III promoters include U6, H1 and tRNA promoters. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human U6 promoter. In other embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human H1 promoter. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human tRNA promoter. In embodiments with more than one guide RNA, the promoters used to drive expression may be the same or different. In some embodiments, the nucleotide encoding the crRNA of the guide RNA and the nucleotide encoding the tracr RNA of the guide RNA may be provided on the same vector. In some embodiments, the nucleotide encoding the crRNA and the nucleotide encoding the tracr RNA may be driven by the same promoter. In some embodiments, the crRNA and tracr RNA may be transcribed into a single transcript. For example, the crRNA and tracr RNA may be processed from the single transcript to form a double-molecule guide RNA. Alternatively, the crRNA and tracr RNA may be transcribed into a single-molecule guide RNA.

In some embodiments, the nucleotide sequence encoding the guide RNA may be located on the same vector comprising the nucleotide sequence encoding the PE fusion protein. In some embodiments, expression of the guide RNA and of the PE fusion protein may be driven by their corresponding promoters. In some embodiments, expression of the guide RNA may be driven by the same promoter that drives expression of the PE fusion protein. In some embodiments, the guide RNA and the PE fusion protein transcript may be contained within a single transcript. For example, the guide RNA may be within an untranslated region (UTR) of the Cas9 protein transcript. In some embodiments, the guide RNA may be within the 5′ UTR of the PE fusion protein transcript. In other embodiments, the guide RNA may be within the 3′ UTR of the PE fusion protein transcript. In some embodiments, the intracellular half-life of the PE fusion protein transcript may be reduced by containing the guide RNA within its 3′ UTR and thereby shortening the length of its 3′ UTR. In additional embodiments, the guide RNA may be within an intron of the PE fusion protein transcript. In some embodiments, suitable splice sites may be added at the intron within which the guide RNA is located such that the guide RNA is properly spliced out of the transcript. In some embodiments, expression of the Cas9 protein and the guide RNA in close proximity on the same vector may facilitate more efficient formation of the CRISPR complex.

The prime editor vector system may comprise one vector, or two vectors, or three vectors, or four vectors, or five vectors, or more. In some embodiments, the vector system may comprise one single vector, which encodes both the PE fusion protein and PEgRNA. In other embodiments, the vector system may comprise two vectors, wherein one vector encodes the PE fusion protein and the other encodes the PEgRNA. In additional embodiments, the vector system may comprise three vectors, wherein the third vector encodes the second strand nicking gRNA used in the methods herein.
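The one-, two-, and three-vector arrangements described above can be represented as simple data, as in the following Python sketch, which records what each vector encodes and reports any required component that no vector supplies. The class name, function name, vector names, and component labels are hypothetical and chosen only for illustration; treating the PE fusion protein and the PEgRNA as the required set is likewise an assumption of the sketch.

```python
# Illustrative sketch only; class, function, and label names are assumptions.
from dataclasses import dataclass

REQUIRED_COMPONENTS = frozenset({"PE fusion protein", "PEgRNA"})


@dataclass(frozen=True)
class Vector:
    name: str
    encodes: frozenset  # labels of the components this vector encodes


def missing_components(vectors, required=REQUIRED_COMPONENTS):
    """Return the required components not encoded by any vector in the system."""
    covered = set()
    for vector in vectors:
        covered |= vector.encodes
    return set(required) - covered


# One vector encoding both components.
single_vector_system = [
    Vector("vector 1", frozenset({"PE fusion protein", "PEgRNA"})),
]

# Three vectors, the third encoding the second strand nicking gRNA.
three_vector_system = [
    Vector("vector 1", frozenset({"PE fusion protein"})),
    Vector("vector 2", frozenset({"PEgRNA"})),
    Vector("vector 3", frozenset({"second strand nicking gRNA"})),
]
```

Calling `missing_components` on either example system returns an empty set, while a system containing only the fusion protein vector would report the PEgRNA as missing.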

In some embodiments, the composition comprising the rAAV particle (in any form contemplated herein) further comprises a pharmaceutically acceptable carrier. In some embodiments, the composition is formulated in appropriate pharmaceutical vehicles for administration to human or animal subjects.

Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. The terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein.

Without further elaboration, it is believed that one skilled in the art can, based on the above description, utilize the present disclosure to its fullest extent. The following specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever. All publications cited herein are incorporated by reference for the purposes or subject matter referenced herein.

Example 1. Prime Editing to Modify the Sequence of an RNA Target Molecule

This example relates to the use of a programmable RNA binding protein to direct programmable RNA modifying enzymes to install mutations in a target RNA molecule as a means to correct disease-causing mutations or otherwise to install sequence changes in a target RNA molecule. A variety of strategies for the targeting of these complexes are contemplated here, such as Cas13 proteins (as is true for REPAIR and RESCUE^{4, 5}), or Pumby proteins,⁷ or homologs, orthologs, or variants of these proteins. It was surprisingly discovered that RNA could be directly edited using a fusion protein comprising a nucleic acid-programmable RNA binding protein (napRNAbp) and an RNA-dependent RNA polymerase (RDRP) when complexed with a specialized guide RNA called an RNA prime editing guide RNA. This approach is referred to as “RNA prime editing” in reference to the recently described method of prime editing which edits DNA sequences.

Prime editing (PE) was recently developed to edit target DNA sequences (see Anzalone et al., “Search-and-replace genome editing without double-strand breaks or donor DNA,” Nature, 2019, Vol. 576, pp. 149-157, incorporated herein by reference; also see International PCT Publications which are directed to prime editing: WO2020/191239, WO2020/191153, WO2020/191171, WO2020/191248, WO2020/191234, WO2020/191233, WO2020/191245, WO2020/191242, WO2020/191243, WO2020/191246, WO2020/191249, and WO2020/191241, each of which is incorporated herein by reference). Prime editing involves contacting a target DNA with a prime editor and a prime editing guide RNA (pegRNA). The prime editor is a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) fused to a reverse transcriptase (RT). Prime editing comprises contacting a DNA molecule comprising a target nucleotide sequence with a prime editor and a pegRNA, nicking of one of the strands by the prime editor, and RT-dependent synthesis, from the exposed 3′ end of the nicked target DNA, of a replacement strand of DNA containing the desired edit (e.g., insertion, deletion, or substitution), which results in editing of the target nucleotide sequence.

The present specification describes a novel nucleic acid-editing system, namely RNA prime editing, that is capable of directly editing the sequence of a target RNA molecule. RNA prime editing of a target RNA molecule comprises contacting a target RNA molecule with an RNA prime editor and an RNA prime editing guide RNA (rpegRNA). The RNA prime editor comprises a nucleic acid programmable RNA binding protein (e.g., Cas13) fused with an RNA-dependent RNA polymerase (RDRP). In other embodiments, the RNA prime editor may be provided as a complex with separately expressed napRNAbp, rpegRNA, and RDRP components. When complexed with the rpegRNA, the RNA prime editor (and specifically, the napRNAbp component) is guided to and binds the target RNA molecule due to a region (i.e., the spacer) in the rpegRNA that is complementary to a region of the target RNA molecule having a free 3′ terminus (e.g., the natural 3′ terminus of the RNA molecule, or a 3′ terminus formed as a result of nuclease action on the target RNA by the RNA prime editor). The RNA prime editor, and specifically, the RNA-dependent RNA polymerase (e.g., provided separately or fused to the napRNAbp), then synthesizes a strand of RNA from the 3′ terminus which is templated by the rpegRNA (specifically, the extension arm of the rpegRNA that encodes the desired edited sequence), thereby installing a modified sequence in the target RNA molecule at the natural 3′ terminus or at a nuclease-generated 3′ terminus within the target RNA molecule. These aspects are depicted in FIG. 1.
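By way of illustration only, the templated 3′ extension described above may be modeled as a simple string operation. The following Python sketch is a toy model and not a description of any experimental protocol: the function names, the example sequences, the cut position, and the length of the primer-binding portion (`pbs_len`) are all hypothetical, and the model assumes perfect Watson-Crick pairing and complete copying of the template by the RDRP.

```python
# Toy model only: all names, sequences, and lengths here are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def design_extension_arm(target: str, cut: int, new_tail: str, pbs_len: int = 8) -> str:
    """Build an extension arm (5'->3'): template, then primer-binding portion.

    `cut` is the index of the 3' terminus in `target` (len(target) for the
    natural terminus, smaller for a nuclease-generated internal terminus).
    """
    if not 0 < pbs_len <= cut <= len(target):
        raise ValueError("need at least pbs_len nucleotides upstream of the 3' terminus")
    primer = target[cut - pbs_len:cut]  # 3' end of the retained target RNA
    return reverse_complement(new_tail) + reverse_complement(primer)


def extend_from_3prime(target: str, cut: int, extension_arm: str, pbs_len: int = 8) -> str:
    """Model RDRP synthesis from the 3' terminus, templated by the extension arm."""
    retained = target[:cut]
    if not 0 < pbs_len <= min(len(retained), len(extension_arm)):
        raise ValueError("pbs_len is out of range for the given sequences")
    pbs = extension_arm[-pbs_len:]
    template = extension_arm[:-pbs_len]
    if reverse_complement(pbs) != retained[-pbs_len:]:
        raise ValueError("primer-binding portion does not pair with the 3' terminus")
    # The new strand is complementary to the template and is appended 5'->3'.
    return retained + reverse_complement(template)


# Hypothetical 24-nt target cut after position 12, with a C->G change in the new tail.
target_rna = "AUGGCUAGCUUACGGAUCCGAUAA"
arm = design_extension_arm(target_rna, cut=12, new_tail="GGGAUCCGAUAA")
edited_rna = extend_from_3prime(target_rna, cut=12, extension_arm=arm)
```

In this sketch the template is placed 5′ of the primer-binding portion so that the first nucleotide added to the 3′ terminus is templated by the nucleotide immediately adjacent to the annealed region.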

In contrast to Cas9, Cas13 enzymes cleave their cognate RNA target outside of the protospacer binding site,⁸ and can do so at a variable position relative to the protospacer. As such, it is possible that the Cas13:rpegRNA complex remains bound to the RNA target following cleavage for sufficient time to enable the fused or separately-provided RDRP to bind to the newly cleaved RNA. Accordingly, targeting a wild-type Cas13:RDRP fusion, or separately provided Cas13 and RDRP components, to a specific site using a rpegRNA could effectively enable programmable replacement of the 3′-portion of the RNA with an edited one, encoded by the rpegRNA.

RNA prime editing requires a 3′ terminus from which the RDRP begins RNA synthesis. A 3′ terminus naturally exists in any RNA molecule and thus RNA prime editing may operate to extend the naturally present 3′ terminus of an RNA molecule. Alternatively, a 3′ terminus may be formed at an internal site in a target RNA molecule by nuclease-induced cleavage of a phosphodiester bond between any two adjacent ribonucleotides in the target RNA molecule, as depicted in FIG. 2.

In another embodiment, as depicted in FIG. 3, the internal 3′ terminus may be formed by a second napRNAbp (e.g., Cas13) complexed with a second guide RNA that targets the napRNAbp to a nearby RNA locus or binding site to install a cut site thereby forming a 3′ terminus. The RNA prime editor may be programmed to bind to a site upstream of the 3′ terminus, wherein the extension arm of the rpegRNA may then bind upstream of the cut site to provide a template sequence (that includes the desired edit) for the synthesis of new RNA beginning at the 3′ terminus.
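As a non-limiting sketch of the geometry described above, the following Python snippet checks whether the site bound by the RNA prime editor lies upstream (5′) of an internal 3′ terminus. The function names, sequences, and cut position are hypothetical, and the snippet assumes exact complementarity between the spacer and the target; it does not model Cas13 targeting rules or cleavage behavior.

```python
# Illustrative assumption: a spacer binds wherever its exact reverse complement occurs.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_binding_site(target: str, spacer: str):
    """Return (start, end) of the first target region complementary to the spacer, or None."""
    site = reverse_complement(spacer)
    start = target.find(site)
    return None if start < 0 else (start, start + len(site))


def editor_site_is_upstream(target: str, editor_spacer: str, cut: int) -> bool:
    """True if the editor's binding site ends at or before the internal 3' terminus."""
    site = find_binding_site(target, editor_spacer)
    return site is not None and site[1] <= cut


# Hypothetical example: the editor spacer pairs with target positions 2-10;
# a second guide is assumed to direct cleavage after position 14.
target_rna = "AUGGCUAGCUUACGGAUCCGAUAA"
editor_spacer = reverse_complement(target_rna[2:10])
upstream_ok = editor_site_is_upstream(target_rna, editor_spacer, cut=14)
```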

Various design considerations for RNA prime editing are contemplated as follows. First, whether the RNA prime editor (RPE) is directed to the nucleus or cytoplasm will likely vary based on what RNA transcript is targeted. Typically, in other RNA editing strategies, targeting of the editor to the nucleus results in improved editing efficacy. Second, the location at which the RPE is targeted on the RNA transcript relative to the location of the installed edit should be considered. Cas13 is reported to cleave its RNA substrate non-specifically near the targeted site, and can only be targeted to accessible regions of the RNA substrate. Designing an RPE such that Cas13-mediated cleavage leads to both RDRP-mediated nucleotide addition and subsequent mutation installation is contemplated. Third, in various embodiments where the new RNA sequence is installed at an internal 3′ terminus, the rpegRNA can be longer than pegRNAs used in prime editing of DNA, because the rpegRNA can encode the remainder of the RNA sequence that is lost due to generation of the internal 3′ terminus. Thus, expression platforms capable of expressing rpegRNAs are contemplated. Fourth, if multiple napRNAbp (e.g., Cas13) versions are targeted to the same RNA, the spacing of their binding sites is also contemplated as a design consideration.
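The third consideration can be illustrated with simple arithmetic. The following Python sketch estimates how many nucleotides the template portion of the extension arm would need to encode, under the assumption (made here for illustration only) that the template restores the entire sequence downstream of the 3′ terminus plus any net insertion or deletion. The function name and the example transcript lengths are hypothetical.

```python
def template_length(transcript_len: int, cut: int, net_change: int = 0) -> int:
    """Nucleotides the template must encode to rebuild the lost 3' portion.

    transcript_len: length of the original transcript.
    cut: index of the 3' terminus (transcript_len for the natural terminus).
    net_change: net nucleotides added (+) or removed (-) by the edit.
    """
    if not 0 < cut <= transcript_len:
        raise ValueError("cut must lie within the transcript")
    lost = transcript_len - cut  # sequence removed by forming the internal terminus
    length = lost + net_change
    if length < 0:
        raise ValueError("net deletion exceeds the lost sequence")
    return length


# Natural 3' terminus: only the newly appended nucleotides are templated.
natural = template_length(1000, cut=1000, net_change=20)
# Internal 3' terminus at position 400: the remaining 600 nt are also templated.
internal = template_length(1000, cut=400)
```

Under these assumptions, the required template grows with the distance between the internal 3′ terminus and the natural 3′ end of the transcript, which is one reason a rpegRNA may be longer than a pegRNA used for DNA prime editing.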

Alternative RNA prime editors that do not require a rpegRNA are also contemplated wherein the template portion of the rpegRNA is separately delivered by another component (e.g., a ribozyme complexed with a template sequence). Such an embodiment is depicted in FIG. 4, which depicts an RNA prime editor that comprises a Cas13 complexed with a traditional guide RNA that targets the Cas13/guide RNA complex to bind to a target site on an RNA molecule. A ribozyme complexed with a template strand could become co-localized with the Cas13 protein through a recruitment system, such as an MS2-tagging system. In the case of the MS2-tagging system, the Cas13 could be complexed with an RNA-protein recruitment domain or protein (such as the MS2 hairpin structure), which would recruit a ribozyme fused to a MS2 bacteriophage coat protein (MCP). In this way, the MS2 hairpin on the Cas13 “recruits” in trans the ribozyme to the target site occupied by the RNA prime editing complex. In the case of trans-splicing ribozymes, this approach could be used to cleave a target RNA to remove its 3′ “exon” (which forms an available 3′ terminus) with subsequent installation of a replacement exon by the action of a RDRP (which can be provided in trans or in cis as a fusion protein with either the Cas13 domain or the recruited ribozyme component).³ In embodiments where the RDRP is provided separately in trans, the napRNAbp or ribozyme components could be modified to include another recruitment system, such as an MS2-tagging system, to enhance the co-localization of the RDRP to the target site in the RNA. The MS2-tagging system is further described in Schechner D M, et al. Nat. Methods., 2015, which is incorporated herein by reference.

REFERENCES

The following references are incorporated herein by reference in their entireties.

1. Fire, A, et al. Nature, 1998.
2. Setten, R L, et al. Nat. Rev. Drug Discovery, 2019.
3. Lee, C H, et al. Prog. Mol. Biol. Transl. Sci., 2018.
4. Cox, D B T, et al. Science, 2017.
5. Abudayyeh, O O, et al. Science, 2019.
6. Kim, D, et al. Annu. Rev. Biochem., 2019.
7. Adamala, K P, et al. Proc. Natl. Acad. Sci. USA, 2016.
8. Abudayyeh, O O, et al. Science, 2016.
9. Schechner, D M, et al. Nat. Methods, 2015.

EQUIVALENTS AND SCOPE

In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.

Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.

Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.
