METHODS AND COMPOSITIONS FOR EDITING NUCLEOTIDE SEQUENCES

SEQUENCE LISTING INCORPORATION BY REFERENCE

Pursuant to 37 CFR § 1.52(e), this Specification includes a Sequence Listing submitted concurrently herewith on a compact disc (2 copies). As required by 37 CFR § 1.52(e)(5), Applicant expressly incorporates by reference all of the information and material located on the compact disc in the file designated “B119570083US04-SUBSEQ-TNG.txt,” which was created on Jul. 19, 2023, and is 379,907,333 bytes in size. By this statement, the Sequence Listing constitutes a part of the instant Specification. The compact disc contains no other files.

BACKGROUND OF THE INVENTION

Pathogenic single nucleotide mutations contribute to approximately 67% of human diseases for which there is a genetic component⁷. Unfortunately, treatment options for patients with these genetic disorders remain extremely limited, despite decades of gene therapy exploration⁸. Perhaps one of the most straightforward solutions to this therapeutic challenge is direct correction of single nucleotide mutations in the patients' genomes, which would address the root cause of disease and would likely provide lasting benefit. Although such a strategy was previously unthinkable, recent improvements in genome editing capabilities brought about by the advent of the CRISPR/Cas system⁹ have now brought this therapeutic approach within reach. By straightforward design of a guide RNA (gRNA) sequence that contains ~20 nucleotides complementary to the target DNA sequence, nearly any conceivable genomic site can be specifically accessed by CRISPR associated (Cas) nucleases^1,2. To date, several monomeric bacterial Cas nuclease systems have been identified and adapted for genome editing applications¹⁰. This natural diversity of Cas nucleases, along with a growing collection of engineered variants^11-14, offers fertile ground for developing new genome editing technologies.

While gene disruption with CRISPR is now a mature technique, precision editing of single base pairs in the human genome remains a major challenge³. Homology directed repair (HDR) has long been used in human cells and other organisms to insert, correct, or exchange DNA sequences at sites of double strand breaks (DSBs) using donor DNA repair templates that encode the desired edits¹⁵. However, traditional HDR has very low efficiency in most human cell types, particularly in non-dividing cells, and competing non-homologous end joining (NHEJ) leads predominantly to insertion-deletion (indel) by-products¹⁶. Other issues relate to the generation of DSBs, which can give rise to large chromosomal rearrangements and deletions at target loci¹⁷, or activate the p53 axis leading to growth arrest and apoptosis^18,19.

Several approaches have been explored to address the drawbacks of HDR. For example, repair of single-stranded DNA breaks (nicks) with oligonucleotide donors has been shown to reduce indel formation, but yields of desired repair products remain low²⁰. Other strategies attempt to bias repair toward HDR over NHEJ using small molecule and biologic reagents^21-23. However, the effectiveness of these methods is cell-type dependent, and perturbation of the normal cell state could lead to undesirable and unforeseeable effects.

Recently, Liu et al. developed base editing as a technology that edits target nucleotides without creating DSBs or relying on HDR^4-6,24-27. Direct modification of DNA bases by Cas-fused deaminases allows for C•G to T•A, or A•T to G•C, base pair conversions in a short target window (~5-7 bases) with high efficiency. As a result, base editors have been rapidly adopted by the scientific community. However, several factors may limit their generality for precision genome editing.

Therefore, the development of programmable editors that are capable of introducing any desired single or multiple nucleotide change, which could install nucleotide insertions or deletions (e.g., at least 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, or more base pair insertions or deletions), and/or which could alter or modify the nucleotide sequence at a target site with high specificity and efficiency would substantially expand the scope and therapeutic potential of genome editing technologies based on CRISPR.

SUMMARY OF THE INVENTION

The present invention discloses new compositions (e.g., new PEgRNA and PE complexes comprising same) and methods for using prime editing (PE) to repair therapeutic targets, e.g., those targets identified in the ClinVar database, using PEgRNA designed using a specialized algorithm that is described herein. Thus, in one aspect, the present application discloses an algorithm for predicting on a large scale the sequences for PEgRNA that may be used to repair therapeutic targets (e.g., those included in the ClinVar database). In addition, the present application discloses predicted sequences for therapeutic PEgRNAs that have been designed, or that can be designed, using the disclosed algorithm and which may be used with prime editing to repair therapeutic targets.

The herein disclosed algorithm and the predicted PEgRNA sequences relate in general to prime editing. Thus, this disclosure also provides a description for the various components and aspects of prime editing, including suitable napDNAbp (e.g., Cas9 nickase) and a polymerase (e.g., a reverse transcriptase), as well as other suitable components (e.g., linkers, NLS) and PE fusion proteins, that may be used with the therapeutic PEgRNA disclosed herein.

As disclosed herein, prime editing is a versatile and precise genome editing method that directly writes new genetic information into a specified DNA site using a nucleic acid programmable DNA binding protein (“napDNAbp”) working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp), wherein the prime editing system is programmed with a prime editing (PE) guide RNA (“PEgRNA”) that both specifies the target site and templates the synthesis of the desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide RNA (e.g., at the 5′ or 3′ end, or at an internal portion of a guide RNA). The replacement strand containing the desired edit (e.g., a single nucleobase substitution) shares the same sequence as the endogenous strand of the target site to be edited (with the exception that it includes the desired edit). Through DNA repair and/or replication machinery, the endogenous strand of the target site is replaced by the newly synthesized replacement strand containing the desired edit. In some cases, prime editing may be thought of as a “search-and-replace” genome editing technology since the prime editors, as described herein, not only search and locate the desired target site to be edited, but at the same time, encode a replacement strand containing a desired edit which is installed in place of the corresponding target site endogenous DNA strand. The prime editors of the present disclosure relate, in part, to the discovery that the mechanism of target-primed reverse transcription (TPRT) or “prime editing” can be leveraged or adapted for conducting precision CRISPR/Cas-based genome editing with high efficiency and genetic flexibility (e.g., as depicted in various embodiments of FIGS. 1A-1F). TPRT is naturally used by mobile DNA elements, such as mammalian non-LTR retrotransposons and bacterial Group II introns^28,29. 
The inventors have herein used Cas protein-reverse transcriptase fusions or related systems to target a specific DNA sequence with a guide RNA, generate a single strand nick at the target site, and use the nicked DNA as a primer for reverse transcription of an engineered reverse transcriptase template that is integrated with the guide RNA. However, while the concept begins with prime editors that use reverse transcriptases as the DNA polymerase component, the prime editors described herein are not limited to reverse transcriptases but may include the use of virtually any DNA polymerase. Indeed, while the application throughout may refer to prime editors with “reverse transcriptases,” it is set forth here that reverse transcriptases are only one type of DNA polymerase that may work with prime editing. Thus, wherever the specification mentions “reverse transcriptases,” the person having ordinary skill in the art should appreciate that any suitable DNA polymerase may be used in place of the reverse transcriptase. Thus, in one aspect, the prime editors may comprise Cas9 (or an equivalent napDNAbp) which is programmed to target a DNA sequence by associating it with a specialized guide RNA (i.e., PEgRNA) containing a spacer sequence that anneals to a complementary protospacer in the target DNA. The specialized guide RNA also contains new genetic information in the form of an extension that encodes a replacement strand of DNA containing a desired genetic alteration which is used to replace a corresponding endogenous DNA strand at the target site. To transfer information from the PEgRNA to the target DNA, the mechanism of prime editing involves nicking the target site in one strand of the DNA to expose a 3′-hydroxyl group. The exposed 3′-hydroxyl group can then be used to prime the DNA polymerization of the edit-encoding extension on PEgRNA directly into the target site.
In various embodiments, the extension—which provides the template for polymerization of the replacement strand containing the edit—can be formed from RNA or DNA. In the case of an RNA extension, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (such as, a reverse transcriptase). In the case of a DNA extension, the polymerase of the prime editor may be a DNA-dependent DNA polymerase.

The newly synthesized strand (i.e., the replacement DNA strand containing the desired edit) that is formed by the herein disclosed prime editors would be homologous to the genomic target sequence (i.e., have the same sequence as) except for the inclusion of a desired nucleotide change (e.g., a single nucleotide change, a deletion, or an insertion, or a combination thereof). The newly synthesized (or replacement) strand of DNA may also be referred to as a single strand DNA flap, which would compete for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. In certain embodiments, the system can be combined with the use of an error-prone reverse transcriptase enzyme (e.g., provided as a fusion protein with the Cas9 domain, or provided in trans to the Cas9 domain). The error-prone reverse transcriptase enzyme can introduce alterations during synthesis of the single strand DNA flap. Thus, in certain embodiments, error-prone reverse transcriptase can be utilized to introduce nucleotide changes to the target DNA. Depending on the error-prone reverse transcriptase that is used with the system, the changes can be random or non-random.

Resolution of the hybridized intermediate (comprising the single strand DNA flap synthesized by the reverse transcriptase hybridized to the endogenous DNA strand) can include removal of the resulting displaced flap of endogenous DNA (e.g., with a 5′ end DNA flap endonuclease, FEN1), ligation of the synthesized single strand DNA flap to the target DNA, and assimilation of the desired nucleotide change as a result of cellular DNA repair and/or replication processes. Because templated DNA synthesis offers single nucleotide precision for the modification of any nucleotide, including insertions and deletions, the scope of this approach is very broad and could foreseeably be used for myriad applications in basic science and therapeutics.

Algorithm and Methods of Designing Therapeutic PEgRNA

In one aspect, the present disclosure relates to a novel algorithm for designing therapeutic PEgRNA, in particular, on a large-scale as opposed to a one-off PEgRNA design exercise.

Accordingly, some aspects relate to a computerized method for determining a sequence of a prime editor guide RNA (PEgRNA). The method includes using at least one computer hardware processor to access data indicative of an input allele, an output allele, and a fusion protein comprising a nucleic acid programmable DNA binding protein and a polymerase (e.g., a reverse transcriptase). The method includes determining the PEgRNA sequence based on the input allele, the output allele, and the fusion protein, wherein the PEgRNA sequence is designed to be associated with the fusion protein to change the input allele to the output allele, including determining for the PEgRNA sequence one or more of the following features: a spacer complementary to a target nucleotide sequence in the input allele (i.e., the spacer, as defined in FIG. 27); a gRNA backbone for interacting with the fusion protein (i.e., the gRNA core as defined in FIG. 27); and an extension (i.e., the extension arm as shown in FIG. 27) comprising one or more of: a DNA synthesis template (as shown in FIG. 27) comprising a desired nucleotide change to change the input allele to the output allele; primer binding site (i.e., the primer binding site as shown in FIG. 27). The PEgRNA may also comprise a 3′ termination signal that terminates transcription from a promoter. In addition, the PEgRNA may include a first modifier at the 5′ end of the extension arm and a second modifier at the 3′ end of the extension arm. Such sequences (shown as “e1” and “e2” in FIG. 27) may include stem-loop sequences, which may increase the stability of the PEgRNA.
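By way of illustration only, the features recited above (spacer, gRNA core, extension arm, and optional 3′ termination signal) can be held in a simple record and concatenated in the 5′ to 3′ direction. The following Python sketch is a minimal, non-limiting example. The class name `PEgRNA`, its field names, and the short placeholder sequences used with it are hypothetical and are not taken from the Sequence Listing; the extension arm is passed in already assembled, so the sketch makes no assumption about the internal order of its elements.

```python
from dataclasses import dataclass

RNA_BASES = set("ACGU")


@dataclass(frozen=True)
class PEgRNA:
    """Hypothetical container for the PEgRNA features named in the text."""
    spacer: str           # complementary to the target sequence in the input allele
    grna_core: str        # backbone that interacts with the fusion protein
    extension: str        # extension arm, supplied already assembled
    terminator: str = ""  # optional 3' transcription termination signal

    def __post_init__(self):
        # Reject anything that is not an upper-case RNA sequence.
        for name in ("spacer", "grna_core", "extension", "terminator"):
            seq = getattr(self, name)
            if not set(seq) <= RNA_BASES:
                raise ValueError(f"{name} contains non-RNA characters: {seq!r}")

    def sequence(self) -> str:
        # 5' -> 3': spacer, gRNA core, extension arm, optional terminator.
        return self.spacer + self.grna_core + self.extension + self.terminator
```

For example, `PEgRNA("GGCC", "GUUU", "ACGU").sequence()` concatenates the three placeholder parts in order, and a DNA character such as `T` in any field raises a `ValueError`.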

In some examples, the method includes determining the spacer and the extension, and determining the spacer is at the 5′ end of the PEgRNA, and the extension is at a 3′ end of the PEgRNA structure.

In some examples, the method includes determining the spacer and the extension, and determining the spacer is at the 5′ end of the PEgRNA, and the extension is 3′ to the spacer.

In some examples, accessing data indicative of the input allele and the output allele comprises accessing a database comprising a set of input alleles and associated output alleles. Accessing the database can include accessing a ClinVar database of the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/clinvar/) comprising a plurality of entries, each entry comprising an input allele from the set of input alleles and an output allele from the set of output alleles (e.g., wild-type or alleles with the desired activity). Determining the PEgRNA sequence can include determining one or more PEgRNA sequences for each input allele and associated output allele in the set.

In some examples, accessing data indicative of the fusion protein includes determining the fusion protein from a plurality of fusion proteins.

In some examples, the fusion protein comprises a Cas9 protein. The fusion protein can include a Cas9-NG protein, Cas9-NGG, saCas9-KKH, or a SpCas9 protein.

In some examples, changing the input allele to the output allele includes a single nucleotide change, an insertion of one or more nucleotides, a deletion of one or more nucleotides, or a combination thereof.
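As a non-limiting illustration, the kind of change separating an input allele from an output allele can be identified by trimming the prefix and suffix that the two alleles share and inspecting what remains. The sketch below assumes that both alleles are given as upper-case DNA strings on the same strand; the function name `classify_change` and the returned labels are hypothetical.

```python
def classify_change(input_allele: str, output_allele: str):
    """Return (kind, position, removed, added) for the smallest differing region."""
    a, b = input_allele, output_allele
    shortest = min(len(a), len(b))

    # Length of the shared prefix.
    p = 0
    while p < shortest and a[p] == b[p]:
        p += 1

    # Length of the shared suffix, not allowed to overlap the prefix.
    s = 0
    while s < shortest - p and a[-1 - s] == b[-1 - s]:
        s += 1

    removed = a[p:len(a) - s]   # bases present only in the input allele
    added = b[p:len(b) - s]     # bases present only in the output allele

    if not removed and not added:
        kind = "none"
    elif not removed:
        kind = "insertion"
    elif not added:
        kind = "deletion"
    elif len(removed) == 1 and len(added) == 1:
        kind = "single nucleotide change"
    else:
        kind = "combination"
    return kind, p, removed, added
```

The position returned is the 0-based index of the first differing base in the input allele. In repetitive sequence, an insertion or deletion can be placed at more than one equivalent position; this sketch reports the rightmost placement because the prefix is trimmed first.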

In some embodiments, the method includes determining the spacer, wherein the spacer includes a nucleotide sequence of between 1 and 40 nucleotides. In some embodiments, the method includes determining the spacer, wherein the spacer includes a nucleotide sequence of between 5 and 35 nucleotides. In some embodiments, the method includes determining the spacer, wherein the spacer includes a nucleotide sequence of between 10 and 30 nucleotides. In some embodiments, the method includes determining the spacer, wherein the spacer includes a nucleotide sequence of between 15 and 25 nucleotides. In some examples, the method includes determining the spacer, wherein the spacer includes a nucleotide sequence of approximately 20 nucleotides. The method can include determining the spacer based on a position of the change in a corresponding protospacer nucleotide sequence. The change can be installed in an editing window that is between about protospacer position −15 to protospacer position +39. The change can be installed in an editing window that is between about protospacer position −10 to protospacer position +34. The change can be installed in an editing window that is between about protospacer position −5 to protospacer position +29. The change can be installed in an editing window that is between about protospacer position −1 to protospacer position +27.
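The four editing windows recited above are nested, each lying inside the one before it. As a non-limiting illustration, the following sketch reports the narrowest recited window that contains a given protospacer position. It assumes that the bounds are inclusive and that the position is already expressed in the same protospacer coordinate system; the names `EDITING_WINDOWS` and `narrowest_window` are hypothetical.

```python
# Windows from the text, ordered from broadest to narrowest.
EDITING_WINDOWS = [(-15, 39), (-10, 34), (-5, 29), (-1, 27)]


def narrowest_window(position: int):
    """Return the narrowest (low, high) window containing position, or None."""
    hit = None
    for low, high in EDITING_WINDOWS:
        if low <= position <= high:
            hit = (low, high)  # later windows are narrower, so keep the last hit
    return hit
```

A position outside the broadest window returns `None`, which a design routine could treat as a reason to reject the corresponding protospacer.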

In some examples, the method can include: determining a set of initial candidate protospacers based on the input allele and the fusion protein, wherein each initial candidate protospacer comprises a PAM of the fusion protein in the input allele; determining that one or more initial candidate protospacers from the set of initial candidate protospacers each comprise an incompatible nick position; and removing the determined one or more initial candidate protospacers from the set to generate a set of remaining candidate protospacers; wherein determining the PEgRNA structure comprises determining a plurality of PEgRNA structures, wherein each of the PEgRNA structures comprises a different spacer determined based on a corresponding protospacer from the set of remaining candidate protospacers.
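The candidate enumeration and filtering steps described above can be sketched as follows. This is a minimal, non-limiting example under several stated assumptions: only the strand given in `allele` is scanned; the PAM is written 5′ to 3′ with `N` as its only degenerate symbol; the protospacer is 20 nucleotides long and the nick falls 3 nucleotides 5′ of the PAM, as is commonly described for SpCas9; and a nick position is treated as “incompatible” when the edit lies 5′ of the nick or more than `max_distance` nucleotides 3′ of it. The text above does not define incompatibility, so that criterion, like all function and parameter names here, is hypothetical.

```python
import re
from typing import List, NamedTuple


class Candidate(NamedTuple):
    protospacer: str  # sequence immediately 5' of the PAM
    pam_start: int    # 0-based index of the first PAM base
    nick_index: int   # the nick falls between nick_index - 1 and nick_index


def pam_to_regex(pam: str) -> str:
    # Only the degenerate symbol "N" is handled in this sketch.
    return "".join("[ACGT]" if base == "N" else base for base in pam)


def candidate_protospacers(allele: str, pam: str = "NGG",
                           spacer_len: int = 20,
                           nick_offset: int = 3) -> List[Candidate]:
    """List every protospacer on the given strand that is followed by the PAM."""
    # A lookahead lets overlapping PAM occurrences be found.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    found = []
    for match in pattern.finditer(allele):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough sequence 5' of the PAM
        found.append(Candidate(allele[pam_start - spacer_len:pam_start],
                               pam_start, pam_start - nick_offset))
    return found


def remove_incompatible(candidates: List[Candidate], edit_index: int,
                        max_distance: int = 30) -> List[Candidate]:
    """Keep candidates whose nick is at, or within max_distance 5' of, the edit."""
    return [c for c in candidates
            if 0 <= edit_index - c.nick_index <= max_distance]
```

Each remaining candidate would then supply the spacer for one PEgRNA structure. A fuller implementation would also scan the reverse complement strand and support PAMs with other degenerate symbols.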

In some examples, the method includes determining the extension and the DNA synthesis template (e.g., RT template sequence), wherein the DNA synthesis template (e.g., RT template sequence) comprises approximately 1 nucleotide to 40 nucleotides. In some examples, the method includes determining the extension and the DNA synthesis template (e.g., RT template sequence), wherein the DNA synthesis template (e.g., RT template sequence) comprises approximately 3 nucleotides to 38 nucleotides. In some examples, the method includes determining the extension and the DNA synthesis template (e.g., RT template sequence), wherein the DNA synthesis template (e.g., RT template sequence) comprises approximately 5 nucleotides to 36 nucleotides. In some examples, the method includes determining the extension and the DNA synthesis template (e.g., RT template sequence), wherein the DNA synthesis template (e.g., RT template sequence) comprises approximately 7 nucleotides to 34 nucleotides.

In some examples, determining the PEgRNA includes determining the spacer based on the input allele and/or the fusion protein, and determining the DNA synthesis template (e.g., RT template sequence) based on the spacer.

In some examples, the DNA synthesis template (e.g., RT template sequence) encodes a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to a nick site, wherein the single-strand DNA flap comprises the desired nucleotide change. The single-strand DNA flap can hybridize to the endogenous DNA sequence adjacent to the nick site, thereby installing the desired nucleotide change. The single-stranded DNA flap can displace the endogenous DNA sequence adjacent to the nick site. Cellular repair of the single-strand DNA flap can result in installation of the desired nucleotide change, thereby forming a desired product.
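As a non-limiting illustration of how a DNA synthesis template relates to the single-strand DNA flap it encodes, the sketch below derives both from an input allele, an output allele, and a nick position. It assumes that both alleles are upper-case DNA strings written 5′ to 3′ on the nicked strand, that the edit lies 3′ of the nick (so the nick index is the same in both alleles), and that the extension is RNA. The function and key names are hypothetical, and the parts are returned separately, without asserting any particular order within the extension arm.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    return dna.replace("T", "U")


def design_extension_parts(input_allele: str, output_allele: str,
                           nick_index: int, pbs_len: int, template_len: int):
    """Return the flap, its RNA template, and the primer binding site."""
    if nick_index < pbs_len:
        raise ValueError("not enough sequence 5' of the nick for the primer binding site")
    if nick_index + template_len > len(output_allele):
        raise ValueError("not enough sequence 3' of the nick for the template")
    if input_allele[:nick_index] != output_allele[:nick_index]:
        raise ValueError("the edit must lie 3' of the nick in this sketch")

    # Nicked-strand DNA ending at the free 3' hydroxyl; it anneals to the PBS.
    primer = input_allele[nick_index - pbs_len:nick_index]
    # DNA to be synthesized from the nick onward; it carries the desired edit.
    flap = output_allele[nick_index:nick_index + template_len]

    return {
        "flap": flap,
        "dna_synthesis_template": to_rna(reverse_complement(flap)),
        "primer_binding_site": to_rna(reverse_complement(primer)),
    }
```

Because the template is the reverse complement of the flap, a polymerase copying it from the nick produces a strand identical to the output allele over that stretch, which is what allows the flap to hybridize with the endogenous strand opposite the nick.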

In some examples, the fusion protein when complexed with the PEgRNA is capable of binding to a target DNA sequence. The target DNA sequence can include a target strand at which the change occurs and a complementary non-target strand.

In some examples, the input allele comprises a pathogenic DNA mutation, and the output allele comprises a corrected DNA sequence.

Some embodiments relate to a system including at least one processor and at least one computer-readable storage medium having encoded thereon instructions which, when executed, cause the at least one processor to perform the computerized methods for determining the PEgRNA structure.

Some embodiments relate to at least one computer-readable storage medium having encoded thereon instructions which, when executed, cause at least one processor to perform the computerized methods for determining the sequence of the PEgRNA.

Some embodiments relate to a method of base editing using the PEgRNA determined according to the computerized methods for determining the PEgRNA.

Therapeutic PEgRNA

In another aspect, the present disclosure provides therapeutic PEgRNA that have been designed using the herein disclosed algorithm, as represented by FIG. 27 and FIG. 28.

For example, the PEgRNA that may be used in the herein disclosure are exemplified in FIG. 27. This figure provides the structure of an embodiment of a PEgRNA contemplated herein and which may be designed in accordance with the methodology defined in Example 2. The PEgRNA comprises three main component elements ordered in the 5′ to 3′ direction, namely: a spacer, a gRNA core, and an extension arm at the 3′ end. The extension arm may further be divided into the following structural elements in the 5′ to 3′ direction, namely: a primer binding site (A), an edit template (B), and a homology arm (C). In addition, the PEgRNA may comprise an optional 3′ end modifier region (e1) and an optional 5′ end modifier region (e2). Still further, the PEgRNA may comprise a transcriptional termination signal at the 3′ end of the PEgRNA (not depicted). These structural elements are further defined herein. The depiction of the structure of the PEgRNA is not meant to be limiting and embraces variations in the arrangement of the elements. For example, the optional sequence modifiers (e1) and (e2) could be positioned within or between any of the other regions shown, and not limited to being located at the 3′ and 5′ ends. The PEgRNA shown in FIG. 27 can be designed by the herein disclosed algorithm.

In another example, FIG. 28 provides the structure of another embodiment of a PEgRNA contemplated herein and which may be designed in accordance with the methodology defined in Example 2. The PEgRNA comprises three main component elements ordered in the 5′ to 3′ direction, namely: a spacer, a gRNA core, and an extension arm at the 3′ end. The extension arm may further be divided into the following structural elements in the 5′ to 3′ direction, namely: a primer binding site (A), an edit template (B), and a homology arm (C). In addition, the PEgRNA may comprise an optional 3′ end modifier region (e1) and an optional 5′ end modifier region (e2). Still further, the PEgRNA may comprise a transcriptional termination signal on the 3′ end of the PEgRNA (not depicted). These structural elements are further defined herein. The depiction of the structure of the PEgRNA is not meant to be limiting and embraces variations in the arrangement of the elements. For example, the optional sequence modifiers (e1) and (e2) could be positioned within or between any of the other regions shown, and not limited to being located at the 3′ and 5′ ends. The PEgRNA shown in FIG. 28 can be designed by the herein disclosed algorithm.

In various embodiments, the disclosure provides therapeutic PEgRNA of SEQ ID NOs: 1-135514 and 813085-880462 designed using the herein disclosed algorithm against ClinVar database entries.

In various other embodiments, exemplary PEgRNA designed against the ClinVar database using the herein disclosed algorithm are included in the Sequence Listing, which forms a part of this specification. The Sequence Listing includes complete PEgRNA sequences of SEQ ID NOs: 1-135514 and 813085-880462. Each of these complete PEgRNA is comprised of a spacer (SEQ ID NOs: 135515-271028 and 880463-947840) and an extension arm (SEQ ID NOs: 271029-406542 and 947841-1015218). In addition, each PEgRNA comprises a gRNA core, for example, as defined by SEQ ID NOs: 1361579-1361580. The extension arms of SEQ ID NOs: 271029-406542 and 947841-1015218 are further each comprised of a primer binding site (SEQ ID NOs: 406543-542056 and 1015219-1082596), an edit template (SEQ ID NOs: 542057-677570 and 1082597-1149974), and a homology arm (SEQ ID NOs: 677571-813084 and 1149975-1217352). The PEgRNA optionally may comprise a 5′ end modifier region and/or a 3′ end modifier region. The PEgRNA may also comprise a reverse transcription termination signal (e.g., SEQ ID NOs: 1361560-1361566) at the 3′ end of the PEgRNA. The application embraces the design and use of all of these sequences.
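The SEQ ID NO ranges recited above form two groups in which every component range has the same length as the range of complete PEgRNA sequences (135,514 entries in the first group and 67,378 in the second). As a non-limiting illustration, and assuming that the k-th entry of each component range corresponds to the k-th complete PEgRNA of its group (an ordering the text above does not state), the component SEQ ID NOs for a given complete PEgRNA could be computed as sketched below. The names `PEGRNA_BLOCKS`, `COMPONENTS`, and `component_seq_ids` are hypothetical.

```python
# (first, last) SEQ ID NO of the complete PEgRNA sequences in each group.
PEGRNA_BLOCKS = [(1, 135514), (813085, 880462)]

# Component ranges follow the complete sequences in this order in the text.
COMPONENTS = ["spacer", "extension_arm", "primer_binding_site",
              "edit_template", "homology_arm"]


def component_seq_ids(pegrna_id: int) -> dict:
    """Map a complete-PEgRNA SEQ ID NO to its component SEQ ID NOs.

    Assumes index-aligned ordering across ranges (not stated in the source).
    """
    for first, last in PEGRNA_BLOCKS:
        if first <= pegrna_id <= last:
            size = last - first + 1
            # Each component range is shifted by a whole number of block sizes.
            return {name: pegrna_id + (k + 1) * size
                    for k, name in enumerate(COMPONENTS)}
    raise ValueError(f"SEQ ID NO {pegrna_id} is not a complete PEgRNA sequence")
```

Under that assumption, SEQ ID NO: 1 would pair with the first spacer entry (SEQ ID NO: 135515), and SEQ ID NO: 880462 with the last homology arm entry (SEQ ID NO: 1217352).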

In various embodiments, the prime editor guide RNA comprises (a) a guide RNA and (b) an RNA extension at the 5′ or the 3′ end of the guide RNA, or at an intramolecular location in the guide RNA, examples of which are depicted in FIGS. 3A-C. The RNA extension can comprise (i) a DNA synthesis template comprising a desired nucleotide change, (ii) a reverse transcription primer binding site, and (iii) optionally, a linker sequence. In various embodiments, the DNA synthesis template encodes a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to the nick site, wherein the single-stranded DNA flap comprises the desired nucleotide change.

In various embodiments, the RNA extension arm is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, or at least 25 nucleotides in length.

In certain embodiments, the prime editor guide RNA comprises the nucleotide sequence of SEQ ID NOs: 1361548-1361581, or a nucleotide sequence having at least 85%, or at least 90%, or at least 95%, or at least 98%, or at least 99% sequence identity with any one of SEQ ID NOs: 1361548-1361581.

In some embodiments, the prime editor guide RNA (PEgRNA) comprises a variant of a nucleotide sequence of SEQ ID NOs: 1361548-1361581, comprising at least one mutation as compared to the nucleotide sequence of SEQ ID NOs: 1361548-1361581. In some embodiments, the variant comprises more than 1 (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more) mutation as compared to the nucleotide sequence of SEQ ID NOs: 1361548-1361581.

In another aspect, the present disclosure provides a prime editor guide RNA comprising a guide RNA and at least one RNA extension (i.e., extension arm, per FIG. 27). In some embodiments, the RNA extension is positioned at the 3′ end of the guide RNA. In other embodiments, the RNA extension is positioned at the 5′ end of the guide RNA. In still other embodiments, the RNA extension is positioned at an intramolecular position within the guide RNA; preferably, the intramolecular positioning of the extended portion does not disrupt the functioning of the protospacer.

In various embodiments, the prime editor guide RNA (PEgRNA) is capable of binding to a napDNAbp and directing the napDNAbp to a target DNA sequence. The target DNA sequence can comprise a target strand and a complementary non-target strand, wherein the guide RNA hybridizes to the target strand to form an RNA-DNA hybrid and an R-loop.

In various embodiments of the prime editor guide RNA, the at least one RNA extension comprises a DNA synthesis template. In various other embodiments, the RNA extension further comprises a reverse transcription primer binding site. In still other embodiments, the RNA extension comprises a linker or spacer that joins the RNA extension to the guide RNA.

In various embodiments, the RNA extension can be at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 150 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.

In other embodiments, the DNA synthesis template (i.e., the edit template, per FIG. 27) is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.

In still other embodiments, the reverse transcription primer binding site sequence (i.e., the primer binding site, per FIG. 27) is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.

In other embodiments, the optional linker or spacer is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.

The designed PEgRNA disclosed herein may be complexed with a prime editor fusion protein.

In one aspect, the specification provides a prime editor fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) and a reverse transcriptase. In various embodiments, the fusion protein is capable of carrying out genome editing by target-primed reverse transcription in the presence of a prime editor guide RNA (PEgRNA).

In some embodiments, the napDNAbp is selected from the group consisting of: Cas9, CasX, CasY, Cpf1, C2c1, C2c2, C2c3, and Argonaute, and optionally has nickase activity.

In other embodiments, the fusion protein when complexed with a prime editor guide RNA as described herein is capable of binding to a target DNA sequence (e.g., genomic DNA).

In still other embodiments, the target DNA sequence comprises a target strand and a complementary non-target strand.

In other embodiments, the binding of the fusion protein complexed to the prime editor guide RNA forms an R-loop. The R-loop can comprise (i) an RNA-DNA hybrid comprising the prime editor guide RNA and the target strand, and (ii) the complementary non-target strand.

In still other embodiments, the complementary non-target strand is nicked to form a reverse transcriptase priming sequence having a free 3′ end.

In still other embodiments, the single-strand DNA flap hybridizes to the endogenous DNA sequence adjacent to the nick site, thereby installing the desired nucleotide change. In still other embodiments, the single-strand DNA flap displaces the endogenous DNA sequence that is adjacent to the nick site and has a free 5′ end. In some embodiments, the displaced endogenous DNA having the 5′ end is excised by the cell.

In various embodiments, the cellular repair of the single-strand DNA flap results in installation of the desired nucleotide change, thereby forming a desired product.

In various other embodiments, the desired nucleotide change is installed in an editing window that is between about −4 to +10 of the PAM sequence.

In still other embodiments, the desired nucleotide change is installed in an editing window that is between about −5 to +5 nucleotides of the nick site, or between about −10 to +10 of the nick site, or between about −20 to +20 of the nick site, or between about −30 to +30 of the nick site, or between about −40 to +40 of the nick site, or between about −50 to +50 of the nick site, or between about −60 to +60 of the nick site, or between about −70 to +70 of the nick site, or between about −80 to +80 of the nick site, or between about −90 to +90 of the nick site, or between about −100 to +100 of the nick site, or between about −200 to +200 of the nick site.

In various embodiments, the napDNAbp comprises an amino acid sequence of SEQ ID NO: 1361421. In various other embodiments, the napDNAbp comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 1361421-1361484, and 1361593-1361596.

In other embodiments, the reverse transcriptase of the disclosed fusion proteins and/or compositions may comprise any one of the amino acid sequences of SEQ ID NOs: 1361485-1361514, and 1361597-1361598. In still other embodiments, the reverse transcriptase may comprise an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 1361485-1361514, and 1361597-1361598. These sequences may be naturally occurring reverse transcriptase sequences, e.g., from a retrovirus or a retrotransposon, or the sequences may be non-naturally occurring or engineered.
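
By way of non-limiting illustration, the percent identity thresholds recited above could be checked as in the following sketch. The specification does not prescribe a particular identity calculation at this point. The function names, the requirement that the two sequences be supplied already aligned to equal length with "-" as the gap character, and the choice to count identical residues over all aligned columns are assumptions made only for this example.

```python
from typing import List

# Illustrative sketch only; the alignment convention and names are assumptions.
THRESHOLDS = (80, 85, 90, 95, 98, 99)  # percent identity values recited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> List[int]:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

Under these assumptions, two aligned ten-residue sequences differing at one position are 90% identical and satisfy the 80%, 85%, and 90% thresholds but not the 95%, 98%, or 99% thresholds.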

In various other embodiments, the fusion proteins herein disclosed may comprise various structural configurations. For example, the fusion proteins may comprise the structure NH₂-[napDNAbp]-[reverse transcriptase]-COOH; or NH₂-[reverse transcriptase]-[napDNAbp]-COOH, wherein each instance of “]-[” indicates the presence of an optional linker sequence.

In various embodiments, the linker sequence comprises an amino acid sequence of SEQ ID NOs: 1361520-1361530, 1361585, and 1361603, or an amino acid sequence that is at least 80%, 85%, 90%, 95%, or 99% identical to any one of the linker amino acid sequences of SEQ ID NOs: 1361520-1361530, 1361585, and 1361603.

In various embodiments, the desired nucleotide change that is incorporated into the target DNA can be a single nucleotide change (e.g., a transition or transversion), an insertion of one or more nucleotides, a deletion of one or more nucleotides, or a combination thereof.
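
The categories of nucleotide change listed above can be illustrated with the following sketch. It relies on the standard definitions that a transition exchanges a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and that a transversion exchanges a purine for a pyrimidine or vice versa. The function names and the representation of a change as a pair of reference and altered allele strings are hypothetical and used only for this example.

```python
# Illustrative sketch only; names and the allele-pair representation are assumptions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_point_change(ref: str, alt: str) -> str:
    """Classify a single nucleotide change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different DNA bases")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def classify_change(ref_allele: str, alt_allele: str) -> str:
    """Classify a change; an empty string stands for an absent allele."""
    if not ref_allele and not alt_allele:
        raise ValueError("at least one allele must be non-empty")
    if len(ref_allele) == 1 and len(alt_allele) == 1:
        return classify_point_change(ref_allele, alt_allele)
    if not ref_allele:
        return "insertion"
    if not alt_allele:
        return "deletion"
    return "combination"
```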

In certain cases, the insertion is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, or at least 500 nucleotides in length.

In certain other cases, the deletion is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, or at least 500 nucleotides in length.

In various embodiments of the prime editor guide RNAs, the DNA synthesis template (i.e., the edit template, per FIG. 27) may encode a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to a nick site, wherein the single-strand DNA flap comprises a desired nucleotide change. The single-stranded DNA flap may displace an endogenous single-strand DNA at the nick site. The displaced endogenous single-strand DNA at the nick site can have a 5′ end and form an endogenous flap, which can be excised by the cell. In various embodiments, excision of the 5′ end endogenous flap can help drive product formation since removing the 5′ end endogenous flap encourages hybridization of the single-strand 3′ DNA flap to the corresponding complementary DNA strand, and the incorporation or assimilation of the desired nucleotide change carried by the single-strand 3′ DNA flap into the target DNA.

In various embodiments of the prime editor guide RNAs, the cellular repair of the single-strand DNA flap results in installation of the desired nucleotide change, thereby forming a desired product.

In yet another aspect of the invention, the specification provides for complexes comprising a fusion protein described herein and any prime editor guide RNA (PEgRNA) described above.

In still other aspects of the invention, the specification provides a complex comprising a napDNAbp (e.g., Cas9) and a prime editor guide RNA. The napDNAbp can be a Cas9 nickase (e.g., spCas9), or can be an amino acid sequence of SEQ ID NO: 1361421, or an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to the amino acid sequence of any one of SEQ ID NOs: 1361421-1361484, and 1361593-1361596.

In various embodiments involving a complex, the prime editor guide RNA is capable of directing the napDNAbp to a target DNA sequence. In various embodiments, a reverse transcriptase may be provided in trans, i.e., provided from a different source than the complex itself. For example, a reverse transcriptase could be provided to the same cell having the complex by introducing a separate vector separately encoding the reverse transcriptase.

In yet another aspect, the specification provides pharmaceutical compositions (e.g., fusion proteins described herein, PEgRNA of SEQ ID NOs: 1-135514). In some embodiments, the pharmaceutical compositions comprise one or more of a napDNAbp, a fusion protein, a reverse transcriptase, and a prime editor guide RNA. In some embodiments, the pharmaceutical compositions comprise the fusion protein described herein and a pharmaceutically acceptable excipient. In other embodiments, the pharmaceutical compositions comprise any extended guide RNA described herein and a pharmaceutically acceptable excipient. In still other embodiments, the pharmaceutical compositions comprise any extended guide RNA described herein in combination with any fusion protein described herein and a pharmaceutically acceptable excipient. In yet other embodiments, the pharmaceutical compositions comprise any polynucleotide sequence encoding one or more of a napDNAbp, a fusion protein, a reverse transcriptase, and a prime editor guide RNA. In still other embodiments, the various components disclosed herein may be separated into one or more pharmaceutical compositions. For example, a first pharmaceutical composition may comprise a fusion protein or a napDNAbp, a second pharmaceutical composition may comprise a reverse transcriptase, and a third pharmaceutical composition may comprise a prime editor guide RNA.

In still a further aspect, the present disclosure provides kits. In one embodiment, the kit comprises one or more polynucleotides encoding one or more components, including a fusion protein, a napDNAbp, a reverse transcriptase, and a prime editor guide RNA (e.g., any of SEQ ID NOs: 1-135514 or 813085-880462). The kits may also comprise vectors, cells, and isolated preparations of polypeptides, including any fusion protein, napDNAbp, or reverse transcriptase disclosed herein.

In yet another aspect, the present disclosure provides for methods of using the disclosed PEgRNA compositions of matter.

In one embodiment, the methods relate to a method for installing a desired nucleotide change in a double-stranded DNA using the PEgRNA disclosed herein. The method first comprises contacting the double-stranded DNA sequence with a complex comprising a fusion protein and a prime editor guide RNA as described herein, wherein the fusion protein comprises a napDNAbp and a reverse transcriptase, and wherein the prime editor guide RNA comprises a DNA synthesis template comprising the desired nucleotide change. The napDNAbp nicks the double-stranded DNA sequence on the non-target strand, thereby generating a free single-strand DNA having a 3′ end. Subsequent to nicking, the 3′ end of the free single-strand DNA hybridizes to the DNA synthesis template, thereby priming the reverse transcriptase domain. Reverse transcriptase then facilitates DNA polymerization from the 3′ end, thereby generating a single-strand DNA flap comprising the desired nucleotide change. The single-strand DNA flap then replaces the endogenous DNA strand adjacent to the cut site, thereby installing the desired nucleotide change in the double-stranded DNA sequence.
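
The sequence of steps in the method above (nicking, hybridization, reverse transcription, and flap replacement) may be easier to follow with a string-level sketch. The following is a simplified model under stated assumptions, not a description of any disclosed implementation: the non-target strand is represented as a 5′ to 3′ DNA string, the primer binding site and DNA synthesis template are represented as 5′ to 3′ RNA strings, the edit is a substitution so that the flap replaces an equal length of endogenous DNA, and cellular repair of the opposite strand is not modeled. All function and parameter names are hypothetical.

```python
# Illustrative sketch only; a string model of the steps, with hypothetical names.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5_to_3: str) -> str:
    """Return the DNA strand (5'->3') complementary to an RNA given 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5_to_3))


def simulate_prime_edit(nontarget_strand: str, nick_index: int,
                        pbs_rna: str, template_rna: str) -> str:
    """Return the edited non-target strand (5'->3').

    The nick lies between nontarget_strand[nick_index - 1] and
    nontarget_strand[nick_index].
    """
    # Nicking: the upstream piece ends in a free 3' end, and the
    # downstream piece is the endogenous strand with a 5' end.
    upstream = nontarget_strand[:nick_index]
    downstream = nontarget_strand[nick_index:]

    # Hybridization: the free 3' end must be complementary to the
    # primer binding site for priming to occur.
    primer = reverse_transcribe(pbs_rna)
    if not primer or not upstream.endswith(primer):
        raise ValueError("3' end does not hybridize to the primer binding site")

    # Reverse transcription: the 3' end is extended by copying the
    # DNA synthesis template, producing the single-strand DNA flap.
    flap = reverse_transcribe(template_rna)
    if len(flap) > len(downstream):
        raise ValueError("flap is longer than the downstream sequence")

    # Replacement: the flap takes the place of the endogenous DNA
    # adjacent to the cut site (equal length, substitution case only).
    return upstream + flap + downstream[len(flap):]
```

For example, with the arbitrary strand `GATTACAGATTACAGATTACTGGACGT`, a nick index of 17, a primer binding site of `AUCUGUAA`, and a template of `GUCCAGCA`, the sketch returns a strand in which the A at the second position downstream of the nick is replaced by G.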

In other embodiments, the disclosure provides for a method for introducing one or more changes in the nucleotide sequence of a DNA molecule at a target locus, comprising contacting the DNA molecule with a nucleic acid programmable DNA binding protein (napDNAbp) and a guide RNA which targets the napDNAbp to the target locus, wherein the guide RNA comprises a reverse transcriptase (RT) template sequence comprising at least one desired nucleotide change. The napDNAbp exposes a 3′ end in a DNA strand at the target locus which hybridizes to the DNA synthesis template (e.g., RT template sequence) to prime reverse transcription. Next, a single strand DNA flap comprising the at least one desired nucleotide change based on the DNA synthesis template (e.g., RT template sequence) is synthesized or polymerized by reverse transcriptase. Lastly, the at least one desired nucleotide change is incorporated into the corresponding endogenous DNA, thereby introducing one or more changes in the nucleotide sequence of the DNA molecule at the target locus.

In still other embodiments, the disclosure provides a method for introducing one or more changes in the nucleotide sequence of a DNA molecule at a target locus by target-primed reverse transcription, the method comprising: contacting the DNA molecule at the target locus with (i) a fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) and a reverse transcriptase and (ii) a guide RNA comprising an RT template comprising a desired nucleotide change (e.g., any of SEQ ID NOs: 1-135514 or 813085-880462); which contact facilitates target-primed reverse transcription of the RT template to generate a single strand DNA comprising the desired nucleotide change and incorporates the desired nucleotide change into the DNA molecule at the target locus through a DNA repair and/or replication process.

In some embodiments, the step of replacing the endogenous DNA strand comprises: (i) hybridizing the single-strand DNA flap to the endogenous DNA strand adjacent to the cut site to create a sequence mismatch; (ii) excising the endogenous DNA strand; and (iii) repairing the mismatch to form the desired product comprising the desired nucleotide change in both strands of DNA.

The methods disclosed herein may involve fusion proteins having a napDNAbp that is a nuclease dead Cas9 (dCas9), a Cas9 nickase (nCas9), or a nuclease active Cas9. In other embodiments, a napDNAbp and reverse transcriptase are not encoded as a single fusion protein, but rather can be provided in separate constructs. Thus, in some embodiments, the reverse transcriptase can be provided in trans relative to the napDNAbp (rather than by way of a fusion protein).

In various embodiments involving methods, the napDNAbp may comprise an amino acid sequence of SEQ ID NO: 1361421 (Cas9). The napDNAbp may also comprise an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 1361421.

In various embodiments involving methods, the reverse transcriptase may comprise any one of the amino acid sequences of SEQ ID NO: 1361485-1361514, and 1361597-1361598. The reverse transcriptase may also comprise an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 1361485-1361514, and 1361597-1361598.

The methods may involve the use of an extended RNA having a nucleotide sequence of SEQ ID NOs: 271029-406542 and 947841-1015218, or a nucleotide sequence having at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99% sequence identity thereto.

The methods may comprise the use of prime editor guide RNAs that comprise an RNA extension at the 3′ end, wherein the RNA extension comprises the DNA synthesis template. For example, the PEgRNA shown in FIG. 3B (with the following components as described from 5′ to 3′: spacer; gRNA core; reverse transcription template; primer binding site) has an extension arm comprising, from 5′ to 3′, a reverse transcription template and a primer binding site.

The methods may comprise the use of prime editor guide RNAs that comprise an RNA extension at the 5′ end, wherein the RNA extension comprises the DNA synthesis template. For example, the PEgRNA shown in FIG. 3A (with the following components as described from 5′ to 3′: reverse transcription template; primer binding site; linker; spacer; gRNA core) has an extension arm comprising, from 5′ to 3′, a reverse transcription template, a primer binding site, and a 5-20 nucleotide long linker.
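
The 5′ to 3′ component orders described for the 3′-extended PEgRNA (FIG. 3B) and the 5′-extended PEgRNA (FIG. 3A) can be summarized in the following sketch. It only concatenates components in the recited order; the class name, field names, and function names are hypothetical, and the sequences passed in are expected to be supplied by the user rather than taken from the Sequence Listing.

```python
from dataclasses import dataclass

# Illustrative sketch only; class, field, and function names are assumptions.


@dataclass(frozen=True)
class PegRNAParts:
    spacer: str
    grna_core: str
    rt_template: str
    primer_binding_site: str
    linker: str = ""  # used only by the 5'-extended arrangement


def assemble_3prime_extended(parts: PegRNAParts) -> str:
    """5'->3' order per FIG. 3B: spacer; gRNA core; RT template; PBS."""
    return (parts.spacer + parts.grna_core
            + parts.rt_template + parts.primer_binding_site)


def assemble_5prime_extended(parts: PegRNAParts) -> str:
    """5'->3' order per FIG. 3A: RT template; PBS; linker; spacer; gRNA core."""
    # The text recites a 5-20 nucleotide long linker for this arrangement.
    if not 5 <= len(parts.linker) <= 20:
        raise ValueError("linker is expected to be 5-20 nucleotides long")
    return (parts.rt_template + parts.primer_binding_site + parts.linker
            + parts.spacer + parts.grna_core)
```

In both arrangements the primer binding site sits 3′ of the reverse transcription template within the extension arm, which is why the two functions differ only in where the arm is placed relative to the spacer and gRNA core.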

The methods may comprise the use of prime editor guide RNAs that comprise an RNA extension at an intramolecular location in the guide RNA, wherein the RNA extension comprises the DNA synthesis template.

The methods may comprise the use of prime editor guide RNAs having one or more RNA extensions that are at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, or at least 500 nucleotides in length.

It should be appreciated that the foregoing concepts, and additional concepts discussed below, may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further, other advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.

FIG. 1A.1 provides a schematic of an exemplary process for introducing a single nucleotide change, insertion, and/or deletion into a DNA molecule (e.g., a genome) using a fusion protein comprising a reverse transcriptase fused to a napDNAbp (e.g., Cas9) protein in complex with a prime editor guide RNA. In this embodiment, the guide RNA is extended at the 3′ end to include a DNA synthesis template. The schematic shows how a reverse transcriptase (RT) fused to a Cas9 nickase, in a complex with a guide RNA (gRNA), binds the DNA target site and nicks the PAM-containing DNA strand adjacent to the target nucleotide. The RT enzyme uses the nicked DNA as a primer for DNA synthesis from the gRNA, which is used as a template for the synthesis of a new DNA strand that encodes the desired edit. The editing process shown may be referred to as target-primed reverse transcription editing (prime editing). FIG. 1A.2 provides the same representation as in FIG. 1A.1, except that the prime editor complex is represented more generally as [napDNAbp]-[P]:PEgRNA or [P]-[napDNAbp]:PEgRNA, wherein “P” refers to any polymerase (e.g., a reverse transcriptase), “napDNAbp” refers to a nucleic acid programmable DNA binding protein (e.g., SpCas9), “PEgRNA” refers to a prime editing guide RNA, and “]-[” refers to an optional linker. As described elsewhere, e.g., FIGS. 3A-3G, the PEgRNA comprises a 5′ extension arm comprising a primer binding site and a DNA synthesis template. Although not shown, it is contemplated that the extension arm of the PEgRNA (i.e., which comprises a primer binding site and a DNA synthesis template) can be DNA or RNA. The particular polymerase contemplated in this configuration will depend upon the nature of the DNA synthesis template. For instance, if the DNA synthesis template is RNA, then the polymerase can be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). 
If the DNA synthesis template is DNA, then the polymerase can be a DNA-dependent DNA polymerase. In various embodiments, the PEgRNA can be engineered or synthesized to incorporate a DNA-based DNA synthesis template.

FIG. 1B.1 provides a schematic of an exemplary process for introducing a single nucleotide change, insertion, and/or deletion into a DNA molecule (e.g., a genome) using a fusion protein comprising a reverse transcriptase fused to a napDNAbp (e.g., Cas9) in complex with a prime editor guide RNA. In this embodiment, the guide RNA is extended at the 5′ end to include a DNA synthesis template. The schematic shows how a reverse transcriptase (RT) fused to a Cas9 nickase, in a complex with a guide RNA (gRNA), binds the DNA target site and nicks the PAM-containing DNA strand adjacent to the target nucleotide. The canonical PAM sequence is 5′-NGG-3′, but different PAM sequences can be associated with different Cas9 proteins or equivalent proteins from different organisms. In addition, any given Cas9 nuclease, e.g., SpCas9, may be modified to alter the PAM specificity of the protein to recognize an alternative PAM sequence. The RT enzyme uses the nicked DNA as a primer for DNA synthesis from the gRNA, which is used as a template for the synthesis of a new DNA strand that encodes the desired edit. The editing process shown may be referred to as target-primed reverse transcription editing (TPRT editing or prime editing). FIG. 1B.2 provides the same representation as in FIG. 1B.1, except that the prime editor complex is represented more generally as [napDNAbp]-[P]:PEgRNA or [P]-[napDNAbp]:PEgRNA, wherein “P” refers to any polymerase (e.g., a reverse transcriptase), “napDNAbp” refers to a nucleic acid programmable DNA binding protein (e.g., SpCas9), “PEgRNA” refers to a prime editing guide RNA, and “]-[” refers to an optional linker. As described elsewhere, e.g., FIGS. 3A-3G, the PEgRNA comprises a 3′ extension arm comprising a primer binding site and a DNA synthesis template. Although not shown, it is contemplated that the extension arm of the PEgRNA (i.e., which comprises a primer binding site and a DNA synthesis template) can be DNA or RNA. 
The particular polymerase contemplated in this configuration will depend upon the nature of the DNA synthesis template. For instance, if the DNA synthesis template is RNA, then the polymerase can be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). If the DNA synthesis template is DNA, then the polymerase can be a DNA-dependent DNA polymerase.

FIG. 1C is a schematic depicting an exemplary process of how the synthesized single strand of DNA (which comprises the desired nucleotide change) becomes resolved such that the desired nucleotide change, insertion, and/or deletion is incorporated into the DNA. As shown, following synthesis of the edited strand (or “mutagenic strand”), equilibration with the endogenous strand, flap cleavage of the endogenous strand, and ligation leads to incorporation of the DNA edit after resolution of the mismatched DNA duplex through the action of endogenous DNA repair and/or replication processes.

FIG. 1D is a schematic showing that “opposite strand nicking” can be incorporated into the resolution method of FIG. 1C to help drive the formation of the desired product versus the reversion product. In opposite strand nicking, a second napDNAbp/gRNA complex (e.g., Cas9/gRNA complex) is used to introduce a second nick on the opposite strand from the initial nicked strand. This induces the endogenous cellular DNA repair and/or replication processes to preferentially replace the unedited strand (i.e., the strand containing the second nick site).

FIG. 1E provides another schematic of an exemplary process for introducing at least one nucleotide change (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more), insertion, and/or deletion into a DNA molecule (e.g., a genome) of a target locus using a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editor guide RNA (e.g., prime editing). The prime editor guide RNA comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA. In step (a), the napDNAbp/gRNA complex contacts the DNA molecule, and the gRNA guides the napDNAbp to bind to the target locus. In step (b), a nick in one of the strands of DNA (the R-loop strand, or the PAM-containing strand, or the non-target DNA strand, or the protospacer strand) of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. In certain embodiments, the nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence. In step (c), the 3′ end DNA strand interacts with the extended portion of the guide RNA in order to prime reverse transcription. In some embodiments, the 3′ end DNA strand hybridizes to a specific primer binding site on the extended portion of the guide RNA. In step (d), a reverse transcriptase is introduced which synthesizes a single strand of DNA from the 3′ end of the primed site towards the 3′ end of the guide RNA. This forms a single-strand DNA flap comprising the desired nucleotide change (e.g., single or multiple base change(s), insertion(s), deletion(s), or a combination thereof). In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to the resolution of the single strand DNA flap such that the desired nucleotide change becomes incorporated into the target locus. 
This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the complementary sequence on the other strand. The process can also be driven towards product formation with second strand nicking, as exemplified in FIG. 1D. This process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions.

FIG. 1F is a schematic depicting the types of genetic changes that are possible with the target-primed reverse transcription editing (prime editing) processes described herein. The types of nucleotide changes achievable by prime editing include deletions (including short and long deletions), single and/or multiple nucleotide changes, and insertions (including short and long insertions).

FIG. 1G is a schematic depicting an example of temporal second strand nicking exemplified by a prime editor complex. Temporal second strand nicking is a variant of second strand nicking in order to facilitate the formation of the desired edited product. The term “temporal” refers to the fact that the second-strand nick to the unedited strand occurs only after the desired edit is installed in the edited strand. This avoids concurrent nicks on both strands that could lead to double-stranded DNA breaks.

FIG. 1H depicts a variation of prime editing contemplated herein that replaces the napDNAbp (e.g., SpCas9 nickase) with any programmable nuclease domain, such as zinc finger nucleases (ZFN) or transcription activator-like effector nucleases (TALEN). As such, it is contemplated that suitable nucleases do not necessarily need to be “programmed” by a nucleic acid targeting molecule (such as a guide RNA), but rather, may be programmed by defining the specificity of a DNA-binding domain, such as and in particular, a nuclease. Just as in prime editing with napDNAbp moieties, it is preferable that such alternative programmable nucleases be modified such that only one strand of a target DNA is cut. In other words, the programmable nucleases should function as nickases, preferably. Once a programmable nuclease is selected (e.g., a ZFN or a TALEN), then additional functionalities may be engineered into the system to allow it to operate in accordance with a prime editing-like mechanism. For example, the programmable nucleases may be modified by coupling (e.g., via a chemical linker) an RNA or DNA extension arm thereto, wherein the extension arm comprises a primer binding site (PBS) and a DNA synthesis template. The programmable nuclease may also be coupled (e.g., via a chemical or amino acid linker) to a polymerase, the nature of which will depend upon whether the extension arm is DNA or RNA. In the case of an RNA extension arm, the polymerase can be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). In the case of a DNA extension arm, the polymerase can be a DNA-dependent DNA polymerase (e.g., a prokaryotic polymerase, including Pol I, Pol II, or Pol III, or a eukaryotic polymerase, including Pol α, Pol β, Pol γ, Pol δ, Pol ε, or Pol ζ). 
The system may also include other functionalities added as fusions to the programmable nucleases, or added in trans to facilitate the reaction as a whole (e.g., (a) a helicase to unwind the DNA at the cut site to make the cut strand with the 3′ end available as a primer, (b) a FEN1 to help remove the endogenous strand on the cut strand to drive the reaction towards replacement of the endogenous strand with the synthesized strand, or (c) a nCas9:gRNA complex to create a second site nick on the opposite strand, which may help drive the integration of the synthesized repair through favored cellular repair of the non-edited strand). In an analogous manner to prime editing with a napDNAbp, such a complex with an otherwise programmable nuclease could be used to synthesize and then install a newly synthesized replacement strand of DNA carrying an edit of interest permanently into a target site of DNA.

FIG. 1I depicts, in one embodiment, the anatomical features of a target DNA that may be edited by prime editing. The target DNA comprises a “non-target strand” and a “target strand.” The target strand is the strand that becomes annealed to the spacer of a PEgRNA of a prime editor complex that recognizes the PAM site (in this case, NGG, which is recognized by the canonical SpCas9-based prime editors). The target strand may also be referred to as the “non-PAM strand” or the “non-edit strand.” By contrast, the non-target strand (i.e., the strand containing the protospacer and the PAM sequence of NGG) may be referred to as the “PAM-strand” or the “edit strand.” In various embodiments, the nick site of the PE complex will be in the protospacer on the PAM-strand (e.g., with the SpCas9-based PE). The location of the nick will be characteristic of the particular Cas9 that forms the PE. For example, with an SpCas9-based PE, the nick site is in the phosphodiester bond between bases three (“−3” position relative to the position 1 of the PAM sequence) and four (“−4” position relative to position 1 of the PAM sequence). The nick site in the protospacer forms a free 3′ hydroxyl group, which, as seen in the following figures, complexes with the primer binding site of the extension arm of the PEgRNA and provides the substrate to begin polymerization of a single strand of DNA coded for by the DNA synthesis template of the extension arm of the PEgRNA. This polymerization reaction is catalyzed by the polymerase (e.g., reverse transcriptase) of the PE fusion protein in the 5′ to 3′ direction. Polymerization terminates before reaching the gRNA core (e.g., by inclusion of a polymerization termination signal, or secondary structure, which functions to terminate the polymerization activity of PE), producing a single strand DNA flap that is extended from the original 3′ hydroxyl group of the nicked PAM strand. 
The DNA synthesis template codes for a single strand DNA that is homologous to the endogenous 5′-ended single strand of DNA that immediately follows the nick site on the PAM strand and incorporates the desired nucleotide change (e.g., single base substitution, insertion, deletion, inversion). The position of the desired edit can be in any position following downstream of the nick site on the PAM strand, which can include position +1, +2, +3, +4 (the start of the PAM site), +5 (position 2 of the PAM site), +6 (position 3 of the PAM site), +7, +8, +9, +10, +11, +12, +13, +14, +15, +16, +17, +18, +19, +20, +21, +22, +23, +24, +25, +26, +27, +28, +29, +30, +31, +32, +33, +34, +35, +36, +37, +38, +39, +40, +41, +42, +43, +44, +45, +46, +47, +48, +49, +50, +51, +52, +53, +54, +55, +56, +57, +58, +59, +60, +61, +62, +63, +64, +65, +66, +67, +68, +69, +70, +71, +72, +73, +74, +75, +76, +77, +78, +79, +80, +81, +82, +83, +84, +85, +86, +87, +88, +89, +90, +91, +92, +93, +94, +95, +96, +97, +98, +99, +100, +101, +102, +103, +104, +105, +106, +107, +108, +109, +110, +111, +112, +113, +114, +115, +116, +117, +118, +119, +120, +121, +122, +123, +124, +125, +126, +127, +128, +129, +130, +131, +132, +133, +134, +135, +136, +137, +138, +139, +140, +141, +142, +143, +144, +145, +146, +147, +148, +149, or +150, or more (relative to the downstream position of the nick site). Once the 3′ end single stranded DNA (containing the edit of interest) replaces the endogenous 5′ end single stranded DNA, the DNA repair and replication processes will result in permanent installation of the edit site on the PAM strand, and then correction of the mismatch on the non-PAM strand that will exist at the edit site. In this way, the edit will extend to both strands of DNA on the target DNA site. It will be appreciated that reference to “edited strand” and “non-edited” strand only intends to delineate the strands of DNA involved in the PE mechanism. 
The “edited strand” is the strand that first becomes edited by replacement of the 5′ ended single strand DNA immediately downstream of the nick site with the synthesized 3′ ended single stranded DNA containing the desired edit. The “non-edited” strand is the strand paired with the edited strand, but which itself also becomes edited through repair and/or replication to be complementary to the edited strand, and in particular, the edit of interest.
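
The position numbering described for FIG. 1I can be made concrete with the following sketch, which locates the nick for an SpCas9-style NGG PAM and maps a “+n” position downstream of the nick to a string index. It assumes the PAM strand is given 5′ to 3′, that the first NGG preceded by a full protospacer is the PAM of interest, and that the protospacer is 20 nucleotides long. The function names and the default protospacer length are assumptions made only for this example.

```python
# Illustrative sketch only; names and the 20-nt default are assumptions.


def find_spcas9_nick_index(pam_strand: str, protospacer_len: int = 20) -> int:
    """Return i such that the nick lies between pam_strand[i-1] and pam_strand[i]."""
    strand = pam_strand.upper()
    for pam_start in range(protospacer_len, len(strand) - 2):
        if strand[pam_start + 1:pam_start + 3] == "GG":
            # The nick falls between the -4 and -3 positions, counted back
            # from position 1 of the PAM, i.e. three nucleotides upstream.
            return pam_start - 3
    raise ValueError("no NGG PAM with a full protospacer upstream")


def downstream_position_to_index(nick_index: int, position: int) -> int:
    """Map position +n (n >= 1) downstream of the nick to a 0-based index."""
    if position < 1:
        raise ValueError("downstream positions start at +1")
    return nick_index + position - 1
```

With this numbering, position +4 maps to the first nucleotide of the PAM, and positions +5 and +6 map to the second and third PAM nucleotides, consistent with the description above.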

FIG. 1J depicts the mechanism of prime editing showing the anatomical features of the target DNA, prime editor complex, and the interaction between the PEgRNA and the target DNA. First, a prime editor comprising a fusion protein having a polymerase (e.g., reverse transcriptase) and a napDNAbp (e.g., SpCas9 nickase, e.g., a SpCas9 having a deactivating mutation in an HNH nuclease domain (e.g., H840A) or a deactivating mutation in a RuvC nuclease domain (D10A)) is complexed with a PEgRNA and DNA having a target DNA to be edited. The PEgRNA comprises a spacer, gRNA core (aka gRNA scaffold or gRNA backbone) (which binds to the napDNAbp), and an extension arm. The extension arm can be at the 3′ end, the 5′ end, or somewhere within the PEgRNA molecule. As shown, the extension arm is at the 3′ end of the PEgRNA. The extension arm comprises in the 3′ to 5′ direction a primer binding site and a DNA synthesis template (comprising both an edit of interest and regions of homology (i.e., homology arms) that are homologous with the 5′ ended single stranded DNA immediately following the nick site on the PAM strand). As shown, once the nick is introduced thereby producing a free 3′ hydroxyl group immediately upstream of the nick site, the region immediately upstream of the nick site on the PAM strand anneals to a complementary sequence at the 3′ end of the extension arm referred to as the “primer binding site,” creating a short double-stranded region with an available 3′ hydroxyl end, which forms a substrate for the polymerase of the prime editor complex. The polymerase (e.g., reverse transcriptase) then polymerizes a strand of DNA from the 3′ hydroxyl end to the end of the extension arm. The sequence of the single stranded DNA is coded for by the DNA synthesis template, which is the portion of the extension arm (i.e., excluding the primer binding site) that is “read” by the polymerase to synthesize new DNA.
This polymerization effectively extends the sequence of the original 3′ hydroxyl end of the initial nick site. The DNA synthesis template encodes a single strand of DNA that comprises not only the desired edit, but also regions that are homologous to the endogenous single strand of DNA immediately downstream of the nick site on the PAM strand. Next, the encoded 3′ ended single strand of DNA (i.e., the 3′ single strand DNA flap) displaces the corresponding homologous endogenous 5′-ended single strand of DNA immediately downstream of the nick site on the PAM strand, forming a DNA intermediate having a 5′-ended single strand DNA flap, which is removed by the cell (e.g., by a flap endonuclease). The 3′-ended single strand DNA flap, which anneals to the complement of the endogenous 5′-ended single strand DNA flap, is ligated to the endogenous strand after the 5′ DNA flap is removed. The desired edit in the 3′ ended single strand DNA flap, now annealed and ligated, forms a mismatch with the complement strand, which undergoes DNA repair and/or a round of replication, thereby permanently installing the desired edit on both strands.
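The priming and polymerization steps described above may be sketched, purely as a non-limiting illustration, by treating the strands as strings. The sketch assumes a 3′ extension arm written 5′ to 3′ as the DNA synthesis template followed by the primer binding site, and an RNA extension arm read by a reverse transcriptase. All function names and example sequences are hypothetical.

```python
# RNA template base -> DNA base synthesized opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the DNA strand (5' to 3') templated by an RNA written 5' to 3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


def split_extension_arm(extension_arm: str, pbs_length: int) -> tuple[str, str]:
    """Split a 3' extension arm into (DNA synthesis template, primer binding site)."""
    if not 0 < pbs_length < len(extension_arm):
        raise ValueError("pbs_length must leave a non-empty template and PBS")
    return extension_arm[:-pbs_length], extension_arm[-pbs_length:]


def primer_anneals(upstream_of_nick: str, extension_arm: str, pbs_length: int) -> bool:
    """Check that the 3' end of the nicked PAM strand is complementary to the PBS."""
    _, pbs = split_extension_arm(extension_arm, pbs_length)
    return upstream_of_nick.endswith(reverse_transcribe(pbs))


def synthesize_flap(extension_arm: str, pbs_length: int) -> str:
    """Return the 3' single strand DNA flap encoded by the DNA synthesis template."""
    template, _ = split_extension_arm(extension_arm, pbs_length)
    return reverse_transcribe(template)
```

For instance, with a hypothetical PAM strand segment `GACTGAGCACG` upstream of the nick and the extension arm `CCAUCUCGUGC` (five-nucleotide primer binding site), the synthesized flap is `AGATGG`, which would replace an endogenous `TGATGG` and thereby install a T to A change at position +1.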

FIG. 2 shows three Cas complexes that will be tested and their PAM, gRNA, and DNA cleavage features. The figure shows designs for complexes involving SpCas9, SaCas9, and LbCas12a.

FIGS. 3A-3C show designs for engineered 5′ extended gRNA (FIG. 3A), 3′ extended gRNA (FIG. 3B), and an intramolecular extension (FIG. 3C), each of which may be used for prime editing. The embodiments depict exemplary arrangements of the DNA synthesis template, the primer binding site, and an optional linker sequence in the extended portions of the 3′, 5′, and intramolecular extended gRNAs, as well as the arrangement of the protospacer and core regions. The disclosed TPRT process is not limited to these configurations of prime editor guide RNAs.

FIGS. 4A-4E demonstrate in vitro TPRT assays. FIG. 4A is a schematic of gRNA-templated extension of a fluorescently labeled DNA substrate by an RT enzyme, with the reverse transcriptase products analyzed by a polyacrylamide gel electrophoresis (PAGE) assay. FIG. 4B shows TPRT with pre-nicked substrates, dCas9, and 5′-extended gRNAs of differing edit template length. FIG. 4C shows the RT reaction with pre-nicked DNA substrates in the absence of Cas9. FIG. 4D shows TPRT on full dsDNA substrates with Cas9 (H840A) and 5′-extended gRNAs. FIG. 4E shows a 3′-extended gRNA template with pre-nicked and full dsDNA substrates. All reactions are with M-MLV RT.

FIG. 5 shows in vitro validations using 5′-extended gRNAs with varying length edit templates. Fluorescently labeled (Cy5) DNA targets were used as substrates and were pre-nicked in this set of experiments. The Cas9 used in these experiments is catalytically dead Cas9 (dCas9), and the RT used is Superscript III, a commercially available RT derived from Moloney-Murine Leukemia Virus (M-MLV). dCas9:gRNA complexes were formed from purified components. Then, the fluorescently labeled DNA substrate was added along with dNTPs and the RT enzyme. After 1 hour of incubation at 37° C., the reaction products were analyzed by denaturing urea-polyacrylamide gel electrophoresis (PAGE). The gel image shows extension of the original DNA strand to lengths that are consistent with the length of the reverse transcription template.

FIG. 6 shows in vitro validations using 5′-extended gRNAs with varying length edit templates, closely paralleling the validations shown in FIG. 5. However, the DNA substrates are not pre-nicked in this set of experiments. The Cas9 used in these experiments is a Cas9 nickase (SpyCas9 H840A mutant), and the RT used is Superscript III, a commercially available RT derived from the Moloney-Murine Leukemia Virus (M-MLV). The reaction products were analyzed by denaturing urea-polyacrylamide gel electrophoresis (PAGE). As shown in the gel, the nickase efficiently cleaves the DNA strand when the gRNA is used (gRNA 0, lane 3).

FIG. 7 demonstrates that 3′ extensions support DNA synthesis and do not significantly affect Cas9 nickase activity. Pre-nicked substrates (black arrow) are near-quantitatively converted to RT products when either dCas9 or Cas9 nickase is used (lanes 4 and 5). Greater than 50% conversion to the RT product (red arrow) is observed with full substrates (lane 3). Cas9 nickase (SpyCas9 H840A mutant), catalytically dead Cas9 (dCas9), and Superscript III, a commercially available RT derived from the Moloney-Murine Leukemia Virus (M-MLV) were used.

FIG. 8 demonstrates dual color experiments that were used to determine if the RT reaction preferentially occurs with the gRNA in cis (bound in the same complex). Two separate experiments were conducted for 5′-extended and 3′-extended gRNAs. Products were analyzed by PAGE. The product ratio was calculated as (Cy3cis/Cy3trans)/(Cy5trans/Cy5cis).
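The product ratio stated above can be written out as a short calculation. The following is a minimal sketch only; the argument names are hypothetical labels for quantified band intensities and are not taken from the figure.

```python
def product_ratio(cy3_cis: float, cy3_trans: float,
                  cy5_trans: float, cy5_cis: float) -> float:
    """Compute (Cy3cis/Cy3trans)/(Cy5trans/Cy5cis) from four band intensities."""
    for value in (cy3_trans, cy5_trans, cy5_cis):
        if value == 0:
            raise ValueError("denominator intensities must be non-zero")
    return (cy3_cis / cy3_trans) / (cy5_trans / cy5_cis)
```

Under this formula, a ratio greater than 1 would be consistent with a preference for the RT reaction occurring in cis, assuming the two fluorophores are quantified on comparable scales.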

FIGS. 9A-9D demonstrate a flap model substrate. FIG. 9A shows a dual-FP reporter for flap-directed mutagenesis. FIG. 9B shows stop codon repair in HEK cells. FIG. 9C shows sequenced yeast clones after flap repair. FIG. 9D shows testing of different flap features in human cells.

FIG. 10 demonstrates prime editing on plasmid substrates. A dual-fluorescent reporter plasmid was constructed for yeast (S. cerevisiae) expression. Expression of this construct in yeast produces only GFP. The in vitro TRT reaction introduces a point mutation, and transforms the parent plasmid or an in vitro Cas9(H840A) nicked plasmid into yeast. The colonies are visualized by fluorescence imaging. Yeast dual-FP plasmid transformants are shown. Transforming the parent plasmid or an in vitro Cas9 (H840A) nicked plasmid results in only green GFP expressing colonies. The TRT reaction with 5′-extended or 3′-extended gRNAs produces a mix of green and yellow colonies. The latter express both GFP and mCherry. More yellow colonies are observed with the 3′-extended gRNA. A positive control that contains no stop codon is shown as well.

FIG. 11 shows prime editing on plasmid substrates similar to the experiment in FIG. 10, but instead of installing a point mutation in the stop codon, prime editing installs a single nucleotide insertion (left) or deletion (right) that repairs a frameshift mutation and allows for synthesis of downstream mCherry. Both experiments used 3′ extended gRNAs.

FIG. 12 shows editing products of prime editing on plasmid substrates, characterized by Sanger sequencing. Individual colonies from the TRT transformations were selected and analyzed by Sanger sequencing. Precise edits were observed by sequencing select colonies. Green colonies contained plasmids with the original DNA sequence, while yellow colonies contained the precise mutation designed by the prime editing gRNA. No other point mutations or indels were observed.

FIG. 13 shows the potential scope of the new prime editing technology compared to deaminase-mediated base editor technologies.

FIG. 14 shows a schematic of editing in human cells.

FIG. 15 demonstrates the extension of the primer binding site in gRNA.

FIG. 16 shows truncated gRNAs for adjacent targeting.

FIGS. 17A-17C are graphs displaying the % T to A conversion at the target nucleotide after transfection of components in human embryonic kidney (HEK) cells. FIG. 17A presents results using an N-terminal fusion of wild type MLV reverse transcriptase to Cas9 (H840A) nickase (32-amino acid linker). FIG. 17B is similar to FIG. 17A, but for C-terminal fusion of the RT enzyme. FIG. 17C is similar to FIG. 17A, but the linker between the MLV RT and Cas9 is 60 amino acids long instead of 32 amino acids.

FIG. 18 shows high purity T to A editing at the HEK3 site by high-throughput amplicon sequencing. The output of sequencing analysis displays the most abundant genotypes of edited cells.

FIG. 19 shows editing efficiency at the target nucleotide (left bar of each pair of bars) alongside indel rates (right bar of each pair of bars). WT refers to the wild type MLV RT enzyme. The mutant enzymes (M1 through M4) contain the mutations listed to the right. Editing rates were quantified by high throughput sequencing of genomic DNA amplicons.

FIG. 20 shows editing efficiency of the target nucleotide when a single strand nick is introduced in the complementary DNA strand in proximity to the target nucleotide. Nicking at various distances from the target nucleotide was tested (orange triangles). Editing efficiency at the target base pair (blue bars) is shown alongside the indel formation rate (orange bars). The “none” example does not contain a complementary strand nicking guide RNA. Editing rates were quantified by high throughput sequencing of genomic DNA amplicons.

FIG. 21 demonstrates processed high throughput sequencing data showing the desired T to A transversion mutation and general absence of other major genome editing byproducts.

FIG. 22 provides a schematic of an exemplary process for conducting targeted mutagenesis with an error-prone reverse transcriptase on a target locus using a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editor guide RNA. This process may be referred to as an embodiment of prime editing for targeted mutagenesis. The prime editor guide RNA comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA. In step (a), the napDNAbp/gRNA complex contacts the DNA molecule and the gRNA guides the napDNAbp to bind to the target locus to be mutagenized. In step (b), a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. In certain embodiments, the nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence. In step (c), the 3′ end DNA strand interacts with the extended portion of the guide RNA in order to prime reverse transcription. In some embodiments, the 3′ ended DNA strand hybridizes to a specific primer binding site on the extended portion of the guide RNA. In step (d), an error-prone reverse transcriptase is introduced which synthesizes a mutagenized single strand of DNA from the 3′ end of the primed site towards the 3′ end of the guide RNA. Exemplary mutations are indicated with an asterisk “*”. This forms a single-strand DNA flap comprising the desired mutagenized region. In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to the resolution of the single strand DNA flap (comprising the mutagenized region) such that the desired mutagenized region becomes incorporated into the target locus. 
This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the complementary sequence on the other strand. The process can also be driven towards product formation with second strand nicking, as exemplified in FIG. 1D. Following endogenous DNA repair and/or replication processes, the mutagenized region becomes incorporated into both strands of DNA of the DNA locus.

FIG. 23 is a schematic of gRNA design for contracting trinucleotide repeat sequences and trinucleotide repeat contraction with TPRT genome editing. Trinucleotide repeat expansion is associated with a number of human diseases, including Huntington's Disease, Fragile X syndrome, and Friedreich's ataxia. The most common trinucleotide repeat contains CAG triplets, though GAA triplets (Friedreich's ataxia) and CGG triplets (Fragile X syndrome) also occur. Inheriting a predisposition to expansion, or acquiring an already expanded parental allele, increases the likelihood of acquiring the disease. Pathogenic expansions of trinucleotide repeats could hypothetically be corrected using prime editing. A region upstream of the repeat region can be nicked by an RNA-guided nuclease, then used to prime synthesis of a new DNA strand that contains a healthy number of repeats (which depends on the particular gene and disease). After the repeat sequence, a short stretch of homology is added that matches the identity of the sequence adjacent to the other end of the repeat (red strand). Invasion of the newly synthesized strand, and subsequent replacement of the endogenous DNA with the newly synthesized flap, leads to a contracted repeat allele.

FIG. 24 is a schematic showing precise 10-nucleotide deletion with prime editing. A guide RNA targeting the HEK3 locus was designed with a reverse transcription template that encodes a 10-nucleotide deletion after the nick site. Editing efficiency in transfected HEK cells was assessed using amplicon sequencing.

FIG. 25 is a schematic showing gRNA design for peptide tagging genes at endogenous genomic loci and peptide tagging with TPRT genome editing. The FlAsH and ReAsH tagging systems comprise two parts: (1) a fluorophore-biarsenical probe, and (2) a genetically encoded peptide containing a tetracysteine motif, exemplified by the sequence FLNCCPGCCMEP (SEQ ID NO: 1361586). When expressed within cells, proteins containing the tetracysteine motif can be fluorescently labeled with fluorophore-arsenic probes (see ref: J. Am. Chem. Soc., 2002, 124 (21), pp 6063-6076. DOI: 10.1021/ja017687n). The “sortagging” system employs bacterial sortase enzymes that covalently conjugate labeled peptide probes to proteins containing suitable peptide substrates (see ref: Nat. Chem. Biol. 2007 November; 3(11):707-8. DOI: 10.1038/nchembio.2007.31). The FLAG-tag (DYKDDDDK (SEQ ID NO: 1361587)), V5-tag (GKPIPNPLLGLDST (SEQ ID NO: 1361588)), GCN4-tag (EELLSKNYHLENEVARLKK (SEQ ID NO: 1361589)), HA-tag (YPYDVPDYA (SEQ ID NO: 1361590)), and Myc-tag (EQKLISEEDL (SEQ ID NO: 1361591)) are commonly employed as epitope tags for immunoassays. The pi-clamp encodes a peptide sequence (FCPF (SEQ ID NO: 1361592)) that can be labeled with pentafluoro-aromatic substrates (ref: Nat. Chem. 2016 February; 8(2):120-8. doi: 10.1038/nchem.2413).

FIG. 26 shows precise installation of a His6-tag and a FLAG-tag into genomic DNA. A guide RNA targeting the HEK3 locus was designed with a reverse transcription template that encodes either an 18-nt His-tag insertion or a 24-nt FLAG-tag insertion. Editing efficiency in transfected HEK cells was assessed using amplicon sequencing. Note that the full 24-nt sequence of the FLAG-tag is outside of the viewing frame (sequencing confirmed full and precise insertion).

FIG. 27 provides the structure of an embodiment of a PEgRNA contemplated herein and which may be designed in accordance with the methodology defined in Example 2. The PEgRNA comprises three main component elements ordered in the 5′ to 3′ direction, namely: a spacer, a gRNA core, and an extension arm at the 3′ end. The extension arm may further be divided into the following structural elements in the 5′ to 3′ direction, namely: a primer binding site (A), an edit template (B), and a homology arm (C). In addition, the PEgRNA may comprise an optional 3′ end modifier region (e1) and an optional 5′ end modifier region (e2). Still further, the PEgRNA may comprise a transcriptional termination signal at the 3′ end of the PEgRNA (not depicted). These structural elements are further defined herein. The depiction of the structure of the PEgRNA is not meant to be limiting and embraces variations in the arrangement of the elements. For example, the optional sequence modifiers (e1) and (e2) could be positioned within or between any of the other regions shown, and are not limited to being located at the 3′ and 5′ ends. The PEgRNA could comprise, in certain embodiments, secondary RNA structures, such as, but not limited to, hairpins, stem/loops, toe loops, RNA-binding protein recruitment domains (e.g., the MS2 aptamer which recruits and binds to the MS2cp protein). For instance, such secondary structures could be positioned within the spacer, the gRNA core, or the extension arm, and in particular, within the e1 and/or e2 modifier regions. In addition to secondary RNA structures, the PEgRNAs could comprise (e.g., within the e1 and/or e2 modifier regions) a chemical linker or a poly(N) linker or tail, where “N” can be any nucleobase. In some embodiments (e.g., as shown in FIG. 72(c)), the chemical linker may function to prevent reverse transcription of the sgRNA scaffold or core.
In addition, in certain embodiments (e.g., see FIG. 72(c)), the extension arm (3) could be comprised of RNA or DNA, and/or could include one or more nucleobase analogs (e.g., which might add functionality, such as temperature resilience). Still further, the orientation of the extension arm (3) can be in the natural 5′-to-3′ direction, or synthesized in the opposite orientation in the 3′-to-5′ direction (relative to the orientation of the PEgRNA molecule overall). It is also noted that one of ordinary skill in the art will be able to select an appropriate DNA polymerase, depending on the nature of the nucleic acid materials of the extension arm (i.e., DNA or RNA), for use in prime editing that may be implemented either as a fusion with the napDNAbp or as provided in trans as a separate moiety to synthesize the desired template-encoded 3′ single-strand DNA flap that includes the desired edit. For example, if the extension arm is RNA, then the DNA polymerase could be a reverse transcriptase or any other suitable RNA-dependent DNA polymerase. However, if the extension arm is DNA, then the DNA polymerase could be a DNA-dependent DNA polymerase. In various embodiments, provision of the DNA polymerase could be in trans, e.g., through the use of an RNA-protein recruitment domain (e.g., an MS2 hairpin installed on the PEgRNA (e.g., in the e1 or e2 region, or elsewhere) and an MS2cp protein fused to the DNA polymerase, thereby co-localizing the DNA polymerase to the PEgRNA). It is also noted that the primer binding site does not generally form a part of the template that is used by the DNA polymerase (e.g., reverse transcriptase) to encode the resulting 3′ single-strand DNA flap that includes the desired edit. Thus, the designation of the “DNA synthesis template” refers to the region or portion of the extension arm (3) that is used as a template by the DNA polymerase to encode the desired 3′ single-strand DNA flap containing the edit.
In some embodiments, the DNA synthesis template includes the “edit template” and the “homology arm”. In other embodiments, the DNA synthesis template may also include the e2 region or a portion thereof. For instance, if the e2 region comprises a secondary structure that causes termination of DNA polymerase activity, then it is possible that DNA polymerase function will be terminated before any portion of the e2 region is actually encoded into DNA. It is also possible that some or even all of the e2 region will be encoded into DNA. How much of e2 is actually used as a template will depend on its constitution and whether that constitution interrupts DNA polymerase function.

FIG. 28 provides the structure of another embodiment of a PEgRNA contemplated herein and which may be designed in accordance with the methodology defined in Example 2. The PEgRNA comprises three main component elements ordered in the 5′ to 3′ direction, namely: a spacer, a gRNA core, and an extension arm at the 3′ end. The extension arm may further be divided into the following structural elements in the 5′ to 3′ direction, namely: a primer binding site (A), an edit template (B), and a homology arm (C). In addition, the PEgRNA may comprise an optional 3′ end modifier region (e1) and an optional 5′ end modifier region (e2). Still further, the PEgRNA may comprise a transcriptional termination signal on the 3′ end of the PEgRNA (not depicted). These structural elements are further defined herein. The depiction of the structure of the PEgRNA is not meant to be limiting and embraces variations in the arrangement of the elements. For example, the optional sequence modifiers (e1) and (e2) could be positioned within or between any of the other regions shown, and are not limited to being located at the 3′ and 5′ ends. The PEgRNA could comprise, in certain embodiments, secondary RNA structures, such as, but not limited to, hairpins, stem/loops, toe loops, RNA-binding protein recruitment domains (e.g., the MS2 aptamer which recruits and binds to the MS2cp protein). These secondary structures could be positioned anywhere in the PEgRNA molecule. For instance, such secondary structures could be positioned within the spacer, the gRNA core, or the extension arm, and in particular, within the e1 and/or e2 modifier regions. In addition to secondary RNA structures, the PEgRNAs could comprise (e.g., within the e1 and/or e2 modifier regions) a chemical linker or a poly(N) linker or tail, where “N” can be any nucleobase. In some embodiments (e.g., as shown in FIG. 27), the chemical linker may function to prevent reverse transcription of the sgRNA scaffold or core.
In addition, in certain embodiments (e.g., see FIG. 28), the extension arm (3) could be comprised of RNA or DNA, and/or could include one or more nucleobase analogs (e.g., which might add functionality, such as temperature resilience). Still further, the orientation of the extension arm (3) can be in the natural 5′-to-3′ direction, or synthesized in the opposite orientation in the 3′-to-5′ direction (relative to the orientation of the PEgRNA molecule overall). It is also noted that one of ordinary skill in the art will be able to select an appropriate DNA polymerase, depending on the nature of the nucleic acid materials of the extension arm (i.e., DNA or RNA), for use in prime editing that may be implemented either as a fusion with the napDNAbp or as provided in trans as a separate moiety to synthesize the desired template-encoded 3′ single-strand DNA flap that includes the desired edit. For example, if the extension arm is RNA, then the DNA polymerase could be a reverse transcriptase or any other suitable RNA-dependent DNA polymerase. However, if the extension arm is DNA, then the DNA polymerase could be a DNA-dependent DNA polymerase. In various embodiments, provision of the DNA polymerase could be in trans, e.g., through the use of an RNA-protein recruitment domain (e.g., an MS2 hairpin installed on the PEgRNA (e.g., in the e1 or e2 region, or elsewhere) and an MS2cp protein fused to the DNA polymerase, thereby co-localizing the DNA polymerase to the PEgRNA). It is also noted that the primer binding site does not generally form a part of the template that is used by the DNA polymerase (e.g., reverse transcriptase) to encode the resulting 3′ single-strand DNA flap that includes the desired edit. Thus, the designation of the “DNA synthesis template” refers to the region or portion of the extension arm (3) that is used as a template by the DNA polymerase to encode the desired 3′ single-strand DNA flap containing the edit.
In some embodiments, the DNA synthesis template includes the “edit template” and the “homology arm”. In other embodiments, the DNA synthesis template may also include the e2 region or a portion thereof. For instance, if the e2 region comprises a secondary structure that causes termination of DNA polymerase activity, then it is possible that DNA polymerase function will be terminated before any portion of the e2 region is actually encoded into DNA. It is also possible that some or even all of the e2 region will be encoded into DNA. How much of e2 is actually used as a template will depend on its constitution and whether that constitution interrupts DNA polymerase function.

FIG. 29 is a schematic depicting the interaction of a typical PEgRNA with a target site of a double stranded DNA and the concomitant production of a 3′ single stranded DNA flap containing the genetic change of interest. The double strand DNA is shown with the top strand (i.e., the target strand) in the 3′ to 5′ orientation and the lower strand (i.e., the PAM strand or non-target strand) in the 5′ to 3′ direction. The top strand comprises the complement of the “protospacer” and the complement of the PAM sequence and is referred to as the “target strand” because it is the strand that is targeted by and anneals to the spacer of the PEgRNA. The complementary lower strand is referred to as the “non-target strand” or the “PAM strand” or the “protospacer strand” since it contains the PAM sequence (e.g., NGG) and the protospacer. Although not shown, the PEgRNA depicted would be complexed with a Cas9 or equivalent domain of a prime editor fusion protein. As shown in the schematic, the spacer of the PEgRNA anneals to the region on the target strand that is complementary to the protospacer; the protospacer is located just downstream of the PAM sequence and is approximately 20 nucleotides in length. This interaction forms a DNA/RNA hybrid between the spacer RNA and the complement of the protospacer DNA, and induces the formation of an R loop in the region opposite the protospacer. As taught elsewhere herein, the Cas9 protein (not shown) then induces a nick in the non-target strand, as shown. This then leads to the formation of the 3′ ssDNA flap region immediately upstream of the nick site which, in accordance with *z*, interacts with the 3′ end of the PEgRNA at the primer binding site. The 3′ end of the ssDNA flap (i.e., the reverse transcriptase primer sequence) anneals to the primer binding site (A) on the PEgRNA, thereby priming reverse transcription.
Next, reverse transcriptase (e.g., provided in trans or provided in cis as a fusion protein, attached to the Cas9 construct) polymerizes a single strand of DNA which is coded for by the DNA synthesis template (including the edit template (B) and homology arm (C)). The polymerization continues towards the 5′ end of the extension arm. The polymerized strand of ssDNA forms a ssDNA 3′ end flap which, as described elsewhere (e.g., as shown in FIG. 1E), invades the endogenous DNA, displacing the corresponding endogenous strand (which is removed as a 5′ DNA flap of endogenous DNA), and installing the desired nucleotide edit (single nucleotide base pair change, deletions, insertions (including whole genes)) through naturally occurring DNA repair/replication rounds.

FIG. 30 assists in understanding the disclosure of the PEgRNA of the Sequence Listing. The figure shows two exemplary PEgRNA sequences (SEQ ID NO: 135529 (top) and SEQ ID NO: 135880 (bottom)) and how the various disclosed sequence subsets map thereon. For SEQ ID NO: 135529, the corresponding sequences are spacer (SEQ ID NO: 271043), extension arm (SEQ ID NO: 406557), primer binding site (SEQ ID NO: 542071), edit template (SEQ ID NO: 677585), and the homology arm (SEQ ID NO: 813099). For SEQ ID NO: 135880, the corresponding sequences are spacer (SEQ ID NO: 880463), extension arm (SEQ ID NO: 947841), primer binding site (SEQ ID NO: 1015219), edit template (SEQ ID NO: 1082597), and the homology arm (SEQ ID NO: 1149975).

FIG. 31 is a flow chart showing an exemplary high level computerized method 3100 for determining an extended gRNA structure, according to some embodiments of the disclosure. At step 3102, a computing device (e.g., the computing device 3400 described in conjunction with FIG. 34) accesses data indicative of an input allele, an output allele, and a fusion protein that includes a nucleic acid programmable DNA binding protein and a reverse transcriptase. While step 3102 describes accessing all three of the input allele, output allele, and fusion protein in one step, this is for illustrative purposes, and it should be appreciated that such data can be accessed using one or more steps without departing from the spirit of the techniques described herein. Accessing data can include receiving data, storing data, accessing a database, and/or the like.

FIG. 32 is a flow chart showing an exemplary computerized method 3200 for determining the components of an extended gRNA structure, including the components of the extension, according to some embodiments. It should be appreciated that FIG. 32 is intended to be illustrative, and therefore, techniques used to determine the extended gRNA can include more, or fewer, steps than those shown in FIG. 32.

FIG. 33 is a flow chart showing an exemplary computerized method 3300 for determining sets of extended gRNA structures for each mutation entry in a database, according to some embodiments. At step 3302, the computing device accesses a database (e.g., a ClinVar database, which is accessible at www.ncbi.nlm.nih.gov/clinvar/) that includes a set of mutation entries that each include an input allele representing the mutation and an output allele representing the corrected wild-type sequence.
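As a non-limiting illustration of the per-entry processing outlined for FIG. 33, the following Python sketch iterates over mutation entries and proposes one extended gRNA for each. It is a simplified sketch under stated assumptions and not the disclosed method: it handles only single-base substitutions between equal-length alleles, searches only the given strand for an NGG PAM, places the nick three nucleotides upstream of the PAM, and uses a 20-nucleotide spacer, a 13-nucleotide primer binding site, and 10 nucleotides of homology after the edit. All class names, function names, and default values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def rna_reverse_complement(dna: str) -> str:
    """Return the RNA (5' to 3') complementary to a DNA strand written 5' to 3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(dna))


@dataclass
class MutationEntry:
    name: str
    input_allele: str   # allele carrying the mutation (PAM strand, 5' to 3')
    output_allele: str  # corrected allele, same coordinates


@dataclass
class ExtendedGuide:
    spacer: str
    primer_binding_site: str
    dna_synthesis_template: str

    @property
    def extension(self) -> str:
        # 3' extension written 5' to 3': template first, primer binding site last.
        return self.dna_synthesis_template + self.primer_binding_site


def design_guide(entry: MutationEntry, pbs_len: int = 13, homology_len: int = 10,
                 max_edit_distance: int = 30) -> Optional[ExtendedGuide]:
    """Propose one extended gRNA for a single-base substitution, or None."""
    src, dst = entry.input_allele, entry.output_allele
    if len(src) != len(dst):
        return None  # insertions and deletions are outside this sketch
    diffs = [i for i, (a, b) in enumerate(zip(src, dst)) if a != b]
    if len(diffs) != 1:
        return None
    edit = diffs[0]
    for pam in range(20, len(src) - 2):
        if src[pam + 1:pam + 3] != "GG":  # assumed NGG PAM on the input allele
            continue
        nick = pam - 3  # assumed nick: 3 nt upstream of the PAM
        end = edit + 1 + homology_len
        if nick - pbs_len < 0 or end > len(src):
            continue
        if not nick <= edit < nick + max_edit_distance:
            continue
        return ExtendedGuide(
            spacer=src[pam - 20:pam].replace("T", "U"),
            primer_binding_site=rna_reverse_complement(src[nick - pbs_len:nick]),
            dna_synthesis_template=rna_reverse_complement(dst[nick:end]),
        )
    return None


def design_all(entries: list[MutationEntry]) -> dict[str, Optional[ExtendedGuide]]:
    """Map each mutation entry name to its proposed guide (None if none found)."""
    return {entry.name: design_guide(entry) for entry in entries}
```

A fuller implementation would also need to consider the opposite strand, other PAM specificities, insertions and deletions, and ranking among multiple candidate guides, none of which is attempted here.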

FIG. 34 is an illustrative implementation of a computer system 3400 that may be used to perform any of the aspects of the techniques and embodiments disclosed herein. The computer system 3400 may include one or more processors 3410 and one or more non-transitory computer-readable storage media (e.g., memory 3420 and one or more non-volatile storage media 3430) and a display 3440. The processor 3410 may control writing data to and reading data from the memory 3420 and the non-volatile storage device 3430 in any suitable manner, as the aspects of the invention described herein are not limited in this respect.

FIG. 35A is a schematic of PE-based insertion of sequences encoding RNA motifs in connection with Example 3.

FIG. 35B is a list (not exhaustive) of some example motifs that could potentially be inserted, and their functions, in connection with Example 3.

FIG. 36 provides a bar graph comparing the efficiency (i.e., “% of total sequencing reads with the specified edit or indels”) of PE2, PE2-trunc, PE3, and PE3-trunc over different target sites in various cell lines. The data shows that the prime editors comprising the truncated RT variants were about as efficient as the prime editors comprising the non-truncated RT proteins.

FIG. 37A shows the nucleotide sequence of a SpCas9 PEgRNA molecule (top) which terminates at the 3′ end in a “UUU” and does not contain a toe loop element. The lower portion of the figure depicts the same SpCas9 PEgRNA molecule, further modified to contain a toe loop element having the sequence 5′-“GAAANNNNN”-3′ inserted immediately before the “UUU” 3′ end. The “N” can be any nucleobase.

FIG. 37B shows the results of Example 4, which demonstrates that the efficiency of prime editing in HEK cells or EMX cells is increased using PEgRNA containing toe loop elements, whereas the percent of indel formation is largely unchanged.

FIG. 38 depicts one embodiment of a prime editor being provided as two PE half proteins which regenerate as a whole prime editor through the self-splicing action of the split-intein halves located at the end or beginning of each of the prime editor half proteins.

FIG. 39 depicts the mechanism of intein removal from a polypeptide sequence and the reformation of a peptide bond between the N-terminal and the C-terminal extein sequences. (a) depicts the general mechanism of two half proteins each containing half of an intein sequence, which when in contact within a cell result in a fully-functional intein which then undergoes self-splicing and excision. The process of excision results in the formation of a peptide bond between the N-terminal protein half (or the “N extein”) and the C-terminal protein half (or the “C extein”) to form a whole, single polypeptide comprising the N extein and the C extein portions. In various embodiments, the N extein may correspond to the N-terminal half of a split prime editor fusion protein and the C extein may correspond to the C-terminal half of a split prime editor. (b) shows the chemical mechanism of intein excision and the reformation of a peptide bond that joins the N extein half (the red-colored half) and the C extein half (the blue-colored half). Excision of the split inteins (i.e., the N intein and the C intein in the split intein configuration) may also be referred to as “trans splicing” as it involves the splicing action of two separate components provided in trans.

DEFINITIONS
Antisense Strand

In genetics, the “antisense” strand of a segment within double-stranded DNA is the template strand, and which is considered to run in the 3′ to 5′ orientation. By contrast, the “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. In the case of a DNA segment that encodes a protein, the sense strand is the strand of DNA that has the same sequence as the mRNA, which takes the antisense strand as its template during transcription, and eventually undergoes (typically, not always) translation into a protein. The antisense strand is thus responsible for the RNA that is later translated to protein, while the sense strand possesses a nearly identical makeup to that of the mRNA. Note that for each segment of dsDNA, there will possibly be two sets of sense and antisense, depending on which direction one reads (since sense and antisense is relative to perspective). It is ultimately the gene product, or mRNA, that dictates which strand of one segment of dsDNA is referred to as sense or antisense.
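By way of non-limiting illustration, the relationship between the sense strand, the antisense (template) strand, and the mRNA can be sketched in a few lines of Python. The nine-nucleotide sequence and the function names below are hypothetical and chosen only for illustration; they are not drawn from the Sequence Listing.

```python
# Illustrative sketch only: the sequence and helper names are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(antisense_5to3: str) -> str:
    """Return the mRNA (5' to 3') templated by an antisense strand given 5' to 3'."""
    return reverse_complement(antisense_5to3).replace("T", "U")


sense = "ATGGCCTAA"                    # hypothetical sense (coding) strand, 5' to 3'
antisense = reverse_complement(sense)  # template strand, written 5' to 3'
mrna = transcribe(antisense)           # same sequence as the sense strand, with U for T

print(antisense)  # TTAGGCCAT
print(mrna)       # AUGGCCUAA
```

The mRNA differs from the sense strand only in the substitution of U for T, which is the sense in which the sense strand has nearly the same makeup as the mRNA.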

Cas9

The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A “Cas9 domain,” as used herein, is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full length Cas9 protein. A Cas9 nuclease is also referred to sometimes as a casn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. 
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti et al., J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.

A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell. 28; 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science. 337:816-821(2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1361421). 
In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1361421). In some embodiments, the Cas9 variant comprises a fragment of SEQ ID NO: 1361421 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1361421). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 1361421).
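As a non-limiting sketch, the two kinds of percentage recited above (percent sequence identity and percent of full length) might be computed as follows. The sketch assumes the two sequences have already been aligned to equal length with “-” marking gaps, and it divides by the number of aligned columns; other conventions for the denominator exist. The ten-residue peptides and the function names are hypothetical and are not taken from SEQ ID NO: 1361421.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding identical residues.

    Assumes pre-aligned, equal-length inputs with '-' as the gap character.
    Columns in which both sequences are gapped are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def percent_of_length(fragment: str, full_length: str) -> float:
    """Length of a fragment as a percentage of the full-length sequence."""
    return 100.0 * len(fragment) / len(full_length)


wild_type = "MKTAYIAKQR"   # hypothetical reference peptide
variant = "MKTAYIAKQA"     # one substitution at the last position

print(percent_identity(wild_type, variant))          # 90.0
print(percent_of_length(wild_type[:5], wild_type))   # 50.0
```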

cDNA

The term “cDNA” refers to a strand of DNA copied from an RNA template. cDNA is complementary to the RNA template.

Circular Permutant

As used herein, the term “circular permutant” refers to a protein or polypeptide (e.g., a Cas9) comprising a circular permutation, which is a change in the protein's structural configuration involving a change in the order of amino acids appearing in the protein's amino acid sequence. In other words, circular permutants are proteins that have altered N- and C-termini as compared to a wild-type counterpart, e.g., the wild-type C-terminal half of a protein becomes the new N-terminal half. Circular permutation (or CP) is essentially the topological rearrangement of a protein's primary sequence, connecting its N- and C-terminus, often with a peptide linker, while concurrently splitting its sequence at a different position to create new, adjacent N- and C-termini. The result is a protein structure with different connectivity, but which often has a similar overall three-dimensional (3D) shape, and possibly improved or altered characteristics, including reduced proteolytic susceptibility, improved catalytic activity, altered substrate or ligand binding, and/or improved thermostability. Circular permutant proteins can occur in nature (e.g., concanavalin A and lectin). In addition, circular permutation can occur as a result of posttranslational modifications or may be engineered using recombinant techniques.
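For illustration only, the sequence-level rearrangement described above (joining the original termini through a linker and opening the chain at a new position) can be sketched as follows. The function name, the placeholder residues, and the lowercase linker are assumptions introduced for clarity and do not correspond to any CP-Cas9 sequence of this disclosure.

```python
def circular_permutant(sequence: str, new_start: int, linker: str = "") -> str:
    """Return a circularly permuted sequence.

    The original C-terminus is joined to the original N-terminus through
    `linker`, and the chain is opened so that the residue at 0-based index
    `new_start` becomes the new N-terminus.
    """
    if not 0 < new_start < len(sequence):
        raise ValueError("new_start must fall strictly inside the sequence")
    return sequence[new_start:] + linker + sequence[:new_start]


wild_type = "ABCDEFGH"  # placeholder residues, not a real protein
cp = circular_permutant(wild_type, new_start=4, linker="ggs")

print(cp)  # EFGHggsABCD
```

In this toy case the wild-type C-terminal half (EFGH) becomes the new N-terminal half, with the linker shown in lowercase only so that it is easy to see.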

Circularly Permuted Cas9

The term “circularly permuted Cas9” refers to any Cas9 protein, or variant thereof, that occurs as a circular permutant, whereby its N- and C-termini have been reconfigured through rearrangement of the protein's primary sequence. Such circularly permuted Cas9 proteins (“CP-Cas9”), or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See, Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491-511 and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, Jan. 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The instant disclosure contemplates the use of any previously known CP-Cas9 or a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA). Exemplary CP-Cas9 proteins are SEQ ID NOs: 1361475-1361484.

DNA Synthesis Template

As used herein, the term “DNA synthesis template” refers to the region or portion of the extension arm of a PEgRNA that is utilized as a template strand by a polymerase of a prime editor to encode a 3′ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site. In various embodiments, the DNA synthesis template is shown in FIG. 3A (in the context of a PEgRNA comprising a 5′ extension arm), FIG. 3B (in the context of a PEgRNA comprising a 3′ extension arm), FIG. 3C (in the context of an internal extension arm), FIG. 3D (in the context of a 3′ extension arm), and FIG. 3E (in the context of a 5′ extension arm). The extension arm, including the DNA synthesis template, may be comprised of DNA or RNA. In the case of RNA, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the prime editor can be a DNA-dependent DNA polymerase. In various embodiments (e.g., as depicted in FIGS. 3D-3E), the DNA synthesis template (4) may comprise the “edit template” and the “homology arm”, and all or a portion of the optional 5′ end modifier region, e2. That is, depending on the nature of the e2 region (e.g., whether it includes a hairpin, toe loop, or stem/loop secondary structure), the polymerase may encode none, some, or all of the e2 region, as well. Said another way, in the case of a 3′ extension arm, the DNA synthesis template (3) can include the portion of the extension arm (3) that spans from the 5′ end of the primer binding site (PBS) to 3′ end of the gRNA core that may operate as a template for the synthesis of a single-strand of DNA by a polymerase (e.g., a reverse transcriptase). 
In the case of a 5′ extension arm, the DNA synthesis template (3) can include the portion of the extension arm (3) that spans from the 5′ end of the PEgRNA molecule to the 3′ end of the edit template. Preferably, the DNA synthesis template excludes the primer binding site (PBS) of PEgRNAs having either a 3′ extension arm or a 5′ extension arm. Certain embodiments described here (e.g., FIG. 71A) refer to “an RT template,” which is inclusive of the edit template and the homology arm, i.e., the sequence of the PEgRNA extension arm which is actually used as a template during DNA synthesis. The term “RT template” is equivalent to the term “DNA synthesis template.”

Downstream

As used herein, the terms “upstream” and “downstream” are terms of relativity that define the linear position of at least two elements located in a nucleic acid molecule (whether single or double-stranded) that is orientated in a 5′-to-3′ direction. In particular, a first element is upstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 5′ to the second element. For example, a SNP is upstream of a Cas9-induced nick site if the SNP is on the 5′ side of the nick site. Conversely, a first element is downstream of a second element in a nucleic acid molecule where the first element is positioned somewhere that is 3′ to the second element. For example, a SNP is downstream of a Cas9-induced nick site if the SNP is on the 3′ side of the nick site. The nucleic acid molecule can be a DNA (double or single stranded), RNA (double or single stranded), or a hybrid of DNA and RNA. The analysis is the same for a single strand nucleic acid molecule and a double strand molecule since the terms upstream and downstream are in reference to only a single strand of a nucleic acid molecule, except that one needs to select which strand of the double stranded molecule is being considered. Often, the strand of a double stranded DNA which can be used to determine the positional relativity of at least two elements is the “sense” or “coding” strand. In genetics, a “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. Thus, as an example, a SNP nucleobase is “downstream” of a promoter sequence in a genomic DNA (which is double-stranded) if the SNP nucleobase is on the 3′ side of the promoter on the sense or coding strand.
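The strand dependence of “upstream” and “downstream” can be illustrated with a short, non-limiting sketch. The coordinate convention (0-based positions counted along a “+” strand read 5′ to 3′), the function name, and the example positions are assumptions made only for this illustration.

```python
def relative_position(element_index: int, reference_index: int,
                      strand: str = "+") -> str:
    """Classify an element as 'upstream' or 'downstream' of a reference site.

    Indices are 0-based coordinates along the '+' strand read 5' to 3'.
    On the '-' strand the 5'-to-3' direction runs toward lower coordinates,
    so the comparison is flipped.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if element_index == reference_index:
        return "same position"
    is_five_prime = element_index < reference_index
    if strand == "-":
        is_five_prime = not is_five_prime
    return "upstream" if is_five_prime else "downstream"


nick_site = 25  # hypothetical nick position
print(relative_position(10, nick_site))        # upstream
print(relative_position(40, nick_site))        # downstream
print(relative_position(10, nick_site, "-"))   # downstream
```

The third call shows why a strand must be selected first: the same pair of coordinates gives the opposite answer when the other strand of a double-stranded molecule is considered.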

CRISPR

CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by a virus that has invaded the prokaryote. The snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively compose, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. CRISPR biology, as well as Cas9 nuclease sequences and structures, are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti et al., J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.

Edit Template

The term “edit template” refers to a portion of the extension arm that encodes the desired edit in the single strand 3′ DNA flap that is synthesized by the polymerase, e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). Certain embodiments described here (e.g., FIG. 71A) refer to “an RT template,” which refers to both the edit template and the homology arm together, i.e., the sequence of the PEgRNA extension arm which is actually used as a template during DNA synthesis. The term “RT edit template” is also equivalent to the term “DNA synthesis template,” but wherein the RT edit template reflects the use of a prime editor having a polymerase that is a reverse transcriptase, and wherein the DNA synthesis template reflects more broadly the use of a prime editor having any polymerase.

Error-Prone

As used herein, the term “error-prone” reverse transcriptase (or more broadly, any polymerase) refers to a reverse transcriptase (or more broadly, any polymerase) that occurs naturally or which has been derived from another reverse transcriptase (e.g., a wild type M-MLV reverse transcriptase) and which has an error rate that is greater than the error rate of wild type M-MLV reverse transcriptase. The error rate of wild type M-MLV reverse transcriptase is reported to be in the range of one error in 15,000 (higher) to 27,000 (lower). An error rate of 1 in 15,000 corresponds with an error rate of 6.7×10⁻⁵. An error rate of 1 in 27,000 corresponds with an error rate of 3.7×10⁻⁵. See Boutabout et al. (2001) “DNA synthesis fidelity by the reverse transcriptase of the yeast retrotransposon Ty1,” Nucleic Acids Res 29(11):2217-2222, which is incorporated herein by reference. Thus, for purposes of this application, the term “error prone” refers to those RT that have an error rate that is greater than one error in 15,000 nucleobase incorporations (6.7×10⁻⁵ or higher), e.g., 1 error in 14,000 nucleobases or fewer (7.14×10⁻⁵ or higher), 1 error in 13,000 nucleobases or fewer (7.7×10⁻⁵ or higher), 1 error in 12,000 nucleobases or fewer (8.3×10⁻⁵ or higher), 1 error in 11,000 nucleobases or fewer (9.1×10⁻⁵ or higher), 1 error in 10,000 nucleobases or fewer (1×10⁻⁴ or 0.0001 or higher), 1 error in 9,000 nucleobases or fewer (0.00011 or higher), 1 error in 8,000 nucleobases or fewer (0.00013 or higher), 1 error in 7,000 nucleobases or fewer (0.00014 or higher), 1 error in 6,000 nucleobases or fewer (0.00017 or higher), 1 error in 5,000 nucleobases or fewer (0.0002 or higher), 1 error in 4,000 nucleobases or fewer (0.00025 or higher), 1 error in 3,000 nucleobases or fewer (0.00033 or higher), 1 error in 2,000 nucleobases or fewer (0.00050 or higher), 1 error in 1,000 nucleobases or fewer (0.001 or higher), 1 error in 500 nucleobases or fewer (0.002 or higher), or 1 error in 250 nucleobases or fewer (0.004 or higher).
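The conversion between “1 error in N nucleobases” and a per-base error rate is simply the reciprocal 1/N, and may be sketched as follows for illustration. The function names and the choice of a strict “greater than” comparison against the 1-in-15,000 threshold are assumptions of this sketch.

```python
# Upper (higher-error) bound reported in the text for wild type M-MLV RT.
WT_MMLV_HIGH_ERROR_RATE = 1 / 15_000


def error_rate(bases_per_error: int) -> float:
    """Per-base error rate for a polymerase making 1 error per `bases_per_error`."""
    return 1 / bases_per_error


def is_error_prone(bases_per_error: int) -> bool:
    """True when the error rate exceeds 1 error in 15,000 incorporations.

    The strict inequality is an assumption of this sketch.
    """
    return error_rate(bases_per_error) > WT_MMLV_HIGH_ERROR_RATE


for n in (27_000, 15_000, 12_000, 6_000, 250):
    print(f"1 in {n:>6,}: {error_rate(n):.2e}  error-prone={is_error_prone(n)}")
```

Under these assumptions, 1 error in 12,000 evaluates to about 8.33×10⁻⁵ and 1 error in 6,000 to about 1.67×10⁻⁴, both above the 6.7×10⁻⁵ threshold.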

Extension Arm

The term “extension arm” refers to a nucleotide sequence component of a PEgRNA which provides several functions, including a primer binding site and an edit template for reverse transcriptase. In some embodiments, e.g., FIG. 3D, the extension arm is located at the 3′ end of the guide RNA. In other embodiments, e.g., FIG. 3E, the extension arm is located at the 5′ end of the guide RNA. In some embodiments, the extension arm also includes a homology arm. In various embodiments, the extension arm comprises the following components in a 5′ to 3′ direction: the homology arm, the edit template, and the primer binding site. Since polymerization activity of the reverse transcriptase is in the 5′ to 3′ direction, the preferred arrangement of the homology arm, edit template, and primer binding site is in the 5′ to 3′ direction such that the reverse transcriptase, once primed by an annealed primer sequence, polymerizes a single strand of DNA using the edit template as a complementary template strand. Further details, such as the length of the extension arm, are described elsewhere herein.
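As a simplified, non-limiting sketch, the 5′ to 3′ arrangement of the extension arm and the DNA copied from it can be modeled with plain strings. The short RNA sequences and the function names are hypothetical. The sketch assumes an RNA extension arm, ignores any optional modifier region, and shows only the newly synthesized DNA (not the primer from which it is extended).

```python
# RNA template base -> complementary DNA base (assumption: RNA extension arm).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def build_extension_arm(homology_arm: str, edit_template: str, pbs: str) -> str:
    """Concatenate components 5' to 3': homology arm, edit template, PBS."""
    return homology_arm + edit_template + pbs


def synthesized_dna(homology_arm: str, edit_template: str) -> str:
    """DNA (5' to 3') copied from the homology arm and edit template.

    The polymerase reads the template 3' to 5', so the product is the
    reverse complement of the template. The PBS is not copied; it only
    anneals to the primer.
    """
    template = homology_arm + edit_template
    return "".join(COMPLEMENT[base] for base in reversed(template))


# Hypothetical sequences chosen only for illustration.
homology_arm, edit_template, pbs = "GGAUCC", "UAC", "CACGUGAUG"
arm = build_extension_arm(homology_arm, edit_template, pbs)
flap = synthesized_dna(homology_arm, edit_template)

print(arm)   # GGAUCCUACCACGUGAUG
print(flap)  # GTAGGATCC
```

Because synthesis starts next to the primer binding site and proceeds toward the 5′ end of the arm, the complement of the edit template is written first in the product and the complement of the homology arm follows it.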

The extension arm may also be described as comprising generally two regions: a primer binding site (PBS) and a DNA synthesis template, as shown in FIG. 3G (top), for instance. The primer binding site binds to the primer sequence that is formed from the endogenous DNA strand of the target site when it becomes nicked by the prime editor complex, thereby exposing a 3′ end on the endogenous nicked strand. As explained herein, the binding of the primer sequence to the primer binding site on the extension arm of the PEgRNA creates a duplex region with an exposed 3′ end (i.e., the 3′ of the primer sequence), which then provides a substrate for a polymerase to begin polymerizing a single strand of DNA from the exposed 3′ end along the length of the DNA synthesis template. The sequence of the single strand DNA product is the complement of the DNA synthesis template. Polymerization continues towards the 5′ of the DNA synthesis template (or extension arm) until polymerization terminates. Thus, the DNA synthesis template represents the portion of the extension arm that is encoded into a single strand DNA product (i.e., the 3′ single strand DNA flap containing the desired genetic edit information) by the polymerase of the prime editor complex and which ultimately replaces the corresponding endogenous DNA strand of the target site that sits immediately downstream of the PE-induced nick site. Without being bound by theory, polymerization of the DNA synthesis template continues towards the 5′ end of the extension arm until a termination event. Polymerization may terminate in a variety of ways, including, but not limited to (a) reaching a 5′ terminus of the PEgRNA (e.g., in the case of the 5′ extension arm wherein the DNA polymerase simply runs out of template), (b) reaching an impassable RNA secondary structure (e.g., hairpin or stem/loop), or (c) reaching a replication termination signal, e.g., a specific nucleotide sequence that blocks or inhibits the polymerase, or a nucleic acid topological signal, such as supercoiled DNA or RNA.

Effective Amount

The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of a prime editor may refer to the amount of the editor that is sufficient to edit a target site nucleotide sequence, e.g., a genome. In some embodiments, an effective amount of a prime editor provided herein, e.g., of a fusion protein comprising a nickase Cas9 domain and a reverse transcriptase, may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nuclease, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, for example, on the desired biological response, e.g., on the specific allele, genome, or target site to be edited, on the cell or tissue being targeted, and on the agent being used.

Functional Equivalent

The term “functional equivalent” refers to a second biomolecule that is equivalent in function, but not necessarily equivalent in structure to a first biomolecule. For example, a “Cas9 equivalent” refers to a protein that has the same or substantially the same functions as Cas9, but not necessarily the same amino acid sequence. In the context of the disclosure, the specification refers throughout to “a protein X, or a functional equivalent thereof.” In this context, a “functional equivalent” of protein X embraces any homolog, paralog, fragment, naturally occurring, engineered, mutated, or synthetic version of protein X which bears an equivalent function.

Fusion Protein

The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion of the fusion protein, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. Another example includes a Cas9 or equivalent thereof fused to a reverse transcriptase. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

Gene Product

The term “gene product,” as used herein, refers to any product encoded by a nucleic acid sequence. Accordingly, a gene product may, for example, be a primary transcript, a mature transcript, a processed transcript, or a protein or peptide encoded by a transcript. Examples for gene products, accordingly, include mRNAs, rRNAs, tRNAs, hairpin RNAs, microRNAs (miRNAs), shRNAs, siRNAs, and peptides and proteins, for example, reporter proteins or therapeutic proteins.

Gene of Interest (GOI)

The term “gene of interest” or “GOI” refers to a gene that encodes a biomolecule of interest (e.g., a protein or an RNA molecule). A protein of interest can include any intracellular protein, membrane protein, or extracellular protein, e.g., a nuclear protein, transcription factor, nuclear membrane transporter, intracellular organelle associated protein, a membrane receptor, a catalytic protein, an enzyme, a therapeutic protein, a membrane protein, a membrane transport protein, a signal transduction protein, or an immunological protein (e.g., an IgG or other antibody protein), etc. The gene of interest may also encode an RNA molecule, including, but not limited to, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), antisense RNA, guide RNA, microRNA (miRNA), small interfering RNA (siRNA), and cell-free RNA (cfRNA).

Guide RNA (“gRNA”)

As used herein, the term “guide RNA” is a particular type of guide nucleic acid which is most commonly associated with a Cas protein of a CRISPR-Cas9 system and which associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence of the guide RNA. As described elsewhere, the PEgRNA are a subcategory of guide RNA which further comprise an extension arm on the 3′ or 5′ end of the guide that enables the molecule to be used with the prime editors disclosed herein. The term “guide RNA” also embraces the equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas9 equivalent to localize to a specific target nucleotide sequence. The Cas9 equivalents may include other napDNAbp from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system) and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing appropriate guide RNA sequences are provided herein. As used herein, the “guide RNA” may also be referred to as a “traditional guide RNA” to contrast it with the modified forms of guide RNA termed “prime editor guide RNAs” (or “PEgRNA”) which have been invented for the prime editing methods and compositions disclosed herein.

Guide RNAs or PEgRNA may comprise various structural elements that include, but are not limited to:

Spacer sequence—the sequence in the guide RNA or PEgRNA (having about 10 to about 40 (e.g., about 10, about 15, about 20, about 25, about 30) nucleotides in length) which binds to the protospacer (as defined herein below) in the target DNA.

gRNA core (or gRNA scaffold or backbone sequence)—refers to the sequence within the gRNA that is responsible for napDNAbp (e.g., Cas9) binding; it does not include the spacer/targeting sequence that is used to guide the napDNAbp (e.g., Cas9) to target DNA.

Extension arm—refers to the extended portion of the guide RNA at either the 5′ or the 3′ end comprising the homology arm, edit template, and primer binding site. This component is further defined elsewhere.

Homology arm—refers to a portion(s) of the extension arm that encodes a portion of the resulting reverse transcriptase-encoded single strand DNA flap that is to be integrated into the target DNA site by replacing the endogenous strand. The portion of the single strand DNA flap encoded by the homology arm is complementary to the non-edited strand of the target DNA sequence, which facilitates the displacement of the endogenous strand and annealing of the single strand DNA flap in its place, thereby installing the edit. This component is further defined elsewhere.

Edit template—refers to a portion of the extension arm that encodes the desired edit in the single strand DNA flap that is synthesized by reverse transcriptase. This component is further defined elsewhere.

Primer binding site—refers to a portion of the extension arm that anneals to the primer sequence, which is formed from a strand of the target DNA after Cas9-mediated nickase action thereon. This component is further defined elsewhere.

Transcription terminator—the guide RNA or PEgRNA may comprise a transcriptional termination sequence at the 3′ end of the molecule. Typically, transcription terminator sequences (e.g., SEQ ID NOs: 1361560-1361565) are about 70 to about 125 nucleotides in length, but shorter and longer transcription terminator sequences are contemplated and any known in the art may be used.
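By way of illustration only, the structural elements listed above can be sketched as a simple data model for a PEgRNA having a 3′ extension arm. The class names, the `validate` helper, and the toy sequences are hypothetical and are not taken from the Sequence Listing; the 10 to 40 nucleotide spacer window reflects the approximate range given above, and the 5′ to 3′ order of homology arm, edit template, and primer binding site follows the arrangement described for the extension arm.

```python
from dataclasses import dataclass

RNA_BASES = set("ACGU")


@dataclass(frozen=True)
class ExtensionArm:
    """Extension arm components, each written 5'->3' as RNA."""
    homology_arm: str
    edit_template: str
    primer_binding_site: str

    def sequence(self) -> str:
        # 5'-[homology arm]-[edit template]-[primer binding site]-3'
        return self.homology_arm + self.edit_template + self.primer_binding_site


@dataclass(frozen=True)
class PEgRNA:
    """A PEgRNA with the extension arm at the 3' end of the guide RNA."""
    spacer: str
    grna_core: str
    extension_arm: ExtensionArm
    terminator: str = ""  # optional transcription terminator

    def sequence(self) -> str:
        # 5'-[spacer]-[gRNA core]-[extension arm]-[terminator]-3'
        return (self.spacer + self.grna_core
                + self.extension_arm.sequence() + self.terminator)


def validate(pegrna: PEgRNA, min_spacer: int = 10, max_spacer: int = 40) -> None:
    """Raise ValueError if the PEgRNA breaks the simple rules assumed here."""
    if not min_spacer <= len(pegrna.spacer) <= max_spacer:
        raise ValueError("spacer length outside the assumed 10-40 nt range")
    if not pegrna.extension_arm.primer_binding_site:
        raise ValueError("primer binding site must not be empty")
    if set(pegrna.sequence()) - RNA_BASES:
        raise ValueError("sequence contains non-RNA characters")


# Toy sequences, made up for illustration only.
example = PEgRNA(
    spacer="ACGUACGUUGCAUGCAAGCU",
    grna_core="GUUUCAGAGC",
    extension_arm=ExtensionArm("UCUGCCAUCA", "AAG", "CGUGCUCAGUCUG"),
)
validate(example)
print(example.sequence())
```

A PEgRNA having a 5′ extension arm could be modeled the same way by placing the extension arm before the spacer when the full sequence is assembled.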

Flap Endonuclease (e.g., FEN1)

As used herein, the term “flap endonuclease” refers to an enzyme that catalyzes the removal of 5′ single strand DNA flaps. These are naturally occurring enzymes that process the removal of 5′ flaps formed during cellular processes, including DNA replication. The prime editing methods herein described may utilize endogenously supplied flap endonucleases or those provided in trans to remove the 5′ flap of endogenous DNA formed at the target site during prime editing. Flap endonucleases are known in the art and can be found described in Patel et al., “Flap endonucleases pass 5′-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5′-ends,” Nucleic Acids Research, 2012, 40(10): 4507-4519; Tsutakawa et al., “Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily,” Cell, 2011, 145(2): 198-211; and Balakrishnan et al., “Flap Endonuclease 1,” Annu Rev Biochem, 2013, Vol 82: 119-138 (each of which is incorporated herein by reference). An exemplary flap endonuclease is FEN1, which can be represented by the following amino acid sequence:

FEN1, wild type (SEQ ID NO: 1361542)

MGIQGLAKLIADVAPSAIRENDIKSYFGRKVAIDASMSIYQFLIAVRQ
GGDVLQNEEGETTSHLMGMFYRTIRMMENGIKPVYVEDGKPPQLKSGE
LAKRSERRAEAEKQLQQAQAAGAEQEVEKFTKRLVKVTKQHNDECKHL
LSLMGIPYLDAPSEAEASCAALVKAGKVYAAATEDMDCLTFGSPVLMR
HLTASEAKKLPIQEFHLSRILQELGLNQEQFVDLCILLGSDYCESIRG
IGPKRAVDLIQKHKSIEEIVRRLDPNKYPVPENWLHKEAHQLFLEPEV
LDPESVELKWSEPNEEELIKFMCGEKQFSEERIRSGVKRLSKSRQGST
QGRLDDFFKVTGSLSSAKRKEPEPKGSTKKKAKTGAAGKFKRGK

Fusion Protein

The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion of the fusion protein, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain (e.g., Cas9 nickase, napDNAbp) or a catalytic domain of a nucleic-acid editing protein (e.g., RT domain). Another example includes a napDNAbp (e.g., Cas9) or equivalent thereof fused to a reverse transcriptase. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

Homology Arm

The term “homology arm” refers to a portion of the extension arm that includes a sequence of the resulting reverse transcriptase-encoded single strand DNA flap that is to be integrated into the target DNA site by replacing the endogenous strand. The portion of the single strand DNA flap encoded by the homology arm is complementary to the non-edited strand of the target DNA sequence, which facilitates the displacement of the endogenous strand and annealing of the single strand DNA flap in its place, thereby installing the edit. This component is further defined elsewhere.

Host Cell

The term “host cell,” as used herein, refers to a cell that can host, replicate, and express a vector described herein, e.g., a vector comprising a nucleic acid molecule encoding a fusion protein comprising a napDNAbp or napDNAbp equivalent (e.g., Cas9 or equivalent) and a reverse transcriptase.

Isolated

“Isolated” means altered or removed from the natural state. For example, a nucleic acid or a peptide naturally present in a living animal is not “isolated,” but the same nucleic acid or peptide partially or completely separated from the coexisting materials of its natural state is “isolated.” An isolated nucleic acid or protein can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.

In some embodiments, a gene of interest is encoded by an isolated nucleic acid. As used herein, the term “isolated” refers to the characteristic of a material as provided herein being removed from its original or native environment (e.g., the natural environment if it is naturally occurring). Therefore, a naturally-occurring polynucleotide or protein or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated by human intervention from some or all of the coexisting materials in the natural system, is isolated. An artificial or engineered material, for example, a non-naturally occurring nucleic acid construct, such as the expression constructs and vectors described herein, is, accordingly, also referred to as isolated. A material does not have to be purified in order to be isolated. Accordingly, a material may be part of a vector and/or part of a composition, and still be isolated in that such vector or composition is not part of the environment in which the material is found in nature.

napDNAbp

As used herein, the term “nucleic acid programmable DNA binding protein” or “napDNAbp,” of which Cas9 is an example, refers to proteins which use RNA:DNA hybridization to target and bind to specific sequences in a DNA molecule. Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA). In other words, the guide nucleic-acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence.

Without being bound by theory, the binding mechanism of a napDNAbp-guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase” (“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”). Exemplary sequences for these and other napDNAbp are provided herein.
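As a non-limiting sketch of the strand relationships described above, the following Python fragment identifies which strand of a double-stranded DNA acts as the target strand for a given guide RNA targeting sequence. It assumes perfect complementarity and ignores any PAM requirement and mismatch tolerance; the function names and the toy sequences are hypothetical. Because the guide hybridizes to the target strand, the displaced non-target strand carries the same sequence as the guide (with T in place of U).

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def locate_target(guide_rna: str, top_strand: str):
    """Find the site bound by the guide on a duplex given its top strand (5'->3').

    Returns a dict naming the target strand, the displaced non-target strand,
    and the start index of the site in top-strand coordinates, or None when
    no perfectly matching site exists.
    """
    guide_as_dna = guide_rna.replace("U", "T")
    size = len(guide_as_dna)

    # Guide sequence appears on the top strand: the top strand is displaced
    # and the bottom strand is the one hybridized to the guide.
    index = top_strand.find(guide_as_dna)
    if index != -1:
        return {"target_strand": "bottom", "non_target_strand": "top", "start": index}

    # Guide sequence appears on the bottom strand: roles are swapped.
    index = reverse_complement(top_strand).find(guide_as_dna)
    if index != -1:
        start = len(top_strand) - index - size  # convert to top-strand coordinates
        return {"target_strand": "top", "non_target_strand": "bottom", "start": start}

    return None


# Toy duplex, made up for illustration only.
top = "AACCGTTAGCATGGACTTCA"
print(locate_target("UUAGCAUGGA", top))
```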

Linker

The term “linker,” as used herein, refers to a molecule linking two other molecules or moieties. Linkers are well known in the art and can comprise any suitable combination of nucleic acids or amino acids to facilitate the proper function of the structures they join. The linker can be a series of amino acids. The linker can be an amino acid sequence in the case of a linker joining two fusion proteins. For example, a napDNAbp (e.g., Cas9) can be fused to a reverse transcriptase by an amino acid linker sequence. The linker can also be a nucleotide sequence in the case of joining two nucleotide sequences together. For example, in the instant case, the traditional guide RNA is linked via a spacer or linker nucleotide sequence to the RNA extension of a prime editor guide RNA which may comprise a DNA synthesis template (e.g., RT template sequence) and a primer binding site. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. In some embodiments, the linker is 5-100 nucleotides in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, 150-200, 200-300, 300-500, 500-1000, 1000-2000, or 2000-5000 nucleotides. Longer or shorter linkers are also contemplated.

Nickase

The term “nickase” refers to a napDNAbp (e.g., Cas9) with one of the two nuclease domains inactivated. This enzyme is capable of cleaving only one strand of a target DNA.

Nuclear Localization Sequence (NLS)

The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed Nov. 23, 2000, published as WO2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for its disclosure of exemplary nuclear localization sequences. In some embodiments, a NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 1361531) or MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 1361533).

Nucleic Acid Molecule

The term “nucleic acid,” as used herein, refers to a polymer (i.e., multiple, more than one, e.g., 2, 3, 4, etc.) of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7 deazaadenosine, 7 deazaguanosine, 8 oxoadenosine, 8 oxoguanosine, O(6)-methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine, N6-methyl adenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, 2′-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5′-N phosphoramidite linkages).

Nucleobase

As used herein, the term “nucleobase,” also known as “nitrogenous base” or often simply “base,” refers to nitrogen-containing biological compounds that form nucleosides, which in turn are components of nucleotides, with all of these monomers constituting the basic building blocks of nucleic acids. The ability of nucleobases to form base pairs and to stack one upon another leads directly to long-chain helical structures such as ribonucleic acid (RNA) and deoxyribonucleic acid (DNA).

Five nucleobases, which are adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U), can be referred to as primary or canonical. They function as the fundamental units of the genetic code, with the bases A, G, C, and T being found in DNA while A, G, C, and U are found in RNA. Thymine and uracil are identical except that T includes a methyl group that U lacks. DNA and RNA may also contain modified nucleobases. For example, for adenosine and guanosine nucleobases, alternate nucleobases can include hypoxanthine, xanthine, or 7-methylguanine, which correspond with the alternate nucleosides of inosine, xanthosine, and 7-methylguanosine, respectively. In addition, for example, cytosine, thymine, or uridine nucleobases, alternate nucleobases can include 5,6dihydrouracil, 5-methylcytosine, or 5-hydroxymethylcytosine, which correspond with the alternate nucleosides of dihydrouridine, 5-methylcytidine, and 5-hydroxymethylcytidine, respectively. Nucleobases may also include nucleobase analogues, for which a vast number are known in the art. Typically the analogue nucleobases confer, among other things, different base pairing and base stacking properties. Examples include universal bases, which can pair with all four canonical bases, and phosphate-sugar backbone analogues such as PNA, which affect the properties of the chain (PNA can even form a triple helix). Nucleic acid analogues are also called “xeno nucleic acid” and represent one of the main pillars of xenobiology, the design of new-to-nature forms of life based on alternative biochemistries. Artificial nucleic acids include peptide nucleic acid (PNA), morpholino and locked nucleic acid (LNA), as well as glycol nucleic acid (GNA) and threose nucleic acid (TNA). Each of these is distinguished from naturally occurring DNA or RNA by changes to the backbone of the molecule. 
Example analogues include nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7 deazaadenosine, 7 deazaguanosine, 8 oxoadenosine, 8 oxoguanosine, O(6)-methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine, N6-methyl adenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, 2′-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5′-N phosphoramidite linkages).

PEgRNA

As used herein, the terms “prime editor guide RNA” or “PEgRNA” or “extended guide RNA” refer to a specialized form of a guide RNA that has been modified to include one or more additional sequences for use in the prime editing methods, compositions, and systems described herein. As described herein, the prime editor guide RNA comprises one or more “extended regions” of nucleic acid sequence. The extended regions may comprise, but are not limited to, single-stranded RNA. Further, the extended regions may occur at the 3′ end of a traditional guide RNA. In other arrangements, the extended regions may occur at the 5′ end of a traditional guide RNA. In still other arrangements, the extended region may occur at an intramolecular region, rather than one of the ends, of the traditional guide RNA, for example, in the gRNA core region which associates and/or binds to the napDNAbp. The extended region comprises a “reverse transcriptase template sequence” which is a single-stranded RNA molecule which encodes a single-stranded complementary DNA (cDNA) which, in turn, has been designed to be (a) homologous to the endogenous target DNA to be edited, and (b) which comprises at least one desired nucleotide change (e.g., transition, transversion, deletion, insertion, or combination thereof) to be introduced or integrated into the endogenous target DNA. The extended region may also comprise other functional sequence elements, such as, but not limited to, a “primer binding site” and/or a “spacer or linker” sequence. As used herein the “primer binding site” comprises a sequence that hybridizes to a single-strand DNA sequence having a 3′ end generated from the nicked DNA of the R-loop and which comprises a primer for reverse transcriptase.

In some embodiments, the PEgRNA are represented by FIG. 3A, which shows a PEgRNA having a 5′ extension arm, a spacer, and a gRNA core. The 5′ extension further comprises in the 5′ to 3′ direction a reverse transcriptase template, a primer binding site, and a linker.

In some embodiments, the PEgRNA are represented by FIG. 3B, which shows a PEgRNA having a 3′ extension arm, a spacer, and a gRNA core. The 3′ extension further comprises in the 5′ to 3′ direction a reverse transcriptase template and a primer binding site.
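The relationship between a 3′ extension of this kind and the single strand DNA it encodes can be sketched as follows. This is an illustrative simplification only: it assumes perfect base pairing between the primer and the primer binding site, assumes polymerization runs to the 5′ end of the template, and uses hypothetical function names and toy sequences. Because the primer anneals antiparallel to the primer binding site and the polymerase reads the template from its 3′ end toward its 5′ end, the synthesized DNA, written 5′ to 3′, is the reverse complement of the reverse transcriptase template.

```python
# Pairing of each RNA template base with the DNA base incorporated opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_transcribe(template_rna: str) -> str:
    """Return the DNA strand (5'->3') complementary to an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(template_rna))


def synthesize_flap(primer_dna: str, pbs_rna: str, rt_template_rna: str) -> str:
    """Return the 3' DNA flap (5'->3') templated by the extension.

    The extension is assumed to be arranged 5'-[RT template]-[PBS]-3', so the
    3' end of the annealed primer sits next to the 3' end of the RT template.
    """
    if reverse_transcribe(pbs_rna) != primer_dna:
        raise ValueError("primer is not complementary to the primer binding site")
    return reverse_transcribe(rt_template_rna)


# Toy sequences, made up for illustration only.
pbs = "GCAUC"
primer = "GATGC"          # nicked-strand 3' end that anneals to the PBS
rt_template = "AUGGC"     # includes whatever edit is to be written
flap = synthesize_flap(primer, pbs, rt_template)
print(primer + flap)      # nicked strand after extension, 5'->3'
```

The printed strand is the primer followed immediately by the newly synthesized flap, which corresponds to the extended 3′ end of the nicked strand before flap resolution.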

In still other embodiments, the PEgRNA are represented by FIG. 27, which shows a PEgRNA having in the 5′ to 3′ direction a spacer (1), a gRNA core (2), and an extension arm (3). The extension arm (3) is at the 3′ end of the PEgRNA. The extension arm (3) further comprises in the 5′ to 3′ direction a “primer binding site” (A), an “edit template” (B), and a “homology arm” (C). The extension arm (3) may also comprise an optional modifier region at the 3′ and 5′ ends, which may be the same sequences or different sequences. In addition, the 3′ end of the PEgRNA may comprise a transcriptional terminator sequence. These sequence elements of the PEgRNA are further described and defined herein. In addition, the specification discloses exemplary PEgRNA, which have been designed in accordance with the methods disclosed herein, in the accompanying Sequence Listing.

In still other embodiments, the PEgRNA are represented by FIG. 28, which shows a PEgRNA having in the 5′ to 3′ direction an extension arm (3), a spacer (1), and a gRNA core (2). The extension arm (3) is at the 5′ end of the PEgRNA. The extension arm (3) further comprises in the 3′ to 5′ direction a “primer binding site” (A), an “edit template” (B), and a “homology arm” (C). The extension arm (3) may also comprise an optional modifier region at the 3′ and 5′ ends, which may be the same sequences or different sequences. The PEgRNA may also comprise a transcriptional terminator sequence at the 3′ end. These sequence elements of the PEgRNA are further described and defined herein.

Peptide Tag

The term “peptide tag” refers to a peptide amino acid sequence that is genetically fused to a protein sequence to impart one or more functions onto the protein that facilitate the manipulation of the protein for various purposes, such as visualization, identification, localization, purification, solubilization, separation, etc. Peptide tags can include various types of tags categorized by purpose or function, which may include “affinity tags” (to facilitate protein purification), “solubilization tags” (to assist in proper folding of proteins), “chromatography tags” (to alter chromatographic properties of proteins), “epitope tags” (to bind to high affinity antibodies), and “fluorescence tags” (to facilitate visualization of proteins in a cell or in vitro).

PE1

As used herein, “PE1” refers to a PE complex comprising a fusion protein comprising Cas9(H840A) and a wild type MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)]+a desired PEgRNA, wherein the PE fusion has the amino acid sequence of SEQ ID NO: 1361515, which is shown as follows:

(SEQ ID NO: 1361515)

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALL

FDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI

VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTY

NQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDA

KLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL

TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQ

RTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENASLGTYHDLLKIIKDKDELDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDF

LKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMG

RHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDM

YVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI

TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKL

VSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA

TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQT

GGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERS

SFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL

FTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETP

GTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP

LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKR

VEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQG

FKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQI

CQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTG

TLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDP

VAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVT

TETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEI

KNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP
SGGS

KRTADGSEFEPKKKRKV

Key:

Nuclear localization sequence (NLS): top (SEQ ID NO: 1361532), bottom (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker (SEQ ID NO: 1361528)

M-MLV reverse transcriptase (SEQ ID NO: 1361485)
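For illustration, the modular [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT] layout can be sketched as a concatenation of domain sequences with recorded boundaries. In the fragment below, the two NLS sequences, the linker, and the short SGGS segment are copied from the sequence shown above; the Cas9 and reverse transcriptase entries are deliberately truncated stand-ins (the first 19 residues of each domain as shown above) so that the example stays short, and the function and label names are hypothetical.

```python
from typing import Dict, List, Tuple

NLS_TOP = "MKRTADGSEFESPKKKRKV"
NLS_BOTTOM = "KRTADGSEFEPKKKRKV"
LINKER = "SGGSSGGSSGSETP" + "GTSESATPESSGGSSGGSS"  # 33 amino acids in total

# Truncated stand-ins only, not the full domains.
CAS9_STANDIN = "DKKYSIGLDIGTNSVGWAV"
RT_STANDIN = "TLNIEDEYRLHETSKEPDV"


def assemble_fusion(domains: List[Tuple[str, str]]) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Concatenate domains N-terminus to C-terminus and record each span.

    Spans are half-open (start, end) indices into the assembled sequence.
    """
    sequence = ""
    spans: Dict[str, Tuple[int, int]] = {}
    for name, residues in domains:
        start = len(sequence)
        sequence += residues
        spans[name] = (start, len(sequence))
    return sequence, spans


fusion, spans = assemble_fusion([
    ("NLS (top)", NLS_TOP),
    ("Cas9(H840A)", CAS9_STANDIN),
    ("linker", LINKER),
    ("M-MLV RT", RT_STANDIN),
    ("SGGS", "SGGS"),
    ("NLS (bottom)", NLS_BOTTOM),
])

start, end = spans["linker"]
print(len(LINKER), fusion[start:end] == LINKER)
```

Recording the spans makes it straightforward to recover any single domain from the assembled sequence, for example to swap a wild type reverse transcriptase for a variant.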

PE2

As used herein, “PE2” refers to a PE complex comprising a fusion protein comprising Cas9(H840A) and a variant MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)]+a desired PEgRNA, wherein the PE fusion has the amino acid sequence of SEQ ID NO: 1361516, which is shown as follows:

(SEQ ID NO: 1361516)

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALL

FDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI

VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTY

NQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNEDLAEDA

KLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL

TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQ

RTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDELDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDF

LKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMG

RHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDM

YVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI

TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKL

VSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA

TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQT

GGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERS

SFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL

FTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETP

GTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP

LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKR

VEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQG

FKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQI

CQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPG

TLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDP

VAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVT

TETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEI

KNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP
SGGS

KRTADGSEFEPKKKRKV

key:

Nuclear localization sequence (NLS) Top: (SEQ ID NO: 1361532), Bottom:

(SEQ ID NO: 1361541)

Cas9 (H840A) (SEQ ID NO: 1361454)

33-amino acid linker
(SEQ ID NO: 1361528)

M-MLV reverse transcriptase (SEQ ID NO: 1361514).

PE3

As used herein, “PE3” refers to PE2 plus a second-strand nicking guide RNA that complexes with the PE2 and introduces a nick in the non-edited DNA strand in order to induce preferential replacement of the edited strand.

PE3b

As used herein, “PE3b” refers to PE3 but wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing a gRNA with a spacer sequence that matches only the edited strand, but not the original allele. Using this strategy, referred to hereafter as PE3b, mismatches between the protospacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand takes place.

PE-Short

As used herein, “PE-short” refers to a PE construct that is fused to a C-terminally truncated reverse transcriptase, and has the following amino acid sequence:

(SEQ ID NO: 1361602)

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALL

FDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI

VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKERGHFLIEGDLNPDNSDVDKLFIQLVQTY

NQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDA

KLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDL

TLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQ

RTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSG

EQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENASLGTYHDLLKIIKDKDELDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDE

LKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMG

RHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDM

YVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI

TQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKL

VSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA

TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQT

GGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERS

SFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL

FTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETP

GTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIP

LKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKR

VEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQG

FKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQI

CQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPG

TLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDP

VAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDNSRLIN
SGGSKRTADGSEFEPKKKRKV

key:

Nuclear localization sequence (NLS) Top: (SEQ ID NO: 1361532), Bottom:

(SEQ ID NO: 1361541)

Cas9 (H840A) (SEQ ID NO: 1361454)

33-amino acid linker 1
(SEQ ID NO: 1361528)

M-MLV TRUNCATED reverse transcriptase

(SEQ ID NO: 1361597)

Percent Identity

The “percent identity,” “sequence identity,” “% identity,” or “% sequence identity” (as they may be interchangeably used herein) of sequences (e.g., nucleic acid or amino acid) refers to a quantitative measurement of the similarity between two sequences (e.g., nucleic acid or amino acid). The percent identity of genomic DNA sequence, intron and exon sequence, and amino acid sequence between humans and other species varies by species type, with chimpanzee having the highest percent identity with humans of all species in each category. Percent identity can be determined using the algorithms of Karlin and Altschul, Proc. Natl. Acad. Sci. USA 87:2264-68, 1990, modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-77, 1993. Such algorithms are incorporated into the NBLAST and XBLAST programs (version 2.0) of Altschul et al., J. Mol. Biol. 215:403-10, 1990. BLAST protein searches can be performed with the XBLAST program, score=50, word length=3, to obtain amino acid sequences homologous to the protein molecules of interest. Where gaps exist between two sequences, Gapped BLAST can be utilized as described in Altschul et al., Nucleic Acids Res. 25(17):3389-3402, 1997. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used.
When a percent identity is stated or recited, or a range thereof (e.g., at least, more than, between, etc.), unless otherwise specified, the endpoints shall be inclusive and the range (e.g., at least 70% identity) shall include all ranges within the cited range (e.g., at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 95.5%, at least 96%, at least 96.5%, at least 97%, at least 97.5%, at least 98%, at least 98.5%, at least 99%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9% identity) and all increments thereof (e.g., tenths of a percent (i.e., 0.1%), hundredths of a percent (i.e., 0.01%), etc.).
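By way of illustration only, the following minimal Python sketch shows one simple way a percent identity could be computed for two sequences that have already been aligned, and how an inclusive threshold such as "at least 70% identity" could be checked. The function names, the gap character, and the choice to count only ungapped columns in the denominator are assumptions made for this sketch; BLAST-family programs may report identity using other conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption for this sketch: the inputs are pre-aligned strings of equal
    length, and gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(identity: float, threshold: float) -> bool:
    """Inclusive comparison: 'at least 70%' is satisfied by exactly 70%."""
    return identity >= threshold


if __name__ == "__main__":
    identity = percent_identity("ACGT", "ACGA")
    print(identity, meets_identity_threshold(identity, 70.0))
```

In this sketch, the four-column toy alignment has three matching columns, so the computed identity is 75%, which satisfies an "at least 70%" threshold.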

Prime Editor

The term “prime editor” refers to the herein described fusion constructs that comprise a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase and that are capable of carrying out prime editing on a target nucleotide sequence in the presence of a PEgRNA (or “extended guide RNA”). The term “prime editor” may refer to the fusion protein or to the fusion protein complexed with a PEgRNA. In some embodiments, the prime editor may also refer to the complex comprising a fusion protein (reverse transcriptase fused to a napDNAbp), a PEgRNA, and a regular guide RNA capable of directing the second-site nicking step of the non-edited strand as described herein. In certain embodiments, the reverse transcriptase component of the “prime editor” is provided in trans.

Primer Binding Site

The term “primer binding site” or “the PBS” refers to the nucleotide sequence that is located on a PEgRNA as a component of the extension arm (typically at the 3′ end of the extension arm) and that serves to bind to the primer sequence that is formed after napDNAbp (e.g., Cas9) nicking of the target sequence by the prime editor. As detailed elsewhere, when the Cas9 nickase component of a prime editor nicks one strand of the target DNA sequence, a 3′-ended ssDNA flap is formed, which serves as a primer sequence that anneals to the primer binding site on the PEgRNA to prime reverse transcription. FIGS. 27 and 28 show embodiments of the primer binding site located on a 3′ and 5′ extension arm, respectively.

Protein, Peptide, and Polypeptide

The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity, such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

Operably Linked

The term “operably linked,” as may be used herein, refers to functional linkage between a regulatory sequence and a heterologous nucleic acid sequence (e.g., transgene) resulting in expression of the heterologous nucleic acid sequence (e.g., transgene). For example, a first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked nucleic acid sequences are contiguous and, where necessary to join two protein coding regions, in the same reading frame.

Promoter

The term “promoter” is art-recognized and refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene. A promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition. For example, a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule. A subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule “inducer” for activity. Examples of inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters. A variety of constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.

Protospacer Adjacent Motif (PAM)

As used herein, the term “protospacer adjacent motif” or “PAM” refers to an approximately 2-6 base pair DNA sequence that is an important targeting component of a Cas9 nuclease. Typically, the PAM sequence is on either strand and is downstream in the 5′ to 3′ direction of the Cas9 cut site. The canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes or SpCas9) is 5′-NGG-3′, wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases. Different PAM sequences can be associated with different Cas9 nucleases or equivalent proteins from different organisms, for example, 5′-NG-3′, wherein “N” is any nucleobase followed by one guanine (“G”) nucleobase, or 5′-KKH-3′, wherein two lysine (“K”) are followed by one histidine (“H”). In addition, any given Cas9 nuclease, e.g., SpCas9, may be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes an alternative PAM sequence.

For example, with reference to the canonical SpCas9 amino acid sequence SEQ ID NO: 1361421 (SpCas9 M1 Q99ZW2 wild type), the PAM sequence can be modified by introducing one or more mutations, including (a) D1135V, R1335Q, and T1337R “the VQR variant”, which alters the PAM specificity to NGAN or NGNG, (b) D1135E, R1335Q, and T1337R “the EQR variant”, which alters the PAM specificity to NGAG, and (c) D1135V, G1218R, R1335E, and T1337R “the VRER variant”, which alters the PAM specificity to NGCG. In addition, the D1135E variant of canonical SpCas9 still recognizes NGG, but it is more selective compared to the wild type SpCas9 protein.

It will also be appreciated that Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) can have varying PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. In addition, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In still another example, Cas9 from Treponema denticola (TdCas) recognizes NAAAAC. These examples are not meant to be limiting. It will be further appreciated that non-SpCas9s bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site. Furthermore, non-SpCas9s may have other characteristics that make them more useful than SpCas9. For example, Cas9 from Staphylococcus aureus (SaCas9) is about 1 kilobase smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV). Further reference may be made to Shah et al., “Protospacer recognition motifs: mixed identities and functional diversity,” RNA Biology, 10(5): 891-899 (which is incorporated herein by reference).
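As a non-limiting illustration, the PAM motifs recited above (e.g., NGG, NGRRT, NNNNGATT, NNAGAAW, and NAAAAC) are written with standard IUPAC degenerate nucleotide codes, and a candidate DNA site can be tested against such a motif as in the following Python sketch. The function names and the dictionary of enzyme labels are hypothetical and chosen only for this example; the sketch checks sequence compatibility only and says nothing about actual nuclease activity.

```python
# Standard IUPAC degenerate nucleotide codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# PAM motifs as recited in the text; the keys are labels used only in this sketch.
PAMS = {
    "SpCas9": "NGG",
    "SaCas9": "NGRRT",
    "NmCas": "NNNNGATT",
    "StCas9": "NNAGAAW",
    "TdCas": "NAAAAC",
}


def matches_pam(site: str, pam: str) -> bool:
    """True if `site` has the same length as `pam` and fits it position by position."""
    site, pam = site.upper(), pam.upper()
    if len(site) != len(pam):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, pam))


def compatible_enzymes(downstream: str) -> list:
    """Labels of enzymes whose PAM motif matches the start of `downstream`."""
    return [
        name for name, pam in PAMS.items()
        if matches_pam(downstream[: len(pam)], pam)
    ]


if __name__ == "__main__":
    print(matches_pam("TGG", "NGG"))       # True
    print(compatible_enzymes("TGGAGTCC"))  # ['SpCas9']
```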

Protospacer

As used herein, the term “protospacer” refers to the sequence (˜20 bp) in DNA adjacent to the PAM (protospacer adjacent motif) sequence which has the same sequence as the spacer sequence of the guide RNA. The guide RNA anneals to the complement of the protospacer sequence on the target DNA (specifically, one strand thereof, i.e., the “target strand” versus the “non-target strand” of the target sequence). In order for Cas9 to function it also requires a specific protospacer adjacent motif (PAM) that varies depending on the bacterial species of the Cas9 gene. The most commonly used Cas9 nuclease, derived from S. pyogenes, recognizes a PAM sequence of NGG that is found directly downstream of the target sequence in the genomic DNA, on the non-target strand. The skilled person will appreciate that the literature in the state of the art sometimes refers to the “protospacer” as the ˜20-nt target-specific guide sequence on the guide RNA itself, rather than referring to it as a “spacer.” Thus, in some cases, the term “protospacer” as used herein may be used interchangeably with the term “spacer.” The context of the description surrounding the appearance of either “protospacer” or “spacer” will help inform the reader as to whether the term is referring to the gRNA or the DNA target. Both usages of these terms are acceptable since the state of the art uses both terms in each of these ways.
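For illustration only, the relationship between a protospacer, its adjacent NGG PAM, and the corresponding guide RNA spacer can be sketched in Python as follows. The function names are hypothetical, the sketch assumes a fixed 20-nt protospacer and the canonical SpCas9 NGG PAM, and it scans only the strand supplied (read 5′ to 3′), so sites on the opposite strand would require scanning the reverse complement as well.

```python
def find_protospacers(dna: str, length: int = 20) -> list:
    """Find (start, protospacer, pam) tuples on the given strand.

    A hit is any `length`-nt window immediately followed (5' to 3') by an
    NGG PAM. Assumption for this sketch: `dna` is the PAM-containing
    (non-target) strand written 5' to 3'.
    """
    dna = dna.upper()
    hits = []
    # i is the index of the first PAM base ("N" of NGG).
    for i in range(length, len(dna) - 2):
        if dna[i + 1 : i + 3] == "GG":
            hits.append((i - length, dna[i - length : i], dna[i : i + 3]))
    return hits


def spacer_rna(protospacer: str) -> str:
    """The guide RNA spacer has the same sequence as the protospacer, with U for T."""
    return protospacer.upper().replace("T", "U")


if __name__ == "__main__":
    site = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
    for start, proto, pam in find_protospacers(site):
        print(start, proto, pam, spacer_rna(proto))
```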

Reverse Transcriptase

The term “reverse transcriptase” describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5′-3′ RNA-directed DNA polymerase activity, 5′-3′ DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5′ and 3′ ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3′-5′ exonuclease activity necessary for proof-reading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983). Another reverse transcriptase which is used extensively in molecular biology is reverse transcriptase originating from Moloney murine leukemia virus (M-MLV). See, e.g., Gerard, G. R., DNA 5:271-279 (1986) and Kotewicz, M. L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking in RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797. The invention contemplates the use of any such reverse transcriptases, or variants or mutants thereof.

In addition, the invention contemplates the use of reverse transcriptases which are error-prone, i.e., which may be referred to as error-prone reverse transcriptases or reverse transcriptases which do not support high fidelity incorporation of nucleotides during polymerization. During synthesis of the single-strand DNA flap based on the RT template integrated with the guide RNA, the error-prone reverse transcriptase can introduce one or more nucleotides which are mismatched with the DNA synthesis template (e.g., RT template sequence), thereby introducing changes to the nucleotide sequence through erroneous polymerization of the single-strand DNA flap. These errors introduced during synthesis of the single strand DNA flap then become integrated into the double strand molecule through hybridization to the corresponding endogenous target strand, removal of the endogenous displaced strand, ligation, and then through one or more rounds of endogenous DNA repair and/or replication.

Reverse Transcription

As used herein, the term “reverse transcription” indicates the capability of an enzyme to synthesize a DNA strand (that is, complementary DNA or cDNA) using RNA as a template. In some embodiments, the reverse transcription can be “error-prone reverse transcription,” which refers to the properties of certain reverse transcriptase enzymes which are error-prone in their DNA polymerization activity.

Sense Strand

In genetics, a “sense” strand is the segment within double-stranded DNA that runs from 5′ to 3′, and which is complementary to the antisense strand of DNA, or template strand, which runs from 3′ to 5′. In the case of a DNA segment that encodes a protein, the sense strand is the strand of DNA that has the same sequence as the mRNA, which takes the antisense strand as its template during transcription, and eventually undergoes (typically, not always) translation into a protein. The antisense strand is thus responsible for the RNA that is later translated to protein, while the sense strand possesses a nearly identical makeup to that of the mRNA. Note that for each segment of dsDNA, there will possibly be two sets of sense and antisense, depending on which direction one reads (since sense and antisense is relative to perspective). It is ultimately the gene product, or mRNA, that dictates which strand of one segment of dsDNA is referred to as sense or antisense.

In the context of a PEgRNA, the first step is the synthesis of a single-strand complementary DNA (i.e., the 3′ ssDNA flap, which becomes incorporated) oriented in the 5′ to 3′ direction which is templated off of the PEgRNA extension arm. Whether the 3′ ssDNA flap should be regarded as a sense or antisense strand depends on the direction of transcription since it is well accepted that both strands of DNA may serve as a template for transcription (but not at the same time). Thus, in some embodiments, the 3′ ssDNA flap (which overall runs in the 5′ to 3′ direction) will serve as the sense strand because it is the coding strand. In other embodiments, the 3′ ssDNA flap (which overall runs in the 5′ to 3′ direction) will serve as the antisense strand and thus, the template for transcription.
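The sense/antisense relationship described above can be illustrated with the following short Python sketch, which is provided as an example only. The function names and the six-nucleotide toy sequence are assumptions made for this sketch. It shows that an mRNA templated by the antisense strand has the same sequence as the sense strand, with uracil in place of thymine.

```python
# Translation table for Watson-Crick complements of the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA strand written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe_from_template(antisense_5to3: str) -> str:
    """mRNA (5' to 3') produced from an antisense (template) strand written 5' to 3'."""
    return reverse_complement(antisense_5to3).replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCC"                      # toy coding (sense) strand, 5' to 3'
    antisense = reverse_complement(sense)  # template strand, 5' to 3'
    mrna = transcribe_from_template(antisense)
    print(antisense, mrna, mrna == sense.replace("T", "U"))
```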

Second Strand Nicking

As used herein, the concept of “second strand nicking” refers to the introduction of a second nick at a location downstream of the first nick (i.e., the initial nick site that provides the free 3′ end for use in priming of the reverse transcriptase on the extended portion of the guide RNA). In some embodiments, the first nick and the second nick are on opposite strands. In another embodiment, the first nick is on the non-target strand (i.e., the strand that forms the single strand portion of the R-loop), and the second nick is on the target strand. The second nick is positioned at least 5 nucleotides downstream of the first nick, or at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides downstream of the first nick. Without being bound by theory, the second nick induces the cell's endogenous DNA repair and replication processes towards replacement of the unedited strand. In some embodiments, the edited strand is the non-target strand and the unedited strand is the target strand. In other embodiments, the edited strand is the target strand, and the unedited strand is the non-target strand.

Spacer Sequence

As used herein, the term “spacer sequence” in connection with a guide RNA or a PEgRNA refers to the portion of the guide RNA or PEgRNA of about 10 to about 40 (e.g., about 10, about 15, about 20, about 25, about 30) nucleotides which contains a nucleotide sequence that is complementary to the protospacer sequence in the target DNA sequence. The spacer sequence anneals to the protospacer sequence to form a ssRNA/ssDNA hybrid structure at the target site and a corresponding R loop ssDNA structure of the endogenous DNA strand that is complementary to the protospacer sequence.

Subject

The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.

Target Site

The term “target site” refers to a sequence within a nucleic acid molecule that is edited by a prime editor disclosed herein. The target site further refers to the sequence within a nucleic acid molecule to which a complex of the prime editor and gRNA binds.

Temporal Second-Strand Nicking

As used herein, the term “temporal second-strand nicking” refers to a variant of second strand nicking whereby the installation of the second nick in the unedited strand occurs only after the desired edit is installed in the edited strand. This avoids concurrent nicks on both strands that could lead to double-stranded DNA breaks. The second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing a gRNA with a spacer sequence that matches only the edited strand, but not the original allele. Using this strategy, mismatches between the protospacer and the unedited allele should disfavor nicking by the sgRNA until after the editing event on the PAM strand takes place.
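By way of a simplified, non-limiting example, the design criterion described above (a nicking spacer that matches the edited sequence but not the original allele) can be expressed as a sequence check in Python. The function names and toy sequences below are hypothetical and serve only this sketch; the check compares sequences and does not model how the number or position of mismatches affects nicking in a cell.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def is_edit_specific_spacer(spacer: str, original: str, edited: str) -> bool:
    """True if the spacer (as DNA) matches the edited protospacer exactly
    and has at least one mismatch to the original (unedited) protospacer."""
    return (
        count_mismatches(spacer, edited) == 0
        and count_mismatches(spacer, original) > 0
    )


if __name__ == "__main__":
    original = "ACGTACGTACGTACGTACGT"  # toy unedited protospacer
    edited = "ACGTACGTACGTACGTACTT"    # same site after a single-base edit
    print(is_edit_specific_spacer(edited, original, edited))    # True
    print(is_edit_specific_spacer(original, original, edited))  # False
```

In this toy case, a spacer designed against the edited sequence has one mismatch to the original allele, consistent with the strategy of disfavoring the second nick until after the edit is installed.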

tPERT

See definition for “trans prime editor RNA template (tPERT).”

Trans Prime Editing

As used herein, the term “trans prime editing” refers to a modified form of prime editing that utilizes a split PEgRNA, i.e., wherein the PEgRNA is separated into two separate molecules: an sgRNA and a trans prime editing RNA template (tPERT). The sgRNA serves to target the prime editor (or more generally, to target the napDNAbp component of the prime editor) to the desired genomic target site, while the tPERT is used by the polymerase (e.g., a reverse transcriptase) to write new DNA sequence into the target locus once the tPERT is recruited in trans to the prime editor by the interaction of binding domains located on the prime editor and on the tPERT. In one embodiment, the binding domains can include RNA-protein recruitment moieties, such as an MS2 aptamer located on the tPERT and an MS2cp protein fused to the prime editor. An advantage of trans prime editing is that by separating the DNA synthesis template from the guide RNA, one can potentially use longer length templates.

An embodiment of trans prime editing is shown in FIGS. 3G and 3H. FIG. 3G shows the composition of the trans prime editor complex on the left (“RP-PE:gRNA complex”), which comprises a napDNAbp fused to each of a polymerase (e.g., a reverse transcriptase) and a tPERT recruiting protein (e.g., MS2sc), and which is complexed with a guide RNA. FIG. 3G further shows a separate tPERT molecule, which comprises the extension arm features of a PEgRNA, including the DNA synthesis template and the primer binding sequence. The tPERT molecule also includes an RNA-protein recruitment domain (which, in this case, is a stem loop structure and can be, for example, an MS2 aptamer). As depicted in the process described in FIG. 3H, the RP-PE:gRNA complex binds to and nicks the target DNA sequence. Then, the recruiting protein (RP) recruits a tPERT to co-localize to the prime editor complex bound to the DNA target site, thereby allowing the primer binding site to bind to the primer sequence on the nicked strand, and subsequently, allowing the polymerase (e.g., RT) to synthesize a single strand of DNA against the DNA synthesis template up through the 5′ end of the tPERT.

While the tPERT is shown in FIG. 3G and FIG. 3H as comprising the PBS and DNA synthesis template on the 5′ end of the RNA-protein recruitment domain, the tPERT in other configurations may be designed with the PBS and DNA synthesis template located on the 3′ end of the RNA-protein recruitment domain. However, the tPERT with the 5′ extension has the advantage that synthesis of the single strand of DNA will naturally terminate at the 5′ end of the tPERT and thus, does not risk using any portion of the RNA-protein recruitment domain as a template during the DNA synthesis stage of prime editing.

Trans Prime Editor RNA Template (tPERT)

As used herein, a “trans prime editor RNA template (tPERT)” refers to a component used in trans prime editing, a modified version of prime editing which operates by separating the PEgRNA into two distinct molecules: a guide RNA and a tPERT molecule. The tPERT molecule is programmed to co-localize with the prime editor complex at a target DNA site, bringing the primer binding site and the DNA synthesis template to the prime editor in trans. For example, see FIG. 3G for an embodiment of a trans prime editor (tPE) which shows a two-component system comprising (1) an RP-PE:gRNA complex and (2) a tPERT that includes the primer binding site and the DNA synthesis template joined to an RNA-protein recruitment domain, wherein the RP (recruiting protein) component of the RP-PE:gRNA complex recruits the tPERT to a target site to be edited, thereby associating the PBS and DNA synthesis template with the prime editor in trans. Said another way, the tPERT is engineered to contain (all or part of) the extension arm of a PEgRNA, which includes the primer binding site and the DNA synthesis template.

Transitions

As used herein, “transitions” refer to the interchange of purine nucleobases (A↔G) or the interchange of pyrimidine nucleobases (C↔T). This class of interchanges involves nucleobases of similar shape. These changes involve A↔G, G↔A, C↔T, or T↔C. In the context of a double-strand DNA with Watson-Crick paired nucleobases, transitions refer to the following base pair exchanges: A:T↔G:C, G:C↔A:T, C:G↔T:A, or T:A↔C:G. The compositions and methods disclosed herein are capable of inducing one or more transitions in a target DNA molecule. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions in the same target DNA molecule, as well as other nucleotide changes, including deletions and insertions.

Transversions

As used herein, “transversions” refer to the interchange of purine nucleobases for pyrimidine nucleobases, or the reverse, and thus involve the interchange of nucleobases with dissimilar shape. These changes involve T↔A, T↔G, C↔G, C↔A, A↔T, A↔C, G↔C, and G↔T. In the context of a double-strand DNA with Watson-Crick paired nucleobases, transversions refer to the following base pair exchanges: T:A↔A:T, T:A↔G:C, C:G↔G:C, C:G↔A:T, A:T↔T:A, A:T↔C:G, G:C↔C:G, and G:C↔T:A. The compositions and methods disclosed herein are capable of inducing one or more transversions in a target DNA molecule. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions in the same target DNA molecule, as well as other nucleotide changes, including deletions and insertions.
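For illustration only, the distinction between transitions and transversions defined above can be captured in a few lines of Python. The function name is hypothetical and used only for this sketch. A substitution is a transition when both nucleobases are purines or both are pyrimidines, and a transversion otherwise.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]:
        print(ref, alt, classify_substitution(ref, alt))
```

Of the twelve possible single-base substitutions, this classification yields four transitions and eight transversions, matching the changes enumerated in the two definitions above.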

Treatment

As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.

Trinucleotide Repeat Disorder

As used herein, a “trinucleotide repeat disorder” (or alternatively, “expansion repeat disorder” or “repeat expansion disorder”) refers to a set of genetic disorders which are caused by “trinucleotide repeat expansion,” which is a kind of mutation where a certain trinucleotide repeats in certain genes or introns. Trinucleotide repeats were once thought to be commonplace iterations in the genome, but research in the 1990s clarified their role in these disorders. These apparently ‘benign’ stretches of DNA can sometimes expand and cause disease. Several defining features are shared amongst disorders caused by trinucleotide repeat expansions. First, the mutant repeats show both somatic and germline instability and, more frequently, they expand rather than contract in successive transmissions. Second, an earlier age of onset and increasing severity of phenotype in subsequent generations (anticipation) generally are correlated with larger repeat length. Finally, the parental origin of the disease allele can often influence anticipation, with paternal transmissions carrying a greater risk of expansion for many of these disorders.

Triplet expansion is thought to be caused by slippage during DNA replication. Due to the repetitive nature of the DNA sequence in these regions, ‘loop out’ structures may form during DNA replication while maintaining complementary base pairing between the parent strand and the daughter strand being synthesized. If the loop out structure is formed from sequence on the daughter strand, this will result in an increase in the number of repeats. However, if the loop out structure is formed on the parent strand, a decrease in the number of repeats occurs. It appears that expansion of these repeats is more common than reduction. Generally, the larger the expansion, the more likely it is to cause disease or increase the severity of disease. This property results in the characteristic of anticipation seen in trinucleotide repeat disorders. Anticipation describes the tendency of age of onset to decrease and severity of symptoms to increase through successive generations of an affected family due to the expansion of these repeats.

Nucleotide repeat disorders may include those in which the triplet repeat occurs in a non-coding region (i.e., a non-coding trinucleotide repeat disorder) or in a coding region.

The prime editor (PE) system described herein may be used to treat nucleotide repeat disorders, which may include fragile X syndrome (FRAXA), fragile XE MR (FRAXE), Friedreich ataxia (FRDA), myotonic dystrophy (DM), spinocerebellar ataxia type 8 (SCA8), and spinocerebellar ataxia type 12 (SCA12), among others.

Prime Editing or “Prime Editing (PE)”

As used herein, the term “prime editing” or “prime editing (PE)” refers to a novel approach for gene editing using napDNAbps and specialized guide RNAs as described in the present application and which is exemplified in the embodiments of FIGS. 1A-1J. TPRT refers to “target-primed reverse transcription” because the target DNA molecule is used, in one embodiment, to prime the synthesis of a strand of DNA by reverse transcriptase (or another polymerase). In various embodiments, prime editing operates by contacting a target DNA molecule (for which a change in the nucleotide sequence is desired to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editor guide RNA. In reference to FIG. 1E, the prime editor guide RNA comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA, and encodes the desired nucleotide change (e.g., single nucleotide change, insertion, or deletion). In step (a), the napDNAbp/extended gRNA complex contacts the DNA molecule and the extended gRNA guides the napDNAbp to bind to a target locus. In step (b), a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. In some embodiments, the nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence, i.e., the “non-target strand.” The nick, however, could be introduced in either of the strands. That is, the nick could be introduced into the “target strand” (i.e., the strand that hybridizes to the spacer of the extended gRNA) or the “non-target strand” (i.e., the strand forming the single-stranded portion of the R-loop and which is complementary to the target strand).
In step (c), the 3′ end of the DNA strand (formed by the nick) interacts with the extended portion of the guide RNA in order to prime reverse transcription (i.e., “target-primed RT”). In some embodiments, the 3′ end DNA strand hybridizes to a specific primer binding site on the extended portion of the guide RNA, i.e., the “reverse transcriptase priming sequence.” In step (d), a reverse transcriptase is introduced which synthesizes a single strand of DNA from the 3′ end of the primed site towards the 3′ end of the prime editor guide RNA. This forms a single-strand DNA flap comprising the desired nucleotide change (e.g., the single base change, insertion, or deletion, or a combination thereof) and which is otherwise homologous to the endogenous DNA at or adjacent to the nick site. In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to the resolution of the single strand DNA flap such that the desired nucleotide change becomes incorporated into the target locus. This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the endogenous DNA sequence. Without being bound by theory, the cell's endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change(s) to form the desired altered product. The process can also be driven towards product formation with “second strand nicking,” as exemplified in FIG. 1D. This process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions.

The term “prime editor (PE) system” or “prime editor” or “PE system” or “PE editing system” refers to the compositions involved in the method of genome editing using target-primed reverse transcription (TPRT) described herein, including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editor guide RNAs, and complexes comprising fusion proteins and prime editor guide RNAs, as well as accessory elements, such as second strand nicking components and 5′ endogenous DNA flap removal endonucleases for helping to drive the prime editing process towards the edited product formation.

Upstream

Variant

As used herein, the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Cas9 is a Cas9 comprising one or more changes in amino acid residues as compared to a wild type Cas9 amino acid sequence. The term “variant” encompasses homologous proteins having at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence. The term also encompasses mutants, truncations, or domains of a reference sequence, and which display the same or substantially the same functional activity or activities as the reference sequence.

Vector

The term “vector,” as used herein, refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter into a host cell, mutate and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.

Wild Type

As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.

5′ Endogenous DNA Flap Removal

As used herein, the term “5′ endogenous DNA flap removal” or “5′ flap removal” refers to the removal of the 5′ endogenous DNA flap that forms when the RT-synthesized single-strand DNA flap competitively invades and hybridizes to the endogenous DNA, displacing the endogenous strand in the process. Removing this endogenous displaced strand can drive the reaction towards the formation of the desired product comprising the desired nucleotide change. The cell's own DNA repair enzymes (e.g., a flap endonuclease, such as EXO1 or FEN1) may catalyze the removal or excision of the 5′ endogenous flap. Also, host cells may be transformed to express one or more enzymes (e.g., a flap endonuclease) that catalyze the removal of said 5′ endogenous flaps, thereby driving the process toward product formation. Flap endonucleases are known in the art and can be found described in Patel et al., “Flap endonucleases pass 5′-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5′-ends,” Nucleic Acids Research, 2012, 40(10): 4507-4519 and Tsutakawa et al., “Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily,” Cell, 2011, 145(2): 198-211 (each of which is incorporated herein by reference).

5′ Endogenous DNA Flap

As used herein, the term “5′ endogenous DNA flap” refers to the strand of DNA situated immediately downstream of the PE-induced nick site in the target DNA. The nicking of the target DNA strand by PE exposes a 3′ hydroxyl group on the upstream side of the nick site and a 5′ hydroxyl group on the downstream side of the nick site. The endogenous strand ending in the 3′ hydroxyl group is used to prime the DNA polymerase of the prime editor (e.g., wherein the DNA polymerase is a reverse transcriptase). The endogenous strand on the downstream side of the nick site and which begins with the exposed 5′ hydroxyl group is referred to as the “5′ endogenous DNA flap” and is ultimately removed and replaced by the newly synthesized replacement strand (i.e., “3′ replacement DNA flap”) encoded by the extension of the PEgRNA.

3′ Replacement DNA Flap

As used herein, the term “3′ replacement DNA flap” or simply, “replacement DNA flap,” refers to the strand of DNA that is synthesized by the prime editor and which is encoded by the extension arm of the PEgRNA. More particularly, the 3′ replacement DNA flap is encoded by the polymerase template of the PEgRNA. The 3′ replacement DNA flap comprises the same sequence as the 5′ endogenous DNA flap except that it also contains the edited sequence (e.g., single nucleotide change). The 3′ replacement DNA flap anneals to the target DNA, displacing or replacing the 5′ endogenous DNA flap (which can be excised, for example, by a 5′ flap endonuclease, such as FEN1 or EXO1) and then is ligated to join the 3′ end of the 3′ replacement DNA flap to the exposed 5′ hydroxyl end of endogenous DNA (exposed after excision of the 5′ endogenous DNA flap), thereby reforming a phosphodiester bond and installing the 3′ replacement DNA flap to form a heteroduplex DNA containing one edited strand and one unedited strand. DNA repair processes resolve the heteroduplex by copying the information in the edited strand to the complementary strand, which permanently installs the edit into the DNA. This resolution process can be driven further to completion by nicking the unedited strand, i.e., by way of “second-strand nicking,” as described herein.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

The present invention discloses new compositions (e.g., new PEgRNA and PE complexes comprising same) and methods for using prime editing (PE) to repair therapeutic targets, e.g., those targets identified in the ClinVar database, using PEgRNA designed using a specialized algorithm that is described herein. Thus, the present application discloses an algorithm for predicting on a large scale the sequences for PEgRNA that may be used to repair therapeutic targets (e.g., those included in the ClinVar database). In addition, the present application discloses predicted sequences for therapeutic PEgRNA designed using the disclosed algorithm and which may be used with prime editing to repair therapeutic targets.

The herein disclosed algorithm and the predicted PEgRNA sequences relate in general to prime editing. Thus, this disclosure also provides a description for the various components and aspects of prime editing, including suitable napDNAbp (e.g., Cas9 nickase) and reverse transcriptases, as well as other suitable components (e.g., linkers, NLS) and PE fusion proteins, that may be used with the therapeutic PEgRNA disclosed herein.

Adoption of the clustered regularly interspaced short palindromic repeat (CRISPR) system for genome editing has revolutionized the life sciences^1-3. Although gene disruption using CRISPR is now routine, the precise installation of single nucleotide edits remains a major challenge, despite being necessary for studying or correcting a large number of disease-causative mutations. Homology directed repair (HDR) is capable of achieving such edits, but suffers from low efficiency (often <5%), a requirement for donor DNA repair templates, and deleterious effects of double-stranded DNA break (DSB) formation. Recently, Prof. David Liu et al.'s laboratory developed base editing, which achieves efficient single nucleotide editing without DSBs. Base editors (BEs) combine the CRISPR system with base-modifying deaminase enzymes to convert target C•G or A•T base pairs to A•T or G•C, respectively^4-6. Although already widely used by researchers worldwide, current BEs enable only four of the twelve possible base pair conversions and are unable to correct small insertions or deletions. Moreover, the targeting scope of base editing is limited by the editing of non-target C or A bases adjacent to the target base (“bystander editing”) and by the requirement that a PAM sequence exist 15±2 bp from the target base. Overcoming these limitations would therefore greatly broaden the basic research and therapeutic applications of genome editing.

The present disclosure proposes a new precision editing approach that offers many of the benefits of base editing—namely, avoidance of double strand breaks and donor DNA repair templates—while overcoming its major limitations. The proposed approach described herein achieves the direct installation of edited DNA strands at target genomic sites using target-primed reverse transcription (TPRT). In the design discussed herein, CRISPR guide RNA (gRNA) will be engineered to carry a reverse transcriptase (RT) template sequence encoding a single-stranded DNA comprising a desired nucleotide change. The CRISPR nuclease (Cas9)-nicked target site DNA will serve as the primer for reverse transcription of the template sequence on the modified gRNA, allowing for direct incorporation of any desired nucleotide edit.

Accordingly, the present invention relates in part to the discovery that the mechanism of target-primed reverse transcription (TPRT) can be leveraged or adapted for conducting precision CRISPR/Cas-based genome editing with high efficiency and genetic flexibility (e.g., as depicted in various embodiments of FIGS. 1A-1G). The inventors have proposed herein to use napDNAbp-polymerase fusions (e.g., Cas9 nickase fused to a reverse transcriptase) to target a specific DNA sequence with a modified guide RNA (“an extended guide RNA” or PEgRNA), generate a single strand nick at the target site, and use the nicked DNA as a primer for synthesis of DNA by a polymerase (e.g., reverse transcriptase) based on a DNA synthesis template that is a component of the PEgRNA. The newly synthesized strand would be homologous to the genomic target sequence except for the inclusion of a desired nucleotide change (e.g., a single nucleotide change, a deletion, or an insertion, or a combination thereof). The newly synthesized strand of DNA may be referred to as a single strand DNA flap, which would compete for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. Resolution of this hybridized intermediate can include removal of the resulting displaced flap of endogenous DNA (e.g., with a 5′ end DNA flap endonuclease, FEN1), ligation of the synthesized single strand DNA flap to the target DNA, and assimilation of the desired nucleotide change as a result of cellular DNA repair and/or replication processes. Because templated DNA synthesis offers single nucleotide precision, the scope of this approach is very broad and could foreseeably be used for myriad applications in basic science and therapeutics.

I. Therapeutic PEgRNAs

The prime editor (PE) system described herein contemplates the use of any suitable prime editor guide RNA or PEgRNA. The inventors have discovered that the mechanism of target-primed reverse transcription (TPRT) can be leveraged or adapted for conducting precision and versatile CRISPR/Cas-based genome editing through the use of a specially configured guide RNA comprising a DNA synthesis template that codes for the desired nucleotide change by a polymerase (e.g., reverse transcriptase). The application refers to this specially configured guide RNA as a “prime editor guide RNA” (or PEgRNA) since the DNA synthesis template can be provided as an extension of a standard or traditional guide RNA molecule. The application contemplates any suitable configuration or arrangement for the prime editor guide RNA.

In various embodiments, the disclosure provides therapeutic PEgRNA of SEQ ID NOs: 1-135514 and 813085-880462 designed using the herein disclosed algorithm against ClinVar database entries.

FIG. 3A shows one embodiment of a prime editor guide RNA (referred to as either a “PEgRNA” or an “extended gRNA”) usable in the prime editor (PE) system disclosed herein whereby a traditional guide RNA (the green portion) includes a spacer and a gRNA core region, which binds with the napDNAbp. In this embodiment, the guide RNA includes an extended RNA segment at the 5′ end, i.e., a 5′ extension. In this embodiment, the 5′ extension includes a DNA synthesis template, a primer binding site, and an optional 5-20 nucleotide linker sequence. As shown in FIG. 1A, the primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming the polymerase (e.g., reverse transcriptase) for DNA polymerization in the 5′ to 3′ direction.

FIG. 3B shows another embodiment of a prime editor guide RNA usable in the prime editor (PE) system disclosed herein whereby a traditional guide RNA (the green portion) includes a ~20 nt spacer and a gRNA core, which binds with the napDNAbp. In this embodiment, the guide RNA includes an extended RNA segment at the 3′ end, i.e., a 3′ extension. In this embodiment, the 3′ extension includes a DNA synthesis template and a primer binding site. As shown in FIG. 1B, the primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming the polymerase for DNA polymerization in the 5′ to 3′ direction.
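By way of non-limiting illustration only, the relationship between a nicked strand, a primer binding site, and a DNA synthesis template in a 3′ extension can be sketched as follows. This is not the algorithm disclosed herein. The function `design_3prime_extension`, its parameters, and the toy sequences are hypothetical, and the sketch assumes that the extension is written 5′ to 3′ as the DNA synthesis template followed by the primer binding site, that the edit is a same-length substitution, and that `nick_index` counts the bases lying 5′ of the nick on the nicked strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_3prime_extension(nicked_strand: str, nick_index: int,
                            pbs_len: int, edited_downstream: str) -> str:
    """Return a hypothetical 3' extension as RNA, written 5'->3'.

    nicked_strand      5'->3' DNA sequence of the strand that is nicked
    nick_index         number of bases located 5' of the nick
    pbs_len            desired primer binding site length
    edited_downstream  desired 5'->3' DNA sequence starting at the nick
    """
    if not 0 < pbs_len <= nick_index <= len(nicked_strand):
        raise ValueError("primer binding site must fit 5' of the nick")
    # The free 3' end created by the nick acts as the primer.
    primer = nicked_strand[nick_index - pbs_len:nick_index]
    pbs = reverse_complement(primer)
    # The template is complementary to the desired edited flap.
    template = reverse_complement(edited_downstream)
    return (template + pbs).replace("T", "U")


if __name__ == "__main__":
    strand = "ACGTACGTAAGGCTTCA"          # toy sequence, nick after base 10
    print(design_3prime_extension(strand, 10, 4, "GGATTCA"))
```

In this toy example the endogenous sequence 3′ of the nick is GGCTTCA and the desired sequence is GGATTCA, so the template portion of the extension carries a single C-to-A substitution.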

FIG. 3C shows another embodiment of an extended guide RNA usable in the prime editor (PE) system disclosed herein whereby a traditional guide RNA (the green portion) includes a ~20 nt spacer and a gRNA core, which binds with the napDNAbp. In this embodiment, the guide RNA includes an extended RNA segment at an intramolecular position within the gRNA core, i.e., an intramolecular extension. In this embodiment, the intramolecular extension includes a DNA synthesis template and a primer binding site. The primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming the polymerase for DNA polymerization in the 5′ to 3′ direction.

In one embodiment, the position of the intramolecular RNA extension is in the spacer of the guide RNA. In another embodiment, the position of the intramolecular RNA extension is in the gRNA core. In still another embodiment, the position of the intramolecular RNA extension is anywhere within the guide RNA molecule except within the spacer, or at a position which disrupts the spacer.

In one embodiment, the intramolecular RNA extension is inserted downstream from the 3′ end of the spacer. In another embodiment, the intramolecular RNA extension is inserted at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides downstream of the 3′ end of the spacer.

In other embodiments, the intramolecular RNA extension is inserted into the gRNA core, which refers to the portion of the guide RNA corresponding to or comprising the tracrRNA, which binds and/or interacts with the Cas9 protein or equivalent thereof (i.e., a different napDNAbp). Preferably, the insertion of the intramolecular RNA extension does not disrupt or minimally disrupts the interaction between the tracrRNA portion and the napDNAbp.

The length of the RNA extension can be any useful length. In various embodiments, the RNA extension is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.

The DNA synthesis template (e.g., RT template sequence) can also be any suitable length. For example, the DNA synthesis template (e.g., RT template sequence) can be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.

In still other embodiments, the reverse transcription primer binding site sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, or at least 100 nucleotides in length.

The DNA synthesis template (e.g., RT template sequence), in some embodiments, encodes a single-stranded DNA molecule which is homologous to the non-target strand (and thus, complementary to the corresponding site of the target strand) but includes one or more nucleotide changes. The nucleotide change may include one or more single-base nucleotide changes, one or more deletions, one or more insertions, and combinations thereof.

As depicted in FIG. 1E, the synthesized single-stranded DNA product of the DNA synthesis template (e.g., RT template sequence) is homologous to the non-target strand and contains one or more nucleotide changes. The single-stranded DNA product of the DNA synthesis template (e.g., RT template sequence) hybridizes in equilibrium with the complementary target strand sequence, thereby displacing the homologous endogenous target strand sequence. The displaced endogenous strand may be referred to in some embodiments as a 5′ endogenous DNA flap species (e.g., see FIG. 1C). This 5′ endogenous DNA flap species can be removed by a 5′ flap endonuclease (e.g., FEN1) and the single-stranded DNA product, now hybridized to the endogenous target strand, may be ligated, thereby creating a mismatch between the endogenous sequence and the newly synthesized strand. The mismatch may be resolved by the cell's innate DNA repair and/or replication processes.
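By way of non-limiting illustration only, the synthesis of the single-stranded DNA product and its installation in place of the displaced endogenous flap can be modeled with simple string operations. The functions `reverse_transcribe` and `install_flap`, the toy sequences, and the restriction to same-length substitutions are assumptions made for this sketch; cellular flap excision, ligation, and mismatch repair are not modeled.

```python
_RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_transcribe(template_rna: str) -> str:
    """Return the DNA strand (5'->3') templated by an RNA sequence (5'->3')."""
    return "".join(_RNA_TO_DNA[base] for base in reversed(template_rna.upper()))


def install_flap(nicked_strand: str, nick_index: int, flap: str):
    """Replace the endogenous sequence 3' of the nick with the new flap.

    Returns the edited strand and the positions at which the edited strand
    differs from the original, i.e., the positions that would be mismatched
    against the unedited complementary strand in the heteroduplex.
    """
    end = nick_index + len(flap)
    if end > len(nicked_strand):
        raise ValueError("flap extends past the end of the strand")
    original = nicked_strand.upper()
    edited = original[:nick_index] + flap + original[end:]
    mismatches = [i for i in range(nick_index, end) if edited[i] != original[i]]
    return edited, mismatches


if __name__ == "__main__":
    flap = reverse_transcribe("UGAAUCC")        # toy RT template
    edited, mismatches = install_flap("ACGTACGTAAGGCTTCA", 10, flap)
    print(flap, edited, mismatches)
```

The mismatch positions returned by this sketch correspond to the sites at which the cell's DNA repair and/or replication processes would need to resolve the heteroduplex for the edit to be installed on both strands.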

In various embodiments, the nucleotide sequence of the DNA synthesis template (e.g., RT template sequence) corresponds to the nucleotide sequence of the non-target strand which becomes displaced as the 5′ flap species and which overlaps with the site to be edited.

In various embodiments of the prime editor guide RNAs, the DNA synthesis template may encode a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to a nick site, wherein the single-strand DNA flap comprises a desired nucleotide change. The single-stranded DNA flap may displace an endogenous single-strand DNA at the nick site. The displaced endogenous single-strand DNA at the nick site can have a 5′ end and form an endogenous flap, which can be excised by the cell. In various embodiments, excision of the 5′ end endogenous flap can help drive product formation since removing the 5′ end endogenous flap encourages hybridization of the single-strand 3′ DNA flap to the corresponding complementary DNA strand, and the incorporation or assimilation of the desired nucleotide change carried by the single-strand 3′ DNA flap into the target DNA.

In various embodiments of the prime editor guide RNAs, the cellular repair of the single-strand DNA flap results in installation of the desired nucleotide change, thereby forming a desired product.

In still other embodiments, the desired nucleotide change is installed in an editing window that is between about −5 to +5 of the nick site, or between about −10 to +10 of the nick site, or between about −20 to +20 of the nick site, or between about −30 to +30 of the nick site, or between about −40 to +40 of the nick site, or between about −50 to +50 of the nick site, or between about −60 to +60 of the nick site, or between about −70 to +70 of the nick site, or between about −80 to +80 of the nick site, or between about −90 to +90 of the nick site, or between about −100 to +100 of the nick site, or between about −200 to +200 of the nick site.
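By way of non-limiting illustration only, whether a desired change falls within a given editing window can be checked with simple arithmetic. The names `in_editing_window` and `smallest_listed_window` are hypothetical. The sketch assumes that the edit and nick positions are expressed on the same strand in the same coordinate system, that positive offsets lie on the 3′ side of the nick, and that the term “about” is ignored.

```python
from typing import Optional

# Half-widths of the editing windows recited above (-5 to +5, ..., -200 to +200).
WINDOW_HALF_WIDTHS = (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200)


def in_editing_window(edit_position: int, nick_position: int,
                      half_width: int) -> bool:
    """Return True if the edit lies within -half_width..+half_width of the nick."""
    offset = edit_position - nick_position
    return -half_width <= offset <= half_width


def smallest_listed_window(offset: int) -> Optional[int]:
    """Return the smallest recited half-width containing the offset, if any."""
    for half_width in WINDOW_HALF_WIDTHS:
        if abs(offset) <= half_width:
            return half_width
    return None


if __name__ == "__main__":
    print(in_editing_window(105, 100, 5))
    print(smallest_listed_window(-7))
```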

In various aspects, the prime editor guide RNAs are modified versions of a guide RNA. Guide RNAs may be naturally occurring, expressed from an encoding nucleic acid, or synthesized chemically. Methods are well known in the art for obtaining or otherwise synthesizing guide RNAs and for determining the appropriate sequence of the guide RNA, including the spacer which interacts and hybridizes with the target strand of a genomic target site of interest.

In various embodiments, the particular design aspects of a guide RNA sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the desired site to be edited) and the type of napDNAbp (e.g., Cas9 protein) present in the prime editor (PE) system described herein, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.

In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
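By way of non-limiting illustration only, a degree of complementarity can be computed for the simplest case of an ungapped, full-length, antiparallel comparison between an RNA guide sequence and a DNA target strand. This sketch does not implement any of the alignment algorithms named above. The function `percent_complementarity` is hypothetical, it counts only Watson-Crick pairs (G:U wobble pairs are treated as mismatches), and it assumes both sequences are written 5′ to 3′ and have equal length.

```python
# Watson-Crick pairs written as (guide RNA base, target DNA base).
_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions that pair with the antiparallel target."""
    guide, target = guide_rna.upper(), target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # The strands pair antiparallel, so the guide's 5' base faces the
    # target's 3' base; reversing the target lines the two up.
    paired = sum((g, t) in _PAIRS for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)


if __name__ == "__main__":
    print(percent_complementarity("GACU", "AGTC"))
    print(percent_complementarity("GACU", "AGTT"))
```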

In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a base editor to a target sequence may be assessed by any suitable assay. For example, the components of a base editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a base editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a base editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.

A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 1361548) where NNNNNNNNNNNNXGG (SEQ ID NO: 1361549) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGG (SEQ ID NO: 1361550) where NNNNNNNNNNNXGG (SEQ ID NO: 1361551) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. For the S. thermophilus CRISPR1 Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXXAGAAW (SEQ ID NO: 1361552) where NNNNNNNNNNNNXXAGAAW (SEQ ID NO: 1361553) (N is A, G, T, or C; X can be anything; and W is A or T) has a single occurrence in the genome. A unique target sequence in a genome may include an S. thermophilus CRISPR1 Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXXAGAAW (SEQ ID NO: 1361554) where NNNNNNNNNNNXXAGAAW (SEQ ID NO: 1361555) (N is A, G, T, or C; X can be anything; and W is A or T) has a single occurrence in the genome. For the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG (SEQ ID NO: 1361556) where NNNNNNNNNNNNXGGXG (SEQ ID NO: 1361557) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGGXG (SEQ ID NO: 1361558) where NNNNNNNNNNNXGGXG (SEQ ID NO: 1361559) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome.
In each of these sequences “M” may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
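By way of non-limiting illustration, the uniqueness criterion described above for an S. pyogenes Cas9 target site (eight "M" positions, a twelve-nucleotide "N" seed, and an XGG motif) may be sketched in Python as follows. The function name `unique_spcas9_sites` and its parameters are hypothetical and are not part of any disclosed software. The sketch assumes an uppercase or lowercase A/C/G/T string, scans only the strand that is supplied, and treats X as a wildcard, so that two sites sharing the same twelve-nucleotide seed are counted as non-unique regardless of X.

```python
import re
from collections import Counter

# Twelve-nucleotide seed followed by any base (X) and GG; the lookahead
# allows overlapping candidate sites to be reported.
SEED_PAM = re.compile(r"(?=([ACGT]{12})[ACGT]GG)")

def unique_spcas9_sites(genome, upstream_len=8):
    """Return (start, 23-nt site) pairs whose seed occurs once in `genome`.

    The `upstream_len` "M" positions are reported as part of the site but
    are ignored when deciding whether the site is unique.
    """
    genome = genome.upper()
    hits = [(m.start(), m.group(1)) for m in SEED_PAM.finditer(genome)]
    seed_counts = Counter(seed for _, seed in hits)
    sites = []
    for pos, seed in hits:
        # Keep only unique seeds that have room for the upstream positions.
        if seed_counts[seed] == 1 and pos >= upstream_len:
            start = pos - upstream_len
            sites.append((start, genome[start:pos + 15]))
    return sites
```

A genome-scale implementation would typically also scan the reverse complement and use an indexed search rather than a regular expression; those choices are outside the scope of this sketch.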

In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and P A Carr and G M Church, 2009, Nature Biotechnology 27(12): 1151-62). Further algorithms may be found in U.S. application Ser. No. 61/836,080 (Broad Reference BI-2013/004A), incorporated herein by reference.

In general, a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins. 
In preferred embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a guide sequence, the first block of lower case letters represent the tracr mate sequence, and the second block of lower case letters represent the tracr sequence, and the final poly-T sequence represents the transcription terminator: (1) NNNNNNNNgtttttgtactctcaagatttaGAAAtaaatcttgcagaagctacaaagataaggc ttcatgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 1361560); (2) NNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggcttca tgccgaaatcaacaccctgtcattttatggcagggtgttttcgttatttaaTTTTTT (SEQ ID NO: 1361561); (3) NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaaagataaggctt catgccgaaatcaacaccctgtcattttatggcagggtgtTTTTT (SEQ ID NO: 1361562); (4) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAAtagcaagttaaaataaggctagtccgtt atcaacttgaaaaagtggcaccgagtcggtgcTTTTTT (SEQ ID NO: 1361563); (5) NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAATAGcaagttaaaataaggctagtccgtt atcaacttgaaaaagtgTTTTTTT (SEQ ID NO: 1361564 and (6) NNNNNNNNNNNNNNNNNNNNgttttagagctagAAATAGcaagttaaaataaggctagtccgtt atcaTTTTTTTT (SEQ ID NO: 1361565). In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.

It will be apparent to those of skill in the art that in order to target any of the fusion proteins comprising a Cas9 domain and a single-stranded DNA binding protein, as disclosed herein, to a target site, e.g., a site comprising a point mutation to be edited, it is typically necessary to co-express the fusion protein together with a guide RNA, e.g., an sgRNA. As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.

In some embodiments, the guide RNA comprises a structure 5′-[guide sequence]-guuuuagagcuagaaauagcaaguuaaaauaaaggcuaguccguuaucaacuugaaaaaguggc accgagucggugcuuuuu-3′ (SEQ ID NO: 1361566), wherein the guide sequence comprises a sequence that is complementary to the target sequence. The guide sequence is typically 20 nucleotides long. The sequences of suitable guide RNAs for targeting Cas9:nucleic acid editing enzyme/domain fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the instant disclosure. Such suitable guide RNA sequences typically comprise guide sequences that are complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein. Additional guide sequences are well known in the art and can be used with the base editors described herein.
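For illustration only, the 5′-[guide sequence]-[scaffold]-3′ structure described above may be assembled as in the following Python sketch. The helper name `build_sgrna` is hypothetical. The scaffold string is copied from the structure recited above with the line-wrap space removed, and the sketch assumes that the caller supplies a 20-nucleotide protospacer written as DNA on the PAM-containing strand, which is then transcribed to RNA letters.

```python
# Scaffold copied from the structure recited above (line-wrap space removed).
SCAFFOLD_RNA = (
    "guuuuagagcuagaaauagcaaguuaaaauaaaggcuaguccguuaucaacuugaaaaaguggc"
    "accgagucggugcuuuuu"
)

def build_sgrna(protospacer_dna, guide_len=20):
    """Return guide (uppercase RNA) followed by the scaffold (lowercase RNA)."""
    guide = protospacer_dna.upper().replace("T", "U")
    if len(guide) != guide_len or set(guide) - set("ACGU"):
        raise ValueError("expected a %d-nt A/C/G/T protospacer" % guide_len)
    return guide + SCAFFOLD_RNA
```

The mixed case of the result simply mirrors the convention used in the sequences above, in which the guide portion and the scaffold portion are visually distinguished.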

In other embodiments, PEgRNA may include those depicted by the structure shown in FIG. 27, which comprises a guide RNA and a 3′ extension arm.

In still other embodiments, PEgRNA may include those depicted by the structure shown in FIG. 28, which comprises a guide RNA and a 5′ extension arm.

The PEgRNA may also include additional design improvements that may modify the properties and/or characteristics of PEgRNA thereby improving the efficacy of prime editing. In various embodiments, these improvements may belong to one or more of a number of different categories, including but not limited to: (1) designs to enable efficient expression of functional PEgRNA from non-polymerase III (pol III) promoters, which would enable the expression of longer PEgRNA without burdensome sequence requirements; (2) improvements to the core, Cas9-binding PEgRNA scaffold, which could improve efficacy; (3) modifications to the PEgRNA to improve RT processivity, enabling the insertion of longer sequences at targeted genomic loci; and (4) addition of RNA motifs to the 5′ or 3′ termini of the PEgRNA that improve PEgRNA stability, enhance RT processivity, prevent misfolding of the PEgRNA, or recruit additional factors important for genome editing.

In one embodiment, PEgRNA could be designed with polIII promoters to improve the expression of longer-length PEgRNA with larger extension arms. sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expression of short RNAs that are retained within the nucleus. However, pol III is not highly processive and is unable to express RNAs longer than a few hundred nucleotides in length at the levels required for efficient genome editing. Additionally, pol III can stall or terminate at stretches of U's, potentially limiting the sequence diversity that could be inserted using a PEgRNA. Other promoters that recruit polymerase II (such as pCMV) or polymerase I (such as the U1 snRNA promoter) have been examined for their ability to express longer sgRNAs. However, these promoters are typically partially transcribed, which would result in extra sequence 5′ of the spacer in the expressed PEgRNA, which has been shown to result in markedly reduced Cas9:sgRNA activity in a site-dependent manner. Additionally, while pol III-transcribed PEgRNA can simply terminate in a run of 6-7 U's, PEgRNA transcribed from pol II or pol I would require a different termination signal. Often such signals also result in polyadenylation, which would result in undesired transport of the PEgRNA from the nucleus. Similarly, RNAs expressed from pol II promoters such as pCMV are typically 5′-capped, also resulting in their nuclear export.
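As a minimal sketch of how the pol III sequence constraint discussed above might be screened computationally, the following Python function reports runs of T (or U) in a candidate PEgRNA sequence. The function name `find_t_runs` and the default threshold of six are hypothetical choices, the latter taken from the 6-7 U terminator length mentioned above; shorter runs may also be of interest and can be examined by lowering `min_run`.

```python
import re

def find_t_runs(seq, min_run=6):
    """Return (start, length) for each run of T/U at least `min_run` long."""
    pattern = re.compile(r"[TU]{%d,}" % min_run)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]
```

A non-empty result for the region upstream of the intended terminator would flag a design that may be poorly suited to expression from a pol III promoter such as U6.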

In addition, a series of methods have been designed for the cleavage of the portion of the pol II promoter that would be transcribed as part of the PEgRNA, adding either a self-cleaving ribozyme such as the hammerhead¹⁸⁸, pistol¹⁸⁹, hatchet¹⁸⁹, hairpin¹⁹⁰, VS¹⁹¹, twister¹⁹², or twister sister¹⁹² ribozymes, or other self-cleaving elements to process the transcribed guide, or a hairpin that is recognized by Csy4¹⁹³ and also leads to processing of the guide. Also, it is hypothesized that incorporation of multiple ENE motifs could lead to improved PEgRNA expression and stability, as previously demonstrated for the KSHV PAN RNA and element¹⁸⁵. It is also anticipated that circularizing the PEgRNA in the form of a circular intronic RNA (ciRNA) could lead to enhanced RNA expression and stability, as well as nuclear localization¹⁹⁴.

In various embodiments, the PEgRNA may include various above elements, as exemplified by the following sequence.

Non-limiting example 1 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and MALAT1 ENE

(SEQ ID NO: 1361567)

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTTAGGGTCATGAAGGTTTTTCTTTTCCTGAGAAAA

CAACACGTATTGTTTTCTCAGGTTTTGCTTTTTGGCCTTTTTCTAGCTTAAAAAAAAAAAAAGC

AAAAGATGCTGGTGGTTGGCACTCCTGGTTTCCAGGACGGGGTTCAAATCCCTGCGGCGTCTTT

GCTTTGACT

Non-limiting example 2 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and PAN ENE

(SEQ ID NO: 1361568)

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGA

CACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTT

TTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAAC

ATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAA

Non-limiting example 3 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3xPAN ENE

(SEQ ID NO: 1361569)

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGA

CACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTT

TTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAAC

ATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAAACACACTGTTTTGGCTGG

GTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTAT

ATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACC

ATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAA

AAAATCTCTCTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGG

CAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTA

GAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTT

AATCCATAAAAAAAAAAAAAAAAAAA

Non-limiting example 4 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3′ box

(SEQ ID NO: 1361570)

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTGTTTCAAAAGTAGACTGTACGCTAAGGGTCATAT

CTTTTTTTGTTTGGTTTGTGTCTTGGTTGGCGTCTTAAA

Non-limiting example 5 - PEgRNA expression platform consisting of pU1, Csy4 hairpin, the PEgRNA, and 3′ box

(SEQ ID NO: 1361571)

CTAAGGACCAGCTTCTTTGGGAGAGAACAGACGCAGGGGGGGGAGGGAAAAAGGGAGAGGCAGA

CGTCACTTCCCCTTGGCGGCTCTGGCAGCAGATTGGTCGGTTGAGTGGCAGAAAGGCAGACGGG

GACTGGGCAAGGCACTGTCGGTGACATCACGGACAGGGCGACTTCTATGTAGATGAGGCAGCGC

AGAGGCTGCTGCTTCGCCACTTGCTGCTTCACCACGAAGGAGTTCCCGTGCCCTGGGAGCGGGT

TCAGGACCGCTGATCGGAAGTGAGAATCCCAGCTGTGTGTCAGGGCTGGAAAGGGCTCGGGAGT

GCGCGGGGCAAGTGACCGTGTGTGTAAAGAGTGAGGCGTATGAGGCTGTGTCGGGGCAGAGGCC

CAAGATCTCAGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGA

AATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTC

TGCCATCAAAGCGTGCTCAGTCTGTTTCAGCAAGTTCAGAGAAATCTGAACTTGCTGGATTTTT

GGAGCAGGGAGATGGAATAGGAGCTTGCTCCGTCCACTCCACGCATCGACCTGGTATTGCAGTA

CCTCCAGGAACGGTGCACCCACTTTCTGGAGTTTCAAAAGTAGACTGTACGCTAAGGGTCATAT

CTTTTTTTGTTTGGTTTGTGTCTTGGTTGGCGTCTTAAA

In various other embodiments, the PEgRNA may be improved by introducing improvements to the scaffold or core sequences. This can be done by introducing known

The core, Cas9-binding PEgRNA scaffold can likely be improved to enhance PE activity. Several such approaches have already been demonstrated. For instance, the first pairing element of the scaffold (P1) contains a GTTTT-AAAAC pairing element. Such runs of Ts have been shown to result in pol III pausing and premature termination of the RNA transcript. Rational mutation of one of the T-A pairs to a G-C pair in this portion of P1 has been shown to enhance sgRNA activity, suggesting this approach would also be feasible for PEgRNA¹⁹⁵. Additionally, increasing the length of P1 has also been shown to enhance sgRNA folding and lead to improved activity¹⁹⁵, suggesting it as another avenue for the improvement of PEgRNA activity. Example improvements to the core can include:

PEgRNA containing a 6 nt extension to P1

(SEQ ID NO: 1361572)

GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGCTCATGAAAATGAGCTA

GCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGA

GTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTTTT

PEgRNA containing a T-A to G-C mutation within P1

(SEQ ID NO: 1361573)

GGCCCAGACTGAGCACGTGAGTTTGAGAGCTAGAAATAGCAAGTTTAAAT

AAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTG

CCATCAAAGCGTGCTCAGTCTGTTTTTTT

In various other embodiments, the PEgRNA may be improved by introducing modifications to the edit template region. As the size of the insertion templated by the PEgRNA increases, it is more likely to be degraded by endonucleases, undergo spontaneous hydrolysis, or fold into secondary structures unable to be reverse-transcribed by the RT or that disrupt folding of the PEgRNA scaffold and subsequent Cas9-RT binding. Accordingly, it is likely that modification to the template of the PEgRNA might be necessary to effect large insertions, such as the insertion of whole genes. Some strategies to do so include the incorporation of modified nucleotides within a synthetic or semi-synthetic PEgRNA that render the RNA more resistant to degradation or hydrolysis or less likely to adopt inhibitory secondary structures¹⁹⁶. Such modifications could include 8-aza-7-deazaguanosine, which would reduce RNA secondary structure in G-rich sequences; locked-nucleic acids (LNA) that reduce degradation and enhance certain kinds of RNA secondary structure; and 2′-O-methyl, 2′-fluoro, or 2′-O-methoxyethoxy modifications that enhance RNA stability. Such modifications could also be included elsewhere in the PEgRNA to enhance stability and activity. Alternatively or additionally, the template of the PEgRNA could be designed such that it both encodes a desired protein product and is also more likely to adopt simple secondary structures that are able to be unfolded by the RT. Such simple structures would act as a thermodynamic sink, making it less likely that more complicated structures that would prevent reverse transcription would occur. Finally, one could also split the template into two separate PEgRNAs. In such a design, a PE would be used to initiate transcription and also recruit a separate template RNA to the targeted site via an RNA-binding protein fused to Cas9 or an RNA recognition element on the PEgRNA itself such as the MS2 aptamer. The RT could either directly bind to this separate template RNA, or initiate reverse transcription on the original PEgRNA before swapping to the second template. Such an approach could enable long insertions both by preventing misfolding of the PEgRNA upon addition of the long template and by not requiring dissociation of Cas9 from the genome for long insertions to occur, which could possibly be inhibiting PE-based long insertions.

(iv) Installation of Additional RNA Motifs at the 5′ or 3′ Termini

In still other embodiments, the PEgRNA may be improved by introducing additional RNA motifs at the 5′ and 3′ termini of the PEgRNA. Several such motifs, such as the PAN ENE from KSHV and the ENE from MALAT1, were discussed above as possible means to terminate expression of longer PEgRNA from non-pol III promoters. These elements form RNA triple helices that engulf the polyA tail, resulting in their being retained within the nucleus^184,187. However, by forming complex structures at the 3′ terminus of the PEgRNA that occlude the terminal nucleotide, these structures would also likely help prevent exonuclease-mediated degradation of PEgRNA.

Other structural elements inserted at the 3′ terminus could also enhance RNA stability, albeit without enabling termination from non-pol III promoters. Such motifs could include hairpins or RNA quadruplexes that would occlude the 3′ terminus¹⁹⁷, or self-cleaving ribozymes such as HDV that would result in the formation of a 2′-3′-cyclic phosphate at the 3′ terminus and also potentially render the PEgRNA less likely to be degraded by exonucleases¹⁹⁸. Inducing the PEgRNA to cyclize via incomplete splicing—to form a ciRNA—could also increase PEgRNA stability and result in the PEgRNA being retained within the nucleus¹⁹⁴.

Addition of dimerization motifs, such as kissing loops or a GNRA tetraloop/tetraloop receptor pair²⁰⁰, at the 5′ and 3′ termini of the PEgRNA could also result in effective circularization of the PEgRNA, improving stability. Additionally, it is envisioned that addition of these motifs could enable the physical separation of the PEgRNA spacer and primer, preventing occlusion of the spacer, which would hinder PE activity. Short 5′ extensions to the PEgRNA that form a small toehold hairpin in the spacer region could also compete favorably against the annealing region of the PEgRNA binding the spacer. Finally, kissing loops could also be used to recruit other template RNAs to the genomic site and enable swapping of RT activity from one RNA to the other. Example improvements include, but are not limited to:

PEgRNA-HDV fusion

(SEQ ID NO: 1361574)

GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAAT

AAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTG

CCATCAAAGCGTGCTCAGTCTGGGCCGGCATGGTCCCAGCCTCCTCGCTG

GCGCCGGCTGGGCAACATGCTTCGGCATGGCGAATGGGACTTTTTTT

PEgRNA-MMLV kissing loop

(SEQ ID NO: 1361575)

GGTGGGAGACGTCCCACCGGCCCAGACTGAGCACGTGAGTTTTAGAGCTA

GAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGG

GACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTGGTG

GGAGACGTCCCACCTTTTTTT

PEgRNA-VS ribozyme kissing loop

(SEQ ID NO: 1361576)

GAGCAGCATGGCGTCGCTGCTCACGGCCCAGACTGAGCACGTGAGTTTTA

GAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAA

AAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGT

CTCCATCAGTTGACACCCTGAGGTTTTTTT

PEgRNA-GNRA tetraloop/tetraloop receptor

(SEQ ID NO: 1361577)

GCAGACCTAAGTGGUGACATATGGTCTGGGCCCAGACTGAGCACGTGAGT

TTTAGAGCTAUACGTAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT

UACGAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCT

CAGTCTGCATGCGATTAGAAATAATCGCATGTTTTTTT

PEgRNA template switching secondary RNA-HDV fusion

(SEQ ID NO: 1361578)

TCTGCCATCAAAGCTGCGACCGTGCTCAGTCTGGTGGGAGACGTCCCACC

GGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGCTT

CGGCATGGCGAATGGGACTTTTTTT

The PEgRNA scaffold could be further improved via directed evolution, in an analogous fashion to how SpCas9 and base editors have been improved. Directed evolution could enhance PEgRNA recognition by Cas9 or evolved Cas9 variants. Additionally, it is likely that different PEgRNA scaffold sequences would be optimal at different genomic loci, either enhancing PE activity at the site in question, reducing off-target activities, or both. Finally, evolution of PEgRNA scaffolds to which other RNA motifs have been added would almost certainly improve the activity of the fused PEgRNA relative to the unevolved fusion RNA. For instance, evolution of allosteric ribozymes composed of c-di-GMP-I aptamers and hammerhead ribozymes led to dramatically improved activity²⁰², suggesting that evolution would improve the activity of hammerhead-PEgRNA fusions as well. In addition, while Cas9 currently does not generally tolerate 5′ extension of the sgRNA, directed evolution will likely generate enabling mutations that mitigate this intolerance, allowing additional RNA motifs to be utilized.

The present disclosure contemplates any such ways to further improve the efficacy of the prime editing systems disclosed here.

II. Algorithm and Method to Design Therapeutic PEgRNA

As described herein, the inventors discovered and appreciated that prime editing using PEgRNA can be used to install a wide variety of nucleotide changes, including insertions (of any length, including whole genes or protein coding regions), deletions (of any length), and the correction of pathogenic mutations. However, techniques do not yet exist to determine and/or predict PEgRNA structures, including specifying the various components of the PEgRNA, such as the spacer, gRNA core, and extension arm (and components of the extension as described herein). The inventors have developed computerized techniques for determining PEgRNA, including determining extended gRNA structures. Each extended gRNA structure can be determined based on an input allele (e.g., representing a pathogenic mutation), an output allele (e.g., representing a corrected wild-type sequence), and a fusion protein (e.g., a CRISPR system for prime editing, including a PAM motif and the relative position of the prime editor's nick). The difference between the input allele and the output allele represents the desired edit (e.g., a single nucleotide change, insertion, deletion, and/or the like). The determined structures can be created and used to perform base editing to change the input allele to the output allele, as described further herein.

FIG. 31 is a flow chart showing an exemplary high level computerized method 3100 for determining an extended gRNA structure, according to some embodiments. At step 3102, a computing device (e.g., the computing device 3400 described in conjunction with FIG. 34) accesses data indicative of an input allele, an output allele, and a fusion protein that includes a nucleic acid programmable DNA binding protein and a reverse transcriptase. While step 3102 describes accessing all three of the input allele, output allele, and fusion protein in one step, this is for illustrative purposes and it should be appreciated that such data can be accessed using one or more steps without departing from the spirit of the techniques described herein. Accessing data can include receiving data, storing data, accessing a database, and/or the like.

At step 3104, the computing device determines the extended gRNA structure based on the input allele, the output allele, and the fusion protein accessed in step 3102. The extended gRNA structure is designed to be associated with the fusion protein to change the input allele to the output allele. The fusion protein, when it is complexed with the extended gRNA, is capable of binding to a target DNA sequence that includes a target strand at which the change occurs and a complementary non-target strand. As described herein, the input allele can represent a pathogenic DNA mutation, and the output allele can represent a corrected DNA sequence.

Changing the input allele to the output allele can include a single nucleotide change, an insertion of one or more nucleotides, a deletion of one or more nucleotides, and/or any other change designed to achieve the output allele. In particular, exemplary classes of edits that can be induced by a single PEgRNA include single nucleotide substitutions, insertions from 1 nt up to approximately 40 nt, deletions from 1 nt up to approximately 30 nt, and a combination thereof. For example, prime editing can support changes of these types from spacer position −3 (e.g., immediately 3′ of the nick) to spacer position +27 (e.g., 30 nt 3′ of the nick in the input allele). Other positions can also be used. For example, edits at spacer position −4 can be performed using the SpCas9 system with prime editing (e.g., which can be caused by occasional RuvC cleavage between spacer positions −5 and −4). The type of change, the number of nt changes, and/or the position of the change can be configurable parameter(s) that the computerized techniques can use to determine extended gRNA structures.
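As one hedged illustration of how the type, size, and position of the desired change might be derived from the two alleles, the following Python sketch trims the prefix and suffix shared by the input allele and the output allele and classifies what remains. The function name `describe_edit` and the returned field names are hypothetical, and the sketch assumes that both alleles are given on the same strand and span the same genomic window.

```python
def describe_edit(input_allele, output_allele):
    """Classify the change that converts `input_allele` to `output_allele`."""
    a, b = input_allele.upper(), output_allele.upper()
    shortest = min(len(a), len(b))
    # Length of the common prefix.
    p = 0
    while p < shortest and a[p] == b[p]:
        p += 1
    # Length of the common suffix that does not overlap the prefix.
    s = 0
    while s < shortest - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    removed = a[p:len(a) - s]
    added = b[p:len(b) - s]
    if not removed and not added:
        kind = "none"
    elif not removed:
        kind = "insertion"
    elif not added:
        kind = "deletion"
    elif len(removed) == len(added):
        kind = "substitution"
    else:
        kind = "combination"
    return {"kind": kind, "position": p, "removed": removed, "added": added}
```

The lengths of `removed` and `added` can then be compared against configurable limits, such as the approximate single-PEgRNA ranges recited above.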

As discussed in conjunction with FIGS. 3A-3B and FIGS. 27-28, an extended gRNA can include various components such as a spacer for the extended gRNA that is complementary to a target nucleotide sequence in the input allele, a gRNA backbone for interacting with the fusion protein, and an extension. Referring further to step 3104, the computing device determines one or more of the spacer, the gRNA backbone, and the extension. While the techniques can include determining any combination of the spacer, gRNA backbone, and/or extension, in some embodiments one or more of such components and/or aspects of such components are known (e.g., predetermined, pre-specified, fixed, etc.), and therefore may not be determined as part of step 3104.

As described herein, the gRNA extension can include various components. For example, as shown in FIGS. 3A-3B and 27-28, the extension can include one or more of an RT template (which includes an RT edit template and a homology arm), a primer binding site, an RT termination signal, an optional 5′ end modifier region, and an optional 3′ end modifier region. FIG. 32 is a flow chart showing an exemplary computerized method 3200 for determining the components of an extended gRNA structure, including the components of the extension, according to some embodiments. It should be appreciated that FIG. 32 is intended to be illustrative, and therefore techniques used to determine the extended gRNA can include more, or fewer, steps than those shown in FIG. 32.

At step 3202, the computing device determines the set of protospacers that are compatible with the PAM motif of the selected CRISPR system in the input allele on both strands. In some embodiments, the computing device determines an initial set of protospacers and filters out protospacers whose associated nick positions are incompatible with prime editing to the output allele to generate a set of remaining candidate protospacers. For example, the computing device may determine that a protospacer is incompatible because the nick is on the 3′ side of the desired edit on the strand. As another example, the computing device may determine that the distance between the nick and the desired edit is too large (e.g., greater than a user-defined threshold, for example 30 nt, 35 nt, etc.).
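A minimal Python sketch of the protospacer enumeration and filtering described for step 3202 is given below. It is illustrative only and rests on stated assumptions: an NGG PAM, a 20-nucleotide protospacer, and a nick three nucleotides 5′ of the PAM, as is commonly described for SpCas9. The names `candidate_protospacers`, `edit_start`, `edit_end`, and `max_nick_to_edit` are hypothetical. The edit is given as a half-open interval on the supplied strand, and the nick-to-edit distance is measured to the nearest edge of the edit.

```python
import re

_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMP)[::-1]

# 20-nt protospacer followed by an NGG PAM; lookahead permits overlaps.
PROTO_NGG = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")

def candidate_protospacers(input_allele, edit_start, edit_end, max_nick_to_edit=30):
    """List protospacers on both strands whose nick lies 5' of the edit."""
    seq = input_allele.upper()
    n = len(seq)
    candidates = []
    # On the "-" strand, coordinates are those of the reverse complement,
    # so the edit interval [edit_start, edit_end) begins at n - edit_end.
    for strand, s, edit_pos in (("+", seq, edit_start), ("-", revcomp(seq), n - edit_end)):
        for m in PROTO_NGG.finditer(s):
            nick = m.start() + 17          # cut between protospacer nt 17 and 18
            dist = edit_pos - nick         # negative: nick is 3' of the edit
            if 0 <= dist <= max_nick_to_edit:
                candidates.append({"strand": strand, "protospacer": m.group(1),
                                   "nick": nick, "nick_to_edit": dist})
    return candidates
```

Other CRISPR systems would require a different PAM pattern and nick offset, which is why both are best treated as inputs in a fuller implementation.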

At step 3204, the computing device selects a protospacer from the set of determined protospacers. At step 3206, the computing device determines a spacer and an edit template sequence using the protospacer sequence of the input allele, the position of the nick, and the sequence of the desired edit. The spacer can include a nucleotide sequence of approximately 20 nucleotides.

At step 3208, the computing device selects one or more sets of parameters, where each set of parameters includes a value for the primer binding site length (e.g., which can vary in the number of nt, such as from approximately 8 nt to 17 nt), the homology arm length (e.g., which can vary in the number of nt, such as from approximately 2 nt to 33 nt), and the gRNA backbone sequence. For example, the gRNA backbone sequence can be GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGA AAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 1361579), GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCA CCGAGTCGGTGC (SEQ ID NO: 1361580), and/or other gRNA backbone sequences, such as gRNA backbone sequences that retain wild-type RNA secondary structure.

At step 3210, the computing device selects a set of parameters determined in step 3208. At step 3212, the computing device determines a homology arm, a primer binding site sequence, and a gRNA backbone using the selected set of parameters. At step 3214, the computing device then forms a resulting PEgRNA sequence by concatenating the spacer, the gRNA backbone, and the PEgRNA extension arm (which includes the homology arm and the edit template). In addition, the extension arm may include a terminator signal, which is a sequence that triggers the termination of reverse transcription. Such terminator sequences may include, for example, TTTTTTGTTTT (SEQ ID NO: 1361581). In some embodiments, the PEgRNA extension arm may be considered to comprise the termination signal. In other embodiments, the PEgRNA extension arm may be considered to exclude the termination signal, in which case the termination signal is attached to the extension arm as an element lying outside of the extension arm.

The method 3200 proceeds to step 3216, and the computing device determines whether there are more sets of parameters. If yes, the method proceeds to step 3210 and the computing device selects another set of parameters. If no, the method proceeds to step 3218 and the computing device determines whether there are more protospacers. If yes, the method proceeds back to step 3204 and the computing device selects another protospacer from the set of protospacers. If no, the method proceeds to step 3220 and ends.
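By way of non-limiting illustration, the nested iteration of method 3200 (steps 3204 through 3218) could be sketched as follows. The function design_pegrnas, its arguments, and the toy input are hypothetical; the slicing mirrors the exemplary implementation provided herein but allows the primer binding site and homology arm lengths to vary per parameter set, and the optional terminator is omitted for brevity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_pegrnas(seq, protospacer_starts, edit_start, alt_len, desired_edit,
                   param_sets, spacer_len=20, nick_pos=17):
    """Enumerate one candidate per (protospacer, parameter set) pair.

    seq is the strand carrying the input allele; edit_start and alt_len
    locate the allele to be replaced; desired_edit is the output allele.
    """
    designs = []
    for start in protospacer_starts:                       # steps 3204 and 3218
        spacer = seq[start:start + spacer_len]             # step 3206
        nick = start + nick_pos
        edit_template = seq[nick:edit_start] + desired_edit
        for pbs_len, arm_len, backbone in param_sets:      # steps 3210 and 3216
            pbs = seq[nick - pbs_len:nick]                 # step 3212
            arm_start = edit_start + alt_len
            arm = seq[arm_start:arm_start + arm_len]
            extension = reverse_complement(pbs + edit_template + arm)
            designs.append(spacer + backbone + extension)  # step 3214
    return designs


# Toy input: one protospacer starting at index 10, a 1 nt edit at index 35.
toy_seq = "ACGTACGTAC" + "GATTACAGATTACAGATTAC" + "AGGTTTCCCAAAGGGTTTCCC"
toy_backbone = "GTTTAAGAGC"  # placeholder only, not a functional backbone
designs = design_pegrnas(toy_seq, [10], edit_start=35, alt_len=1,
                         desired_edit="G",
                         param_sets=[(13, 13, toy_backbone), (8, 5, toy_backbone)])
print(len(designs))  # one candidate per parameter set for the single protospacer
```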

As described herein, the DNA synthesis template (e.g., RT template sequence) of the extension includes a desired nucleotide change to change the input allele to the output allele, and includes the RT edit template (e.g., determined in step 3206) and the homology arm (e.g., determined at step 3212). As also described herein, the DNA synthesis template (e.g., RT template sequence) encodes a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to the nick site. The single-strand DNA flap comprises the desired nucleotide change (e.g., a single nucleotide change, one or more nucleotide insertions, one or more nucleotide deletions, and/or the like). In some base editing deployments, the single-strand DNA flap can hybridize to the endogenous DNA sequence that is adjacent to the nick site to install the desired nucleotide change. In some base editing deployments, the single-stranded DNA flap displaces the endogenous DNA sequence that is adjacent to the nick site. Cellular repair of the single-strand DNA flap can result in installation of the desired nucleotide change to form the desired output allele product. The DNA synthesis template (e.g., RT template sequence) can have a variable number of nucleotides, and can range from approximately 7 nucleotides to 34 nucleotides.

While not shown in FIG. 32, the computing device can be configured to determine other components of the extended gRNA. For example, in some embodiments the computing device is configured to determine an RT termination signal adjacent to the RT template. In some embodiments, the computing device can be configured to determine a first modifier adjacent to the RT termination signal. In some embodiments, the computing device is configured to determine a second modifier adjacent to the primer binding site.

The extended gRNA components can be arranged in different configurations, such as those shown in FIGS. 3A-3B and FIGS. 27-28. For example, referring to FIG. 3A, the extension is at the 5′ end of the extended gRNA structure, the spacer is 3′ to the extension and is 5′ to the gRNA core. As another example, referring to FIG. 3B, the spacer is at a 5′ end of the extended gRNA structure (and is 5′ to the gRNA core), and the extension is at a 3′ end of the extended gRNA structure (and is 3′ to the gRNA core).
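By way of non-limiting illustration, the two arrangements described for FIG. 3A and FIG. 3B could be sketched as follows. The function assemble_extended_grna and its extension_end argument are hypothetical names chosen for this sketch, and all sequences are written 5′ to 3′.

```python
def assemble_extended_grna(spacer, core, extension, extension_end="3prime"):
    """Concatenate the components, 5' to 3', in one of two arrangements."""
    if extension_end == "5prime":
        # FIG. 3A: extension, then spacer, then gRNA core.
        return extension + spacer + core
    if extension_end == "3prime":
        # FIG. 3B: spacer, then gRNA core, then extension.
        return spacer + core + extension
    raise ValueError("extension_end must be '5prime' or '3prime'")


# Toy placeholders standing in for real component sequences.
print(assemble_extended_grna("SPACER", "CORE", "EXT", "5prime"))  # EXTSPACERCORE
print(assemble_extended_grna("SPACER", "CORE", "EXT", "3prime"))  # SPACERCOREEXT
```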

In some embodiments, the computing device accesses a database that includes a set of input alleles and associated output alleles. For example, the computing device can access a database provided by ClinVar that includes hundreds of thousands of mutations, each of which includes an allele representing a pathogenic mutation and an allele representing the corrected wild-type sequence. The techniques can be used to determine one or more extended gRNA structures for each database entry. FIG. 33 is a flow chart showing an exemplary computerized method 3300 for determining sets of extended gRNA structures for each mutation entry in a database, according to some embodiments. At step 3302, the computing device accesses a database (e.g., a ClinVar database) that includes a set of mutation entries that each include an input allele representing the mutation and an output allele representing the corrected wild-type sequence.

At step 3304, the computing device accesses a set of one or more fusion proteins. In some embodiments, the techniques can include generating sets of extended gRNA structures for a single fusion protein and/or for a combination of different fusion proteins (e.g., for different Cas9 proteins). The computing device can be configured to access data indicative of the plurality of fusion proteins, and can create a set of extended gRNA structures for each fusion protein (e.g., a Cas9-NG protein and an SpCas9 protein) as described herein.

At step 3306, the computing device selects a fusion protein from the set of fusion proteins. At step 3308, the computing device selects a mutation entry from the set of entries in the database. The computing device can be configured, for example, to iterate through each entry in the database and create a set of extended gRNA structures for the entry (e.g., one set for a particular fusion protein, and/or multiple sets for each of a plurality of fusion proteins). In some embodiments, the computing device can be configured to generate extended gRNA structures for a subset of entries in the database, such as a pre-configured set, a set of mutations with a highest significance (e.g., those with known therapeutic benefits), and/or the like. In some embodiments, if the database includes entries that are not compatible with some fusion proteins for prime editing, the computing device can be configured to determine which entries in the database are compatible for prime editing using the selected fusion protein from step 3304, and to select entries that are compatible with the selected fusion protein in step 3308.

At step 3310, the computing device determines a set of one or more extended gRNA structures using the techniques described herein. The method proceeds from step 3310 to step 3312, and the computing device determines whether there are additional entries in the database. If yes, the computing device proceeds back to step 3308 and selects another entry. If no, the computing device proceeds to step 3314 and determines whether there are more fusion proteins. If yes, the computing device proceeds back to step 3306 and selects another fusion protein. If no, the computing device proceeds to step 3316 and ends the method 3300.
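By way of non-limiting illustration, the iteration of method 3300 over fusion proteins and database entries could be sketched as follows. The function design_for_database, the callables is_compatible and design_entry, and the toy entry fields are hypothetical; the sketch assumes each entry carries an identifier under the key "id".

```python
def design_for_database(entries, fusion_proteins, is_compatible, design_entry):
    """Map (fusion protein, entry id) to the designs produced for that pair."""
    results = {}
    for protein in fusion_proteins:                     # steps 3306 and 3314
        for entry in entries:                           # steps 3308 and 3312
            if not is_compatible(entry, protein):
                continue                                # skip incompatible entries
            results[(protein, entry["id"])] = design_entry(entry, protein)  # step 3310
    return results


# Toy data: each entry lists the PAM types found near its mutation.
toy_entries = [{"id": "m1", "pams": {"NGG", "NG"}}, {"id": "m2", "pams": {"NG"}}]
toy_pam = {"SpCas9": "NGG", "Cas9-NG": "NG"}
toy_results = design_for_database(
    toy_entries,
    ["SpCas9", "Cas9-NG"],
    is_compatible=lambda entry, protein: toy_pam[protein] in entry["pams"],
    design_entry=lambda entry, protein: [entry["id"] + ":" + protein],
)
print(sorted(toy_results))
```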

In some embodiments, the techniques can design PEgRNA with gRNA extensions that contain non-complementary sequences, such as non-complementary sequences that are 5′ of the homology arm, 3′ of the primer binding site, or both. For example, non-complementary sequences can be designed to form a kissing loop interaction, to act as a protecting hairpin for RNA stability, and/or the like.

In some embodiments, PEgRNA may be designed using strategies that prioritize among multiple design candidates. For example, the techniques can be designed to avoid PEgRNA extensions where the 5′-most nucleotide is a cytosine (e.g., due to interrupting native nucleotide-protein interactions in the sgRNA:Cas9 complex). As another example, the techniques can use RNA secondary structure prediction tools to select a preferred PBS length, flap length, and/or the like based on other parameters of the extended gRNA, such as a protospacer, a desired edit, and/or the like.
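By way of non-limiting illustration, one of the prioritization strategies described above (avoiding extensions whose 5′-most nucleotide is a cytosine) could be sketched as follows. The function prioritize and the candidate field "extension" are hypothetical, and falling back to the unfiltered list when every candidate begins with cytosine is an assumption of this sketch rather than a requirement of the techniques.

```python
def prioritize(candidates):
    """Prefer candidates whose extension (written 5' to 3') does not start with C."""
    preferred = [c for c in candidates
                 if not c["extension"].upper().startswith("C")]
    # Assumption: if nothing passes the filter, keep all candidates rather
    # than returning no design at all.
    return preferred if preferred else list(candidates)


toy_candidates = [{"extension": "CATGGA"}, {"extension": "GATCCA"}]
print(prioritize(toy_candidates))  # [{'extension': 'GATCCA'}]
```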

An exemplary implementation of the computerized techniques described herein for determining extended gRNA structures is as follows:

# Python 3
# b_design_PEgRNA.py

from __future__ import division
import _config
import sys, os, fnmatch, datetime, subprocess
sys.path.append('/home/unix/maxwshen/')
import numpy as np
from collections import defaultdict
from mylib import util, compbio
import pandas as pd

# Default params
inp_dir = _config.OUT_PLACE + 'a_annotate/'
NAME = util.get_fn(__file__)
out_dir = _config.OUT_PLACE + NAME + '/'
util.ensure_dir_exists(out_dir)
SPLIT = None

# Hyperparameters
grna_nick_pos = 17
grna_len = 20
max_dist_nick_to_edit = 20
primer_binding_len = 13
homology_arm_len = 13

# gRNA backbone (SEQ ID NO: 1361579)
grna_hairpin = (
    'GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGT'
    'CCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC'
)

# Terminator (SEQ ID NO: 1361581)
terminator = 'TTTTTTGTTTT'

# Supported Cas types and their PAM templates (IUPAC codes)
castypes = {
    'SpCas9 (NGG)': 'NGG',
    'SpCas9-NG (NG)': 'NG',
}

assert grna_nick_pos < grna_len
assert primer_binding_len < grna_nick_pos

##
# Find gRNAs
##
iupac_nt = {
    'A': list('A'),
    'C': list('C'),
    'G': list('G'),
    'T': list('T'),
    'Y': list('CT'),
    'R': list('AG'),
    'W': list('AT'),
    'S': list('GC'),
    'K': list('TG'),
    'M': list('AC'),
    'D': list('AGT'),
    'V': list('ACG'),
    'H': list('ACT'),
    'B': list('CGT'),
    'N': list('ACGT'),
}

def match(template, dna):
    # True if dna satisfies the IUPAC template position by position
    if len(dna) != len(template):
        return False
    for char, t in zip(dna, template):
        if char not in iupac_nt[t]:
            return False
    return True

def pam_match(seq, grna_pos1, pam):
    # Check for a PAM immediately 3' of the candidate protospacer
    flag, stats = None, dict()
    cand_pam = seq[grna_pos1 + grna_len : grna_pos1 + grna_len + len(pam)]
    if match(pam, cand_pam):
        flag = True
        stats['Designed gRNA (NGG orientation)'] = seq[grna_pos1 : grna_pos1 + grna_len]
        stats['PAM'] = cand_pam
        stats['gRNA pos1 within sequence'] = grna_pos1
    return flag, stats

def find_grnas(seq, alt_start, alt_len, ref_allele, path_idx, orient):
    '''
    gRNA nick site must be on the 5' side of the edit. Limit to at most
    max_dist_nick_to_edit nt away and consider gRNAs on both strands
    (the caller invokes this function once per strand).
    '''
    min_grna_pos1 = alt_start - grna_nick_pos - max_dist_nick_to_edit
    max_grna_pos1 = alt_start - grna_nick_pos

    all_grnas = defaultdict(list)
    for grna_pos1 in range(min_grna_pos1, max_grna_pos1 + 1):
        for castype in castypes:
            pam = castypes[castype]
            flag, grna_details = pam_match(seq, grna_pos1, pam)
            if flag:
                for key in grna_details:
                    all_grnas[key].append(grna_details[key])
                all_grnas['Cas type'].append(castype)

                grna = grna_details['Designed gRNA (NGG orientation)']
                grna_pos1 = grna_details['gRNA pos1 within sequence']

                # Extension components, written in the orientation of seq
                primer_binding = grna[grna_nick_pos - primer_binding_len : grna_nick_pos]
                edit_template = seq[grna_pos1 + grna_nick_pos : path_idx] + ref_allele
                homology_arm = seq[path_idx + alt_len : path_idx + alt_len + homology_arm_len]
                grna_extension = primer_binding + edit_template + homology_arm

                all_grnas['Designed primer binder'].append(primer_binding)
                all_grnas['Designed edit template'].append(edit_template)
                all_grnas['Designed homology arm'].append(homology_arm)
                all_grnas['Designed gRNA extension'].append(grna_extension)
                all_grnas['Designed orientation'].append(orient)
                # Full PEgRNA: spacer + backbone + reverse-complemented extension + terminator
                all_grnas['Designed gRNA full (NGG orientation)'].append(
                    grna + grna_hairpin
                    + compbio.reverse_complement(grna_extension) + terminator)
    return all_grnas

##
# Row processing
##
def process_row(row):
    '''
    Find gRNAs in sequence at a single row.
    '''
    dis_seq = row['Sequence - alternate']
    ref_seq = row['Sequence - reference']
    path_start = row['buffer_length_bp']
    # [, ) interval
    alt_len = len(row['AlternateAllele'])
    path_end = path_start + alt_len

    fwd_grnas = find_grnas(
        dis_seq,
        path_start,
        alt_len,
        row['ReferenceAllele'],
        path_start,
        '+'
    )
    rev_grnas = find_grnas(
        compbio.reverse_complement(dis_seq),
        len(dis_seq) - path_end,
        alt_len,
        compbio.reverse_complement(row['ReferenceAllele']),
        path_start,
        '-'
    )

    fwd_df = pd.DataFrame(fwd_grnas)
    rev_df = pd.DataFrame(rev_grnas)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([fwd_df, rev_df], ignore_index = True)
    for col in row.index:
        df[col] = row[col]
    return df

def process_df():
    df = pd.read_csv(inp_dir + f'clinvar_{SPLIT}.csv', index_col = 0)
    frames = []
    timer = util.Timer(total = len(df))
    for idx, row in df.iterrows():
        frames.append(process_row(row))
        timer.update()
    # Concatenate once at the end; fall back to an empty frame if no rows
    mdf = pd.concat(frames, ignore_index = True) if frames else pd.DataFrame()
    mdf.to_csv(out_dir + f'clinvar_{SPLIT}.csv')
    return

##
# qsub
##
def gen_qsubs():
    # Generate qsub shell scripts and commands for easy parallelization
    print('Generating qsub scripts...')
    qsubs_dir = _config.QSUBS_DIR + NAME + '/'
    util.ensure_dir_exists(qsubs_dir)
    qsub_commands = []

    num_scripts = 0
    for idx in range(0, 60):
        command = 'python %s.py %s' % (NAME, idx)
        script_id = NAME.split('_')[0]

        # Write shell scripts
        sh_fn = qsubs_dir + 'q_%s_%s.sh' % (script_id, idx)
        with open(sh_fn, 'w') as f:
            f.write('#!/bin/bash\n%s\n' % (command))
        num_scripts += 1

        # Write qsub commands
        qsub_commands.append(
            'qsub -V -l h_rt=12:00:00,h_vmem=1G,os=RedHat7 -wd %s %s &' %
            (_config.SRC_DIR, sh_fn))

    # Save commands
    commands_fn = qsubs_dir + '_commands.sh'
    with open(commands_fn, 'w') as f:
        f.write('\n'.join(qsub_commands))

    subprocess.check_output('chmod +x %s' % (commands_fn), shell = True)

    print('Wrote %s shell scripts to %s' % (num_scripts, qsubs_dir))
    return

##
# Main
##
@util.time_dec
def main(argv):
    print(NAME)

    # Function calls
    global SPLIT
    SPLIT = int(argv[0])
    process_df()
    return

if __name__ == '__main__':
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        gen_qsubs()
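By way of non-limiting illustration, the PAM scan and primer binding site slicing of the exemplary implementation can be exercised on a toy sequence without the external modules (_config, mylib) that the implementation imports. The function scan_protospacers and the toy sequence are hypothetical, only a subset of the IUPAC codes is included, and, unlike the exemplary implementation, the sketch scans the whole sequence rather than a window upstream of the edit.

```python
# Subset of the IUPAC table sufficient for the NGG and NG PAM templates.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

GRNA_LEN, NICK_POS, PBS_LEN = 20, 17, 13


def matches(template, dna):
    """True if dna satisfies the IUPAC template position by position."""
    return len(template) == len(dna) and all(
        base in IUPAC[code] for code, base in zip(template, dna))


def scan_protospacers(seq, pam):
    """Return (start, spacer, PAM, primer binding site) for each PAM match."""
    hits = []
    for pos in range(len(seq) - GRNA_LEN - len(pam) + 1):
        spacer = seq[pos:pos + GRNA_LEN]
        cand_pam = seq[pos + GRNA_LEN:pos + GRNA_LEN + len(pam)]
        if matches(pam, cand_pam):
            # PBS: the PBS_LEN nucleotides of the spacer ending at the nick.
            pbs = spacer[NICK_POS - PBS_LEN:NICK_POS]
            hits.append((pos, spacer, cand_pam, pbs))
    return hits


toy = "GATTACAGATTACAGATTAC" + "AGG" + "TT"
print(scan_protospacers(toy, "NGG"))  # one protospacer, PAM AGG
print(scan_protospacers(toy, "NG"))   # two protospacers, PAMs AG and GG
```

On this toy input the more permissive NG template yields an additional protospacer, which is consistent with the NG PAM being less restrictive than NGG.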

The exemplary sequence listings submitted herewith were generated using the techniques described herein using the ClinVar database for the input alleles and corresponding output alleles. The entries in the ClinVar database were first filtered to germline mutations annotated as pathogenic or likely pathogenic. For these examples, Cas9-NG and SpCas9 were used to identify compatible mutations. Of the filtered mutations, approximately 72,020 unique ClinVar mutations were identified as compatible with prime editing with Cas9-NG, and approximately 63,496 unique ClinVar mutations were identified as compatible with prime editing with SpCas9 with an NGG PAM. It should be appreciated that other and/or additional mutations could be correctable if using a prime editor containing a different Cas9 variant with different PAM compatibility.

In various embodiments, the herein disclosed algorithm was used to design the therapeutic PEgRNA of SEQ ID NOs: 1-135514 and 813085-880462 against ClinVar database entries.

In various other embodiments, the PEgRNA designed against the ClinVar database using the herein disclosed algorithm are included in the Sequence Listing, which forms a part of this specification. The Sequence Listing includes complete PEgRNA sequences of SEQ ID NOs: 1-135514 and 813085-880462. Each of these complete PEgRNA comprises a spacer (SEQ ID NOs: 135515-271028 and 880463-947840) and an extension arm (SEQ ID NOs: 271029-406542 and 947841-1015218). In addition, each PEgRNA comprises a gRNA core, for example, as defined by SEQ ID NOs: 1361579-1361580. Each of the extension arms of SEQ ID NOs: 271029-406542 and 947841-1015218 further comprises a primer binding site (SEQ ID NOs: 406543-542056 and 1015219-1082596), an edit template (SEQ ID NOs: 542057-677570 and 1082597-1149974), and a homology arm (SEQ ID NOs: 677571-813084 and 1149975-1217352). The PEgRNA optionally may comprise a 5′ end modifier region and/or a 3′ end modifier region. The PEgRNA may also comprise a reverse transcription termination signal (e.g., SEQ ID NOs: 1361560-1361566) at the 3′ end of the PEgRNA. The application embraces the design and use of all of these sequences.

The mutations were classified into four classes of clinical significance using minor allele frequency, number of submitters, whether or not submitters conflicted in their interpretations, and whether or not the mutation was reviewed by an expert panel. Among the 63,496 SpCas9-compatible mutations: 4,627 mutations were identified at the most significant level (four); 13,943 mutations were identified at significance levels three or four; and 44,385 mutations were identified at significance levels two, three, or four.

The provided sequence listings enumerate a single PEgRNA per unique mutation, selected as the PEgRNA with the shortest distance between the nick and the edit. The PEgRNA were designed with a homology arm length of 13 nt, a primer binding site length of 13 nt, a gRNA nick position at 17 nt, and a gRNA length of 20 nt. Protospacers with nick sites farther than 20 nt from the edit were disregarded. The gRNA backbone sequence used was GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 1361579). The terminator sequence used was TTTTTTGTTTT (SEQ ID NO: 1361581).
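By way of non-limiting illustration, the selection of a single PEgRNA per unique mutation could be sketched as follows. The function pick_closest and the field names "mutation_id" and "nick_to_edit" are hypothetical, and keeping the first candidate encountered when two candidates tie on distance is an assumption of this sketch.

```python
def pick_closest(candidates, max_dist=20):
    """Keep, per mutation, the candidate with the shortest nick-to-edit distance."""
    best = {}
    for cand in candidates:
        dist = cand["nick_to_edit"]
        if dist > max_dist:
            continue  # protospacers nicking too far from the edit are disregarded
        key = cand["mutation_id"]
        # Strict comparison: on a tie, the earlier candidate is retained.
        if key not in best or dist < best[key]["nick_to_edit"]:
            best[key] = cand
    return best


toy_candidates = [
    {"mutation_id": "m1", "nick_to_edit": 7, "pegrna": "A"},
    {"mutation_id": "m1", "nick_to_edit": 2, "pegrna": "B"},
    {"mutation_id": "m2", "nick_to_edit": 25, "pegrna": "C"},
]
print(pick_closest(toy_candidates))  # only m1 remains, with candidate "B"
```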

As described herein, the provided exemplary sequence listings are not intended to be limiting. It should be appreciated that variations on the provided PEgRNA designs can include variations described herein, including varying the gRNA backbone sequence, primer binding site length, flap length, and/or the like.

An illustrative implementation of a computer system 3400 that may be used to perform any of the aspects of the techniques and embodiments disclosed herein is shown in FIG. 34. The computer system 3400 may include one or more processors 3410 and one or more non-transitory computer-readable storage media (e.g., memory 3420 and one or more non-volatile storage media 3430) and a display 3440. The processor 3410 may control writing data to and reading data from the memory 3420 and the non-volatile storage device 3430 in any suitable manner, as the aspects of the invention described herein are not limited in this respect. To perform functionality and/or techniques described herein, the processor 3410 may execute one or more instructions stored in one or more computer-readable storage media (e.g., the memory 3420, storage media, etc.), which may serve as non-transitory computer-readable storage media storing instructions for execution by the processor 3410.

In connection with techniques described herein, code used, for example, to determine extended gRNA structures may be stored on one or more computer-readable storage media of computer system 3400. Processor 3410 may execute any such code to provide any of the techniques for determining extended gRNA structures as described herein. Any other software, programs or instructions described herein may also be stored and executed by computer system 3400. It will be appreciated that computer code may be applied to any aspects of methods and techniques described herein. For example, computer code may be applied to interact with an operating system to determine extended gRNA structures through conventional operating system processes.

The various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of numerous suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a virtual machine or a suitable framework.

In this respect, various inventive concepts may be embodied as at least one non-transitory computer readable storage medium (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, etc.) encoded with one or more programs that, when executed on one or more computers or other processors, implement the various embodiments of the present invention. The non-transitory computer-readable medium or media may be transportable, such that the program or programs stored thereon may be loaded onto any computer resource to implement various aspects of the present invention as discussed above.

The terms “program,” “software,” and/or “application” are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various aspects of the present invention.

Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

Also, data structures may be stored in non-transitory computer-readable storage media in any suitable form. Data structures may have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey relationship between the fields. However, any suitable mechanism may be used to establish relationships among information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.

Various inventive concepts may be embodied as one or more methods, of which examples have been provided. The acts performed as part of a method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

III. Prime Editors for Use with Therapeutic PEgRNA

The therapeutic PEgRNA designed in accordance with the herein disclosed algorithm can be used to conduct prime editing when in complex with a prime editor. Prime editors comprise a napDNAbp fused with a polymerase (e.g., a reverse transcriptase) (or one which is provided in trans), optionally where the two domains are joined by linkers and further may comprise one or more NLS. These aspects are further described, as follows.

A. napDNAbp

The prime editors described herein may comprise a nucleic acid programmable DNA binding protein (napDNAbp).

In one aspect, a napDNAbp can be associated with or complexed with at least one guide nucleic acid (e.g., guide RNA or a PEgRNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the spacer of a guide RNA which anneals to the protospacer of the DNA target). In other words, the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to the complementary sequence of the protospacer in the DNA.

Any suitable napDNAbp may be used in the prime editors described herein. In various embodiments, the napDNAbp may be any Class 2 CRISPR-Cas system, including any type II, type V, or type VI CRISPR-Cas enzyme. Given the rapid development of CRISPR-Cas as a tool for genome editing, there have been constant developments in the nomenclature used to describe and/or identify CRISPR-Cas enzymes, such as Cas9 and Cas9 orthologs. This application references CRISPR-Cas enzymes with nomenclature that may be old and/or new. The skilled person will be able to identify the specific CRISPR-Cas enzyme being referenced in this Application based on the nomenclature that is used, whether it is old (i.e., “legacy”) or new nomenclature. CRISPR-Cas nomenclature is extensively discussed in Makarova et al., “Classification and Nomenclature of CRISPR-Cas Systems: Where from Here?,” The CRISPR Journal, Vol. 1. No. 5, 2018, the entire contents of which are incorporated herein by reference. The particular CRISPR-Cas nomenclature used in any given instance in this application is not limiting in any way and the skilled person will be able to identify which CRISPR-Cas enzyme is being referenced.

For example, the following type II, type V, and type VI Class 2 CRISPR-Cas enzymes have the following art-recognized old (i.e., legacy) and new names. Each of these enzymes, and/or variants thereof, may be used with the prime editors described herein:

Legacy nomenclature        Current nomenclature*

type II CRISPR-Cas enzymes
Cas9                       same

type V CRISPR-Cas enzymes
Cpf1                       Cas12a
CasX                       Cas12e
C2c1                       Cas12b1
Cas12b2                    same
C2c3                       Cas12c
CasY                       Cas12d
C2c4                       same
C2c8                       same
C2c5                       same
C2c10                      same
C2c9                       same

type VI CRISPR-Cas enzymes
C2c2                       Cas13a
Cas13d                     same
C2c7                       Cas13c
C2c6                       Cas13b

*See Makarova et al., The CRISPR Journal, Vol. 1, No. 5, 2018.
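By way of non-limiting illustration, the legacy-to-current name mapping tabulated above could be held in a simple lookup when normalizing enzyme names in software. The names LEGACY_TO_CURRENT and current_name are hypothetical; entries marked "same" in the table are handled by returning the input unchanged.

```python
# Only rows whose name changed are stored; "same" rows map to themselves.
LEGACY_TO_CURRENT = {
    "Cpf1": "Cas12a",
    "CasX": "Cas12e",
    "C2c1": "Cas12b1",
    "C2c3": "Cas12c",
    "CasY": "Cas12d",
    "C2c2": "Cas13a",
    "C2c7": "Cas13c",
    "C2c6": "Cas13b",
}


def current_name(name):
    """Return the current name for a legacy name, or the input if unchanged."""
    return LEGACY_TO_CURRENT.get(name, name)


print(current_name("Cpf1"))  # Cas12a
print(current_name("Cas9"))  # Cas9
```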

Without being bound by theory, the mechanism of action of certain napDNAbp contemplated herein includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA spacer then hybridizes to the “target strand” at the protospacer sequence. This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase” (“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”).

The below description of various napDNAbps which can be used in connection with the presently disclosed prime editors is not meant to be limiting in any way. The prime editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process. In various embodiments, the Cas9 or Cas9 variants have a nickase activity, i.e., cleave only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats).

The prime editors described herein may also comprise Cas9 equivalents, including Cas12a (Cpf1) and Cas12b1 proteins which are the result of convergent evolution. The napDNAbps used herein (e.g., SpCas9, Cas9 variant, or Cas9 equivalents) may also contain various modifications that alter/enhance their PAM specificities. Lastly, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a (Cpf1)).
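By way of non-limiting illustration, a percent sequence identity between a candidate and a reference sequence could be computed as sketched below. The function percent_identity is hypothetical; the sketch assumes the two sequences have already been aligned (with "-" marking gaps) by a separate alignment tool, and it uses the alignment length as the denominator, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same non-gap residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 5 identical columns out of 7.
identity = percent_identity("MDKK-YS", "MDRKAYS")
print(round(identity, 1))  # 71.4
print(identity >= 70.0)    # True
```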

The napDNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. As outlined above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M. et al., Science 337:816-821(2012), the entire contents of which is hereby incorporated by reference.

In some embodiments, the napDNAbp directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the napDNAbp directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, a vector encodes a napDNAbp that is mutated with respect to a corresponding wild-type enzyme such that the mutated napDNAbp lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A in reference to the canonical SpCas9 sequence, or to equivalent amino acid positions in other Cas9 variants or Cas9 equivalents.
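By way of non-limiting illustration, a point substitution written in the conventional form (e.g., D10A) could be applied to an amino acid sequence as sketched below. The function apply_substitution is hypothetical; the test fragment is the first 55 residues of the canonical SpCas9 sequence given in this specification, so only positions within that fragment can be exercised here.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a substitution such as 'D10A' (1-based position) to a protein string."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError("mutation must look like 'D10A'")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = position - 1
    if index < 0 or index >= len(protein):
        raise ValueError("position lies outside the sequence")
    if protein[index] != wild_type:
        raise ValueError("sequence does not have %s at position %d" % (wild_type, position))
    return protein[:index] + replacement + protein[index + 1:]


# N-terminal fragment of canonical SpCas9 (first 55 residues).
SPCAS9_N_TERM = "MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS"
print(apply_substitution(SPCAS9_N_TERM, "D10A")[:12])  # MDKKYSIGLAIG
```

Checking the wild-type residue before substituting guards against applying a mutation numbered for the canonical SpCas9 sequence to a variant whose positions are shifted.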

As used herein, the term “Cas protein” refers to a full-length Cas protein obtained from nature, a recombinant Cas protein having a sequence that differs from a naturally occurring Cas protein, or any fragment of a Cas protein that nevertheless retains all or a significant amount of the requisite basic functions needed for the disclosed methods, i.e., (i) possession of nucleic-acid programmable binding of the Cas protein to a target DNA, and (ii) ability to nick the target DNA sequence on one strand. The Cas proteins contemplated herein embrace CRISPR Cas9 proteins, as well as Cas9 equivalents, variants (e.g., Cas9 nickase (nCas9) or nuclease inactive Cas9 (dCas9)), homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and may include a Cas9 equivalent from any Class 2 CRISPR system (e.g., type II, V, VI), including Cas12a (Cpf1), Cas12e (CasX), Cas12b1 (C2c1), Cas12b2, Cas12c (C2c3), C2c4, C2c8, C2c5, C2c10, C2c9, Cas13a (C2c2), Cas13d, Cas13c (C2c7), and Cas13b (C2c6). Further Cas-equivalents are described in Abudayyeh et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299) and Makarova et al., “Classification and Nomenclature of CRISPR-Cas Systems: Where from Here?,” The CRISPR Journal, Vol. 1. No. 5, 2018, the contents of which are incorporated herein by reference.

The terms “Cas9” or “Cas9 nuclease” or “Cas9 moiety” or “Cas9 domain” embrace any naturally occurring Cas9 from any organism, any naturally-occurring Cas9 equivalent or functional fragment thereof, any Cas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a Cas9, naturally-occurring or engineered. The term Cas9 is not meant to be particularly limiting and may be referred to as a “Cas9 or equivalent.” Exemplary Cas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference. The present disclosure is unlimited with regard to the particular Cas9 that is employed in the prime editor (PE) of the invention.

As noted herein, Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference).

Examples of Cas9 and Cas9 equivalents are provided as follows; however, these specific examples are not meant to be limiting. The prime editor of the present disclosure may use any suitable napDNAbp, including any suitable Cas9 or Cas9 equivalent.

(a) Wild Type Canonical SpCas9

In one embodiment, the prime editor constructs described herein may comprise the “canonical SpCas9” nuclease from S. pyogenes, which has been widely used as a tool for genome engineering and is categorized as the type II subgroup of enzymes of the Class 2 CRISPR-Cas systems. This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains. Point mutations can be introduced into Cas9 to abolish one or both nuclease activities, resulting in a nickase Cas9 (nCas9) or dead Cas9 (dCas9), respectively, that still retains its ability to bind DNA in an sgRNA-programmed manner. In principle, when fused to another protein or domain, Cas9 or a variant thereof (e.g., nCas9) can target that protein to virtually any DNA sequence simply by co-expression with an appropriate sgRNA. As used herein, the canonical SpCas9 protein refers to the wild type protein from Streptococcus pyogenes having the following amino acid sequence:

Description
Sequence
SEQ ID NO:

SpCas9
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS
SEQ ID NO:

Streptococcus

GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEED
1361421

pyogenes

KKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR

M1
GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR

SwissProt
RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDL

Accession
DNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQ

No. Q99ZW2
DLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG

Wild type
TEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKI

EKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM

TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVD

LLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKD

FLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG

RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSG

QGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTT

QKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVD

QELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMK

NYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL

DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNF

FKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEV

QTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSK

KLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGR

KRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY

LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA

PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SpCas9
ATGGATAAAAAATATAGCATTGGCCTGGATATTGGCACCAACAGCGTGGGCTGGG
SEQ ID NO:

Reverse
CGGTGATTACCGATGAATATAAAGTGCCGAGCAAAAAATTTAAAGTGCTGGGCAA
1361422

translation
CACCGATCGCCATAGCATTAAAAAAAACCTGATTGGCGCGCTGCTGTTTGATAGC

of
GGCGAAACCGCGGAAGCGACCCGCCTGAAACGCACCGCGCGCCGCCGCTATACCC

SwissProt
GCCGCAAAAACCGCATTTGCTATCTGCAGGAAATTTTTAGCAACGAAATGGCGAA

Accession
AGTGGATGATAGCTTTTTTCATCGCCTGGAAGAAAGCTTTCTGGTGGAAGAAGAT

No. Q99ZW2
AAAAAACATGAACGCCATCCGATTTTTGGCAACATTGTGGATGAAGTGGCGTATC

Streptococcus

ATGAAAAATATCCGACCATTTATCATCTGCGCAAAAAACTGGTGGATAGCACCGA

pyogenes

TAAAGCGGATCTGCGCCTGATTTATCTGGCGCTGGCGCATATGATTAAATTTCGC

GGCCATTTTCTGATTGAAGGCGATCTGAACCCGGATAACAGCGATGTGGATAAAC

TGTTTATTCAGCTGGTGCAGACCTATAACCAGCTGTTTGAAGAAAACCCGATTAA

CGCGAGCGGCGTGGATGCGAAAGCGATTCTGAGCGCGCGCCTGAGCAAAAGCCGC

CGCCTGGAAAACCTGATTGCGCAGCTGCCGGGCGAAAAAAAAAACGGCCTGTTTG

GCAACCTGATTGCGCTGAGCCTGGGCCTGACCCCGAACTTTAAAAGCAACTTTGA

TCTGGCGGAAGATGCGAAACTGCAGCTGAGCAAAGATACCTATGATGATGATCTG

GATAACCTGCTGGCGCAGATTGGCGATCAGTATGCGGATCTGTTTCTGGCGGCGA

AAAACCTGAGCGATGCGATTCTGCTGAGCGATATTCTGCGCGTGAACACCGAAAT

TACCAAAGCGCCGCTGAGCGCGAGCATGATTAAACGCTATGATGAACATCATCAG

GATCTGACCCTGCTGAAAGCGCTGGTGCGCCAGCAGCTGCCGGAAAAATATAAAG

AAATTTTTTTTGATCAGAGCAAAAACGGCTATGCGGGCTATATTGATGGCGGCGC

GAGCCAGGAAGAATTTTATAAATTTATTAAACCGATTCTGGAAAAAATGGATGGC

ACCGAAGAACTGCTGGTGAAACTGAACCGCGAAGATCTGCTGCGCAAACAGCGCA

CCTTTGATAACGGCAGCATTCCGCATCAGATTCATCTGGGCGAACTGCATGCGAT

TCTGCGCCGCCAGGAAGATTTTTATCCGTTTCTGAAAGATAACCGCGAAAAAATT

GAAAAAATTCTGACCTTTCGCATTCCGTATTATGTGGGCCCGCTGGCGCGCGGCA

ACAGCCGCTTTGCGTGGATGACCCGCAAAAGCGAAGAAACCATTACCCCGTGGAA

CTTTGAAGAAGTGGTGGATAAAGGCGCGAGCGCGCAGAGCTTTATTGAACGCATG

ACCAACTTTGATAAAAACCTGCCGAACGAAAAAGTGCTGCCGAAACATAGCCTGC

TGTATGAATATTTTACCGTGTATAACGAACTGACCAAAGTGAAATATGTGACCGA

AGGCATGCGCAAACCGGCGTTTCTGAGCGGCGAACAGAAAAAAGCGATTGTGGAT

CTGCTGTTTAAAACCAACCGCAAAGTGACCGTGAAACAGCTGAAAGAAGATTATT

TTAAAAAAATTGAATGCTTTGATAGCGTGGAAATTAGCGGCGTGGAAGATCGCTT

TAACGCGAGCCTGGGCACCTATCATGATCTGCTGAAAATTATTAAAGATAAAGAT

TTTCTGGATAACGAAGAAAACGAAGATATTCTGGAAGATATTGTGCTGACCCTGA

CCCTGTTTGAAGATCGCGAAATGATTGAAGAACGCCTGAAAACCTATGCGCATCT

GTTTGATGATAAAGTGATGAAACAGCTGAAACGCCGCCGCTATACCGGCTGGGGC

CGCCTGAGCCGCAAACTGATTAACGGCATTCGCGATAAACAGAGCGGCAAAACCA

TTCTGGATTTTCTGAAAAGCGATGGCTTTGCGAACCGCAACTTTATGCAGCTGAT

TCATGATGATAGCCTGACCTTTAAAGAAGATATTCAGAAAGCGCAGGTGAGCGGC

CAGGGCGATAGCCTGCATGAACATATTGCGAACCTGGCGGGCAGCCCGGCGATTA

AAAAAGGCATTCTGCAGACCGTGAAAGTGGTGGATGAACTGGTGAAAGTGATGGG

CCGCCATAAACCGGAAAACATTGTGATTGAAATGGCGCGCGAAAACCAGACCACC

CAGAAAGGCCAGAAAAACAGCCGCGAACGCATGAAACGCATTGAAGAAGGCATTA

AAGAACTGGGCAGCCAGATTCTGAAAGAACATCCGGTGGAAAACACCCAGCTGCA

GAACGAAAAACTGTATCTGTATTATCTGCAGAACGGCCGCGATATGTATGTGGAT

CAGGAACTGGATATTAACCGCCTGAGCGATTATGATGTGGATCATATTGTGCCGC

AGAGCTTTCTGAAAGATGATAGCATTGATAACAAAGTGCTGACCCGCAGCGATAA

AAACCGCGGCAAAAGCGATAACGTGCCGAGCGAAGAAGTGGTGAAAAAAATGAAA

AACTATTGGCGCCAGCTGCTGAACGCGAAACTGATTACCCAGCGCAAATTTGATA

ACCTGACCAAAGCGGAACGCGGCGGCCTGAGCGAACTGGATAAAGCGGGCTTTAT

TAAACGCCAGCTGGTGGAAACCCGCCAGATTACCAAACATGTGGCGCAGATTCTG

GATAGCCGCATGAACACCAAATATGATGAAAACGATAAACTGATTCGCGAAGTGA

AAGTGATTACCCTGAAAAGCAAACTGGTGAGCGATTTTCGCAAAGATTTTCAGTT

TTATAAAGTGCGCGAAATTAACAACTATCATCATGCGCATGATGCGTATCTGAAC

GCGGTGGTGGGCACCGCGCTGATTAAAAAATATCCGAAACTGGAAAGCGAATTTG

TGTATGGCGATTATAAAGTGTATGATGTGCGCAAAATGATTGCGAAAAGCGAACA

GGAAATTGGCAAAGCGACCGCGAAATATTTTTTTTATAGCAACATTATGAACTTT

TTTAAAACCGAAATTACCCTGGCGAACGGCGAAATTCGCAAACGCCCGCTGATTG

AAACCAACGGCGAAACCGGCGAAATTGTGTGGGATAAAGGCCGCGATTTTGCGAC

CGTGCGCAAAGTGCTGAGCATGCCGCAGGTGAACATTGTGAAAAAAACCGAAGTG

CAGACCGGCGGCTTTAGCAAAGAAAGCATTCTGCCGAAACGCAACAGCGATAAAC

TGATTGCGCGCAAAAAAGATTGGGATCCGAAAAAATATGGCGGCTTTGATAGCCC

GACCGTGGCGTATAGCGTGCTGGTGGTGGCGAAAGTGGAAAAAGGCAAAAGCAAA

AAACTGAAAAGCGTGAAAGAACTGCTGGGCATTACCATTATGGAACGCAGCAGCT

TTGAAAAAAACCCGATTGATTTTCTGGAAGCGAAAGGCTATAAAGAAGTGAAAAA

AGATCTGATTATTAAACTGCCGAAATATAGCCTGTTTGAACTGGAAAACGGCCGC

AAACGCATGCTGGCGAGCGCGGGCGAACTGCAGAAAGGCAACGAACTGGCGCTGC

CGAGCAAATATGTGAACTTTCTGTATCTGGCGAGCCATTATGAAAAACTGAAAGG

CAGCCCGGAAGATAACGAACAGAAACAGCTGTTTGTGGAACAGCATAAACATTAT

CTGGATGAAATTATTGAACAGATTAGCGAATTTAGCAAACGCGTGATTCTGGCGG

ATGCGAACCTGGATAAAGTGCTGAGCGCGTATAACAAACATCGCGATAAACCGAT

TCGCGAACAGGCGGAAAACATTATTCATCTGTTTACCCTGACCAACCTGGGCGCG

CCGGCGGCGTTTAAATATTTTGATACCACCATTGATCGCAAACGCTATACCAGCA

CCAAAGAAGTGCTGGATGCGACCCTGATTCATCAGAGCATTACCGGCCTGTATGA

AACCCGCATTGATCTGAGCCAGCTGGGCGGCGAT
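A reverse translation such as the one listed above can be checked by translating the nucleotide sequence back into amino acids and comparing the result with the listed protein sequence. The following is a minimal Python sketch of that check, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and are not part of the disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = "".join(dna.split()).upper()  # drop whitespace and line breaks
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First 18 codons of the reverse translation listed above.
fragment = "ATGGATAAAAAATATAGCATTGGCCTGGATATTGGCACCAACAGCGTGGGCTGG"
print(translate(fragment))
```

Concatenating the listed nucleotide lines and comparing `translate(...)` with the listed amino acid sequence flags any position where the two listings disagree.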

The prime editors described herein may include canonical SpCas9, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with a wild type Cas9 sequence provided above. These variants may include SpCas9 variants containing one or more mutations, including any known mutation reported with the SwissProt Accession No. Q99ZW2 entry, which include:

SpCas9 mutation (relative to the amino acid sequence of the canonical SpCas9 sequence, SEQ ID NO: 1361421)
Function/Characteristic (as reported) (see UniProtKB - Q99ZW2 (CAS9_STRPT1) entry - incorporated herein by reference)

D10A
Nickase mutant which cleaves the protospacer strand (but no cleavage of non-protospacer strand)

S15A
Decreased DNA cleavage activity

R66A
Decreased DNA cleavage activity

R70A
No DNA cleavage

R74A
Decreased DNA cleavage

R78A
Decreased DNA cleavage

97-150 deletion
No nuclease activity

R165A
Decreased DNA cleavage

175-307 deletion
About 50% decreased DNA cleavage

312-409 deletion
No nuclease activity

E762A
Nickase

H840A
Nickase mutant which cleaves the non-protospacer strand but does not cleave the protospacer strand

N854A
Nickase

N863A
Nickase

H982A
Decreased DNA cleavage

D986A
Nickase

1099-1368 deletion
No nuclease activity

R1333A
Reduced DNA binding
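The point substitutions in the table above use the notation wild type residue, 1-based position, replacement residue (e.g., D10A). As a hedged illustration of how such a substitution could be applied to an amino acid sequence while confirming that the expected wild type residue is present, consider the following Python sketch. The function name `apply_substitution` is hypothetical, and the sketch handles single-residue substitutions only, not the deletions listed in the table.

```python
import re

# Matches substitutions written as <wild type><position><replacement>, e.g. "D10A".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(protein: str, mutation: str) -> str:
    """Return a copy of `protein` carrying the given point substitution.

    Positions are 1-based, as in the table. A ValueError is raised if the
    notation is malformed, the position is out of range, or the residue
    found at that position is not the stated wild type residue.
    """
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized substitution: {mutation!r}")
    wild_type, replacement = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to 0-based index
    if index < 0 or index >= len(protein):
        raise ValueError(f"position out of range for {mutation}")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# N-terminal fragment of the canonical SpCas9 sequence (residues 1-18).
n_terminus = "MDKKYSIGLDIGTNSVGW"
print(apply_substitution(n_terminus, "D10A"))
```

Checking the wild type residue before substituting guards against numbering that is offset relative to the reference sequence, for example when a tag or an extra N-terminal residue has been added.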

Other wild type SpCas9 sequences that may be used in the present disclosure include:

Description
Sequence
SEQ ID NO:

SpCas9
ATGGATAAGAAATACTCAATAGGCTTAGATATCGGCACAAATAGCGTCGGATGGGCGGT
SEQ ID NO:

Streptococcus

GATCACTGATGATTATAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACC
1361423

pyogenes

GCCACAGTATCAAAAAAAATCTTATAGGGGCTCTTTTATTTGGCAGTGGAGAGACAGCG

MGAS1882 wild
GAAGCGACTCGTCTCAAACGGACAGCTCGTAGAAGGTATACACGTCGGAAGAATCGTAT

type
TTGTTATCTACAGGAGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTC

NC_017053.1
ATCGACTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCCTATT

TTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCT

GCGAAAAAAATTGGCAGATTCTACTGATAAAGCGGATTTGCGCTTAATCTATTTGGCCT

TAGCGCATATGATTAAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTAAATCCTGAT

AATAGTGATGTGGACAAACTATTTATCCAGTTGGTACAAATCTACAATCAATTATTTGA

AGAAAACCCTATTAACGCAAGTAGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGA

GTAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGAGAAATGGC

TTGTTTGGGAATCTCATTGCTTTGTCATTGGGATTGACCCCTAATTTTAAATCAAATTT

TGATTTGGCAGAAGATGCTAAATTACAGCTTTCAAAAGATACTTACGATGATGATTTAG

ATAATTTATTGGCGCAAATTGGAGATCAATATGCTGATTTGTTTTTGGCAGCTAAGAAT

TTATCAGATGCTATTTTACTTTCAGATATCCTAAGAGTAAATAGTGAAATAACTAAGGC

TCCCCTATCAGCTTCAATGATTAAGCGCTACGATGAACATCATCAAGACTTGACTCTTT

TAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAAGAAATCTTTTTTGATCAA

TCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGCTAGCCAAGAAGAATTTTATAA

ATTTATCAAACCAATTTTAGAAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAA

ATCGTGAAGATTTGCTGCGCAAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAA

ATTCACTTGGGTGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTT

AAAAGACAATCGTGAGAAGATTGAAAAAATCTTGACTTTTCGAATTCCTTATTATGTTG

GTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATGACTCGGAAGTCTGAAGAAACA

ATTACCCCATGGAATTTTGAAGAAGTTGTCGATAAAGGTGCTTCAGCTCAATCATTTAT

TGAACGCATGACAAACTTTGATAAAAATCTTCCAAATGAAAAAGTACTACCAAAACATA

GTTTGCTTTATGAGTATTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACT

GAGGGAATGCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGATTT

ACTCTTCAAAACAAATCGAAAAGTAACCGTTAAGCAATTAAAAGAAGATTATTTCAAAA

AAATAGAATGTTTTGATAGTGTTGAAATTTCAGGAGTTGAAGATAGATTTAATGCTTCA

TTAGGCGCCTACCATGATTTGCTAAAAATTATTAAAGATAAAGATTTTTTGGATAATGA

AGAAAATGAAGATATCTTAGAGGATATTGTTTTAACATTGACCTTATTTGAAGATAGGG

GGATGATTGAGGAAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAA

CAGCTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGATTAATGG

TATTAGGGATAAGCAATCTGGCAAAACAATATTAGATTTTTTGAAATCAGATGGTTTTG

CCAATCGCAATTTTATGCAGCTGATCCATGATGATAGTTTGACATTTAAAGAAGATATT

CAAAAAGCACAGGTGTCTGGACAAGGCCATAGTTTACATGAACAGATTGCTAACTTAGC

TGGCAGTCCTGCTATTAAAAAAGGTATTTTACAGACTGTAAAAATTGTTGATGAACTGG

TCAAAGTAATGGGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTGAAAATCAG

ACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCGAAGAAGGTAT

CAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTTGAAAATACTCAATTGCAAA

ATGAAAAGCTCTATCTCTATTATCTACAAAATGGAAGAGACATGTATGTGGACCAAGAA

TTAGATATTAATCGTTTAAGTGATTATGATGTCGATCACATTGTTCCACAAAGTTTCAT

TAAAGACGATTCAATAGACAATAAGGTACTAACGCGTTCTGATAAAAATCGTGGTAAAT

CGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGACAACTT

CTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACGAAAGCTGAACGTGG

AGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAAACGCCAATTGGTTGAAACTCGCC

AAATCACTAAGCATGTGGCACAAATTTTGGATAGTCGCATGAATACTAAATACGATGAA

AATGATAAACTTATTCGAGAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTCTGA

CTTCCGAAAAGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATCATGCCC

ATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAATATCCAAAACTT

GAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGTTCGTAAAATGATTGCTAA

GTCTGAGCAAGAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCTAATATCATGA

ACTTCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACGCCCTCTAATC

GAAACTAATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCACAGT

GCGCAAAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAGACAG

GCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAAGCTTATTGCTCGT

AAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCTTATTC

AGTCCTAGTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGTTAAAATCCGTTAAAG

AGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACTTT

TTAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAATA

TAGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGGAGAATTAC

AAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGT

CATTATGAAAAGTTGAAGGGTAGTCCAGAAGATAACGAACAAAAACAATTGTTTGTGGA

GCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCAGTGAATTTTCTAAGCGTG

TTATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGAGAC

AAACCAATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCTTGG

AGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAACGATATACGTCTA

CAAAAGAAGTTTTAGATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAAACA

CGCATTGATTTGAGTCAGCTAGGAGGTGACTGA

SpCas9
MDKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVLGNTDRHSIKKNLIGALLFGSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361424

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLADSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

MGAS1882 wild
NSDVDKLFIQLVQIYNQLFEENPINASRVDAKAILSARLSKSRRLENLIAQLPGEKRNG

type
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

NC_017053.1
LSDAILLSDILRVNSEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNAS

LGAYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRGMIEERLKTYAHLFDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGHSLHEQIANLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQ

TTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQE

LDINRLSDYDVDHIVPQSFIKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQL

LNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDE

NDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKL

ESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLI

ETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR

KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF

LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS

HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRD

KPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYET

RIDLSQLGGD

SpCas9
ATGGATAAAAAGTATTCTATTGGTTTAGACATCGGCACTAATTCCGTTGGATGGGCTGT
SEQ ID NO:

Streptococcus

CATAACCGATGAATACAAAGTACCTTCAAAGAAATTTAAGGTGTTGGGGAACACAGACC
1361425

pyogenes wild
GTCATTCGATTAAAAAGAATCTTATCGGTGCCCTCCTATTCGATAGTGGCGAAACGGCA

type
GAGGCGACTCGCCTGAAACGAACCGCTCGGAGAAGGTATACACGTCGCAAGAACCGAAT

SWBC2D7W014
ATGTTACTTACAAGAAATTTTTAGCAATGAGATGGCCAAAGTTGACGATTCTTTCTTTC

ACCGTTTGGAAGAGTCCTTCCTTGTCGAAGAGGACAAGAAACATGAACGGCACCCCATC

TTTGGAAACATAGTAGATGAGGTGGCATATCATGAAAAGTACCCAACGATTTATCACCT

CAGAAAAAAGCTAGTTGACTCAACTGATAAAGCGGACCTGAGGTTAATCTACTTGGCTC

TTGCCCATATGATAAAGTTCCGTGGGCACTTTCTCATTGAGGGTGATCTAAATCCGGAC

AACTCGGATGTCGACAAACTGTTCATCCAGTTAGTACAAACCTATAATCAGTTGTTTGA

AGAGAACCCTATAAATGCAAGTGGCGTGGATGCGAAGGCTATTCTTAGCGCCCGCCTCT

CTAAATCCCGACGGCTAGAAAACCTGATCGCACAATTACCCGGAGAGAAGAAAAATGGG

TTGTTCGGTAACCTTATAGCGCTCTCACTAGGCCTGACACCAAATTTTAAGTCGAACTT

CGACTTAGCTGAAGATGCCAAATTGCAGCTTAGTAAGGACACGTACGATGACGATCTCG

ACAATCTACTGGCACAAATTGGAGATCAGTATGCGGACTTATTTTTGGCTGCCAAAAAC

CTTAGCGATGCAATCCTCCTATCTGACATACTGAGAGTTAATACTGAGATTACCAAGGC

GCCGTTATCCGCTTCAATGATCAAAAGGTACGATGAACATCACCAAGACTTGACACTTC

TCAAGGCCCTAGTCCGTCAGCAACTGCCTGAGAAATATAAGGAAATATTCTTTGATCAG

TCGAAAAACGGGTACGCAGGTTATATTGACGGCGGAGCGAGTCAAGAGGAATTCTACAA

GTTTATCAAACCCATATTAGAGAAGATGGATGGGACGGAAGAGTTGCTTGTAAAACTCA

ATCGCGAAGATCTACTGCGAAAGCAGCGGACTTTCGACAACGGTAGCATTCCACATCAA

ATCCACTTAGGCGAATTGCATGCTATACTTAGAAGGCAGGAGGATTTTTATCCGTTCCT

CAAAGACAATCGTGAAAAGATTGAGAAAATCCTAACCTTTCGCATACCTTACTATGTGG

GACCCCTGGCCCGAGGGAACTCTCGGTTCGCATGGATGACAAGAAAGTCCGAAGAAACG

ATTACTCCATGGAATTTTGAGGAAGTTGTCGATAAAGGTGCGTCAGCTCAATCGTTCAT

CGAGAGGATGACCAACTTTGACAAGAATTTACCGAACGAAAAAGTATTGCCTAAGCACA

GTTTACTTTACGAGTATTTCACAGTGTACAATGAACTCACGAAAGTTAAGTATGTCACT

GAGGGCATGCGTAAACCCGCCTTTCTAAGCGGAGAACAGAAGAAAGCAATAGTAGATCT

GTTATTCAAGACCAACCGCAAAGTGACAGTTAAGCAATTGAAAGAGGACTACTTTAAGA

AAATTGAATGCTTCGATTCTGTCGAGATCTCCGGGGTAGAAGATCGATTTAATGCGTCA

CTTGGTACGTATCATGACCTCCTAAAGATAATTAAAGATAAGGACTTCCTGGATAACGA

AGAGAATGAAGATATCTTAGAAGATATAGTGTTGACTCTTACCCTCTTTGAAGATCGGG

AAATGATTGAGGAAAGACTAAAAACATACGCTCACCTGTTCGACGATAAGGTTATGAAA

CAGTTAAAGAGGCGTCGCTATACGGGCTGGGGACGATTGTCGCGGAAACTTATCAACGG

GATAAGAGACAAGCAAAGTGGTAAAACTATTCTCGATTTTCTAAAGAGCGACGGCTTCG

CCAATAGGAACTTTATGCAGCTGATCCATGATGACTCTTTAACCTTCAAAGAGGATATA

CAAAAGGCACAGGTTTCCGGACAAGGGGACTCATTGCACGAACATATTGCGAATCTTGC

TGGTTCGCCAGCCATCAAAAAGGGCATACTCCAGACAGTCAAAGTAGTGGATGAGCTAG

TTAAGGTCATGGGACGTCACAAACCGGAAAACATTGTAATCGAGATGGCACGCGAAAAT

CAAACGACTCAGAAGGGGCAAAAAAACAGTCGAGAGCGGATGAAGAGAATAGAAGAGGG

TATTAAAGAACTGGGCAGCCAGATCTTAAAGGAGCATCCTGTGGAAAATACCCAATTGC

AGAACGAGAAACTTTACCTCTATTACCTACAAAATGGAAGGGACATGTATGTTGATCAG

GAACTGGACATAAACCGTTTATCTGATTACGACGTCGATCACATTGTACCCCAATCCTT

TTTGAAGGACGATTCAATCGACAATAAAGTGCTTACACGCTCGGATAAGAACCGAGGGA

AAAGTGACAATGTTCCAAGCGAGGAAGTCGTAAAGAAAATGAAGAACTATTGGCGGCAG

CTCCTAAATGCGAAACTGATAACGCAAAGAAAGTTCGATAACTTAACTAAAGCTGAGAG

GGGTGGCTTGTCTGAACTTGACAAGGCCGGATTTATTAAACGTCAGCTCGTGGAAACCC

GCCAAATCACAAAGCATGTTGCACAGATACTAGATTCCCGAATGAATACGAAATACGAC

GAGAACGATAAGCTGATTCGGGAAGTCAAAGTAATCACTTTAAAGTCAAAATTGGTGTC

GGACTTCAGAAAGGATTTTCAATTCTATAAAGTTAGGGAGATAAATAACTACCACCATG

CGCACGACGCTTATCTTAATGCCGTCGTAGGGACCGCACTCATTAAGAAATACCCGAAG

CTAGAAAGTGAGTTTGTGTATGGTGATTACAAAGTTTATGACGTCCGTAAGATGATCGC

GAAAAGCGAACAGGAGATAGGCAAGGCTACAGCCAAATACTTCTTTTATTCTAACATTA

TGAATTTCTTTAAGACGGAAATCACTCTGGCAAACGGAGAGATACGCAAACGACCTTTA

ATTGAAACCAATGGGGAGACAGGTGAAATCGTATGGGATAAGGGCCGGGACTTCGCGAC

GGTGAGAAAAGTTTTGTCCATGCCCCAAGTCAACATAGTAAAGAAAACTGAGGTGCAGA

CCGGAGGGTTTTCAAAGGAATCGATTCTTCCAAAAAGGAATAGTGATAAGCTCATCGCT

CGTAAAAAGGACTGGGACCCGAAAAAGTACGGTGGCTTCGATAGCCCTACAGTTGCCTA

TTCTGTCCTAGTAGTGGCAAAAGTTGAGAAGGGAAAATCCAAGAAACTGAAGTCAGTCA

AAGAATTATTGGGGATAACGATTATGGAGCGCTCGTCTTTTGAAAAGAACCCCATCGAC

TTCCTTGAGGCGAAAGGTTACAAGGAAGTAAAAAAGGATCTCATAATTAAACTACCAAA

GTATAGTCTGTTTGAGTTAGAAAATGGCCGAAAACGGATGTTGGCTAGCGCCGGAGAGC

TTCAAAAGGGGAACGAACTCGCACTACCGTCTAAATACGTGAATTTCCTGTATTTAGCG

TCCCATTACGAGAAGTTGAAAGGTTCACCTGAAGATAACGAACAGAAGCAACTTTTTGT

TGAGCAGCACAAACATTATCTCGACGAAATCATAGAGCAAATTTCGGAATTCAGTAAGA

GAGTCATCCTAGCTGATGCCAATCTGGACAAAGTATTAAGCGCATACAACAAGCACAGG

GATAAACCCATACGTGAGCAGGCGGAAAATATTATCCATTTGTTTACTCTTACCAACCT

CGGCGCTCCAGCCGCATTCAAGTATTTTGACACAACGATAGATCGCAAACGATACACTT

CTACCAAGGAGGTGCTAGACGCGACACTGATTCACCAATCCATCACGGGATTATATGAA

ACTCGGATAGATTTGTCACAGCTTGGGGGTGACGGATCCCCCAAGAAGAAGAGGAAAGT

CTCGAGCGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTACAAGG

ATGACGATGACAAGGCTGCAGGA

SpCas9
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361426

pyogenes wild
FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

type Encoded
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

product of
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

SWBC2D7W014
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNAS

LGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGDGSPKKKRKVSSDYKDHDGDYKDHDIDYKDDDDKAAG

SpCas9
ATGGATAAGAAATACTCAATAGGCTTAGATATCGGCACAAATAGCGTCGGATGGGCGGT
SEQ ID NO:

Streptococcus

GATCACTGATGAATATAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACC
1361427

pyogenes

GCCACAGTATCAAAAAAAATCTTATAGGGGCTCTTTTATTTGACAGTGGAGAGACAGCG

M1GAS wild
GAAGCGACTCGTCTCAAACGGACAGCTCGTAGAAGGTATACACGTCGGAAGAATCGTAT

type
TTGTTATCTACAGGAGATTTTTTCAAATGAGATGGCGAAAGTAGATGATAGTTTCTTTC

NC_002737.2
ATCGACTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTCATCCTATT

TTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCT

GCGAAAAAAATTGGTAGATTCTACTGATAAAGCGGATTTGCGCTTAATCTATTTGGCCT

TAGCGCATATGATTAAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTAAATCCTGAT

AATAGTGATGTGGACAAACTATTTATCCAGTTGGTACAAACCTACAATCAATTATTTGA

AGAAAACCCTATTAACGCAAGTGGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGA

GTAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAGAAAAATGGC

TTATTTGGGAATCTCATTGCTTTGTCATTGGGTTTGACCCCTAATTTTAAATCAAATTT

TGATTTGGCAGAAGATGCTAAATTACAGCTTTCAAAAGATACTTACGATGATGATTTAG

ATAATTTATTGGCGCAAATTGGAGATCAATATGCTGATTTGTTTTTGGCAGCTAAGAAT

TTATCAGATGCTATTTTACTTTCAGATATCCTAAGAGTAAATACTGAAATAACTAAGGC

TCCCCTATCAGCTTCAATGATTAAACGCTACGATGAACATCATCAAGACTTGACTCTTT

TAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAAGAAATCTTTTTTGATCAA

TCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGCTAGCCAAGAAGAATTTTATAA

ATTTATCAAACCAATTTTAGAAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAA

ATCGTGAAGATTTGCTGCGCAAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAA

ATTCACTTGGGTGAGCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTT

AAAAGACAATCGTGAGAAGATTGAAAAAATCTTGACTTTTCGAATTCCTTATTATGTTG

GTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGATGACTCGGAAGTCTGAAGAAACA

ATTACCCCATGGAATTTTGAAGAAGTTGTCGATAAAGGTGCTTCAGCTCAATCATTTAT

TGAACGCATGACAAACTTTGATAAAAATCTTCCAAATGAAAAAGTACTACCAAAACATA

GTTTGCTTTATGAGTATTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACT

GAAGGAATGCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGATTT

ACTCTTCAAAACAAATCGAAAAGTAACCGTTAAGCAATTAAAAGAAGATTATTTCAAAA

AAATAGAATGTTTTGATAGTGTTGAAATTTCAGGAGTTGAAGATAGATTTAATGCTTCA

TTAGGTACCTACCATGATTTGCTAAAAATTATTAAAGATAAAGATTTTTTGGATAATGA

AGAAAATGAAGATATCTTAGAGGATATTGTTTTAACATTGACCTTATTTGAAGATAGGG

AGATGATTGAGGAAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAA

CAGCTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGATTAATGG

TATTAGGGATAAGCAATCTGGCAAAACAATATTAGATTTTTTGAAATCAGATGGTTTTG

CCAATCGCAATTTTATGCAGCTGATCCATGATGATAGTTTGACATTTAAAGAAGACATT

CAAAAAGCACAAGTGTCTGGACAAGGCGATAGTTTACATGAACATATTGCAAATTTAGC

TGGTAGCCCTGCTATTAAAAAAGGTATTTTACAGACTGTAAAAGTTGTTGATGAATTGG

TCAAAGTAATGGGGCGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTGAAAAT

CAGACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCGAAGAAGG

TATCAAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTTGAAAATACTCAATTGC

AAAATGAAAAGCTCTATCTCTATTATCTCCAAAATGGAAGAGACATGTATGTGGACCAA

GAATTAGATATTAATCGTTTAAGTGATTATGATGTCGATCACATTGTTCCACAAAGTTT

CCTTAAAGACGATTCAATAGACAATAAGGTCTTAACGCGTTCTGATAAAAATCGTGGTA

AATCGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAAACTATTGGAGACAA

CTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACGAAAGCTGAACG

TGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAAACGCCAATTGGTTGAAACTC

GCCAAATCACTAAGCATGTGGCACAAATTTTGGATAGTCGCATGAATACTAAATACGAT

GAAAATGATAAACTTATTCGAGAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTC

TGACTTCCGAAAAGATTTCCAATTCTATAAAGTACGTGAGATTAACAATTACCATCATG

CCCATGATGCGTATCTAAATGCCGTCGTTGGAACTGCTTTGATTAAGAAATATCCAAAA

CTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGTTCGTAAAATGATTGC

TAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCTAATATCA

TGAACTTCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACGCCCTCTA

ATCGAAACTAATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCAC

AGTGCGCAAAGTATTGTCCATGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAGA

CAGGCGGATTCTCCAAGGAGTCAATTTTACCAAAAAGAAATTCGGACAAGCTTATTGCT

CGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTGATAGTCCAACGGTAGCTTA

TTCAGTCCTAGTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGTTAAAATCCGTTA

AAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGAC

TTTTTAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAA

ATATAGTCTTTTTGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGGAGAAT

TACAAAAAGGAAATGAGCTGGCTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCT

AGTCATTATGAAAAGTTGAAGGGTAGTCCAGAAGATAACGAACAAAAACAATTGTTTGT

GGAGCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAATCAGTGAATTTTCTAAGC

GTGTTATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAACATAGA

GACAAACCAATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCT

TGGAGCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAACGATATACGT

CTACAAAAGAAGTTTTAGATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAA

ACACGCATTGATTTGAGTCAGCTAGGAGGTGACTGA

SpCas9
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361428

pyogenes M1GAS
FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

wild type
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

Encoded
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

product of
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

NC_002737.2
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

(100%
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

identical to
ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

the canonical
EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNAS

Q99ZW2 wild
LGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMK

type)
QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

The prime editors described herein may include any of the above SpCas9 sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.
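The sequence identity thresholds recited above (e.g., at least 80% or at least 99%) presuppose a way of scoring identity between two sequences. The following Python sketch shows one simple convention, assuming the two sequences have already been aligned to equal length with `-` marking gaps: identity is the number of identical aligned residues divided by the number of aligned columns. The function names are hypothetical, and other conventions (for example, dividing by the length of the shorter sequence) give somewhat different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """Return True if the percent identity is at least `threshold` (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Residues 21-30 of the canonical SpCas9 and MGAS1882 SpCas9 sequences,
# which differ at one position (E versus D).
canonical = "ITDEYKVPSK"
mgas1882 = "ITDDYKVPSK"
print(percent_identity(canonical, mgas1882))
print(meets_identity_threshold(canonical, mgas1882, 80.0))
```

In practice the alignment itself would be produced by a standard pairwise alignment tool before a score of this kind is computed over the full-length sequences.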

(ii) Wild Type Cas9 Orthologs

In other embodiments, the Cas9 protein can be a wild type Cas9 ortholog from another bacterial species different from the canonical Cas9 from S. pyogenes. For example, the following Cas9 orthologs can be used in connection with the prime editor constructs described in this specification. In addition, any variant Cas9 orthologs having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the below orthologs may also be used with the present prime editors.

Description
Sequence

LfCas9
MKEYHIGLDIGTSSIGWAVTDSQFKLMRIKGKTAIGVRLFEEGKTAAERRTFRTTRRRLKRRKWRLHYLDEIFAPHLQEVD

Lactobacillus

ENFLRRLKQSNIHPEDPTKNQAFIGKLLFPDLLKKNERGYPTLIKMRDELPVEQRAHYPVMNIYKLREAMINEDRQFDLRE

fermentum

VYLAVHHIVKYRGHFLNNASVDKFKVGRIDEDKSENVLNEAYEELQNGEGSFTIEPSKVEKIGQLLLDTKMRKLDRQKAVA

wild type
KLLEVKVADKEETKRNKQIATAMSKLVLGYKADFATVAMANGNEWKIDLSSETSEDEIEKFREELSDAQNDILTEITSLFS

GenBank:
QIMLNEIVPNGMSISESMMDRYWTHERQLAEVKEYLATQPASARKEFDQVYNKYIGQAPKERGFDLEKGLKKILSKKENWK

SNX31424.11
EIDELLKAGDELPKQRTSANGVIPHQMHQQELDRIIEKQAKYYPWLATENPATGERDRHQAKYELDQLVSFRIPYYVGPLV

TPEVQKATSGAKFAWAKRKEDGEITPWNLWDKIDRAESAEAFIKRMTVKDTYLLNEDVLPANSLLYQKYNVLNELNNVRVN

GRRLSVGIKQDIYTELFKKKKTVKASDVASLVMAKTRGVNKPSVEGLSDPKKENSNLATYLDLKSIVGDKVDDNRYQTDLE

NIIEWRSVFEDGEIFADKLTEVEWLTDEQRSALVKKRYKGWGRLSKKLLTGIVDENGQRIIDLMWNTDQNFKEIVDQPVFK

EQIDQLNQKAITNDGMTLRERVESVLDDAYTSPQNKKAIWQVVRVVEDIVKAVGNAPKSISIEFARNEGNKGEITRSRRTQ

LQKLFEDQAHELVKDTSLTEELEKAPDLSDRYYFYFTQGGKDMYTGDPINFDEISTKYDIDHILPQSFVKDNSLDNRVLTS

RKENNKKSDQVPAKLYAAKMKPYWNQLLKQGLITQRKFENLTKDVDQNIKYRSLGFVKRQLVETRQVIKLTANILGSMYQE

AGTEIIETRAGLTKQLREEFDLPKVREVNDYHHAVDAYLTTFAGQYLNRRYPKLRSFFVYGEYMKFKHGSDLKLRNFNFFH

ELMEGDKSQGKVVDQQTGELITTRDEVAKSFDRLLNMKYMLVSKEVHDRSDQLYGATIVTAKESGKLTSPIEIKKNRLVDL

YGAYTNGTSAFMTIIKFTGNKPKYKVIGIPTTSAASLKRAGKPGSESYNQELHRIIKSNPKVKKGFEIVVPHVSYGQLIVD

GDCKFTLASPTVQHPATQLVLSKKSLETISSGYKILKDKPAIANERLIRVEDEVVGQMNRYFTIFDQRSNRQKVADARDKF

LSLPTESKYEGAKKVQVGKTEVITNLLMGLHANATQGDLKVLGLATFGFFQSTTGLSLSEDTMIVYQSPTGLFERRICLKD

I (SEQ ID NO: 1361429)

SaCas9
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICY

Staphylococcus

LQEIFSNEMAKVDDSFEHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMI

aureus

KFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLEGNLIA

wild type
LSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR

GenBank:
YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTF

AYD60528.1
DNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA

SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKED

YFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVM

KQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDELKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAG

SPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEK

LYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKL

ITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDERKDFQF

YKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA

NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGF

DSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLA

SAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNK

HRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID

NO: 1361430)

SaCas9
MGKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDH

Staphylococcus

SELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKD

aureus

GEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPE

ELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVEKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEF

TNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLIL

DELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSK

DAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFD

NSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLV

DTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVM

ENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLY

DKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAH

LDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYKNDL

IKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKK

(SEQ ID NO: 1361431)

StCas9
MLFNKCIIISINLDFSNKEKCMTKPYSIGLDIGTNSVGWAVITDNYKVPSKKMKVLGNTSKKYIKKNLLGVLLEDSGITAE

Streptococcus

GRRLKRTARRRYTRRRNRILYLQEIFSTEMATLDDAFFQRLDDSFLVPDDKRDSKYPIFGNLVEEKVYHDEFPTIYHLRKY

thermophilus

LADSTKKADLRLVYLALAHMIKYRGHFLIEGEFNSKNNDIQKNFQDFLDTYNAIFESDLSLENSKQLEEIVKDKISKLEKK

UniProtKB/
DRILKLEPGEKNSGIFSEFLKLIVGNQADFRKCFNLDEKASLHFSKESYDEDLETLLGYIGDDYSDVFLKAKKLYDAILLS

Swiss-Prot:
GFLTVTDNETEAPLSSAMIKRYNEHKEDLALLKEYIRNISLKTYNEVEKDDTKNGYAGYIDGKTNQEDFYVYLKNLLAEFE

G3ECR1.2
GADYFLEKIDREDFLRKQRTEDNGSIPYQIHLQEMRAILDKQAKFYPFLAKNKERIEKILTFRIPYYVGPLARGNSDFAWS

Wild type
IRKRNEKITPWNFEDVIDKESSAEAFINRMTSFDLYLPEEKVLPKHSLLYETENVYNELTKVRFIAESMRDYQFLDSKQKK

DIVRLYFKDKRKVTDKDIIEYLHAIYGYDGIELKGIEKQFNSSLSTYHDLLNIINDKEFLDDSSNEAIIEEIIHTLTIFED

REMIKQRLSKFENIFDKSVLKKLSRRHYTGWGKLSAKLINGIRDEKSGNTILDYLIDDGISNRNFMQLIHDDALSFKKKIQ

KAQIIGDEDKGNIKEVVKSLPGSPAIKKGILQSIKIVDELVKVMGGRKPESIVVEMARENQYTNQGKSNSQQRLKRLEKSL

KELGSKILKENIPAKLSKIDNNALQNDRLYLYYLQNGKDMYTGDDLDIDRLSNYDIDHIIPQAFLKDNSIDNKVLVSSASN

RGKSDDFPSLEVVKKRKTFWYQLLKSKLISQRKFDNLTKAERGGLLPEDKAGFIQRQLVETRQITKHVARLLDEKENNKKD

ENNRAVRTVKIITLKSTLVSQFRKDFELYKVREINDFHHAHDAYLNAVIASALLKKYPKLEPEFVYGDYPKYNSFRERKSA

TEKVYFYSNIMNIFKKSISLADGRVIERPLIEVNEETGESVWNKESDLATVRRVLSYPQVNVVKKVEEQNHGLDRGKPKGL

FNANLSSKPKPNSNENLVGAKEYLDPKKYGGYAGISNSFAVLVKGTIEKGAKKKITNVLEFQGISILDRINYRKDKLNELL

EKGYKDIELIIELPKYSLFELSDGSRRMLASILSTNNKRGEIHKGNQIFLSQKFVKLLYHAKRISNTINENHRKYVENHKK

EFEELFYYILEFNENYVGAKKNGKLLNSAFQSWQNHSIDELCSSFIGPTGSERKGLFELTSRGSAADFEFLGVKIPRYRDY

TPSSLLKDATLIHQSVTGLYETRIDLAKLGEG (SEQ ID NO: 1361432)

LcCas9
MKIKNYNLALTPSTSAVGHVEVDDDLNILEPVHHQKAIGVAKFGEGETAEARRLARSARRTTKRRANRINHYFNEIMKPEI

Lactobacillus

DKVDPLMFDRIKQAGLSPLDERKEFRTVIFDRPNIASYYHNQFPTIWHLQKYLMITDEKADIRLIYWALHSLLKHRGHFEN

crispatus

TTPMSQFKPGKLNLKDDMLALDDYNDLEGLSFAVANSPEIEKVIKDRSMHKKEKIAELKKLIVNDVPDKDLAKRNNKIITQ

NCBI
IVNAIMGNSFHLNFIFDMDLDKLTSKAWSFKLDDPELDTKEDAISGSMTDNQIGIFETLQKIYSAISLLDILNGSSNVVDA

Reference
KNALYDKHKRDLNLYFKFLNTLPDEIAKTLKAGYTLYIGNRKKDLLAARKLLKVNVAKNFSQDDFYKLINKELKSIDKQGL

Sequence:
QTRFSEKVGELVAQNNFLPVQRSSDNVFIPYQLNAITENKILENQGKYYDELVKPNPAKKDRKNAPYELSQLMQFTIPYYV

WP_133478044.1
GPLVTPEEQVKSGIPKTSRFAWMVRKDNGAITPWNFYDKVDIEATADKFIKRSIAKDSYLLSELVLPKHSLLYEKYEVFNE

Wild type
LSNVSLDGKKLSGGVKQILFNEVFKKTNKVNTSRILKALAKHNIPGSKITGLSNPEEFTSSLQTYNAWKKYFPNQIDNFAY

QQDLEKMIEWSTVFEDHKILAKKLDEIEWLDDDQKKFVANTRLRGWGRLSKRLLTGLKDNYGKSIMQRLETTKANFQQIVY

KPEFREQIDKISQAAAKNQSLEDILANSYTSPSNRKAIRKTMSVVDEYIKLNHGKEPDKIFLMFQRSEQEKGKQTEARSKQ

LNRILSQLKADKSANKLESKQLADEFSNAIKKSKYKLNDKQYFYFQQLGRDALTGEVIDYDELYKYTVLHIIPRSKLTDDS

QNNKVLTKYKIVDGSVALKFGNSYSDALGMPIKAFWTELNRLKLIPKGKLLNLTTDFSTLNKYQRDGYIARQLVETQQIVK

LLATIMQSRFKHTKIIEVRNSQVANIRYQFDYFRIKNLNEYYRGFDAYLAAVVGTYLYKVYPKARRLFVYGQYLKPKKTNQ

ENQDMHLDSEKKSQGFNFLWNLLYGKQDQIFVNGTDVIAFNRKDLITKMNTVYNYKSQKISLAIDYHNGAMFKATLEPRND

RDTAKTRKLIPKKKDYDTDIYGGYTSNVDGYMLLAEIIKRDGNKQYGFYGVPSRLVSELDTLKKTRYTEYEEKLKEIIKPE

LGVDLKKIKKIKILKNKVPFNQVIIDKGSKFFITSTSYRWNYRQLILSAESQQTLMDLVVDPDFSNHKARKDARKNADERL

IKVYEEILYQVKNYMPMFVELHRCYEKLVDAQKTFKSLKISDKAMVLNQILILLHSNATSPVLEKLGYHTRFTLGKKHNLI

SENAVLVTQSITGLKENHVSIKQML (SEQ ID NO: 1361433)

PdCas9
MTNEKYSIGLDIGTSSIGFAVVNDNNRVIRVKGKNAIGVRLFDEGKAAADRRSFRTTRRSFRTTRRRLSRRRWRLKLLREI

Pediococcus

FDAYITPVDEAFFIRLKESNLSPKDSKKQYSGDILFNDRSDKDFYEKYPTIYHLRNALMTEHRKFDVREIYLAIHHIMKER

damnosus

GHFLNATPANNFKVGRLNLEEKFEELNDIYQRVFPDESIEFRTDNLEQIKEVLLDNKRSRADRQRTLVSDIYQSSEDKDIE

NCBI
KRNKAVATEILKASLGNKAKLNVITNVEVDKEAAKEWSITFDSESIDDDLAKIEGQMTDDGHEIIEVLRSLYSGITLSAIV

Reference
PENHTLSQSMVAKYDLHKDHLKLFKKLINGMTDTKKAKNLRAAYDGYIDGVKGKVLPQEDFYKQVQVNLDDSAEANEIQTY

Sequence:
IDQDIFMPKQRTKANGSIPHQLQQQELDQIIENQKAYYPWLAELNPNPDKKRQQLAKYKLDELVTFRVPYYVGPMITAKDQ

WP_062913273.1
KNQSGAEFAWMIRKEPGNITPWNFDQKVDRMATANQFIKRMTTTDTYLLGEDVLPAQSLLYQKFEVLNELNKIRIDHKPIS

Wild type
IEQKQQIFNDLFKQFKNVTIKHLQDYLVSQGQYSKRPLIEGLADEKRENSSLSTYSDLCGIFGAKLVEENDRQEDLEKIIE

WSTIFEDKKIYRAKLNDLTWLTDDQKEKLATKRYQGWGRLSRKLLVGLKNSEHRNIMDILWITNENFMQIQAEPDFAKLVT

DANKGMLEKTDSQDVINDLYTSPQNKKAIRQILLVVHDIQNAMHGQAPAKIHVEFARGEERNPRRSVQRQRQVEAAYEKVS

NELVSAKVRQEFKEAINNKRDFKDRLFLYFMQGGIDIYTGKQLNIDQLSSYQIDHILPQAFVKDDSLTNRVLTNENQVKAD

SVPIDIFGKKMLSVWGRMKDQGLISKGKYRNLTMNPENISAHTENGFINRQLVETRQVIKLAVNILADEYGDSTQIISVKA

DLSHQMREDFELLKNRDVNDYHHAFDAYLAAFIGNYLLKRYPKLESYFVYGDFKKFTQKETKMRRFNFIYDLKHCDQVVNK

ETGEILWTKDEDIKYIRHLFAYKKILVSHEVREKRGALYNQTIYKAKDDKGSGQESKKLIRIKDDKETKIYGGYSGKSLAY

MTIVQITKKNKVSYRVIGIPTLALARLNKLENDSTENNGELYKIIKPQFTHYKVDKKNGEIIETTDDFKIVVSKVRFQQLI

DDAGQFFMLASDTYKNNAQQLVISNNALKAINNTNITDCPRDDLERLDNLRLDSAFDEIVKKMDKYFSAYDANNFREKIRN

SNLIFYQLPVEDQWENNKITELGKRTVLTRILQGLHANATTTDMSIFKIKTPFGQLRQRSGISLSENAQLIYQSPTGLFER

RVQLNKIK (SEQ ID NO: 1361434)

FnCas9
MKKQKFSDYYLGFDIGTNSVGWCVTDLDYNVLRENKKDMWGSRLFEEAKTAAERRVQRNSRRRLKRRKWRLNLLEEIFSNE

Fusobacterium

ILKIDSNFFRRLKESSLWLEDKSSKEKFTLENDDNYKDYDFYKQYPTIFHLRNELIKNPEKKDIRLVYLAIHSIFKSRGHF

nucleatum

LFEGQNLKEIKNFETLYNNLIAFLEDNGINKIIDKNNIEKLEKIVCDSKKGLKDKEKEFKEIFNSDKQLVAIFKLSVGSSV

NCBI
SLNDLFDTDEYKKGEVEKEKISFREQIYEDDKPIYYSILGEKIELLDIAKTFYDFMVLNNILADSQYISEAKVKLYEEHKK

Reference
DLKNLKYIIRKYNKGNYDKLFKDKNENNYSAYIGLNKEKSKKEVIEKSRLKIDDLIKNIKGYLPKVEEIEEKDKAIFNKIL

Sequence:
NKIELKTILPKQRISDNGTLPYQIHEAELEKILENQSKYYDFLNYEENGIITKDKLLMTFKFRIPYYVGPLNSYHKDKGGN

WP_060798984.1
SWIVRKEEGKILPWNFEQKVDIEKSAEEFIKRMTNKCTYLNGEDVIPKDTFLYSEYVILNELNKVQVNDEFLNEENKRKII

DELFKENKKVSEKKFKEYLLVKQIVDGTIELKGVKDSENSNYISYIRFKDIFGEKLNLDIYKEISEKSILWKCLYGDDKKI

FEKKIKNEYGDILTKDEIKKINTFKENNWGRLSEKLLTGIEFINLETGECYSSVMDALRRTNYNLMELLSSKFTLQESINN

ENKEMNEASYRDLIEESYVSPSLKRAIFQTLKIYEEIRKITGRVPKKVFIEMARGGDESMKNKKIPARQEQLKKLYDSCGN

DIANFSIDIKEMKNSLISYDNNSLRQKKLYLYYLQFGKCMYTGREIDLDRLLQNNDTYDIDHIYPRSKVIKDDSFDNLVLV

LKNENAEKSNEYPVKKEIQEKMKSFWRELKEKNFISDEKYKRLTGKDDFELRGFMARQLVNVRQTTKEVGKILQQIEPEIK

IVYSKAEIASSFREMFDFIKVRELNDTHHAKDAYLNIVAGNVYNTKFTEKPYRYLQEIKENYDVKKIYNYDIKNAWDKENS

LEIVKKNMEKNTVNITRFIKEKKGQLFDLNPIKKGETSNEIISIKPKVYNGKDDKLNEKYGYYKSLNPAYFLYVEHKEKNK

RIKSFERVNLVDVNNIKDEKSLVKYLIENKKLVEPRVIKKVYKRQVILINDYPYSIVTLDSNKLMDFENLKPLFLENKYEK

ILKNVIKFLEDNQGKSEENYKFIYLKKKDRYEKNETLESVKDRYNLEFNEMYDKFLEKLDSKDYKNYMNNKKYQELLDVKE

KFIKLNLFDKAFTLKSFLDLFNRKTMADFSKVGLTKYLGKIQKISSNVLSKNELYLLEESVTGLFVKKIKL (SEQ ID NO: 1361435)

EcCas9
RRKQRIQILQELLGEEVLKTDPGFFHRMKESRYVVEDKRTLDGKQVELPYALFVDKDYTDKEYYKQFPTINHLIVYLMTTS

Enterococcus

DTPDIRLVYLALHYYMKNRGNELHSGDINNVKDINDILEQLDNVLETFLDGWNLKLKSYVEDIKNIYNRDLGRGERKKAFV

cecorum

NTLGAKTKAEKAFCSLISGGSTNLAELFDDSSLKEIETPKIEFASSSLEDKIDGIQEALEDRFAVIEAAKRLYDWKTLTDI

NCBI
LGDSSSLAEARVNSYQMHHEQLLELKSLVKEYLDRKVFQEVFVSLNVANNYPAYIGHTKINGKKKELEVKRTKRNDFYSYV

Reference
KKQVIEPIKKKVSDEAVLTKLSEIESLIEVDKYLPLQVNSDNGVIPYQVKLNELTRIEDNLENRIPVLRENRDKIIKTFKE

Sequence:
RIPYYVGSLNGVVKNGKCTNWMVRKEEGKIYPWNFEDKVDLEASAEQFIRRMTNKCTYLVNEDVLPKYSLLYSKYLVLSEL

WP_047338501.1
NNLRIDGRPLDVKIKQDIYENVFKKNRKVTLKKIKKYLLKEGIITDDDELSGLADDVKSSLTAYRDFKEKLGHLDLSEAQM

Wild type
ENIILNITLFGDDKKLLKKRLAALYPFIDDKSLNRIATLNYRDWGRLSERFLSGITSVDQETGELRTIIQCMYETQANLMQ

LLAEPYHFVEAIEKENPKVDLESISYRIVNDLYVSPAVKRQIWQTLLVIKDIKQVMKHDPERIFIEMAREKQESKKTKSRK

QVLSEVYKKAKEYEHLFEKLNSLTEEQLRSKKIYLYFTQLGKCMYSGEPIDFENLVSANSNYDIDHIYPQSKTIDDSENNI

VLVKKSLNAYKSNHYPIDKNIRDNEKVKTLWNTLVSKGLITKEKYERLIRSTPFSDEELAGFIARQLVETRQSTKAVAEIL

SNWFPESEIVYSKAKNVSNFRQDFEILKVRELNDCHHAHDAYLNIVVGNAYHTKFTNSPYRFIKNKANQEYNLRKLLQKVN

KIESNGVVAWVGQSENNPGTIATVKKVIRRNTVLISRMVKEVDGQLFDLTLMKKGKGQVPIKSSDERLTDISKYGGYNKAT

GAYFTFVKSKKRGKVVRSFEYVPLHLSKQFENNNELLKEYIEKDRGLTDVEILIPKVLINSLFRYNGSLVRITGRGDTRLL

LVHEQPLYVSNSFVQQLKSVSSYKLKKSENDNAKLTKTATEKLSNIDELYDGLLRKLDLPIYSYWFSSIKEYLVESRTKYI

KLSIEEKALVIFEILHLFQSDAQVPNLKILGLSTKPSRIRIQKNLKDTDKMSIIHQSPSGIFEHEIELTSL (SEQ ID NO: 1361436)

AhCas9
MQNGFLGITVSSEQVGWAVINPKYELERASRKDLWGVRLEDKAETAEDRRMERTNRRLNQRKKNRIHYLRDIFHEEVNQKD

Anaerostipes

PNFFQQLDESNFCEDDRTVEFNFDTNLYKNQFPTVYHLRKYLMETKDKPDIRLVYLAFSKFMKNRGHFLYKGNLGEVMDFE

hadrus

NSMKGFCESLEKFNIDFPTLSDEQVKEVRDILCDHKIAKTVKKKNIITITKVKSKTAKAWIGLFCGCSVPVKVLFQDIDEE

NCBI
IVTDPEKISFEDASYDDYIANIEKGVGIYYEAIVSAKMLFDWSILNEILGDHQLLSDAMIAEYNKHHDDLKRLQKIIKGTG

Reference
SRELYQDIFINDVSGNYVCYVGHAKTMSSADQKQFYTFLKNRLKNVNGISSEDAEWIDTEIKNGTLLPKQTKRDNSVIPHQ

Sequence:
LQLREFELILDNMQEMYPFLKENREKLLKIFNFVIPYYVGPLKGVVRKGESTNWMVPKKDGVIHPWNFDEMVDKEASAECF

WP_044924278.1
ISRMTGNCSYLFNEKVLPKNSLLYETFEVLNELNPLKINGEPISVELKQRIYEQLFLTGKKVTKKSLTKYLIKNGYDKDIE

Wild type
LSGIDNEFHSNLKSHIDFEDYDNLSDEEVEQIILRITVFEDKQLLKDYLNREFVKLSEDERKQICSLSYKGWGNLSEMLLN

GITVTDSNGVEVSVMDMLWNTNLNLMQILSKKYGYKAEIEHYNKEHEKTIYNREDLMDYLNIPPAQRRKVNQLITIVKSLK

KTYGVPNKIFFKISREHQDDPKRTSSRKEQLKYLYKSLKSEDEKHLMKELDELNDHELSNDKVYLYFLQKGRCIYSGKKLN

LSRLRKSNYQNDIDYIYPLSAVNDRSMNNKVLTGIQENRADKYTYFPVDSEIQKKMKGFWMELVLQGFMTKEKYERLSREN

DESKSELVSFIEREISDNQQSGRMIASVLQYYFPESKIVFVKEKLISSFKRDFHLISSYGHNHLQAAKDAYITIVVGNVYH

TKFTMDPAIYFKNHKRKDYDLNRLFLENISRDGQIAWESGPYGSIQTVRKEYAQNHIAVTKRVVEVKGGLFKQMPLKKGHG

EYPLKTNDPRFGNIAQYGGYTNVTGSYFVLVESMEKGKKRISLEYVPVYLHERLEDDPGHKLLKEYLVDHRKLNHPKILLA

KVRKNSLLKIDGFYYRLNGRSGNALILTNAVELIMDDWQTKTANKISGYMKRRAIDKKARVYQNEFHIQELEQLYDFYLDK

LKNGVYKNRKNNQAELIHNEKEQFMELKTEDQCVLLTEIKKLFVCSPMQADLTLIGGSKHTGMIAMSSNVTKADFAVIAED

PLGLRNKVIYSHKGEK (SEQ ID NO: 1361437)

KvCas9
MSQNNNKIYNIGLDIGDASVGWAVVDEHYNLLKRHGKHMWGSRLFTQANTAVERRSSRSTRRRYNKRRERIRLLREIMEDM

Kandleria

VLDVDPTFFIRLANVSFLDQEDKKDYLKENYHSNYNLFIDKDENDKTYYDKYPTIYHLRKHLCESKEKEDPRLIYLALHHI

vitulina

VKYRGNFLYEGQKFSMDVSNIEDKMIDVLRQFNEINLFEYVEDRKKIDEVLNVLKEPLSKKHKAEKAFALFDTTKDNKAAY

NCBI
KELCAALAGNKFNVTKMLKEAELHDEDEKDISFKFSDATFDDAFVEKQPLLGDCVEFIDLLHDIYSWVELQNILGSAHTSE

Reference
PSISAAMIQRYEDHKNDLKLLKDVIRKYLPKKYFEVERDEKSKKNNYCNYINHPSKTPVDEFYKYIKKLIEKIDDPDVKTI

Sequence:
LNKIELESFMLKQNSRINGAVPYQMQLDELNKILENQSVYYSDLKDNEDKIRSILTFRIPYYFGPLNITKDRQEDWIIKKE

WP_031589969.1
GKENERILPWNANEIVDVDKTADEFIKRMRNFCTYFPDEPVMAKNSLTVSKYEVLNEINKLRINDHLIKRDMKDKMLHTLF

Wild type
MDHKSISANAMKKWLVKNQYFSNTDDIKIEGFQKENACSTSLTPWIDETKIFGKINESNYDFIEKIIYDVTVFEDKKILRR

RLKKEYDLDEEKIKKILKLKYSGWSRLSKKLLSGIKTKYKDSTRTPETVLEVMERTNMNLMQVINDEKLGFKKTIDDANST

SVSGKFSYAEVQELAGSPAIKRGIWQALLIVDEIKKIMKHEPAHVYIEFARNEDEKERKDSFVNQMLKLYKDYDFEDETEK

EANKHLKGEDAKSKIRSERLKLYYTQMGKCMYTGKSLDIDRLDTYQVDHIVPQSLLKDDSIDNKVLVLSSENQRKLDDLVI

PSSIRNKMYGFWEKLENNKIISPKKFYSLIKTEFNEKDQERFINRQIVETRQITKHVAQIIDNHYENTKVVTVRADLSHQF

RERYHIYKNRDINDFHHAHDAYIATILGTYIGHRFESLDAKYIYGEYKRIFRNQKNKGKEMKKNNDGFILNSMRNIYADKD

TGEIVWDPNYIDRIKKCFYYKDCFVTKKLEENNGTFFNVTVLPNDTNSDKDNTLATVPVNKYRSNVNKYGGFSGVNSFIVA

IKGKKKKGKKVIEVNKLTGIPLMYKNADEEIKINYLKQAEDLEEVQIGKEILKNQLIEKDGGLYYIVAPTEIINAKQLILN

ESQTKLVCEIYKAMKYKNYDNLDSEKIIDLYRLLINKMELYYPEYRKQLVKKFEDRYEQLKVISIEEKCNIIKQILATLHC

NSSIGKIMYSDFKISTTIGRLNGRTISLDDISFIAESPTGMYSKKYKL (SEQ ID NO: 1361438)

EfCas9
MRLFEEGHTAEDRRLKRTARRRISRRRNRLRYLQAFFEEAMTDLDENFFARLQESFLVPEDKKWHRHPIFAKLEDEVAYHE

Enterococcus

TYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENTSVKDQFQQFMVIYNQTFVNGESRLVSAPLPES

faecalis

VLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMVGNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEY

NCBI
SDVFLAAKNVYDAVELSTILADSDKKSHAKLSSSMIVRFTEHQEDLKKFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAG

Reference
KVSQLKFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPELKENQEKIEQLVTE

Sequence:
RIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERMTNEDTYLPSEKVLPKHSLLYEKFMVFNELTK

WP_016631044.1
ISYTDDRGIKANFSGKEKEKIFDYLFKTRRKVKKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELD

Wild type
HPDNAEKLEDIIKILTIFEDRQRIRTQLSTFKGQFSAEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILDYLVKDDGV

SKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIVDELVAIMGYAPKRIVVEMAREN

QTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRDTRLFLYYMQNGKDMYTGDELSLHRLSHYDIDHIIPQSFMK

DDSLDNLVLVGSTENRGKSDDVPSKEVVKDMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITK

NVAGILDQRYNAKSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYPNLAPEFVYGEYPK

FQTFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKELNYHQMNIVKKVEVQKGGFSKESIKPKGPSNK

LIPVKNGLDPQKYGGFDSPVVAYTVLFTHEKGKKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYE

FPEGRRRLLASAKEAQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLAYVEQHQPEFQEILERVVDFAEVHTLAKSKVQQIV

KLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQSPTGLYETRRKVVD (SEQ ID NO: 1361439)

Staphylococcus

KRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSE

aureus

LSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGE

Cas9
VRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEEL

RSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTN

LKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDE

LWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDA

QKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNS

FNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDT

RYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMEN

QMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDK

DNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLD

ITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNNDLIK

INGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG

(SEQ ID NO: 1361440)

Geobacillus

MKYKIGLDIGITSIGWAVINLDIPRIEDLGVRIFDRAENPKTGESLALPRRLARSARRRLRRRKHRLERIRRLFVREGILT

thermodenitrificans

KEELNKLFEKKHEIDVWQLRVEALDRKLNNDELARILLHLAKRRGFRSNRKSERTNKENSTMLKHIEENQSILSSYRTVAE

MVVKDPKFSLHKRNKEDNYTNTVARDDLEREIKLIFAKQREYGNIVCTEAFEHEYISIWASQRPFASKDDIEKKVGFCTFE

Cas9
PKEKRAPKATYTFQSFTVWEHINKLRLVSPGGIRALTDDERRLIYKQAFHKNKITFHDVRTLLNLPDDTRFKGLLYDRNTT

LKENEKVRFLELGAYHKIRKAIDSVYGKGAAKSFRPIDEDTFGYALTMFKDDTDIRSYLRNEYEQNGKRMENLADKVYDEE

LIEELLNLSFSKFGHLSLKALRNILPYMEQGEVYSTACERAGYTFTGPKKKQKTVLLPNIPPIANPVVMRALTQARKVVNA

IIKKYGSPVSIHIELARELSQSFDERRKMQKEQEGNRKKNETAIRQLVEYGLTLNPTGLDIVKFKLWSEQNGKCAYSLQPI

EIERLLEPGYTEVDHVIPYSRSLDDSYTNKVLVLTKENREKGNRTPAEYLGLGSERWQQFETFVLTNKQFSKKKRDRLLRL

HYDENEENEFKNRNLNDTRYISRFLANFIREHLKFADSDDKQKVYTVNGRITAHLRSRWNENKNREESNLHHAVDAAIVAC

TTPSDIARVTAFYQRREQNKELSKKTDPQFPQPWPHFADELQARLSKNPKESIKALNLGNYDNEKLESLQPVFVSRMPKRS

ITGAAHQETLRRYIGIDERSGKIQTVVKKKLSEIQLDKTGHFPMYGKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKK

NGELGPIIRTIKIIDTTNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPIYTIDMMKGILPNKAIEPNKPYSEWKEMTE

DYTFRFSLYPNDLIRIEFPREKTIKTAVGEEIKIKDLFAYYQTIDSSNGGLSLVSHDNNFSLRSIGSRTLKRFEKYQVDVL

GNIYKVRGEKRVGVASSSHSKAGETIRPL (SEQ ID NO: 1361441)

ScCas9
MEKKYSIGLDIGTNSVGWAVITDDYKVPSKKFKVLGNTNRKSIKKNLMGALLFDSGETAEATRLKRTARRRYTRRKNRIRY

S. canis

LQEIFANEMAKLDDSFFQRLEESELVEEDKKNERHPIFGNLADEVAYHRNYPTIYHLRKKLADSPEKADLRLIYLALAHII

1375 AA
KFRGHFLIEGKLNAENSDVAKLFYQLIQTYNQLFEESPLDEIEVDAKGILSARLSKSKRLEKLIAVFPNEKKNGLFGNIIA

159.2 kDa
LALGLTPNFKSNFDLTEDAKLQLSKDTYDDDLDELLGQIGDQYADLFSAAKNLSDAILLSDILRSNSEVTKAPLSASMVKR

YDEHHQDLALLKTLVRQQFPEKYAEIFKDDTKNGYAGYVGIGIKHRKRTTKLATQEEFYKFIKPILEKMDGAEELLAKLNR

DDLLRKQRTFDNGSIPHQIHLKELHAILRRQEEFYPFLKENREKIEKILTFRIPYYVGPLARGNSRFAWLTRKSEEAITPW

NFEEVVDKGASAQSFIERMTNFDEQLPNKKVLPKHSLLYEYFTVYNELTKVKYVTERMRKPEFLSGEQKKAIVDLLFKTNR

KVTVKQLKEDYFKKIECFDSVEIIGVEDRFNASLGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKT

YAHLFDDKVMKQLKRRHYTGWGRLSRKMINGIRDKQSGKTILDFLKSDGFSNRNEMQLIHDDSLTFKEEIEKAQVSGQGDS

LHEQIADLAGSPAIKKGILQTVKIVDELVKVMGHKPENIVIEMARENQTTTKGLQQSRERKKRIEEGIKELESQILKENPV

ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFIKDDSIDNKVLTRSVENRGKSDNVPSEEVVKKMKNY

WRQLLNAKLITQRKFDNLTKAERGGLSEADKAGFIKRQLVETRQITKHVARILDSRMNTKRDKNDKPIREVKVITLKSKLV

SDFRKDFQLYKVRDINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKRFFYSNIMN

FFKTEVKLANGEIRKRPLIETNGETGEVVWNKEKDFATVRKVLAMPQVNIVKKTEVQTGGFSKESILSKRESAKLIPRKKG

WDTRKYGGFGSPTVAYSILVVAKVEKGKAKKLKSVKVLVGITIMEKGSYEKDPIGFLEAKGYKDIKKELIFKLPKYSLFEL

ENGRRRMLASATELQKANELVLPQHLVRLLYYTQNISATTGSNNLGYIEQHREEFKEIFEKIIDFSEKYILKNKVNSNLKS

SFDEQFAVSDSILLSNSFVSLLKYTSFGASGGFTFLDLDVKQGRLRYQTVTEVLDATLIYQSITGLYETRTDLSQLGGD

(SEQ ID NO: 1361442)

The prime editors described herein may include any of the above Cas9 ortholog sequences, or any variants thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.

The napDNAbp may include any suitable homologs and/or orthologs or naturally occurring enzymes, such as Cas9. Cas9 homologs and/or orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Preferably, the Cas moiety is configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a nickase. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 3. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the Cas9 orthologs in the above tables.
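By way of illustration only, the sequence identity thresholds recited above (e.g., at least 80% or at least 95%) can be checked with a short calculation. The following Python sketch is not part of the disclosed methods. It assumes the two amino acid sequences have already been aligned to equal length (with "-" marking gaps) and, as one of several possible conventions, divides the number of identical positions by the total number of alignment columns. The function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: both strings are already aligned (equal length, '-' = gap).
    Assumption: the denominator is the total number of alignment columns;
    other conventions (e.g., the shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the percent identity is at least `threshold` (e.g., 80.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: the first 20 residues of the SpCas9 sequence and the same
# fragment carrying a D10A substitution (one mismatch in 20 columns).
reference = "MDKKYSIGLDIGTNSVGWAV"
variant = "MDKKYSIGLAIGTNSVGWAV"
print(percent_identity(reference, variant))       # 95.0
print(meets_threshold(reference, variant, 99.0))  # False
```

For full-length proteins, the alignment itself would typically be produced by a dedicated alignment tool before a calculation of this kind is applied.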

(iii) Dead Cas9 Variant

In some embodiments, the prime editors described herein may include a dead Cas9, e.g., dead SpCas9, which has no nuclease activity due to one or more mutations that inactivate both nuclease domains of Cas9, namely the RuvC domain (which cleaves the non-protospacer DNA strand) and HNH domain (which cleaves the protospacer DNA strand). The nuclease inactivation may be due to one or more mutations that result in one or more substitutions and/or deletions in the amino acid sequence of the encoded protein, or any variants thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.

As used herein, the term “dCas9” refers to a nuclease-inactive Cas9 or nuclease-dead Cas9, or a functional fragment thereof, and embraces any naturally occurring dCas9 from any organism, any naturally-occurring dCas9 equivalent or functional fragment thereof, any dCas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a dCas9, naturally-occurring or engineered. The term dCas9 is not meant to be particularly limiting and may be referred to as a “dCas9 or equivalent.” Exemplary dCas9 proteins and methods for making dCas9 proteins are further described herein and/or are described in the art and are incorporated herein by reference.

In other embodiments, dCas9 corresponds to, or comprises in part or in whole, a Cas9 amino acid sequence having one or more mutations that inactivate the Cas9 nuclease activity. In other embodiments, Cas9 variants having mutations other than D10A and H840A are provided which may result in the full or partial inactivation of the endogenous Cas9 nuclease activity (e.g., dCas9 or nCas9, respectively). Such mutations, by way of example, include other amino acid substitutions at D10 and H840, or other substitutions within the nuclease domains of Cas9 (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain) with reference to a wild type sequence such as Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1 (SEQ ID NO: 1361424)). In some embodiments, variants or homologues of Cas9 (e.g., variants of Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1 (SEQ ID NO: 1361424))) are provided which are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to NCBI Reference Sequence: NC_017053.1 (SEQ ID NO: 1361424). In some embodiments, variants of dCas9 (e.g., variants of NCBI Reference Sequence: NC_017053.1 (SEQ ID NO: 1361424)) are provided having amino acid sequences which are shorter or longer than NC_017053.1 (SEQ ID NO: 1361424) by about 5 amino acids, by about 10 amino acids, by about 15 amino acids, by about 20 amino acids, by about 25 amino acids, by about 30 amino acids, by about 40 amino acids, by about 50 amino acids, by about 75 amino acids, by about 100 amino acids or more.

In one embodiment, the dead Cas9 may be based on the canonical SpCas9 sequence of Q99ZW2 and may have the following sequence, which comprises D10A and H840A substitutions (underlined and bolded), or a variant of SEQ ID NO: 1361444 having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto:

Description
Sequence
SEQ ID NO:

dead Cas9 or
MDKKYSIGLXIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

dCas9
EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361443

Streptococcus

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKERGHFLIEGDLNPD

pyogenes

NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

Q99ZW2 Cas9
LFGNLIALSLGLTPNEKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

with D10X and
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

H840X
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQ

Where “X” is
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

any amino
ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

acid
EGMRKPAFLSGEQKKAIVDLLEKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDXIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

dead Cas9 or
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

dCas9
EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361444

Streptococcus

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

pyogenes

NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

Q99ZW2 Cas9
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

with D10A and
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

H840A
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQ

IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDERKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

(iv) Cas9 Nickase Variant

In one embodiment, the prime editors described herein comprise a Cas9 nickase. The term “Cas9 nickase” or “nCas9” refers to a variant of Cas9 which is capable of introducing a single-strand break in a double-stranded DNA molecule target. In some embodiments, the Cas9 nickase comprises only a single functioning nuclease domain. The wild type Cas9 (e.g., the canonical SpCas9) comprises two separate nuclease domains, namely, the RuvC domain (which cleaves the non-protospacer DNA strand) and HNH domain (which cleaves the protospacer DNA strand). In one embodiment, the Cas9 nickase comprises a mutation in the RuvC domain which inactivates the RuvC nuclease activity. For example, mutations in aspartate (D) 10, histidine (H) 983, aspartate (D) 986, or glutamate (E) 762, have been reported to cause loss of function of the RuvC nuclease domain and to create a functional Cas9 nickase (e.g., Nishimasu et al., “Crystal structure of Cas9 in complex with guide RNA and target DNA,” Cell 156(5), 935-949, which is incorporated herein by reference). Thus, nickase mutations in the RuvC domain could include D10X, H983X, D986X, or E762X, wherein X is any amino acid other than the wild type amino acid. In some embodiments, the nickase could be D10A, or H983A, or D986A, or E762A, or a combination thereof.
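By way of illustration only, the substitution notation used above (e.g., D10A, meaning that the aspartate at position 10 is replaced with alanine) can be applied to an amino acid sequence as in the following Python sketch. The sketch is not part of the disclosed methods. The function name `apply_substitution` is hypothetical, positions are assumed to be 1-based relative to the sequence supplied, and the function only accepts a single-letter wild type residue, a position, and a single-letter replacement residue.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution such as 'D10A' to an amino acid sequence.

    Assumptions: single-letter residue codes, 1-based positions, and the
    position numbering refers to the sequence passed in.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = (
        match.group(1),
        int(match.group(2)),
        match.group(3),
    )
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against numbering errors: the stated wild type residue must be
    # the residue actually present at that position.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


# The first 20 residues of the wild type SpCas9 sequence, with D10A applied.
wild_type_fragment = "MDKKYSIGLDIGTNSVGWAV"
nickase_fragment = apply_substitution(wild_type_fragment, "D10A")
print(nickase_fragment)  # MDKKYSIGLAIGTNSVGWAV
```

The wild type check is a deliberate design choice: it rejects a mutation whose stated residue does not match the sequence, which is a common symptom of numbering a position against a different reference sequence.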

In various embodiments, the Cas9 nickase can have a mutation in the RuvC nuclease domain and one of the following amino acid sequences, or a variant thereof having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.

Description
Sequence
SEQ ID NO:

Cas9 nickase
MDKKYSIGLXIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361445

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with D10X,
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPELKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLEKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDFLDNEENEDILEDIVLILTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361446

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKERGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with E762X,
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIXMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361447

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKERGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with H983X,
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDERKDFQFYKVREINNYHXAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361448

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with D986X,
LFGNLIALSLGLTPNEKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLEKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHXAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361449

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with D10A
LFGNLIALSLGLTPNEKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTEKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361450

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with E762A
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361451

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with H983A
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNAS

LGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHAAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361452

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with D986A
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHAAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

In another embodiment, the Cas9 nickase comprises a mutation in the HNH domain which inactivates the HNH nuclease activity. For example, mutations in histidine (H) 840 or asparagine (R) 863 have been reported as loss-of-function mutations of the HNH nuclease domain that create a functional Cas9 nickase (e.g., Nishimasu et al., “Crystal structure of Cas9 in complex with guide RNA and target DNA,” Cell 156(5), 935-949, which is incorporated herein by reference). Thus, nickase mutations in the HNH domain could include H840X and R863X, wherein X is any amino acid other than the wild type amino acid. In some embodiments, the nickase could be H840A or R863A, or a combination thereof.
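The substitution notation used above (e.g., H840A or H840X) can be illustrated with a minimal sketch. The helper name `apply_substitution` and the short toy sequence are assumptions made purely for illustration and are not taken from the disclosure; the sketch assumes 1-based residue numbering and checks that the stated wild type residue is actually present before substituting.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution written as <wild type><position><new>, e.g. 'H840A'.

    Positions are 1-based. 'X' is accepted as the new residue and is written
    into the sequence literally as a placeholder for any alternate amino acid.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wild_type = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)
    index = position - 1
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# Hypothetical 8-residue toy sequence, not a real Cas9 fragment.
toy = "MDKHIVPR"
print(apply_substitution(toy, "H4A"))
```

Checking the wild type residue first guards against off-by-one numbering errors, which matter when a mutation name is tied to a specific reference sequence.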

In various embodiments, the Cas9 nickase can have a mutation in the HNH nuclease domain and have one of the following amino acid sequences, or a variant thereof, having an amino acid sequence that has at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto.

Description
Sequence
SEQ ID NO:

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361453

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with H840X,
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDXIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361454

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with H840A,
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361455

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with R863X,
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNXGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

Cas9 nickase
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA
SEQ ID NO:

Streptococcus

EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPI
1361456

pyogenes

FGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPD

Q99ZW2 Cas9
NSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG

with R863A,
LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKN

wherein X is
LSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

any alternate
SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQ

amino acid
IHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEET

ITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVT

EGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENAS

LGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMK

QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI

QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMAREN

QTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQ

ELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNAGKSDNVPSEEVVKKMKNYWRQ

LLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK

LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPL

IETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA

SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHR

DKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

(v) Other Cas9 Variants

Besides dead Cas9 and Cas9 nickase variants, the Cas9 proteins used herein may also include other “Cas9 variants” that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference Cas9 protein, including any wild type Cas9, or mutant Cas9 (e.g., a dead Cas9 or Cas9 nickase), or fragment Cas9, or circular permutant Cas9, or other variant of Cas9 disclosed herein or known in the art. In some embodiments, a Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more amino acid changes compared to a reference Cas9. In some embodiments, the Cas9 variant comprises a fragment of a reference Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SEQ ID NO: 1361421).
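The percent-identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and defines identity as matching columns divided by alignment length; other conventions for the denominator exist, and the disclosure does not prescribe one here. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: identity = identical, non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True when the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Two ten-residue stretches differing at one position (9 of 10 columns match).
reference = "MDKKYSIGLD"
variant = "MDKKYSIGLA"
print(percent_identity(reference, variant))
print(meets_identity_threshold(reference, variant, 95.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the counting step.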

In some embodiments, the disclosure also may utilize Cas9 fragments which retain their functionality and which are fragments of any herein disclosed Cas9 protein. In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length.

In various embodiments, the prime editors disclosed herein may comprise one of the Cas9 variants described as follows, or a Cas9 variant thereof that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference Cas9 variant.

(vi) Small-Sized Cas9 Variants

In some embodiments, the prime editors contemplated herein can include a Cas9 protein that is of smaller molecular weight than the canonical SpCas9 sequence. In some embodiments, the smaller-sized Cas9 variants may facilitate delivery to cells, e.g., by an expression vector, nanoparticle, or other means of delivery. In certain embodiments, the smaller-sized Cas9 variants can include enzymes categorized as type II enzymes of the Class 2 CRISPR-Cas systems. In some embodiments, the smaller-sized Cas9 variants can include enzymes categorized as type V enzymes of the Class 2 CRISPR-Cas systems. In other embodiments, the smaller-sized Cas9 variants can include enzymes categorized as type VI enzymes of the Class 2 CRISPR-Cas systems.

The canonical SpCas9 protein is 1368 amino acids in length and has a predicted molecular weight of 158 kilodaltons. The term “small-sized Cas9 variant”, as used herein, refers to any Cas9 variant (naturally occurring, engineered, or otherwise) that is less than 1300 amino acids, or less than 1290 amino acids, or less than 1280 amino acids, or less than 1270 amino acids, or less than 1260 amino acids, or less than 1250 amino acids, or less than 1240 amino acids, or less than 1230 amino acids, or less than 1220 amino acids, or less than 1210 amino acids, or less than 1200 amino acids, or less than 1190 amino acids, or less than 1180 amino acids, or less than 1170 amino acids, or less than 1160 amino acids, or less than 1150 amino acids, or less than 1140 amino acids, or less than 1130 amino acids, or less than 1120 amino acids, or less than 1110 amino acids, or less than 1100 amino acids, or less than 1050 amino acids, or less than 1000 amino acids, or less than 950 amino acids, or less than 900 amino acids, or less than 850 amino acids, or less than 800 amino acids, or less than 750 amino acids, or less than 700 amino acids, or less than 650 amino acids, or less than 600 amino acids, or less than 550 amino acids, or less than 500 amino acids in length, but larger than about 400 amino acids, and that retains the required functions of the Cas9 protein. The Cas9 variants can include those categorized as type II, type V, or type VI enzymes of the Class 2 CRISPR-Cas system.
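The length criterion in this definition can be sketched as a simple range check. The function name and the treatment of "about 400" as a strict lower bound are assumptions for illustration; the lengths used in the example are those stated in this section for SpCas9 and for the small-sized proteins tabulated in it. The functional requirement (retaining the required Cas9 functions) is not something a length check can assess.

```python
def is_small_sized(length_aa: int, upper: int = 1300, lower: int = 400) -> bool:
    """True when a protein length falls in the small-sized window.

    `upper` may be set to any of the recited cutoffs (1300, 1290, ..., 500).
    `lower` approximates the "about 400 amino acids" floor (an assumption).
    """
    return lower < length_aa < upper


# Lengths in amino acids as stated in this section.
lengths = {
    "SpCas9": 1368,
    "SaCas9": 1053,
    "NmeCas9": 1083,
    "CjCas9": 984,
    "GeoCas9": 1087,
    "LbaCas12a": 1228,
    "BhCas12b": 1108,
}

small = sorted(name for name, n in lengths.items() if is_small_sized(n))
print(small)
```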

In various embodiments, the prime editors disclosed herein may comprise one of the small-sized Cas9 variants described as follows, or a Cas9 variant thereof that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference small-sized Cas9 protein.

Description
Sequence
SEQ ID NO:

SaCas9
MGKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKR
SEQ ID NO:

Staphylococcus

RRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRG
1361457

aureus

VHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDY
same as

1053 AA
VKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMG
1361431

123 kDa
HCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKP

TLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAK

ILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWATND

NQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLP

NDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQ

EGKCLYSLEAIPLEDLINNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPF

QYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVD

TRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALI

IANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDE

KDYKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPE

KLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYG

NKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVN

SKCYEEAKKLKKISNQAEFIASFYKNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYR

EYLENMNDKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKK

NmeCas9
MAAFKPNSINYILGLDIGIASVGWAMVEIDEEENPIRLIDLGVRVFERAEVPKTGDSLA
SEQ ID NO:

N.

MARRLARSVRRLTRRRAHRLLRTRRLLKREGVLQAANFDENGLIKSLPNTPWQLRAAAL
1361458

meningitidis

DRKLTPLEWSAVLLHLIKHRGYLSQRKNEGETADKELGALLKGVAGNAHALQTGDFRTP

1083 AA
AELALNKFEKESGHIRNQRSDYSHTFSRKDLQAELILLFEKQKEFGNPHVSGGLKEGIE

124.5 kDa
TLLMTQRPALSGDAVQKMLGHCTFEPAEPKAAKNTYTAERFIWLTKLNNLRILEQGSER

PLTDTERATLMDEPYRKSKLTYAQARKLLGLEDTAFFKGLRYGKDNAEASTLMEMKAYH

AISRALEKEGLKDKKSPLNLSPELQDEIGTAFSLFKTDEDITGRLKDRIQPEILEALLK

HISEDKFVQISLKALRRIVPLMEQGKRYDEACAEIYGDHYGKKNTEEKIYLPPIPADEI

RNPVVLRALSQARKVINGVVRRYGSPARIHIETAREVGKSFKDRKEIEKRQEENRKDRE

KAAAKFREYFPNFVGEPKSKDILKLRLYEQQHGKCLYSGKEINLGRLNEKGYVEIDAAL

PFSRTWDDSENNKVLVLGSENQNKGNQTPYEYFNGKDNSREWQEFKARVETSRFPRSKK

QRILLQKFDEDGFKERNLNDTRYVNRFLCQFVADRMRLTGKGKKRVFASNGQITNLLRG

FWGLRKVRAENDRHHALDAVVVACSTVAMQQKITREVRYKEMNAFDGKTIDKETGEVLH

QKTHFPQPWEFFAQEVMIRVFGKPDGKPEFEEADTLEKLRTLLAEKLSSRPEAVHEYVT

PLFVSRAPNRKMSGQGHMETVKSAKRLDEGVSVLRVPLTQLKLKDLEKMVNREREPKLY

EALKARLEAHKDDPAKAFAEPFYKYDKAGNRTQQVKAVRVEQVQKTGVWVRNHNGIADN

ATMVRVDVFEKGDKYYLVPIYSWQVAKGILPDRAVVQGKDEEDWQLIDDSFNFKFSLHP

NDLVEVITKKARMFGYFASCHRGTGNINIRIHDLDHKIGKNGILEGIGVKTALSFQKYQ

IDELGKEIRPCRLKKRPPVR

CjCas9
MARILAFDIGISSIGWAFSENDELKDCGVRIFTKVENPKTGESLALPRRLARSARKRLA
SEQ ID NO:

C. jejuni

RRKARLNHLKHLIANEFKLNYEDYQSFDESLAKAYKGSLISPYELRFRALNELLSKQDE
1361459

984 AA
ARVILHIAKRRGYDDIKNSDDKEKGAILKAIKQNEEKLANYQSVGEYLYKEYFQKFKEN

114.9 kDa
SKEFTNVRNKKESYERCIAQSFLKDELKLIFKKQREFGFSFSKKFEEEVLSVAFYKRAL

KDFSHLVGNCSFFTDEKRAPKNSPLAFMFVALTRIINLLNNLKNTEGILYTKDDLNALL

NEVLKNGTLTYKQTKKLLGLSDDYEFKGEKGTYFIEFKKYKEFIKALGEHNLSQDDLNE

IAKDITLIKDEIKLKKALAKYDLNQNQIDSLSKLEFKDHLNISFKALKLVTPLMLEGKK

YDEACNELNLKVAINEDKKDELPAFNETYYKDEVTNPVVLRAIKEYRKVLNALLKKYGK

VHKINIELAREVGKNHSQRAKIEKEQNENYKAKKDAELECEKLGLKINSKNILKLRLFK

EQKEFCAYSGEKIKISDLQDEKMLEIDHIYPYSRSFDDSYMNKVLVFTKQNQEKLNQTP

FEAFGNDSAKWQKIEVLAKNLPTKKQKRILDKNYKDKEQKNFKDRNLNDTRYIARLVLN

YTKDYLDFLPLSDDENTKLNDTQKGSKVHVEAKSGMLTSALRHTWGFSAKDRNNHLHHA

IDAVIIAYANNSIVKAFSDFKKEQESNSAELYAKKISELDYKNKRKFFEPFSGFRQKVL

DKIDEIFVSKPERKKPSGALHEETFRKEEEFYQSYGGKEGVLKALELGKIRKVNGKIVK

NGDMFRVDIFKHKKINKFYAVPIYTMDFALKVLPNKAVARSKKGEIKDWILMDENYEFC

FSLYKDSLILIQTKDMQEPEFVYYNAFTSSTVSLIVSKHDNKFETLSKNQKILFKNANE

KEVIAKSIGIQNLKVFEKYIVSALGEVTKAEFRQREDEKK

GeoCas9
MRYKIGLDIGITSVGWAVMNLDIPRIEDLGVRIFDRAENPQTGESLALPRRLARSARRR
SEQ ID NO:

G. stearothermophilus
LRRRKHRLERIRRLVIREGILTKEELDKLFEEKHEIDVWQLRVEALDRKLNNDELARVL
1361460

LHLAKRRGFKSNRKSERSNKENSTMLKHIEENRAILSSYRTVGEMIVKDPKFALHKRNK

1087 AA
GENYTNTIARDDLEREIRLIFSKQREFGNMSCTEEFENEYITIWASQRPVASKDDIEKK

127 kDa
VGFCTFEPKEKRAPKATYTFQSFIAWEHINKLRLISPSGARGLTDEERRLLYEQAFQKN

KITYHDIRTLLHLPDDTYFKGIVYDRGESRKQNENIRFLELDAYHQIRKAVDKVYGKGK

SSSFLPIDEDTFGYALTLFKDDADIHSYLRNEYEQNGKRMPNLANKVYDNELIEELLNL

SFTKFGHLSLKALRSILPYMEQGEVYSSACERAGYTFTGPKKKQKTMLLPNIPPIANPV

VMRALTQARKVVNAIIKKYGSPVSIHIELARDLSQTFDERRKTKKEQDENRKKNETAIR

QLMEYGLTLNPTGHDIVKFKLWSEQNGRCAYSLQPIEIERLLEPGYVEVDHVIPYSRSL

DDSYTNKVLVLTRENREKGNRIPAEYLGVGTERWQQFETFVLTNKQFSKKKRDRLLRLH

YDENEETEFKNRNLNDTRYISRFFANFIREHLKFAESDDKQKVYTVNGRVTAHLRSRWE

FNKNREESDLHHAVDAVIVACTTPSDIAKVTAFYQRREQNKELAKKTEPHFPQPWPHFA

DELRARLSKHPKESIKALNLGNYDDQKLESLQPVFVSRMPKRSVTGAAHQETERRYVGI

DERSGKIQTVVKTKLSEIKLDASGHFPMYGKESDPRTYEAIRQRLLEHNNDPKKAFQEP

LYKPKKNGEPGPVIRTVKIIDTKNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPVY

TMDIMKGILPNKAIEPNKPYSEWKEMTEDYTFRFSLYPNDLIRIELPREKTVKTAAGEE

INVKDVFVYYKTIDSANGGLELISHDHRFSLRGVGSRTLKRFEKYQVDVLGNIYKVRGE

KRVGLASSAHSKPGKTIRPLQSTRD

LbaCas12a
MSKLEKFTNCYSLSKTLRFKAIPVGKTQENIDNKRLLVEDEKRAEDYKGVKKLLDRYYL
SEQ ID NO:

L. bacterium

SFINDVLHSIKLKNLNNYISLERKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSL
1361461

1228 AA
FKKDIIETILPEFLDDKDEIALVNSENGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN

143.9 kDa
ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNFVLTQEGIDV

YNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLYKQVLSDRESLSFYGEGYT

SDEEVLEVERNTLNKNSEIFSSIKKLEKLFKNFDEYSSAGIFVKNGPAISTISKDIFGE

WNVIRDKWNAEYDDIHLKKKAVVTEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEK

LKEIIIQKVDEIYKVYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKA

FFGEGKETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQNPQFM

GGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGNYEKINYKLLPGPN

KMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMENLNDCHKLIDFFKDSISRYPKW

SNAYDFNFSETEKYKDIAGFYREVEEQGYKVSFESASKKEVDKLVEEGKLYMFQIYNKD

FSDKSHGTPNLHTMYFKLLFDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIAN

KNPDNPKKTTTLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNP

YVIGIDRGERNLLYIVVVDGKGNIVEQYSLNEIINNENGIRIKTDYHSLLDKKEKERFE

ARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALEDLNSGFKNSRVKVEKQVYQ

KFEKMLIDKLNYMVDKKSNPCATGGALKGYQITNKFESFKSMSTQNGFIFYIPAWLTSK

IDPSTGFVNLLKTKYTSIADSKKFISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKK

WKLYSYGNRIRIFRNPKKNNVEDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDK

AFYSSFMALMSLMLQMRNSITGRTDVDFLISPVKNSDGIFYDSRNYEAQENAILPKNAD

ANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVKH

BhCas12b
MATRSFILKIEPNEEVKKGLWKTHEVLNHGIAYYMNILKLIRQEAIYEHHEQDPKNPKK
SEQ ID NO:

B. hisashii

VSKAEIQAELWDFVLKMQKCNSFTHEVDKDEVFNILRELYEELVPSSVEKKGEANQLSN
1361462

1108 AA
KFLYPLVDPNSQSGKGTASSGRKPRWYNLKIAGDPSWEEEKKKWEEDKKKDPLAKILGK

130.4 kDa
LAEYGLIPLFIPYTDSNEPIVKEIKWMEKSRNQSVRRLDKDMFIQALERFLSWESWNLK

VKEEYEKVEKEYKTLEERIKEDIQALKALEQYEKERQEQLLRDTLNTNEYRLSKRGLRG

WREIIQKWLKMDENEPSEKYLEVEKDYQRKHPREAGDYSVYEFLSKKENHFIWRNHPEY

PYLYATFCEIDKKKKDAKQQATFTLADPINHPLWVRFEERSGSNLNKYRILTEQLHTEK

LKKKLTVQLDRLIYPTESGGWEEKGKVDIVLLPSRQFYNQIFLDIEEKGKHAFTYKDES

IKFPLKGTLGGARVQFDRDHLRRYPHKVESGNVGRIYENMTVNIEPTESPVSKSLKIHR

DDFPKVVNFKPKELTEWIKDSKGKKLKSGIESLEIGLRVMSIDLGQRQAAAASIFEVVD

QKPDIEGKLFFPIKGTELYAVHRASFNIKLPGETLVKSREVLRKAREDNLKLMNQKLNE

LRNVLHFQQFEDITEREKRVTKWISRQENSDVPLVYQDELIQIRELMYKPYKDWVAFLK

QLHKRLEVEIGKEVKHWRKSLSDGRKGLYGISLKNIDEIDRTRKELLRWSLRPTEPGEV

RRLEPGQRFAIDQLNHLNALKEDRLKKMANTIIMHALGYCYDVRKKKWQAKNPACQIIL

FEDLSNYNPYEERSRFENSKLMKWSRREIPRQVALQGEIYGLQVGEVGAQFSSRFHAKT

GSPGIRCSVVTKEKLQDNRFFKNLQREGRLTLDKIAVLKEGDLYPDKGGEKFISLSKDR

KCVTTHADINAAQNLQKRFWTRTHGFYKVYCKAYQVDGQTVYIPESKDQKQKIIEEFGE

GYFILKDGVYEWVNAGKLKIKKGSSKQSSSELVDSDILKDSFDLASELKGEKLMLYRDP

SGNVFPSDKWMAAGVFFGKLERILISKLTNQYSISTIEDDSSKQSM

(vii) Cas9 Equivalents

In some embodiments, the prime editors described herein can include any Cas9 equivalent. As used herein, the term “Cas9 equivalent” is a broad term that encompasses any napDNAbp protein that serves the same function as Cas9 in the present prime editors, even though its primary amino acid sequence and/or its three-dimensional structure may be different and/or unrelated from an evolutionary standpoint. Thus, while Cas9 equivalents include any Cas9 ortholog, homolog, mutant, or variant described or embraced herein that is evolutionarily related, the Cas9 equivalents also embrace proteins that may have evolved through convergent evolution processes to have the same or similar function as Cas9, but which do not necessarily have any similarity with regard to amino acid sequence and/or three-dimensional structure. The prime editors described herein embrace any Cas9 equivalent that would provide the same or similar function as Cas9, even if the Cas9 equivalent is based on a protein that arose through convergent evolution. For instance, if Cas9 refers to a type II enzyme of the CRISPR-Cas system, a Cas9 equivalent can refer to a type V or type VI enzyme of the CRISPR-Cas system.

For example, Cas12e (CasX) is a Cas9 equivalent that reportedly has the same function as Cas9 but which evolved through convergent evolution. Thus, the Cas12e (CasX) protein described in Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223, is contemplated to be used with the prime editors described herein. In addition, any variant or modification of Cas12e (CasX) is conceivable and within the scope of the present disclosure.

Cas9 is a bacterial enzyme that evolved in a wide variety of species. However, the Cas9 equivalents contemplated herein may also be obtained from archaea, which constitute a domain and kingdom of single-celled prokaryotic microbes different from bacteria.

In some embodiments, Cas9 equivalents may refer to Cas12e (CasX) or Cas12d (CasY), which have been described in, for example, Burstein et al., “New CRISPR-Cas systems from uncultivated microbes.” Cell Res. 2017 Feb. 21. doi: 10.1038/cr.2017.21, the entire contents of which are hereby incorporated by reference. Using genome-resolved metagenomics, a number of CRISPR-Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems were discovered, CRISPR-Cas12e and CRISPR-Cas12d, which are among the most compact systems yet discovered. In some embodiments, Cas9 refers to Cas12e, or a variant of Cas12e. In some embodiments, Cas9 refers to a Cas12d, or a variant of Cas12d. It should be appreciated that other RNA-guided DNA binding proteins may be used as a nucleic acid programmable DNA binding protein (napDNAbp), and are within the scope of this disclosure. Also see Liu et al., “CasX enzymes comprise a distinct family of RNA-guided genome editors,” Nature, 2019, Vol. 566: 218-223. Any of these Cas9 equivalents are contemplated.

In some embodiments, the Cas9 equivalent comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp is a naturally-occurring Cas12e (CasX) or Cas12d (CasY) protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas moiety or any Cas moiety provided herein.

In various embodiments, the nucleic acid programmable DNA binding proteins include, without limitation, Cas9 (e.g., dCas9 and nCas9), Cas12e (CasX), Cas12d (CasY), Cas12a (Cpf1), Cas12b1 (C2c1), Cas13a (C2c2), Cas12c (C2c3), and Argonaute. One example of a nucleic acid programmable DNA-binding protein that has a different PAM specificity than Cas9 is Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 (i.e., Cas12a (Cpf1)). Similar to Cas9, Cas12a (Cpf1) is also a Class 2 CRISPR effector, but it is a member of the type V subgroup of enzymes, rather than the type II subgroup. It has been shown that Cas12a (Cpf1) mediates robust DNA interference with features distinct from Cas9. Cas12a (Cpf1) is a single RNA-guided endonuclease lacking tracrRNA, and it utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN). Moreover, Cpf1 cleaves DNA via a staggered DNA double-stranded break. Out of 16 Cpf1-family proteins, two enzymes from Acidaminococcus and Lachnospiraceae have been shown to have efficient genome-editing activity in human cells. Cpf1 proteins are known in the art and have been described previously, for example in Yamano et al., “Crystal structure of Cpf1 in complex with guide RNA and target DNA.” Cell (165) 2016, p. 949-962; the entire contents of which are hereby incorporated by reference.
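The T-rich motifs named above (TTN, TTTN, and YTN) use IUPAC ambiguity codes, in which N denotes any base and Y denotes a pyrimidine (C or T). A minimal sketch of checking a candidate site against such a motif follows; the function name and the example sites are assumptions for illustration only, and the sketch does not model where the motif sits relative to the protospacer.

```python
# Subset of IUPAC nucleotide codes needed for the motifs named in the text.
IUPAC = {
    "A": "A",
    "C": "C",
    "G": "G",
    "T": "T",
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def matches_pam(site: str, motif: str) -> bool:
    """True when `site` (plain A/C/G/T) satisfies the IUPAC `motif` position by position."""
    site = site.upper()
    if len(site) != len(motif):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, motif))


# Hypothetical candidate sites checked against the motifs recited above.
print(matches_pam("TTA", "TTN"))    # any base allowed at the third position
print(matches_pam("TTTG", "TTTN"))
print(matches_pam("CTG", "YTN"))    # C is a pyrimidine
print(matches_pam("ATG", "YTN"))    # A is not a pyrimidine
```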

In still other embodiments, the Cas protein may include any CRISPR associated protein, including but not limited to, Cas12a, Cas12b1, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof, and preferably comprising a nickase mutation (e.g., a mutation corresponding to the D10A mutation of the wild type Cas9 polypeptide of SEQ ID NO: 1361421).

In various other embodiments, the napDNAbp can be any of the following proteins: a Cas9, a Cas12a (Cpf1), a Cas12e (CasX), a Cas12d (CasY), a Cas12b1 (C2c1), a Cas13a (C2c2), a Cas12c (C2c3), a GeoCas9, a CjCas9, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, a circularly permuted Cas9, or an Argonaute (Ago) domain, or a variant thereof.

Exemplary Cas9 equivalent protein sequences can include the following:

Description
Sequence

Cas12a
MTQFEGFTNLYQVSKTLRFELIPQGKTLKHIQEQGFIEEDKARNDHYKELKPIIDRIYKTYADQCLQLVQLDWE

(previously
NLSAAIDSYRKEKTEETRNALIEEQATYRNAIHDYFIGRTDNLTDAINKRHAEIYKGLFKAELFNGKVLKQLGT

known as
VTTTEHENALLRSFDKFTTYFSGFYENRKNVFSAEDISTAIPHRIVQDNFPKFKENCHIFTRLITAVPSLREHF

Cpf1)
ENVKKAIGIFVSTSIEEVFSFPFYNQLLTQTQIDLYNQLLGGISREAGTEKIKGLNEVLNLAIQKNDETAHIIA

Acidaminococcus sp.
SLPHRFIPLFKQILSDRNTLSFILEEFKSDEEVIQSFCKYKTLLRNENVLETAEALFNELNSIDLTHIFISHKK

LETISSALCDHWDTLRNALYERRISELTGKITKSAKEKVQRSLKHEDINLQEIISAAGKELSEAFKQKTSEILS

(strain
HAHAALDQPLPTTLKKQEEKEILKSQLDSLLGLYHLLDWFAVDESNEVDPEFSARLTGIKLEMEPSLSFYNKAR

BV3L6)
NYATKKPYSVEKFKLNFQMPTLASGWDVNKEKNNGAILFVKNGLYYLGIMPKQKGRYKALSFEPTEKTSEGEDK

UniProtKB
MYYDYFPDAAKMIPKCSTQLKAVTAHFQTHTTPILLSNNFIEPLEITKEIYDLNNPEKEPKKFQTAYAKKTGDQ

U2UMQ6
KGYREALCKWIDFTRDELSKYTKTTSIDLSSLRPSSQYKDLGEYYAELNPLLYHISFQRIAEKEIMDAVETGKL

YLFQIYNKDFAKGHHGKPNLHTLYWTGLFSPENLAKTSIKLNGQAELFYRPKSRMKRMAHRLGEKMLNKKLKDQ

KTPIPDTLYQELYDYVNHRLSHDLSDEARALLPNVITKEVSHEIIKDRRFTSDKFFFHVPITLNYQAANSPSKF

NQRVNAYLKEHPETPIIGIDRGERNLIYITVIDSTGKILEQRSLNTIQQFDYQKKLDNREKERVAARQAWSVVG

TIKDLKQGYLSQVIHEIVDLMIHYQAVVVLENLNFGFKSKRTGIAEKAVYQQFEKMLIDKLNCLVLKDYPAEKV

GGVLNPYQLTDQFTSFAKMGTQSGELFYVPAPYTSKIDPLTGFVDPFVWKTIKNHESRKHFLEGFDFLHYDVKT

GDFILHFKMNRNLSFQRGLPGFMPAWDIVFEKNETQFDAKGTPFIAGKRIVPVIENHRFTGRYRDLYPANELIA

LLEEKGIVERDGSNILPKLLENDDSHAIDTMVALIRSVLQMRNSNAATGEDYINSPVRDLNGVCFDSRFQNPEW

PMDADANGAYHIALKGQLLLNHLKESKDLKLQNGISNQDWLAYIQELRN (SEQ ID NO: 1361463)

Cas12a (previously known as Cpf1), Lachnospiraceae bacterium GAM79, Ref Seq. WP_119623382.1

MNYKTGLEDFIGKESLSKTLRNALIPTESTKIHMEEMGVIRDDELRAEKQQELKEIMDDYYRTFIEEKLGQIQG

IQWNSLFQKMEETMEDISVRKDLDKIQNEKRKEICCYFTSDKRFKDLFNAKLITDILPNFIKDNKEYTEEEKAE

KEQTRVLFQRFATAFTNYFNQRRNNFSEDNISTAISFRIVNENSEIHLQNMRAFQRIEQQYPEEVCGMEEEYKD

MLQEWQMKHIYSVDFYDRELTQPGIEYYNGICGKINEHMNQFCQKNRINKNDFRMKKLAKQILCKKSSYYEIPF

RFESDQEVYDALNEFIKTMKKKEIIRRCVHLGQECDDYDLGKIYISSNKYEQISNALYGSWDTIRKCIKEEYMD

ALPGKGEKKEEKAEAAAKKEEYRSIADIDKIISLYGSEMDRTISAKKCITEICDMAGQISIDPLVCNSDIKLLQ

NKEKTTEIKTILDSFLHVYQWGQTFIVSDIIEKDSYFYSELEDVLEDFEGITTLYNHVRSYVTQKPYSTVKFKL

HFGSPTLANGWSQSKEYDNNAILLMRDQKFYLGIFNVRNKPDKQIIKGHEKEEKGDYKKMIYNLLPGPSKMLPK

VFITSRSGQETYKPSKHILDGYNEKRHIKSSPKFDLGYCWDLIDYYKECIHKHPDWKNYDFHFSDTKDYEDISG

FYREVEMQGYQIKWTYISADEIQKLDEKGQIFLFQIYNKDFSVHSTGKDNLHTMYLKNLFSEENLKDIVLKLNG

EAELFFRKASIKTPIVHKKGSVLVNRSYTQTVGNKEIRVSIPEEYYTEIYNYLNHIGKGKLSSEAQRYLDEGKI

KSFTATKDIVKNYRYCCDHYFLHLPITINFKAKSDVAVNERTLAYIAKKEDIHIIGIDRGERNLLYISVVDVHG

NIREQRSFNIVNGYDYQQKLKDREKSRDAARKNWEEIEKIKELKEGYLSMVIHYIAQLVVKYNAVVAMEDLNYG

FKTGRFKVERQVYQKFETMLIEKLHYLVEKDREVCEEGGVLRGYQLTYIPESLKKVGKQCGFIFYVPAGYTSKI

DPTTGFVNLESFKNLTNRESRQDFVGKFDEIRYDRDKKMFEFSFDYNNYIKKGTILASTKWKVYTNGTRLKRIV

VNGKYTSQSMEVELTDAMEKMLQRAGIEYHDGKDLKGQIVEKGIEAEIIDIFRLTVQMRNSRSESEDREYDRLI

SPVLNDKGEFFDTATADKTLPQDADANGAYCIALKGLYEVKQIKENWKENEQFPRNKLVQDNKTWFDFMQKKRY

L (SEQ ID NO: 1361464)

Cas12a (previously known as Cpf1), Prevotella copri, Ref Seq. WP_119227726.1

MAKNFEDFKRLYSLSKTLRFEAKPIGATLDNIVKSGLLDEDEHRAASYVKVKKLIDEYHKVFIDRVLDDGCLPL

ENKGNNNSLAEYYESYVSRAQDEDAKKKFKEIQQNLRSVIAKKLTEDKAYANLFGNKLIESYKDKEDKKKIIDS

DLIQFINTAESTQLDSMSQDEAKELVKEFWGFVTYFYGFFDNRKNMYTAEEKSTGIAYRLVNENLPKFIDNIEA

FNRAITRPEIQENMGVLYSDFSEYLNVESIQEMFQLDYYNMLLTQKQIDVYNAIIGGKTDDEHDVKIKGINEYI

NLYNQQHKDDKLPKLKALFKQILSDRNAISWLPEEFNSDQEVLNAIKDCYERLAENVLGDKVLKSLLGSLADYS

LDGIFIRNDLQLTDISQKMFGNWGVIQNAIMQNIKRVAPARKHKESEEDYEKRIAGIFKKADSFSISYINDCLN

EADPNNAYFVENYFATFGAVNTPTMQRENLFALVQNAYTEVAALLHSDYPTVKHLAQDKANVSKIKALLDAIKS

LQHFVKPLLGKGDESDKDERFYGELASLWAELDTVTPLYNMIRNYMTRKPYSQKKIKLNFENPQLLGGWDANKE

VLNSKAFNKPLTITKEVEDLNNVLYGKYKKFQKGYLTATGDNVGYTHAVNVWIKFCMDELNSYDSTCIYDFSSL

KDYATIILRRNGLYYLAIMDKDSRKLLGKAMPSDGECYEKMVYKFFKDVTTMIPKCSTQLKDVQAYFKVNTDDY

KPESYLSLDAFYQDANLLLYKLSFARASVSYINQLVEEGKMYLFQIYNKDFSEYSKGTPNMHTLYWKALFDERN

LADVVYKLNGQAEMFYRKKSIENTHPTHPANHPILNKNKDNKKKESLFDYDLIKDRRYTVDKFMFHVPITMNEK

SVGSENINQDVKAYLRHADDMHIIGIDRGERHLLYLVVIDLQGNIKEQYSLNEIVNEYNGNTYHTNYHDLLDVR

EEERLKARQSWQTIENIKELKEGYLSQVIHKITQLMVRYHAIVVLEDLSKGFMRSRQKVEKQVYQKFEKMLIDK

LNYLVDKKTDVSTPGGLLNAYQLTCKSDSSQKLGKQSGFLFYIPAWNTSKIDPVTGFVNLLDTHSLNSKEKIKA

FFSKFDAIRYNKDKKWFEFNLDYDKFGKKAEDTRTKWTLCTRGMRIDTFRNKEKNSQWDNQEVDLTTEMKSLLE

HYYIDIHGNLKDAISAQTDKAFFTGLLHILKLTLQMRNSITGTETDYLVSPVADENGIFYDSRSCGNQLPENAD

ANGAYNIARKGLMLIEQIKNAEDLNNVKFDISNKAWINFAQQKPYKNG (SEQ ID NO: 1361465)

Cas12a (previously known as Cpf1), Eubacterium rectale, Ref Seq. WP_119223642.1

MFSAKLISDILPEFVIHNNNYSASEKEEKTQVIKLESRFATSFKDYFKNRANCESANDISSSSCHRIVNDNAEI

FFSNALVYRRIVKNLSNDDINKISGDMKDSLKEMSLEEIYSYEKYGEFITQEGISFYNDICGKVNLFMNLYCQK

NKENKNLYKLRKLHKQILCIADTSYEVPYKFESDEEVYQSVNGELDNISSKHIVERLRKIGENYNGYNLDKIYI

VSKFYESVSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEINELVSNYKLCPDDNIKA

ETYIHEISHILNNFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHWCSVFMTEELVDKDNNFYAELEEIY

DEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLADGWSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKI

IEGNTSENKGDYKKMIYNLLPGPNKMIPKVFLSSKTGVETYKPSAYILEGYKQNKHLKSSKDEDITFCHDLIDY

FKNCIAIHPEWKNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSKKS

SGNDNLHTMYLKNLFSEENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGNIQIVRKT

IPENIYQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIVKDYRYTYDKYFLHMPITINFKANKTSFINDRI

LQYIAKEKDLHVIGIDRGERNLIYVSVIDTCGNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKE

IKEGYLSLVIHEISKMVIKYNAIIAMEDLSYGFKKGREKVERQVYQKFETMLINKLNYLVFKDISITENGGLLK

GYQLTYIPDKLKNVGHQCGCIFYVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRYDSDKNLFCFT

FDYNNFITQNTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDMEKTLEMTDINWRDGHDLRQDIIDYE

IVQHIFEIFKLTVQMRNSLSELEDRDYDRLISPVLNENNIFYDSAKAGDALPKDADANGAYCIALKGLYEIKQI

TENWKEDGKFSRDKLKISNKDWFDFIQNKRYL (SEQ ID NO: 1361466)

Cas12a (previously known as Cpf1), Clostridium sp. AF34-10BH, Ref Seq. WP_118538418.1

MNYKTGLEDFIGKESLSKTLRNALIPTESTKIHMEEMGVIRDDELRAEKQQELKEIMDDYYRAFIEEKLGQIQG

IQWNSLFQKMEETMEDISVRKDLDKIQNEKRKEICCYFTSDKRFKDLFNAKLITDILPNFIKDNKEYTEE?K?E

KEQTRVLFQRFATAFTNYFNQRRNNFSEDNISTAISFRIVNENSEIHLQNMRAFQRIEQQYPEEVCGMEEEYKD

MLQEWQMKHIYLVDFYDRVLTQPGIEYYNGICGKINEHMNQFCQKNRINKNDFRMKKLAKQILCKKSSYYEIPF

RFESDQEVYDALNEFIKTMKEKEIICRCVHLGQKCDDYDLGKIYISSNKYEQISNALYGSWDTIRKCIKEEYMD

ALPGKGEKKEEKAEAAAKKEEYRSIADIDKIISLYGSEMDRTISAKKCITEICDMAGQISTDPLVCNSDIKLLQ

NKEKTTEIKTILDSFLHVYQWGQTFIVSDIIEKDSYFYSELEDVLEDFEGITTLYNHVRSYVTQKPYSTVKFKL

HFGSPTLANGWSQSKEYDNNAILLMRDQKFYLGIFNVRNKPDKQIIKGHEKEEKGDYKKMIYNLLPGPSKMLPK

VFITSRSGQETYKPSKHILDGYNEKRHIKSSPKFDLGYCWDLIDYYKECIHKHPDWKNYDFHFSDTKDYEDISG

FYREVEMQGYQIKWTYISADEIQKLDEKGQIFLFQIYNKDFSVHSTGKDNLHTMYLKNLFSEENLKDIVLKLNG

EAELFFRKASIKTPVVHKKGSVLVNRSYTQTVGDKEIRVSIPEEYYTEIYNYLNHIGRGKLSTEAQRYLEERKI

KSFTATKDIVKNYRYCCDHYFLHLPITINFKAKSDIAVNERTLAYIAKKEDIHIIGIDRGERNLLYISVVDVHG

NIREQRSFNIVNGYDYQQKLKDREKSRDAARKNWEEIEKIKELKEGYLSMVIHYIAQLVVKYNAVVAMEDLNYG

FKTGRFKVERQVYQKFETMLIEKLHYLVEKDREVCEEGGVLRGYQLTYIPESLKKVGKQCGFIFYVPAGYTSKI

DPTTGFVNLFSFKNLTNRESRQDFVGKFDEIRYDRDKKMFEFSFDYNNYIKKGTMLASTKWKVYTNGTRLKRIV

VNGKYTSQSMEVELTDAMEKMLQRAGIEYHDGKDLKGQIVEKGIEAEIIDIFRLTVQMRNSRSESEDREYDRLI

SPVLNDKGEFFDTATADKTLPQDADANGAYCIALKGLYEVKQIKENWKENEQFPRNKLVQDNKTWFDFMQKKRY

L (SEQ ID NO: 1361467)

Cas12b, Bacillus hisashii, Ref Seq. WP_095142515.1

MATRSFILKIEPNEEVKKGLWKTHEVLNHGIAYYMNILKLIRQEAIYEHHEQDPKNPKKVSKAEIQAELWDFVL

KMQKCNSFTHEVDKDEVFNILRELYEELVPSSVEKKGEANQLSNKFLYPLVDPNSQSGKGTASSGRKPRWYNLK

IAGDPSWEEEKKKWEEDKKKDPLAKILGKLAEYGLIPLFIPYTDSNEPIVKEIKWMEKSRNQSVRRLDKDMFIQ

ALERFLSWESWNLKVKEEYEKVEKEYKTLEERIKEDIQALKALEQYEKERQEQLLRDTLNTNEYRLSKRGLRGW

REIIQKWLKMDENEPSEKYLEVFKDYQRKHPREAGDYSVYEFLSKKENHFIWRNHPEYPYLYATFCEIDKKKKD

AKQQATFTLADPINHPLWVRFEERSGSNLNKYRILTEQLHTEKLKKKLTVQLDRLIYPTESGGWEEKGKVDIVL

LPSRQFYNQIFLDIEEKGKHAFTYKDESIKFPLKGTLGGARVQFDRDHLRRYPHKVESGNVGRIYENMTVNIEP

TESPVSKSLKIHRDDFPKVVNFKPKELTEWIKDSKGKKLKSGIESLEIGLRVMSIDLGQRQAAAASIFEVVDQK

PDIEGKLFFPIKGTELYAVHRASFNIKLPGETLVKSREVLRKAREDNLKLMNQKLNFLRNVLHFQQFEDITERE

KRVTKWISRQENSDVPLVYQDELIQIRELMYKPYKDWVAFLKQLHKRLEVEIGKEVKHWRKSLSDGRKGLYGIS

LKNIDEIDRTRKFLLRWSLRPTEPGEVRRLEPGQRFAIDQLNHLNALKEDRLKKMANTIIMHALGYCYDVRKKK

WQAKNPACQIILFEDLSNYNPYEERSRFENSKLMKWSRREIPRQVALQGEIYGLQVGEVGAQFSSRFHAKTGSP

GIRCSVVTKEKLQDNRFFKNLQREGRLTLDKIAVLKEGDLYPDKGGEKFISLSKDRKCVTTHADINAAQNLQKR

FWTRTHGFYKVYCKAYQVDGQTVYIPESKDQKQKIIEEFGEGYFILKDGVYEWVNAGKLKIKKGSSKQSSSELV

DSDILKDSFDLASELKGEKLMLYRDPSGNVFPSDKWMAAGVFFGKLERILISKLTNQYSISTIEDDSSKQSM

(SEQ ID NO: 1361468)

Cas12b, Thermomonas hydrothermalis, Ref Seq. WP_072754838

MSEKTTQRAYTLRLNRASGECAVCQNNSCDCWHDALWATHKAVNRGAKAFGDWLLTLRGGLCHTLVEMEVPAKG

NNPPQRPTDQERRDRRVLLALSWLSVEDEHGAPKEFIVATGRDSADDRAKKVEEKLREILEKRDFQEHEIDAWL

QDCGPSLKAHIREDAVWVNRRALFDAAVERIKTLTWEEAWDFLEPFFGTQYFAGIGDGKDKDDAEGPARQGEKA

KDLVQKAGQWLSARFGIGTGADFMSMAEAYEKIAKWASQAQNGDNGKATIEKLACALRPSEPPTLDTVLKCISG

PGHKSATREYLKTLDKKSTVTQEDLNQLRKLADEDARNCRKKVGKKGKKPWADEVLKDVENSCELTYLQDNSPA

RHREFSVMLDHAARRVSMAHSWIKKAEQRRRQFESDAQKLKNLQERAPSAVEWLDRFCESRSMTTGANTGSGYR

IRKRAIEGWSYVVQAWAEASCDTEDKRIAAARKVQADPEIEKFGDIQLFEALAADEAICVWRDQEGTQNPSILI

DYVTGKTAEHNQKRFKVPAYRHPDELRHPVFCDFGNSRWSIQFAIHKEIRDRDKGAKQDTRQLQNRHGLKMRLW

NGRSMTDVNLHWSSKRLTADLALDQNPNPNPTEVTRADRLGRAASSAFDHVKIKNVENEKEWNGRLQAPRAELD

RIAKLEEQGKTEQAEKLRKRLRWYVSFSPCLSPSGPFIVYAGQHNIQPKRSGQYAPHAQANKGRARLAQLILSR

LPDLRILSVDLGHRFAAACAVWETLSSDAFRREIQGLNVLAGGSGEGDLFLHVEMTGDDGKRRTVVYRRIGPDQ

LLDNTPHPAPWARLDRQFLIKLQGEDEGVREASNEELWTVHKLEVEVGRTVPLIDRMVRSGFGKTEKQKERLKK

LRELGWISAMPNEPSAETDEKEGEIRSISRSVDELMSSALGTLRLALKRHGNRARIAFAMTADYKPMPGGQKYY

FHEAKEASKNDDETKRRDNQIEFLQDALSLWHDLESSPDWEDNEAKKLWQNHIATLPNYQTPEEISAELKRVER

NKKRKENRDKLRTAAKALAENDQLRQHLHDTWKERWESDDQQWKERLRSLKDWIFPRGKAEDNPSIRHVGGLSI

TRINTISGLYQILKAFKMRPEPDDLRKNIPQKGDDELENFNRRLLEARDRLREQRVKQLASRIIEAALGVGRIK

IPKNGKLPKRPRTTVDTPCHAVVIESLKTYRPDDLRTRRENRQLMQWSSAKVRKYLKEGCELYGLHFLEVPANY

TSRQCSRTGLPGIRCDDVPTGDFLKAPWWRRAINTAREKNGGDAKDRFLVDLYDHLNNLQSKGEALPATVRVPR

QGGNLFIAGAQLDDTNKERRAIQADLNAAANIGLRALLDPDWRGRWWYVPCKDGTSEPALDRIEGSTAFNDVRS

LPTGDNSSRRÅPREIENLWRDPSGDSLESGTWSPTRAYWDTVQSRVIELLRRHAGLPTS (SEQ ID NO: 1361469)

Cas12b, Laceyella sacchari, WP_132221894.1

MSIRSFKLKLKTKSGVNAEQLRRGLWRTHQLINDGIAYYMNWLVLLRQEDLFIRNKETNEIEKRSKEEIQAVLL

ERVHKQQQRNQWSGEVDEQTLLQALRQLYEEIVPSVIGKSGNASLKARFFLGPLVDPNNKTTKDVSKSGPTPKW

KKMKDAGDPNWVQEYEKYMAERQTLVRLEEMGLIPLFPMYTDEVGDIHWLPQASGYTRTWDRDMFQQAIERLLS

WESWNRRVRERRAQFEKKTHDFASRFSESDVQWMNKLREYEAQQEKSLEENAFAPNEPYALTKKALRGWERVYH

SWMRLDSAASEEAYWQEVATCQTAMRGEFGDPAIYQFLAQKENHDIWRGYPERVIDFAELNHLQRELRRAKEDA

TFTLPDSVDHPLWVRYEAPGGTNIHGYDLVQDTKRNLTLILDKFILPDENGSWHEVKKVPFSLAKSKQFHRQVW

LQEEQKQKKREVVFYDYSTNLPHLGTLAGAKLQWDRNFLNKRTQQQIEETGEIGKVFFNISVDVRPAVEVKNGR

LQNGLGKALTVLTHPDGTKIVTGWKAEQLEKWVGESGRVSSLGLDSLSEGLRVMSIDLGQRTSATVSVFEITKE

APDNPYKFFYQLEGTEMFAVHQRSFLLALPGENPPQKIKQMREIRWKERNRIKQQVDQLSAILRLHKKVNEDER

IQAIDKLLQKVASWQLNEEIATAWNQALSQLYSKAKENDLQWNQAIKNAHHQLEPVVGKQISLWRKDLSTGRQG

IAGLSLWSIEELEATKKLLTRWSKRSREPGVVKRIERFETFAKQIQHHINQVKENRLKQLANLIVMTALGYKYD

QEQKKWIEVYPACQVVLFENLRSYRFSFERSRRENKKLMEWSHRSIPKLVQMQGELFGLQVADVYAAYSSRYHG

RTGAPGIRCHALTEADLRNETNIIHELIEAGFIKEEHRPYLQQGDLVPWSGGELFATLQKPYDNPRILTLHADI

NAAQNIQKRFWHPSMWFRVNCESVMEGEIVTYVPKNKTVHKKQGKTFRFVKVEGSDVYEWAKWSKNRNKNTFSS

ITERKPPSSMILFRDPSGTFFKEQEWVEQKTFWGKVQSMIQAYMKKTIVQRMEE (SEQ ID NO: 1361470)

Cas12b, Desulfonatronum thiodismutans, WP_031386437

MVLGRKDDTAELRRALWTTHEHVNLAVAEVERVLLRCRGRSYWTLDRRGDPVHVPESQVAEDALAMAREAQRRN

GWPVVGEDEEILLALRYLYEQIVPSCLLDDLGKPLKGDAQKIGTNYAGPLFDSDTCRRDEGKDVACCGPFHEVA

GKYLGALPEWATPISKQEFDGKDASHLRFKATGGDDAFFRVSIEKANAWYEDPANQDALKNKAYNKDDWKKEKD

KGISSWAVKYIQKQLQLGQDPRTEVRRKLWLELGLLPLFIPVFDKTMVGNLWNRLAVRLALAHLLSWESWNHRA

VQDQALARAKRDELAALFLGMEDGFAGLREYELRRNESIKQHAFEPVDRPYVVSGRALRSWTRVREEWLRHGDT

QESRKNICNRLQDRLRGKFGDPDVFHWLAEDGQEALWKERDCVTSFSLLNDADGLLEKRKGYALMTFADARLHP

RWAMYEAPGGSNLRTYQIRKTENGLWADVVLLSPRNESAAVEEKTENVRLAPSGQLSNVSFDQIQKGSKMVGRC

RYQSANQQFEGLLGGAEILFDRKRIANEQHGATDLASKPGHVWFKLTLDVRPQAPQGWLDGKGRPALPPEAKHF

KTALSNKSKFADQVRPGLRVLSVDLGVRSFAACSVFELVRGGPDQGTYFPAADGRTVDDPEKLWAKHERSFKIT

LPGENPSRKEEIARRAAMEELRSLNGDIRRLKAILRLSVLQEDDPRTEHLRLFMEAIVDDPAKSALNAELFKGF

GDDRFRSTPDLWKQHCHFFHDKAEKVVAERFSRWRTETRPKSSSWQDWRERRGYAGGKSYWAVTYLEAVRGLIL

RWNMRGRTYGEVNRQDKKQFGTVASALLHHINQLKEDRIKTGADMIIQAARGFVPRKNGAGWVQVHEPCRLILF

EDLARYRFRTDRSRRENSRLMRWSHREIVNEVGMQGELYGLHVDTTEAGFSSRYLASSGAPGVRCRHLVEEDFH

DGLPGMHLVGELDWLLPKDKDRTANEARRLLGGMVRPGMLVPWDGGELFATLNAASQLHVIHADINAAQNLQRR

FWGRCGEAIRIVCNQLSVDGSTRYEMAKAPKARLLGALQQLKNGDAPFHLTSIPNSQKPENSYVMTPTNAGKKY

RAGPGEKSSGEEDELALDIVEQAEELAQGRKTFFRDPSGVFFAPDRWLPSEIYWSRIRRRIWQVTLERNSSGRQ

ERAEMDEMPY (SEQ ID NO: 1361471)

The prime editors described herein may also comprise Cas12a (Cpf1) (dCpf1) variants that may be used as a guide nucleotide sequence-programmable DNA-binding protein domain. The Cas12a (Cpf1) protein has a RuvC-like endonuclease domain that is similar to the RuvC domain of Cas9 but does not have an HNH endonuclease domain, and the N-terminus of Cas12a (Cpf1) does not have the alpha-helical recognition lobe of Cas9. It was shown in Zetsche et al., Cell, 163, 759-771, 2015 (which is incorporated herein by reference) that the RuvC-like domain of Cas12a (Cpf1) is responsible for cleaving both DNA strands and that inactivation of the RuvC-like domain inactivates Cas12a (Cpf1) nuclease activity.

Effectors of two of the systems, Cas12b1 and Cas12c, contain RuvC-like endonuclease domains related to Cas12a. A third system, Cas13a, contains an effector with two predicted HEPN RNase domains. In the Cas13a system, production of mature CRISPR RNA is tracrRNA-independent, unlike production of CRISPR RNA by Cas12b1. Cas12b1 depends on both CRISPR RNA and tracrRNA for DNA cleavage. Bacterial Cas13a has been shown to possess a unique RNase activity for CRISPR RNA maturation distinct from its RNA-activated single-stranded RNA degradation activity. These RNase functions are different from each other and from the CRISPR RNA-processing behavior of Cas12a. See, e.g., East-Seletsky, et al., “Two distinct RNase activities of CRISPR-Cas13a enable guide-RNA processing and RNA detection”, Nature, 2016 Oct. 13; 538(7624):270-273, the entire contents of which are hereby incorporated by reference. In vitro biochemical analysis of Cas13a from Leptotrichia shahii has shown that Cas13a is guided by a single CRISPR RNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. Catalytic residues in the two conserved HEPN domains mediate cleavage. Mutations in the catalytic residues generate catalytically inactive RNA-binding proteins. See, e.g., Abudayyeh et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector”, Science, 2016 Aug. 5; 353(6299), the entire contents of which are hereby incorporated by reference.

The crystal structure of Alicyclobacillus acidoterrestris Cas12b1 (AacC2c1) has been reported in complex with a chimeric single-molecule guide RNA (sgRNA). See, e.g., Liu et al., “C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism”, Mol. Cell, 2017 Jan. 19; 65(2):310-322, the entire contents of which are hereby incorporated by reference. The crystal structure has also been reported for Alicyclobacillus acidoterrestris C2c1 bound to target DNAs as ternary complexes. See, e.g., Yang et al., “PAM-dependent Target DNA Recognition and Cleavage by C2C1 CRISPR-Cas endonuclease”, Cell, 2016 Dec. 15; 167(7):1814-1828, the entire contents of which are hereby incorporated by reference. Catalytically competent conformations of AacC2c1, both with target and non-target DNA strands, have been captured independently positioned within a single RuvC catalytic pocket, with C2c1-mediated cleavage resulting in a staggered seven-nucleotide break of target DNA. Structural comparisons between C2c1 ternary complexes and previously identified Cas9 and Cpf1 counterparts demonstrate the diversity of mechanisms used by CRISPR-Cas systems.

In some embodiments, the napDNAbp may be a C2c1, a C2c2, or a C2c3 protein. In some embodiments, the napDNAbp is a C2c1 protein. In some embodiments, the napDNAbp is a Cas13a protein. In some embodiments, the napDNAbp is a Cas12c protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring Cas12b1 (C2c1), Cas13a (C2c2), or Cas12c (C2c3) protein. In some embodiments, the napDNAbp is a naturally-occurring Cas12b1 (C2c1), Cas13a (C2c2), or Cas12c (C2c3) protein.

(viii) Cas9 Circular Permutants

In various embodiments, the prime editors disclosed herein may comprise a circular permutant of Cas9.

The term “circularly permuted Cas9” (or “circular permutant” of Cas9 or “CP-Cas9”) refers to any Cas9 protein, or variant thereof, that occurs as, or has been modified or engineered to be, a circular permutant variant, which means the N-terminus and the C-terminus of a Cas9 protein (e.g., a wild type Cas9 protein) have been topologically rearranged. Such circularly permuted Cas9 proteins, or variants thereof, retain the ability to bind DNA when complexed with a guide RNA (gRNA). See, Oakes et al., “Protein Engineering of Cas9 for enhanced function,” Methods Enzymol, 2014, 546: 491-511 and Oakes et al., “CRISPR-Cas9 Circular Permutants as Programmable Scaffolds for Genome Modification,” Cell, Jan. 10, 2019, 176: 254-267, each of which is incorporated herein by reference. The instant disclosure contemplates the use of any previously known CP-Cas9 or of a new CP-Cas9, so long as the resulting circularly permuted protein retains the ability to bind DNA when complexed with a guide RNA (gRNA).

Any of the Cas9 proteins described herein, including any variant, ortholog, or naturally occurring Cas9 or equivalent thereof, may be reconfigured as a circular permutant variant.

In various embodiments, the circular permutants of Cas9 may have the following structure: N-terminus-[original C-terminus]-[optional linker]-[original N-terminus]-C-terminus.

As an example, the present disclosure contemplates the following circular permutants of canonical S. pyogenes Cas9 (1368 amino acids of UniProtKB-Q99ZW2 (CAS9_STRP1) (numbering is based on the amino acid position in SEQ ID NO: 1361421)): N-terminus-[1268-1368]-[optional linker]-[1-1267]-C-terminus; N-terminus-[1168-1368]-[optional linker]-[1-1167]-C-terminus; N-terminus-[1068-1368]-[optional linker]-[1-1067]-C-terminus; N-terminus-[968-1368]-[optional linker]-[1-967]-C-terminus; N-terminus-[868-1368]-[optional linker]-[1-867]-C-terminus; N-terminus-[768-1368]-[optional linker]-[1-767]-C-terminus; N-terminus-[668-1368]-[optional linker]-[1-667]-C-terminus; N-terminus-[568-1368]-[optional linker]-[1-567]-C-terminus; N-terminus-[468-1368]-[optional linker]-[1-467]-C-terminus; N-terminus-[368-1368]-[optional linker]-[1-367]-C-terminus; N-terminus-[268-1368]-[optional linker]-[1-267]-C-terminus; N-terminus-[168-1368]-[optional linker]-[1-167]-C-terminus; N-terminus-[68-1368]-[optional linker]-[1-67]-C-terminus; or N-terminus-[10-1368]-[optional linker]-[1-9]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc).

In particular embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 (1368 amino acids of UniProtKB-Q99ZW2 (CAS9_STRP1) (numbering is based on the amino acid position in SEQ ID NO: 1361421))): N-terminus-[102-1368]-[optional linker]-[1-101]-C-terminus; N-terminus-[1028-1368]-[optional linker]-[1-1027]-C-terminus; N-terminus-[1041-1368]-[optional linker]-[1-1040]-C-terminus; N-terminus-[1249-1368]-[optional linker]-[1-1248]-C-terminus; or N-terminus-[1300-1368]-[optional linker]-[1-1299]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

In still other embodiments, the circular permutant Cas9 has the following structure (based on S. pyogenes Cas9 (1368 amino acids of UniProtKB-Q99ZW2 (CAS9_STRP1) (numbering is based on the amino acid position in SEQ ID NO: 1361421))): N-terminus-[103-1368]-[optional linker]-[1-102]-C-terminus; N-terminus-[1029-1368]-[optional linker]-[1-1028]-C-terminus; N-terminus-[1042-1368]-[optional linker]-[1-1041]-C-terminus; N-terminus-[1250-1368]-[optional linker]-[1-1249]-C-terminus; or N-terminus-[1301-1368]-[optional linker]-[1-1300]-C-terminus, or the corresponding circular permutants of other Cas9 proteins (including other Cas9 orthologs, variants, etc.).

In some embodiments, the circular permutant can be formed by linking a C-terminal fragment of a Cas9 to an N-terminal fragment of a Cas9, either directly or by using a linker, such as an amino acid linker. In some embodiments, the C-terminal fragment that is rearranged to the N-terminus, includes or corresponds to the C-terminal 30% or less of the amino acids of a Cas9 (e.g., amino acids 1012-1368 of SEQ ID NO: 1361421). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus, includes or corresponds to the C-terminal 30%, 29%, 28%, 27%, 26%, 25%, 24%, 23%, 22%, 21%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% of the amino acids of a Cas9 (e.g., the Cas9 of SEQ ID NO: 1361421). In some embodiments, the C-terminal fragment that is rearranged to the N-terminus, includes or corresponds to the C-terminal 410 residues or less of a Cas9 (e.g., the Cas9 of SEQ ID NO: 1361421). In some embodiments, the C-terminal portion that is rearranged to the N-terminus, includes or corresponds to the C-terminal 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 1361421). In some embodiments, the C-terminal portion that is rearranged to the N-terminus, includes or corresponds to the C-terminal 357, 341, 328, 120, or 69 residues of a Cas9 (e.g., the Cas9 of SEQ ID NO: 1361421).
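As a check on the arithmetic relating fragment start positions to the residue counts recited above, the following minimal Python sketch computes the size of a C-terminal fragment, and its share of the full-length protein, from the first residue of the fragment. It assumes the 1368-residue length stated for S. pyogenes Cas9; the function and constant names are illustrative assumptions and are not part of the disclosure.

```python
# Illustrative sketch only; names are hypothetical, not taken from the disclosure.
CAS9_LENGTH = 1368  # residues in S. pyogenes Cas9, as stated in the text


def c_terminal_fragment_size(first_residue, total=CAS9_LENGTH):
    """Return (residue count, percent of protein) for the fragment
    spanning first_residue..total, using 1-based inclusive numbering."""
    if not 1 <= first_residue <= total:
        raise ValueError("first_residue must lie within the protein")
    count = total - first_residue + 1
    return count, 100.0 * count / total


# Fragment start positions used in the exemplary permutants.
sizes = {start: c_terminal_fragment_size(start)
         for start in (1012, 1028, 1041, 1249, 1300)}

for start, (count, percent) in sizes.items():
    print(f"residues {start}-{CAS9_LENGTH}: {count} residues ({percent:.1f}%)")
```

Under these assumptions, the five start positions give fragments of 357, 341, 328, 120, and 69 residues, each within the "C-terminal 30% or less" and "410 residues or less" ranges described above.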

In other embodiments, circular permutant Cas9 variants may be defined as a topological rearrangement of a Cas9 primary structure based on the following method, which is based on S. pyogenes Cas9 of SEQ ID NO: 1361421: (a) selecting a circular permutant (CP) site corresponding to an internal amino acid residue of the Cas9 primary structure, which divides the original protein into two regions: an N-terminal region and a C-terminal region; and (b) modifying the Cas9 protein sequence (e.g., by genetic engineering techniques) by moving the original C-terminal region (comprising the CP site amino acid) to precede the original N-terminal region, thereby forming a new N-terminus of the Cas9 protein that now begins with the CP site amino acid residue. The CP site can be located in any domain of the Cas9 protein, including, for example, the helical-II domain, the RuvCIII domain, or the CTD domain. For example, the CP site may be located (relative to the S. pyogenes Cas9 of SEQ ID NO: 1361421) at original amino acid residue 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282. Thus, once relocated to the N-terminus, original amino acid 181, 199, 230, 270, 310, 1010, 1016, 1023, 1029, 1041, 1247, 1249, or 1282 would become the new N-terminal amino acid. These CP-Cas9 proteins may be referred to as Cas9-CP¹⁸¹, Cas9-CP¹⁹⁹, Cas9-CP²³⁰, Cas9-CP²⁷⁰, Cas9-CP³¹⁰, Cas9-CP¹⁰¹⁰, Cas9-CP¹⁰¹⁶, Cas9-CP¹⁰²³, Cas9-CP¹⁰²⁹, Cas9-CP¹⁰⁴¹, Cas9-CP¹²⁴⁷, Cas9-CP¹²⁴⁹, and Cas9-CP¹²⁸², respectively. This description is not meant to be limited to making CP variants from SEQ ID NO: 1361421, but may be implemented to make CP variants in any Cas9 sequence, either at CP sites that correspond to these positions, or at other CP sites entirely. This description is not meant to limit the specific CP sites in any way. Virtually any CP site may be used to form a CP-Cas9 variant.
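Steps (a) and (b) can be illustrated at the level of the primary sequence with a short Python sketch. This is a minimal sketch under stated assumptions: the function name, the toy 20-residue sequence, and the "GGS" linker are hypothetical and chosen only for illustration, and the sketch says nothing about whether a given permutant retains activity.

```python
# Illustrative sketch only; the function name, toy sequence, and linker
# are hypothetical and are not taken from the disclosure.

def circular_permutant(sequence, cp_site, linker=""):
    """Build [cp_site..end] + linker + [1..cp_site-1].

    cp_site is a 1-based position of an internal residue; that residue
    becomes the first residue of the rearranged sequence.
    """
    if not 1 < cp_site <= len(sequence):
        raise ValueError("cp_site must be an internal 1-based position")
    c_terminal_region = sequence[cp_site - 1:]  # starts at the CP-site residue
    n_terminal_region = sequence[:cp_site - 1]  # original residues 1..cp_site-1
    return c_terminal_region + linker + n_terminal_region


toy = "MDKKYSIGLAIGTNSVGWAV"  # 20-residue toy sequence, not a full Cas9
permuted = circular_permutant(toy, cp_site=11, linker="GGS")
print(permuted)
```

Here residue 11 of the toy sequence becomes the new N-terminal residue, mirroring the N-terminus-[original C-terminus]-[optional linker]-[original N-terminus]-C-terminus layout; passing an empty linker corresponds to a direct fusion of the two regions.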

Exemplary CP-Cas9 amino acid sequences, based on the Cas9 of SEQ ID NO: 1361421, are provided below in which linker sequences are indicated by underlining and optional methionine (M) residues are indicated in bold. It should be appreciated that the disclosure provides CP-Cas9 sequences that do not include a linker sequence or that include different linker sequences. It should be appreciated that CP-Cas9 sequences may be based on Cas9 sequences other than that of SEQ ID NO: 1361421 and any examples provided herein are not meant to be limiting.

CP1012 (SEQ ID NO: 1361475)

DYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETN

GETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEK

NPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK

YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN

LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKE

VLDATLIHQSITGLYETRIDLSQLGGDGGSGGSGGSGGSGGSGGSGGDKKYSIGL

AIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRL

KRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESELVEEDKKHERHPIF

GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDL

NPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQL

PGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGD

QYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALV

RQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLN

REDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIP

YYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPN

EKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLEKTNRKV

TVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENED

ILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLING

IRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTEKEDIQKAQVSGQGDSLHEHI

ANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRE

RMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLS

DYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA

KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD

ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIK

KYPKLESEFVYG

CP1028 (SEQ ID NO: 1361476)

EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFAT

VRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSP

TVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKK

DLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG

SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPI

REQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGDGGSGGSGGSGGSGGSGGSGGMDKKYSIGLAIGTNSVGWAVITDE

YKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRI

CYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPT

IYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLV

QTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLEGNLIAL

SLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDA

ILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ

SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGS

IPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAW

MTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFT

VYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLEKTNRKVTVKQLKEDYFKKIEC

FDSVEISGVEDRENASLGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDR

EMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDELK

SDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQ

TVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQ

ILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKD

DSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKEDNLTKAE

RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLK

SKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYK

VYDVRKMIAKSEQ

CP1041 (SEQ ID NO: 1361477)

NIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIV

KKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVE

KGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE

LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLEVE

QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTL

TNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDGG

SGGSGGSGGSGGSGGSGGDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNT

DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKV

DDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDK

ADLRLIYLALAHMIKERGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINA

SGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNEDL

AEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEIT

KAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGAS

QEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQIHLGELHAIL

RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNF

EEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEG

MRKPAFLSGEQKKAIVDLLEKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDREN

ASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLF

DDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDELKSDGFANRNFMQLIH

DDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGR

HKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQN

EKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKN

RGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKEDNLTKAERGGLSELDKAGFIK

RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFY

KVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQE

IGKATAKYFFYS

CP1249 (SEQ ID NO: 1361478)

PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIR

EQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYET

RIDLSQLGGDGGSGGSGGSGGSGGSGGSGGMDKKYSIGLAIGTNSVGWAVITDEY

KVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC

YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTI

YHLRKKLVDSTDKADLRLIYLALAHMIKERGHFLIEGDLNPDNSDVDKLFIQLVQ

TYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALS

LGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAI

LLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQS

KNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSI

PHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWM

TRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTV

YNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECF

DSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE

MIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDELKS

DGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT

VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQI

LKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDD

SIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAER

GGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKS

KLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKV

YDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG

EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKD

WDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID

FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNF

LYLASHYEKLKGS

CP1300 (SEQ ID NO: 1361479)

KPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITG

LYETRIDLSQLGGDGGSGGSGGSGGSGGSGGSGGDKKYSIGLAIGTNSVGWAVIT

DEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKN

RICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKY

PTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQ

LVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLEGNLI

ALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLELAAKNLS

DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFF

DQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDN

GSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRF

AWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNEDKNLPNEKVLPKHSLLYEY

FTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLEKTNRKVTVKQLKEDYFKKI

ECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFE

DREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDF

LKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGI

LQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELG

SQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSEL

KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK

AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVIT

LKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGD

YKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNG

ETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR

KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKN

PIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKY

VNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANL

DKVLSAYNKHRD

The Cas9 circular permutants described above may be useful in the prime editing constructs described herein. Exemplary C-terminal fragments of Cas9, based on the Cas9 of SEQ ID NO: 1361421, which may be rearranged to an N-terminus of Cas9, are provided below. It should be appreciated that such C-terminal fragments of Cas9 are exemplary and are not meant to be limiting.

CP1012 C-terminal fragment (SEQ ID NO: 1361480)

DYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETN

GETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIA

RKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEK

NPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK

YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN

LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYEDTTIDRKRYTSTKE

VLDATLIHQSITGLYETRIDLSQLGGD

CP1028 C-terminal fragment

(SEQ ID NO: 1361481)

EIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFAT

VRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSP

TVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKK

DLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG

SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPI

REQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE

TRIDLSQLGGD

CP1041 C-terminal fragment

(SEQ ID NO: 1361482)

NIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIV

KKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVE

KGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFE

LENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE

QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTL

TNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

CP1249 C-terminal fragment

(SEQ ID NO: 1361483)

PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIR

EQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYET

RIDLSQLGGD

CP1300 C-terminal fragment

(SEQ ID NO: 1361484)

KPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITG

LYETRIDLSQLGGD
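By way of non-limiting illustration only, the rearrangement described above (moving a C-terminal fragment of Cas9 to the N-terminus) may be sketched in Python as follows. The function name `circular_permute`, the toy sequence, and the `GGS` linker are hypothetical and are chosen only for illustration; the circular permutant site is assumed to be given as a 1-based residue position, and the sketch does not reproduce any sequence of the Sequence Listing.

```python
def circular_permute(protein: str, cp_site: int, linker: str = "") -> str:
    """Return a circular permutant of `protein`.

    The residue at 1-based position `cp_site` becomes the new N-terminal
    residue. The original N-terminal portion follows it, joined by `linker`.
    All names here are illustrative assumptions, not part of the disclosure.
    """
    if not 1 < cp_site <= len(protein):
        raise ValueError("cp_site must lie inside the protein, after residue 1")
    c_terminal_fragment = protein[cp_site - 1:]  # residues cp_site..end
    n_terminal_portion = protein[:cp_site - 1]   # residues 1..cp_site-1
    return c_terminal_fragment + linker + n_terminal_portion


# Toy example with a hypothetical 10-residue sequence (not a real Cas9).
toy_protein = "MKTAYIAKQR"
print(circular_permute(toy_protein, cp_site=7, linker="GGS"))
```

In this sketch the permutant has the same residues as the input plus the linker; only their order changes.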

(ix) Cas9 Variants with Modified PAM Specificities

The prime editors of the present disclosure may also comprise Cas9 variants with modified PAM specificities. Some aspects of this disclosure provide Cas9 proteins that exhibit activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′, where N is A, C, G, or T) at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGG-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNG-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NNT-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGT-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NGC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAA-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAC-3′ PAM sequence at its 3′-end. In some embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAT-3′ PAM sequence at its 3′-end. In still other embodiments, the Cas9 protein exhibits activity on a target sequence comprising a 5′-NAG-3′ PAM sequence at its 3′-end.
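As a minimal, non-limiting sketch of the PAM notation used above, the following Python function checks whether the nucleotides immediately 3′ of a target sequence match a PAM pattern in which N stands for A, C, G, or T. The function name `pam_matches` and the example flanking sequences are hypothetical; only the four bases and the N wildcard are handled, which is an assumption of this sketch.

```python
# Allowed bases for each pattern symbol; N is A, C, G, or T as stated in the text.
PAM_SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def pam_matches(flank: str, pam: str) -> bool:
    """Return True if `flank` (DNA directly 3' of the target) begins with `pam`."""
    flank = flank.upper()
    pam = pam.upper()
    if len(flank) < len(pam):
        return False
    # Compare position by position; zip stops at the end of the PAM pattern.
    return all(base in PAM_SYMBOLS[symbol] for base, symbol in zip(flank, pam))


# Hypothetical flanking sequences tested against canonical and non-canonical PAMs.
for flank in ("TGG", "CAA", "GAC"):
    hits = [p for p in ("NGG", "NAA", "NAC", "NAT") if pam_matches(flank, p)]
    print(flank, hits)
```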

It should be appreciated that any of the amino acid mutations described herein (e.g., A262T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include mutations from the first amino acid residue to an amino acid residue that is similar to (e.g., conserved relative to) the second amino acid residue. For example, mutation of an amino acid with a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A262T mutation) may also be a mutation from an alanine to an amino acid that is similar in size and chemical properties to a threonine, for example, serine. As another example, mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, mutation of an amino acid with a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. The skilled artisan would recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an arginine may be an amino acid mutation to a lysine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conserved amino acid residues would be recognized by the skilled artisan and any of the amino acid mutations to other conserved amino acid residues are also within the scope of this disclosure.
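For illustration only, the conservative alternatives enumerated above (threonine to serine, arginine to lysine, and so on) may be tabulated and applied to a mutation written in the form A262T. The following Python sketch is an assumption-laden example: the dictionary contains only the substitutions explicitly listed in the preceding paragraph, and the names `CONSERVATIVE_ALTERNATIVES` and `conservative_variants` are hypothetical.

```python
import re
from typing import List

# Second-residue alternatives taken from the "In some embodiments" sentences above.
CONSERVATIVE_ALTERNATIVES = {
    "T": ["S"],
    "R": ["K"],
    "I": ["A", "V", "M", "L"],
    "K": ["R"],
    "D": ["E", "N"],
    "V": ["A", "I", "M", "L"],
    "G": ["A"],
}


def conservative_variants(mutation: str) -> List[str]:
    """List conservative alternatives for a mutation such as 'A262T'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    original, position, substituted = match.groups()
    # Skip alternatives that would restore the original residue (a silent change).
    return [
        f"{original}{position}{alternative}"
        for alternative in CONSERVATIVE_ALTERNATIVES.get(substituted, [])
        if alternative != original
    ]


print(conservative_variants("A262T"))
print(conservative_variants("E1219V"))
```

Residues not listed in the dictionary simply yield an empty list; the skilled artisan may extend the table with other conserved residues.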

In some embodiments, the present disclosure may utilize any of the Cas9 variants disclosed in the Sequence Listing section herein.

SpCas9 H840A

(SEQ ID NO: 1361593)

DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLK

RTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYH

EKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYN

QLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNED

LAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM

IKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG

TEELLVKLNREDLLRKORTEDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIP

YYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNEDKNLPNEKVLPKHSL

LYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKOLKEDYFKKIECFDS

VEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAH

LFDDKVMKOLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTEK

EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT

TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLONGRDMYVDQELDINRL

SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITORKF

DNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSK

LVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKS

EQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSM

PQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGK

SKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA

GELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKOLFVEQHKHYLDEIIEQISEFSKRVI

LADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDA

TLIHQSITGLYETRIDLSQLGGD

Cas9-NG H840A

(SEQ ID NO: 1361595)

DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLK

RTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYH

EKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYN

QLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNED

LAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM

IKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG

TEELLVKLNREDLLRKORTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIP

YYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSL

LYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDS

VEISGVEDRFNASLGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAH

LFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTEK

EDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQT

TQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLONGRDMYVDQELDINRL

SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITORKF

DNLTKAERGGLSELDKAGFIKROLVETROITKHVAQILDSRMNTKYDENDKLIREVKVITLKSK

LVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKS

EQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSM

PQVNIVKKTEVQTGGFSKESIRPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGK

SKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA

RFLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKOLFVEQHKHYLDEIIEQISEFSKRVI

LADANLDKVLSAYNKHRDKPIREQAENIIHLFTLINLGAPRAFKYEDTTIDRKVYRSTKEVLDA

TLIHQSITGLYETRIDLSQLGGD

KKH-Cas9 N580A

(SEQ ID NO: 1361596)

LDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLY

NALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGK

GKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRI

QRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDT

GNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKOLLKVQKAYHQ

NLKGYTGTHNLSLKAINLILDELWHINDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSP

VVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTT

PEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQIS

GKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVK

QEEASKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRESVQKD

FINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAED

ALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKD

YKYSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHH

DPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDD

YPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQA

EFIASFYKNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPHIIKTIASKT

QSIKKYSTDILGNLYEVKSKKHPQIIKKG

In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5′-NAA-3′ PAM sequence at its 3′-end. In some embodiments, the combination of mutations are present in any one of the clones listed in Table 1. In some embodiments, the combination of mutations are conservative mutations of the clones listed in Table 1. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table X.

TABLE 1

Table X: NAA PAM Clones

Mutations from wild-type SpCas9 (e.g., SEQ ID NO: 1361421)

D177N, K218R, D614N, D1135N, P1137S, E1219V, A1320V, A1323D, R1333K

D177N, K218R, D614N, D1135N, E1219V, Q1221H, H1264Y, A1320V, R1333K

A10T, I322V, S409I, E427G, G715C, D1135N, E1219V, Q1221H, H1264Y, A1320V, R1333K

A367T, K710E, R1114G, D1135N, P1137S, E1219V, Q1221H, H1264Y, A1320V, R1333K

A10T, I322V, S409I, E427G, R753G, D861N, D1135N, K1188R, E1219V, Q1221H, H1264H,

A1320V, R1333K

A10T, I322V, S409I, E427G, R654L, V743I, R753G, M1021T, D1135N, D1180G, K1211R,

E1219V, Q1221H, H1264Y, A1320V, R1333K

A10T, I322V, S409I, E427G, V743I, R753G, E762G, D1135N, D1180G, K1211R, E1219V,

Q1221H, H1264Y, A1320V, R1333K

A10T, I322V, S409I, E427G, R753G, D1135N, D1180G, K1211R, E1219V, Q1221H, H1264Y,

S1274R, A1320V, R1333K

A10T, I322V, S409I, E427G, A589S, R753G, D1135N, E1219V, Q1221H, H1264H, A1320V,

R1333K

A10T, I322V, S409I, E427G, R753G, E757K, G865G, D1135N, E1219V, Q1221H, H1264Y,

A1320V, R1333K

A10T, I322V, S409I, E427G, R654L, R753G, E757K, D1135N, E1219V, Q1221H, H1264Y,

A1320V, R1333K

A10T, I322V, S409I, E427G, K599R, M631A, R654L, K673E, V743I, R753G, N758H, E762G,

D1135N, D1180G, E1219V, Q1221H, Q1256R, H1264Y, A1320V, A1323D, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, V743I, R753G, E762G, N869S, N1054D, R1114G,

D1135N, D1180G, E1219V, Q1221H, H1264Y, A1320V, A1323D, R1333K

A10T, I322V, S409I, E427G, R654L, L727I, V743I, R753G, E762G, R859S, N946D, F1134L,

D1135N, D1180G, E1219V, Q1221H, H1264Y, N1317T, A1320V, A1323D, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, V743I, R753G, E762G, N803S, N869S, Y1016D,

G1077D, R1114G, F1134L, D1135N, D1180G, E1219V, Q1221H, H1264Y, V1290G, L1318S,

A1320V, A1323D, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, V743I, R753G, E762G, N803S, N869S, Y1016D,

G1077D, R1114G, F1134L, D1135N, K1151E, D1180G, E1219V, Q1221H, H1264Y, V1290G,

L1318S, A1320V, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, V743I, R753G, E762G, N803S, N869S, Y1016D,

G1077D, R1114G, F1134L, D1135N, D1180G, E1219V, Q1221H, H1264Y, V1290G, L1318S,

A1320V, A1323D, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, F693L, V743I, R753G, E762G, N803S, N869S,

L921P, Y1016D, G1077D, F1080S, R1114G, D1135N, D1180G, E1219V, Q1221H, H1264Y,

L1318S, A1320V, A1323D, R1333K

A10T, I322V, S409I, E427G, E630K, R654L, K673E, V743I, R753G, E762G, Q768H, N803S,

N869S, Y1016D, G1077D, R1114G, F1134L, D1135N, D1180G, E1219V, Q1221H, H1264Y,

L1318S, A1320V, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, F693L, V743I, R753G, E762G, Q768H, N803S,

N869S, Y1016D, G1077D, R1114G, F1134L, D1135N, D1180G, E1219V, Q1221H, G1223S,

H1264Y, L1318S, A1320V, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, F693L, V743I, R753G, E762G, N803S, N869S,

L921P, Y1016D, G1077D, F1080S, R1114G, D1135N, D1180G, E1219V, Q1221H, H1264Y,

L1318S, A1320V, A1323D, R1333K

A10T, I322V, S409I, E427G, R654L, V743I, R753G, M1021T, D1135N, D1180G, K1211R,

E1219V, Q1221H, H1264Y, A1320V, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, V743I, R753G, E762G, M673I, N803S, N869S,

G1077D, R1114G, D1135N, V1139A, D1180G, E1219V, Q1221H, A1320V, R1333K

A10T, I322V, S409I, E427G, R654L, K673E, V743I, R753G, E762G, N803S, N869S, R1114G,

D1135N, E1219V, Q1221H, A1320V, R1333K

In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 1. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table X.
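As a hedged illustration of the percent identity thresholds recited above, the following Python sketch computes identity between two sequences that are assumed to have already been aligned to equal length, with `-` marking gaps. The function name `percent_identity` and the choice of denominator (all alignment columns that are not gap-only) are assumptions of this sketch; other conventions for computing identity exist and may give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must be pre-aligned to the same length, with '-' for gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


# Hypothetical 10-residue toy alignment with one substitution.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(identity, identity >= 80.0)
```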

In some embodiments, the Cas9 protein exhibits an increased activity on a target sequence that does not comprise the canonical PAM (5′-NGG-3′) at its 3′ end as compared to Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 1361421. In some embodiments, the Cas9 protein exhibits an activity on a target sequence having a 3′ end that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 5-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 1361421 on the same target sequence. In some embodiments, the Cas9 protein exhibits an activity on a target sequence that is not directly adjacent to the canonical PAM sequence (5′-NGG-3′) that is at least 10-fold, at least 50-fold, at least 100-fold, at least 500-fold, at least 1,000-fold, at least 5,000-fold, at least 10,000-fold, at least 50,000-fold, at least 100,000-fold, at least 500,000-fold, or at least 1,000,000-fold increased as compared to the activity of Streptococcus pyogenes Cas9 as provided by SEQ ID NO: 1361421 on the same target sequence. In some embodiments, the 3′ end of the target sequence is directly adjacent to an AAA, GAA, CAA, or TAA sequence.

In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5′-NAC-3′ PAM sequence at its 3′-end. In some embodiments, the combination of mutations are present in any one of the clones listed in Table 2. In some embodiments, the combination of mutations are conservative mutations of the clones listed in Table 2. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table Y.

TABLE 2

Table Y: NAC PAM Clones

Mutations from wild-type SpCas9 (e.g., SEQ ID NO: 1361421)

T472I, R753G, K890E, D1332N, R1335Q, T1337N

I1057S, D1135N, P1301S, R1335Q, T1337N

T472I, R753G, D1332N, R1335Q, T1337N

D1135N, E1219V, D1332N, R1335Q, T1337N

T472I, R753G, K890E, D1332N, R1335Q, T1337N

I1057S, D1135N, P1301S, R1335Q, T1337N

T472I, R753G, D1332N, R1335Q, T1337N

T472I, R753G, Q771H, D1332N, R1335Q, T1337N

E627K, T638P, K652T, R753G, N803S, K959N, R1114G, D1135N, E1219V, D1332N,

R1335Q, T1337N

E627K, T638P, K652T, R753G, N803S, K959N, R1114G, D1135N, K1156E, E1219V,

D1332N, R1335Q, T1337N

E627K, T638P, V647I, R753G, N803S, K959N, G1030R, I1055E, R1114G, D1135N,

E1219V, D1332N, R1335Q, T1337N

E627K, E630G, T638P, V647A, G687R, N767D, N803S, K959N, R1114G, D1135N,

E1219V, D1332G, R1335Q, T1337N

E627K, T638P, R753G, N803S, K959N, R1114G, D1135N, E1219V, N1266H, D1332N,

R1335Q, T1337N

E627K, T638P, R753G, N803S, K959N, I1057T, R1114G, D1135N, E1219V, D1332N,

R1335Q, T1337N

E627K, T638P, R753G, N803S, K959N, R1114G, D1135N, E1219V, D1332N, R1335Q,

T1337N

E627K, M631I, T638P, R753G, N803S, K959N, Y1036H, R1114G, D1135N, E1219V,

D1251G, D1332G, R1335Q, T1337N

E627K, T638P, R753G, N803S, V875I, K959N, Y1016C, R1114G, D1135N, E1219V,

D1251G, D1332G, R1335Q, T1337N, I1348V

K608R, E627K, T638P, V647I, R654L, R753G, N803S, T804A, K848N, V922A, K959N,

R1114G, D1135N, E1219V, D1332N, R1335Q, T1337N

K608R, E627K, T638P, V647I, R753G, N803S, V922A, K959N, K1014N, V1015A,

R1114G, D1135N, K1156N, E1219V, N1252D, D1332N, R1335Q, T1337N

K608R, E627K, R629G, T638P, V647I, A711T, R753G, K775R, K789E, N803S, K959N,

V1015A, Y1036H, R1114G, D1135N, E1219V, N1286H, D1332N, R1335Q, T1337N

K608R, E627K, T638P, V647I, T740A, R753G, N803S, K948E, K959N, Y1016S,

R1114G, D1135N, E1219V, N1286H, D1332N, R1335Q, T1337N

K608R, E627K, T638P, V647I, T740A, N803S, K948E, K959N, Y1016S, R1114G,

D1135N, E1219V, N1286H, D1332N, R1335Q, T1337N

I670S, K608R, E627K, E630G, T638P, V647I, R653K, R753G, I795L, K797N, N803S,

K866R, K890N, K959N, Y1016C, R1114G, D1135N, E1219V, D1332N, R1335Q, T1337N

K608R, E627K, T638P, V647I, T740A, G752R, R753G, K797N, N803S, K948E, K959N,

V1015A, Y1016S, R1114G, D1135N, E1219V, N1266H, D1332N, R1335Q, T1337N

I570T, A589V, K608R, E627K, T638P, V647I, R654L, Q716R, R753G, N803S, K948E,

K959N, Y1016S, R1114G, D1135N, E1207G, E1219V, N1234D, D1332N, R1335Q,

T1337N

K608R, E627K, R629G, T638P, V647I, R654L, Q740R, R753G, N803S, K959N, N990S,

T995S, V1015A, Y1036D, R1114G, D1135N, E1207G, E1219V, N1234D, N1266H,

D1332N, R1335Q, T1337N

I562F, V565D, I570T, K608R, L625S, E627K, T638P, V647I, R654I, G752R, R753G,

N803S, N808D, K959N, M1021L, R1114G, D1135N, N1177S, N1234D, D1332N, R1335Q,

T1337N

I562F, I570T, K608R, E627K, T638P, V647I, R753G, E790A, N803S, K959N,

V1015A, Y1036H, R1114G, D1135N, D1180E, A1184T, E1219V, D1332N, R1335Q,

T1337N

I570T, K608R, E627K, T638P, V647I, R654H, R753G, E790A, N803S, K959N,

V1015A, R1114G, D1127A, D1135N, E1219V, D1332N, R1335Q, T1337N

I570T, K608R, L625S, E627K, T638P, V647I, R654I, T703P, R753G, N803S, N808D,

K959N, M1021L, R1114G, D1135N, E1219V, D1332N, R1335Q, T1337N

I570S, K608R, E627K, E630G, T638P, V647I, R653K, R753G, I795L, N803S, K866R,

K890N, K959N, Y1016C, R1114G, D1135N, E1219V, D1332N, R1335Q, T1337N

I570T, K608R, E627K, T638P, V647I, R654H, R753G, E790A, N803S, K959N,

V1016A, R1114G, D1135N, E1219V, K1246E, D1332N, R1335Q, T1337N

K608R, E627K, T638P, V647I, R654L, K673E, R753G, E790A, N803S, K948E, K959N,

R1114G, D1127G, D1135N, D1180E, E1219V, N1286H, D1332N, R1335Q, T1337N

K608R, L625S, E627K, T638P, V647I, R654I, I670T, R753G, N803S, N808D, K959N,

M1021L, R1114G, D1135N, E1219V, N1286H, D1332N, R1335Q, T1337N

E627K, M631V, T638P, V647I, K710E, R753G, N803S, N808D, K948E, M1021L,

R1114G, D1135N, E1219V, D1332N, R1335Q, T1337N, S1338T, H1349R

In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 80% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table 2. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the variants of Table Y.

In some embodiments, the Cas9 protein comprises a combination of mutations that exhibit activity on a target sequence comprising a 5′-NAT-3′ PAM sequence at its 3′-end. In some embodiments, the combination of mutations are present in any one of the clones listed in Table 3. In some embodiments, the combination of mutations are conservative mutations of the clones listed in Table 3. In some embodiments, the Cas9 protein comprises the combination of mutations of any one of the Cas9 clones listed in Table Z.

TABLE 3

Table Z: NAT PAM Clones

Mutations from wild-type SpCas9 (e.g., SEQ ID NO: 1361421)

K961E, H985Y, D1135N, K1191N, E1219V, Q1221H, A1320A, P1321S, R1335L

D1135N, G1218S, E1219V, Q1221H, P1249S, P1321S, D1322G, R1335L

V743I, R753G, E790A, D1135N, G1218S, E1219V, Q1221H, A1227V, P1249S, N1286K,

A1293T, P1321S, D1322G, R1335L, T1339I

F575S, M631L, R654L, V748I, V743I, R753G, D853E, V922A, R1114G, D1135N,

G1218S, E1219V, Q1221H, A1227V, P1249S, N1286K, A1293T, P1321S, D1322G,

R1335L, T1339I

F575S, M631L, R654L, R664K, R753G, D853E, V922A, R1114G, D1135N, D1180G,

G1218S, E1219V, Q1221H, P1249S, N1286K, P1321S, D1322G, R1335L

M631L, R654L, R753G, K797E, D853E, V922A, D1012A, R1114G, D1135N, G1218S,

E1219V, Q1221H, P1249S, N1317K, P1321S, D1322G, R1335L

F575S, M631L, R654L, R664K, R753G, D853E, V922A, R1114G, Y1131C, D1135N,

D1180G, G1218S, E1219V, Q1221H, P1249S, P1321S, D1322G, R1335L

F575S, M631L, R654L, R664K, R753G, D853E, V922A, R1114G, Y1131C, D1135N,

D1180G, G1218S, E1219V, Q1221H, P1249S, P1321S, D1322G, R1335L

F575S, D596Y, M631L, R654L, R664K, R753G, D853E, V922A, R1114G, Y1131C,

D1135N, D1180G, G1218S, E1219V, Q1221H, P1249S, Q1256R, P1321S, D1322G,

R1335L

F575S, M631L, R654L, R664K, K710E, V750A, R753G, D853E, V922A, R1114G,

Y1131C, D1135N, D1180G, G1218S, E1219V, Q1221H, P1249S, P1321S, D1322G,

R1335L

F575S, M631L, K649R, R654L, R664K, R753G, D853E, V922A, R1114G, Y1131C,

D1135N, K1156E, D1180G, G1218S, E1219V, Q1221H, P1249S, P1321S, D1322G,

R1335L

F575S, M631L, R654L, R664K, R753G, D853E, V922A, R1114G, Y1131C, D1135N,

D1180G, G1218S, E1219V, Q1221H, P1249S, P1321S, D1322G, R1335L

F575S, M631L, R654L, R664K, R753G, D853E, V922A, I1057G, R1114G, Y1131C,

D1135N, D1180G, G1218S, E1219V, Q1221H, P1249S, N1308D, P1321S, D1322G,

R1335L

M631L, R654L, R753G, D853E, V922A, R1114G, Y1131C, D1135N, E1150V, D1180G,

G1218S, E1219V, Q1221H, P1249S, P1321S, D1332G, R1335L

M631L, R654L, R664K, R753G, D853E, I1057V, Y1131C, D1135N, D1180G, G1218S,

E1219V, Q1221H, P1249S, P1321S, D1332G, R1335L

M631L, R654L, R664K, R753G, I1057V, R1114G, Y1131C, D1135N, D1180G, G1218S,

E1219V, Q1221H, P1249S, P1321S, D1332G, R1335L

The above description of various napDNAbps which can be used in connection with the presently disclosed prime editors is not meant to be limiting in any way. The prime editors may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process. In various embodiments, the Cas9 or Cas9 variants have a nickase activity, i.e., cleave only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats). The prime editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins which are the result of convergent evolution. The napDNAbps used herein (e.g., SpCas9, Cas9 variant, or Cas9 equivalents) may also contain various modifications that alter/enhance their PAM specificities. Lastly, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).

In addition, any available methods may be utilized to obtain or construct a variant or mutant Cas9 protein. The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. Mutations also embrace “gain-of-function” mutations, each of which confers an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, these tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
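The mutation notation described above (original residue, then 1-based position, then substituted residue, e.g., D1135N) may be illustrated by the following non-limiting Python sketch, which applies a list of such substitutions to a reference sequence and verifies that each stated original residue is actually present at the stated position. The function name `apply_mutations` and the toy reference sequence are hypothetical and are not taken from the Sequence Listing.

```python
import re
from typing import Iterable


def apply_mutations(reference: str, mutations: Iterable[str]) -> str:
    """Apply substitutions such as 'D2N' (1-based positions) to `reference`."""
    residues = list(reference)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation format: {mutation!r}")
        original, position, substituted = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside the reference sequence")
        # Guard against numbering mismatches between the table and the reference.
        if reference[position - 1] != original:
            raise ValueError(
                f"{mutation}: reference has {reference[position - 1]} at position {position}"
            )
        residues[position - 1] = substituted
    return "".join(residues)


# Toy reference (hypothetical); a real use would pass a full-length reference sequence.
print(apply_mutations("MDKKYS", ["D2N", "K4R"]))
```

The residue check is a design choice of this sketch: it surfaces transcription errors in a mutation list (for example, a position that exceeds the length of the reference) rather than silently applying them.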

Mutations can be introduced into a reference Cas9 protein using site-directed mutagenesis. Older methods of site-directed mutagenesis known in the art rely on sub-cloning of the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of single-stranded DNA template. In these methods, one anneals a mutagenic primer (i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at the site to be mutated) to the single-stranded template and then polymerizes the complement of the template starting from the 3′ end of the mutagenic primer. The resulting duplexes are then transformed into host bacteria and plaques are screened for the desired mutation. More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require sub-cloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed in order to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. And fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.
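As a simplified, non-limiting sketch of the mutagenic primer concept described above, the following Python function builds a primer that matches the template on both sides of a target codon but carries a replacement codon at the site to be mutated. It assumes that the template is an in-frame coding-strand DNA sequence and that codons are indexed from zero; the function name `mutagenic_primer`, the flank length, and the toy template are hypothetical. Practical primer design also weighs melting temperature, GC content, and secondary structure, none of which is modeled here.

```python
def mutagenic_primer(template: str, codon_index: int, new_codon: str, flank: int = 12) -> str:
    """Return a primer with `new_codon` flanked by template-matching sequence.

    `codon_index` is 0-based and counted from the first base of `template`.
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three nucleotides")
    start = 3 * codon_index
    if start < 0 or start + 3 > len(template):
        raise ValueError("codon_index is outside the template")
    left_flank = template[max(0, start - flank):start]   # anneals 5' of the site
    right_flank = template[start + 3:start + 3 + flank]  # anneals 3' of the site
    return left_flank + new_codon + right_flank


# Toy template encoding M-A-A-K-G; change the second codon GCT (Ala) to ACT (Thr).
toy_template = "ATGGCTGCAAAAGGT"
print(mutagenic_primer(toy_template, codon_index=1, new_codon="ACT", flank=6))
```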

Mutations may also be introduced by directed evolution processes, such as phage-assisted continuous evolution (PACE) or phage-assisted noncontinuous evolution (PANCE). The term “phage-assisted continuous evolution (PACE),” as used herein, refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described, for example, in International PCT application, PCT/US2009/056194, filed Sep. 8, 2009, published as WO 2010/028347 on Mar. 11, 2010; International PCT application, PCT/US2011/066747, filed Dec. 22, 2011, published as WO 2012/088381 on Jun. 28, 2012; U.S. Pat. No. 9,023,594, issued May 5, 2015; International PCT application, PCT/US2015/012022, filed Jan. 20, 2015, published as WO 2015/134121 on Sep. 11, 2015; and International PCT application, PCT/US2016/027795, filed Apr. 15, 2016, published as WO 2016/168631 on Oct. 20, 2016, the entire contents of each of which are incorporated herein by reference. Error-prone reverse transcriptases may also be obtained by phage-assisted non-continuous evolution (PANCE). The term “phage-assisted non-continuous evolution (PANCE),” as used herein, refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution using serial flask transfers of evolving “selection phage” (SP), which contain a gene of interest to be evolved, across fresh E. coli host cells, thereby allowing genes inside the host E. coli to be held constant while genes contained in the SP continuously evolve. Serial flask transfers have long served as a widely accessible approach for laboratory evolution of microbes, and, more recently, analogous approaches have been developed for bacteriophage evolution. The PANCE system features lower stringency than the PACE system.

Any of the references noted above which relate to Cas9 or Cas9 equivalents are hereby incorporated by reference in their entireties, if not already stated so.

In some embodiments, the napDNAbp is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5′ phosphorylated ssDNA of ˜24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM). Using a nuclease inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., 2016 July; 34(7):768-73. PubMed PMID: 27136078; Swarts et al., Nature. 507(7491) (2014):258-61; and Swarts et al., Nucleic Acids Res. 43(10) (2015):5120-9, each of which is incorporated herein by reference.

In some embodiments, the napDNAbp is a prokaryotic homolog of an Argonaute protein. Prokaryotic homologs of Argonaute proteins are known and have been described, for example, in Makarova K., et al., “Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements”, Biol Direct. 2009 Aug. 25; 4:29. doi: 10.1186/1745-6150-4-29, the entire contents of which are hereby incorporated by reference. In some embodiments, the napDNAbp is a Marinitoga piezophila Argonaute (MpAgo) protein. The CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5′-hydroxylated guides rather than the 5′-phosphorylated guides used by all other known Argonautes. The crystal structure of an MpAgo-RNA complex shows a guide strand binding site comprising residues that block 5′ phosphate interactions. These data suggest the evolution of an Argonaute subclass with noncanonical specificity for a 5′-hydroxylated guide. See, e.g., Kaya et al., “A bacterial Argonaute with noncanonical guide RNA specificity”, Proc Natl Acad Sci USA. 2016 Apr. 12; 113(15):4057-62, the entire contents of which are hereby incorporated by reference. It should be appreciated that other Argonaute proteins may be used, and are within the scope of this disclosure.

In some embodiments, the napDNAbp is a single effector of a microbial CRISPR-Cas system. Single effectors of microbial CRISPR-Cas systems include, without limitation, Cas9, Cpf1, C2c1, C2c2, and C2c3. Typically, microbial CRISPR-Cas systems are divided into Class 1 and Class 2 systems. Class 1 systems have multisubunit effector complexes, while Class 2 systems have a single protein effector. For example, Cas9 and Cpf1 are Class 2 effectors. In addition to Cas9 and Cpf1, three distinct Class 2 CRISPR-Cas systems (C2c1, C2c2, and C2c3) have been described by Shmakov et al., “Discovery and Functional Characterization of Diverse Class 2 CRISPR Cas Systems”, Mol. Cell, 2015 Nov. 5; 60(3): 385-397, the entire contents of which are hereby incorporated by reference. Effectors of two of the systems, C2c1 and C2c3, contain RuvC-like endonuclease domains related to Cpf1. A third system, C2c2, contains an effector with two predicted HEPN RNase domains. Production of mature CRISPR RNA by C2c2 is tracrRNA-independent, whereas C2c1 depends on both CRISPR RNA and tracrRNA for DNA cleavage. Bacterial C2c2 has been shown to possess a unique RNase activity for CRISPR RNA maturation distinct from its RNA-activated single-stranded RNA degradation activity. These RNase functions are different from each other and from the CRISPR RNA-processing behavior of Cpf1. See, e.g., East-Seletsky, et al., “Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection”, Nature, 2016 Oct. 13; 538(7624):270-273, the entire contents of which are hereby incorporated by reference. In vitro biochemical analysis of C2c2 from Leptotrichia shahii has shown that C2c2 is guided by a single CRISPR RNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. Catalytic residues in the two conserved HEPN domains mediate cleavage. Mutations in the catalytic residues generate catalytically inactive RNA-binding proteins. See, e.g., Abudayyeh et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector”, Science, 2016 Aug. 5; 353(6299), the entire contents of which are hereby incorporated by reference.

The crystal structure of Alicyclobacillus acidoterrestris C2c1 (AacC2c1) has been reported in complex with a chimeric single-molecule guide RNA (sgRNA). See, e.g., Liu et al., “C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism”, Mol. Cell, 2017 Jan. 19; 65(2):310-322, the entire contents of which are hereby incorporated by reference. Crystal structures of Alicyclobacillus acidoterrestris C2c1 bound to target DNAs as ternary complexes have also been reported. See, e.g., Yang et al., “PAM-dependent Target DNA Recognition and Cleavage by C2C1 CRISPR-Cas endonuclease”, Cell, 2016 Dec. 15; 167(7):1814-1828, the entire contents of which are hereby incorporated by reference. Catalytically competent conformations of AacC2c1, both with target and non-target DNA strands, have been captured independently positioned within a single RuvC catalytic pocket, with C2c1-mediated cleavage resulting in a staggered seven-nucleotide break of target DNA. Structural comparisons between C2c1 ternary complexes and previously identified Cas9 and Cpf1 counterparts demonstrate the diversity of mechanisms used by CRISPR-Cas systems.

In some embodiments, the napDNAbp may be a C2c1, a C2c2, or a C2c3 protein. In some embodiments, the napDNAbp is a C2c1 protein. In some embodiments, the napDNAbp is a C2c2 protein. In some embodiments, the napDNAbp is a C2c3 protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring C2c1, C2c2, or C2c3 protein. In some embodiments, the napDNAbp is a naturally-occurring C2c1, C2c2, or C2c3 protein.
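By way of illustration only, the percent-identity thresholds recited above can be evaluated on a pair of pre-aligned sequences as in the following minimal Python sketch. The function names `percent_identity` and `meets_threshold` are hypothetical and do not come from this disclosure. The sketch assumes that an alignment has already been produced by some other tool, that gaps are written as "-", and that identity is counted over aligned columns; other conventions for the denominator exist and give slightly different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: both inputs are rows of the same alignment (equal length,
    '-' for gaps). Columns where both rows are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True if the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `meets_threshold(variant_row, reference_row, 95.0)` would ask whether a variant is at least 95% identical to a reference protein under the stated convention.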

Some aspects of the disclosure provide Cas9 domains that have different PAM specificities. Typically, Cas9 proteins, such as Cas9 from S. pyogenes (spCas9), require a canonical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability to edit desired bases within a genome. In some embodiments, the base editing fusion proteins provided herein may need to be placed at a precise location, for example where a target base is placed within a 4 base region (e.g., an “editing window”), which is approximately 15 bases upstream of the PAM. See Komor, A. C., et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533, 420-424 (2016), the entire contents of which are hereby incorporated by reference. Accordingly, in some embodiments, any of the fusion proteins provided herein may contain a Cas9 domain that is capable of binding a nucleotide sequence that does not contain a canonical (e.g., NGG) PAM sequence. Cas9 domains that bind to non-canonical PAM sequences have been described in the art and would be apparent to the skilled artisan. For example, Cas9 domains that bind non-canonical PAM sequences have been described in Kleinstiver, B. P., et al., “Engineered CRISPR-Cas9 nucleases with altered PAM specificities” Nature 523, 481-485 (2015); and Kleinstiver, B. P., et al., “Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition” Nature Biotechnology 33, 1293-1298 (2015); the entire contents of each are hereby incorporated by reference.
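The PAM constraint described above can be illustrated with a short Python sketch that scans one DNA strand for NGG PAMs and reports which of them would place a chosen target base inside a 4-base window roughly 15 bases upstream of the PAM. This is a sketch under stated assumptions, not a definitive design tool: the function names and the parameters `window_start_upstream=17` and `window_len=4` are hypothetical choices made here to approximate the "4 base region ... approximately 15 bases upstream of the PAM" language, and only the given strand is searched.

```python
import re


def find_pam_sites(seq: str, pam_regex: str = "[ACGT]GG") -> list:
    """Return 0-based indices of the first base of every PAM match.

    A lookahead is used so that overlapping PAMs are all reported.
    """
    return [m.start() for m in re.finditer(f"(?=({pam_regex}))", seq.upper())]


def pams_placing_target_in_window(seq: str, target_index: int,
                                  window_start_upstream: int = 17,
                                  window_len: int = 4) -> list:
    """List PAM indices whose editing window contains `target_index`.

    Assumption (illustrative only): the window starts `window_start_upstream`
    bases 5' of the PAM's first base and spans `window_len` bases.
    """
    hits = []
    for pam in find_pam_sites(seq):
        start = pam - window_start_upstream
        if start < 0:
            continue  # window would run off the 5' end of the sequence
        if start <= target_index < start + window_len:
            hits.append(pam)
    return hits
```

An empty result for a given target base corresponds to the situation described above, in which no canonical PAM is suitably positioned and a domain with altered PAM specificity may be needed; a different `pam_regex` could be passed to model such a domain.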

For example, a napDNAbp domain with altered PAM specificity, such as a domain with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Francisella novicida Cpf1 (SEQ ID NO: 1361472) (D917, E1006, and D1255 are bolded and underlined), may be used:

(SEQ ID NO: 1361472)

MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKA

KQIIDKYHQFFIEEILSSVCISEDLLQNYSDVYFKLKKSDDDNLQKDFKS

AKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILWLKQSKDNGI

ELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSII

YRIVDDNLPKFLENKAKYESLKDKAPEAINYEQIKKDLAEELTFDIDYKT

SEVNQRVFSLDEVFEIANENNYLNQSGITKENTIIGGKFVNGENTKRKGI

NEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVT

TMQSFYEQIAAFKTVEEKSIKETLSLLEDDLKAQKLDLSKIYFKNDKSLT

DLSQQVEDDYSVIGTAVLEYITQQIAPKNLDNPSKKEQELIAKKTEKAKY

LSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQNKDNLA

QISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSED

KANILDKDEHFYLVFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNF

ENSTLANGWDKNKEPDNTAILFIKDDKYYLGVMNKKNNKIFDDKAIKENK

GEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKN

GSPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSI

DEFYREVENQGYKLTFENISESYIDSVVNQGKLYLFQIYNKDFSAYSKGR

PNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKKITHPAKEAIA

NKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKENDEI

NLLLKEKANDVHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMK

TNYHDKLAAIEKDRDSARKDWKKINNIKEMKEGYLSQVVHEIAKLVIEYN

AIVVFEDLNFGFKRGREKVEKQVYQKLEKMLIEKLNYLVFKDNEFDKTGG

VLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYE

SVSKSQEFFSKEDKICYNLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSR

LINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEYGHGECIKAAICGESD

KKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNM

PQDADANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN

An additional napDNAbp domain with altered PAM specificity, such as a domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Geobacillus thermodenitrificans Cas9 (SEQ ID NO: 1361473) may be used.

(SEQ ID NO: 1361473)

MKYKIGLDIGITSIGWAVINLDIPRIEDLGVRIFDRAENPKTGESLALPRRLARSARRRLRRRK

HRLERIRRLFVREGILTKEELNKLFEKKHEIDVWQLRVEALDRKLNNDELARILLHLAKRRGFR

SNRKSERTNKENSTMLKHIEENQSILSSYRTVAEMVVKDPKFSLHKRNKEDNYTNTVARDDLER

EIKLIFAKQREYGNIVCTEAFEHEYISIWASQRPFASKDDIEKKVGFCTFEPKEKRAPKATYTF

QSFTVWEHINKLRLVSPGGIRALTDDERRLIYKQAFHKNKITFHDVRTLLNLPDDTRFKGLLYD

RNTTLKENEKVRFLELGAYHKIRKAIDSVYGKGAAKSFRPIDFDTFGYALTMFKDDTDIRSYLR

NEYEQNGKRMENLADKVYDEELIEELLNLSFSKFGHLSLKALRNILPYMEQGEVYSTACERAGY

TFTGPKKKQKTVLLPNIPPIANPVVMRALTQARKVVNAIIKKYGSPVSIHIELARELSQSFDER

RKMQKEQEGNRKKNETAIRQLVEYGLTLNPTGLDIVKFKLWSEQNGKCAYSLQPIEIERLLEPG

YTEVDHVIPYSRSLDDSYTNKVLVLTKENREKGNRTPAEYLGLGSERWQQFETFVLTNKQFSKK

KRDRLLRLHYDENEENEFKNRNLNDTRYISRFLANFIREHLKFADSDDKQKVYTVNGRITAHLR

SRWNFNKNREESNLHHAVDAAIVACTTPSDIARVTAFYQRREQNKELSKKTDPQFPQPWPHFAD

ELQARLSKNPKESIKALNLGNYDNEKLESLQPVFVSRMPKRSITGAAHQETLRRYIGIDERSGK

IQTVVKKKLSEIQLDKTGHFPMYGKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKKNGELG

PIIRTIKIIDTTNQVIPLNDGKTVAYNSNIVRVDVFEKDGKYYCVPIYTIDMMKGILPNKAIEP

NKPYSEWKEMTEDYTFRFSLYPNDLIRIEFPREKTIKTAVGEEIKIKDLFAYYQTIDSSNGGLS

LVSHDNNFSLRSIGSRTLKRFEKYQVDVLGNIYKVRGEKRVGVASSSHSKAGETIRPL

In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5′ phosphorylated ssDNA of ˜24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo-gDNA system does not require a protospacer-adjacent motif (PAM). Using a nuclease inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., 34(7): 768-73 (2016), PubMed PMID: 27136078; Swarts et al., Nature, 507(7491): 258-61 (2014); and Swarts et al., Nucleic Acids Res. 43(10) (2015): 5120-9, each of which is incorporated herein by reference. The sequence of Natronobacterium gregoryi Argonaute is provided in SEQ ID NO: 1361474.

The disclosed fusion proteins may comprise a napDNAbp domain having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity with wild type Natronobacterium gregoryi Argonaute (SEQ ID NO: 1361474).

(SEQ ID NO: 1361474)

MTVIDLDSTTTADELTSGHTYDISVTLTGVYDNTDEQHPRMSLAFEQDNGERRYITLWKNTTPK

DVFTYDYATGSTYIFTNIDYEVKDGYENLTATYQTTVENATAQEVGTTDEDETFAGGEPLDHHL

DDALNETPDDAETESDSGHVMTSFASRDQLPEWTLHTYTLTATDGAKTDTEYARRTLAYTVRQE

LYTDHDAAPVATDGLMLLTPEPLGETPLDLDCGVRVEADETRTLDYTTAKDRLLARELVEEGLK

RSLWDDYLVRGIDEVLSKEPVLTCDEFDLHERYDLSVEVGHSGRAYLHINFRHRFVPKLTLADI

DDDNIYPGLRVKTTYRPRRGHIVWGLRDECATDSLNTLGNQSVVAYHRNNQTPINTDLLDAIEA

ADRRVVETRRQGHGDDAVSFPQELLAVEPNTHQIKQFASDGFHQQARSKTRLSASRCSEKAQAF

AERLDPVRLNGSTVEFSSEFFTGNNEQQLRLLYENGESVLTFRDGARGAHPDETFSKGIVNPPE

SFEVAVVLPEQQADTCKAQWDTMADLLNQAGAPPTRSETVQYDAFSSPESISLNVAGAIDPSEV

DAAFVVLPPDQEGFADLASPTETYDELKKALANMGIYSQMAYFDRFRDAKIFYTRNVALGLLAA

AGGVAFTTEHAMPGDADMFIGIDVSRSYPEDGASGQINIAATATAVYKDGTILGHSSTRPQLGE

KLQSTDVRDIMKNAILGYQQVTGESPTHIVIHRDGFMNEDLDPATEFLNEQGVEYDIVEIRKQP

QTRLLAVSDVQYDTPVKSIAAINQNEPRATVATFGAPEYLATRDGGGLPRPIQIERVAGETDIE

TLTRQVYLLSQSHIQVHNSTARLPITTAYADQASTHATKGYLVQTGAFESNVGEL

In some embodiments, the Cas9 domain is a Cas9 domain from Staphylococcus aureus (SaCas9). In some embodiments, the SaCas9 domain is a nuclease active SaCas9, a nuclease inactive SaCas9 (SaCas9d), or a SaCas9 nickase (SaCas9n).

(x) Divided napDNAbp Domains for Split PE Delivery

In various embodiments, the prime editors described herein may be delivered to cells as two or more fragments which become assembled inside the cell (either by passive assembly, or by active assembly, such as using split intein sequences) into a reconstituted prime editor. In some cases, the self-assembly may be passive whereby the two or more prime editor fragments associate inside the cell covalently or non-covalently to reconstitute the prime editor. In other cases, the self-assembly may be catalyzed by dimerization domains installed on each of the fragments. Examples of dimerization domains are described herein. In still other cases, the self-assembly may be catalyzed by split intein sequences installed on each of the prime editor fragments.

Split PE delivery may be advantageous to address various size constraints of different delivery approaches. For example, delivery approaches may include virus-based delivery methods, messenger RNA-based delivery methods, or RNP-based delivery (ribonucleoprotein-based delivery). Each of these methods of delivery may be more efficient and/or effective when the prime editor is divided into smaller pieces. Once inside the cell, the smaller pieces can assemble into a functional prime editor. Depending on the means of splitting, the divided prime editor fragments can be reassembled in a non-covalent manner or a covalent manner to reform the prime editor. In one embodiment, the prime editor can be split at one or more split sites into two or more fragments. The fragments can be unmodified (other than being split). Once the fragments are delivered to the cell (e.g., by direct delivery of a ribonucleoprotein complex or by nucleic acid delivery, such as mRNA delivery or virus vector-based delivery), the fragments can reassociate covalently or non-covalently to reconstitute the prime editor. In another embodiment, the prime editor can be split at one or more split sites into two or more fragments. Each of the fragments can be modified to comprise a dimerization domain, whereby each fragment that is formed is coupled to a dimerization domain. Once delivered or expressed within a cell, the dimerization domains of the different fragments associate and bind to one another, bringing the different prime editor fragments together to reform a functional prime editor. In yet another embodiment, the prime editor fragments may be modified to comprise a split intein. Once delivered or expressed within a cell, the split intein domains of the different fragments associate and bind to one another, and then undergo trans-splicing, which results in the excision of the split-intein domains from each of the fragments, and a concomitant formation of a peptide bond between the fragments, thereby restoring the prime editor.

In one embodiment, the prime editor can be delivered using a split-intein approach.

The split site can be positioned between any one or more pairs of residues in the prime editor and in any domain therein, including within the napDNAbp domain, the polymerase domain (e.g., RT domain), or the linker domain that joins the napDNAbp domain and the polymerase domain.

In one embodiment, depicted in FIG. 66, the prime editor (PE) is divided at a split site within the napDNAbp.

In certain embodiments, the napDNAbp is a canonical SpCas9 polypeptide of SEQ ID NO: 1361421, as follows:

SpCas9, Streptococcus pyogenes M1, SwissProt Accession No. Q99ZW2, wild type, 1368 AA (SEQ ID NO: 1361421)

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS

GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEED

KKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFR

GHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR

RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNEDLAEDAKLQLSKDTYDDDL

DNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQ

DLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG

TEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKI

EKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM

TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVD

LLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKD

FLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWG

RLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSG

QGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTT

QKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVD

QELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMK

NYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL

DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNF

FKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEV

QTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSK

KLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGR

KRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY

LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA

PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

In certain embodiments, the SpCas9 is split into two fragments at a split site located between residues 1 and 2, or 2 and 3, or 3 and 4, or 4 and 5, or 5 and 6, or 6 and 7, or 7 and 8, or 8 and 9, or 9 and 10, or between any pair of adjacent residues located anywhere between residues 1-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 1000-1100, 1100-1200, 1200-1300, or 1300-1368 of canonical SpCas9 of SEQ ID NO: 1361421.

In certain embodiments, a napDNAbp is split into two fragments at a split site that is located at a pair of residues that corresponds to any pair of adjacent residues located anywhere between positions 1-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 1000-1100, 1100-1200, 1200-1300, or 1300-1368 of canonical SpCas9 of SEQ ID NO: 1361421.
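As a non-limiting illustration of what a split site between two adjacent residues means in practice, the following Python sketch divides an amino acid sequence into an N-terminal and a C-terminal fragment and enumerates the candidate split sites within a residue range. The helper names `split_at` and `candidate_split_sites` are hypothetical, residues are numbered from 1 as in the ranges above, and the ten-residue string in the usage note is simply the first ten residues of the SpCas9 sequence shown above.

```python
def split_at(protein: str, k: int) -> tuple:
    """Split between residue k and residue k+1 (1-based numbering).

    Returns (N-terminal fragment, C-terminal fragment).
    """
    if not 1 <= k < len(protein):
        raise ValueError("split site must lie between two existing residues")
    return protein[:k], protein[k:]


def candidate_split_sites(first: int, last: int) -> list:
    """All adjacent residue pairs (k, k+1) lying within residues first..last."""
    if first < 1 or last <= first:
        raise ValueError("expected 1 <= first < last")
    return [(k, k + 1) for k in range(first, last)]
```

For instance, `split_at("MDKKYSIGLD", 4)` returns the fragments `"MDKK"` and `"YSIGLD"`, and `candidate_split_sites(1, 10)` lists the nine residue pairs from (1, 2) through (9, 10).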

In certain embodiments, the prime editor is divided at one or more peptide bond sites (i.e., a “split site” or “split-intein split site”), and each resulting fragment is fused to a split intein and then delivered to cells as a separately-encoded fusion protein. Once the split-intein fusion proteins (i.e., protein halves) are expressed within a cell, the proteins undergo trans-splicing to form a complete or whole PE with the concomitant removal of the joined split-intein sequences.

For example, as shown in FIG. 38, the N-terminal extein can be fused to a first split-intein (e.g., N intein) and the C-terminal extein can be fused to a second split-intein (e.g., C intein). The N-terminal extein becomes fused to the C-terminal extein to reform a whole prime editor fusion protein comprising an napDNAbp domain and a polymerase domain (e.g., RT domain) upon the self-association of the N intein and the C intein inside the cell, followed by their self-excision, and the concomitant formation of a peptide bond between the N-terminal extein and C-terminal extein portions of a whole prime editor (PE).

To take advantage of a split-PE delivery strategy using split-inteins, the prime editor needs to be divided at one or more split sites to create at least two separate halves of a prime editor, each of which may be rejoined inside a cell if each half is fused to a split-intein sequence.

In certain embodiments, the prime editor is split at a single split site. In certain other embodiments, the prime editor is split at two split sites, or three split sites, or four split sites, or more.

In a preferred embodiment, the prime editor is split at a single split site to create two separate halves of a prime editor, each of which can be fused to a split intein sequence.

An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely, DnaE-N and DnaE-C. The two different subunits are encoded by separate genes, namely dnaE-n and dnaE-c, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.

Additional naturally occurring or engineered split-intein sequences are known in the art or can be made from whole-intein sequences described herein or those available in the art. Examples of split-intein sequences can be found in Stevens et al., “A promiscuous split intein with expanded protein engineering applications,” PNAS, 2017, Vol. 114: 8538-8543; and Iwai et al., “Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme,” FEBS Lett, 580: 1853-1858 (2006), each of which is incorporated herein by reference. Additional split intein sequences can be found, for example, in WO 2013/045632, WO 2014/055782, WO 2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference.

In addition, protein splicing in trans has been described in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998), Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b), Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998), Evans, et al., J. Biol. Chem. 275:9091 (2000); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999)) and provides the opportunity to express a protein as two inactive fragments that subsequently undergo ligation to form a functional product, e.g., as shown in FIGS. 38 and 39 with regard to the formation of a complete PE fusion protein from two separately-expressed halves.

In various embodiments described herein, the continuous evolution methods (e.g., PACE) may be used to evolve a first portion of a base editor. A first portion could include a single component or domain, e.g., a Cas9 domain, a deaminase domain, or a UGI domain. The separately evolved component or domain can then be fused to the remaining portions of the base editor within a cell by separately expressing both the evolved portion and the remaining non-evolved portions with split-intein polypeptide domains. The first portion could more broadly include any first amino acid portion of a base editor that is desired to be evolved using a continuous evolution method described herein. The second portion would in this embodiment refer to the remaining amino acid portion of the base editor that is not evolved using the methods described herein. The evolved first portion and the second portion of the base editor could each be expressed with split-intein polypeptide domains in a cell. The natural protein splicing mechanisms of the cell would reassemble the evolved first portion and the non-evolved second portion to form a single fusion protein evolved base editor. The evolved first portion may comprise either the N- or C-terminal part of the single fusion protein. In an analogous manner, use of a second orthogonal trans-splicing intein pair could allow the evolved first portion to comprise an internal part of the single fusion protein.

Thus, any of the evolved and non-evolved components of the base editors herein described may be expressed with split-intein tags in order to facilitate the formation of a complete base editor comprising the evolved and non-evolved component within a cell.

The mechanism of the protein splicing process has been studied in great detail (Chong, et al., J. Biol. Chem. 1996, 271, 22159-22168; Xu, M-Q & Perler, F. B. EMBO Journal, 1996, 15, 5146-5153) and conserved amino acids have been found at the intein and extein splicing points (Xu, et al., EMBO Journal, 1994, 13, 5517-5522). The constructs described herein contain an intein sequence fused to the 5′-terminus of the first gene (e.g., the evolved portion of the base editor). Suitable intein sequences can be selected from any of the proteins known to contain protein splicing elements. A database containing all known inteins can be found on the World Wide Web (Perler, F. B. Nucleic Acids Research, 1999, 27, 346-347). The intein sequence is fused at the 3′ end to the 5′ end of a second gene. For targeting of this gene to a certain organelle, a peptide signal can be fused to the coding sequence of the gene. After the second gene, the intein-gene sequence can be repeated as often as desired for expression of multiple proteins in the same cell. For multi-intein containing constructs, it may be useful to use intein elements from different sources. After the sequence of the last gene to be expressed, a transcription termination sequence must be inserted. In one embodiment, a modified intein splicing unit is designed so that it can both catalyze excision of the exteins from the inteins and prevent ligation of the exteins. Mutagenesis of the C-terminal extein junction in the Pyrococcus species GB-D DNA polymerase was found to produce an altered splicing element that induces cleavage of exteins and inteins but prevents subsequent ligation of the exteins (Xu, M-Q & Perler, F. B. EMBO Journal, 1996, 15, 5146-5153). Mutation of serine 538 to either an alanine or glycine induced cleavage but prevented ligation. 
Mutation of equivalent residues in other intein splicing units should also prevent extein ligation due to the conservation of amino acids at the C-terminal extein junction to the intein. A preferred intein not containing an endonuclease domain is the Mycobacterium xenopi GyrA protein (Telenti, et al. J. Bacteriol. 1997, 179, 6378-6382). Others have been found in nature or have been created artificially by removing the endonuclease domains from endonuclease containing inteins (Chong, et al. J. Biol. Chem. 1997, 272, 15587-15590). In a preferred embodiment, the intein is selected so that it consists of the minimal number of amino acids needed to perform the splicing function, such as the intein from the Mycobacterium xenopi GyrA protein (Telenti, A., et al., J. Bacteriol. 1997, 179, 6378-6382). In an alternative embodiment, an intein without endonuclease activity is selected, such as the intein from the Mycobacterium xenopi GyrA protein or the Saccharomyces cerevisiae VMA intein that has been modified to remove endonuclease domains (Chong, 1997). Further modification of the intein splicing unit may allow the reaction rate of the cleavage reaction to be altered, allowing protein dosage to be controlled by simply modifying the gene sequence of the splicing unit.

Inteins can also exist as two fragments encoded by two separately transcribed and translated genes. These so-called split inteins self-associate and catalyze protein-splicing activity in trans. Split inteins have been identified in diverse cyanobacteria and archaea (Caspi et al., Mol Microbiol. 50:1569-1577 (2003); Choi J. et al., J Mol Biol. 356:1093-1106 (2006); Dassa B. et al., Biochemistry. 46:322-330 (2007); Liu X. and Yang J., J Biol Chem. 278:26315-26318 (2003); Wu H. et al., Proc Natl Acad Sci USA. 95:9226-9231 (1998); and Zettler J. et al., FEBS Letters. 583:909-914 (2009)), but have not been found in eukaryotes thus far. Recently, a bioinformatic analysis of environmental metagenomic data revealed 26 different loci with a novel genomic arrangement. At each locus, a conserved enzyme coding region is interrupted by a split intein, with a freestanding endonuclease gene inserted between the sections coding for intein subdomains. Among them, five loci were completely assembled: DNA helicases (gp41-1, gp41-8); Inosine-5′-monophosphate dehydrogenase (IMPDH-1); and Ribonucleotide reductase catalytic subunits (NrdA-2 and NrdJ-1). This fractured gene organization appears to be present mainly in phages (Dassa et al., Nucleic Acids Research. 37:2560-2573 (2009)).

The split intein Npu DnaE was characterized as having the highest rate reported for the protein trans-splicing reaction. In addition, the Npu DnaE protein splicing reaction is considered robust and high-yielding with respect to different extein sequences, temperatures from 6 to 37° C., and the presence of up to 6 M urea (Zettler J. et al., FEBS Letters. 583:909-914 (2009); Iwai I. et al., FEBS Letters 580:1853-1858 (2006)). As expected, when the Cys1Ala mutation at the N-domain of these inteins was introduced, the initial N-to-S acyl shift, and therefore protein splicing, was blocked. Unfortunately, the C-terminal cleavage reaction was also almost completely inhibited. The dependence of the asparagine cyclization at the C-terminal splice junction on the acyl shift at the N-terminal scissile peptide bond seems to be a unique property common to the naturally split DnaE intein alleles (Zettler J. et al., FEBS Letters. 583:909-914 (2009)).

The mechanism of protein splicing typically has four steps [29-30]: 1) an N—S or N—O acyl shift at the intein N-terminus, which breaks the upstream peptide bond and forms an ester bond between the N-extein and the side chain of the intein's first amino acid (Cys or Ser); 2) a transesterification relocating the N-extein to the intein C-terminus, forming a new ester bond linking the N-extein to the side chain of the C-extein's first amino acid (Cys, Ser, or Thr); 3) Asn cyclization breaking the peptide bond between the intein and the C-extein; and 4) a S—N or O—N acyl shift that replaces the ester bond with a peptide bond between the N-extein and C-extein.
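The residue requirements implied by the four steps above can be expressed as a simple junction check, sketched here in Python for illustration only. The function name `check_splice_junctions` and the toy sequences in the usage note are hypothetical. The sketch assumes one-letter amino acid codes and reads three conditions from the steps: the intein's first residue is Cys or Ser (step 1), the C-extein's first residue is Cys, Ser, or Thr (step 2), and the intein's last residue is the Asn that cyclizes (step 3). It does not model splicing chemistry or predict splicing efficiency.

```python
INTEIN_FIRST_ALLOWED = {"C", "S"}         # step 1: N-S or N-O acyl shift
C_EXTEIN_FIRST_ALLOWED = {"C", "S", "T"}  # step 2: transesterification acceptor


def check_splice_junctions(intein: str, c_extein: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if intein[:1] not in INTEIN_FIRST_ALLOWED:
        problems.append("intein does not start with Cys or Ser")
    if intein[-1:] != "N":
        problems.append("intein does not end with Asn")
    if c_extein[:1] not in C_EXTEIN_FIRST_ALLOWED:
        problems.append("C-extein does not start with Cys, Ser, or Thr")
    return problems
```

For example, `check_splice_junctions("CAAAN", "CFN")` returns an empty list, whereas replacing the first intein residue with Ala (analogous to the Cys1Ala mutation discussed above) yields one reported problem.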

Protein trans-splicing, catalyzed by split inteins, provides an entirely enzymatic method for protein ligation [31]. A split-intein is essentially a contiguous intein (e.g. a mini-intein) split into two pieces named N-intein and C-intein, respectively. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction in essentially the same way as a contiguous intein does. Split inteins have been found in nature and also engineered in laboratories [31-35]. As used herein, the term “split intein” refers to any intein in which one or more peptide bond breaks exist between the N-terminal and C-terminal amino acid sequences such that the N-terminal and C-terminal sequences become separate molecules that can non-covalently reassociate, or reconstitute, into an intein that is functional for trans-splicing reactions. Any catalytically active intein, or fragment thereof, may be used to derive a split intein for use in the methods of the invention. For example, in one aspect the split intein may be derived from a eukaryotic intein. In another aspect, the split intein may be derived from a bacterial intein. In another aspect, the split intein may be derived from an archaeal intein. Preferably, the split intein so-derived will possess only the amino acid sequences essential for catalyzing trans-splicing reactions.

As used herein, the “N-terminal split intein (In)” refers to any intein sequence that comprises an N-terminal amino acid sequence that is functional for trans-splicing reactions. An In thus also comprises a sequence that is spliced out when trans-splicing occurs. An In can comprise a sequence that is a modification of the N-terminal portion of a naturally occurring intein sequence. For example, an In can comprise additional amino acid residues and/or mutated residues so long as the inclusion of such additional and/or mutated residues does not render the In non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the In.

As used herein, the “C-terminal split intein (Ic)” refers to any intein sequence that comprises a C-terminal amino acid sequence that is functional for trans-splicing reactions. In one aspect, the Ic comprises 4 to 7 contiguous amino acid residues, at least 4 amino acids of which are from the last β-strand of the intein from which it was derived. An Ic thus also comprises a sequence that is spliced out when trans-splicing occurs. An Ic can comprise a sequence that is a modification of the C-terminal portion of a naturally occurring intein sequence. For example, an Ic can comprise additional amino acid residues and/or mutated residues so long as the inclusion of such additional and/or mutated residues does not render the Ic non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the Ic.

In some embodiments of the invention, a peptide linked to an Ic or an In can comprise an additional chemical moiety including, among others, fluorescence groups, biotin, polyethylene glycol (PEG), amino acid analogs, unnatural amino acids, phosphate groups, glycosyl groups, radioisotope labels, and pharmaceutical molecules. In other embodiments, a peptide linked to an Ic can comprise one or more chemically reactive groups including, among others, ketone, aldehyde, Cys residues and Lys residues. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction when an “intein-splicing polypeptide (ISP)” is present. As used herein, “intein-splicing polypeptide (ISP)” refers to the portion of the amino acid sequence of a split intein that remains when the Ic, In, or both, are removed from the split intein. In certain embodiments, the In comprises the ISP. In another embodiment, the Ic comprises the ISP. In yet another embodiment, the ISP is a separate peptide that is not covalently linked to In nor to Ic.

Split inteins may be created from contiguous inteins by engineering one or more split sites in the unstructured loop or intervening amino acid sequence between the ~12 conserved beta-strands found in the structure of mini-inteins [25-28]. Some flexibility in the position of the split site within regions between the beta-strands may exist, provided that creation of the split will not disrupt the structure of the intein, the structured beta-strands in particular, to a sufficient degree that protein splicing activity is lost.

In protein trans-splicing, one precursor protein consists of an N-extein part followed by the N-intein, another precursor protein consists of the C-intein followed by a C-extein part, and a trans-splicing reaction (catalyzed by the N- and C-inteins together) excises the two intein sequences and links the two extein sequences with a peptide bond. Protein trans-splicing, being an enzymatic reaction, can work with very low (e.g. micromolar) concentrations of proteins and can be carried out under physiological conditions.
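By way of illustration only, the bookkeeping of the trans-splicing reaction described above can be sketched in a few lines of Python. The class names, the function name, and the short letter strings are hypothetical placeholders introduced for this sketch; they are not intein or extein sequences from this disclosure, and the sketch models only which segments are joined and which are excised, not the chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NPrecursor:
    """First precursor: an N-extein part followed by the N-intein."""
    n_extein: str
    n_intein: str

    @property
    def sequence(self) -> str:
        return self.n_extein + self.n_intein


@dataclass(frozen=True)
class CPrecursor:
    """Second precursor: the C-intein followed by a C-extein part."""
    c_intein: str
    c_extein: str

    @property
    def sequence(self) -> str:
        return self.c_intein + self.c_extein


def trans_splice(n: NPrecursor, c: CPrecursor):
    """Return (spliced product, excised intein pieces).

    The two extein parts are linked directly; both intein pieces are removed.
    """
    spliced = n.n_extein + c.c_extein
    excised = (n.n_intein, c.c_intein)
    return spliced, excised


# Placeholder strings only (hypothetical, not real sequences).
n_part = NPrecursor(n_extein="AAAA", n_intein="CLSY")
c_part = CPrecursor(c_intein="VKIN", c_extein="SGGG")
product, removed = trans_splice(n_part, c_part)
```

In this toy model the product contains only the two extein parts, and the two intein pieces are returned separately, mirroring the excision step of the reaction.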

B. Programmable Nucleases (Non-napDNAbp)

In various embodiments described herein, the prime editors comprise a napDNAbp, such as a Cas9 protein. These proteins are “programmable” by way of their becoming complexed with a guide RNA (or a PEgRNA, as the case may be), which guides the Cas9 protein to a target site on the DNA which possesses a sequence that is complementary to the spacer portion of the gRNA (or PEgRNA) and which also possesses the required PAM sequence. However, in certain embodiments envisioned here, the napDNAbp may be substituted with a different type of programmable protein, such as a zinc finger nuclease or a transcription activator-like effector nuclease (TALEN).

FIG. 1H depicts such a variation of prime editing contemplated herein that replaces the napDNAbp (e.g., SpCas9 nickase) with any programmable nuclease domain, such as zinc finger nucleases (ZFN) or transcription activator-like effector nucleases (TALEN). As such, it is contemplated that suitable nucleases do not necessarily need to be “programmed” by a nucleic acid targeting molecule (such as a guide RNA), but rather, may be programmed by defining the specificity of a DNA-binding domain, in particular that of a nuclease. Just as in prime editing with napDNAbp moieties, it is preferable that such alternative programmable nucleases be modified such that only one strand of a target DNA is cut. In other words, the programmable nucleases should preferably function as nickases. Once a programmable nuclease is selected (e.g., a ZFN or a TALEN), then additional functionalities may be engineered into the system to allow it to operate in accordance with a prime editing-like mechanism. For example, the programmable nucleases may be modified by coupling (e.g., via a chemical linker) an RNA or DNA extension arm thereto, wherein the extension arm comprises a primer binding site (PBS) and a DNA synthesis template. The programmable nuclease may also be coupled (e.g., via a chemical or amino acid linker) to a polymerase, the nature of which will depend upon whether the extension arm is DNA or RNA. In the case of an RNA extension arm, the polymerase can be an RNA-dependent DNA polymerase (e.g., reverse transcriptase). In the case of a DNA extension arm, the polymerase can be a DNA-dependent DNA polymerase (e.g., a prokaryotic polymerase, including Pol I, Pol II, or Pol III, or a eukaryotic polymerase, including Pol α, Pol β, Pol γ, Pol δ, Pol ε, or Pol ζ).
The system may also include other functionalities added as fusions to the programmable nucleases, or added in trans to facilitate the reaction as a whole (e.g., (a) a helicase to unwind the DNA at the cut site to make the cut strand with the 3′ end available as a primer, (b) a FEN1 to help remove the endogenous strand on the cut strand to drive the reaction towards replacement of the endogenous strand with the synthesized strand, or (c) a nCas9:gRNA complex to create a second site nick on the opposite strand, which may help drive the integration of the synthesized repair through favored cellular repair of the non-edited strand). In an analogous manner to prime editing with a napDNAbp, such a complex with an otherwise programmable nuclease could be used to synthesize and then install a newly synthesized replacement strand of DNA carrying an edit of interest permanently into a target site of DNA.
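The dependence of the polymerase class on the chemistry of the extension arm, as described above, can be written as a simple selection rule. The following Python sketch is illustrative only; the enum and function names are hypothetical and are not part of any software disclosed herein.

```python
from enum import Enum


class ArmType(Enum):
    """Chemistry of the extension arm coupled to the programmable nuclease."""
    RNA = "RNA"
    DNA = "DNA"


def polymerase_class_for(arm: ArmType) -> str:
    """Return the class of polymerase suited to the given extension arm."""
    if arm is ArmType.RNA:
        # An RNA template calls for an RNA-dependent DNA polymerase.
        return "RNA-dependent DNA polymerase (e.g., reverse transcriptase)"
    if arm is ArmType.DNA:
        # A DNA template calls for a DNA-dependent DNA polymerase.
        return "DNA-dependent DNA polymerase"
    raise ValueError(f"unsupported extension arm type: {arm!r}")
```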

Suitable alternative programmable nucleases are well known in the art which may be used in place of a napDNAbp:gRNA complex to construct an alternative prime editor system that can be programmed to selectively bind a target site of DNA, and which can be further modified in the manner described above to co-localize a polymerase and an RNA or DNA extension arm comprising a primer binding site and a DNA synthesis template to a specific nick site. For example, and as represented in FIG. 1H, Transcription Activator-Like Effector Nucleases (TALENs) may be used as the programmable nuclease in the prime editing methods and compositions of matter described herein. TALENs are artificial restriction enzymes generated by fusing the TAL effector DNA binding domain to a DNA cleavage domain. These reagents enable efficient, programmable, and specific DNA cleavage and represent powerful tools for genome editing in situ. Transcription activator-like effectors (TALEs) can be quickly engineered to bind practically any DNA sequence. The term TALEN, as used herein, is broad and includes a monomeric TALEN that can cleave double stranded DNA without assistance from another TALEN. The term TALEN is also used to refer to one or both members of a pair of TALENs that are engineered to work together to cleave DNA at the same site. TALENs that work together may be referred to as a left-TALEN and a right-TALEN, which references the handedness of DNA. See U.S. Ser. No. 12/965,590; U.S. Ser. No. 13/426,991 (U.S. Pat. No. 8,450,471); U.S. Ser. No. 13/427,040 (U.S. Pat. No. 8,440,431); U.S. Ser. No. 13/427,137 (U.S. Pat. No. 8,440,432); and U.S. Ser. No. 13/738,381, all of which are incorporated by reference herein in their entirety. In addition, TALENs are described in WO 2015/027134; U.S. Pat. No. 9,181,535; Boch et al., “Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors”, Science, vol. 326, pp. 1509-1512 (2009); Bogdanove et al., “TAL Effectors: Customizable Proteins for DNA Targeting”, Science, vol. 333, pp. 1843-1846 (2011); Cade et al., “Highly efficient generation of heritable zebrafish gene mutations using homo- and heterodimeric TALENs”, Nucleic Acids Research, vol. 40, pp. 8001-8010 (2012); and Cermak et al., “Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting”, Nucleic Acids Research, vol. 39, No. 17, e82 (2011), each of which is incorporated herein by reference.

As represented in FIG. 1H, zinc finger nucleases may also be used as alternative programmable nucleases for use in prime editing in place of napDNAbps, such as Cas9 nickases. As with TALENs, the ZFN proteins may be modified such that they function as nickases, i.e., by engineering the ZFN such that it cleaves only one strand of the target DNA in a manner similar to the napDNAbp used with the prime editors described herein. ZFN proteins have been extensively described in the art, for example, in Carroll et al., “Genome Engineering with Zinc-Finger Nucleases,” Genetics, August 2011, Vol. 188: 773-782; Durai et al., “Zinc finger nucleases: custom-designed molecular scissors for genome engineering of plant and mammalian cells,” Nucleic Acids Res, 2005, Vol. 33: 5978-90; and Gaj et al., “ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering,” Trends Biotechnol. 2013, Vol. 31: 397-405, each of which is incorporated herein by reference in its entirety.

C. Polymerases (e.g., Reverse Transcriptase)

In various embodiments, the prime editor (PE) system disclosed herein includes a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as reverse transcriptase), or a variant thereof, which can be provided as a fusion protein with a napDNAbp or other programmable nuclease, or provided in trans.

Any polymerase may be used in the prime editors disclosed herein. The polymerases may be wild type polymerases, functional fragments, mutants, variants, or truncated variants, and the like. The polymerases may include wild type polymerases from eukaryotic, prokaryotic, archaeal, or viral organisms, and/or the polymerases may be modified by genetic engineering, mutagenesis, or directed evolution-based processes. The polymerases may include T7 DNA polymerase, T5 DNA polymerase, T4 DNA polymerase, Klenow fragment DNA polymerase, DNA polymerase III and the like. The polymerases may also be thermostable, and may include Taq, Tne, Tma, Pfu, Tfl, Tth, Stoffel fragment, VENT® and DEEPVENT® DNA polymerases, KOD, Tgo, JDF3, and mutants, variants and derivatives thereof (see U.S. Pat. Nos. 5,436,149; 4,889,818; 4,965,185; 5,079,352; 5,614,365; 5,374,553; 5,270,179; 5,047,342; 5,512,462; WO 92/06188; WO 92/06200; WO 96/10640; Barnes, W. M., Gene 112:29-35 (1992); Lawyer, F. C., et al., PCR Meth. Appl. 2:275-287 (1993); Flaman, J.-M, et al., Nuc. Acids Res. 22(15):3259-3260 (1994), each of which is incorporated by reference). For synthesis of longer nucleic acid molecules (e.g., nucleic acid molecules longer than about 3-5 Kb in length), at least two DNA polymerases can be employed. In certain embodiments, one of the polymerases can be substantially lacking a 3′ exonuclease activity and the other may have a 3′ exonuclease activity. Such pairings may include polymerases that are the same or different. Examples of DNA polymerases substantially lacking in 3′ exonuclease activity include, but are not limited to, Taq, Tne(exo-), Tma(exo-), Pfu(exo-), Pwo(exo-), exo-KOD and Tth DNA polymerases, and mutants, variants and derivatives thereof.

Preferably, the polymerases usable in the prime editors disclosed herein are “template-dependent” polymerases, since the polymerases are intended to rely on the DNA synthesis template to specify the sequence of the DNA strand under synthesis during prime editing. As used herein, the term “template DNA molecule” refers to that strand of a nucleic acid from which a complementary nucleic acid strand is synthesized by a DNA polymerase, for example, in a primer extension reaction of the DNA synthesis template of a PEgRNA.

As used herein, the term “template dependent manner” is intended to refer to a process that involves the template dependent extension of a primer molecule (e.g., DNA synthesis by DNA polymerase). The term “template dependent manner” refers to polynucleotide synthesis of RNA or DNA wherein the sequence of the newly synthesized strand of polynucleotide is dictated by the well-known rules of complementary base pairing (see, for example, Watson, J. D. et al., In: Molecular Biology of the Gene, 4th Ed., W. A. Benjamin, Inc., Menlo Park, Calif. (1987)). The term “complementary” refers to the broad concept of sequence complementarity between regions of two polynucleotide strands or between two nucleotides through base-pairing. It is known that an adenine nucleotide is capable of forming specific hydrogen bonds (“base pairing”) with a nucleotide which is thymine or uracil. Similarly, it is known that a cytosine nucleotide is capable of base pairing with a guanine nucleotide. As such, in the case of prime editing, the single strand of DNA synthesized by the polymerase of the prime editor against the DNA synthesis template can be said to be “complementary” to the sequence of the DNA synthesis template.
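The base-pairing rules recited above (adenine with thymine or uracil, cytosine with guanine) can be expressed as a short check of whether one strand is complementary to another. The Python sketch below is illustrative only: the function name and the four-base example strands are hypothetical, both strands are assumed to be written 5′ to 3′, and only the canonical pairs named above are recognized.

```python
# Canonical base pairs named in the text: A with T or U, and C with G.
_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def is_complementary(strand: str, template: str) -> bool:
    """Return True if `strand` base-pairs with `template` along its full length.

    Both arguments are written 5'->3'. Because paired strands run antiparallel,
    the template is read in reverse when the two are compared position by position.
    """
    strand = strand.upper()
    template = template.upper()
    if len(strand) != len(template):
        return False
    return all(
        (s, t) in _PAIRS for s, t in zip(strand, reversed(template))
    )


# Hypothetical example: a DNA strand checked against a short RNA template.
matches = is_complementary("GCAT", "AUGC")
mismatch = is_complementary("GCAT", "AUGG")
```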

(i) Exemplary Polymerases

In various embodiments, the prime editors described herein comprise a polymerase. The disclosure contemplates any wild type polymerase obtained from any naturally-occurring organism or virus, or obtained from a commercial or non-commercial source. In addition, the polymerases usable in the prime editors of the disclosure can include any naturally-occurring mutant polymerase, engineered mutant polymerase, or other variant polymerase, including truncated variants that retain function. The polymerases usable herein may also be engineered to contain specific amino acid substitutions, such as those specifically disclosed herein. In certain preferred embodiments, the polymerases usable in the prime editors of the disclosure are template-based polymerases, i.e., they synthesize nucleotide sequences in a template-dependent manner.

A polymerase is an enzyme that synthesizes a nucleotide strand and which may be used in connection with the prime editor systems described herein. The polymerases are preferably “template-dependent” polymerases (i.e., a polymerase which synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). In certain configurations, the polymerases can also be “template-independent” (i.e., a polymerase which synthesizes a nucleotide strand without the requirement of a template strand). A polymerase may also be further categorized as a “DNA polymerase” or an “RNA polymerase.” In various embodiments, the prime editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase can be a “DNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of DNA). In such cases, the DNA template molecule can be a PEgRNA, wherein the extension arm comprises a strand of DNA. In such cases, the PEgRNA may be referred to as a chimeric or hybrid PEgRNA which comprises an RNA portion (i.e., the guide RNA components, including the spacer and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase can be an “RNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of RNA). In such cases, the PEgRNA is RNA, i.e., including an RNA extension. The term “polymerase” may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., the polymerase activity). Generally, the enzyme will initiate synthesis at the 3′ end of a primer annealed to a polynucleotide template sequence (e.g., such as a primer sequence annealed to the primer binding site of a PEgRNA), and will proceed toward the 5′ end of the template strand. A “DNA polymerase” catalyzes the polymerization of deoxynucleotides. As used herein in reference to a DNA polymerase, the term DNA polymerase includes a “functional fragment thereof”.
A “functional fragment thereof” refers to any portion of a wild-type or mutant DNA polymerase that encompasses less than the entire amino acid sequence of the polymerase and which retains the ability, under at least one set of conditions, to catalyze the polymerization of a polynucleotide. Such a functional fragment may exist as a separate entity, or it may be a constituent of a larger polypeptide, such as a fusion protein.
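The direction of synthesis described above (initiation at the 3′ end of a primer annealed to the template, proceeding toward the 5′ end of the template strand) can be illustrated with a toy primer-extension routine. This Python sketch rests on stated simplifications: the function names and the short sequences are hypothetical, the primer is assumed to anneal perfectly along its full length, the first matching site is used, and no enzyme kinetics, fidelity, or termination behavior is modeled.

```python
_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(dna))


def extend_primer(primer: str, template: str) -> str:
    """Extend a DNA primer along an RNA or DNA template (both written 5'->3').

    The primer anneals to its complementary site on the template; new DNA is
    then added to the primer's 3' end, copying the template bases that lie 5'
    of the annealing site until the 5' end of the template is reached.
    """
    primer = primer.upper()
    # Treat an RNA template as DNA letters so one pairing table suffices.
    template_as_dna = template.upper().replace("U", "T")
    binding_site = reverse_complement(primer)
    start = template_as_dna.find(binding_site)
    if start == -1:
        raise ValueError("primer does not anneal to the template")
    # Template bases 5' of the annealing site are the ones that get copied.
    copied_region = template_as_dna[:start]
    return primer + reverse_complement(copied_region)


# Hypothetical RNA template: a 5-base region to copy, then a 5-base binding site.
toy_template = "ACCGU" + "GUCAA"
extended = extend_primer("TTGAC", toy_template)
```

Here the five bases appended to the primer are the reverse complement of the template region lying 5′ of the binding site, which is the sense in which synthesis proceeds toward the 5′ end of the template.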

In some embodiments, the polymerases can be from bacteriophage. Bacteriophage DNA polymerases are generally devoid of 5′ to 3′ exonuclease activity, as this activity is encoded by a separate polypeptide. Examples of suitable DNA polymerases are T4, T7, and phi29 DNA polymerase. The enzymes available commercially are: T4 (available from many sources, e.g., Epicentre) and T7 (available from many sources, e.g., Epicentre for unmodified and USB for 3′ to 5′ exo- T7 “Sequenase” DNA polymerase).

In other embodiments, the polymerases are archaeal polymerases. There are 2 different classes of DNA polymerases which have been identified in archaea: 1. Family B/pol I type (homologs of Pfu from Pyrococcus furiosus) and 2. pol II type (homologs of P. furiosus DP1/DP2 2-subunit polymerase). DNA polymerases from both classes have been shown to naturally lack an associated 5′ to 3′ exonuclease activity and to possess 3′ to 5′ exonuclease (proofreading) activity. Suitable DNA polymerases (pol I or pol II) can be derived from archaea with optimal growth temperatures that are similar to the desired assay temperatures.

Thermostable archaeal DNA polymerases are isolated from Pyrococcus species (furiosus, species GB-D, woesei, abyssi, horikoshii), Thermococcus species (kodakaraensis KOD1, litoralis, species 9 degrees North-7, species JDF-3, gorgonarius), Pyrodictium occultum, and Archaeoglobus fulgidus.

Polymerases may also be from eubacterial species. There are 3 classes of eubacterial DNA polymerases, pol I, II, and III. Enzymes in the Pol I DNA polymerase family possess 5′ to 3′ exonuclease activity, and certain members also exhibit 3′ to 5′ exonuclease activity. Pol II DNA polymerases naturally lack 5′ to 3′ exonuclease activity, but do exhibit 3′ to 5′ exonuclease activity. Pol III DNA polymerases represent the major replicative DNA polymerase of the cell and are composed of multiple subunits. The pol III catalytic subunit lacks 5′ to 3′ exonuclease activity, but in some cases 3′ to 5′ exonuclease activity is located in the same polypeptide.

There are a variety of commercially available Pol I DNA polymerases, some of which have been modified to reduce or abolish 5′ to 3′ exonuclease activity.

Suitable thermostable pol I DNA polymerases can be isolated from a variety of thermophilic eubacteria, including Thermus species and Thermotoga maritima, such as Thermus aquaticus (Taq), Thermus thermophilus (Tth), and Thermotoga maritima (Tma, UlTma).

Additional eubacteria related to those listed above are described in Thermophilic Bacteria (Kristjansson, J. K., ed.) CRC Press, Inc., Boca Raton, Fla., 1992.

The invention further provides for chimeric or non-chimeric DNA polymerases that are chemically modified according to methods disclosed in U.S. Pat. Nos. 5,677,152, 6,479,264 and 6,183,998, the contents of which are hereby incorporated by reference in their entirety.

Additional archaea DNA polymerases related to those listed above are described in the following references: Archaea: A Laboratory Manual (Robb, F. T. and Place, A. R., eds.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1995 and Thermophilic Bacteria (Kristjansson, J. K., ed.) CRC Press, Inc., Boca Raton, Fla., 1992.

(ii) Exemplary Reverse Transcriptases

In various embodiments, the prime editor (PE) system disclosed herein includes a reverse transcriptase, or a variant thereof.

Reverse transcriptases are multi-functional enzymes typically with three enzymatic activities including RNA- and DNA-dependent DNA polymerization activity, and an RNaseH activity that catalyzes the cleavage of RNA in RNA-DNA hybrids. In some mutant reverse transcriptases, the RNaseH moiety has been disabled to prevent unintended damage to the mRNA. These enzymes that synthesize complementary DNA (cDNA) using mRNA as a template were first identified in RNA viruses. Subsequently, reverse transcriptases were isolated and purified directly from virus particles, cells or tissues (e.g., see Kacian et al., 1971, Biochim. Biophys. Acta 46: 365-83; Yang et al., 1972, Biochem. Biophys. Res. Comm. 47: 505-11; Gerard et al., 1975, J. Virol. 15: 785-97; Liu et al., 1977, Arch. Virol. 55: 187-200; Kato et al., 1984, J. Virol. Methods 9: 325-39; Luke et al., 1990, Biochem. 29: 1764-69 and Le Grice et al., 1991, J. Virol. 65: 7004-07, each of which is incorporated by reference). More recently, mutants and fusion proteins have been created in the quest for improved properties such as thermostability, fidelity and activity. Any of the wild type, variant, and/or mutant forms of reverse transcriptase which are known in the art or which can be made using methods known in the art are contemplated herein.

The reverse transcriptase (RT) gene (or the genetic information contained therein) can be obtained from a number of different sources. For instance, the gene may be obtained from eukaryotic cells which are infected with retrovirus, or from a number of plasmids which contain either a portion of or the entire retrovirus genome. In addition, messenger RNA-like RNA which contains the RT gene can be obtained from retroviruses. Examples of sources for RT include, but are not limited to, Moloney murine leukemia virus (M-MLV or MLVRT); human T-cell leukemia virus type 1 (HTLV-1); bovine leukemia virus (BLV); Rous Sarcoma Virus (RSV); human immunodeficiency virus (HIV); yeast, including Saccharomyces, Neurospora, Drosophila; primates; and rodents. See, for example, Weiss, et al., U.S. Pat. No. 4,663,290 (1987); Gerard, G. R., DNA:271-79 (1986); Kotewicz, M. L., et al., Gene 35:249-58 (1985); Tanese, N., et al., Proc. Natl. Acad. Sci. (USA):4944-48 (1985); Roth, M. J., et al., J. Biol. Chem. 260:9326-35 (1985); Michel, F., et al., Nature 316:641-43 (1985); Akins, R. A., et al., Cell 47:505-16 (1986), EMBO J. 4:1267-75 (1985); and Fawcett, D. F., Cell 47:1007-15 (1986) (each of which is incorporated herein by reference in its entirety).

(a) Wild Type RTs

Exemplary enzymes for use with the herein disclosed prime editors can include, but are not limited to, M-MLV reverse transcriptase and RSV reverse transcriptase. Enzymes having reverse transcriptase activity are commercially available. In certain embodiments, the reverse transcriptase is provided in trans to the other components of the prime editor (PE) system. That is, the reverse transcriptase is expressed or otherwise provided as an individual component, i.e., not as a fusion protein with a napDNAbp.

A person of ordinary skill in the art will recognize that wild type reverse transcriptases, including but not limited to Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase; Human Immunodeficiency Virus (HIV) reverse transcriptase; and avian Sarcoma-Leukosis Virus (ASLV) reverse transcriptase, which includes but is not limited to Rous Sarcoma Virus (RSV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, Avian Erythroblastosis Virus (AEV) Helper Virus MCAV reverse transcriptase, Avian Myelocytomatosis Virus MC29 Helper Virus MCAV reverse transcriptase, Avian Reticuloendotheliosis Virus (REV-T) Helper Virus REV-A reverse transcriptase, Avian Sarcoma Virus UR2 Helper Virus UR2AV reverse transcriptase, Avian Sarcoma Virus Y73 Helper Virus YAV reverse transcriptase, Rous Associated Virus (RAV) reverse transcriptase, and Myeloblastosis Associated Virus (MAV) reverse transcriptase, may be suitably used in the subject methods and compositions described herein.

Exemplary wild type RT enzymes are as follows:

Description
Sequence

Reverse transcriptase (M-MLV RT), wild type Moloney murine leukemia virus; used in PE1 (prime editor 1 fusion protein disclosed herein)
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYP
MSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVP
NPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNS
PTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQ
ICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLY
PLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRR
PVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNA
RMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHT
WYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSR
YAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNR
MADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361485)

Reverse transcriptase, Moloney murine leukemia virus, Ref Seq. AAA66622.1
AFPLERPDWD YTTQAGRNHL VHYRQLLLAG LQNAGRSPTN LAKVKGITQG PNESPSAFLE
RLKEAYRRYT PYDPEDPGQE TNVSMSFIWQ SAPDIGRKLG RLEDLKSKTL GDLVREAEKI
FNKRETPEER EERIRRETEE KEERRRTVDE QKEKERDRRR HREMSKLLAT VVIGQEQDRQ
EGERKRPQLD KDQCAYCKEK GHWAKDCPKK PRGPRGPRPQ TSLLTLGDXG GQGQDPPPEP
RITLKVGGQP VTFLVDTGAQ HSVLTQNPGP LSDKSAWVQG ATGGKRYRWT TDRKVHLATG
KVTHSFLHVP DCPYPLLGRD LLTKLKAQIH FEGSGAQVVG PMGQPLQVLT LNIEDEYRLH
ETSKEPDVSL GFTWLSDFPQ AWAESGGMGL AVRQAPLIIP LKATSTPVSI KQYPMSQEAR
LGIKPHIQRL LDQGILVPCQ SPWNTPLLPV KKPGTNDYRP VQDLREVNKR VEDIHPTVPN
PYNLLSGLPP SHQWYTVLDL KDAFFCLRLH PTSQPLFAFE WRDPEMGISG QLTWTRLPQG
FKNSPTLFDE ALHRDLADER (SEQ ID NO: 1361486)

Reverse transcriptase, feline leukemia virus, Ref Seq. NP955579.1
TLQLEEEYRL FEPESTQKQE MDIWLKNFPQ AWAETGGMGT AHCQAPVLIQ LKATATPISI
RQYPMPHEAY QGIKPHIRRM LDQGILKPCQ SPWNTPLLPV KKPGTEDYRP VQDLREVNKR
VEDIHPTVPN PYNLLSTLPP SHPWYTVLDL KDAFFCLRLH SESQLLFAFE WRDPEIGLSG
QLTWTRLPQG FKNSPTLFDE ALHSDLADER VRYPALVLLQ YVDDLLLAAA TRTECLEGTK
ALLETLGNKG YRASAKKAQI CLQEVTYLGY SLKDGQRWLT KARKEAILSI PVPKNSRQVR
EFLGTAGYCR LWIPGFAELA APLYPLTRPG TLFQWGTEQQ LAFEDIKKAL LSSPALGLPD
ITKPFELFID ENSGFAKGVL VQKLGPWKRP VAYLSKKLDT VASGWPPCLR MVAAIAILVK
DAGKLTLGQP LTILTSHPVE ALVRQPPNKW LSNARMTHYQ AMLLDAERVH FGPTVSLNPA
TLLPLPSGGN HHDCLQILAE THGTRPDLTD QPLPDADLTW YTDGSSFIRN GEREAGAAVT
TESEVIWAAP LPPGTSAQRA ELIALTQALK MAEGKKLTVY TDSRYAFATT HVHGEIYRRR
GLLTSEGKEI KNKNEILALL EALFLPKRLS IIHCPGHQKG DSPQAKGNRL ADDTAKKAAT
ETHSSLTVL (SEQ ID NO: 1361487)

Reverse transcriptase, HIV-1 RT, chain A, Ref Seq. ITL3-A
PISPIETVPV KLKPGMDGPK VKQWPLTEEK IKALVEICTE MEKEGKISKI GPENPYNTPV
FAIKKKDSTK WRKLVDFREL NKRTQDFWEV QLGIPHPAGL KKKKSVTVLD VGDAYFSVPL
DEDFRKYTAF TIPSINNETP GIRYQYNVLP QGWKGSPAIF QSSMTKILEP FRKQNPDIVI
YQYMDDLYVG SDLEIGQHRT KIEELRQHLL RWGLTTPDKK HQKEPPELWM GYELHPDKWT
VQPIVLPEKD SWTVNDIQKL VGKLNWASQI YPGIKVRQLX KLLRGTKALT EVIPLTEEAE
LELAENREIL KEPVHGVYYD PSKDLIAEIQ KQGQGQWTYQ IYQEPFKNLK TGKYARMRGA
HTNDVKQLTE AVQKITTESI VIWGKTPKFK LPIQKETWET WWTEYWQATW IPEWEFVNTP
PLVKLWYQLE KEPIVGAETF YVDGAANRET KLGKAGYVTN RGRQKVVTLT DTTNQKTELQ
AIYLALQDSG LEVNIVTDSQ YALGIIQAQP DQSESELVNQ IIEQLIKKEK VYLAWVPAHK
GIGGNEQVDK LVSAGIRKVL (SEQ ID NO: 1361488)
See Martinelli et al., Virology, 1990, 174 (1): 135-144, which is incorporated by reference

Reverse transcriptase, HIV-1 RT, chain B, Ref Seq. ITL3-B
PISPIETVPV KLKPGMDGPK VKQWPLTEEK IKALVEICTE MEKEGKISKI GPENPYNTPV
FAIKKKDSTK WRKLVDFREL NKRTQDFWEV QLGIPHPAGL KKKKSVTVLD VGDAYFSVPL
DEDERKYTAF TIPSINNETP GIRYQYNVLP QGWKGSPAIF QSSMTKILEP FRKQNPDIVI
YQYMDDLYVG SDLEIGQHRT KIEELRQHLL RWGLTTPDKK HQKEPPELWM GYELHPDKWT
VQPIVLPEKD SWTVNDIQKL VGKLNWASQI YPGIKVRQLC KLLRGTKALT EVIPLTEEAE
LELAENREIL KEPVHGVYYD PSKDLIAEIQ KQGQGQWTYQ IYQEPFKNLK TGKYARMRGA
HTNDVKQLTE AVQKITTESI VIWGKTPKFK LPIQKETWET WWTEYWQATW IPEWEFVNTP
PLVKLWYQLE KEPIVGAETE (SEQ ID NO: 1361489)
See Stammers et al., J. Mol. Biol., 1994, 242 (4): 586-588, which is incorporated by reference

Reverse transcriptase, Rous sarcoma virus RT, Ref Seq. ACL14945
TVALHLAIPL KWKPNHTPVW IDQWPLPEGK LVALTQLVEK ELQLGHIEPS LSCWNTPVFV
IRKASGSYRL LHDLRAVNAK LVPFGAVQQG APVLSALPRG WPLMVLDLKD CFFSIPLAEQ
DREAFAFTLP SVNNQAPARR FQWKVLPQGM TCSPTICQLI VGQILEPLRL KHPSLRMLHY
MDDLLLAASS HDGLEAAGEE VISTLERAGF TISPDKVQKE PGVQYLGYKL GSTYAAPVGL
VAEPRIATLW DVQKLVGSLQ WLRPALGIPP RLRGPFYEQL RGSDPNEARE WNLDMKMAWR
EIVQLSTTAA LERWDPALPL EGAVARCEQG AIGVLGQGLS THPRPCLWLF STQPTKAFTA
WLEVLTLLIT KLRASAVRTF GKEVDILLLP ACFRDELPLP EGILLALRGF AGKIRSSDTP
SIFDIARPLH VSLKVRVTDH PVPGPTVFTD ASSSTHKGVV VWREGPRWEI KEIADLGASV
QQLEARAVAM ALLLWPTTPT NVVTDSAFVA KMLLKMGQEG VPSTAAAFIL EDALSQRSAM
AAVLHVRSHS EVPGFFTEGN DVADSQATFQ AYPLREAKDL HTALHIGPRA LSKACNISMQ
QAREVVQTCP HCNSAPALEA GVNPRGLGPL QIWQTDFTLE PRMAPRSWLA VTVDTASSAI
VVTQHGRVTS VAAQHHWATV IAVLGRPKAI KTDNGSCFTS KSTREWLARW GIAHTTGIPG
NSQGQAMVER ANRLLKDKIR VLAEGDGEMK RIPTSKQGEL LAKAMYALNH FERGENTKTP
IQKHWRPTVL TEGPPVKIRI ETGEWEKGWN VLVWGRGYAA VKNRDTDKVI WVPSRKVKPD
IAQKDEVTKK DEASPLFA (SEQ ID NO: 1361490)
See Yasukawa et al., J. Biochem. 2009, 145(3): 315-324, which is incorporated by reference

Reverse transcriptase, cauliflower mosaic virus RT, Ref Seq. AGT42196
MMDHLLQKTQ IQNQTEQVMN ITNPNSIYIK GRLYFKGYKK IELHCFVDTG ASLCIASKFV
IPEEHWINAE RPIMVKIADG SSITINKVCR DIDLIIAGEI FHIPTVYQQE SGIDFIIGNN
FCQLYEPFIQ FTDRVIFTKD RTYPVHIAKL TRAVRVGTEG FLESMKKRSK TQQPEPVNIS
TNKIAILSEG RRLSEEKLFI TQQRMQKIEE LLEKVCSENP LDPNKTKQWM KASIKLSDPS
KAIKVKPMKY SPMDREEFDK QIKELLDLKV IKPSKSPHMA PAFLVNNEAE KRRGKKRMVV
NYKAMNKATV GDAYNLPNKD ELLTLIRGKK IFSSFDCKSG FWQVLLDQDS RPLTAFTCPQ
GHYEWNVVPF GLKQAPSIFQ RHMDEAFRVF RKFCCVYVDD ILVESNNEED HLLHVAMILQ
KCNQHGIILS KKKAQLFKKK INFLGLEIDE GTHKPQGHIL EHINKFPDTL EDKKQLQRFL
GILTYASDYI PKLAQIRKPL QAKLKENVPW KWTKEDTLYM QKVKKNLQGF PPLHHPLPEE
KLIIETDASD DYWGGMLKAI KINEGTNTEL ICRYASGSFK AAEKNYHSND KETLAVINTI
KKFSIYLTPV HFLIRTDNTH FKSFVNLNYK GDSKLGRNIR WQAWLSHYSF DVEHIKGTDN
HFADELSREF NRVNS (SEQ ID NO: 1361491)
See Farzadfar et al., Virus Genes, 2013, 47 (2): 347-356, which is incorporated by reference

Reverse transcriptase, Klebsiella pneumoniae, Ref Seq. RFF81513.1
MKEKISKIDK NFYTDIFIKT SFQNEFEAGG VIPPIAKNQV STISNKNKTF YSLAHSSPHY
SIQTRIEKFL LKNIPLSASS FAFRKERSYL HYLEPHTQNV KYCHLDIVSF FHSIDVNIVR
DTFSVYFSDE FLVKEKQSLL DAFMASVTLT AELDGVEKTF IPMGFKSSPS ISNIIFRKID
ILIQKFCDKN KITYTRYADD LLFSTKKENN ILSSTFFINE ISSILSINKF KLNKSKYLYK
EGTISLGGYV IENILKDNSS GNIRLSSSKL NPLYKALYEI KKGSSSKHIC IKVFNLKLKR
FIYKKNKEKF EAKFYSSQLK NKLLGYRSYL LSFVIFHKKY KCINPIFLEK CVFLISEIES
IMNRKF (SEQ ID NO: 1361492)

Reverse transcriptase, Escherichia coli RT, Ref Seq. TGH57013
MKITSNNVTA VINGKGWHSI NWKKCHQHVK TIQTRIAKAA CNQQWRTVGR LQRLLVRSES
ARALAVKRVT ENSGRKTPGV DGQIWSTPES KWEAIFKLRR KGYKPLPLKR VFIPKSNGKK
RPLGIPVMLD RAMQALHLLG LEPVSETNAD HNSYGFRPAR CTADAIQQVC NMYSSRNASK
WVLEGDIKGC FEHISHEWLL ENIPMDKQIL RNWLKAGIIE KSIFSKTLSG TPQGGIISPV
LANMALDGLE RLLQNREGRN RLI (SEQ ID NO: 1361493)

Reverse transcriptase, Bacillus subtilis RT, Ref Seq. QBJ66766
MSKIKINYEK YHIKPFPHED QRIKVNKKVK ENLQNPFYIA AHSFYPFIHY KKISYKFKNG
TLSSPKERDI FYSGHMDGYI YKHYGEILNH KYNNTCIGKG IDHVSLAYRN NKMGKSNIHF
AAEVINFISE QQQAFIFVSD FSSYFDSLDH AILKEKLIEV LEEQDKLSKD WWNVEKHITR
YNWVEKEEVI SDLECTKEKI ARDKKSRERY YTPAEFREFR KRVNIKSNDT GVGIPQGTAI
SAVLANVYAI DLDQKLNQYA LKYGGIYRRY SDDIIMVLPM TSDGQDPSND HVSFIKSVVK
RNKVTMGDSK TSVLYYANNN IYEDYQRKRE SKMDYLGFSF DGMTVKIREK SLFKYYHRTY
KKINSINWAS VKKEKKVGRK KLYLLYSHLG RNYKGHGNFI SYCKKAHAVE EGNKKIESLI
NQQIKRHWKK IQKRLVDV (SEQ ID NO: 1361494)

Eubacterium rectale Group II intron RT
DTSNLMEQILSSDNLNRAYLQVVRNKGAEGVDGMKYTELKEHLAKNGETIKGQLRTRKYKPQPAR
RVEIPKPDGGVRNLGVPTVTDRFIQQAIAQVLTPIYEEQFHDHSYGFRPNRCAQQAILTALNIMN
DGNDWIVDIDLEKFFDTVNHDKLMTLIGRTIKDGDVISIVRKYLVSGIMIDDEYEDSIVGTPQGG
NLSPLLANIMLNELDKEMEKRGLNFVRYADDCIIMVGSEMSANRVMRNISRFIEEKLGLKVNMTK
SKVDRPSGLKYLGFGFYFDPRAHQFKAKPHAKSVAKFKKRMKELTCRSWGVSNSYKVEKLNQLIR
GWINYFKIGSMKTLCKELDSRIRYRLRMCIWKQWKTPQNQEKNLVKLGIDRNTARRVAYTGKRIA
YVCNKGAVNVAISNKRLASFGLISMLDYYIEKCVTC (SEQ ID NO: 1361495)

Geobacillus stearothermophilus Group II intron RT
ALLERILARDNLITALKRVEANQGAPGIDGVSTDQLRDYIRAHWSTIHAQLLAGTYRPAPVRRVE
IPKPGGGTRQLGIPTVVDRLIQQAILQELTPIEDPDFSSSSFGFRPGRNAHDAVRQAQGYIQEGY
RYVVDMDLEKFFDRVNHDILMSRVARKVKDKRVLKLIRAYLQAGVMIEGVKVQTEEGTPQGGPLS
PLLANILLDDLDKELEKRGLKFCRYADDCNIYVKSLRAGQRVKQSIQRFLEKTLKLKVNEEKSAV
DRPWKRAFLGFSFTPERKARIRLAPRSIQRLKQRIRQLTNPNWSISMPERIHRVNQYVMGWIGYF
RLVETPSVLQTIEGWIRRRLRLCQWLQWKRVRTRIRELRALGLKETAVMEIANTRKGAWRTTKTP
QLHQALGKTYWTAQGLKSLTQR (SEQ ID NO: 1361496)

(b) Variant RTs

In various embodiments, the reverse transcriptase may be a variant reverse transcriptase. As used herein, a “variant reverse transcriptase” includes any naturally occurring or genetically engineered variant comprising one or more mutations (including singular mutations, inversions, deletions, insertions, and rearrangements) relative to a reference sequence (e.g., a reference wild type sequence). RTs naturally have several activities, including an RNA-dependent DNA polymerase activity, ribonuclease H activity, and DNA-dependent DNA polymerase activity. Collectively, these activities enable the enzyme to convert single-stranded RNA into double-stranded cDNA. In retroviruses and retrotransposons, this cDNA can then integrate into the host genome, from which new RNA copies can be made via host-cell transcription. Variant RTs may comprise a mutation which impacts one or more of these activities (either reducing or increasing these activities, or eliminating these activities altogether). In addition, variant RTs may comprise one or more mutations which render the RT more or less stable or less prone to aggregation, which facilitate purification and/or detection, and/or which otherwise modify its properties or characteristics.

A person of ordinary skill in the art will recognize that variant reverse transcriptases derived from other reverse transcriptases, including but not limited to Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase; Human Immunodeficiency Virus (HIV) reverse transcriptase; and Avian Sarcoma-Leukosis Virus (ASLV) reverse transcriptase, which includes but is not limited to Rous Sarcoma Virus (RSV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, Avian Erythroblastosis Virus (AEV) Helper Virus MCAV reverse transcriptase, Avian Myelocytomatosis Virus MC29 Helper Virus MCAV reverse transcriptase, Avian Reticuloendotheliosis Virus (REV-T) Helper Virus REV-A reverse transcriptase, Avian Sarcoma Virus UR2 Helper Virus UR2AV reverse transcriptase, Avian Sarcoma Virus Y73 Helper Virus YAV reverse transcriptase, Rous Associated Virus (RAV) reverse transcriptase, and Myeloblastosis Associated Virus (MAV) reverse transcriptase, may be suitably used in the subject methods and compositions described herein.

One method of preparing variant RTs is by genetic modification (e.g., by modifying the DNA sequence of a wild-type reverse transcriptase). A number of methods are known in the art that permit the random as well as targeted mutation of DNA sequences (see, for example, Ausubel et al., Short Protocols in Molecular Biology (1995) 3rd Ed., John Wiley & Sons, Inc.). In addition, there are a number of commercially available kits for site-directed mutagenesis, including both conventional and PCR-based methods. Examples include the QuikChange Site-Directed Mutagenesis Kits (AGILENT®), the Q5® Site-Directed Mutagenesis Kit (NEW ENGLAND BIOLABS®), and the GeneArt™ Site-Directed Mutagenesis System (THERMOFISHER SCIENTIFIC®).

In addition, mutant reverse transcriptases may be generated by insertional mutation or truncation (N-terminal, internal, or C-terminal insertions or truncations) according to methodologies known to one skilled in the art. The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and these categories are not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. Mutations also embrace “gain-of-function” mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, these tissues gaining functions that they normally lack. Because of their nature, gain-of-function mutations are usually dominant.
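By way of illustration only, the substitution nomenclature described above (original residue, then position, then newly substituted residue, e.g., D200N) may be sketched in Python as follows. The names `Substitution`, `parse_substitution`, and `apply_substitution`, and the `first_position` parameter, are hypothetical and are introduced solely for this sketch. The 18-residue fragment is assumed to correspond to residues 190-207 of the wild-type M-MLV RT sequence listed herein.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str      # one-letter code of the residue in the reference
    position: int      # 1-based position within the full-length reference
    replacement: str   # one-letter code of the newly substituted residue


# Illustrative pattern: one letter, an integer position, one letter (e.g., "D200N").
_LABEL = re.compile(r"^([A-Z])([1-9][0-9]*)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'D200N' into (original, position, replacement)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, label: str, first_position: int = 1) -> str:
    """Apply one substitution to a sequence whose first residue is numbered first_position."""
    sub = parse_substitution(label)
    index = sub.position - first_position
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {sub.position} lies outside the sequence")
    if sequence[index] != sub.original:
        raise ValueError(
            f"expected {sub.original} at position {sub.position}, found {sequence[index]}"
        )
    return sequence[:index] + sub.replacement + sequence[index + 1:]


# Assumed fragment: residues 190-207 of the wild-type M-MLV RT listed herein.
fragment = "QGFKNSPTLFDEALHRDL"
print(apply_substitution(fragment, "D200N", first_position=190))
```

In this sketch the check on the original residue guards against applying a label to a sequence that is numbered differently from the reference.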

Older methods of site-directed mutagenesis known in the art rely on sub-cloning of the sequence to be mutated into a vector, such as an M13 bacteriophage vector, that allows the isolation of single-stranded DNA template. In these methods, one anneals a mutagenic primer (i.e., a primer capable of annealing to the site to be mutated but bearing one or more mismatched nucleotides at the site to be mutated) to the single-stranded template and then polymerizes the complement of the template starting from the 3′ end of the mutagenic primer. The resulting duplexes are then transformed into host bacteria and plaques are screened for the desired mutation.
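As a non-limiting sketch of the primer-annealing step described above, the following Python code derives a mutagenic primer for a single-stranded template. It assumes a DNA template written 5′ to 3′, a 0-based index for the site to be changed, and equal flanks on either side of the mismatch; the function names, the `flank` parameter, and the toy template are hypothetical and are not taken from any protocol referenced herein.

```python
# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def mutagenic_primer(template: str, site: int, new_bases: str, flank: int = 10) -> str:
    """Build a primer that anneals around `site` but encodes `new_bases` there.

    `new_bases` is given in the sense of the template strand and replaces the
    same number of template bases starting at `site`. The primer is the reverse
    complement of the desired (mutated) template region, so it pairs with the
    template everywhere except at the mismatched position(s).
    """
    end = site + len(new_bases)
    if site - flank < 0 or end + flank > len(template):
        raise ValueError("not enough flanking sequence around the site")
    mutated_region = template[site - flank:site] + new_bases + template[end:end + flank]
    return reverse_complement(mutated_region)


# Toy template (hypothetical): change the G at index 8 to T.
template = "AAAACCCCGGGGTTTT"
print(mutagenic_primer(template, site=8, new_bases="T", flank=4))
```

Extension from the 3′ end of such a primer would copy the remainder of the template, giving a duplex that carries the mismatch at the chosen site.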

More recently, site-directed mutagenesis has employed PCR methodologies, which have the advantage of not requiring a single-stranded template. In addition, methods have been developed that do not require sub-cloning. Several issues must be considered when PCR-based site-directed mutagenesis is performed. First, in these methods it is desirable to reduce the number of PCR cycles to prevent expansion of undesired mutations introduced by the polymerase. Second, a selection must be employed in order to reduce the number of non-mutated parental molecules persisting in the reaction. Third, an extended-length PCR method is preferred in order to allow the use of a single PCR primer set. And fourth, because of the non-template-dependent terminal extension activity of some thermostable polymerases it is often necessary to incorporate an end-polishing step into the procedure prior to blunt-end ligation of the PCR-generated mutant product.

Methods of random mutagenesis, which will result in a panel of mutants bearing one or more randomly situated mutations, exist in the art. Such a panel of mutants may then be screened for those exhibiting the desired properties, for example, increased stability, relative to a wild-type reverse transcriptase.

An example of a method for random mutagenesis is the so-called “error-prone PCR method.” As the name implies, the method amplifies a given sequence under conditions in which the DNA polymerase does not support high fidelity incorporation. Although the conditions encouraging error-prone incorporation for different DNA polymerases vary, one skilled in the art may determine such conditions for a given enzyme. A key variable for many DNA polymerases in the fidelity of amplification is, for example, the type and concentration of divalent metal ion in the buffer. The use of manganese ion and/or variation of the magnesium or manganese ion concentration may therefore be applied to influence the error rate of the polymerase.
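The effect of a reduced-fidelity amplification on a template may be pictured with the following toy simulation, offered as a sketch only. It assumes a single per-base misincorporation probability per cycle (standing in for the combined effect of buffer conditions such as divalent metal ion type and concentration), substitutions only, and independent errors; none of these assumptions, nor the names `error_prone_copy` and `mutant_panel`, come from the methods described herein.

```python
import random

BASES = "ACGT"


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA string, substituting each base with probability error_rate."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Misincorporation: pick any base other than the templated one.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def mutant_panel(template: str, error_rate: float, cycles: int, size: int, seed: int = 0) -> list:
    """Return `size` products, each obtained by `cycles` successive error-prone copies."""
    rng = random.Random(seed)
    panel = []
    for _ in range(size):
        product = template
        for _ in range(cycles):
            product = error_prone_copy(product, error_rate, rng)
        panel.append(product)
    return panel


# Hypothetical parameters chosen only to make mutations visible in a short template.
for product in mutant_panel("ATGACCCTGAACATCGAAGAT", error_rate=0.02, cycles=10, size=5):
    print(product)
```

Raising `error_rate` or `cycles` in this model increases the expected number of mutations per product, which mirrors the reason for limiting cycle number when undesired mutations are to be avoided.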

In various aspects, the RT of the prime editors may be an “error-prone” reverse transcriptase variant. Error-prone reverse transcriptases that are known and/or available in the art may be used. In addition, such RTs may be made using any previously mentioned method of mutagenesis, including directed evolution processes, such as phage-assisted continuous evolution (PACE) or phage-assisted non-continuous evolution (PANCE). The term “phage-assisted continuous evolution (PACE),” as used herein, refers to continuous evolution that employs phage as viral vectors. The general concept of PACE technology has been described, for example, in International PCT application, PCT/US2009/056194, filed Sep. 8, 2009, published as WO 2010/028347 on Mar. 11, 2010; International PCT application, PCT/US2011/066747, filed Dec. 22, 2011, published as WO 2012/088381 on Jun. 28, 2012; U.S. Pat. No. 9,023,594, issued May 5, 2015; International PCT application, PCT/US2015/012022, filed Jan. 20, 2015, published as WO 2015/134121 on Sep. 11, 2015; and International PCT application, PCT/US2016/027795, filed Apr. 15, 2016, published as WO 2016/168631 on Oct. 20, 2016, the entire contents of each of which are incorporated herein by reference. Error-prone reverse transcriptases may also be obtained by “phage-assisted non-continuous evolution (PANCE),” which, as used herein, refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution using serial flask transfers of evolving ‘selection phage’ (SP), which contain a gene of interest to be evolved, across fresh E. coli host cells, thereby allowing genes inside the host E. coli to be held constant while genes contained in the SP continuously evolve. Serial flask transfers have long served as a widely-accessible approach for laboratory evolution of microbes, and, more recently, analogous approaches have been developed for bacteriophage evolution. The PANCE system features lower stringency than the PACE system.

Genes for desired mutant reverse transcriptases generated by mutagenesis or evolutionary processes may be sequenced to identify the sites and number of mutations. For those mutants comprising more than one mutation, the effect of a given mutation may be evaluated by introduction of the identified mutation to the wild-type gene by site-directed mutagenesis in isolation from the other mutations borne by the particular mutant. Screening assays of the single mutant thus produced will then allow the determination of the effect of that mutation alone.
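The deconvolution described above, in which the mutations of a multiply mutated RT are identified and then each is reintroduced into the wild-type sequence in isolation, may be sketched as follows. The sketch assumes substitutions only (sequences of equal length, no insertions or deletions) and operates on amino acid sequences rather than on genes; the function names are hypothetical. The two fragments are assumed to correspond to residues 190-207 of the wild-type M-MLV RT and of a multiply substituted variant listed herein.

```python
def list_substitutions(wild_type: str, mutant: str, first_position: int = 1) -> list:
    """Return labels such as 'T197A' for every position at which the sequences differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    return [
        f"{w}{position}{m}"
        for position, (w, m) in enumerate(zip(wild_type, mutant), start=first_position)
        if w != m
    ]


def single_mutants(wild_type: str, mutant: str, first_position: int = 1) -> dict:
    """Map each substitution label to the wild-type sequence carrying only that change."""
    if len(wild_type) != len(mutant):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    isolated = {}
    for offset, (w, m) in enumerate(zip(wild_type, mutant)):
        if w != m:
            label = f"{w}{offset + first_position}{m}"
            isolated[label] = wild_type[:offset] + m + wild_type[offset + 1:]
    return isolated


# Assumed fragments: residues 190-207 of wild type and of a multi-substituted variant.
wild_type_fragment = "QGFKNSPTLFDEALHRDL"
variant_fragment = "QGFKNSPALFDEALRRDL"

print(list_substitutions(wild_type_fragment, variant_fragment, first_position=190))
for label, sequence in single_mutants(wild_type_fragment, variant_fragment, 190).items():
    print(label, sequence)
```

Each sequence returned by `single_mutants` corresponds to one construct that could be prepared by site-directed mutagenesis and screened on its own.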

Variant RT enzymes used herein may also include other “RT variants” that are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference RT protein, including any wild type RT, or mutant RT, or fragment RT, or other variant of RT disclosed or contemplated herein or known in the art.

In some embodiments, an RT variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or up to 100, or up to 200, or up to 300, or up to 400, or up to 500 or more amino acid changes compared to a reference RT. In some embodiments, the RT variant comprises a fragment of a reference RT, such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of the reference RT. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type RT (e.g., SEQ ID NO: 1361485).
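One way to evaluate the identity thresholds and amino acid change counts recited above is sketched below. The sketch assumes that the two sequences have already been aligned (equal length, with “-” marking gaps) and takes percent identity as identical columns divided by aligned columns; other conventions for the denominator exist, and this disclosure does not prescribe one. All function names are hypothetical.

```python
def _aligned_columns(aligned_reference: str, aligned_variant: str) -> list:
    """Pair up alignment columns, dropping columns that are gaps in both sequences."""
    if len(aligned_reference) != len(aligned_variant):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (r, v)
        for r, v in zip(aligned_reference, aligned_variant)
        if not (r == "-" and v == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    return columns


def percent_identity(aligned_reference: str, aligned_variant: str) -> float:
    """Identical columns as a percentage of all aligned columns (assumed convention)."""
    columns = _aligned_columns(aligned_reference, aligned_variant)
    matches = sum(1 for r, v in columns if r == v)
    return 100.0 * matches / len(columns)


def count_changes(aligned_reference: str, aligned_variant: str) -> int:
    """Number of columns that differ (substitutions, insertions, or deletions)."""
    columns = _aligned_columns(aligned_reference, aligned_variant)
    return sum(1 for r, v in columns if r != v)


def meets_identity(aligned_reference: str, aligned_variant: str, threshold: float) -> bool:
    """True if the variant is at least `threshold` percent identical to the reference."""
    return percent_identity(aligned_reference, aligned_variant) >= threshold


# Assumed fragments: residues 190-207 of wild-type M-MLV RT and of a D200N variant.
reference = "QGFKNSPTLFDEALHRDL"
variant = "QGFKNSPTLFNEALHRDL"
print(round(percent_identity(reference, variant), 1), count_changes(reference, variant))
```

For a full-length RT the same functions would be applied to the output of an alignment tool of the practitioner's choosing.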

In some embodiments, the disclosure also may utilize RT fragments which retain their functionality and which are fragments of any herein disclosed RT proteins. In some embodiments, the RT fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, or up to 600 or more amino acids in length.

In still other embodiments, the disclosure also may utilize RT variants which are truncated at the N-terminus or the C-terminus, or both, by a certain number of amino acids which results in a truncated variant which still retains sufficient polymerase function. In some embodiments, the RT truncated variant has a truncation of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 amino acids at the N-terminal end of the protein. In other embodiments, the RT truncated variant has a truncation of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, or 250 amino acids at the C-terminal end of the protein. In still other embodiments, the RT truncated variant has a truncation at the N-terminal and the C-terminal end which are the same or different lengths.
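The N-terminal and C-terminal truncations described above amount to removing a chosen number of residues from either or both ends of the sequence, which may be sketched as follows. The function names and the toy sequence are hypothetical; whether a given truncated variant retains sufficient polymerase function is an experimental question that this sketch does not address.

```python
def truncate(sequence: str, n_terminal: int = 0, c_terminal: int = 0) -> str:
    """Remove n_terminal residues from the start and c_terminal residues from the end."""
    if n_terminal < 0 or c_terminal < 0:
        raise ValueError("truncation lengths must be non-negative")
    if n_terminal + c_terminal >= len(sequence):
        raise ValueError("truncation would remove the entire sequence")
    return sequence[n_terminal:len(sequence) - c_terminal]


def retained_fraction(sequence: str, n_terminal: int = 0, c_terminal: int = 0) -> float:
    """Length of the truncated variant as a fraction of the original length."""
    return len(truncate(sequence, n_terminal, c_terminal)) / len(sequence)


# Toy sequence (hypothetical) with different truncation lengths at the two ends.
toy = "ABCDEFGHIJ"
print(truncate(toy, n_terminal=2, c_terminal=3))
print(retained_fraction(toy, n_terminal=2, c_terminal=3))
```

The two truncation lengths are independent parameters, so the same function covers truncation at one end only or at both ends with the same or different lengths.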

For example, the prime editors disclosed herein may include a truncated version of M-MLV reverse transcriptase. In this embodiment, the reverse transcriptase contains 4 mutations (D200N, T306K, W313F, T330P; noting that the L603W mutation present in PE2 is no longer present due to the truncation). The DNA sequence encoding this truncated editor is 522 bp smaller than PE2, which makes it potentially useful for applications where delivery of the DNA sequence is challenging due to its size (e.g., adeno-associated virus and lentivirus delivery). This embodiment is referred to as MMLV-RT(trunc) and has the following amino acid sequence:

MMLV-RT(trunc)

SGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGST

WLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRL

LDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSG

LPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNS

PTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGF

CRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPF

ELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDA

GKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNP

ATLLPLPEEGLQHNCLDNSRLIN (SEQ ID NO: 1361598)

Key

Linker: (SEQ ID NO: 1361528)

RT (TRUNC): (SEQ ID NO: 1361597)

In various embodiments, the prime editors disclosed herein may comprise one of the RT variants described as follows, or an RT variant thereof that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any reference RT variant.

Other error-prone reverse transcriptases have been described in the literature, each of which is contemplated for use in the methods and compositions herein. For example, error-prone reverse transcriptases have been described in Bebenek et al., “Error-prone Polymerization by HIV-1 Reverse Transcriptase,” J Biol Chem, 1993, Vol. 268: 10324-10334 and Sebastian-Martin et al., “Transcriptional inaccuracy threshold attenuates differences in RNA-dependent DNA synthesis fidelity between retroviral reverse transcriptases,” Scientific Reports, 2018, Vol. 8: 627, each of which is incorporated herein by reference. Still further, reverse transcriptases, including error-prone reverse transcriptases, can be obtained from a commercial supplier, including ProtoScript® (II) Reverse Transcriptase, AMV Reverse Transcriptase, WarmStart® Reverse Transcriptase, and M-MuLV Reverse Transcriptase, all from NEW ENGLAND BIOLABS®, or AMV Reverse Transcriptase XL, SMARTScribe Reverse Transcriptase, and GPR ultra-pure MMLV Reverse Transcriptase, all from TAKARA BIO USA, INC. (formerly CLONTECH).

In still other embodiments, the present methods and compositions may utilize a DNA polymerase that has been evolved into a reverse transcriptase, as described in Ellefson et al., “Synthetic evolutionary origin of a proofreading reverse transcriptase,” Science, Jun. 24, 2016, Vol. 352: 1590-1593, the contents of which are incorporated herein by reference.

In some embodiments, the reverse transcriptase is provided as a component of a fusion protein also comprising a napDNAbp. In other words, in some embodiments, the reverse transcriptase is fused to a napDNAbp as a fusion protein.

Some exemplary reverse transcriptases that can be fused to napDNAbp proteins or provided as individual proteins according to various embodiments of this disclosure are provided below. Exemplary reverse transcriptases include variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following wild-type enzymes or partial enzymes:

Description
Sequence (variant substitutions relative to wild type)

Reverse transcriptase (M-MLV RT), wild type Moloney murine leukemia virus. Used in PE1 (prime editor 1 fusion protein disclosed herein)

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361497)

M-MLV RT D200N

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361498)

M-MLV RT D200N T330P

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361499)

M-MLV RT D200N T330P L603W

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361500)

M-MLV RT D200N T330P L603W E69K

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQKARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLENWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361501)

M-MLV RT D200N T330P L603W E302R

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRRELGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361502)

M-MLV RT D200N T330P L603W E607K

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSKGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361503)

M-MLV RT D200N T330P L603W L139P

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGPPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361504)

M-MLV RT D200N T330P L603W L435G

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVIGAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361505)

M-MLV RT D200N T330P L603W N454K

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSKARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361506)

M-MLV RT D200N T330P L603W T306K

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361507)

M-MLV RT D200N T330P L603W W313F

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLFIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361508)

M-MLV RT D200N T330P L603W D524G E562Q D583N

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTGGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAQLIALT

QALKMAEGKKLNVYTNSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361509)

M-MLV RT D200N T330P L603W E302R W313F

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRRFLGTAGFCRLFIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361510)

M-MLV RT D200N T330P L603W E607K L139P

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGPPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLWIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSKGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361511)

M-MLV RT P51L S67K T197A H204R E302K F309N W313F T330P L435G N454K D524G D583N H594Q D653N

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIILLKATSTPVSIKQ

YPMKQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPALFDEALRRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRKFLGTAGNCRLFIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVIGAPHAVE

ALVKQPPDRWLSKARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTGGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTNSRYAFATAHIQGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMANQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361512)

M-MLV RT D200N P51L S67K T197A H204R E302K F309N W313F T330P L345G N454K D524G D583N H594Q D653N

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIILLKATSTPVSIKQ

YPMKQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPALFNEALRRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLRKFLGTAGNCRLFIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVIGAPHAVE

ALVKQPPDRWLSKARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTGGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTNSRYAFATAHIQGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMANQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361513)

M-MLV RT D200N T330P L603W T306K W313F in PE2

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH

PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLP

QGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGY

RASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIP

GFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAK

GVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVE

ALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHG

TRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALT

QALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRL

SIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP (SEQ ID NO: 1361514)

In various embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising one or more of the following mutations: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K, or D653N in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising one or more of the following mutations: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X, or D653X in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid.
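The “X” placeholder recited above may be expanded into explicit substitution labels as in the following sketch. It assumes that “any amino acid” is limited to the twenty standard amino acids and that the original residue is excluded (since replacing a residue with itself is not a change); both are simplifying assumptions of this sketch rather than limitations of the disclosure, and the names `AMINO_ACIDS` and `expand_wildcard` are hypothetical.

```python
import re

# The twenty standard amino acids, one-letter codes (assumed scope of "X").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

_WILDCARD = re.compile(r"^([A-Z])([1-9][0-9]*)X$")


def expand_wildcard(label: str) -> list:
    """Expand a label such as 'D200X' into ['D200A', 'D200C', ...], omitting 'D200D'."""
    match = _WILDCARD.match(label.strip())
    if match is None:
        raise ValueError(f"not a wildcard substitution label: {label!r}")
    original, position = match.group(1), match.group(2)
    return [f"{original}{position}{aa}" for aa in AMINO_ACIDS if aa != original]


options = expand_wildcard("D200X")
print(len(options), "D200N" in options)
```

Under these assumptions each wildcard position yields nineteen candidate substitutions, one of which is the specific substitution named for that position in the corresponding embodiment (e.g., D200N for D200X).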

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a P51X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is L.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an S67X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is K.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E69X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is K.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L139X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is P.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a T197X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is A.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D200X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is N.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an H204X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is R.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an F209X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is N.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E302X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is K.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a T306X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is K.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an F309X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is N.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a W313X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is F.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a T330X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is P.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L345X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is G.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L435X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is G.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an N454X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is K.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D524X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is G.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E562X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is Q.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D583X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is N.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an H594X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is Q.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L603X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is W.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E607X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is K.

In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D653X mutation in the wild type M-MLV RT of SEQ ID NO: 1361485 or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In some embodiments, X is N.
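
For illustration only, the substitution labels used above (e.g., D200N, or D200X wherein “X” can be any amino acid) can be read as a wild type residue, a position, and a replacement residue. The following Python sketch, which is not part of the disclosed methods, shows one way such labels might be parsed and applied to a sequence string. The function names, the assumption of 1-based numbering relative to the reference sequence, and the short toy sequence are hypothetical and chosen only for this example.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# A label such as "D200N": wild type residue, 1-based position, new residue.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label like 'D200N' or 'D200X' into (wild_type, position, new)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in AMINO_ACIDS:
        raise ValueError(f"unknown wild type residue in {label!r}")
    if new != "X" and new not in AMINO_ACIDS:
        raise ValueError(f"unknown replacement residue in {label!r}")
    return wild_type, position, new


def apply_mutation(sequence, label, replacement=None):
    """Return a copy of `sequence` carrying the substitution named by `label`.

    For an 'X' label (any amino acid), the caller supplies `replacement`.
    Assumption: positions are 1-based and refer to `sequence` itself.
    """
    wild_type, position, new = parse_mutation(label)
    if new == "X":
        if replacement not in AMINO_ACIDS:
            raise ValueError("an 'X' label needs a one-letter replacement residue")
        new = replacement
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + new + sequence[position:]


# Toy sequence (hypothetical, not an RT sequence).
toy = "MKTDL"
print(apply_mutation(toy, "D4N"))                   # specific substitution
print(apply_mutation(toy, "D4X", replacement="N"))  # 'X' resolved by the caller
```

Checking the wild type residue before substituting guards against applying a label to a sequence that is numbered differently from the reference, which is the situation addressed above by the phrase “at a corresponding amino acid position in another wild type RT polypeptide sequence.”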

The prime editor (PE) system described herein contemplates any publicly available reverse transcriptase described or disclosed in any of the following U.S. patents (each of which is incorporated by reference in its entirety): U.S. Pat. Nos. 10,202,658; 10,189,831; 10,150,955; 9,932,567; 9,783,791; 9,580,698; 9,534,201; and 9,458,484, and any variant thereof that can be made using known methods for installing mutations or known methods for evolving proteins. The following references describe reverse transcriptases in the art. Each of their disclosures is incorporated herein by reference in its entirety.

Herzig, E., Voronin, N., Kucherenko, N. & Hizi, A. A Novel Leu92 Mutant of HIV-1 Reverse Transcriptase with a Selective Deficiency in Strand Transfer Causes a Loss of Viral Replication. J. Virol. 89, 8119-8129 (2015).
Mohr, G. et al. A Reverse Transcriptase-Cas1 Fusion Protein Contains a Cas6 Domain Required for Both CRISPR RNA Biogenesis and RNA Spacer Acquisition. Mol. Cell 72, 700-714.e8 (2018).
Zhao, C., Liu, F. & Pyle, A. M. An ultraprocessive, accurate reverse transcriptase encoded by a metazoan group II intron. RNA 24, 183-195 (2018).
Zimmerly, S. & Wu, L. An Unexplored Diversity of Reverse Transcriptases in Bacteria. Microbiol Spectr 3, MDNA3-0058-2014 (2015).
Ostertag, E. M. & Kazazian Jr, H. H. Biology of Mammalian L1 Retrotransposons. Annual Review of Genetics 35, 501-538 (2001).
Perach, M. & Hizi, A. Catalytic Features of the Recombinant Reverse Transcriptase of Bovine Leukemia Virus Expressed in Bacteria. Virology 259, 176-189 (1999).
Lim, D. et al. Crystal structure of the moloney murine leukemia virus RNase H domain. J. Virol. 80, 8379-8389 (2006).
Zhao, C. & Pyle, A. M. Crystal structures of a group II intron maturase reveal a missing link in spliceosome evolution. Nature Structural & Molecular Biology 23, 558-565 (2016).
Griffiths, D. J. Endogenous retroviruses in the human genome sequence. Genome Biol. 2, REVIEWS1017 (2001).
Baranauskas, A. et al. Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants. Protein Eng Des Sel 25, 657-668 (2012).
Zimmerly, S., Guo, H., Perlman, P. S. & Lambowitz, A. M. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82, 545-554 (1995).
Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905-916 (1996).
Berkhout, B., Jebbink, M. & Zsiros, J. Identification of an Active Reverse Transcriptase Enzyme Encoded by a Human Endogenous HERV-K Retrovirus. Journal of Virology 73, 2365-2375 (1999).
Kotewicz, M. L., Sampson, C. M., D'Alessio, J. M. & Gerard, G. F. Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity. Nucleic Acids Res 16, 265-277 (1988).
Arezi, B. & Hogrefe, H. Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer. Nucleic Acids Res 37, 473-481 (2009).
Blain, S. W. & Goff, S. P. Nuclease activities of Moloney murine leukemia virus reverse transcriptase. Mutants with altered substrate specificities. J Biol. Chem. 268, 23585-23592 (1993).
Xiong, Y. & Eickbush, T. H. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9, 3353-3362 (1990).
Herschhorn, A. & Hizi, A. Retroviral reverse transcriptases. Cell. Mol. Life Sci. 67, 2717-2747 (2010).
Taube, R., Loya, S., Avidan, O., Perach, M. & Hizi, A. Reverse transcriptase of mouse mammary tumour virus: expression in bacteria, purification and biochemical characterization. Biochem. J. 329 (Pt 3), 579-587 (1998).
Liu, M. et al. Reverse Transcriptase-Mediated Tropism Switching in Bordetella Bacteriophage. Science 295, 2091-2094 (2002).
Luan, D. D., Korman, M. H., Jakubczak, J. L. & Eickbush, T. H. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595-605 (1993).
Nottingham, R. M. et al. RNA-seq of human reference RNA samples using a thermostable group II intron reverse transcriptase. RNA 22, 597-613 (2016).
Telesnitsky, A. & Goff, S. P. RNase H domain mutations affect the interaction between Moloney murine leukemia virus reverse transcriptase and its primer-template. Proc. Natl. Acad. Sci. U.S.A. 90, 1276-1280 (1993).
Halvas, E. K., Svarovskaia, E. S. & Pathak, V. K. Role of Murine Leukemia Virus Reverse Transcriptase Deoxyribonucleoside Triphosphate-Binding Site in Retroviral Replication and In Vivo Fidelity. Journal of Virology 74, 10349-10358 (2000).
Nowak, E. et al. Structural analysis of monomeric retroviral reverse transcriptase in complex with an RNA/DNA hybrid. Nucleic Acids Res 41, 3874-3887 (2013).
Stamos, J. L., Lentzsch, A. M. & Lambowitz, A. M. Structure of a Thermostable Group II Intron Reverse Transcriptase with Template-Primer and Its Functional and Evolutionary Implications. Molecular Cell 68, 926-939.e4 (2017).
Das, D. & Georgiadis, M. M. The Crystal Structure of the Monomeric Reverse Transcriptase from Moloney Murine Leukemia Virus. Structure 12, 819-829 (2004).
Avidan, O., Meer, M. E., Oz, I. & Hizi, A. The processivity and fidelity of DNA synthesis exhibited by the reverse transcriptase of bovine leukemia virus. European Journal of Biochemistry 269, 859-867 (2002).
Gerard, G. F. et al. The role of template-primer in protection of reverse transcriptase from thermal inactivation. Nucleic Acids Res 30, 3118-3129 (2002).
Monot, C. et al. The Specificity and Flexibility of L1 Reverse Transcription Priming at Imperfect T-Tracts. PLOS Genetics 9, e1003499 (2013).
Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958-970 (2013).

Any of the references noted above that relate to reverse transcriptases are hereby incorporated by reference in their entireties, if not already so stated.

D. PE Fusion Proteins

The prime editor (PE) system described herein contemplates fusion proteins comprising a napDNAbp and a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase), optionally joined by a linker. The application contemplates any suitable napDNAbp and polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase) to be combined in a single fusion protein. Examples of napDNAbps and polymerases (e.g., DNA-dependent DNA polymerases or RNA-dependent DNA polymerases, such as reverse transcriptases) are each defined herein. Because polymerases are well known in the art and their amino acid sequences are readily available, this disclosure is not meant in any way to be limited to the specific polymerases identified herein.

In various embodiments, the fusion proteins may comprise any suitable structural configuration. For example, the fusion protein may comprise, in the N-terminus to C-terminus direction, a napDNAbp fused to a polymerase (e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase, such as a reverse transcriptase). In other embodiments, the fusion protein may comprise, in the N-terminus to C-terminus direction, a polymerase (e.g., a reverse transcriptase) fused to a napDNAbp. The fused domains may optionally be joined by a linker, e.g., an amino acid sequence. In other embodiments, the fusion proteins may comprise the structure NH₂-[napDNAbp]-[polymerase]-COOH; or NH₂-[polymerase]-[napDNAbp]-COOH, wherein each instance of “]-[” indicates the presence of an optional linker sequence. In embodiments wherein the polymerase is a reverse transcriptase, the fusion proteins may comprise the structure NH₂-[napDNAbp]-[RT]-COOH; or NH₂-[RT]-[napDNAbp]-COOH, wherein each instance of “]-[” indicates the presence of an optional linker sequence.
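
By way of a non-limiting illustration, the two orientations described above, each with an optional linker, can be sketched in Python as a simple concatenation read from the N-terminus to the C-terminus. The function name, its parameters, and the placeholder strings below are hypothetical; they stand in for real domain sequences and are not taken from this disclosure.

```python
def build_fusion(napdnabp, polymerase, linker="", napdnabp_first=True):
    """Join two domains N- to C-terminus with an optional linker between them.

    napdnabp_first=True  gives NH2-[napDNAbp]-[polymerase]-COOH
    napdnabp_first=False gives NH2-[polymerase]-[napDNAbp]-COOH
    An empty `linker` models a direct fusion with no linker sequence.
    """
    if napdnabp_first:
        first, second = napdnabp, polymerase
    else:
        first, second = polymerase, napdnabp
    return first + linker + second


# Placeholder strings only (assumptions for illustration, not real domains).
NAPDNABP = "AAAA"
POLYMERASE = "CCCC"
LINKER = "GGS"

print(build_fusion(NAPDNABP, POLYMERASE, LINKER))                        # napDNAbp first
print(build_fusion(NAPDNABP, POLYMERASE, LINKER, napdnabp_first=False))  # polymerase first
print(build_fusion(NAPDNABP, POLYMERASE))                                # no linker
```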

An exemplary fusion protein is depicted in FIG. 14, which shows a fusion protein comprising an MLV reverse transcriptase (“MLV-RT”) fused to a nickase Cas9 (“Cas9(H840A)”) via a linker sequence. This example is not intended to limit the scope of fusion proteins that may be utilized for the prime editor (PE) system described herein.

In various embodiments, the prime editor fusion protein may have the following amino acid sequence (referred to herein as “PE1”), which includes a Cas9 variant comprising an H840A mutation (i.e., a Cas9 nickase) and a wild type M-MLV RT, as well as an N-terminal NLS sequence (19 amino acids) and an amino acid linker (33 amino acids) that joins the C-terminus of the Cas9 nickase domain to the N-terminus of the RT domain. The PE1 fusion protein has the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)]. The amino acid sequence of PE1 and its individual components are as follows:

Description: PE1 fusion protein, Cas9 (H840A)-MMLV_RT (wt)
Sequence:

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKK

NLIGALLEDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE

EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIE

GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK

NGLFGNLIALSLGLTPNFKSNEDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS

DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYA

GYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAIL

RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA

SAQSFIERMINEDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV

DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENASLGTYHDLLKIIKDKDELDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS

GKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGIL

QTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV

ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNR

GKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKEDNLTKAERGGLSELDKAGFIKRQLVETRQI

TKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDERKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA

NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS

DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQ

AENIIHLFTLTNLGAPAAFKYEDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFP

QAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP

WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFF

CLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLIL

LQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTE

ARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQ

EIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPP

CLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQR

KAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEI

YRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAA

ITETPDTSTLLIENSSP
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361515)

Key:

Nuclear localization sequence (NLS) Top: (SEQ ID NO: 1361532); Bottom: (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker (SEQ ID NO: 1361528)

M-MLV reverse transcriptase (SEQ ID NO: 1361485)
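
As an informal aid to reading the listing and key above, the following Python sketch splits a fusion sequence of the form [NLS]-[napDNAbp]-[linker]-[polymerase] into its parts by locating the N-terminal NLS, the linker, and the C-terminal segment exactly as those strings appear in the PE1 listing. The function name and the short placeholder domains are assumptions made for illustration, and the sketch handles only this one orientation.

```python
# Segments copied from the PE1 listing above.
N_TERMINAL_NLS = "MKRTADGSEFESPKKKRKV"
LINKER = "SGGSSGGSSGSETPGTSESATPESSGGSSGGSS"
C_TERMINAL_SEGMENT = "SGGSKRTADGSEFEPKKKRKV"  # final line of the listing


def split_fusion(fusion):
    """Split an [NLS]-[napDNAbp]-[linker]-[polymerase]-[C-terminal] string.

    Raises ValueError if any of the three known segments is not found.
    """
    if not fusion.startswith(N_TERMINAL_NLS):
        raise ValueError("N-terminal NLS not found")
    if not fusion.endswith(C_TERMINAL_SEGMENT):
        raise ValueError("C-terminal segment not found")
    core = fusion[len(N_TERMINAL_NLS): len(fusion) - len(C_TERMINAL_SEGMENT)]
    napdnabp, found, polymerase = core.partition(LINKER)
    if not found:
        raise ValueError("linker not found between the two domains")
    return {
        "NLS": N_TERMINAL_NLS,
        "napDNAbp": napdnabp,
        "linker": LINKER,
        "polymerase": polymerase,
        "C-terminal": C_TERMINAL_SEGMENT,
    }


# Placeholder domains: only the first residues of each domain as listed,
# used here to keep the example short (an assumption, not full sequences).
toy_napdnabp = "DKKYSIGLDIG"
toy_polymerase = "TLNIEDEYRL"
toy_fusion = (
    N_TERMINAL_NLS + toy_napdnabp + LINKER + toy_polymerase + C_TERMINAL_SEGMENT
)

for name, segment in split_fusion(toy_fusion).items():
    print(f"{name}: {len(segment)} residues")
```

Running the same function on a full-length sequence would report the length of each domain, which is a quick way to confirm that a stated NLS or linker length matches the sequence actually listed.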

In another embodiment, the prime editor fusion protein may have the following amino acid sequence (referred to herein as “PE2”), which includes a Cas9 variant comprising an H840A mutation (i.e., a Cas9 nickase) and an M-MLV RT comprising mutations D200N, T330P, L603W, T306K, and W313F, as well as an N-terminal NLS sequence (19 amino acids) and an amino acid linker (33 amino acids) that joins the C-terminus of the Cas9 nickase domain to the N-terminus of the RT domain. The PE2 fusion protein has the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)]. The amino acid sequence of PE2 is as follows:

Description: PE2 fusion protein, Cas9 (H840A)-MMLV_RT (D200N, T330P, L603W, T306K, W313F)
Sequence:

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKK

NLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESELVE

EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKERGHFLIE

GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK

NGLFGNLIALSLGLTPNFKSNEDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS

DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYA

GYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQIHLGELHAIL

RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA

SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV

DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENASLGTYHDLLKIIKDKDELDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS

GKTILDELKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGIL

QTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV

ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNR

GKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKEDNLTKAERGGLSELDKAGFIKRQLVETRQI

TKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDERKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA

NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS

DKLIARKKDWDPKKYGGEDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQ

AENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFP

QAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP

WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFF

CLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLIL

LQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTE

ARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLENWGPDQQKAYQ

EIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPP

CLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQR

KAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEI

YRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAA

ITETPDTSTLLIENSSP
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361516)

Key:

Nuclear localization sequence (NLS) Top: (SEQ ID NO: 1361532); Bottom: (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker (SEQ ID NO: 1361528)

M-MLV reverse transcriptase (SEQ ID NO: 1361514)
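
As a further informal illustration, substitutions that distinguish a variant RT from the wild type RT can be listed by comparing two equal-length sequence strings position by position. The Python sketch below compares one short fragment as it appears in the PE1 listing with the matching fragment in the PE2 listing. The function name is hypothetical, and the starting position of 195 is an assumption obtained by counting residues from the first residue of the RT domain (the “T” of TLNIEDEYRL) as shown in the listings.

```python
def list_substitutions(wild_type, variant, start=1):
    """Return labels such as 'D200N' for each position where two
    equal-length, ungapped sequences differ.

    `start` is the position number assigned to the first residue compared.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        f"{w}{start + offset}{v}"
        for offset, (w, v) in enumerate(zip(wild_type, variant))
        if w != v
    ]


# Matching fragments from the PE1 (wild type RT) and PE2 (variant RT) listings.
WT_FRAGMENT = "SPTLFDEALHRDLADF"
PE2_FRAGMENT = "SPTLFNEALHRDLADF"

# Assumption: the first residue of the fragment is RT position 195.
print(list_substitutions(WT_FRAGMENT, PE2_FRAGMENT, start=195))  # ['D200N']
```

A comparison of this kind applies only to sequences of equal length with no insertions or deletions; sequences of different lengths would first need to be aligned.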

In still other embodiments, the prime editor fusion protein may have the following amino acid sequences:

Description: PE fusion protein, MMLV_RT (wt)-32 aa-Cas9 (H840A)
Sequence:

MKRTADGSEFESPKKKRKV
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVR

QAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTN

DYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAF

EWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSE

LDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT

PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGL

PDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKD

AGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLP

LPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW

AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEI

KNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIEN

SSP

SGGSSGGSSGSETPGTSESATPESSGGSSGGSS

DKKYSIGLDIGTNSVGWAVITDEYKVP

SKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA

KVDDSFFHRLEESELVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI

YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLS

KSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNEDLAEDAKLQLSKDTYDDDLDNLLA

QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQL

PEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTED

NGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKS

EETITPWNFEEVVDKGASAQSFIERMTNEDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE

GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENASLGTYH

DLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTG

WGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLH

EHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRI

EEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSEL

KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKEDNLTKAERGGLSE

LDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDERKDFQFY

KVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKY

FFYSNIMNFFKTEITLANGEIRKRPLIEINGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTE

VQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE

LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNE

LALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLD

KVLSAYNKHRDKPIREQAENIIHLFTLINLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQS

ITGLYETRIDLSQLGGD
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361517)

Key:

Nuclear localization sequence (NLS) Top: (SEQ ID NO: 1361532); Bottom: (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker (SEQ ID NO: 1361528?)

M-MLV reverse transcriptase (SEQ ID NO: 1361497)

Description: PE fusion protein, MMLV_RT (wt)-60 aa-Cas9 (H840A)
Sequence:

MKRTADGSEFESPKKKRKV
TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVR

QAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTN

DYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAF

EWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSE

LDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT

PRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGL

PDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKD

AGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLP

LPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW

AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEI

KNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIEN

SSP

SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS

DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLEDSGETAEATRL

KRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESELVEEDKKHERHPIFGNIVDEVA

YHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQ

TYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLEGNLIALSLGLTPNEK

SNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAP

LSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPI

LEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEK

ILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMINEDKNLPNE

KVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLEKTNRKVTVKQLKEDY

FKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDELDNEENEDILEDIVLTLTLFEDREM

IEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFM

QLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPE

NIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGR

DMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW

RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDEN

DKLIREVKVITLKSKLVSDERKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV

YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIEINGETGEI

VWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGED

SPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKL

PKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLEVEQ

HKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAF

KYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361518)

Key:

Nuclear localization sequence (NLS) Top: (SEQ ID NO: 1361532); Bottom: (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

60-amino acid linker (SEQ ID NO: 1361585)

M-MLV reverse transcriptase (SEQ ID NO: 1361497)

Description: PE fusion protein, Cas9 (H840A)-FEN1-MMLV_RT (D200N, T330P, L603W, T306K, W313F)
Sequence:

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKK

NLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE

EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKERGHFLIE

GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK

NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS

DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYA

GYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTEDNGSIPHQIHLGELHAIL

RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA

SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV

DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENASLGTYHDLLKIIKDKDELDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS

GKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGIL

QTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV

ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNR

GKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKEDNLTKAERGGLSELDKAGFIKRQLVETRQI

TKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDERKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA

NGEIRKRPLIEINGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS

DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQ

AENIIHLFTLTNLGAPAAFKYEDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGGSSGIQGLAKLIADVAPSAIRENDIKSYFGRKV

AIDASMSIYQFLIAVRQGGDVLQNEEGETTSHLMGMFYRTIRMMENGIKPVYVEDGKPPQLKS

GELAKRSERRAEAEKQLQQAQAAGAEQEVEKFTKRLVKVTKQHNDECKHLLSLMGIPYLDAPS

EAEASCAALVKAGKVYAAATEDMDCLTFGSPVLMRHLTASEAKKLPIQEFHLSRILQELGLNQ

EQFVDLCILLGSDYCESIRGIGPKRAVDLIQKHKSIEEIVRRLDPNKYPVPENWLHKEAHQLF

LEPEVLDPESVELKWSEPNEEELIKFMCGEKQFSEERIRSGVKRLSKSRQGSTQGRLDDFFKV

TGSLSSAKRKEPEPKGSTKKKAKTGAAGKFKRGKSGGSSGGSSGSETPGTSESATPESSGGSS

GGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPV

SIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRV

EDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTW

TRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLG

NLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCR

LFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQ

GYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAP

HAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILA

EAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAEL

IALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFL

PKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSP
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361519)

Key:

Nuclear localization sequence (NLS) Top: (SEQ ID NO: 1361532); Bottom: (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker (SEQ ID NO: 1361528?)

M-MLV reverse transcriptase (SEQ ID NO: 1361514)

FEN1: SEQ ID NO: 1361542

In various embodiments, the prime editor fusion proteins contemplated herein may also include any variants of the above-disclosed sequences having an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to PE1, PE2, or any of the above indicated prime editor fusion sequences.
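
For illustration only, the percent identity thresholds recited above can be checked with a short Python sketch. The sketch assumes the two sequences have already been aligned to equal length (with “-” marking gaps) and defines percent identity as the number of identical, non-gap positions divided by the alignment length; other definitions are in use, and this disclosure does not prescribe one. The function names are hypothetical, and the word “about” is treated here as a plain greater-than-or-equal comparison, which is a simplifying assumption.

```python
# Thresholds recited in the paragraph above, in percent.
THRESHOLDS = (70.0, 80.0, 90.0, 95.0, 96.0, 97.0, 98.0, 99.0, 99.5, 99.9)


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes both inputs are pre-aligned to the same length; columns
    containing a gap ('-') are counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity):
    """Return the recited thresholds that `identity` meets or exceeds."""
    return [t for t in THRESHOLDS if identity >= t]


# Toy aligned sequences (hypothetical), differing at one of ten positions.
reference = "ACDEFGHIKL"
candidate = "ACDEFGHIKV"
identity = percent_identity(reference, candidate)
print(identity, thresholds_met(identity))
```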

In some embodiments, linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., a napDNAbp linked or fused to a reverse transcriptase).

In other embodiments, the prime editor fusion proteins can be based on SaCas9 or on SpCas9 nickases with altered PAM specificities, such as the following exemplary sequences:

Description: SaCas9-M-MLV RT prime editor
Sequence:

MKRTADGSEFESPKKKRKVGKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKE

ANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKG

LSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAE

LQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRR

TYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLV

ITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEF

TNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEI

EQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIP

TTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEM

QKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPF

NYEVDHIIPRSVSFDNSFNNKVLVKQEEASKKGNRTPFQYLSSSDSKISYETFKKHI

LNLAKGKGRISKTKKEYLLEERDINRESVQKDFINRNLVDTRYATRGLMNLLRSYER

VNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLD

KAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKP

NRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQT

YQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDI

TDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEA

KKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLEN

MNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKGSGGSSGGSSGS

ETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAE

TGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQS

PWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVL

DLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRD

LADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQ

KQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGECRLWIPGFAEM

AAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYA

KGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVI

LAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGL

QHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIW

AKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLT

SEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAIT

ETPDTSTLLIENSSPSGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361599)

SpCas9 (H840A)-VRQR-Moloney Murine Leukemia Virus Reverse Transcriptase prime editor

MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTD

RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS

FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRL

IYLALAHMIKERGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA

ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNEKSNFDLAEDAKLQLS

KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR

YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILE

KMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE

KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM

TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLL

FKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDELDN

EENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKL

INGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEH

IANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRER

MKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYD

VDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQ

RKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR

EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEF

VYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIET

NGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR

KKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFL

YLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA

YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQ

SITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIED

EYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNK

RVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEM

GISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSEL

DCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQ

PTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIK

QALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAG

WPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQA

LLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADH

TWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGK

KLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSII

HCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEF

EPKKKRKV (SEQ ID NO: 1361600)

SpCas9 (H840A)-VRER-Moloney Murine Leukemia Virus Reverse Transcriptase prime editor

MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTD

RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS

FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRL

IYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA

ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS

KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR

YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILE

KMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE

KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM

TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLL

FKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDELDN

EENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKL

INGIRDKQSGKTILDELKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEH

IANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRER

MKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYD

VDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQ

RKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR

EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEF

VYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIET

NGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR

KKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFL

YLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA

YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQ

SITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIED

EYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNK

RVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEM

GISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSEL

DCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQ

PTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIK

QALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAG

WPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQA

LLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADH

TWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGK

KLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSII

HCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEF

EPKKKRKV (SEQ ID NO: 1361601)

In yet other embodiments, the prime editor fusion proteins contemplated herein may include a Cas9 nickase (e.g., Cas9 (H840A)) fused to a truncated version of M-MLV reverse transcriptase. In this embodiment, the reverse transcriptase also contains 4 mutations (D200N, T306K, W313F, T330P; noting that the L603W mutation present in PE2 is no longer present due to the truncation). The DNA sequence encoding this truncated editor is 522 bp smaller than PE2, which makes it potentially useful for applications where delivery of the DNA sequence is challenging due to its size (e.g., adeno-associated virus and lentivirus delivery). This embodiment is referred to as Cas9(H840A)-MMLV-RT(trunc) or “PE2-short” or “PE2-trunc” and has the following amino acid sequence:

Cas9(H840A)-MMLV-RT(trunc) or PE2-short

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTD

RHSIKKNLIGALLEDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS

FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRL

IYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKA

ILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNEKSNEDLAEDAKLQLS

KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKR

YDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILE

KMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPELKDNRE

KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM

TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLL

FKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRENASLGTYHDLLKIIKDKDELDN

EENEDILEDIVLTLTLFEDREMIEERLKTYAHLEDDKVMKQLKRRRYTGWGRLSRKL

INGIRDKQSGKTILDELKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEH

IANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRER

MKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYD

VDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQ

RKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR

EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEF

VYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIET

NGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIAR

KKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNEL

YLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSA

YNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQ

SITGLYETRIDLSQLGGD
SGGSSGGSSGSETPGTSESATPESSGGSSGGSS
TLNIED

EYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ

YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNK

RVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEM

GISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSEL

DCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQ

PTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIK

QALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAG

WPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQA

LLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDNSRLIN
SGGSKRTADGSEFEPK

KKRKV (SEQ ID NO: 1361602)

Key:

Nuclear localization sequence (NLS)

Top: (SEQ ID NO: 1361532)

Bottom: (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker 1 (SEQ ID NO: 1361528)

M-MLV TRUNCATED reverse transcriptase: (SEQ ID NO: 1361597)

33-amino acid linker 2 (SEQ ID NO: 1361541)

FEN1 (SEQ ID NO: 1361542)

See FIG. 36, which provides a bar graph comparing the efficiency (i.e., “% of total sequencing reads with the specified edit or indels”) of PE2, PE2-trunc, PE3, and PE3-trunc over different target sites in various cell lines. The data shows that the prime editors comprising the truncated RT variants were about as efficient as the prime editors comprising the non-truncated RT proteins.

In certain embodiments, linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., a napDNAbp linked or fused to a reverse transcriptase).

E. Linkers and Other Fusion Protein Domains

The PE fusion proteins may comprise various other domains besides the napDNAbp (e.g., Cas9 domain) and the polymerase domain (e.g., RT domain). For example, in the case where the napDNAbp is a Cas9 and the polymerase is a RT, the PE fusion proteins may comprise one or more linkers that join the Cas9 domain with the RT domain. The linkers may also join other functional domains, such as nuclear localization sequences (NLS) or a FEN1 (or other flap endonuclease) to the PE fusion proteins or a domain thereof.

(i) Linkers

As defined above, the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease and the catalytic domain of a polymerase (e.g., a reverse transcriptase). In some embodiments, a linker joins a dCas9 and reverse transcriptase. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.

The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In some embodiments, the linker is a polypeptide or based on amino acids. In other embodiments, the linker is not peptide-like. In some embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In some embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In some embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In some embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In some embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In some embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In some embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In some embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol moiety (PEG). In other embodiments, the linker comprises amino acids. In some embodiments, the linker comprises a peptide. In some embodiments, the linker comprises an aryl or heteroaryl moiety. In some embodiments, the linker is based on a phenyl ring. The linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.

In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 1361520), (G)n (SEQ ID NO: 1361521), (EAAAK)n (SEQ ID NO: 1361522), (GGS)n (SEQ ID NO: 1361523), (SGGS)n (SEQ ID NO: 1361524), (XP)n (SEQ ID NO: 1361525), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 1361526), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 1361527). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 1361528). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 1361529). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 1361530).
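The repeat-unit notation above, e.g., (GGS)n or (XP)n, can be expanded mechanically. The following is a minimal sketch, assuming that "between 1 and 30" is inclusive of both endpoints; the function names `repeat_linker` and `xp_linker` are hypothetical and introduced only for illustration.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Expand a repeat-unit linker such as (GGS)n into its amino acid sequence.

    Assumption: n is an integer from 1 to 30 inclusive.
    """
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return unit * n


def xp_linker(x_residues: str) -> str:
    """Expand (XP)n, where each X may be any amino acid.

    `x_residues` supplies one X per repeat, so n == len(x_residues).
    """
    if not 1 <= len(x_residues) <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return "".join(x + "P" for x in x_residues)


print(repeat_linker("GGS", 3))    # GGSGGSGGS
print(repeat_linker("GGGGS", 2))  # GGGGSGGGGS
print(xp_linker("AE"))            # APEP
```

Combinations of motifs, as contemplated above, could be formed by concatenating the returned strings.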

In particular, the following linkers can be used in various embodiments to join prime editor domains with one another:

GGS;

GGSGGS (SEQ ID NO: 1361523);

GGSGGSGGS (SEQ ID NO: 1361523);

SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 1361528);

SGSETPGTSESATPES (SEQ ID NO: 1361527);

SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 1361585).

(ii) Nuclear Localization Sequence (NLS)

In various embodiments, the PE fusion proteins may comprise one or more nuclear localization sequences (NLS), which help promote translocation of a protein into the cell nucleus. Such sequences are well-known in the art and can include the following examples:

Description                       Sequence                        SEQ ID NO:
NLS of SV40 large T-Ag            PKKKRKV                         SEQ ID NO: 1361531
NLS                               MKRTADGSEFESPKKKRKV             SEQ ID NO: 1361532
NLS                               MDSLLMNRRKFLYQFKNVRWAKGRRETYLC  SEQ ID NO: 1361533
NLS of nucleoplasmin              AVKRPAATKKAGQAKKKKLD            SEQ ID NO: 1361534
NLS of EGL-13                     MSRRRKANPTKLSENAKKLAKEVEN       SEQ ID NO: 1361535
NLS of c-MYC                      PAAKRVKLD                       SEQ ID NO: 1361536
NLS of TUS-protein                KLKIKRPVK                       SEQ ID NO: 1361537
NLS of polyoma large T-Ag         VSRKRPRP                        SEQ ID NO: 1361538
NLS of Hepatitis D virus antigen  EGAPPAKRAR                      SEQ ID NO: 1361539
NLS of murine p53                 PPQPKKKPLDGE                    SEQ ID NO: 1361540
NLS                               SGGSKRTADGSEFEPKKKRKV           SEQ ID NO: 1361541

The NLS examples above are non-limiting. The fusion proteins may comprise any known NLS sequence, including any of those described in Cokol et al., “Finding nuclear localization signals,” EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., “Mechanisms and Signals for the Nuclear Import of Proteins,” Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference.

In various embodiments, the prime editors and constructs encoding the prime editors disclosed herein further comprise one or more, preferably, at least two nuclear localization signals. In certain embodiments, the prime editors comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same NLSs or they can be different NLSs. In addition, the NLSs may be expressed as part of a fusion protein with the remaining portions of the prime editors. In some embodiments, one or more of the NLSs are bipartite NLSs (“bpNLS”). In certain embodiments, the disclosed fusion proteins comprise two bipartite NLSs. In some embodiments, the disclosed fusion proteins comprise more than two bipartite NLSs.

The location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a prime editor (e.g., inserted between the encoded napDNAbp component (e.g., Cas9) and a polymerase domain (e.g., a reverse transcriptase domain)).

The NLSs may be any known NLS sequence in the art. The NLSs may also be any future-discovered NLSs for nuclear localization. The NLSs also may be any naturally-occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).

The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., International PCT application PCT/EP2000/011690, filed Nov. 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 1361531), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 1361533), KRTADGSEFESPKKKRKV (SEQ ID NO: 1361659), or KRTADGSEFEPKKKRKV (SEQ ID NO: 1361660). In other embodiments, an NLS comprises the amino acid sequence NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 1361661), PAAKRVKLD (SEQ ID NO: 1361536), RQRRNELKRSF (SEQ ID NO: 1361662), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 1361663).

In one aspect of the disclosure, a prime editor may be modified with one or more nuclear localization signals (NLS), preferably at least two NLSs. In certain embodiments, the prime editors are modified with two or more NLSs. The disclosure contemplates the use of any nuclear localization signal known in the art at the time of the disclosure, or any nuclear localization signal that is identified or otherwise made available in the state of the art after the time of the instant filing. A representative nuclear localization signal is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization signals often comprise proline residues. A variety of nuclear localization signals have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7442-46; Moede et al., (1999) FEBS Lett. 461:229-34, each of which is incorporated by reference. Translocation is currently thought to involve nuclear pore proteins.
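The observation that nuclear localization signals are predominantly basic and rich in lysine and arginine can be checked numerically on the short NLS examples listed earlier. The following is a minimal sketch; the function name `basic_fraction` is hypothetical, and counting only K and R as basic residues is an assumption that follows the wording above (histidine is not counted).

```python
BASIC_RESIDUES = frozenset("KR")  # assumption: only lysine and arginine are counted


def basic_fraction(sequence: str) -> float:
    """Fraction of residues in `sequence` that are lysine (K) or arginine (R)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    return sum(1 for residue in seq if residue in BASIC_RESIDUES) / len(seq)


# Short NLS sequences taken from the examples above.
for name, seq in [("SV40 large T-Ag", "PKKKRKV"),
                  ("c-MYC", "PAAKRVKLD"),
                  ("polyoma large T-Ag", "VSRKRPRP")]:
    print(f"{name}: {basic_fraction(seq):.2f}")
```

For the SV40 large T antigen NLS, five of the seven residues are lysine or arginine.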

Most NLSs can be classified in three general groups: (i) a monopartite NLS exemplified by the SV40 large T antigen NLS (PKKKRKV (SEQ ID NO: 1361531)); (ii) a bipartite motif consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXXKKKL (SEQ ID NO: 1361664)); and (iii) noncanonical sequences such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991).
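The monopartite and bipartite groups described above lend themselves to simple pattern matching. The following is an illustrative sketch only, not a validated NLS predictor. Two assumptions are made: the bipartite pattern follows the KR...KKKL motif given above but permits 10 to 12 spacer residues, since the spacer is described as variable, and the monopartite pattern uses the commonly cited K(K/R)X(K/R) consensus, which is not recited in this disclosure. The function name `classify_nls` is hypothetical.

```python
import re

# Assumption: spacer of 10-12 residues between the two basic clusters.
BIPARTITE = re.compile(r"KR.{10,12}KKKL")
# Assumption: commonly cited monopartite consensus K(K/R)X(K/R).
MONOPARTITE = re.compile(r"K[KR].[KR]")


def classify_nls(sequence: str) -> str:
    """Return 'bipartite', 'monopartite', or 'unclassified' for a peptide."""
    seq = sequence.upper()
    # Check the longer bipartite motif first, because a bipartite NLS
    # usually also contains a short basic cluster.
    if BIPARTITE.search(seq):
        return "bipartite"
    if MONOPARTITE.search(seq):
        return "monopartite"
    return "unclassified"


print(classify_nls("PKKKRKV"))               # SV40 large T antigen NLS
print(classify_nls("AVKRPAATKKAGQAKKKKLD"))  # nucleoplasmin NLS
print(classify_nls("GGSGGS"))                # a linker, not an NLS
```

A sequence reported as "unclassified" here is not necessarily a noncanonical NLS; the sketch only recognizes the two patterned groups.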

Nuclear localization signals appear at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, the C-terminus, and in the central region of proteins. Thus, the disclosure provides prime editors that may be modified with one or more NLSs at the C-terminus, the N-terminus, as well as in an internal region of the prime editor. The residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example ionically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice, such a sequence can be functionally limited in length and composition.

The present disclosure contemplates any suitable means by which to modify a prime editor to include one or more NLSs. In one aspect, a prime editor may be expressed as a protein that is translationally fused at its N-terminus or its C-terminus (or both) to one or more NLSs, i.e., to form a prime editor-NLS fusion construct. In other embodiments, the prime editor-encoding nucleotide sequence may be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded prime editor. In addition, the NLSs may include various amino acid linkers or spacer regions encoded between the prime editor and the N-terminally, C-terminally, or internally-attached NLS amino acid sequence. Thus, the present disclosure also provides for nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise a prime editor and one or more NLSs.

The prime editors described herein may also comprise nuclear localization signals which are linked to a prime editor through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element. The linkers within the contemplated scope of the disclosure are not intended to have any limitations and can be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain) and be joined to the prime editor by any suitable strategy that effectuates forming a bond (e.g., covalent linkage, hydrogen bonding) between the prime editor and the one or more NLSs.

(iii) Flap Endonucleases (e.g., FEN1)

In various embodiments, the PE fusion proteins may comprise one or more flap endonucleases (e.g., FEN1), which refers to an enzyme that catalyzes the removal of 5′ single strand DNA flaps. These are naturally occurring enzymes that process the removal of 5′ flaps formed during cellular processes, including DNA replication. The prime editing methods herein described may utilize endogenously supplied flap endonucleases or those provided in trans to remove the 5′ flap of endogenous DNA formed at the target site during prime editing. Flap endonucleases are known in the art and can be found described in Patel et al., “Flap endonucleases pass 5′-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5′-ends,” Nucleic Acids Research, 2012, 40(10): 4507-4519 and Tsutakawa et al., “Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily,” Cell, 2011, 145(2): 198-211 (each of which is incorporated herein by reference). An exemplary flap endonuclease is FEN1, which can be represented by the following amino acid sequence:

FEN1 Wild type (wt) (SEQ ID NO: 1361542)

MGIQGLAKLIADVAPSAIRENDIKSYFGRKVAIDASMSIYQFLIAVRQGGDVLQN

EEGETTSHLMGMFYRTIRMMENGIKPVYVFDGKPPQLKSGELAKRSERRAEAEKQ

LQQAQAAGAEQEVEKFTKRLVKVTKQHNDECKHLLSLMGIPYLDAPSEAEASCAA

LVKAGKVYAAATEDMDCLTFGSPVLMRHLTASEAKKLPIQEFHLSRILQELGLNQ

EQFVDLCILLGSDYCESIRGIGPKRAVDLIQKHKSIEEIVRRLDPNKYPVPENWL

HKEAHQLFLEPEVLDPESVELKWSEPNEEELIKFMCGEKQFSEERIRSGVKRLSK

SRQGSTQGRLDDFFKVTGSLSSAKRKEPEPKGSTKKKAKTGAAGKFKRGK

The flap endonucleases may also include any FEN1 variant, mutant, or other flap endonuclease ortholog, homolog, or variant. Non-limiting examples are as follows:

FEN1 K168R (relative to FEN1 wt) (SEQ ID NO: 1361543)

MGIQGLAKLIADVAPSAIRENDIKSYFGRKVAIDASMSIYQFLIAVRQGGDVLQN

EEGETTSHLMGMFYRTIRMMENGIKPVYVFDGKPPQLKSGELAKRSERRAEAEKQ

LQQAQAAGAEQEVEKFTKRLVKVTKQHNDECKHLLSLMGIPYLDAPSEAEASCAA

LVRAGKVYAAATEDMDCLTFGSPVLMRHLTASEAKKLPIQEFHLSRILQELGLNQ

EQFVDLCILLGSDYCESIRGIGPKRAVDLIQKHKSIEEIVRRLDPNKYPVPENWL

HKEAHQLFLEPEVLDPESVELKWSEPNEEELIKFMCGEKQFSEERIRSGVKRLSK

SRQGSTQGRLDDFFKVTGSLSSAKRKEPEPKGSTKKKAKTGAAGKFKRGK

FEN1 S187A (relative to FEN1 wt) (SEQ ID NO: 1361544)

MGIQGLAKLIADVAPSAIRENDIKSYFGRKVAIDASMSIYQFLIAVRQGGDVLQN

EEGETTSHLMGMFYRTIRMMENGIKPVYVFDGKPPQLKSGELAKRSERRAEAEKQ

LQQAQAAGAEQEVEKFTKRLVKVTKQHNDECKHLLSLMGIPYLDAPSEAEASCAA

LVKAGKVYAAATEDMDCLTFGAPVLMRHLTASEAKKLPIQEFHLSRILQELGLNQ

EQFVDLCILLGSDYCESIRGIGPKRAVDLIQKHKSIEEIVRRLDPNKYPVPENWL

HKEAHQLFLEPEVLDPESVELKWSEPNEEELIKFMCGEKQFSEERIRSGVKRLSK

SRQGSTQGRLDDFFKVTGSLSSAKRKEPEPKGSTKKKAKTGAAGKFKRGK

FEN1 K354R (relative to FEN1 wt) (SEQ ID NO: 1361545)

MGIQGLAKLIADVAPSAIRENDIKSYFGRKVAIDASMSIYQFLIAVRQGGDVLQN

EEGETTSHLMGMFYRTIRMMENGIKPVYVFDGKPPQLKSGELAKRSERRAEAEKQ

LQQAQAAGAEQEVEKFTKRLVKVTKQHNDECKHLLSLMGIPYLDAPSEAEASCAA

LVKAGKVYAAATEDMDCLTFGSPVLMRHLTASEAKKLPIQEFHLSRILQELGLNQ

EQFVDLCILLGSDYCESIRGIGPKRAVDLIQKHKSIEEIVRRLDPNKYPVPENWL

HKEAHQLFLEPEVLDPESVELKWSEPNEEELIKFMCGEKQFSEERIRSGVKRLSK

SRQGSTQGRLDDFFKVTGSLSSARRKEPEPKGSTKKKAKTGAAGKFKRGK

GEN1 (SEQ ID NO: 1361546)

MGVNDLWQILEPVKQHIPLRNLGGKTIAVDLSLWVCEAQTVKKMMGSVMKPHLRN

LFFRISYLTQMDVKLVFVMEGEPPKLKADVISKRNQSRYGSSGKSWSQKTGRSHF

KSVLRECLHMLECLGIPWVQAAGEAEAMCAYLNAGGHVDGCLTNDGDTFLYGAQT

VYRNFTMNTKDPHVDCYTMSSIKSKLGLDRDALVGLAILLGCDYLPKGVPGVGKE

QALKLIQILKGQSLLQRFNRWNETSCNSSPQLLVTKKLAHCSVCSHPGSPKDHER

NGCRLCKSDKYCEPHDYEYCCPCEWHRTEHDRQLSEVENNIKKKACCCEGFPFHE

VIQEFLLNKDKLVKVIRYQRPDLLLFQRFTLEKMEWPNHYACEKLLVLLTHYDMI

ERKLGSRNSNQLQPIRIVKTRIRNGVHCFEIEWEKPEHYAMEDKQHGEFALLTIE

EESLFEAAYPEIVAVYQKQKLEIKGKKQKRIKPKENNLPEPDEVMSFQSHMTLKP

TCEIFHKQNSKLNSGISPDPTLPQESISASLNSLLLPKNTPCLNAQEQFMSSLRP

LAIQQIKAVSKSLISESSQPNTSSHNISVIADLHLSTIDWEGTSFSNSPAIQRNT

FSHDLKSEVESELSAIPDGFENIPEQLSCESERYTANIKKVLDEDSDGISPEEHL

LSGITDLCLQDLPLKERIFTKLSYPQDNLQPDVNLKTLSILSVKESCIANSGSDC

TSHLSKDLPGIPLQNESRDSKILKGDQLLQEDYKVNTSVPYSVSNTVVKTCNVRP

PNTALDHSRKVDMQTTRKILMKKSVCLDRHSSDEQSAPVFGKAKYTTQRMKHSSQ

KHNSSHFKESGHNKLSSPKIHIKETEQCVRSYETAENEESCFPDSTKSSLSSLQC

HKKENNSGTCLDSPLPLRQRLKLRFQST

ERCC5 (SEQ ID NO: 1361547)

MGVQGLWKLLECSGRQVSPEALEGKILAVDISIWLNQALKGVRDRHGNSIENPHL

LTLFHRLCKLLFFRIRPIFVEDGDAPLLKKQTLVKRRQRKDLASSDSRKTTEKLL

KTFLKRQAIKTAFRSKRDEALPSLTQVRRENDLYVLPPLQEEEKHSSEEEDEKEW

QERMNQKQALQEEFFHNPQAIDIESEDESSLPPEVKHEILTDMKEFTKRRRTLFE

AMPEESDDFSQYQLKGLLKKNYLNQHIEHVQKEMNQQHSGHIRRQYEDEGGFLKE

VESRRVVSEDTSHYILIKGIQAKTVAEVDSESLPSSSKMHGMSFDVKSSPCEKLK

TEKEPDATPPSPRTLLAMQAALLGSSSEEELESENRRQARGRNAPAAVDEGSISP

RTLSAIKRALDDDEDVKVCAGDDVQTGGPGAEEMRINSSTENSDEGLKVRDGKGI

PFTATLASSSVNSAEEHVASTNEGREPTDSVPKEQMSLVHVGTEAFPISDESMIK

DRKDRLPLESAVVRHSDAPGLPNGRELTPASPTCTNSVSKNETHAEVLEQQNELC

PYESKEDSSLLSSDDETKCKPNSASEVIGPVSLQETSSIVSVPSEAVDNVENVVS

FNAKEHENFLETIQEQQTTESAGQDLISIPKAVEPMEIDSEESESDGSFIEVQSV

ISDEELQAEFPETSKPPSEQGEEELVGTREGEAPAESESLLRDNSERDDVDGEPQ

EAEKDAEDSLHEWQDINLEELETLESNLLAQQNSLKAQKQQQERIAATVTGQMEL

ESQELLRLFGIPYIQAPMEAEAQCAILDLTDQTSGTITDDSDIWLFGARHVYRNF

FNKNKFVEYYQYVDFHNQLGLDRNKLINLAYLLGSDYTEGIPTVGCVTAMEILNE

FPGHGLEPLLKESEWWHEAQKNPKIRPNPHDTKVKKKLRTLQLTPGFPNPAVAEA

YLKPVVDDSKGSFLWGKPDLDKIREFCQRYFGWNRTKTDESLEPVLKQLDAQQTQ

LRIDSFFRLAQQEKEDAKRIKSQRLNRAVTCMLRKEKEAAASEIEAVSVAMEKEF

ELLDKAKRKTQKRGITNTLEESSSLKRKRLSDSKRKNTCGGFLGETCLSESSDGS

SSEDAESSSLMNVQRRTAAKEPKTSASDSQNSVKEAPVKNGGATTSSSSDSDDDG

GKEKMVLVTARSVFGKKRRKLRRARGRKRKT

In various embodiments, the prime editor fusion proteins contemplated herein may include any flap endonuclease variant of the above-disclosed sequences having an amino acid sequence that is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the above sequences.
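The point-variant notation used for the FEN1 examples above (e.g., K168R, S187A) can be applied programmatically. The following is a minimal sketch; the function name `apply_substitution` and the constant name are hypothetical. The fragment used is the fourth line of the wild type FEN1 listing, and its numbering as residues 166-220 is an assumption inferred from the 55-residues-per-line layout of that listing.

```python
import re


def apply_substitution(sequence: str, mutation: str, first_residue: int = 1) -> str:
    """Apply a point substitution such as 'K168R' to an amino acid sequence.

    `first_residue` is the residue number of sequence[0], so that a fragment
    of a longer protein can be edited using full-length numbering.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    old, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - first_residue
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    # Guard against numbering errors: the stated wild type residue must be present.
    if sequence[index] != old:
        raise ValueError(
            f"expected {old} at position {position}, found {sequence[index]}")
    return sequence[:index] + new + sequence[index + 1:]


# Assumed to span residues 166-220 of wild type FEN1 (fourth line of the listing).
FEN1_WT_166_220 = "LVKAGKVYAAATEDMDCLTFGSPVLMRHLTASEAKKLPIQEFHLSRILQELGLNQ"

print(apply_substitution(FEN1_WT_166_220, "K168R", first_residue=166))
print(apply_substitution(FEN1_WT_166_220, "S187A", first_residue=166))
```

Under that numbering assumption, the two printed fragments agree with the corresponding lines of the K168R and S187A listings above.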

Other endonucleases that may be utilized by the instant methods to facilitate removal of the 5′ end single strand DNA flap include, but are not limited to, (1) Trex 2 and (2) Exo1 endonuclease (e.g., Keijzers et al., Biosci Rep. 2015, 35(3): e00206).

Trex 2

3′ three prime repair exonuclease 2 (TREX2)-human

Accession No. NM_080701

(SEQ ID NO: 1361665)

MSEAPRAETFVFLDLEATGLPSVEPEIAELSLFAVHRSSLENPEHDESG

ALVLPRVLDKLTLCMCPERPFTAKASEITGLSSEGLARCRKAGFDGAVV

RTLQAFLSRQAGPICLVAHNGFDYDFPLLCAELRRLGARLPRDTVCLDT

LPALRGLDRAHSHGTRARGRQGYSLGSLFHRYFRAEPSAAHSAEGDVHT

LLLIFLHRAAELLAWADEQARGWAHIEPMYLPPDDPSLEA

3′ three prime repair exonuclease 2 (TREX2)-mouse

Accession No. NM_011907

(SEQ ID NO: 1361666)

MSEPPRAETFVFLDLEATGLPNMDPEIAEISLFAVHRSSLENPERDDSG

SLVLPRVLDKLTLCMCPERPFTAKASEITGLSSESLMHCGKAGFNGAVV

RTLQGFLSRQEGPICLVAHNGFDYDFPLLCTELQRLGAHLPQDTVCLDT

LPALRGLDRAHSHGTRAQGRKSYSLASLFHRYFQAEPSAAHSAEGDVHT

LLLIFLHRAPELLAWADEQARSWAHIEPMYVPPDGPSLEA

3′ three prime repair exonuclease 2 (TREX2)-rat

Accession No. NM_001107580

(SEQ ID NO: 1361667)

MSEPLRAETFVFLDLEATGLPNMDPEIAEISLFAVHRSSLENPERDDSG

SLVLPRVLDKLTLCMCPERPFTAKASEITGLSSEGLMNCRKAAFNDAVV

RTLQGFLSRQEGPICLVAHNGFDYDFPLLCTELQRLGAHLPRDTVCLDT

LPALRGLDRVHSHGTRAQGRKSYSLASLFHRYFQAEPSAAHSAEGDVNT

LLLIFLHRAPELLAWADEQARSWAHIEPMYVPPDGPSLEA

ExoI

Human exonuclease 1 (EXO1) has been implicated in many different DNA metabolic processes, including DNA mismatch repair (MMR), microhomology-mediated end-joining, homologous recombination (HR), and replication. Human EXO1 belongs to a family of eukaryotic nucleases, Rad2/XPG, which also includes FEN1 and GEN1. The Rad2/XPG family is conserved in the nuclease domain through species from phage to human. The EXO1 gene product exhibits both 5′ exonuclease and 5′ flap activity. Additionally, EXO1 contains an intrinsic 5′ RNase H activity. Human EXO1 has a high affinity for processing double stranded DNA (dsDNA), nicks, gaps, and pseudo Y structures, and can resolve Holliday junctions using its inherent flap activity. Human EXO1 is implicated in MMR and contains conserved binding domains interacting directly with MLH1 and MSH2. EXO1 nucleolytic activity is positively stimulated by PCNA, MutSα (MSH2/MSH6 complex), 14-3-3, MRN and 9-1-1 complex.

exonuclease 1 (EXO1) Accession No. NM_003686 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3)-isoform A

(SEQ ID NO: 1361668)

MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDR

YVGFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVS

EARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITEDSD

LLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDYLSSLRG

IGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLYQLVFDPIKR

KLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYNPDTAMPAHSRS

HSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVGVERVISTKGLNLPR

KSSIVKRPRSAELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEVFVPDLVNGPTNKKSV

STPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDCVSNKVSIQPLDETAVTDK

ENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDHIPDKATVFTDEESYSFESSKF

TRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQFRRKSDSPTSLPENNMSDVSQL

KSEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLSQCSSKDSDSEESDCNIKLLDSQS

DQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLSTTKIKPLGPARASGLSKKPASIQKR

KHHNAENKPGLQIKLNELWKNFGFKKF

exonuclease 1 (EXO1) Accession No. NM_006027 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3)-isoform B

(SEQ ID NO: 1361669)

MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDR

YVGFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVS

EARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITEDSD

LLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDYLSSLRG

IGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLYQLVFDPIKR

KLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYNPDTAMPAHSRS

HSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVGVERVISTKGLNLPR

KSSIVKRPRSAELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEVFVPDLVNGPTNKKSV

STPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDCVSNKVSIQPLDETAVTDK

ENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDHIPDKATVFTDEESYSFESSKF

TRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQFRRKSDSPTSLPENNMSDVSQL

KSEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLSQCSSKDSDSEESDCNIKLLDSQS

DQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLSTTKIKPLGPARASGLSKKPASIQKR

KHHNAENKPGLQIKLNELWKNFGFKKDSEKLPPCKKPLSPVRDNIQLTPEAEEDIFNKPE

CGRVQRAIFQ

exonuclease 1 (EXO1) Accession No. NM_001319224 (Homo sapiens exonuclease 1 (EXO1), transcript variant 4)-isoform C

(SEQ ID NO: 1361670)

MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDR

YVGFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVS

EARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITEDSD

LLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDYLSSLRG

IGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLYQLVFDPIKR

KLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYNPDTAMPAHSRS

HSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVGVERVISTKGLNLPR

KSSIVKRPRSELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEVFVPDLVNGPTNKKSVS

TPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDCVSNKVSIQPLDETAVTDKE

NNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDHIPDKATVFTDEESYSFESSKFT

RTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQFRRKSDSPTSLPENNMSDVSQLK

SEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLSQCSSKDSDSEESDCNIKLLDSQSD

QTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLSTTKIKPLGPARASGLSKKPASIQKRK

HHNAENKPGLQIKLNELWKNFGFKKDSEKLPPCKKPLSPVRDNIQLTPEAEEDIFNKPEC

GRVQRAIFQ

(iv) Inteins and Split-Inteins

It will be understood that in some embodiments (e.g., delivery of a prime editor in vivo using AAV particles), it may be advantageous to split a polypeptide (e.g., a deaminase or a napDNAbp) or a fusion protein (e.g., a prime editor) into an N-terminal half and a C-terminal half, deliver them separately, and then allow their colocalization to reform the complete protein (or fusion protein as the case may be) within the cell. Separate halves of a protein or a fusion protein may each comprise a split-intein tag to facilitate the reformation of the complete protein or fusion protein by the mechanism of protein trans-splicing.

Protein trans-splicing, catalyzed by split inteins, provides an entirely enzymatic method for protein ligation. A split-intein is essentially a contiguous intein (e.g., a mini-intein) split into two pieces named N-intein and C-intein, respectively. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction in essentially the same way as a contiguous intein does. Split inteins have been found in nature and also engineered in laboratories. As used herein, the term “split intein” refers to any intein in which one or more peptide bond breaks exist between the N-terminal and C-terminal amino acid sequences such that the N-terminal and C-terminal sequences become separate molecules that can non-covalently reassociate, or reconstitute, into an intein that is functional for trans-splicing reactions. Any catalytically active intein, or fragment thereof, may be used to derive a split intein for use in the methods of the invention. For example, in one aspect the split intein may be derived from a eukaryotic intein. In another aspect, the split intein may be derived from a bacterial intein. In another aspect, the split intein may be derived from an archaeal intein. Preferably, the split intein so-derived will possess only the amino acid sequences essential for catalyzing trans-splicing reactions.

Split inteins may be created from contiguous inteins by engineering one or more split sites in the unstructured loop or intervening amino acid sequence between the approximately 12 conserved beta-strands found in the structure of mini-inteins. Some flexibility in the position of the split site within regions between the beta-strands may exist, provided that creation of the split will not disrupt the structure of the intein, the structured beta-strands in particular, to a sufficient degree that protein splicing activity is lost.

Exemplary sequences are as follows:

NAME / SEQUENCE OF LIGAND-DEPENDENT INTEIN

2-4 INTEIN:

(SEQ ID NO: 1361671)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVVSWFD

QGTRDVIGLRIAGGAIVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLT

ADQMVSALLDAEPPILYSEYDPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVD

LTLHDQAHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDM

LLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHL

MAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYKNVVPLYDLLLEMLDA

HRLHAGGSGASRVQAFADALDDKFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHT

LVAEGVVVHNC

3-2 INTEIN

(SEQ ID NO: 1361672)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAVAKDGTLLARPVVSWFDQGTR

DVIGLRIAGGAIVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLTADQ

MVSALLDAEPPILYSEYDPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLH

DQAHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHLMAK

AGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYTNVVPLYDLLLEMLDAHRLH

AGGSGASRVQAFADALDDKFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVAE

GVVVHNC

30R3-1 INTEIN

(SEQ ID NO: 1361673)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVVSWFDQGTR

DVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLTADQ

MVSALLDAEPPIPYSEYDPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLH

DQAHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHLMAK

AGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYKNVVPLYDLLLEMLDAHRL

HAGGSGASRVQAFADALDDKFLHDMLAEGLRYSVIREVLPTRRARTFDLEVEELHTLV

AEGVVVHNC

30R3-2 INTEIN

(SEQ ID NO: 1361674)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVVSWFDQGTR

DVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLTADQ

MVSALLDAEPPILYSEYDPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLH

DQAHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHLMAK

AGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYKNVVPLYDLLLEMLDAHRL

HAGGSGASRVQAFADALDDKFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVA

EGVVVHNC

30R3-3 INTEIN

(SEQ ID NO: 1361675)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVVSWFDQGTR

DVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLTADQ

MVSALLDAEPPIPYSEYDPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLH

DQAHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHLMAK

AGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYKNVVPLYDLLLEMLDAHRL

HAGGSGASRVQAFADALDDKFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVA

EGVVVHNC

37R3-1 INTEIN

(SEQ ID NO: 1361676)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVVSWFDQGTR

DVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLTADQ

MVSALLDAEPPILYSEYNPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLH

DQAHLLERAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHLMAK

AGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYKNVVPLYDLLLEMLDAHRL

HAGGSGASRVQAFADALDDKFLHDMLAEGLRYSVIREVLPTRRARTFDLEVEELHTLV

AEGVVVHNC

37R3-2 INTEIN

(SEQ ID NO: 1361677)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAAAKDGTLLARPVVSWFDQGTR

DVIGLRIAGGAIVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLTADQ

MVSALLDAEPPILYSEYDPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLH

DQAHLLERAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHLMAK

AGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYKNVVPLYDLLLEMLDAHRL

HAGGSGASRVQAFADALDDKFLHDMLAEGLRYSVIREVLPTRRARTFDLEVEELHTLV

AEGVVVHNC

37R3-3 INTEIN

(SEQ ID NO: 1361678)

CLAEGTRIFDPVTGTTHRIEDVVDGRKPIHVVAVAKDGTLLARPVVSWFDQGTR

DVIGLRIAGGATVWATPDHKVLTEYGWRAAGELRKGDRVAGPGGSGNSLALSLTADQ

MVSALLDAEPPILYSEYDPTSPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLH

DQAHLLERAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLAT

SSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRALDKITDTLIHLMAK

AGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKYKNVVPLYDLLLEMLDAHRL

HAGGSGASRVQAFADALDDKFLHDMLAEELRYSVIREVLPTRRARTFDLEVEELHTLVA

EGVVVHNC

Although inteins are most frequently found as a contiguous domain, some exist in a naturally split form. In this case, the two fragments are expressed as separate polypeptides and must associate before splicing takes place, so-called protein trans-splicing.

An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely, DnaE-N and DnaE-C. The two different subunits are encoded by separate genes, namely dnaE-n and dnaE-c, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.

In addition, protein splicing in trans has been described in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998); Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b); Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998); Evans, et al., J. Biol. Chem. 275:9091 (2000); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999)) and provides the opportunity to express a protein as two inactive fragments that subsequently undergo ligation to form a functional product, e.g., as shown in FIGS. 66 and 67 with regard to the formation of a complete PE fusion protein from two separately-expressed halves.
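
The split-and-religate logic described above can be illustrated with a toy string model. The following Python sketch is only an illustration under stated assumptions and does not model splicing chemistry: `N_INTEIN` and `C_INTEIN` are hypothetical placeholder strings rather than intein sequences of this disclosure, and the function names are invented for the example.

```python
# Toy string model of protein trans-splicing with a split intein.
# N_INTEIN and C_INTEIN are hypothetical placeholders (lowercase so they
# cannot be confused with real one-letter amino acid sequences).
N_INTEIN = "nnnn"
C_INTEIN = "cccc"


def split_for_delivery(protein: str, split_site: int) -> tuple[str, str]:
    """Split a protein into two halves, each tagged with one intein fragment.

    The N-terminal half carries the N-intein at its C-terminus; the
    C-terminal half carries the C-intein at its N-terminus.
    """
    if not 0 < split_site < len(protein):
        raise ValueError("split_site must fall strictly inside the protein")
    n_half = protein[:split_site] + N_INTEIN
    c_half = C_INTEIN + protein[split_site:]
    return n_half, c_half


def trans_splice(n_half: str, c_half: str) -> str:
    """Remove the two intein fragments and join the flanking sequences."""
    if not n_half.endswith(N_INTEIN) or not c_half.startswith(C_INTEIN):
        raise ValueError("both halves must carry their intein fragment")
    n_extein = n_half[: -len(N_INTEIN)]
    c_extein = c_half[len(C_INTEIN):]
    return n_extein + c_extein


# Illustrative stand-in for a full-length fusion protein.
full_length = "MKRTADGSEFESPKKKRKV"
n_half, c_half = split_for_delivery(full_length, 10)
reconstituted = trans_splice(n_half, c_half)
```

In this model each half is incomplete on its own, and only the pairing of an N-intein-tagged half with a C-intein-tagged half yields the original sequence, which mirrors the two-vector delivery scenario discussed above.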

(v) RNA-Protein Recruitment System

In various embodiments, two separate protein domains (e.g., a Cas9 domain and a polymerase domain) may be colocalized to one another to form a functional complex (akin to the function of a fusion protein comprising the two separate protein domains) by using an “RNA-protein recruitment system,” such as the “MS2 tagging technique.” Such systems generally tag one protein domain with an “RNA-protein interaction domain” (aka “RNA-protein recruitment domain”) and the other with an “RNA-binding protein” that specifically recognizes and binds to the RNA-protein interaction domain, e.g., a specific hairpin structure. These types of systems can be leveraged to colocalize the domains of a prime editor, as well as to recruit additional functionalities to a prime editor, such as a UGI domain. In one example, the MS2 tagging technique is based on the natural interaction of the MS2 bacteriophage coat protein (“MCP” or “MS2cp”) with a stem-loop or hairpin structure present in the genome of the phage, i.e., the “MS2 hairpin.” In the case of the MS2 hairpin, it is recognized and bound by the MS2 bacteriophage coat protein (MCP). Thus, in one exemplary scenario a deaminase-MS2 fusion can recruit a Cas9-MCP fusion.

Other modular RNA-protein interaction domains are described in the art, for example, in Johansson et al., “RNA recognition by the MS2 phage coat protein,” Sem Virol., 1997, Vol. 8(3): 176-185; Delebecque et al., “Organization of intracellular reactions with rationally designed RNA assemblies,” Science, 2011, Vol. 333: 470-474; Mali et al., “Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering,” Nat. Biotechnol., 2013, Vol. 31: 833-838; and Zalatan et al., “Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds,” Cell, 2015, Vol. 160: 339-350, each of which is incorporated herein by reference in its entirety. Other systems include the PP7 hairpin, which specifically recruits the PCP protein, and the “com” hairpin, which specifically recruits the Com protein. See Zalatan et al.

The nucleotide sequence of the MS2 hairpin (or equivalently referred to as the “MS2 aptamer”) is: GCCAACATGAGGATCACCCATGTCTGCAGGGCC (SEQ ID NO: 1361679).

The amino acid sequence of the MCP or MS2cp is:

(SEQ ID NO: 1361680)

GSASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYKVTCS

VRQSSAQNRKYTIKVEVPKVATQTVGGEELPVAGWRSYLNMELTIPIFA

TNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY.

(vi) UGI Domain

In other embodiments, the prime editors described herein may comprise one or more uracil glycosylase inhibitor domains. The term “uracil glycosylase inhibitor (UGI)” or “UGI domain,” as used herein, refers to a protein that is capable of inhibiting a uracil-DNA glycosylase base-excision repair enzyme. In some embodiments, a UGI domain comprises a wild-type UGI or a UGI as set forth in SEQ ID NO: 1361681. In some embodiments, the UGI proteins provided herein include fragments of UGI and proteins homologous to a UGI or a UGI fragment. For example, in some embodiments, a UGI domain comprises a fragment of the amino acid sequence set forth in SEQ ID NO: 1361681. In some embodiments, a UGI fragment comprises an amino acid sequence that comprises at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid sequence as set forth in SEQ ID NO: 1361681. In some embodiments, a UGI comprises an amino acid sequence homologous to the amino acid sequence set forth in SEQ ID NO: 1361681, or an amino acid sequence homologous to a fragment of the amino acid sequence set forth in SEQ ID NO: 1361681. In some embodiments, proteins comprising UGI or fragments of UGI or homologs of UGI or UGI fragments are referred to as “UGI variants.” A UGI variant shares homology to UGI, or a fragment thereof. For example a UGI variant is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to a wild type UGI or a UGI as set forth in SEQ ID NO: 1361681. 
In some embodiments, the UGI variant comprises a fragment of UGI, such that the fragment is at least 70% identical, at least 80% identical, at least 90% identical, at least 95% identical, at least 96% identical, at least 97% identical, at least 98% identical, at least 99% identical, at least 99.5% identical, or at least 99.9% identical to the corresponding fragment of wild-type UGI or a UGI as set forth in SEQ ID NO: 1361681. In some embodiments, the UGI comprises the following amino acid sequence:

Uracil-DNA glycosylase inhibitor:

>sp|P14739|UNGI_BPPB2

(SEQ ID NO: 1361681)

MTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDE

STDENVMLLTSDAPEYKPWALVIQDSNGENKIKML.

The prime editors described herein may comprise more than one UGI domain, which may be separated by one or more linkers as described herein.
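
The percent-identity thresholds recited above for UGI variants can be made concrete with a small calculation. The following Python sketch is a minimal illustration, assuming the two sequences have already been aligned to equal length by an external alignment tool (with `-` marking gaps); the function names and the example variant are hypothetical and are not part of this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences share a residue.

    Assumes the inputs are pre-aligned and of equal length, with '-' used
    for gaps. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" and res_b == "-":
            continue  # column carries no information
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True when the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# First 20 residues of the UGI sequence shown above, and a hypothetical
# variant carrying a single substitution at position 2 (T to A).
reference = "MTNLSDIIEKETGKQLVIQE"
variant = "MANLSDIIEKETGKQLVIQE"
identity = percent_identity(reference, variant)
```

Here 19 of 20 aligned positions match, so the hypothetical variant would satisfy an "at least 95% identical" threshold but not an "at least 96% identical" one.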

(vii) Additional PE Elements

In certain embodiments, the prime editors described herein may comprise an inhibitor of base repair. The term “inhibitor of base repair” or “IBR” refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example a base excision repair enzyme. In some embodiments, the IBR is an inhibitor of OGG base excision repair. In some embodiments, the IBR is an inhibitor of base excision repair (“iBER”). Exemplary inhibitors of base excision repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 EndoI, T4PDG, UDG, bSMUG1, and hAAG. In some embodiments, the IBR is an inhibitor of Endo V or hAAG. In some embodiments, the IBR is an iBER that may be a catalytically inactive glycosylase or catalytically inactive dioxygenase or a small molecule or peptide inhibitor of an oxidase, or variants thereof. In some embodiments, the IBR is an iBER that may be a TDG inhibitor, MBD4 inhibitor or an inhibitor of an AlkBH enzyme. In some embodiments, the IBR is an iBER that comprises a catalytically inactive TDG or catalytically inactive MBD4. An exemplary catalytically inactive TDG is an N140A mutant of SEQ ID NO: 3872 (human TDG).

Some exemplary glycosylases are provided below. The catalytically inactivated variants of any of these glycosylase domains are iBERs that may be fused to the napDNAbp or polymerase domain of the prime editors provided in this disclosure.

OGG (human)

(SEQ ID NO: 1361682)

MPARALLPRRMGHRTLASTPALWASIPCPRSELRLDLVLPSGQSFRWRE

QSPAHWSGVLADQVWTLTQTEEQLHCTVYRGDKSQASRPTPDELEAVRK

YFQLDVTLAQLYHHWGSVDSHFQEVAQKFQGVRLLRQDPIECLFSFICS

SNNNIARITGMVERLCQAFGPRLIQLDDVTYHGFPSLQALAGPEVEAHL

RKLGLGYRARYVSASARAILEEQGGLAWLQQLRESSYEEAHKALCILPG

VGTKVADCICLMALDKPQAVPVDVHMWHIAQRDYSWHPTTSQAKGPSPQ

TNKELGNFFRSLWGPYAGWAQAVLFSADLRQSRHAQEPPAKRRKGSKGP

EG

MPG (human)

(SEQ ID NO: 1361683)

MVTPALQMKKPKQFCRRMGQKKQRPARAGQPHSSSDAAQAPAEQPHSSS

DAAQAPCPRERCLGPPTTPGPYRSIYFSSPKGHLTRLGLEFFDQPAVPL

ARAFLGQVLVRRLPNGTELRGRIVETEAYLGPEDEAAHSRGGRQTPRNR

GMFMKPGTLYVYIIYGMYFCMNISSQGDGACVLLRALEPLEGLETMRQL

RSTLRKGTASRVLKDRELCSGPSKLCQALAINKSFDQRDLAQDEAVWLE

RGPLEPSEPAVVAAARVGVGHAGEWARKPLRFYVRGSPWVSVVDRVAEQ

DTQA

MBD4 (human)

(SEQ ID NO: 1361684)

MGTTGLESLSLGDRGAAPTVTSSERLVPDPPNDLRKEDVAMELERVGED

EEQMMIKRSSECNPLLQEPIASAQFGATAGTECRKSVPCGWERVVKQRL

FGKTAGRFDVYFISPQGLKFRSKSSLANYLHKNGETSLKPEDFDFTVLS

KRGIKSRYKDCSMAALTSHLQNQSNNSNWNLRTRSKCKKDVFMPPSSSS

ELQESRGLSNFTSTHLLLKEDEGVDDVNFRKVRKPKGKVTILKGIPIKK

TKKGCRKSCSGFVQSDSKRESVCNKADAESEPVAQKSQLDRTVCISDAG

ACGETLSVTSEENSLVKKKERSLSSGSNFCSEQKTSGIINKFCSAKDSE

HNEKYEDTFLESEEIGTKVEVVERKEHLHTDILKRGSEMDNNCSPTRKD

FTGEKIFQEDTIPRTQIERRKTSLYFSSKYNKEALSPPRRKAFKKWTPP

RSPFNLVQETLFHDPWKLLIATIFLNRTSGKMAIPVLWKFLEKYPSAEV

ARTADWRDVSELLKPLGLYDLRAKTIVKFSDEYLTKQWKYPIELHGIGK

YGNDSYRIFCVNEWKQVHPEDHKLNKYHDWLWENHEKLSLS

TDG (human)

(SEQ ID NO: 1361685)

MEAENAGSYSLQQAQAFYTFPFQQLMAEAPNMAVVNEQQMPEEVPAPAP

AQEPVQEAPKGRKRKPRTTEPKQPVEPKKPVESKKSGKSAKSKEKQEKI

TDTFKVKRKVDRFNGVSEAELLTKTLPDILTFNLDIVIIGINPGLMAAY

KGHHYPGPGNHFWKCLFMSGLSEVQLNHMDDHTLPGKYGIGFTNMVERT

TPGSKDLSSKEFREGGRILVQKLQKYQPRIAVFNGKCIYEIFSKEVFGV

KVKNLEFGLQPHKIPDTETLCYVMPSSSARCAQFPRAQDKVHYYIKLKD

LRDQLKGIERNMDVQEVQYTFDLQLAQEDAKKMAVKEEKYDPGYEAAYG

GAYGENPCSSEPCGFSSNGLIESVELRGESAFSGIPNGQWMTQSFTDQI

PSFSNHCGTQEQEEESHA

In some embodiments, the fusion proteins described herein may comprise one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the prime editor components). A fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Other exemplary features that may be present are localization sequences, such as cytoplasmic localization sequences, export sequences, such as nuclear export sequences, or other localization sequences, as well as sequence tags that are useful for solubilization, purification, or detection of the fusion proteins.

Examples of protein domains that may be fused to a prime editor or component thereof (e.g., the napDNAbp domain, the polymerase domain, or the NLS domain) include, without limitation, epitope tags, and reporter gene sequences. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). A prime editor may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including, but not limited to, maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a prime editor are described in US Patent Publication No. 2011/0059502, published Mar. 10, 2011 and incorporated herein by reference in its entirety.

In an aspect of the disclosure, a reporter gene which includes, but is not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP), may be introduced into a cell to encode a gene product which serves as a marker by which to measure the alteration or modification of expression of the gene product. In certain embodiments of the disclosure the gene product is luciferase. In a further embodiment of the disclosure the expression of the gene product is decreased.

Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags, also referred to as histidine tags or His-tags, maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Additional suitable sequences will be apparent to those of skill in the art. In some embodiments, the fusion protein comprises one or more His tags.

In some embodiments of the present disclosure, the activity of the prime editing system may be temporally regulated by adjusting the residence time, the amount, and/or the activity of the expressed components of the PE system. For example, as described herein, the PE may be fused with a protein domain that is capable of modifying the intracellular half-life of the PE. In certain embodiments involving two or more vectors (e.g., a vector system in which the components described herein are encoded on two or more separate vectors), the activity of the PE system may be temporally regulated by controlling the timing in which the vectors are delivered. For example, in some embodiments a vector encoding the nuclease system may deliver the PE prior to the vector encoding the template. In other embodiments, the vector encoding the PEgRNA may deliver the guide prior to the vector encoding the PE system. In some embodiments, the vectors encoding the PE system and PEgRNA are delivered simultaneously. In certain embodiments, the simultaneously delivered vectors temporally deliver, e.g., the PE, PEgRNA, and/or second strand guide RNA components. In further embodiments, the RNA (such as, e.g., the nuclease transcript) transcribed from the coding sequence on the vectors may further comprise at least one element that is capable of modifying the intracellular half-life of the RNA and/or modulating translational control. In some embodiments, the half-life of the RNA may be increased. In some embodiments, the half-life of the RNA may be decreased. In some embodiments, the element may be capable of increasing the stability of the RNA. In some embodiments, the element may be capable of decreasing the stability of the RNA. In some embodiments, the element may be within the 3′ UTR of the RNA. In some embodiments, the element may include a polyadenylation signal (PA). In some embodiments, the element may include a cap, e.g., an upstream mRNA or PEgRNA end.
In some embodiments, the RNA may comprise no PA such that it is subject to quicker degradation in the cell after transcription. In some embodiments, the element may include at least one AU-rich element (ARE). The AREs may be bound by ARE binding proteins (ARE-BPs) in a manner that is dependent upon tissue type, cell type, timing, cellular localization, and environment. In some embodiments the destabilizing element may promote RNA decay, affect RNA stability, or activate translation. In some embodiments, the ARE may be 50 to 150 nucleotides in length. In some embodiments, the ARE may comprise at least one copy of the sequence AUUUA. In some embodiments, at least one ARE may be added to the 3′ UTR of the RNA. In some embodiments, the element may be a Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE), which creates a tertiary structure to enhance expression from the transcript. In further embodiments, the element is a modified and/or truncated WPRE sequence that is capable of enhancing expression from the transcript, as described, for example in Zufferey et al., J Virol, 73(4): 2886-92 (1999) and Flajolet et al., J Virol, 72(7): 6175-80 (1998). In some embodiments, the WPRE or equivalent may be added to the 3′ UTR of the RNA. In some embodiments, the element may be selected from other RNA sequence motifs that are enriched in either fast- or slow-decaying transcripts.
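
As an illustration of the AU-rich element criterion mentioned above (at least one copy of the sequence AUUUA), the following Python sketch counts AUUUA pentamers in a candidate 3′ UTR. It is a minimal sketch under stated assumptions: the function names and the example sequence are hypothetical, DNA-style input is simply converted to RNA letters, and the presence of the pentamer alone is not asserted to establish ARE function.

```python
import re

# A zero-width lookahead lets overlapping pentamers (e.g., AUUUAUUUA)
# each be counted once per start position.
ARE_PENTAMER = re.compile(r"(?=AUUUA)")


def count_are_pentamers(utr: str) -> int:
    """Count possibly overlapping AUUUA pentamers in an RNA or DNA string."""
    rna = utr.upper().replace("T", "U")
    return sum(1 for _ in ARE_PENTAMER.finditer(rna))


def has_are_pentamer(utr: str) -> bool:
    """True when the sequence contains at least one copy of AUUUA."""
    return count_are_pentamers(utr) >= 1


# Hypothetical 3' UTR fragment containing two overlapping pentamers.
example_utr = "GCAUUUAUUUAGC"
pentamer_count = count_are_pentamers(example_utr)
```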

In some embodiments, the vector encoding the PE or the PEgRNA may be self-destroyed via cleavage of a target sequence present on the vector by the PE system. The cleavage may prevent continued transcription of a PE or a PEgRNA from the vector. Although transcription may occur on the linearized vector for some amount of time, the expressed transcripts or proteins subject to intracellular degradation will have less time to produce off-target effects without continued supply from expression of the encoding vectors.

F. PE Complexes

In various embodiments, the prime editor fusion proteins disclosed herein may be complexed with a PEgRNA disclosed herein. Thus, further provided herein are complexes comprising (i) any of the PE fusion proteins provided herein, and (ii) a PEgRNA (e.g., a therapeutic PEgRNA of the Sequence Listing) bound to the napDNAbp (e.g., Cas9 domain) of the PE fusion protein. Without wishing to be bound by any particular theory, the PE fusion proteins can be directed to a desired target site in a programmable manner by designing a suitable PEgRNA to specifically and efficiently target the PE fusion protein to a desired genomic target site to install the desired mutation or other genetic modification. However, in some cases the suitability of a target site for prime editing can be dependent on the presence of a suitably positioned PAM. The broadened PAM compatibility of the Cas9 domains provided herein has the potential to expand the targeting scope of base editors to those target sites that do not lie within approximately 15 nucleotides of a canonical 5′-NGG-3′ PAM sequence. A person of ordinary skill in the art will be able to design a suitable PEgRNA sequence to target a desired genomic sequence based on this disclosure and knowledge in the field.

In some embodiments, PE1 is complexed with a PEgRNA and has the following composition: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)]. The structure is shown as:

Description: PE1 complex, Cas9 (H840A)-MMLV_RT (wt) + a PEgRNA

Sequence:

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKK

NLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE

EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIE

GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK

NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS

DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYA

GYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAIL

RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA

SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV

DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS

GKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGIL

QTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV

ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNR

GKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQI

TKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA

NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS

DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQ

AENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFP

QAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP

WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFF

CLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLIL

LQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTE

ARKETVMGQPTPKTPRQLREFLGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQ

EIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPP

CLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQR

KAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEI

YRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAA

ITETPDTSTLLIENSSP
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361582)

+ a PEgRNA.

Key:

Nuclear localization sequence (NLS) Top (SEQ ID NO: 1361532); Bottom (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker (SEQ ID NO: 1361528)

M-MLV reverse transcriptase (SEQ ID NO: 1361497)

In some embodiments, PE2 is complexed with a PEgRNA. This complex may be referred to as “PE3” and has the following composition: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)]+PEgRNA. The structure is shown as:

Description: PE3, PE2 + PEgRNA; PE2 is Cas9 (H840A)-MMLV_RT (D200N, T330P, L603W, T306K, W313F)

Sequence:

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKK

NLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE

EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIE

GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK

NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS

DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYA

GYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAIL

RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA

SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV

DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEEN

EDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS

GKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGIL

QTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV

ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNR

GKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQI

TKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA

NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS

DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQ

AENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFP

QAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP

WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFF

CLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLIL

LQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTE

ARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQ

EIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPP

CLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQR

KAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEI

YRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAA

ITETPDTSTLLIENSSP
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361583)

+ PEgRNA.

Key:

Nuclear localization sequence (NLS) Top (SEQ ID NO: 1361532); Bottom (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker
(SEQ ID NO: 1361528)

M-MLV reverse transcriptase (SEQ ID NO: 1361514)

In some embodiments, the PEgRNA is about 15-100 nucleotides long and comprises a sequence of at least 10 contiguous nucleotides that is complementary to a target sequence. In some embodiments, the guide RNA is 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides long. In some embodiments, the guide RNA comprises a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides that is complementary to a target sequence. In some embodiments, the target sequence is a DNA sequence. In some embodiments, the target sequence is in the genome of an organism. In some embodiments, the organism is a prokaryote. In some embodiments, the prokaryote is a bacterium. In some embodiments, the bacterium is E. coli. In some embodiments, the organism is a eukaryote. In some embodiments, the organism is a plant or fungus. In some embodiments, the organism is a vertebrate. In some embodiments, the vertebrate is a mammal. In some embodiments, the mammal is a human. In some embodiments, the organism is a cell. In some embodiments, the cell is a human cell. In some embodiments, the cell is a HEK293T or U2OS cell.
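
The length and complementarity criteria above can be checked computationally. The following Python sketch is illustrative only and is not part of the disclosed methods. The function names and the example sequences are hypothetical, and the default thresholds are assumptions taken from the "about 15-100 nucleotides" and "at least 10 contiguous nucleotides" ranges recited above.

```python
# Illustrative sketch only: helper names and defaults are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(guide_rna: str, target_dna: str) -> int:
    """Length of the longest contiguous guide stretch that can base-pair with target_dna.

    A guide stretch pairs with the target wherever the reverse complement of
    that stretch occurs in the target strand (both written 5' to 3').
    """
    guide = guide_rna.upper().replace("U", "T")
    target = target_dna.upper()
    best = 0
    n = len(guide)
    for start in range(n):
        # Only try stretches longer than the best one found so far.
        for end in range(start + best + 1, n + 1):
            if reverse_complement(guide[start:end]) in target:
                best = end - start
            else:
                break  # a longer stretch from this start cannot match either
    return best


def meets_guide_criteria(guide_rna: str, target_dna: str,
                         min_len: int = 15, max_len: int = 100,
                         min_run: int = 10) -> bool:
    """Check the overall length and the minimum complementary stretch."""
    if not (min_len <= len(guide_rna) <= max_len):
        return False
    return longest_complementary_run(guide_rna, target_dna) >= min_run
```

For instance, a hypothetical 20-nucleotide guide derived from a window of a target strand would pass `meets_guide_criteria`, whereas a guide of 12 nucleotides would fail on length alone.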

In some embodiments, the target sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target sequence comprises a point mutation associated with a disease or disorder. In some embodiments, the target sequence comprises a T→C point mutation. In some embodiments, the complex deaminates the target C point mutation, wherein the deamination results in a sequence that is not associated with a disease or disorder. In some embodiments, the target C point mutation is present in the DNA strand that is not complementary to the guide RNA. In some embodiments, the target sequence comprises a T→A point mutation. In some embodiments, the complex deaminates the target A point mutation, wherein the deamination results in a sequence that is not associated with a disease or disorder. In some embodiments, the target A point mutation is present in the DNA strand that is not complementary to the guide RNA.

In some embodiments, the PE complex further comprises a guide RNA for carrying out second strand nicking, which refers to the introduction of a second nick on the unedited strand at a location downstream of the first nick (i.e., the initial nick site that provides the free 3′ end for use in priming of the reverse transcriptase on the extended portion of the guide RNA). In some embodiments, the first nick and the second nick are on opposite strands. In another embodiment, the first nick is on the non-target strand (i.e., the strand that forms the single strand portion of the R-loop), and the second nick is on the target strand. The second nick is positioned at least 5 nucleotides downstream of the first nick, or at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides downstream of the first nick. Without being bound by theory, the second nick biases the cell's endogenous DNA repair and replication processes towards replacement of the unedited strand, rather than replacement of the desired edited strand. In some embodiments, the edited strand is the non-target strand and the unedited strand is the target strand. In other embodiments, the edited strand is the target strand, and the unedited strand is the non-target strand.
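
The spacing requirement for the second nick can be expressed as a simple filter over candidate nick positions. The Python sketch below is a hedged illustration, not a disclosed algorithm. It assumes that both nick positions are given as integer coordinates on one shared axis that increases in the downstream direction, and the function name and the example coordinates are hypothetical.

```python
# Illustrative sketch only: coordinates and names are assumptions.
from typing import Iterable, List


def candidate_second_nicks(first_nick: int,
                           candidates: Iterable[int],
                           min_offset: int = 5) -> List[int]:
    """Keep candidate second-nick positions at least min_offset nucleotides
    downstream of the first nick.

    All positions share one coordinate axis that increases downstream.
    """
    return [pos for pos in candidates if pos - first_nick >= min_offset]


# Example with made-up coordinates: only sites 5 or more nucleotides
# downstream of position 100 are retained.
allowed = candidate_second_nicks(100, [98, 103, 105, 140])
```

Raising `min_offset` (for example to 30) narrows the set of acceptable second-nick sites in the same way as the larger minimum distances listed above.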

The second strand nick can be installed using a second guide RNA, which complexes with the PE fusion protein at a second but nearby protospacer sequence and installs a nick.

In some embodiments, the insertion of the second strand nick can occur after the installation of the desired edit. This concept is referred to as “temporal second-strand nicking.” This avoids concurrent nicks on both strands that could lead to double-stranded DNA breaks. Temporal second strand nicking could be implemented in a variety of ways, including by introducing the second guide RNA after the desired edit has been made.

In one embodiment, the present disclosure provides the “PE3b” complex, which refers to the PE3 complex plus the guide RNA for second strand nicking. This complex has the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)]+PEgRNA+second strand nicking guide RNA. PE3b has the following structure:

PE3b

MKRTADGSEFESPKKKRKV
DKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKK

PE2 +

NLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE

PEgRNA +

EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIE

Second

GDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK

strand

NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLS

nicking

DAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYA

guide RNA

GYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAIL

PE2 is

RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA

Cas9 (H840A)-

SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV

MMLV_RT

DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEEN

D200N

EDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQS

T330P

GKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGIL

L603W

QTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV

T306K

ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNR

W313F

GKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQI

TKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLN

AVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLA

NGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNS

DKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI

DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY

EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQ

AENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SGGSSGGSSGSETPGTSESATPESSGGSSGGSS

TLNIEDEYRLHETSKEPDVSLGSTWLSDFP

QAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSP

WNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFF

CLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLIL

LQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTE

ARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQ

EIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPP

CLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQ

FGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQR

KAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEI

YRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAA

ITETPDTSTLLIENSSP
SGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 1361583)

+ PEgRNA

+ second strand nicking guide RNA.

key:

Nuclear localization sequence (NLS) Top (SEQ ID NO: 1361532); Bottom (SEQ ID NO: 1361541)

Cas9(H840A) (SEQ ID NO: 1361454)

33-amino acid linker
(SEQ ID NO: 1361528)

M-MLV reverse transcriptase (SEQ ID NO: 1361514)

The PE complexes may be delivered to cells as intact fusion complexes. The PE complexes may also be delivered to cells using one or more expression vectors (e.g., lentivirus vectors or adeno-associated virus vectors). For example, delivery of the desired PE complex could include a single expression vector that encodes the PE fusion protein and the PEgRNA and the optional second strand guide RNA from the same or different promoters. In another example, delivery of the desired PE complex could include two or more expression vectors, with the PE fusion protein encoded on a first expression vector and the PEgRNA and/or the optional second strand guide RNA encoded on a second expression vector.

The PEgRNA that may be included as part of these complexes includes any of the therapeutic PEgRNAs included in the Sequence Listing, e.g., the whole PEgRNA sequences of SEQ ID NOs: 1-135514 or 813085-880462.

IV. PE Methods and Treatments

In another aspect, the specification provides methods for editing a target DNA sequence with a “prime editor”. As used herein, the term “prime editing” refers to a novel approach for gene editing using napDNAbps, polymerases, and specialized guide RNAs as described in the present application and which is exemplified in the embodiments of FIG. 1A-1H. Prime editing may also be described as “target-primed reverse transcription” (TPRT) because the target DNA molecule is used to prime the synthesis of a strand of DNA by a polymerase (e.g., reverse transcriptase). The use of the term “reverse transcription” in the name “target-primed reverse transcription” is not intended to limit prime editing to the use of reverse transcriptases, but rather TPRT or prime editors may comprise any polymerase (e.g., DNA-dependent DNA polymerase or RNA-dependent DNA polymerase). In various embodiments, prime editing operates by contacting a target DNA molecule (for which a change in the nucleotide sequence is desired to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editor guide RNA. In reference to FIG. 1E, the prime editor guide RNA comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA, and encodes the desired nucleotide change (e.g., single nucleotide change, insertion, or deletion). In step (a), the napDNAbp/extended gRNA complex contacts the DNA molecule and the extended gRNA guides the napDNAbp to bind to a target locus. In step (b), a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. 
In some embodiments, the nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence, i.e., the “non-target strand.” The nick, however, could be introduced in either of the strands. That is, the nick could be introduced into the R-loop “target strand” (i.e., the strand hybridized to the protospacer of the extended gRNA) or the “non-target strand” (i.e., the strand forming the single-stranded portion of the R-loop and which is complementary to the target strand). In step (c), the 3′ end of the DNA strand (formed by the nick) interacts with the extended portion of the guide RNA in order to prime reverse transcription (i.e., “target-primed RT”). In some embodiments, the 3′ end DNA strand hybridizes to a specific primer binding site on the extended portion of the guide RNA, i.e., the “reverse transcriptase priming sequence.” In step (d), a reverse transcriptase is introduced which synthesizes a single strand of DNA from the 3′ end of the primed site towards the 3′ end of the prime editor guide RNA. This forms a single-strand DNA flap comprising the desired nucleotide change (e.g., the single base change, insertion, or deletion, or a combination thereof) and which is otherwise homologous to the endogenous DNA at or adjacent to the nick site. In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to the resolution of the single strand DNA flap such that the desired nucleotide change becomes incorporated into the target locus. This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the endogenous DNA sequence. Without being bound by theory, the cell's endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change(s) to form the desired altered product. 
The process can also be driven towards product formation with “second strand nicking,” as exemplified in FIG. 1D. This process may introduce one or more of the following genetic changes: transversions, transitions, deletions, and insertions.
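
Steps (b) through (g) described above can be sketched in silico as string operations on the nicked strand. The following Python sketch is a deliberately simplified illustration and not a description of the disclosed compositions. It assumes a 3′ extension laid out 5′ to 3′ as an RT template followed by a primer binding site, treats flap resolution as direct replacement of the endogenous sequence, and ignores the napDNAbp, PAM, and repair machinery entirely. All function names, parameters, and sequences are hypothetical.

```python
# Illustrative sketch only: a toy string model of target-primed reverse transcription.
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(rna: str) -> str:
    """Write an RNA string with DNA letters (U becomes T)."""
    return rna.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def simulate_prime_edit(nicked_strand: str, nick_index: int,
                        extension_rna: str, pbs_len: int,
                        replaced_len: Optional[int] = None) -> str:
    """Return the edited strand (5' to 3') after a toy prime-editing cycle.

    nicked_strand: strand that receives the nick, written 5' to 3'.
    nick_index:    the nick lies between positions nick_index-1 and nick_index.
    extension_rna: assumed layout 5'-[RT template][primer binding site]-3'.
    replaced_len:  endogenous bases displaced by the flap (defaults to the
                   flap length, i.e. a substitution-type edit).
    """
    if pbs_len <= 0 or pbs_len >= len(extension_rna):
        raise ValueError("pbs_len must leave room for an RT template")
    ext = to_dna(extension_rna)
    rt_template, pbs = ext[:-pbs_len], ext[-pbs_len:]

    # Step (c): the free 3' end upstream of the nick must anneal to the PBS.
    primer = nicked_strand[max(0, nick_index - pbs_len):nick_index].upper()
    if reverse_complement(pbs) != primer:
        raise ValueError("3' end of the nicked strand does not anneal to the PBS")

    # Step (d): the polymerase copies the RT template into a 3' DNA flap.
    flap = reverse_complement(rt_template)

    # Steps (f)-(g): the flap displaces the corresponding endogenous bases.
    if replaced_len is None:
        replaced_len = len(flap)
    return (nicked_strand[:nick_index].upper() + flap
            + nicked_strand[nick_index + replaced_len:].upper())
```

In this toy model, passing a `replaced_len` smaller or larger than the flap length represents an insertion or a deletion, respectively.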

The term “prime editor (PE)” or “prime editor” refers to the compositions involved in the method of genome editing using target-primed reverse transcription (TPRT) described herein, including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editor guide RNAs, and complexes comprising fusion proteins and prime editor guide RNAs, as well as accessory elements, such as second strand nicking components and 5′ endogenous DNA flap removal endonucleases for helping to drive the prime editing process towards the edited product formation.

In various embodiments, the prime editor (PE) system may include the use of an error-prone reverse transcriptase for performing targeted mutagenesis, i.e., to mutate only a well-defined stretch of DNA in a genome or other DNA element in a cell. FIG. 22 provides a schematic of an exemplary process for conducting targeted mutagenesis with an error-prone reverse transcriptase on a target locus using a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editor guide RNA. This process may be referred to as an embodiment of prime editing for targeted mutagenesis. The prime editor guide RNA comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA. In step (a), the napDNAbp/gRNA complex contacts the DNA molecule and the gRNA guides the napDNAbp to bind to the target locus to be mutagenized. In step (b), a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. In certain embodiments, the nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence. In step (c), the 3′ end DNA strand interacts with the extended portion of the guide RNA in order to prime reverse transcription. In some embodiments, the 3′ ended DNA strand hybridizes to a specific primer binding site on the extended portion of the guide RNA. In step (d), an error-prone reverse transcriptase is introduced which synthesizes a mutagenized single strand of DNA from the 3′ end of the primed site towards the 3′ end of the guide RNA. Exemplary mutations are indicated with an asterisk “*”. This forms a single-strand DNA flap comprising the desired mutagenized region. In step (e), the napDNAbp and guide RNA are released. 
Steps (f) and (g) relate to the resolution of the single strand DNA flap (comprising the mutagenized region) such that the desired mutagenized region becomes incorporated into the target locus. This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the complementary sequence on the other strand. The process can also be driven towards product formation with second strand nicking, as exemplified in FIG. 1D. Following endogenous DNA repair and/or replication processes, the mutagenized region becomes incorporated into both strands of DNA of the DNA locus.
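
The effect of an error-prone reverse transcriptase on the synthesized flap can be illustrated with a small simulation. The Python sketch below is a hedged toy model and not a description of any particular enzyme. It assumes a uniform, independent per-base substitution probability (`error_rate`), models substitutions only, and uses hypothetical function names; a seeded `random.Random` instance is passed in so that runs are repeatable.

```python
# Illustrative sketch only: a toy model of error-prone flap synthesis.
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def faithful_flap(rt_template_rna: str) -> str:
    """DNA flap an error-free polymerase would synthesize from the template."""
    return reverse_complement(rt_template_rna.upper().replace("U", "T"))


def error_prone_flap(rt_template_rna: str, error_rate: float,
                     rng: random.Random) -> str:
    """Synthesize the flap, substituting each base with probability error_rate."""
    flap = []
    for base in faithful_flap(rt_template_rna):
        if rng.random() < error_rate:
            # Misincorporation: choose one of the three other bases.
            flap.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            flap.append(base)
    return "".join(flap)


def mark_mutations(reference: str, mutated: str) -> str:
    """Return a track with '*' under each mutated position, as in the figure."""
    return "".join("*" if r != m else " " for r, m in zip(reference, mutated))
```

Printing the mutated flap above the output of `mark_mutations` gives a text analogue of the asterisk annotation described for FIG. 22.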

In other embodiments, PEgRNA may be used in the prime editing systems described herein to install a genetic change into a target DNA sequence. An exemplary such process is depicted in FIG. 29. The figure provides a schematic of the interaction of a typical PEgRNA with a target site of a double-stranded DNA. The double-stranded DNA is shown with the top strand in the 3′ to 5′ orientation and the lower strand in the 5′ to 3′ direction. The top strand comprises the “protospacer” and the PAM sequence and is referred to as the “target strand.” The complementary lower strand is referred to as the “non-target strand.” Although not shown, the PEgRNA depicted would be complexed with a Cas9 or equivalent. As shown in the schematic, the spacer of the PEgRNA anneals to a complementary region on the target strand, which is referred to as the protospacer, which is located just downstream of the PAM sequence and is approximately 20 nucleotides in length. This interaction forms a DNA/RNA hybrid between the spacer RNA and the protospacer DNA, and induces the formation of an R loop in the region opposite the protospacer. As taught elsewhere herein, the Cas9 protein (not shown) then induces a nick in the non-target strand, as shown. This then leads to the formation of the 3′ ssDNA flap region which, in accordance with *z*, interacts with the 3′ end of the PEgRNA at the primer binding site. The 3′ end of the ssDNA flap (i.e., the reverse transcriptase primer sequence) anneals to the primer binding site (A) on the PEgRNA, thereby priming reverse transcription. Next, reverse transcriptase (e.g., provided in trans or provided in cis as a fusion protein attached to the Cas9 construct) polymerizes a single strand of DNA which is coded for by the edit template (B) and homology arm (C). The polymerization continues towards the 5′ end of the extension arm. The polymerized strand of ssDNA forms a ssDNA 3′ end flap which, as described elsewhere (e.g., as shown in FIG. 1E), invades the endogenous DNA, displacing the corresponding endogenous strand (which is removed as a 5′ DNA flap of endogenous DNA), and installing the desired nucleotide edit (single nucleotide base pair change, deletions, or insertions (including whole genes)) through naturally occurring DNA repair/replication rounds.
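
The relationship among the primer binding site (A), edit template (B), and homology arm (C) can be made concrete with a short sketch that assembles a 3′ extension arm from the desired DNA sequence. The Python code below is an illustrative assumption-laden example, not a disclosed design procedure. It assumes the extension arm is read 5′ to 3′ as homology arm, then edit template, then primer binding site, consistent with polymerization starting at the primer binding site and proceeding towards the 5′ end of the extension arm. The class name, function name, and parameters are hypothetical.

```python
# Illustrative sketch only: assembling a toy PEgRNA 3' extension arm.
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    return dna.upper().replace("T", "U")


@dataclass
class ExtensionArm:
    homology_arm: str          # (C), RNA, 5' end of the extension arm
    edit_template: str         # (B), RNA
    primer_binding_site: str   # (A), RNA, 3' end of the extension arm

    @property
    def sequence(self) -> str:
        """Full extension arm written 5' to 3'."""
        return self.homology_arm + self.edit_template + self.primer_binding_site


def design_extension(nicked_strand: str, nick_index: int, pbs_len: int,
                     edited_dna: str, homology_dna: str) -> ExtensionArm:
    """Build the extension arm that encodes edited_dna followed by homology_dna.

    nicked_strand: strand receiving the nick, written 5' to 3'.
    edited_dna:    new DNA to be written immediately 3' of the nick.
    homology_dna:  unchanged DNA that follows the edit on the same strand.
    """
    if pbs_len <= 0 or pbs_len > nick_index:
        raise ValueError("pbs_len must fit within the sequence 5' of the nick")
    primer = nicked_strand[nick_index - pbs_len:nick_index]
    return ExtensionArm(
        homology_arm=to_rna(reverse_complement(homology_dna)),
        edit_template=to_rna(reverse_complement(edited_dna)),
        primer_binding_site=to_rna(reverse_complement(primer)),
    )
```

Because each component is the reverse complement of the DNA it encodes or anneals to, reverse-complementing the whole arm recovers the primer, the edit, and the homologous sequence in the order in which they appear on the edited strand.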

The instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by the prime editor (PE) system provided herein. For example, in some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of the prime editor (PE) system described herein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene as mediated by homology-directed repair in the presence of a donor DNA molecule comprising desired genetic change. In some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of the prime editor (PE) system described herein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.

The instant disclosure provides methods for the treatment of additional diseases or disorders, e.g., diseases or disorders that are associated with or caused by a point mutation that can be corrected by TPRT-mediated gene editing. Some such diseases are described herein, and additional suitable diseases that can be treated with the strategies and fusion proteins provided herein will be apparent to those of skill in the art based on the instant disclosure. Exemplary suitable diseases and disorders are listed below. It will be understood that the numbering of the specific positions or residues in the respective sequences depends on the particular protein and numbering scheme used. Numbering might be different, e.g., in precursors of a mature protein and the mature protein itself, and differences in sequences from species to species may affect numbering. One of skill in the art will be able to identify the respective residue in any homologous protein and in the respective encoding nucleic acid by methods well known in the art, e.g., by sequence alignment and determination of homologous residues. 
Exemplary suitable diseases and disorders include, without limitation: 2-methyl-3-hydroxybutyric aciduria; 3 beta-Hydroxysteroid dehydrogenase deficiency; 3-Methylglutaconic aciduria; 3-Oxo-5 alpha-steroid delta 4-dehydrogenase deficiency; 46,XY sex reversal, type 1, 3, and 5; 5-Oxoprolinase deficiency; 6-pyruvoyl-tetrahydropterin synthase deficiency; Aarskog syndrome; Aase syndrome; Achondrogenesis type 2; Achromatopsia 2 and 7; Acquired long QT syndrome; Acrocallosal syndrome, Schinzel type; Acrocapitofemoral dysplasia; Acrodysostosis 2, with or without hormone resistance; Acroerythrokeratoderma; Acromicric dysplasia; Acth-independent macronodular adrenal hyperplasia 2; Activated PI3K-delta syndrome; Acute intermittent porphyria; deficiency of Acyl-CoA dehydrogenase family, member 9; Adams-Oliver syndrome 5 and 6; Adenine phosphoribosyltransferase deficiency; Adenylate kinase deficiency; hemolytic anemia due to Adenylosuccinate lyase deficiency; Adolescent nephronophthisis; Renal-hepatic-pancreatic dysplasia; Meckel syndrome type 7; Adrenoleukodystrophy; Adult junctional epidermolysis bullosa; Epidermolysis bullosa, junctional, localisata variant; Adult neuronal ceroid lipofuscinosis; Adult onset ataxia with oculomotor apraxia; ADULT syndrome; Afibrinogenemia and congenital Afibrinogenemia; autosomal recessive Agammaglobulinemia 2; Age-related macular degeneration 3, 6, 11, and 12; Aicardi Goutieres syndromes 1, 4, and 5; Chilblain lupus 1; Alagille syndromes 1 and 2; Alexander disease; Alkaptonuria; Allan-Herndon-Dudley syndrome; Alopecia universalis congenita; Alpers encephalopathy; Alpha-1-antitrypsin deficiency; autosomal dominant, autosomal recessive, and X-linked recessive Alport syndromes; Alzheimer disease, familial, 3, with spastic paraparesis and apraxia; Alzheimer disease, types, 1, 3, and 4; hypocalcification type and hypomaturation type, IIA1 Amelogenesis imperfecta; Aminoacylase 1 deficiency; Amish infantile 
epilepsy syndrome; Amyloidogenic transthyretin amyloidosis; Amyloid Cardiomyopathy, Transthyretin-related; Cardiomyopathy; Amyotrophic lateral sclerosis types 1, 6, 15 (with or without frontotemporal dementia), 22 (with or without frontotemporal dementia), and 10; Frontotemporal dementia with TDP43 inclusions, TARDBP-related; Andermann syndrome; Andersen Tawil syndrome; Congenital long QT syndrome; Anemia, nonspherocytic hemolytic, due to G6PD deficiency; Angelman syndrome; Severe neonatal-onset encephalopathy with microcephaly; susceptibility to Autism, X-linked 3; Angiopathy, hereditary, with nephropathy, aneurysms, and muscle cramps; Angiotensin i-converting enzyme, benign serum increase; Aniridia, cerebellar ataxia, and mental retardation; Anonychia; Antithrombin III deficiency; Antley-Bixler syndrome with genital anomalies and disordered steroidogenesis; Aortic aneurysm, familial thoracic 4, 6, and 9; Thoracic aortic aneurysms and aortic dissections; Multisystemic smooth muscle dysfunction syndrome; Moyamoya disease 5; Aplastic anemia; Apparent mineralocorticoid excess; Arginase deficiency; Argininosuccinate lyase deficiency; Aromatase deficiency; Arrhythmogenic right ventricular cardiomyopathy types 5, 8, and 10; Primary familial hypertrophic cardiomyopathy; Arthrogryposis multiplex congenita, distal, X-linked; Arthrogryposis renal dysfunction cholestasis syndrome; Arthrogryposis, renal dysfunction, and cholestasis 2; Asparagine synthetase deficiency; Abnormality of neuronal migration; Ataxia with vitamin E deficiency; Ataxia, sensory, autosomal dominant; Ataxia-telangiectasia syndrome; Hereditary cancer-predisposing syndrome; Atransferrinemia; Atrial fibrillation, familial, 11, 12, 13, and 16; Atrial septal defects 2, 4, and 7 (with or without atrioventricular conduction defects); Atrial standstill 2; Atrioventricular septal defect 4; Atrophia bulborum hereditaria; ATR-X syndrome; Auriculocondylar syndrome 2; Autoimmune disease, multisystem, infantile-onset; 
Autoimmune lymphoproliferative syndrome, type 1a; Autosomal dominant hypohidrotic ectodermal dysplasia; Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions 1 and 3; Autosomal dominant torsion dystonia 4; Autosomal recessive centronuclear myopathy; Autosomal recessive congenital ichthyosis 1, 2, 3, 4A, and 4B; Autosomal recessive cutis laxa type IA and 1B; Autosomal recessive hypohidrotic ectodermal dysplasia syndrome; Ectodermal dysplasia 11b; hypohidrotic/hair/tooth type, autosomal recessive; Autosomal recessive hypophosphatemic bone disease; Axenfeld-Rieger syndrome type 3; Bainbridge-Ropers syndrome; Bannayan-Riley-Ruvalcaba syndrome; PTEN hamartoma tumor syndrome; Baraitser-Winter syndromes 1 and 2; Barakat syndrome; Bardet-Biedl syndromes 1, 11, 16, and 19; Bare lymphocyte syndrome type 2, complementation group E; Bartter syndrome antenatal type 2; Bartter syndrome types 3, 3 with hypocalciuria, and 4; Basal ganglia calcification, idiopathic, 4; Beaded hair; Benign familial hematuria; Benign familial neonatal seizures 1 and 2; Seizures, benign familial neonatal, 1, and/or myokymia; Seizures, Early infantile epileptic encephalopathy 7; Benign familial neonatal-infantile seizures; Benign hereditary chorea; Benign scapuloperoneal muscular dystrophy with cardiomyopathy; Bernard-Soulier syndrome, types A1 and A2 (autosomal dominant); Bestrophinopathy, autosomal recessive; beta Thalassemia; Bethlem myopathy and Bethlem myopathy 2; Bietti crystalline corneoretinal dystrophy; Bile acid synthesis defect, congenital, 2; Biotinidase deficiency; Birk Barel mental retardation dysmorphism syndrome; Blepharophimosis, ptosis, and epicanthus inversus; Bloom syndrome; Borjeson-Forssman-Lehmann syndrome; Boucher Neuhauser syndrome; Brachydactyly types A1 and A2; Brachydactyly with hypertension; Brain small vessel disease with hemorrhage; Branched-chain ketoacid dehydrogenase kinase deficiency; Branchiootic syndromes 2 and 3; Breast cancer, 
early-onset; Breast-ovarian cancer, familial 1, 2, and 4; Brittle cornea syndrome 2; Brody myopathy; Bronchiectasis with or without elevated sweat chloride 3; Brown-Vialetto-Van Laere syndrome and Brown-Vialetto-Van Laere syndrome 2; Brugada syndrome; Brugada syndrome 1; Ventricular fibrillation; Paroxysmal familial ventricular fibrillation; Brugada syndrome and Brugada syndrome 4; Long QT syndrome; Sudden cardiac death; Bull eye macular dystrophy; Stargardt disease 4; Cone-rod dystrophy 12; Bullous ichthyosiform erythroderma; Burn-McKeown syndrome; Candidiasis, familial, 2, 5, 6, and 8; Carbohydrate-deficient glycoprotein syndrome type I and II; Carbonic anhydrase VA deficiency, hyperammonemia due to; Carcinoma of colon; Cardiac arrhythmia; Long QT syndrome, LQT1 subtype; Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency; Cardiofaciocutaneous syndrome; Cardiomyopathy; Danon disease; Hypertrophic cardiomyopathy; Left ventricular noncompaction cardiomyopathy; Carnevale syndrome; Carney complex, type 1; Carnitine acylcarnitine translocase deficiency; Carnitine palmitoyltransferase I, II, II (late onset), and II (infantile) deficiency; Cataract 1, 4, autosomal dominant, autosomal dominant, multiple types, with microcornea, coppock-like, juvenile, with microcornea and glucosuria, and nuclear diffuse nonprogressive; Catecholaminergic polymorphic ventricular tachycardia; Caudal regression syndrome; Cd8 deficiency, familial; Central core disease; Centromeric instability of chromosomes 1, 9 and 16 and immunodeficiency; Cerebellar ataxia infantile with progressive external ophthalmoplegia and Cerebellar ataxia, mental retardation, and dysequilibrium syndrome 2; Cerebral amyloid angiopathy, APP-related; Cerebral autosomal dominant and recessive arteriopathy with subcortical infarcts and leukoencephalopathy; Cerebral cavernous malformations 2; Cerebrooculofacioskeletal syndrome 2; Cerebro-oculo-facio-skeletal syndrome; Cerebroretinal 
microangiopathy with calcifications and cysts; Ceroid lipofuscinosis neuronal 2, 6, 7, and 10; Chédiak-Higashi syndrome, Chediak-Higashi syndrome, adult type; Charcot-Marie-Tooth disease types 1B, 2B2, 2C, 2F, 2I, 2U (axonal), 1C (demyelinating), dominant intermediate C, recessive intermediate A, 2A2, 4C, 4D, 4H, IF, IVF, and X; Scapuloperoneal spinal muscular atrophy; Distal spinal muscular atrophy, congenital nonprogressive; Spinal muscular atrophy, distal, autosomal recessive, 5; CHARGE association; Childhood hypophosphatasia; Adult hypophosphatasia; Cholecystitis; Progressive familial intrahepatic cholestasis 3; Cholestasis, intrahepatic, of pregnancy 3; Cholestanol storage disease; Cholesterol monooxygenase (side-chain cleaving) deficiency; Chondrodysplasia Blomstrand type; Chondrodysplasia punctata 1, X-linked recessive and 2 X-linked dominant; CHOPS syndrome; Chronic granulomatous disease, autosomal recessive cytochrome b-positive, types 1 and 2; Chudley-McCullough syndrome; Ciliary dyskinesia, primary, 7, 11, 15, 20 and 22; Citrullinemia type I; Citrullinemia type I and II; Cleidocranial dysostosis; C-like syndrome; Cockayne syndrome type A; Coenzyme Q10 deficiency, primary 1, 4, and 7; Coffin Siris/Intellectual Disability; Coffin-Lowry syndrome; Cohen syndrome; Cold-induced sweating syndrome 1; Cole-Carpenter syndrome 2; Combined cellular and humoral immune defects with granulomas; Combined d-2- and l-2-hydroxyglutaric aciduria; Combined malonic and methylmalonic aciduria; Combined oxidative phosphorylation deficiencies 1, 3, 4, 12, 15, and 25; Combined partial and complete 17-alpha-hydroxylase/17,20-lyase deficiency; Common variable immunodeficiency 9; Complement component 4, partial deficiency of, due to dysfunctional c1 inhibitor; Complement factor B deficiency; Cone monochromatism; Cone-rod dystrophy 2 and 6; Cone-rod dystrophy amelogenesis imperfecta; Congenital adrenal hyperplasia and Congenital adrenal hypoplasia, X-linked; Congenital 
amegakaryocytic thrombocytopenia; Congenital aniridia; Congenital central hypoventilation; Hirschsprung disease 3; Congenital contractural arachnodactyly; Congenital contractures of the limbs and face, hypotonia, and developmental delay; Congenital disorder of glycosylation types 1B, 1D, 1G, 1H, 1J, 1K, 1N, 1P, 2C, 2J, 2K, IIm; Congenital dyserythropoietic anemia, type I and II; Congenital ectodermal dysplasia of face; Congenital erythropoietic porphyria; Congenital generalized lipodystrophy type 2; Congenital heart disease, multiple types, 2; Congenital heart disease; Interrupted aortic arch; Congenital lipomatous overgrowth, vascular malformations, and epidermal nevi; Non-small cell lung cancer; Neoplasm of ovary; Cardiac conduction defect, nonspecific; Congenital microvillous atrophy; Congenital muscular dystrophy; Congenital muscular dystrophy due to partial LAMA2 deficiency; Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, types A2, A7, A8, A11, and A14; Congenital muscular dystrophy-dystroglycanopathy with mental retardation, types B2, B3, B5, and B15; Congenital muscular dystrophy-dystroglycanopathy without mental retardation, type B5; Congenital muscular hypertrophy-cerebral syndrome; Congenital myasthenic syndrome, acetazolamide-responsive; Congenital myopathy with fiber type disproportion; Congenital ocular coloboma; Congenital stationary night blindness, type 1A, 1B, 1C, 1E, 1F, and 2A; Coproporphyria; Cornea plana 2; Corneal dystrophy, Fuchs endothelial, 4; Corneal endothelial dystrophy type 2; Corneal fragility keratoglobus, blue sclerae and joint hypermobility; Cornelia de Lange syndromes 1 and 5; Coronary artery disease, autosomal dominant 2; Coronary heart disease; Hyperalphalipoproteinemia 2; Cortical dysplasia, complex, with other brain malformations 5 and 6; Cortical malformations, occipital; Corticosteroid-binding globulin deficiency; Corticosterone methyloxidase type 2 deficiency; Costello syndrome; Cowden syndrome 
1; Coxa plana; Craniodiaphyseal dysplasia, autosomal dominant; Craniosynostosis 1 and 4; Craniosynostosis and dental anomalies; Creatine deficiency, X-linked; Crouzon syndrome; Cryptophthalmos syndrome; Cryptorchidism, unilateral or bilateral; Cushing symphalangism; Cutaneous malignant melanoma 1; Cutis laxa with osteodystrophy and with severe pulmonary, gastrointestinal, and urinary abnormalities; Cyanosis, transient neonatal and atypical nephropathic; Cystic fibrosis; Cystinuria; Cytochrome c oxidase i deficiency; Cytochrome-c oxidase deficiency; D-2-hydroxyglutaric aciduria 2; Darier disease, segmental; Deafness with labyrinthine aplasia microtia and microdontia (LAMM); Deafness, autosomal dominant 3a, 4, 12, 13, 15, autosomal dominant nonsyndromic sensorineural 17, 20, and 65; Deafness, autosomal recessive 1A, 2, 3, 6, 8, 9, 12, 15, 16, 18b, 22, 28, 31, 44, 49, 63, 77, 86, and 89; Deafness, cochlear, with myopia and intellectual impairment, without vestibular involvement, autosomal dominant, X-linked 2; Deficiency of 2-methylbutyryl-CoA dehydrogenase; Deficiency of 3-hydroxyacyl-CoA dehydrogenase; Deficiency of alpha-mannosidase; Deficiency of aromatic-L-amino-acid decarboxylase; Deficiency of bisphosphoglycerate mutase; Deficiency of butyryl-CoA dehydrogenase; Deficiency of ferroxidase; Deficiency of galactokinase; Deficiency of guanidinoacetate methyltransferase; Deficiency of hyaluronoglucosaminidase; Deficiency of ribose-5-phosphate isomerase; Deficiency of steroid 11-beta-monooxygenase; Deficiency of UDPglucose-hexose-1-phosphate uridylyltransferase; Deficiency of xanthine oxidase; Dejerine-Sottas disease; Charcot-Marie-Tooth disease, types ID and IVF; Dejerine-Sottas syndrome, autosomal dominant; Dendritic cell, monocyte, B lymphocyte, and natural killer lymphocyte deficiency; Desbuquois dysplasia 2; Desbuquois syndrome; DFNA 2 Nonsyndromic Hearing Loss; Diabetes mellitus and insipidus with optic atrophy and deafness; Diabetes mellitus, type 2, and 
insulin-dependent, 20; Diamond-Blackfan anemia 1, 5, 8, and 10; Diarrhea 3 (secretory sodium, congenital, syndromic) and 5 (with tufting enteropathy, congenital); Dicarboxylic aminoaciduria; Diffuse palmoplantar keratoderma, Bothnian type; Digitorenocerebral syndrome; Dihydropteridine reductase deficiency; Dilated cardiomyopathy 1A, 1AA, 1C, 1G, 1BB, 1DD, 1FF, 1HH, 1I, 1KK, 1N, 1S, 1Y, and 3B; Left ventricular noncompaction 3; Disordered steroidogenesis due to cytochrome p450 oxidoreductase deficiency; Distal arthrogryposis type 2B; Distal hereditary motor neuronopathy type 2B; Distal myopathy Markesbery-Griggs type; Distal spinal muscular atrophy, X-linked 3; Distichiasis-lymphedema syndrome; Dominant dystrophic epidermolysis bullosa with absence of skin; Dominant hereditary optic atrophy; Donnai Barrow syndrome; Dopamine beta hydroxylase deficiency; Dopamine receptor d2, reduced brain density of; Dowling-degos disease 4; Doyne honeycomb retinal dystrophy; Malattia leventinese; Duane syndrome type 2; Dubin-Johnson syndrome; Duchenne muscular dystrophy; Becker muscular dystrophy; Dysfibrinogenemia; Dyskeratosis congenita autosomal dominant and autosomal dominant, 3; Dyskeratosis congenita, autosomal recessive, 1, 3, 4, and 5; Dyskeratosis congenita X-linked; Dyskinesia, familial, with facial myokymia; Dysplasminogenemia; Dystonia 2 (torsion, autosomal recessive), 3 (torsion, X-linked), 5 (Dopa-responsive type), 10, 12, 16, 25, 26 (Myoclonic); Seizures, benign familial infantile, 2; Early infantile epileptic encephalopathy 2, 4, 7, 9, 10, 11, 13, and 14; Atypical Rett syndrome; Early T cell progenitor acute lymphoblastic leukemia; Ectodermal dysplasia skin fragility syndrome; Ectodermal dysplasia-syndactyly syndrome 1; Ectopia lentis, isolated autosomal recessive and dominant; Ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome 3; Ehlers-Danlos syndrome type 7 (autosomal recessive), classic type, type 2 (progeroid), hydroxylysine-deficient, type 4, 
type 4 variant, and due to tenascin-X deficiency; Eichsfeld type congenital muscular dystrophy; Endocrine-cerebroosteodysplasia; Enhanced s-cone syndrome; Enlarged vestibular aqueduct syndrome; Enterokinase deficiency; Epidermodysplasia verruciformis; Epidermolysis bullosa simplex and limb girdle muscular dystrophy, simplex with mottled pigmentation, simplex with pyloric atresia, simplex, autosomal recessive, and with pyloric atresia; Epidermolytic palmoplantar keratoderma; Familial febrile seizures 8; Epilepsy, childhood absence 2, 12 (idiopathic generalized, susceptibility to) 5 (nocturnal frontal lobe), nocturnal frontal lobe type 1, partial, with variable foci, progressive myoclonic 3, and X-linked, with variable learning disabilities and behavior disorders; Epileptic encephalopathy, childhood-onset, early infantile, 1, 19, 23, 25, 30, and 32; Epiphyseal dysplasia, multiple, with myopia and conductive deafness; Episodic ataxia type 2; Episodic pain syndrome, familial, 3; Epstein syndrome; Fechtner syndrome; Erythropoietic protoporphyria; Estrogen resistance; Exudative vitreoretinopathy 6; Fabry disease and Fabry disease, cardiac variant; Factor H, VII, X, v and factor viii, combined deficiency of 2, xiii, a subunit, deficiency; Familial adenomatous polyposis 1 and 3; Familial amyloid nephropathy with urticaria and deafness; Familial cold urticaria; Familial aplasia of the vermis; Familial benign pemphigus; Familial cancer of breast; Breast cancer, susceptibility to; Osteosarcoma; Pancreatic cancer 3; Familial cardiomyopathy; Familial cold autoinflammatory syndrome 2; Familial colorectal cancer; Familial exudative vitreoretinopathy, X-linked; Familial hemiplegic migraine types 1 and 2; Familial hypercholesterolemia; Familial hypertrophic cardiomyopathy 1, 2, 3, 4, 7, 10, 23 and 24; Familial hypokalemia-hypomagnesemia; Familial hypoplastic, glomerulocystic kidney; Familial infantile myasthenia; Familial juvenile gout; Familial Mediterranean fever and Familial 
mediterranean fever, autosomal dominant; Familial porencephaly; Familial porphyria cutanea tarda; Familial pulmonary capillary hemangiomatosis; Familial renal glucosuria; Familial renal hypouricemia; Familial restrictive cardiomyopathy 1; Familial type 1 and 3 hyperlipoproteinemia; Fanconi anemia, complementation group E, I, N, and O; Fanconi-Bickel syndrome; Favism, susceptibility to; Febrile seizures, familial, 11; Feingold syndrome 1; Fetal hemoglobin quantitative trait locus 1; FG syndrome and FG syndrome 4; Fibrosis of extraocular muscles, congenital, 1, 2, 3a (with or without extraocular involvement), 3b; Fish-eye disease; Fleck corneal dystrophy; Floating-Harbor syndrome; Focal epilepsy with speech disorder with or without mental retardation; Focal segmental glomerulosclerosis 5; Forebrain defects; Frank Ter Haar syndrome; Borrone Di Rocco Crovato syndrome; Frasier syndrome; Wilms tumor 1; Freeman-Sheldon syndrome; Frontometaphyseal dysplasia 1 and 3; Frontotemporal dementia; Frontotemporal dementia and/or amyotrophic lateral sclerosis 3 and 4; Frontotemporal Dementia Chromosome 3-Linked and Frontotemporal dementia ubiquitin-positive; Fructose-biphosphatase deficiency; Fuhrmann syndrome; Gamma-aminobutyric acid transaminase deficiency; Gamstorp-Wohlfart syndrome; Gaucher disease type 1 and Subacute neuronopathic; Gaze palsy, familial horizontal, with progressive scoliosis; Generalized dominant dystrophic epidermolysis bullosa; Generalized epilepsy with febrile seizures plus 3, type 1, type 2; Epileptic encephalopathy Lennox-Gastaut type; Giant axonal neuropathy; Glanzmann thrombasthenia; Glaucoma 1, open angle, e, F, and G; Glaucoma 3, primary congenital, d; Glaucoma, congenital and Glaucoma, congenital, Coloboma; Glaucoma, primary open angle, juvenile-onset; Glioma susceptibility 1; Glucose transporter type 1 deficiency syndrome; Glucose-6-phosphate transport defect; GLUT1 deficiency syndrome 2; Epilepsy, idiopathic generalized, susceptibility to, 12; 
Glutamate formiminotransferase deficiency; Glutaric acidemia IIA and IIB; Glutaric aciduria, type 1; Glutathione synthetase deficiency; Glycogen storage disease 0 (muscle), II (adult form), IXa2, IXc, type IA; type II, type IV, IV (combined hepatic and myopathic), type V, and type VI; Goldmann-Favre syndrome; Gordon syndrome; Gorlin syndrome; Holoprosencephaly sequence; Holoprosencephaly 7; Granulomatous disease, chronic, X-linked, variant; Granulosa cell tumor of the ovary; Gray platelet syndrome; Griscelli syndrome type 3; Groenouw corneal dystrophy type I; Growth and mental retardation, mandibulofacial dysostosis, microcephaly, and cleft palate; Growth hormone deficiency with pituitary anomalies; Growth hormone insensitivity with immunodeficiency; GTP cyclohydrolase I deficiency; Hajdu-Cheney syndrome; Hand foot uterus syndrome; Hearing impairment; Hemangioma, capillary infantile; Hematologic neoplasm; Hemochromatosis type 1, 2B, and 3; Microvascular complications of diabetes 7; Transferrin serum level quantitative trait locus 2; Hemoglobin H disease, nondeletional; Hemolytic anemia, nonspherocytic, due to glucose phosphate isomerase deficiency; Hemophagocytic lymphohistiocytosis, familial, 2; Hemophagocytic lymphohistiocytosis, familial, 3; Heparin cofactor II deficiency; Hereditary acrodermatitis enteropathica; Hereditary breast and ovarian cancer syndrome; Ataxia-telangiectasia-like disorder; Hereditary diffuse gastric cancer; Hereditary diffuse leukoencephalopathy with spheroids; Hereditary factors II, IX, VIII deficiency disease; Hereditary hemorrhagic telangiectasia type 2; Hereditary insensitivity to pain with anhidrosis; Hereditary lymphedema type I; Hereditary motor and sensory neuropathy with optic atrophy; Hereditary myopathy with early respiratory failure; Hereditary neuralgic amyotrophy; Hereditary Nonpolyposis Colorectal Neoplasms; Lynch syndrome I and II; Hereditary pancreatitis; Pancreatitis, chronic, susceptibility to; Hereditary sensory and 
autonomic neuropathy type IIB and IIA; Hereditary sideroblastic anemia; Hermansky-Pudlak syndrome 1, 3, 4, and 6; Heterotaxy, visceral, 2, 4, and 6, autosomal; Heterotaxy, visceral, X-linked; Heterotopia; Histiocytic medullary reticulosis; Histiocytosis-lymphadenopathy plus syndrome; Holocarboxylase synthetase deficiency; Holoprosencephaly 2, 3, 7, and 9; Holt-Oram syndrome; Homocysteinemia due to MTHFR deficiency, CBS deficiency, and Homocystinuria, pyridoxine-responsive; Homocystinuria-Megaloblastic anemia due to defect in cobalamin metabolism, cblE complementation type; Howel-Evans syndrome; Hurler syndrome; Hutchinson-Gilford syndrome; Hydrocephalus; Hyperammonemia, type III; Hypercholesterolaemia and Hypercholesterolemia, autosomal recessive; Hyperekplexia 2 and Hyperekplexia hereditary; Hyperferritinemia cataract syndrome; Hyperglycinuria; Hyperimmunoglobulin D with periodic fever; Mevalonic aciduria; Hyperimmunoglobulin E syndrome; Hyperinsulinemic hypoglycemia familial 3, 4, and 5; Hyperinsulinism-hyperammonemia syndrome; Hyperlysinemia; Hypermanganesemia with dystonia, polycythemia and cirrhosis; Hyperornithinemia-hyperammonemia-homocitrullinuria syndrome; Hyperparathyroidism 1 and 2; Hyperparathyroidism, neonatal severe; Hyperphenylalaninemia, bh4-deficient, a, due to partial pts deficiency, BH4-deficient, D, and non-pku; Hyperphosphatasia with mental retardation syndrome 2, 3, and 4; Hypertrichotic osteochondrodysplasia; Hypobetalipoproteinemia, familial, associated with apob32; Hypocalcemia, autosomal dominant 1; Hypocalciuric hypercalcemia, familial, types 1 and 3; Hypochondrogenesis; Hypochromic microcytic anemia with iron overload; Hypoglycemia with deficiency of glycogen synthetase in the liver; Hypogonadotropic hypogonadism 11 with or without anosmia; Hypohidrotic ectodermal dysplasia with immune deficiency; Hypohidrotic X-linked ectodermal dysplasia; Hypokalemic periodic paralysis 1 and 2; Hypomagnesemia 1, intestinal; Hypomagnesemia, seizures, and 
mental retardation; Hypomyelinating leukodystrophy 7; Hypoplastic left heart syndrome; Atrioventricular septal defect and common atrioventricular junction; Hypospadias 1 and 2, X-linked; Hypothyroidism, congenital, nongoitrous, 1; Hypotrichosis 8 and 12; Hypotrichosis-lymphedema-telangiectasia syndrome; I blood group system; Ichthyosis bullosa of Siemens; Ichthyosis exfoliativa; Ichthyosis prematurity syndrome; Idiopathic basal ganglia calcification 5; Idiopathic fibrosing alveolitis, chronic form; Dyskeratosis congenita, autosomal dominant, 2 and 5; Idiopathic hypercalcemia of infancy; Immune dysfunction with T-cell inactivation due to calcium entry defect 2; Immunodeficiency 15, 16, 19, 30, 31C, 38, 40, 8, due to defect in cd3-zeta, with hyper IgM type 1 and 2, and X-Linked, with magnesium defect, Epstein-Barr virus infection, and neoplasia; Immunodeficiency-centromeric instability-facial anomalies syndrome 2; Inclusion body myopathy 2 and 3; Nonaka myopathy; Infantile convulsions and paroxysmal choreoathetosis, familial; Infantile cortical hyperostosis; Infantile GM1 gangliosidosis; Infantile hypophosphatasia; Infantile nephronophthisis; Infantile nystagmus, X-linked; Infantile Parkinsonism-dystonia; Infertility associated with multi-tailed spermatozoa and excessive DNA; Insulin resistance; Insulin-resistant diabetes mellitus and acanthosis nigricans; Insulin-dependent diabetes mellitus secretory diarrhea syndrome; Interstitial nephritis, karyomegalic; Intrauterine growth retardation, metaphyseal dysplasia, adrenal hypoplasia congenita, and genital anomalies; Iodotyrosyl coupling defect; IRAK4 deficiency; Iridogoniodysgenesis dominant type and type 1; Iron accumulation in brain; Ischiopatellar dysplasia; Islet cell hyperplasia; Isolated 17,20-lyase deficiency; Isolated lutropin deficiency; Isovaleryl-CoA dehydrogenase deficiency; Jankovic Rivera syndrome; Jervell and Lange-Nielsen syndrome 2; Joubert syndrome 1, 6, 7, 9/15 (digenic), 14, 16, and 17, and 
Orofaciodigital syndrome xiv; Junctional epidermolysis bullosa gravis of Herlitz; Juvenile GM1 gangliosidosis; Juvenile polyposis syndrome; Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome; Juvenile retinoschisis; Kabuki make-up syndrome; Kallmann syndrome 1, 2, and 6; Delayed puberty; Kanzaki disease; Karak syndrome; Kartagener syndrome; Kenny-Caffey syndrome type 2; Keppen-Lubinsky syndrome; Keratoconus 1; Keratosis follicularis; Keratosis palmoplantaris striata 1; Kindler syndrome; L-2-hydroxyglutaric aciduria; Larsen syndrome, dominant type; Lattice corneal dystrophy Type III; Leber amaurosis; Zellweger syndrome; Peroxisome biogenesis disorders; Zellweger syndrome spectrum; Leber congenital amaurosis 11, 12, 13, 16, 4, 7, and 9; Leber optic atrophy; Aminoglycoside-induced deafness; Deafness, nonsyndromic sensorineural, mitochondrial; Left ventricular noncompaction 5; Left-right axis malformations; Leigh disease; Mitochondrial short-chain Enoyl-CoA Hydratase 1 deficiency; Leigh syndrome due to mitochondrial complex I deficiency; Leiner disease; Leri Weill dyschondrosteosis; Lethal congenital contracture syndrome 6; Leukocyte adhesion deficiency type I and III; Leukodystrophy, Hypomyelinating, 11 and 6; Leukoencephalopathy with ataxia, with Brainstem and Spinal Cord Involvement and Lactate Elevation, with vanishing white matter, and progressive, with ovarian failure; Leukonychia totalis; Lewy body dementia; Lichtenstein-Knorr Syndrome; Li-Fraumeni syndrome 1; Lig4 syndrome; Limb-girdle muscular dystrophy, type 1B, 2A, 2B, 2D, C1, C5, C9, C14; Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A14 and B14; Lipase deficiency combined; Lipid proteinosis; Lipodystrophy, familial partial, type 2 and 3; Lissencephaly 1, 2 (X-linked), 3, 6 (with microcephaly), X-linked; Subcortical laminar heterotopia, X-linked; Liver failure acute infantile; Loeys-Dietz syndrome 1, 2, 3; Long QT syndrome 1, 2, 2/9, 2/5, (digenic), 3, 5 
and 5, acquired, susceptibility to; Lung cancer; Lymphedema, hereditary, id; Lymphedema, primary, with myelodysplasia; Lymphoproliferative syndrome 1, 1 (X-linked), and 2; Lysosomal acid lipase deficiency; Macrocephaly, macrosomia, facial dysmorphism syndrome; Macular dystrophy, vitelliform, adult-onset; Malignant hyperthermia susceptibility type 1; Malignant lymphoma, non-Hodgkin; Malignant melanoma; Malignant tumor of prostate; Mandibuloacral dysostosis; Mandibuloacral dysplasia with type A or B lipodystrophy, atypical; Mandibulofacial dysostosis, Treacher Collins type, autosomal recessive; Mannose-binding protein deficiency; Maple syrup urine disease type 1A and type 3; Marden Walker like syndrome; Marfan syndrome; Marinesco-Sjögren syndrome; Martsolf syndrome; Maturity-onset diabetes of the young, type 1, type 2, type 11, type 3, and type 9; May-Hegglin anomaly; MYH9 related disorders; Sebastian syndrome; McCune-Albright syndrome; Somatotroph adenoma; Sex cord-stromal tumor; Cushing syndrome; McKusick Kaufman syndrome; McLeod neuroacanthocytosis syndrome; Meckel-Gruber syndrome; Medium-chain acyl-coenzyme A dehydrogenase deficiency; Medulloblastoma; Megalencephalic leukoencephalopathy with subcortical cysts 1 and 2a; Megalencephaly cutis marmorata telangiectatica congenital; PIK3CA Related Overgrowth Spectrum; Megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome 2; Megaloblastic anemia, thiamine-responsive, with diabetes mellitus and sensorineural deafness; Meier-Gorlin syndromes 1 and 4; Melnick-Needles syndrome; Meningioma; Mental retardation, X-linked, 3, 21, 30, and 72; Mental retardation and microcephaly with pontine and cerebellar hypoplasia; Mental retardation X-linked syndromic 5; Mental retardation, anterior maxillary protrusion, and strabismus; Mental retardation, autosomal dominant 12, 13, 15, 24, 3, 30, 4, 5, 6, and 9; Mental retardation, autosomal recessive 15, 44, 46, and 5; Mental retardation, stereotypic movements, epilepsy, 
and/or cerebral malformations; Mental retardation, syndromic, Claes-Jensen type, X-linked; Mental retardation, X-linked, nonspecific, syndromic, Hedera type, and syndromic, wu type; Merosin deficient congenital muscular dystrophy; Metachromatic leukodystrophy juvenile, late infantile, and adult types; Metachromatic leukodystrophy; Metatrophic dysplasia; Methemoglobinemia types I and 2; Methionine adenosyltransferase deficiency, autosomal dominant; Methylmalonic acidemia with homocystinuria; Methylmalonic aciduria cblB type; Methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency; METHYLMALONIC ACIDURIA, mut(0) TYPE; Microcephalic osteodysplastic primordial dwarfism type 2; Microcephaly with or without chorioretinopathy, lymphedema, or mental retardation; Microcephaly, hiatal hernia and nephrotic syndrome; Microcephaly; Hypoplasia of the corpus callosum; Spastic paraplegia 50, autosomal recessive; Global developmental delay; CNS hypomyelination; Brain atrophy; Microcephaly, normal intelligence and immunodeficiency; Microcephaly-capillary malformation syndrome; Microcytic anemia; Microphthalmia syndromic 5, 7, and 9; Microphthalmia, isolated 3, 5, 6, 8, and with coloboma 6; Microspherophakia; Migraine, familial basilar; Miller syndrome; Minicore myopathy with external ophthalmoplegia; Myopathy, congenital with cores; Mitchell-Riley syndrome; mitochondrial 3-hydroxy-3-methylglutaryl-CoA synthase deficiency; Mitochondrial complex I, II, III, III (nuclear type 2, 4, or 8) deficiency; Mitochondrial DNA depletion syndrome 11, 12 (cardiomyopathic type), 2, 4B (MNGIE type), 8B (MNGIE type); Mitochondrial DNA-depletion syndrome 3 and 7, hepatocerebral types, and 13 (encephalomyopathic type); Mitochondrial phosphate carrier and pyruvate carrier deficiency; Mitochondrial trifunctional protein deficiency; Long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency; Miyoshi muscular dystrophy 1; Myopathy, distal, with anterior tibial onset; Mohr-Tranebjaerg syndrome; 
Molybdenum cofactor deficiency, complementation group A; Mowat-Wilson syndrome; Mucolipidosis III Gamma; Mucopolysaccharidosis type VI, type VI (severe), and type VII; Mucopolysaccharidosis, MPS-I-H/S, MPS-II, MPS-III-A, MPS-III-B, MPS-III-C, MPS-IV-A, MPS-IV-B; Retinitis Pigmentosa 73; Gangliosidosis GM1 type1 (with cardiac involvement) 3; Multicentric osteolysis nephropathy; Multicentric osteolysis, nodulosis and arthropathy; Multiple congenital anomalies; Atrial septal defect 2; Multiple congenital anomalies-hypotonia-seizures syndrome 3; Multiple Cutaneous and Mucosal Venous Malformations; Multiple endocrine neoplasia, types 1 and 4; Multiple epiphyseal dysplasia 5 or Dominant; Multiple gastrointestinal atresias; Multiple pterygium syndrome Escobar type; Multiple sulfatase deficiency; Multiple synostoses syndrome 3; Muscle AMP guanine oxidase deficiency; Muscle eye brain disease; Muscular dystrophy, congenital, megaconial type; Myasthenia, familial infantile, 1; Myasthenic Syndrome, Congenital, 11, associated with acetylcholine receptor deficiency; Myasthenic Syndrome, Congenital, 17, 2A (slow-channel), 4B (fast-channel), and without tubular aggregates; Myeloperoxidase deficiency; MYH-associated polyposis; Endometrial carcinoma; Myocardial infarction 1; Myoclonic dystonia; Myoclonic-Atonic Epilepsy; Myoclonus with epilepsy with ragged red fibers; Myofibrillar myopathy 1 and ZASP-related; Myoglobinuria, acute recurrent, autosomal recessive; Myoneural gastrointestinal encephalopathy syndrome; Cerebellar ataxia infantile with progressive external ophthalmoplegia; Mitochondrial DNA depletion syndrome 4B, MNGIE type; Myopathy, centronuclear, 1, congenital, with excess of muscle spindles, distal, 1, lactic acidosis, and sideroblastic anemia 1, mitochondrial progressive with congenital cataract, hearing loss, and developmental delay, and tubular aggregate, 2; Myopia 6; Myosclerosis, autosomal recessive; Myotonia congenital; Congenital myotonia, autosomal dominant and 
recessive forms; Nail-patella syndrome; Nance-Horan syndrome; Nanophthalmos 2; Navajo neurohepatopathy; Nemaline myopathy 3 and 9; Neonatal hypotonia; Intellectual disability; Seizures; Delayed speech and language development; Mental retardation, autosomal dominant 31; Neonatal intrahepatic cholestasis caused by citrin deficiency; Nephrogenic diabetes insipidus, Nephrogenic diabetes insipidus, X-linked; Nephrolithiasis/osteoporosis, hypophosphatemic, 2; Nephronophthisis 13, 15 and 4; Infertility; Cerebello-oculo-renal syndrome (nephronophthisis, oculomotor apraxia and cerebellar abnormalities); Nephrotic syndrome, type 3, type 5, with or without ocular abnormalities, type 7, and type 9; Nestor-Guillermo progeria syndrome; Neu-Laxova syndrome 1; Neurodegeneration with brain iron accumulation 4 and 6; Neuroferritinopathy; Neurofibromatosis, type 1 and type 2; Neurofibrosarcoma; Neurohypophyseal diabetes insipidus; Neuropathy, Hereditary Sensory, Type IC; Neutral 1 amino acid transport defect; Neutral lipid storage disease with myopathy; Neutrophil immunodeficiency syndrome; Nicolaides-Baraitser syndrome; Niemann-Pick disease type C1, C2, type A, and type C1, adult form; Non-ketotic hyperglycinemia; Noonan syndrome 1 and 4, LEOPARD syndrome 1; Noonan syndrome-like disorder with or without juvenile myelomonocytic leukemia; Normokalemic periodic paralysis, potassium-sensitive; Norum disease; Epilepsy, Hearing Loss, And Mental Retardation Syndrome; Mental Retardation, X-Linked 102 and syndromic 13; Obesity; Ocular albinism, type I; Oculocutaneous albinism type 1B, type 3, and type 4; Oculodentodigital dysplasia; Odontohypophosphatasia; Odontotrichomelic syndrome; Oguchi disease; Oligodontia-colorectal cancer syndrome; Opitz G/BBB syndrome; Optic atrophy 9; Oral-facial-digital syndrome; Ornithine aminotransferase deficiency; Orofacial cleft 11 and 7, Cleft lip/palate-ectodermal dysplasia syndrome; Orstavik Lindemann Solberg syndrome; Osteoarthritis with mild 
chondrodysplasia; Osteochondritis dissecans; Osteogenesis imperfecta type 12, type 5, type 7, type 8, type I, type III, with normal sclerae, dominant form, recessive perinatal lethal; Osteopathia striata with cranial sclerosis; Osteopetrosis autosomal dominant type 1 and 2, recessive 4, recessive 1, recessive 6; Osteoporosis with pseudoglioma; Oto-palato-digital syndrome, types I and II; Ovarian dysgenesis 1; Ovarioleukodystrophy; Pachyonychia congenita 4 and type 2; Paget disease of bone, familial; Pallister-Hall syndrome; Palmoplantar keratoderma, nonepidermolytic, focal or diffuse; Pancreatic agenesis and congenital heart disease; Papillon-Lefèvre syndrome; Paragangliomas 3; Paramyotonia congenita of von Eulenburg; Parathyroid carcinoma; Parkinson disease 14, 15, 19 (juvenile-onset), 2, 20 (early-onset), 6 (autosomal recessive early-onset), and 9; Partial albinism; Partial hypoxanthine-guanine phosphoribosyltransferase deficiency; Patterned dystrophy of retinal pigment epithelium; PC-K6a; Pelizaeus-Merzbacher disease; Pendred syndrome; Peripheral demyelinating neuropathy, central dysmyelination; Hirschsprung disease; Permanent neonatal diabetes mellitus; Diabetes mellitus, permanent neonatal, with neurologic features; Neonatal insulin-dependent diabetes mellitus; Maturity-onset diabetes of the young, type 2; Peroxisome biogenesis disorder 14B, 2A, 4A, 5B, 6A, 7A, and 7B; Perrault syndrome 4; Perry syndrome; Persistent hyperinsulinemic hypoglycemia of infancy; familial hyperinsulinism; Phenotypes; Phenylketonuria; Pheochromocytoma; Hereditary Paraganglioma-Pheochromocytoma Syndromes; Paragangliomas 1; Carcinoid tumor of intestine; Cowden syndrome 3; Phosphoglycerate dehydrogenase deficiency; Phosphoglycerate kinase 1 deficiency; Photosensitive trichothiodystrophy; Phytanic acid storage disease; Pick disease; Pierson syndrome; Pigmentary retinal dystrophy; Pigmented nodular adrenocortical disease, primary, 1; Pilomatrixoma; Pitt-Hopkins syndrome; Pituitary 
dependent hypercortisolism; Pituitary hormone deficiency, combined 1, 2, 3, and 4; Plasminogen activator inhibitor type 1 deficiency; Plasminogen deficiency, type I; Platelet-type bleeding disorder 15 and 8; Poikiloderma, hereditary fibrosing, with tendon contractures, myopathy, and pulmonary fibrosis; Polycystic kidney disease 2, adult type, and infantile type; Polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy; Polyglucosan body myopathy 1 with or without immunodeficiency; Polymicrogyria, asymmetric, bilateral frontoparietal; Polyneuropathy, hearing loss, ataxia, retinitis pigmentosa, and cataract; Pontocerebellar hypoplasia type 4; Popliteal pterygium syndrome; Porencephaly 2; Porokeratosis 8, disseminated superficial actinic type; Porphobilinogen synthase deficiency; Porphyria cutanea tarda; Posterior column ataxia with retinitis pigmentosa; Posterior polar cataract type 2; Prader-Willi-like syndrome; Premature ovarian failure 4, 5, 7, and 9; Primary autosomal recessive microcephaly 10, 2, 3, and 5; Primary ciliary dyskinesia 24; Primary dilated cardiomyopathy; Left ventricular noncompaction 6; 4, Left ventricular noncompaction 10; Paroxysmal atrial fibrillation; Primary hyperoxaluria, type I, type, and type III; Primary hypertrophic osteoarthropathy, autosomal recessive 2; Primary hypomagnesemia; Primary open angle glaucoma juvenile onset 1; Primary pulmonary hypertension; Primrose syndrome; Progressive familial heart block type 1B; Progressive familial intrahepatic cholestasis 2 and 3; Progressive intrahepatic cholestasis; Progressive myoclonus epilepsy with ataxia; Progressive pseudorheumatoid dysplasia; Progressive sclerosing poliodystrophy; Prolidase deficiency; Proline dehydrogenase deficiency; Schizophrenia 4; Properdin deficiency, X-linked; Propionic acidemia; Proprotein convertase 1/3 deficiency; Prostate cancer, hereditary, 2; Protan defect; Proteinuria; Finnish congenital nephrotic syndrome; Proteus syndrome; Breast 
adenocarcinoma; Pseudoachondroplastic spondyloepiphyseal dysplasia syndrome; Pseudohypoaldosteronism type 1 autosomal dominant and recessive and type 2; Pseudohypoparathyroidism type 1A, Pseudopseudohypoparathyroidism; Pseudoneonatal adrenoleukodystrophy; Pseudoprimary hyperaldosteronism; Pseudoxanthoma elasticum; Generalized arterial calcification of infancy 2; Pseudoxanthoma elasticum-like disorder with multiple coagulation factor deficiency; Psoriasis susceptibility 2; PTEN hamartoma tumor syndrome; Pulmonary arterial hypertension related to hereditary hemorrhagic telangiectasia; Pulmonary Fibrosis And/Or Bone Marrow Failure, Telomere-Related, 1 and 3; Pulmonary hypertension, primary, 1, with hereditary hemorrhagic telangiectasia; Purine-nucleoside phosphorylase deficiency; Pyruvate carboxylase deficiency; Pyruvate dehydrogenase E1-alpha deficiency; Pyruvate kinase deficiency of red cells; Raine syndrome; Rasopathy; Recessive dystrophic epidermolysis bullosa; Nail disorder, nonsyndromic congenital, 8; Reifenstein syndrome; Renal adysplasia; Renal carnitine transport defect; Renal coloboma syndrome; Renal dysplasia; Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia; Renal tubular acidosis, distal, autosomal recessive, with late-onset sensorineural hearing loss, or with hemolytic anemia; Renal tubular acidosis, proximal, with ocular abnormalities and mental retardation; Retinal cone dystrophy 3B; Retinitis pigmentosa; Retinitis pigmentosa 10, 11, 12, 14, 15, 17, and 19; Retinitis pigmentosa 2, 20, 25, 35, 36, 38, 39, 4, 40, 43, 45, 48, 66, 7, 70, 72; Retinoblastoma; Rett disorder; Rhabdoid tumor predisposition syndrome 2; Rhegmatogenous retinal detachment, autosomal dominant; Rhizomelic chondrodysplasia punctata type 2 and type 3; Roberts-SC phocomelia syndrome; Robinow Sorauf syndrome; Robinow syndrome, autosomal recessive, autosomal recessive, with brachy-syn-polydactyly; Rothmund-Thomson syndrome; Rapadilino syndrome; 
RRM2B-related mitochondrial disease; Rubinstein-Taybi syndrome; Salla disease; Sandhoff disease, adult and infantile types; Sarcoidosis, early-onset; Blau syndrome; Schindler disease, type 1; Schizencephaly; Schizophrenia 15; Schneckenbecken dysplasia; Schwannomatosis 2; Schwartz Jampel syndrome type 1; Sclerocornea, autosomal recessive; Sclerosteosis; Secondary hypothyroidism; Segawa syndrome, autosomal recessive; Senior-Loken syndrome 4 and 5; Sensory ataxic neuropathy, dysarthria, and ophthalmoparesis; Sepiapterin reductase deficiency; SeSAME syndrome; Severe combined immunodeficiency due to ADA deficiency, with microcephaly, growth retardation, and sensitivity to ionizing radiation, atypical, autosomal recessive, T cell-negative, B cell-positive, NK cell-negative or NK-positive; Severe congenital neutropenia; Severe congenital neutropenia 3, autosomal recessive or dominant; Severe congenital neutropenia and 6, autosomal recessive; Severe myoclonic epilepsy in infancy; Generalized epilepsy with febrile seizures plus, types 1 and 2; Severe X-linked myotubular myopathy; Short QT syndrome 3; Short stature with nonspecific skeletal abnormalities; Short stature, auditory canal atresia, mandibular hypoplasia, skeletal abnormalities; Short stature, onychodysplasia, facial dysmorphism, and hypotrichosis; Primordial dwarfism; Short-rib thoracic dysplasia 11 or 3 with or without polydactyly; Sialidosis type I and II; Silver spastic paraplegia syndrome; Slowed nerve conduction velocity, autosomal dominant; Smith-Lemli-Opitz syndrome; Snyder Robinson syndrome; Somatotroph adenoma; Prolactinoma; familial, Pituitary adenoma predisposition; Sotos syndrome 1 or 2; Spastic ataxia 5, autosomal recessive, Charlevoix-Saguenay type, 1, 10, or 11, autosomal recessive; Amyotrophic lateral sclerosis type 5; Spastic paraplegia 15, 2, 3, 35, 39, 4, autosomal dominant, 55, autosomal recessive, and 5A; Bile acid synthesis defect, congenital, 3; Spermatogenic failure 11, 3, and 8; 
Spherocytosis types 4 and 5; Spheroid body myopathy; Spinal muscular atrophy, lower extremity predominant 2, autosomal dominant; Spinal muscular atrophy, type II; Spinocerebellar ataxia 14, 21, 35, 40, and 6; Spinocerebellar ataxia autosomal recessive 1 and 16; Splenic hypoplasia; Spondylocarpotarsal synostosis syndrome; Spondylocheirodysplasia, Ehlers-Danlos syndrome-like, with immune dysregulation, Aggrecan type, with congenital joint dislocations, short limb-hand type, Sedaghatian type, with cone-rod dystrophy, and Kozlowski type; Parastremmatic dwarfism; Stargardt disease 1; Cone-rod dystrophy 3; Stickler syndrome type 1; Kniest dysplasia; Stickler syndrome, types 1 (nonsyndromic ocular) and 4; Sting-associated vasculopathy, infantile-onset; Stormorken syndrome; Sturge-Weber syndrome, Capillary malformations, congenital, 1; Succinyl-CoA acetoacetate transferase deficiency; Sucrase-isomaltase deficiency; Sudden infant death syndrome; Sulfite oxidase deficiency, isolated; Supravalvar aortic stenosis; Surfactant metabolism dysfunction, pulmonary, 2 and 3; Symphalangism, proximal, 1b; Syndactyly Cenani Lenz type; Syndactyly type 3; Syndromic X-linked mental retardation 16; Talipes equinovarus; Tangier disease; TARP syndrome; Tay-Sachs disease, B1 variant, Gm2-gangliosidosis (adult), Gm2-gangliosidosis (adult-onset); Temtamy syndrome; Tenorio Syndrome; Terminal osseous dysplasia; Testosterone 17-beta-dehydrogenase deficiency; Tetraamelia, autosomal recessive; Tetralogy of Fallot; Hypoplastic left heart syndrome 2; Truncus arteriosus; Malformation of the heart and great vessels; Ventricular septal defect 1; Thiel-Behnke corneal dystrophy; Thoracic aortic aneurysms and aortic dissections; Marfanoid habitus; Three M syndrome 2; Thrombocytopenia, platelet dysfunction, hemolysis, and imbalanced globin synthesis; Thrombocytopenia, X-linked; Thrombophilia, hereditary, due to protein C deficiency, autosomal dominant and recessive; Thyroid agenesis; Thyroid cancer, follicular; 
Thyroid hormone metabolism, abnormal; Thyroid hormone resistance, generalized, autosomal dominant; Thyrotoxic periodic paralysis and Thyrotoxic periodic paralysis 2; Thyrotropin-releasing hormone resistance, generalized; Timothy syndrome; TNF receptor-associated periodic fever syndrome (TRAPS); Tooth agenesis, selective, 3 and 4; Torsades de pointes; Townes-Brocks-branchiootorenal-like syndrome; Transient bullous dermolysis of the newborn; Treacher Collins syndrome 1; Trichomegaly with mental retardation, dwarfism and pigmentary degeneration of retina; Trichorhinophalangeal dysplasia type I; Trichorhinophalangeal syndrome type 3; Trimethylaminuria; Tuberous sclerosis syndrome; Lymphangiomyomatosis; Tuberous sclerosis 1 and 2; Tyrosinase-negative oculocutaneous albinism; Tyrosinase-positive oculocutaneous albinism; Tyrosinemia type I; UDPglucose-4-epimerase deficiency; Ullrich congenital muscular dystrophy; Ulna and fibula absence of with severe limb deficiency; Upshaw-Schulman syndrome; Urocanate hydratase deficiency; Usher syndrome, types 1, 1B, 1D, 1G, 2A, 2C, and 2D; Retinitis pigmentosa 39; UV-sensitive syndrome; Van der Woude syndrome; Van Maldergem syndrome 2; Hennekam lymphangiectasia-lymphedema syndrome 2; Variegate porphyria; Ventriculomegaly with cystic kidney disease; Verheij syndrome; Very long chain acyl-CoA dehydrogenase deficiency; Vesicoureteral reflux 8; Visceral heterotaxy 5, autosomal; Visceral myopathy; Vitamin D-dependent rickets, types 1 and 2; Vitelliform dystrophy; von Willebrand disease type 2M and type 3; Waardenburg syndrome type 1, 4C, and 2E (with neurologic involvement); Klein-Waardenburg syndrome; Walker-Warburg congenital muscular dystrophy; Warburg micro syndrome 2 and 4; Warts, hypogammaglobulinemia, infections, and myelokathexis; Weaver syndrome; Weill-Marchesani syndrome 1 and 3; Weill-Marchesani-like syndrome; Weissenbacher-Zweymuller syndrome; Werdnig-Hoffmann disease; Charcot-Marie-Tooth disease; Werner syndrome; WFS1-Related 
Disorders; Wiedemann-Steiner syndrome; Wilson disease; Wolfram-like syndrome, autosomal dominant; Worth disease; Van Buchem disease type 2; Xeroderma pigmentosum, complementation group b, group D, group E, and group G; X-linked agammaglobulinemia; X-linked hereditary motor and sensory neuropathy; X-linked ichthyosis with steryl-sulfatase deficiency; X-linked periventricular heterotopia; Oto-palato-digital syndrome, type I; X-linked severe combined immunodeficiency; Zimmermann-Laband syndrome and Zimmermann-Laband syndrome 2; and Zonular pulverulent cataract 3.

In a particular aspect, the instant disclosure provides TPRT-based methods for the treatment of a subject diagnosed with an expansion repeat disorder (also known as a repeat expansion disorder or a trinucleotide repeat disorder). Expansion repeat disorders occur when microsatellite repeats expand beyond a threshold length. Currently, at least 30 genetic diseases are believed to be caused by repeat expansions. Scientific understanding of this diverse group of disorders came to light in the early 1990s with the discovery that trinucleotide repeats underlie several major inherited conditions, including Fragile X, Spinal and Bulbar Muscular Atrophy, Myotonic Dystrophy, and Huntington's disease (Nelson et al, “The unstable repeats—three evolving faces of neurological disease,” Neuron, Mar. 6, 2013, Vol. 77; 825-843, which is incorporated herein by reference), as well as Haw River Syndrome, Jacobsen Syndrome, Dentatorubral-pallidoluysian atrophy (DRPLA), Machado-Joseph disease, Synpolydactyly (SPD II), Hand-foot-genital syndrome (HFGS), Cleidocranial dysplasia (CCD), Holoprosencephaly disorder (HPE), Congenital central hypoventilation syndrome (CCHS), ARX-nonsyndromic X-linked mental retardation (XLMR), and Oculopharyngeal muscular dystrophy (OPMD). Microsatellite repeat instability was found to be a hallmark of these conditions, as was anticipation, the phenomenon in which repeat expansion can occur with each successive generation, leading to a more severe phenotype and an earlier age of onset in the offspring. Repeat expansions are believed to cause diseases via several different mechanisms. Namely, expansions may interfere with cellular functioning at the level of the gene, the mRNA transcript, and/or the encoded protein. In some conditions, mutations act via a loss-of-function mechanism by silencing repeat-containing genes. In others, disease results from gain-of-function mechanisms, whereby either the mRNA transcript or the protein takes on new, aberrant functions.

In one embodiment, a method of treating a trinucleotide repeat disorder is depicted in FIG. 23. In general, the approach involves using TPRT genome editing in combination with an extended gRNA that comprises a region encoding a desired, healthy replacement trinucleotide repeat sequence, which is intended to replace the endogenous diseased trinucleotide repeat sequence through the mechanism of the prime editing process. FIG. 23 shows a schematic of an exemplary gRNA design for contracting trinucleotide repeat sequences and of trinucleotide repeat contraction with TPRT genome editing.

Trinucleotide repeat expansion is associated with a number of human diseases, including Huntington's Disease, Fragile X syndrome, and Friedreich's ataxia. The most common trinucleotide repeat contains CAG triplets, though GAA triplets (Friedreich's ataxia) and CGG triplets (Fragile X syndrome) also occur. Inheriting a predisposition to expansion, or acquiring an already expanded parental allele, increases the likelihood of acquiring the disease. Pathogenic expansions of trinucleotide repeats could hypothetically be corrected using prime editing.

A region upstream of the repeat region can be nicked by an RNA-guided nuclease, then used to prime synthesis of a new DNA strand that contains a healthy number of repeats (which depends on the particular gene and disease), in accordance with the general mechanism outlined in FIG. 1E or FIG. 22. After the repeat sequence, a short stretch of homology is added that matches the identity of the sequence adjacent to the other end of the repeat (red strand). Invasion of the newly synthesized strand by the TPRT system, and subsequent replacement of the endogenous DNA with the newly synthesized flap, leads to a contracted repeat allele. The term “contracted” refers to a shortening of the length of the nucleotide repeat region, thereby repairing the trinucleotide repeat region.
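By way of illustration only, the contraction described above can be sketched in Python as a string operation on a toy locus. The function name `contract_repeat`, the flanking sequences, the repeat counts (45 and 20), and the homology length are hypothetical values chosen for the example and are not taken from the disclosure; the healthy repeat number depends on the particular gene and disease. The sketch also ignores strand orientation and the reverse complementation that an RNA template would require.

```python
import re


def contract_repeat(sequence, unit, healthy_count, homology_len):
    """Return (contracted_sequence, new_strand_payload) for a toy repeat locus.

    The longest uninterrupted run of `unit` is treated as the expanded repeat.
    The payload models the newly synthesized DNA: the healthy repeat followed
    by a short stretch of homology to the sequence downstream of the repeat.
    All parameters are illustrative assumptions.
    """
    runs = list(re.finditer(f"(?:{re.escape(unit)})+", sequence))
    if not runs:
        raise ValueError("repeat unit not found in sequence")
    longest = max(runs, key=lambda m: m.end() - m.start())

    upstream = sequence[:longest.start()]
    downstream = sequence[longest.end():]
    healthy_repeat = unit * healthy_count

    payload = healthy_repeat + downstream[:homology_len]
    return upstream + healthy_repeat + downstream, payload


# Toy locus: 45 CAG repeats between two made-up flanking sequences.
expanded = "ACGTTGCA" + "CAG" * 45 + "CCGCCACC"
contracted, payload = contract_repeat(
    expanded, "CAG", healthy_count=20, homology_len=5
)
print(len(expanded), len(contracted))  # prints 151 76
```

In this sketch the flanking sequences are left untouched and only the repeat run is shortened, which mirrors the replacement of the endogenous repeat by the newly synthesized flap.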

In various embodiments, the desired nucleotide change can be a single nucleotide substitution (e.g., a transition or a transversion change), a deletion, or an insertion. For example, the desired nucleotide change can be (1) a G to T substitution, (2) a G to A substitution, (3) a G to C substitution, (4) a T to G substitution, (5) a T to A substitution, (6) a T to C substitution, (7) a C to G substitution, (8) a C to T substitution, (9) a C to A substitution, (10) an A to T substitution, (11) an A to G substitution, or (12) an A to C substitution.

In other embodiments, the desired nucleotide change can convert (1) a G:C basepair to a T:A basepair, (2) a G:C basepair to an A:T basepair, (3) a G:C basepair to a C:G basepair, (4) a T:A basepair to a G:C basepair, (5) a T:A basepair to an A:T basepair, (6) a T:A basepair to a C:G basepair, (7) a C:G basepair to a G:C basepair, (8) a C:G basepair to a T:A basepair, (9) a C:G basepair to an A:T basepair, (10) an A:T basepair to a T:A basepair, (11) an A:T basepair to a G:C basepair, or (12) an A:T basepair to a C:G basepair.
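As a non-limiting illustration, the twelve single nucleotide substitutions and the corresponding base pair conversions listed above can be enumerated programmatically. The following Python sketch is not part of the disclosed methods; the names `describe_substitution` and `ALL_SUBSTITUTIONS` are hypothetical, and the order in which entries are generated is not intended to match the numbering used above.

```python
from itertools import permutations

# Watson-Crick partner of each base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def describe_substitution(ref, alt):
    """Describe one substitution, its base pair conversion, and its class.

    A transition keeps the base class (purine to purine, or pyrimidine to
    pyrimidine); a transversion switches between the two classes.
    """
    same_class = (ref in PURINES) == (alt in PURINES)
    return {
        "substitution": f"{ref} to {alt}",
        "base_pair_change": f"{ref}:{COMPLEMENT[ref]} to {alt}:{COMPLEMENT[alt]}",
        "kind": "transition" if same_class else "transversion",
    }


# Every ordered pair of distinct bases gives one possible substitution.
ALL_SUBSTITUTIONS = [
    describe_substitution(ref, alt) for ref, alt in permutations("GTCA", 2)
]

for entry in ALL_SUBSTITUTIONS:
    print(entry["substitution"], "|", entry["base_pair_change"], "|", entry["kind"])
```

Of the twelve entries, four are transitions (G to A, A to G, C to T, and T to C) and the remaining eight are transversions.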

In still other embodiments, the method introduces a desired nucleotide change that is an insertion. In certain cases, the insertion is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, or at least 500 nucleotides in length.

In other embodiments, the method introduces a desired nucleotide change that is a deletion. In certain other cases, the deletion is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, or at least 500 nucleotides in length.

In various embodiments, the desired nucleotide change corrects a disease-associated gene. The disease-associated gene can be associated with a monogenetic disorder selected from the group consisting of: Adenosine Deaminase (ADA) Deficiency; Alpha-1 Antitrypsin Deficiency; Cystic Fibrosis; Duchenne Muscular Dystrophy; Galactosemia; Hemochromatosis; Huntington's Disease; Maple Syrup Urine Disease; Marfan Syndrome; Neurofibromatosis Type 1; Pachyonychia Congenita; Phenylketonuria; Severe Combined Immunodeficiency; Sickle Cell Disease; Smith-Lemli-Opitz Syndrome; and Tay-Sachs Disease. In other embodiments, the disease-associated gene can be associated with a polygenic disorder selected from the group consisting of: heart disease; high blood pressure; Alzheimer's disease; arthritis; diabetes; cancer; and obesity.

The target nucleotide sequence may comprise a target sequence (e.g., a point mutation) associated with a disease, disorder, or condition. The target sequence may comprise a T to C (or A to G) point mutation associated with a disease, disorder, or condition, wherein the deamination of the mutant C base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition. The target sequence may comprise a G to A (or C to T) point mutation associated with a disease, disorder, or condition, wherein the deamination of the mutant A base results in mismatch repair-mediated correction to a sequence that is not associated with a disease, disorder, or condition. The target sequence may encode a protein, wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to a wild-type codon. The target sequence may also be at a splice site, wherein the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript. In addition, the target may be at a non-coding sequence of a gene, such as a promoter, wherein the point mutation results in increased or decreased expression of the gene.

Thus, in some aspects, the deamination of a mutant C results in a change of the amino acid encoded by the mutant codon, which in some cases can result in the expression of a wild-type amino acid. In other aspects, the deamination of a mutant A results in a change of the amino acid encoded by the mutant codon, which in some cases can result in the expression of a wild-type amino acid.

In the methods described herein, the step of contacting a cell with a composition or rAAV particle can occur in vitro, ex vivo, or in vivo. In certain embodiments, the step of contacting occurs in a subject. In certain embodiments, the subject has been diagnosed with a disease, disorder, or condition.

In some embodiments, the methods disclosed herein involve contacting a mammalian cell with a composition or rAAV particle. In particular embodiments, the methods involve contacting a retinal cell, cortical cell or cerebellar cell.

The split Cas9 protein or split prime editor delivered using the methods described herein preferably has activity comparable to that of the original Cas9 protein or prime editor (i.e., unsplit protein delivered to a cell or expressed in a cell as a whole). For example, the split Cas9 protein or split prime editor retains at least 50% (e.g., at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) of the activity of the original Cas9 protein or prime editor. In some embodiments, the split Cas9 protein or split prime editor is more active (e.g., 2-fold, 5-fold, 10-fold, 100-fold, 1000-fold, or more) than the original Cas9 protein or prime editor.

The compositions described herein may be administered to a subject in need thereof in a therapeutically effective amount to treat and/or prevent a disease or disorder from which the subject is suffering. Any disease or disorder that may be treated and/or prevented using CRISPR/Cas9-based genome-editing technology may be treated by the split Cas9 protein or the split prime editor described herein. It is to be understood that, if the nucleotide sequences encoding the split Cas9 protein or the prime editor do not further encode a gRNA, a separate nucleic acid vector encoding the gRNA may be administered together with the compositions described herein.

Exemplary suitable diseases, disorders or conditions include, without limitation: cystic fibrosis, phenylketonuria, epidermolytic hyperkeratosis (EHK), chronic obstructive pulmonary disease (COPD), Charcot-Marie-Tooth disease type 4J, neuroblastoma (NB), von Willebrand disease (vWD), myotonia congenita, hereditary renal amyloidosis, dilated cardiomyopathy, hereditary lymphedema, familial Alzheimer's disease, prion disease, chronic infantile neurologic cutaneous articular syndrome (CINCA), congenital deafness, Niemann-Pick disease type C (NPC), and desmin-related myopathy (DRM). In particular embodiments, the disease or condition is Niemann-Pick disease type C (NPC).

In some embodiments, the disease, disorder or condition is associated with a point mutation in an NPC gene, a DNMT1 gene, a PCSK9 gene, or a TMC1 gene. In certain embodiments, the point mutation is a T3182C mutation in NPC, which results in an I1061T amino acid substitution.

In certain embodiments, the point mutation is an A545G mutation in TMC1, which results in a Y182C amino acid substitution. TMC1 encodes a protein that forms mechanosensitive ion channels in sensory hair cells of the inner ear and is required for normal auditory function. The Y182C amino acid substitution is associated with congenital deafness.
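The relationship between the nucleotide positions and the amino acid substitutions recited above (T3182C with I1061T, and A545G with Y182C) can be checked with a short calculation. The Python sketch below is illustrative only. It assumes that nucleotide positions are counted from the first base of the start codon of the coding sequence and that the standard genetic code applies; the actual wild-type codons are not stated in the text, so the sketch simply lists every codon of the wild-type amino acid that is consistent with the stated change. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_position(cds_pos):
    """Map a 1-based coding position to (codon number, 0-based offset in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3


def consistent_codons(ref_base, alt_base, cds_pos, wt_aa, mut_aa):
    """List (wild-type codon, mutant codon) pairs that fit the stated mutation."""
    _, offset = codon_position(cds_pos)
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa == wt_aa and codon[offset] == ref_base:
            mutant = codon[:offset] + alt_base + codon[offset + 1:]
            if CODON_TABLE[mutant] == mut_aa:
                hits.append((codon, mutant))
    return hits


print(codon_position(3182), consistent_codons("T", "C", 3182, "I", "T"))
print(codon_position(545), consistent_codons("A", "G", 545, "Y", "C"))
```

Under these assumptions, position 3182 falls in the second base of codon 1061 and position 545 falls in the second base of codon 182, which agrees with the residue numbers given for the I1061T and Y182C substitutions.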

In some embodiments, the disease, disorder or condition is associated with a point mutation that generates a stop codon, for example, a premature stop codon within the coding region of a gene.

Additional exemplary diseases, disorders and conditions include cystic fibrosis (see, e.g., Schwank et al., Functional repair of CFTR by CRISPR/Cas9 in intestinal stem cell organoids of cystic fibrosis patients. Cell stem cell. 2013; 13: 653-658; and Wu et al., Correction of a genetic disease in mouse via use of CRISPR-Cas9. Cell stem cell. 2013; 13: 659-662, neither of which uses a deaminase fusion protein to correct the genetic defect); phenylketonuria—e.g., phenylalanine to serine mutation at position 835 (mouse) or 240 (human) or a homologous residue in phenylalanine hydroxylase gene (T>C mutation)—see, e.g., McDonald et al., Genomics. 1997; 39:402-405; Bernard-Soulier syndrome (BSS)—e.g., phenylalanine to serine mutation at position 55 or a homologous residue, or cysteine to arginine at residue 24 or a homologous residue in the platelet membrane glycoprotein IX (T>C mutation)—see, e.g., Noris et al., British Journal of Haematology. 1997; 97: 312-320, and Ali et al., Hematol. 2014; 93: 381-384; epidermolytic hyperkeratosis (EHK)—e.g., leucine to proline mutation at position 160 or 161 (if counting the initiator methionine) or a homologous residue in keratin 1 (T>C mutation)—see, e.g., Chipev et al., Cell. 1992; 70: 821-828, see also accession number P04264 in the UNIPROT database at www[dot]uniprot[dot]org; chronic obstructive pulmonary disease (COPD)—e.g., leucine to proline mutation at position 54 or 55 (if counting the initiator methionine) or a homologous residue in the processed form of α₁-antitrypsin or residue 78 in the unprocessed form or a homologous residue (T>C mutation)—see, e.g., Poller et al., Genomics. 1993; 17: 740-743, see also accession number P01011 in the UNIPROT database; Charcot-Marie-Tooth disease type 4J—e.g., isoleucine to threonine mutation at position 41 or a homologous residue in FIG4 (T>C mutation)—see, e.g., Lenk et al., PLoS Genetics. 
2011; 7: e1002104; neuroblastoma (NB)—e.g., leucine to proline mutation at position 197 or a homologous residue in Caspase-9 (T>C mutation)—see, e.g., Kundu et al., 3 Biotech. 2013, 3:225-234; von Willebrand disease (vWD)—e.g., cysteine to arginine mutation at position 509 or a homologous residue in the processed form of von Willebrand factor, or at position 1272 or a homologous residue in the unprocessed form of von Willebrand factor (T>C mutation)—see, e.g., Lavergne et al., Br. J. Haematol. 1992; 82: 66-72, see also accession number P04275 in the UNIPROT database; myotonia congenita—e.g., cysteine to arginine mutation at position 277 or a homologous residue in the muscle chloride channel gene CLCN1 (T>C mutation)—see, e.g., Weinberger et al., The J. of Physiology. 2012; 590: 3449-3464; hereditary renal amyloidosis—e.g., stop codon to arginine mutation at position 78 or a homologous residue in the processed form of apolipoprotein AII or at position 101 or a homologous residue in the unprocessed form (T>C mutation)—see, e.g., Yazaki et al., Kidney Int. 2003; 64: 11-16; dilated cardiomyopathy (DCM)—e.g., tryptophan to arginine mutation at position 148 or a homologous residue in the FOXD4 gene (T>C mutation), see, e.g., Minoretti et al., Int. J. of Mol. Med. 2007; 19: 369-372; hereditary lymphedema—e.g., histidine to arginine mutation at position 1035 or a homologous residue in VEGFR3 tyrosine kinase (A>G mutation), see, e.g., Irrthum et al., Am. J. Hum. Genet. 2000; 67: 295-301; familial Alzheimer's disease—e.g., isoleucine to valine mutation at position 143 or a homologous residue in presenilin1 (A>G mutation), see, e.g., Gallo et al., J. Alzheimer's disease. 2011; 25: 425-431; Prion disease—e.g., methionine to valine mutation at position 129 or a homologous residue in prion protein (A>G mutation)—see, e.g., Lewis et al., J. of General Virology. 
2006; 87: 2443-2449; chronic infantile neurologic cutaneous articular syndrome (CINCA)—e.g., tyrosine to cysteine mutation at position 570 or a homologous residue in cryopyrin (A>G mutation)—see, e.g., Fujisawa et al., Blood. 2007; 109: 2903-2911; and desmin-related myopathy (DRM)—e.g., arginine to glycine mutation at position 120 or a homologous residue in αB crystallin (A>G mutation)—see, e.g., Kumar et al., J. Biol. Chem. 1999; 274: 24137-24141. The entire contents of all references and database entries are incorporated herein by reference.

V. Pharmaceutical Compositions

Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the various components of the prime editor (PE) system described herein (e.g., including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editor guide RNAs, second strand nicking guide RNAs, and complexes comprising fusion proteins and prime editor guide RNAs, as well as accessory elements, such as second strand nicking components and 5′ endogenous DNA flap removal endonucleases for helping to drive the prime editing process towards the edited product formation). Such compositions may include PE1, PE2, PE3, or PE3b.

The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g. for specific delivery, increasing half-life, or other therapeutic compounds).

As used herein, the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and 
(25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. Terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein.

In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.

In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a sialastic membrane, or a fiber.

In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Rev. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used. (See, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Langer and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61. See also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105.) Other controlled release systems are discussed, for example, in Langer, supra.

In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container, such as an ampoule or sachet, indicating the quantity of active agent. Where the pharmaceutical is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.

A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer's or Hank's solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.

The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE) and low levels (5-10 mol %) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Pat. Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.

The pharmaceutical composition described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.

Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.

VI. Viral Delivery Methods

In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein encoding one or more components of the prime editor (PE) system described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a base editor as described herein in combination with (and optionally complexed with) a guide sequence is delivered to a cell. Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a base editor to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology, Doerfler and Böhm (eds) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).

Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g. in vitro or ex vivo administration) or target tissues (e.g. in vivo administration).

The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.

The tropism of a virus can be altered by incorporating foreign envelope proteins, expanding the potential target population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700). In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin, et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).

Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817, incorporated herein by reference.

VII. Kits, Vectors, Cells, and Delivery

Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the various components of the prime editor (PE) system described herein (e.g., including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editor guide RNAs, and complexes comprising fusion proteins and prime editor guide RNAs, as well as accessory elements, such as second strand nicking components and 5′ endogenous DNA flap removal endonucleases for helping to drive the prime editing process towards the edited product formation). In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the prime editor (PE) system components.

Some aspects of this disclosure provide kits comprising one or more nucleic acid constructs encoding the various components of the prime editor (PE) system described herein, e.g., comprising a nucleotide sequence encoding the components of the prime editor (PE) system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the prime editor (PE) system components.

Some aspects of this disclosure provide kits comprising a nucleic acid construct, comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a reverse transcriptase and (b) a heterologous promoter that drives expression of the sequence of (a).

Kits

The compositions of the present disclosure may be assembled into kits. In some embodiments, the kit comprises nucleic acid vectors for the expression of the prime editors described herein. In other embodiments, the kit further comprises appropriate guide nucleotide sequences (e.g., PEgRNAs and second-site gRNAs) or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein or prime editor to the desired target sequence.

The kit described herein may include one or more containers housing components for performing the methods described herein and optionally instructions for use. Any of the kits described herein may further comprise components needed for performing the assay methods. Each component of the kits, where applicable, may be provided in liquid form (e.g., in solution) or in solid form (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit.

In some embodiments, the kits may optionally include instructions and/or promotion for use of the components provided. As used herein, “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which can also reflect approval by the agency of manufacture, use or sale for animal administration. As used herein, “promoted” includes all methods of doing business including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity including written, oral and electronic communication of any form, associated with the disclosure. Additionally, the kits may include other components depending on the specific application, as described herein.

The kits may contain any one or more of the components described herein in one or more containers. The components may be prepared sterilely, packaged in a syringe and shipped refrigerated. Alternatively, the components may be housed in a vial or other container for storage. A second container may have other components prepared sterilely. Alternatively, the kits may include the active agents premixed and shipped in a vial, tube, or other container.

The kits may have a variety of forms, such as a blister pouch, a shrink wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box or a bag. The kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art. The kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc. Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the various components of the prime editing system described herein (e.g., including, but not limited to, the napDNAbps, reverse transcriptases, polymerases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases (or more broadly, polymerases)), extended guide RNAs, and complexes comprising fusion proteins and extended guide RNAs, as well as accessory elements, such as second strand nicking components (e.g., second strand nicking gRNA) and 5′ endogenous DNA flap removal endonucleases for helping to drive the prime editing process towards the edited product formation). In some embodiments, the nucleotide sequence(s) comprises a heterologous promoter (or more than a single promoter) that drives expression of the prime editing system components.

Other aspects of this disclosure provide kits comprising one or more nucleic acid constructs encoding the various components of the prime editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the prime editing system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the prime editing system components.

Cells

Cells that may contain any of the compositions described herein include prokaryotic cells and eukaryotic cells. The methods described herein are used to deliver a Cas9 protein or a prime editor into a eukaryotic cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cell is in vitro (e.g., a cultured cell). In some embodiments, the cell is in vivo (e.g., in a subject such as a human subject). In some embodiments, the cell is ex vivo (e.g., isolated from a subject and may be administered back to the same or a different subject).

Mammalian cells of the present disclosure include human cells, primate cells (e.g., vero cells), rat cells (e.g., GH3 cells, OC23 cells) or mouse cells (e.g., MC3T3 cells). There are a variety of human cell lines, including, without limitation, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells. In some embodiments, rAAV vectors are delivered into human embryonic kidney (HEK) cells (e.g., HEK 293 or HEK 293T cells). In some embodiments, rAAV vectors are delivered into stem cells (e.g., human stem cells) such as, for example, pluripotent stem cells (e.g., human pluripotent stem cells including human induced pluripotent stem cells (hiPSCs)). A stem cell refers to a cell with the ability to divide for indefinite periods in culture and to give rise to specialized cells. A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but not alone capable of sustaining full organismal development. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663-76, 2006, incorporated by reference herein). Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).

Additional non-limiting examples of cell lines that may be used in accordance with the present disclosure include 293-T, 3T3, 4T1, 721, 9L, A-549, A172, A20, A253, A2780, A2780ADR, A2780cis, A431, ALC, B16, B35, BCP-1, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C2C12, C3H-10T1/2, C6, C6/36, Cal-27, CGR8, CHO, CML T1, CMT, COR-L23, COR-L23/5010, COR-L23/CPR, COR-L23/R23, COS-7, COV-434, CT26, D17, DH82, DU145, DuCaP, E14Tg2a, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, Hepa1c1c7, High Five cells, HL-60, HMEC, HT-29, HUVEC, J558L cells, Jurkat, JY cells, K562 cells, KCL22, KG1, Ku812, KYO1, LNCap, Ma-Mel 1, 2, 3 . . . 48, MC-38, MCF-10A, MCF-7, MDA-MB-231, MDA-MB-435, MDA-MB-468, MDCK II, MG63, MONO-MAC 6, MOR/0.2R, MRC5, MTD-1A, MyEnd, NALM-1, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NW-145, OPCN/OPCT, Peer, PNT-1A/PNT 2, PTK2, Raji, RBL cells, RenCa, RIN-5F, RMA/RMAS, S2, Saos-2 cells, Sf21, Sf9, SiHa, SKBR3, SKOV-3, T-47D, T2, T84, THP1, U373, U87, U937, VCaP, WM39, WT-49, X63, YAC-1 and YAR cells.

Some aspects of this disclosure provide cells comprising any of the constructs disclosed herein. In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bel-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr−/−, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MDCK 11, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof.

Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells are used in assessing one or more test compounds.

Vectors

Some aspects of the present disclosure relate to using recombinant virus vectors (e.g., adeno-associated virus vectors, adenovirus vectors, or herpes simplex virus vectors) for the delivery of the prime editors or components thereof described herein, e.g., the split Cas9 protein or a split prime editor, into a cell. In the case of a split-PE approach, the N-terminal portion of a PE fusion protein and the C-terminal portion of a PE fusion protein are delivered by separate recombinant virus vectors (e.g., adeno-associated virus vectors, adenovirus vectors, or herpes simplex virus vectors) into the same cell, since the full-length Cas9 protein or prime editor exceeds the packaging limit of various virus vectors, e.g., rAAV (˜4.9 kb).

Thus, in one embodiment, the disclosure contemplates vectors capable of delivering split prime editor fusion proteins, or split components thereof. In some embodiments, a composition for delivering the split Cas9 protein or split prime editor into a cell (e.g., a mammalian cell, a human cell) is provided. In some embodiments, the composition of the present disclosure comprises: (i) a first recombinant adeno-associated virus (rAAV) particle comprising a first nucleotide sequence encoding an N-terminal portion of a Cas9 protein or prime editor fused at its C-terminus to an intein-N; and (ii) a second recombinant adeno-associated virus (rAAV) particle comprising a second nucleotide sequence encoding an intein-C fused to the N-terminus of a C-terminal portion of the Cas9 protein or prime editor. The rAAV particles of the present disclosure comprise a rAAV vector (i.e., a recombinant genome of the rAAV) encapsidated in the viral capsid proteins.
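As a rough illustration of why a split approach may be used, the following minimal Python sketch compares the summed length of an expression cassette against the approximate rAAV packaging limit noted above (˜4.9 kb). All element names and lengths are hypothetical placeholders chosen only for illustration; they are not values taken from this disclosure, and the helper `fits_in_raav` is likewise an assumption of the sketch.

```python
# Illustrative sketch only. Element names and lengths are hypothetical
# placeholders, not sequences or sizes from this disclosure.

RAAV_PACKAGING_LIMIT_BP = 4900  # approximate rAAV packaging limit (~4.9 kb)


def fits_in_raav(element_lengths_bp, limit_bp=RAAV_PACKAGING_LIMIT_BP):
    """Return True if the summed element lengths do not exceed the limit."""
    return sum(element_lengths_bp.values()) <= limit_bp


# Hypothetical single-vector cassette encoding a full-length prime editor.
full_length = {
    "itrs": 300,
    "promoter": 500,
    "prime_editor_cds": 6300,
    "terminator": 250,
}

# Hypothetical split cassettes: N-terminal portion fused to an intein-N,
# and an intein-C fused to the C-terminal portion.
n_terminal_half = {
    "itrs": 300,
    "promoter": 500,
    "pe_n_terminal_cds": 3100,
    "intein_n": 400,
    "terminator": 250,
}
c_terminal_half = {
    "itrs": 300,
    "promoter": 500,
    "intein_c": 150,
    "pe_c_terminal_cds": 3200,
    "terminator": 250,
}

for name, cassette in [
    ("full length", full_length),
    ("N-terminal half", n_terminal_half),
    ("C-terminal half", c_terminal_half),
]:
    print(name, sum(cassette.values()), "bp, fits:", fits_in_raav(cassette))
```

Under these assumed lengths, the full-length cassette exceeds the limit while each split cassette falls within it, which is the situation the two-particle composition is intended to address.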

In some embodiments, the rAAV vector comprises: (1) a heterologous nucleic acid region comprising the first or second nucleotide sequence encoding the N-terminal portion or C-terminal portion of a split Cas9 protein or a split prime editor in any form as described herein, (2) one or more nucleotide sequences comprising a sequence that facilitates expression of the heterologous nucleic acid region (e.g., a promoter), and (3) one or more nucleic acid regions comprising a sequence that facilitates integration of the heterologous nucleic acid region (optionally with the one or more nucleic acid regions comprising a sequence that facilitates expression) into the genome of a cell. In some embodiments, viral sequences that facilitate integration comprise Inverted Terminal Repeat (ITR) sequences. In some embodiments, the first or second nucleotide sequence encoding the N-terminal portion or C-terminal portion of a split Cas9 protein or a split prime editor is flanked on each side by an ITR sequence. In some embodiments, the nucleic acid vector further comprises a region encoding an AAV Rep protein as described herein, either contained within the region flanked by ITRs or outside the region. The ITR sequences can be derived from any AAV serotype (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) or can be derived from more than one serotype. In some embodiments, the ITR sequences are derived from AAV2 or AAV6.

Thus, in some embodiments, the rAAV particles disclosed herein comprise at least one rAAV2 particle, rAAV6 particle, rAAV8 particle, rPHP.B particle, rPHP.eB particle, or rAAV9 particle, or a variant thereof. In particular embodiments, the disclosed rAAV particles are rPHP.B particles, rPHP.eB particles, or rAAV9 particles.

ITR sequences and plasmids containing ITR sequences are known in the art and commercially available (see, e.g., products and services available from Vector Biolabs, Philadelphia, PA; Cellbiolabs, San Diego, CA; Agilent Technologies, Santa Clara, CA; and Addgene, Cambridge, MA; and Gene delivery to skeletal muscle results in sustained expression and systemic delivery of a therapeutic protein. Kessler P D, Podsakoff G M, Chen X, McQuiston S A, Colosi P C, Matelis L A, Kurtzman G J, Byrne B J. Proc Natl Acad Sci USA. 1996 Nov. 26; 93(24):14082-7; and Curtis A. Machida. Methods in Molecular Medicine™. Viral Vectors for Gene Therapy Methods and Protocols. 10.1385/1-59259-304-6:201 © Humana Press Inc. 2003. Chapter 10. Targeted Integration by Adeno-Associated Virus. Matthew D. Weitzman, Samuel M. Young Jr., Toni Cathomen and Richard Jude Samulski; U.S. Pat. Nos. 5,139,941 and 5,962,313, all of which are incorporated herein by reference).

In some embodiments, the rAAV vector of the present disclosure comprises one or more regulatory elements to control the expression of the heterologous nucleic acid region (e.g., promoters, transcriptional terminators, and/or other regulatory elements). In some embodiments, the first and/or second nucleotide sequence is operably linked to one or more (e.g., 1, 2, 3, 4, 5, or more) transcriptional terminators. Non-limiting examples of transcriptional terminators that may be used in accordance with the present disclosure include transcription terminators of the bovine growth hormone gene (bGH), human growth hormone gene (hGH), SV40, CW3, 4, or combinations thereof. The efficiencies of several transcriptional terminators have been tested to determine their respective effects in the expression level of the split Cas9 protein or the split prime editor. In some embodiments, the transcriptional terminator used in the present disclosure is a bGH transcriptional terminator. In some embodiments, the rAAV vector further comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE). In certain embodiments, the WPRE is a truncated WPRE sequence, such as “W3.” In some embodiments, the WPRE is inserted 5′ of the transcriptional terminator. Such sequences, when transcribed, create a tertiary structure which enhances expression, in particular, from viral vectors.
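The element ordering described in the preceding paragraphs (a heterologous region flanked on each side by an ITR, a promoter to facilitate expression, and an optional WPRE inserted 5′ of the transcriptional terminator) can be sketched as a simple layout check. This is a minimal Python sketch under stated assumptions: the element labels and the function `check_raav_layout` are hypothetical names introduced for illustration and do not describe any particular construct of this disclosure.

```python
# Illustrative sketch only. Element labels ("ITR", "promoter", "WPRE",
# "terminator") and this helper are hypothetical, chosen for illustration.

def check_raav_layout(elements):
    """Check a 5'->3' ordered list of element labels.

    Returns a list of human-readable problems; an empty list means the
    layout is consistent with the ordering sketched in the text.
    """
    problems = []
    # The heterologous region is flanked on each side by an ITR.
    if len(elements) < 3 or elements[0] != "ITR" or elements[-1] != "ITR":
        problems.append("region is not flanked on each side by an ITR")
    # A sequence that facilitates expression (e.g., a promoter) is present.
    if "promoter" not in elements:
        problems.append("no promoter present")
    # If both are present, the WPRE sits 5' of the transcriptional terminator.
    if "WPRE" in elements and "terminator" in elements:
        if elements.index("WPRE") > elements.index("terminator"):
            problems.append("WPRE is not 5' of the transcriptional terminator")
    return problems


layout = ["ITR", "promoter", "split_PE_N_terminal", "WPRE", "terminator", "ITR"]
print(check_raav_layout(layout))
```

A layout with the WPRE placed 3′ of the terminator, or lacking a flanking ITR, would be reported by the same function as a non-empty list of problems.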

In some embodiments, the vectors used herein may encode the PE fusion proteins, or any of the components thereof (e.g., napDNAbp, linkers, or polymerases). In addition, the vectors used herein may encode the PEgRNAs, and/or the accessory gRNA for second strand nicking. The vectors may be capable of driving expression of one or more coding sequences in a cell. In some embodiments, the cell may be a prokaryotic cell, such as, e.g., a bacterial cell. In some embodiments, the cell may be a eukaryotic cell, such as, e.g., a yeast, plant, insect, or mammalian cell. In some embodiments, the eukaryotic cell may be a mammalian cell. In some embodiments, the eukaryotic cell may be a rodent cell. In some embodiments, the eukaryotic cell may be a human cell. Suitable promoters to drive expression in different types of cells are known in the art. In some embodiments, the promoter may be wild-type. In other embodiments, the promoter may be modified for more efficient or efficacious expression. In yet other embodiments, the promoter may be truncated yet retain its function. For example, the promoter may have a normal size or a reduced size that is suitable for proper packaging of the vector into a virus.

In some embodiments, the promoters that may be used in the prime editor vectors may be constitutive, inducible, or tissue-specific. In some embodiments, the promoters may be constitutive promoters. Non-limiting exemplary constitutive promoters include cytomegalovirus immediate early promoter (CMV), simian virus (SV40) promoter, adenovirus major late (MLP) promoter, Rous sarcoma virus (RSV) promoter, mouse mammary tumor virus (MMTV) promoter, phosphoglycerate kinase (PGK) promoter, elongation factor-alpha (EF1a) promoter, ubiquitin promoters, actin promoters, tubulin promoters, immunoglobulin promoters, a functional fragment thereof, or a combination of any of the foregoing. In some embodiments, the promoter may be a CMV promoter. In some embodiments, the promoter may be a truncated CMV promoter. In other embodiments, the promoter may be an EF1a promoter. In some embodiments, the promoter may be an inducible promoter. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be one that has a low basal (non-induced) expression level, such as, e.g., the Tet-On® promoter (Clontech). In some embodiments, the promoter may be a tissue-specific promoter. In some embodiments, the tissue-specific promoter is exclusively or predominantly expressed in liver tissue. Non-limiting exemplary tissue-specific promoters include B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, Nphsl promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

In some embodiments, the prime editor vectors (e.g., including any vectors encoding the prime editor fusion protein and/or the PEgRNAs, and/or the accessory second strand nicking gRNAs) may comprise inducible promoters to start expression only after the vector is delivered to a target cell. Non-limiting exemplary inducible promoters include those inducible by heat shock, light, chemicals, peptides, metals, steroids, antibiotics, or alcohol. In some embodiments, the inducible promoter may be one that has a low basal (non-induced) expression level, such as, e.g., the Tet-On® promoter (Clontech).

In additional embodiments, the prime editor vectors (e.g., including any vectors encoding the prime editor fusion protein and/or the PEgRNAs, and/or the accessory second strand nicking gRNAs) may comprise tissue-specific promoters to start expression only after the vector is delivered into a specific tissue. Non-limiting exemplary tissue-specific promoters include B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, INF-β promoter, Mb promoter, Nphsl promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter.

In some embodiments, the nucleotide sequence encoding the PEgRNA (or any guide RNAs used in connection with prime editing) may be operably linked to at least one transcriptional or translational control sequence. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to at least one promoter. In some embodiments, the promoter may be recognized by RNA polymerase III (Pol III). Non-limiting examples of Pol III promoters include U6, H1 and tRNA promoters. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human U6 promoter. In other embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human H1 promoter. In some embodiments, the nucleotide sequence encoding the guide RNA may be operably linked to a mouse or human tRNA promoter. In embodiments with more than one guide RNA, the promoters used to drive expression may be the same or different. In some embodiments, the nucleotide encoding the crRNA of the guide RNA and the nucleotide encoding the tracr RNA of the guide RNA may be provided on the same vector. In some embodiments, the nucleotide encoding the crRNA and the nucleotide encoding the tracr RNA may be driven by the same promoter. In some embodiments, the crRNA and tracr RNA may be transcribed into a single transcript. For example, the crRNA and tracr RNA may be processed from the single transcript to form a double-molecule guide RNA. Alternatively, the crRNA and tracr RNA may be transcribed into a single-molecule guide RNA.

In some embodiments, the nucleotide sequence encoding the guide RNA may be located on the same vector comprising the nucleotide sequence encoding the PE fusion protein. In some embodiments, expression of the guide RNA and of the PE fusion protein may be driven by their corresponding promoters. In some embodiments, expression of the guide RNA may be driven by the same promoter that drives expression of the PE fusion protein. In some embodiments, the guide RNA and the PE fusion protein transcript may be contained within a single transcript. For example, the guide RNA may be within an untranslated region (UTR) of the Cas9 protein transcript. In some embodiments, the guide RNA may be within the 5′ UTR of the PE fusion protein transcript. In other embodiments, the guide RNA may be within the 3′ UTR of the PE fusion protein transcript. In some embodiments, the intracellular half-life of the PE fusion protein transcript may be reduced by containing the guide RNA within its 3′ UTR and thereby shortening the length of its 3′ UTR. In additional embodiments, the guide RNA may be within an intron of the PE fusion protein transcript. In some embodiments, suitable splice sites may be added at the intron within which the guide RNA is located such that the guide RNA is properly spliced out of the transcript. In some embodiments, expression of the Cas9 protein and the guide RNA in close proximity on the same vector may facilitate more efficient formation of the CRISPR complex.

The prime editor vector system may comprise one vector, or two vectors, or three vectors, or four vectors, or five vectors, or more. In some embodiments, the vector system may comprise one single vector, which encodes both the PE fusion protein and PEgRNA. In other embodiments, the vector system may comprise two vectors, wherein one vector encodes the PE fusion protein and the other encodes the PEgRNA. In additional embodiments, the vector system may comprise three vectors, wherein the third vector encodes the second strand nicking gRNA used in the methods herein.
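The one-, two-, and three-vector arrangements described above can be summarized as a small data structure. The following is a minimal Python sketch for illustration only; the dictionary `VECTOR_SYSTEMS`, its keys, and the helper `encoded_components` are hypothetical names introduced here, and the arrangements shown are only those examples given in the preceding paragraph.

```python
# Illustrative sketch only. Names and structure are hypothetical; each inner
# set lists the components encoded by one vector of the system.

VECTOR_SYSTEMS = {
    "one_vector": [
        {"PE fusion protein", "PEgRNA"},
    ],
    "two_vector": [
        {"PE fusion protein"},
        {"PEgRNA"},
    ],
    "three_vector": [
        {"PE fusion protein"},
        {"PEgRNA"},
        {"second strand nicking gRNA"},
    ],
}


def encoded_components(system):
    """Return the union of components encoded across all vectors in a system."""
    components = set()
    for vector in system:
        components |= vector
    return components


for name, system in VECTOR_SYSTEMS.items():
    print(name, len(system), "vector(s):", sorted(encoded_components(system)))
```

Representing each vector as a set makes it straightforward to ask which components a given system encodes in total, independent of how they are distributed across vectors.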

In some embodiments, the composition comprising the rAAV particle (in any form contemplated herein) further comprises a pharmaceutically acceptable carrier. In some embodiments, the composition is formulated in appropriate pharmaceutical vehicles for administration to human or animal subjects.

Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. The terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein.

Delivery Methods

In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a base editor as described herein in combination with (and optionally complexed with) a guide sequence is delivered to a cell.

Exemplary delivery strategies are described herein elsewhere, which include vector-based strategies, PE ribonucleoprotein complex delivery, and delivery of PE by mRNA methods.

In some embodiments, the method of delivery provided comprises nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.

Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggybac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™ and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery may be to cells (e.g. in vitro or ex vivo administration) or target tissues (e.g. in vivo administration). Delivery may be achieved through the use of RNP complexes.

In other embodiments, the method of delivery and vector provided herein is an RNP complex. RNP delivery of fusion proteins markedly increases the DNA specificity of base editing. RNP delivery of fusion proteins leads to decoupling of on- and off-target DNA editing. RNP delivery ablates off-target editing at non-repetitive sites while maintaining on-target editing comparable to plasmid delivery, and greatly reduces off-target DNA editing even at the highly repetitive VEGFA site 2. See Rees, H. A. et al., Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery, Nat. Commun. 8, 15790 (2017), U.S. Pat. No. 9,526,784, issued Dec. 27, 2016, and U.S. Pat. No. 9,737,604, issued Aug. 22, 2017, each of which is incorporated by reference herein.

Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US 2003/0087817, incorporated herein by reference.

Other aspects of the present disclosure provide methods of delivering the prime editor constructs into a cell to form a complete and functional prime editor within a cell. For example, in some embodiments, a cell is contacted with a composition described herein (e.g., compositions comprising nucleotide sequences encoding the split Cas9 or the split prime editor or AAV particles containing nucleic acid vectors comprising such nucleotide sequences). In some embodiments, the contacting results in the delivery of such nucleotide sequences into a cell, wherein the N-terminal portion of the Cas9 protein or the prime editor and the C-terminal portion of the Cas9 protein or the prime editor are expressed in the cell and are joined to form a complete Cas9 protein or a complete prime editor.

It should be appreciated that any rAAV particle, nucleic acid molecule or composition provided herein may be introduced into the cell in any suitable way, either stably or transiently. In some embodiments, the disclosed proteins may be transfected into the cell. In some embodiments, the cell may be transduced or transfected with a nucleic acid molecule. For example, a cell may be transduced (e.g., with a virus encoding a split protein), or transfected (e.g., with a plasmid encoding a split protein) with a nucleic acid molecule that encodes a split protein, or an rAAV particle containing a viral genome encoding one or more nucleic acid molecules. Such transduction may be a stable or transient transduction. In some embodiments, cells expressing a split protein or containing a split protein may be transduced or transfected with one or more guide RNA sequences, for example in delivery of a split Cas9 (e.g., nCas9) protein. In some embodiments, a plasmid expressing a split protein may be introduced into cells through electroporation, transient transfection (e.g., lipofection), stable genome integration (e.g., piggybac), viral transduction, or other methods known to those of skill in the art.

In certain embodiments, the compositions provided herein comprise a lipid and/or polymer. In certain embodiments, the lipid and/or polymer is cationic. The preparation of such lipid particles is well known. See, e.g. U.S. Pat. Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; 4,921,757; and 9,737,604, each of which is incorporated herein by reference.

The guide RNA sequence may be 15-100 nucleotides in length and comprise a sequence of at least 10, at least 15, or at least 20 contiguous nucleotides that is complementary to a target nucleotide sequence. The guide RNA may comprise a sequence of 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 contiguous nucleotides that is complementary to a target nucleotide sequence. The guide RNA may be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.

In some embodiments, the target nucleotide sequence is a DNA sequence in a genome, e.g. a eukaryotic genome. In certain embodiments, the target nucleotide sequence is in a mammalian (e.g. a human) genome.

The compositions of this disclosure may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., a carrier or vehicle.

Treatment of a disease or disorder includes delaying the development or progression of the disease, or reducing disease severity. Treating the disease does not necessarily require curative results.

As used herein, “delaying” the development of a disease means to defer, hinder, slow, retard, stabilize, and/or postpone progression of the disease. This delay can be of varying lengths of time, depending on the history of the disease and/or individuals being treated. A method that “delays” or alleviates the development of a disease, or delays the onset of the disease, is a method that reduces the probability of developing one or more symptoms of the disease in a given time frame and/or reduces the extent of the symptoms in a given time frame, when compared to not using the method. Such comparisons are typically based on clinical studies, using a number of subjects sufficient to give a statistically significant result.

“Development” or “progression” of a disease means initial manifestations and/or ensuing progression of the disease. Development of the disease can be detectable and assessed using standard clinical techniques as well known in the art. However, development also refers to progression that may be undetectable. For purpose of this disclosure, development or progression refers to the biological course of the symptoms. “Development” includes occurrence, recurrence, and onset.

As used herein “onset” or “occurrence” of a disease includes initial onset and/or recurrence. Conventional methods, known to those of ordinary skill in the art of medicine, can be used to administer the isolated polypeptide or pharmaceutical composition to the subject, depending upon the type of disease to be treated or the site of the disease.

Without further elaboration, it is believed that one skilled in the art can, based on the above description, utilize the present disclosure to its fullest extent. The following specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever. All publications cited herein are incorporated by reference for the purposes or subject matter referenced herein.

Sequences

This application throughout describes a variety of amino acid and nucleotide sequences relating to various aspects of the present disclosure, including exemplary Cas9 sequences, reverse transcriptase sequences, fusion protein sequences, linkers, guide RNAs, and other sequences. In addition, Example 2 (and elsewhere herein) describes a process and algorithm for designing and/or determining the sequences of thousands of exemplary prime editor guide RNAs (PEgRNA) for repairing exemplary sequences from the ClinVar database.

This application is being filed with a Sequence Listing. The Sequence Listing includes a description of each of the PEgRNA determined in accordance with Example 2. In total, Example 2 determines the sequence of 202892 exemplary PEgRNA complete sequences. Each of these sequences is presented/included in the Sequence Listing and identified as SEQ ID NOs: 1-135514 and 813085-880462. In addition, and as described elsewhere, the PEgRNA are each comprised of a spacer (SEQ ID NOs: 135515-271028 and 880463-947840) and an extension arm (SEQ ID NOs: 271029-406542 and 947841-1015218). In addition, each PEgRNA comprises a gRNA core, for example, as defined by SEQ ID NOs: 1361579-1361580. The extension arms of SEQ ID NOs: 271029-406542 and 947841-1015218 are further each comprised of a primer binding site (SEQ ID NOs.: 406543-542056 and 1015219-1082596), an edit template (SEQ ID NOs.: 542057-677570 and 1082597-1149974), and a homology arm (SEQ ID NOs.: 677571-813084 and 1149975-1217352). The PEgRNA optionally may comprise a 5′ end modifier region and/or a 3′ end modifier region. The PEgRNA may also comprise a reverse transcription termination signal (e.g., SEQ ID NOs: 1361560-1361566) at the 3′ end of the PEgRNA.

For each full length PEgRNA sequence provided in the Sequence Listing (SEQ ID NO: 1-135514), the Sequence Listing includes a set of five (5) corresponding subsequences: namely, the (1) spacer, (2) extension arm, (3) primer binding site, (4) edit template, and (5) homology arm. The set of subsequences for any given PEgRNA full length sequence may be determined by the following mathematical operation.

Determining the Set of Subsequences for Each PEgRNA

For each PEgRNA sequence (e.g., SEQ ID NO: 1) in the Sequence Listing, the following sequences in the Sequence Listing constitute a set of corresponding subsequences:

For SEQ ID NO: 1-813084: the (1) spacer, (2) extension arm, (3) primer binding site, (4) edit template, and (5) homology arm, relate as follows:

Spacer: For each given PEgRNA sequence, the corresponding spacer sequence is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “1” for SEQ ID NO: 1) added to the factor 135514. For example, the spacer corresponding to the PEgRNA of SEQ ID NO: 1 is SEQ ID NO: 135515.

Extension arm: For each given PEgRNA sequence, the corresponding extension arm is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “1” for SEQ ID NO: 1) added to the factor 271028 (135514×2). For example the extension arm corresponding to the PEgRNA of SEQ ID NO: 1 is SEQ ID NO: 271029.

Primer binding site: For each given PEgRNA sequence, the corresponding primer binding site is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “1” for SEQ ID NO: 1) added to the factor 406542 (135514×3). For example, the primer binding site corresponding to the PEgRNA of SEQ ID NO: 1 is SEQ ID NO: 406543.

Edit template: For each given PEgRNA sequence, the corresponding edit template is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “1” for SEQ ID NO: 1) added to the factor 542056 (135514×4). For example, the edit template corresponding to the PEgRNA of SEQ ID NO: 1 is SEQ ID NO: 542057.

Homology arm: For each given PEgRNA sequence, the corresponding homology arm is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “1” for SEQ ID NO: 1) added to the factor 677570 (135514×5). For example, the homology arm corresponding to the PEgRNA of SEQ ID NO: 1 is SEQ ID NO: 677571.
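By way of illustration only, the offset rule stated above for SEQ ID NOs: 1-135514 may be sketched in Python as follows. The function name, the dictionary keys, and the constant names are hypothetical conveniences that do not appear in the Sequence Listing; only the factor 135514 and its multiples are taken from the description above.

```python
# Illustrative sketch only; identifiers are hypothetical, the factor is from the text.
GROUP_A_COUNT = 135514  # complete PEgRNA sequences are SEQ ID NOs: 1-135514

# Order matters: the k-th component is offset by k x 135514.
COMPONENTS = (
    "spacer",
    "extension_arm",
    "primer_binding_site",
    "edit_template",
    "homology_arm",
)


def group_a_subsequences(pegrna_id: int) -> dict:
    """Return the SEQ ID NO of each subsequence for a complete PEgRNA SEQ ID NO."""
    if not 1 <= pegrna_id <= GROUP_A_COUNT:
        raise ValueError("expected a complete PEgRNA SEQ ID NO between 1 and 135514")
    return {
        name: pegrna_id + GROUP_A_COUNT * k
        for k, name in enumerate(COMPONENTS, start=1)
    }


if __name__ == "__main__":
    print(group_a_subsequences(1))
```

For SEQ ID NO: 1, this sketch returns the spacer, extension arm, primer binding site, edit template, and homology arm identifiers 135515, 271029, 406543, 542057, and 677571, respectively, in agreement with the examples given above.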

Examples of other PEgRNA sequence sets (i.e., comprising any given PEgRNA from sequences 1-813084, each of which was designed against SpCas9 (NG) (SEQ ID NO: 1361594) or SpCas9 (NGG) (SEQ ID NO: 1361593), and the corresponding spacer, extension arm, primer binding site, edit template, and homology arm) are presented in the following table:

| Group A | complete PEgRNA | spacer | extension arm | primer binding site | edit template | homology arm |
|---|---|---|---|---|---|---|
| SEQ ID NOs.: | 1 | 135,515 | 271,029 | 406,543 | 542,057 | 677,571 |
| SEQ ID NOs.: | 2 | 135,516 | 271,030 | 406,544 | 542,058 | 677,572 |
| SEQ ID NOs.: | 3 | 135,517 | 271,031 | 406,545 | 542,059 | 677,573 |
| SEQ ID NOs.: | 4 | 135,518 | 271,032 | 406,546 | 542,060 | 677,574 |
| SEQ ID NOs.: | 5 | 135,519 | 271,033 | 406,547 | 542,061 | 677,575 |
| SEQ ID NOs.: | 6 | 135,520 | 271,034 | 406,548 | 542,062 | 677,576 |
| . . . | | | | | | |
| SEQ ID NOs.: | 135,509 | 271,023 | 406,537 | 542,051 | 677,565 | 813,079 |
| SEQ ID NOs.: | 135,510 | 271,024 | 406,538 | 542,052 | 677,566 | 813,080 |
| SEQ ID NOs.: | 135,511 | 271,025 | 406,539 | 542,053 | 677,567 | 813,081 |
| SEQ ID NOs.: | 135,512 | 271,026 | 406,540 | 542,054 | 677,568 | 813,082 |
| SEQ ID NOs.: | 135,513 | 271,027 | 406,541 | 542,055 | 677,569 | 813,083 |
| SEQ ID NOs.: | 135,514 | 271,028 | 406,542 | 542,056 | 677,570 | 813,084 |

Referencing the column “complete PEgRNA” sequence, the following sequences were designed against SpCas9 (NG): SEQ ID NOs: 1-5647, 11805-16732, 22103-25050, 28363-29187, 30093-32319, 35189-36933, 38922-39997, 41226-42469, 43878-44208, 44586-46456, 48645-49697, 50844-52070, 53532-54670, 55949-57576, 59335-60913, 62672-64332, 66233-67299, 68520-69273, 70195-72171, 74385-74390, 74398-77256, 80717-81275, 81899-81962, 82033-82033, 82036-82044, 82057-82063, 82072-82075, 82080-82084, 82090-82092, 82096-82100, 82106-82110, 82117-82122, 82129-82405, 82715-84431, 86323-86687, 87092-87715, 88417-88800, 89256-89791, 90405-92752, 95411-98661, 102329-103777, 105393-107009, 108826-109348, 109932-110356, 110863-111265, 111744-112224, 112822-113854, 115060-115952, 116995-117667, 118418-118426, 118436-119980, 121698-121921, 122175-122445, 122774-124123, 125657-126486, 127395-127872, 128428-128931, 129509-130164, 130892-131784, 132784-134059.

Referencing the column “complete PEgRNA” sequence, the following sequences were designed against SpCas9 (NGG): SEQ ID NOs: 5648-11804, 16733-22102, 25051-28362, 29188-30092, 32320-35188, 36934-38921, 39998-41225, 42470-43877, 44209-44585, 46457-48644, 49698-50843, 52071-53531, 54671-55948, 57577-59334, 60914-62671, 64333-66232, 67300-68519, 69274-70194, 72172-74384, 74391-74397, 77257-80716, 81276-81898, 81963-82032, 82034-82035, 82045-82056, 82064-82071, 82076-82079, 82085-82089, 82093-82095, 82101-82105, 82111-82116, 82123-82128, 82406-82714, 84432-86322, 86688-87091, 87716-88416, 88801-89255, 89792-90404, 92753-95410, 98662-102328, 103778-105392, 107010-108825, 109349-109931, 110357-110862, 111266-111743, 112225-112821, 113855-115059, 115953-116994, 117668-118417, 118427-118435, 119981-121697, 121922-122174, 122446-122773, 124124-125656, 126487-127394, 127873-128427, 128932-129508, 130165-130891, 131785-132783, 134060-135514.
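Because the NG and NGG ranges recited above alternate and do not overlap, the Cas9 variant against which a given complete PEgRNA was designed can be looked up from the range start points. The following Python sketch is illustrative only and assumes a deliberately truncated excerpt covering SEQ ID NOs: 1-28362 (the first six ranges recited above); the names `RANGE_STARTS`, `PAMS`, `EXCERPT_END`, and `pam_for` are hypothetical.

```python
# Illustrative sketch only; covers an excerpt of the ranges recited above.
from bisect import bisect_right

# First SEQ ID NO of each range, in ascending order, with the matching PAM.
RANGE_STARTS = [1, 5648, 11805, 16733, 22103, 25051]
PAMS = ["NG", "NGG", "NG", "NGG", "NG", "NGG"]
EXCERPT_END = 28362  # last SEQ ID NO covered by this excerpt


def pam_for(seq_id: int) -> str:
    """Return "NG" or "NGG" for a complete PEgRNA SEQ ID NO within the excerpt."""
    if not 1 <= seq_id <= EXCERPT_END:
        raise ValueError("SEQ ID NO is outside the excerpted ranges")
    # bisect_right counts the range starts that are <= seq_id.
    return PAMS[bisect_right(RANGE_STARTS, seq_id) - 1]


if __name__ == "__main__":
    for example in (1, 5647, 5648, 22102, 25050):
        print(example, pam_for(example))
```

Extending the two lists with the remaining start points recited above would cover the full set of SEQ ID NOs: 1-135514 under the same assumptions.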

For SEQ ID NO: 813085-1217352: the (1) spacer, (2) extension arm, (3) primer binding site, (4) edit template, and (5) homology arm, relate as follows:

Spacer: For each given PEgRNA sequence, the corresponding spacer sequence is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “813085” for SEQ ID NO: 813085) added to the factor 67378. For example, the spacer corresponding to the PEgRNA of SEQ ID NO: 813085 is SEQ ID NO: 880463.

Extension arm: For each given PEgRNA sequence, the corresponding extension arm is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “813085” for SEQ ID NO: 813085) added to the factor 134756 (67378×2). For example the extension arm corresponding to the PEgRNA of SEQ ID NO: 813085 is SEQ ID NO: 947841.

Primer binding site: For each given PEgRNA sequence, the corresponding primer binding site is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “813085” for SEQ ID NO: 813085) added to the factor 202134 (67378×3). For example the primer binding site corresponding to the PEgRNA of SEQ ID NO: 813085 is SEQ ID NO: 1015219.

Edit template: For each given PEgRNA sequence, the corresponding edit template is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “813085” for SEQ ID NO: 813085) added to the factor 269512 (67378×4). For example, the edit template corresponding to the PEgRNA of SEQ ID NO: 813085 is SEQ ID NO: 1082597.

Homology arm: For each given PEgRNA sequence, the corresponding homology arm is identified as the numeral of the PEgRNA sequence identifier (e.g., numeral “813085” for SEQ ID NO: 813085) added to the factor 336890 (67378×5). For example, the homology arm corresponding to the PEgRNA of SEQ ID NO: 813085 is SEQ ID NO: 1149975.

The total number of PEgRNA and PEgRNA component sequences provided in the Sequence Listing (SEQ ID NOs: 1-1217352) is 1217352. There are a total of 202892 PEgRNA complete sequences (each comprising at least a spacer, gRNA core, and extension arm). There are the same number of each of (1) spacers, (2) extension arms, (3) primer binding sites, (4) edit templates, and (5) homology arms, with sets of each defined as above.
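The same numbering scheme can also be read in the reverse direction: given any SEQ ID NO between 1 and 1217352, the type of sequence and the complete PEgRNA to which it belongs follow from integer division by the relevant factor. The Python sketch below is illustrative only; the names `BLOCKS`, `KINDS`, and `classify` are hypothetical, and only the starting identifiers and the factors 135514 and 67378 are taken from the description above.

```python
# Illustrative sketch only; identifiers are hypothetical, the numbers are from the text.
BLOCKS = (
    # (first SEQ ID NO of the block, number of complete PEgRNAs in the block)
    (1, 135514),
    (813085, 67378),
)

# Each block lists all complete PEgRNAs, then all spacers, and so on.
KINDS = (
    "complete PEgRNA",
    "spacer",
    "extension arm",
    "primer binding site",
    "edit template",
    "homology arm",
)


def classify(seq_id: int) -> tuple:
    """Return (sequence type, SEQ ID NO of the parent complete PEgRNA)."""
    for first, count in BLOCKS:
        offset = seq_id - first
        if 0 <= offset < count * len(KINDS):
            kind_index, position = divmod(offset, count)
            return KINDS[kind_index], first + position
    raise ValueError("SEQ ID NO is not a PEgRNA or PEgRNA component (1-1217352)")


if __name__ == "__main__":
    total = sum(count * len(KINDS) for _, count in BLOCKS)
    print(total)                # total PEgRNA-related sequences
    print(classify(1149975))    # a homology arm in the second block
```

Under these assumptions the two blocks together account for 202892 complete PEgRNAs and 202892 × 6 = 1217352 sequences, which matches the totals stated above.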

Examples of other PEgRNA sequence sets (i.e., comprising any given PEgRNA from sequences 813085-1217352, and the corresponding spacer, extension arm, primer binding site, edit template, and homology arm) are presented in the following table:

| Group B | complete PEgRNA | spacer | extension arm | primer binding site | edit template | homology arm |
|---|---|---|---|---|---|---|
| SEQ ID NOs.: | 813,085 | 880,463 | 947,841 | 1,015,219 | 1,082,597 | 1,149,975 |
| SEQ ID NOs.: | 813,086 | 880,464 | 947,842 | 1,015,220 | 1,082,598 | 1,149,976 |
| SEQ ID NOs.: | 813,087 | 880,465 | 947,843 | 1,015,221 | 1,082,599 | 1,149,977 |
| SEQ ID NOs.: | 813,088 | 880,466 | 947,844 | 1,015,222 | 1,082,600 | 1,149,978 |
| SEQ ID NOs.: | 813,089 | 880,467 | 947,845 | 1,015,223 | 1,082,601 | 1,149,979 |
| SEQ ID NOs.: | 813,090 | 880,468 | 947,846 | 1,015,224 | 1,082,602 | 1,149,980 |
| . . . | | | | | | |
| SEQ ID NOs.: | 880,457 | 947,835 | 1,015,213 | 1,082,591 | 1,149,969 | 1,217,347 |
| SEQ ID NOs.: | 880,458 | 947,836 | 1,015,214 | 1,082,592 | 1,149,970 | 1,217,348 |
| SEQ ID NOs.: | 880,459 | 947,837 | 1,015,215 | 1,082,593 | 1,149,971 | 1,217,349 |
| SEQ ID NOs.: | 880,460 | 947,838 | 1,015,216 | 1,082,594 | 1,149,972 | 1,217,350 |
| SEQ ID NOs.: | 880,461 | 947,839 | 1,015,217 | 1,082,595 | 1,149,973 | 1,217,351 |
| SEQ ID NOs.: | 880,462 | 947,840 | 1,015,218 | 1,082,596 | 1,149,974 | 1,217,352 |

Each of the sequences of SEQ ID NOs: 813,085-1,217,352 was designed against SaCas9-KKH (SEQ ID NO: 1361596).

The Sequence Listing filed herewith is intended to and does form part of the instant specification as originally filed.

A summary of the content of the Sequence Listing (i.e., inventory) is as follows in Table XY:

| SEQ ID NO:* | Molecule Description | Classification |
|---|---|---|
| 1-135514 | PEgRNA - complete sequence | PEgRNA or component thereof |
| 135515-271028 | Spacer | ″ |
| 271029-406542 | Extension arm | ″ |
| 406543-542056 | Primer binding site | ″ |
| 542057-677570 | Edit template | ″ |
| 677571-813084 | Homology arm | ″ |
| 813085-880462 | PEgRNA - complete sequence | ″ |
| 880463-947840 | Spacer | ″ |
| 947841-1015218 | Extension arm | ″ |
| 1015219-1082596 | Primer binding site | ″ |
| 1082597-1149974 | Edit template | ″ |
| 1149975-1217352 | Homology arm | ″ |
| 1217353-1289387 | Disease variants (input) - variant + context sequence | Prioritized disease-related variant from ClinVar database**, including 200 bp upstream and downstream of the disease variant*** |
| 1289388-1361420 | Healthy alleles (output) - healthy allele + context sequence | Corresponding healthy allele, including 200 bp upstream and downstream of the healthy allele*** |
| 1361421-1361428 | Wild type canonical SpCas9 | napDNAbp/Cas9 |
| 1361429-1361442 | Wild type Cas9 orthologs | napDNAbp/Cas9 |
| 1361443-1361444 | Dead Cas9 variant | napDNAbp/Cas9 |
| 1361445-1361456, and 1361593-1361596 | Other Cas9 variants | napDNAbp/Cas9 |
| 1361457-1361462 | Small-sized Cas9 variants | napDNAbp/Cas9 |
| 1361463-1361471 | Cas9 equivalents | napDNAbp/Cas9 |
| 1361472-1361474 | Cas9 with expanded PAM | napDNAbp/Cas9 |
| 1361475-1361484 | Cas9 circular permutants | napDNAbp/Cas9 |
| 1361485-1361496 | Wild type Reverse transcriptases | RTs |
| 1361497-1361514, and 1361597-1361598 | Variant Reverse transcriptases | RTs |
| 1361515-1361519, and 1361602 | PE fusion proteins | PE Fusion |
| 1361520-1361530, 1361585, and 1361603 | Linkers | Linkers |
| 1361531-1361541, and 1361659-1361664 | Nuclear localization sequence | NLS |
| 1361542-1361547 | Flap endonucleases (FEN1/GEN1/ERCC5, etc.) | Flap endonucleases |
| 1361548-1361559 | Target sites and target sequences | PEgRNA |
| 1361560-1361565 | Transcription terminators | PEgRNA |
| 1361566-1361578 | Miscellaneous PEgRNA | PEgRNA |
| 1361579-1361581 | algorithm-based design of PEgRNA | PEgRNA |
| 1361582-1361584 | PE Complexes | PE Complexes |
| 1361586-1361592 | TAG sequences | TAG |
| 1361665-1361667 | | TREX |
| 1361668-1361670 | | EXO1 |
| 1361671-1361678 | | INTEIN |
| 1361679-1361680 | | hairpin |
| 1361681 | | uracil |
| 1361682-1361685 | | IBR |
| 1361604-1361658 | Sequences which appear only in the figures | FIG SEQ |

PAM Recognition Sites

For each of the PEgRNA, the Cas9 may have a PAM recognition site associated therewith. In some embodiments, the PAM recognition site is NGG. In some embodiments, the PAM recognition site is NG. In some embodiments, the PAM recognition site is KKH. The following table illustrates the PAM recognition site associated with the PEgRNA of the disclosure:

TABLE XX

PAM associations

| | NG | NGG | KKH |
|---|---|---|---|
| SEQ ID NO: | 1-5647 | 5648-11804 | 813085-1217352 |
| SEQ ID NO: | 11805-16732 | 16733-22102 | |
| SEQ ID NO: | 22103-25050 | 25051-28362 | |
| SEQ ID NO: | 28363-29187 | 29188-30092 | |
| SEQ ID NO: | 30093-32319 | 32320-35188 | |
| SEQ ID NO: | 35189-36933 | 36934-38921 | |
| SEQ ID NO: | 38922-39997 | 39998-41225 | |
| SEQ ID NO: | 41226-42469 | 42470-43877 | |
| SEQ ID NO: | 43878-44208 | 44209-44585 | |
| SEQ ID NO: | 44586-46456 | 46457-48644 | |
| SEQ ID NO: | 48645-49697 | 49698-50843 | |
| SEQ ID NO: | 50844-52070 | 52071-53531 | |
| SEQ ID NO: | 53532-54670 | 54671-55948 | |
| SEQ ID NO: | 55949-57576 | 57577-59334 | |
| SEQ ID NO: | 59335-60913 | 60914-62671 | |
| SEQ ID NO: | 62672-64332 | 64333-66232 | |
| SEQ ID NO: | 66233-67299 | 67300-68519 | |
| SEQ ID NO: | 68520-69273 | 69274-70194 | |
| SEQ ID NO: | 70195-72171 | 72172-74384 | |
| SEQ ID NO: | 74385-74390 | 74391-74397 | |
| SEQ ID NO: | 74398-77256 | 77257-80716 | |
| SEQ ID NO: | 80717-81275 | 81276-81898 | |
| SEQ ID NO: | 81899-81962 | 81963-82032 | |
| SEQ ID NO: | 82033-82033 | 82034-82035 | |
| SEQ ID NO: | 82036-82044 | 82045-82056 | |
| SEQ ID NO: | 82057-82063 | 82064-82071 | |
| SEQ ID NO: | 82072-82075 | 82076-82079 | |
| SEQ ID NO: | 82080-82084 | 82085-82089 | |
| SEQ ID NO: | 82090-82092 | 82093-82095 | |
| SEQ ID NO: | 82096-82100 | 82101-82105 | |
| SEQ ID NO: | 82106-82110 | 82111-82116 | |
| SEQ ID NO: | 82117-82122 | 82123-82128 | |
| SEQ ID NO: | 82129-82405 | 82406-82714 | |
| SEQ ID NO: | 82715-84431 | 84432-86322 | |
| SEQ ID NO: | 86323-86687 | 86688-87091 | |
| SEQ ID NO: | 87092-87715 | 87716-88416 | |
| SEQ ID NO: | 88417-88800 | 88801-89255 | |
| SEQ ID NO: | 89256-89791 | 89792-90404 | |
| SEQ ID NO: | 90405-92752 | 92753-95410 | |
| SEQ ID NO: | 95411-98661 | 98662-102328 | |
| SEQ ID NO: | 102329-103777 | 103778-105392 | |
| SEQ ID NO: | 105393-107009 | 107010-108825 | |
| SEQ ID NO: | 108826-109348 | 109349-109931 | |
| SEQ ID NO: | 109932-110356 | 110357-110862 | |
| SEQ ID NO: | 110863-111265 | 111266-111743 | |
| SEQ ID NO: | 111744-112224 | 112225-112821 | |
| SEQ ID NO: | 112822-113854 | 113855-115059 | |
| SEQ ID NO: | 115060-115952 | 115953-116994 | |
| SEQ ID NO: | 116995-117667 | 117668-118417 | |
| SEQ ID NO: | 118418-118426 | 118427-118435 | |
| SEQ ID NO: | 118436-119980 | 119981-121697 | |
| SEQ ID NO: | 121698-121921 | 121922-122174 | |
| SEQ ID NO: | 122175-122445 | 122446-122773 | |
| SEQ ID NO: | 122774-124123 | 124124-125656 | |
| SEQ ID NO: | 125657-126486 | 126487-127394 | |
| SEQ ID NO: | 127395-127872 | 127873-128427 | |
| SEQ ID NO: | 128428-128931 | 128932-129508 | |
| SEQ ID NO: | 129509-130164 | 130165-130891 | |
| SEQ ID NO: | 130892-131784 | 131785-132783 | |
| SEQ ID NO: | 132784-134059 | 134060-135514 | |

EXAMPLES
Example 1. Prime Editing (PE) for Installing Precise Nucleotide Changes in the Genome

The objective is to develop a transformative genome editing technology for precise and general installation of single nucleotide changes in mammalian genomes. This technology would allow investigators to study the effects of single nucleotide variations in virtually any mammalian gene, and potentially enable therapeutic interventions for correcting pathogenic point mutations in human patients.

Adoption of the clustered regularly interspaced short palindromic repeat (CRISPR) system for genome editing has revolutionized the life sciences^1-3. Although gene disruption using CRISPR is now routine, the precise installation of single nucleotide edits remains a major challenge, despite being necessary for studying or correcting a large number of disease-causative mutations. Homology directed repair (HDR) is capable of achieving such edits, but suffers from low efficiency (often <5%), a requirement for donor DNA repair templates, and deleterious effects of double-stranded DNA break (DSB) formation. Recently, the Liu laboratory developed base editing, which achieves efficient single nucleotide editing without DSBs. Base editors (BEs) combine the CRISPR system with base-modifying deaminase enzymes to convert target C•G or A•T base pairs to A•T or G•C, respectively^4-6. Although already widely used by researchers worldwide (>5,000 Liu lab BE constructs distributed by Addgene), current BEs enable only four of the twelve possible base pair conversions and are unable to correct small insertions or deletions. Moreover, the targeting scope of base editing is limited by the editing of non-target C or A bases adjacent to the target base (“bystander editing”) and by the requirement that a PAM sequence exist 15±2 bp from the target base. Overcoming these limitations would therefore greatly broaden the basic research and therapeutic applications of genome editing.

Here, it is proposed to develop a new precision editing approach that offers many of the benefits of base editing—namely, avoidance of double strand breaks and donor DNA repair templates—while overcoming its major limitations. To achieve this ambitious goal, it is aimed to directly install edited DNA strands at target genomic sites using target-primed reverse transcription (TPRT). In the design discussed herein, CRISPR guide RNA (gRNA) will be engineered to carry a template encoding mutagenic DNA strand synthesis, to be executed by an associated reverse transcriptase (RT) enzyme. The CRISPR nuclease (Cas9)-nicked target site DNA will serve as the primer for reverse transcription, allowing for direct incorporation of any desired nucleotide edit.

Experiment 1

Establish guide RNA-templated reverse transcription of mutagenic DNA strands. Prior studies have shown that, following DNA cleavage but prior to complex dissociation, Cas9 releases the non-target DNA strand to expose a free 3′ terminus. It is hypothesized that this DNA strand is accessible to extension by polymerase enzymes, and that gRNAs can be engineered through extension of their 5′ or 3′ terminus to serve as templates for DNA synthesis. In preliminary in vitro studies, it was established that nicked DNA strands within Cas9:gRNA-bound complexes can indeed prime reverse transcription using the bound gRNA as a template (RT enzyme in trans). Next, different gRNA linkers, primer binding sites, and edit templates will be explored to determine optimal design rules in vitro. Then, different RT enzymes, acting in trans or as fusions to Cas9, will be evaluated in vitro. Finally, engineered gRNA designs will be identified that retain efficient binding and cutting activity in cells. Successful demonstration of this aim will provide a foundation for carrying out mutagenic strand synthesis in cells.

Experiment 2

Establish prime editing in human cells. Based on DNA processing and repair mechanisms, it is hypothesized that mutagenic DNA strands (single stranded flaps) can be used to direct specific and efficient editing of target nucleotides. In encouraging preliminary studies, feasibility for this strategy was established by demonstrating editing with model plasmid substrates containing mutagenic flaps. Concurrent with Experiment 1, repair outcomes will be further evaluated by systematically varying the mutagenic flap's length, sequence composition, target nucleotide identity, and 3′ terminus. Small 1 to 3 nucleotide insertions and deletions will also be tested. In parallel, and building from Experiment 1, Cas9-RT architectures will be evaluated, including fusion proteins and non-covalent recruitment strategies. Cas9-RT architectures and extended gRNAs will be assayed for cellular editing at multiple target sites in the human genome, and will then be optimized for high efficiency. If successful, this aim would immediately establish TPRT genome editing for basic science applications.

Experiment 3

Achieve site-specific editing of pathogenic mutations in cultured human cells. The potential generality of this technology could enable editing of transversion mutations and indels that are not currently correctable by BEs. Guided by the results of Experiment 1 and Experiment 2, pathogenic transversion mutations will be targeted in cultured human cells, including the sickle cell disease founder mutation in beta globin (requires an A•T to T•A transversion to correct) and the most prevalent Wilson's disease mutation in ATP7B (requires a G•C to T•A transversion to correct). The correction of small insertion and deletion mutations will also be examined, including the 3-nucleotide ΔF508 deletion in CFTR that causes cystic fibrosis. If successful, this would lay the foundation for developing powerful therapeutic approaches that address these important human diseases.

Approach

The objective is to develop a genome editing strategy that directly installs point mutations at targeted genomic sites. In the technology development phase, efforts will focus on protein and RNA engineering to incorporate TPRT functionality into the CRISPR/Cas system. In vitro assays will be used to carefully probe the function of each step of TPRT, building from the ground up (Experiment 1). The second focus area will evaluate editing outcomes in mammalian cells using a combination of model substrates and engineered CRISPR/Cas systems (Experiment 2). Finally, the application phase will use the technology to correct mutations that have been intractable to genome editing by other methods (Experiment 3).

The general editing design is shown in FIGS. 1A-1B. Cas9 nickases contain inactivating mutations to the HNH nuclease domain (Spy Cas9 H840A or N863A), restricting DNA cleavage to the PAM containing strand (non-target strand). Guide RNAs (gRNAs) are engineered to contain a template for reverse transcription. Shown is a 5′ extension of the gRNA, but 3′ extensions can also be implemented. The Cas9 nickase is fused to a reverse transcriptase (RT) enzyme, either through the C-terminus or N-terminus. The gRNA:Cas9-RT complex targets the DNA region of interest and forms an R loop after displacing the non-target strand. Cas9 nicks the non-target DNA strand. Release of the nicked strand exposes a free 3′-OH terminus that is competent to prime reverse transcription using the extended gRNA as a template. This DNA synthesis reaction is carried out by the fused RT enzyme. The gRNA template encodes a DNA sequence that is homologous to the original DNA duplex, with the exception of the nucleotide that is targeted for editing. The product of reverse transcription is a single stranded DNA flap that encodes the desired edit. This flap, which contains a free 3′ terminus, can equilibrate with the adjacent DNA strand, resulting in a 5′ flap species. The latter species is hypothesized to serve as an efficient substrate for FEN1 (flap endonuclease 1), an enzyme that naturally excises 5′ flaps from Okazaki fragments during lagging strand DNA synthesis, and removes 5′ flaps following strand displacement synthesis that occurs during long-patch base excision repair. Ligation of the nicked DNA produces a mismatched base pair. This intermediate could either undergo reversion to the original base pair or conversion to the desired edited base pair via mismatch repair (MMR) processes. Alternatively, semiconservative DNA replication could give rise to one copy each of the reversion and edit.

1. Establish Guide RNA-Templated Reverse Transcription of Mutagenic DNA Strands.
Background and Rationale

In the proposed genome editing strategy, the Cas9-nicked nontarget DNA strand (PAM-containing strand that forms the R-loop) acts as the primer for DNA synthesis. It is hypothesized that this is possible based on several pieces of biochemical and structural data. Nuclease protection experiments³², crystallographic studies³³, and base editing windows^4,24 have demonstrated a large degree of flexibility and disorder for the nontarget strand nucleotides −20 through −10 within the so-called R-loop of the Cas9-bound complex (numbering indicates distance 5′ from first PAM nucleotide). Moreover, the PAM-distal portion of the cleaved nontarget strand can be displaced from tightly bound ternary complexes when complementary ssDNA is added in trans²⁰. These studies support that the nontarget strand is highly flexible, is accessible to enzymes, and that after nicking, the 3′ terminus of the PAM-distal fragment is released prior to Cas9 dissociation. Furthermore, it is hypothesized that gRNAs can be extended to template DNA synthesis. Prior studies have shown that gRNAs for SpCas9, SaCas9, and LbCas12a (formerly Cpf1) tolerate gRNA extensions with RNA aptamers³⁴, ligand-inducible self-cleaving ribozymes³⁵, and long non-coding RNAs³⁶. This literature establishes precedent for the two major features that will be exploited: an accessible nicked nontarget strand and gRNAs that tolerate extension. In assessing this strategy, several CRISPR-Cas systems will be evaluated in conjunction with 5′ and 3′ extended gRNA designs using a combination of in vitro and cellular assays (FIGS. 2A-2C).

Designs for engineered gRNAs for prime editing are shown in FIGS. 3A-3B. DNA synthesis proceeds 5′ to 3′, with nucleotides added to the 3′ OH of the growing DNA strand, and thus copies the RNA template in the 3′ to 5′ direction. The design for the 5′ extension contains a linker region, a primer binding site where the nicked DNA strand anneals, and a template for DNA synthesis by reverse transcription. The 3′ extended gRNA contains a primer binding site and a reverse transcription template. In some cases, the 3′ RNA hairpin of the gRNA core is modified to match the DNA target sequence, as in vitro experiments showed that reverse transcription extends ˜3 nucleotides into the gRNA core for the 3′ extended gRNA constructs (modification of the hairpin sequence appears well tolerated so long as compensatory changes are made that maintain the hairpin RNA structure).
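The layout of a 3′ extension can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not a design procedure taken from the experiments described here: the function name, its parameters, and the toy sequence are hypothetical; the nick is assumed to fall 3 bases upstream of the PAM on the PAM-containing strand; and the primer binding site is assumed to sit at the 3′ terminus of the extension, with the reverse transcription template immediately 5′ of it, so that the template is copied in the 3′ to 5′ direction.

```python
# Hypothetical helper for illustration only; names and sequence are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def design_3prime_extension(pam_strand, nick_index, pbs_len,
                            template_len, edit_offset, new_base):
    """Sketch a 3' gRNA extension (RT template followed by primer binding site).

    pam_strand   : PAM-containing (nontarget) strand, written 5'->3'
    nick_index   : number of bases lying 5' of the nick on that strand
    pbs_len      : bases of the nicked strand that anneal to the extension
    template_len : bases to be newly synthesized 3' of the nick
    edit_offset  : 0-based position of the edit within the new DNA
    new_base     : base to install at that position
    """
    if not 0 < pbs_len <= nick_index:
        raise ValueError("primer binding site must fit upstream of the nick")
    if nick_index + template_len > len(pam_strand):
        raise ValueError("template runs past the end of the sequence")
    if not 0 <= edit_offset < template_len:
        raise ValueError("edit must fall inside the template")

    # 3' end of the nicked strand, which acts as the primer.
    primer = pam_strand[nick_index - pbs_len:nick_index]
    # DNA to be synthesized: original downstream sequence carrying the edit.
    flap = list(pam_strand[nick_index:nick_index + template_len])
    flap[edit_offset] = new_base
    flap_dna = "".join(flap)
    # The extension is the RNA reverse complement of primer + new DNA,
    # which places the template 5' of the primer binding site.
    extension_rna = reverse_complement(primer + flap_dna).replace("T", "U")
    return extension_rna, flap_dna


# Toy 28-nt strand: 20-nt protospacer, TGG PAM at positions 20-22.
toy_strand = "GATTACAGATTACAGATTACTGGCCATG"
extension, flap = design_3prime_extension(
    toy_strand, nick_index=17, pbs_len=8,
    template_len=7, edit_offset=1, new_base="T")
print(extension)  # RT template + primer binding site, 5'->3'
print(flap)       # DNA flap encoding the A-to-T change
```

In this toy case the last 8 nucleotides of the extension pair with the nicked strand and the first 7 encode the edited flap; real designs would also need to account for the gRNA core and for any read-through into it.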

Preliminary Results

Cas9 nicked DNA primes reverse transcription of gRNA templates. To evaluate the accessibility of the nicked nontarget DNA strand, in vitro biochemical assays were performed using the Cas9 nuclease from S. pyogenes (SpCas9) and Cy5 fluorescently labeled duplex DNA substrates (51 base pairs). First, a series of gRNAs containing 5′ extensions with varying edit template lengths were prepared by in vitro transcription (overall design shown in FIG. 2B). Electrophoretic mobility shift assays (EMSA) with nuclease dead Cas9 (dCas9) established that 5′ extended gRNAs maintain target binding affinity (data not shown). Next, TPRT activity was tested on pre-nicked Cy5-labeled duplex DNA substrates using dCas9, 5′-extended gRNAs, and Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase (Superscript III). After 1 hour of incubation at 37° C., products were evaluated by denaturing polyacrylamide gel electrophoresis (PAGE) and imaged using Cy5 fluorescence (FIG. 4A). Each 5′-extended gRNA variant led to significant product formation, with the observed DNA product sizes being consistent with the length of the extension template (FIG. 4B). Importantly, in the absence of dCas9, pre-nicked substrates were extended to the full 51-bp length of the DNA substrate, strongly suggesting that the complementary DNA strand, and not the gRNA, was used as the template for DNA synthesis when dCas9 was not present (FIG. 4C). Of note, the system was designed such that the newly synthesized DNA strand mirrors the product that would be required for target site editing (a homologous strand with a single nucleotide change). This result establishes that Cas9:gRNA binding exposes the nicked nontarget strand's 3′ end, and that the nontarget strand is accessible to reverse transcription.

Next, non-nicked dsDNA substrates were evaluated using the Cas9(H840A) mutant, which nicks the nontarget DNA strand. First, to test Cas9(H840A) nickase activity with 5′-extended gRNAs, in vitro cleavage assays were performed as previously described³⁷. Although nicking was impaired by comparison to the standard gRNA, appreciable cleavage products were formed (FIG. 4D). Importantly, RT products were also observed when TPRT reactions were carried out with 5′-extended gRNAs and Cas9(H840A), albeit at lower yields that are likely explained by the decreased nicking activity (FIG. 4D). This result establishes that 5′-extended gRNA:Cas9(H840A) complexes can nick DNA and template reverse transcription.

Finally, 3′ gRNA extensions were evaluated for Cas9(H840A) nicking and TPRT. In contrast to 5′-extended gRNAs, DNA cleavage by 3′-extended gRNAs was not detectably impaired relative to the standard gRNA. Significantly, 3′-extended gRNA templates also supported efficient reverse transcription with both pre-nicked and intact duplex DNA substrates when M-MLV RT was supplied in trans (FIG. 4E). Surprisingly, only a single product was observed for 3′-extended templates, indicating that reverse transcription terminates at a specific location along the gRNA scaffold. Homopolymer tailing of the product with terminal transferase followed by Klenow extension and Sanger sequencing revealed that the full gRNA edit template was copied in addition to the terminal 3 nucleotides of the gRNA core. In the future, the flap terminus will be reprogrammed by modifying the terminal gRNA sequence^38,39. This result demonstrates that 3′-extended gRNAs can serve as efficient nuclease targeting guides and can template reverse transcription.

Cas9-TPRT uses nicked DNA and gRNA in cis. Dual color experiments were used to determine if the RT reaction preferentially occurs with the gRNA in cis (bound in the same complex) (see FIG. 8). Two separate experiments were conducted for 5′-extended and 3′-extended gRNAs. For a given experiment, ternary complexes of dCas9, gRNA, and DNA substrate were formed in separate tubes. In one tube, the gRNA encodes a long RT product and the DNA substrate is labeled with Cy3 (red); in the other, the gRNA encodes a short RT product and the DNA substrate is labeled with Cy5 (blue). After short incubations, the complexes were mixed and then treated with RT enzyme and dNTPs. Products were separated by urea-denaturing PAGE and visualized by fluorescence in the Cy3 and Cy5 channels. Reaction products were found to preferentially form using the gRNA template that was pre-complexed with the DNA substrate, indicating that the RT reaction likely can occur in cis. This result supports that a single Cas9:gRNA complex can target a DNA site and template reverse transcription of a mutagenic DNA strand.

(viii) Testing TPRT with Other Cas Systems

Similar experiments to those presented in the previous sections will be carried out using other Cas systems, including Cas9 from S. aureus and Cas12a from Lachnospiraceae bacterium (see FIGS. 2A-2C). If TPRT can also be demonstrated for these Cas variants, the potential editing scope and likelihood of overall success in cells would increase.

(ix) Testing TPRT with RT-Cas9 Fusion Proteins

A series of commercially available or purifiable RT enzymes will first be evaluated in trans for TPRT activity. In addition to the already tested RT from M-MLV, the RT from Avian Myeloblastosis Virus (AMV), the Geobacillus stearothermophilus Group II Intron (GsI-IIC)^41,42, and the Eubacterium rectale group II intron (Eu.re.I2)^43,44 will be evaluated. Significantly, the latter two RTs perform TPRT in their natural biological contexts. Where relevant, RNase inactivating mutations and other potentially beneficial RT enzyme modifications will be tested. Once functional RTs are identified when supplied in trans, each will be evaluated as a fusion protein to Cas9 variants. Both N-terminus and C-terminus fusion orientations will be tested, along with various linker lengths and architectures. Kinetic time course experiments will be used to determine whether TPRT can occur using the RT enzyme in cis. If an RT-Cas9 fusion architecture can be constructed that allows for efficient TPRT chemistry, this will greatly increase the likelihood of functional editing in the context of a cell.

(x) Cas9 Targeting with Engineered gRNAs in Cells

Candidate engineered gRNAs developed in the previous sub-aims will be evaluated in human cell culture experiments (HEK293) to confirm Cas9 targeting efficiency. Using established indel formation assays employing wild type SpCas9⁴⁵, engineered gRNAs will be compared side-by-side with standard gRNAs across 5 or more sites in the human genome. Genome editing efficiency will be characterized by amplicon sequencing in multiplex using the Illumina MiSeq platform housed in the laboratory. It is anticipated that results from this and the preceding sections will generate insights that inform subsequent iterations of the design-build-test cycle, where gRNAs can be optimized for both templating reverse transcription and efficient Cas9 targeting in cells.

Results of in vitro validations are shown in FIGS. 5-7. In vitro experiments demonstrated that the nicked non-target DNA strand is flexible and available for priming DNA synthesis, and that the gRNA extension can serve as a template for reverse transcription (see FIG. 5). This set of experiments used 5′-extended gRNAs (designed as shown in FIGS. 3A-3B) with varying length edit templates (listed to the left). Fluorescently labeled (Cy5) DNA targets were used as substrates, and were pre-nicked in this set of experiments. The Cas9 used in these experiments is catalytically dead Cas9 (dCas9), so cannot cut DNA but can still bind efficiently. Superscript III, a commercial RT derived from the Moloney Murine Leukemia Virus (M-MLV), was supplied in trans. First, dCas9:gRNA complexes were formed from purified components. Then, the fluorescently labeled DNA substrate was added along with dNTPs and the RT enzyme. After 1 hour of incubation at 37° C., the reaction products were analyzed by denaturing urea-polyacrylamide gel electrophoresis (PAGE). The gel image shows extension of the original DNA strand to lengths that are consistent with the length of the reverse transcription template. Of note, reactions carried out in the absence of dCas9 produced DNA products of length 51 nucleotides, regardless of the gRNA used. This product corresponds to use of the complementary DNA strand as the template for DNA synthesis and not the RNA (data not shown). Thus, Cas9 binding is required for directing DNA synthesis to the RNA template. The next set of in vitro experiments (see FIG. 6) closely parallels those shown in FIG. 5, except that the DNA substrate is not pre-nicked, and a Cas9 nickase (SpyCas9 H840A mutant) is used. As shown in the gel, the nickase efficiently cleaves the DNA strand when the standard gRNA is used (gRNA_0, lane 3). Multiple cleavage products are observed, consistent with prior biochemical studies of SpyCas9. The 5′ extension impairs nicking activity (lanes 4-8), but some RT product is still observed. FIG. 7 shows that 3′ extensions support DNA synthesis and do not significantly affect Cas9 nickase activity. Pre-nicked substrates (black arrow) are near-quantitatively converted to RT products when either dCas9 or Cas9 nickase is used (lanes 4 and 5). Greater than 50% conversion to the RT product (red arrow) is observed with full substrates (lane 3). To determine the length and sequence of the RT product, the product band was excised from the gel, extracted, and sequenced. This revealed that RT extended 3 nucleotides into the gRNA core's 3′ terminal hairpin. Subsequent experiments demonstrated that these three nucleotides could be changed to match target DNA sequences, so long as complementary changes were made that maintain the hairpin RNA structure.

(xi) Potential Difficulties and Alternatives

(1) RT does not function as a fusion: molecular crowding and/or unfavorable geometries could encumber polymerase extension by Cas9-fused RT enzymes. First, linker optimization can be tested. Circularly permutated variants of Cas9, which could re-orient the spatial relationship between the DNA primer, gRNA, and RT enzyme, will be evaluated. Non-covalent RT recruitment strategies as detailed in Experiment 2 can be tested. (2) Decreased Cas targeting efficiency by extended gRNA variants: this is most likely to be an issue for 5′-extended gRNAs. Based on structural data²⁴, Cas9 mutants can be designed and screened to identify variants with greater tolerance to gRNA extension. In addition, gRNA libraries could be screened in cells for linkers that improve targeting activity.

Significance

These preliminary results establish that Cas9 nickases and extended gRNAs can initiate target-primed reverse transcription on bound DNA targets using a reverse transcriptase supplied in trans. Importantly, Cas9 binding was found to be critically important for product formation. Though perhaps not an absolute requirement for genome editing in cells, further development of the system that incorporates RT enzyme function in cis would significantly increase the likelihood of success in cell-based applications. Achievement of the remaining aspects of this Aim would provide a molecular foundation for carrying out precision genome editing in the context of the human genome.

2. Establish Prime Editing in Human Cells.
Background and Rationale

In the proposed strategy, an engineered RT-Cas9:gRNA complex will introduce mutagenic 3′ DNA flaps at genomic target sites. It is hypothesized that mutagenic 3′ flaps containing a single mismatch will be incorporated by the DNA repair machinery through energetically accessible equilibration with adjacent 5′ flaps, which would be preferentially removed (FIG. 1B). The DNA replication and repair machineries encounter 5′ ssDNA flaps when processing Okazaki fragments⁴⁶ and during long-patch base excision repair (LP-BER)⁴⁷. 5′ flaps are the preferred substrates for the widely expressed flap endonuclease FEN1, which is recruited to DNA repair sites by the homotrimeric sliding clamp complex PCNA⁴⁸. PCNA also serves as a scaffold for simultaneous recruitment of other repair factors including the DNA ligase Lig1⁴⁹. Acting as a ‘toolbelt’, PCNA accelerates serial flap cleavage and ligation, which is essential to processing the millions of Okazaki fragments generated during every cell division^50,51. Based on resemblance to these natural DNA intermediates, it is hypothesized that mutagenic strands would be incorporated through equilibration with 5′ flaps, followed by coordinated 5′ flap excision and ligation. Mismatch repair (MMR) should then occur on either strand with equal probability, leading to editing or reversion (FIG. 1B). Alternatively, DNA replication could occur first and lead directly to the incorporation of the edit in the newly synthesized daughter strand. While the highest expected yield from this process is 50%, multiple substrate editing attempts could drive the reaction toward completion due to the irreversibility of editing repair.
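The argument that repeated attempts can exceed the 50% single-pass ceiling can be made concrete with a simple model. Assume, purely for illustration, that each attempt converts a fraction p of the sites that are still unedited, that reverted sites remain available for another attempt, and that edited sites are never reverted. The unedited fraction after n attempts is then (1 − p)^n, so the edited fraction is 1 − (1 − p)^n. The function name and the assumption of independent, equally efficient attempts are hypothetical and are not drawn from the experiments described here.

```python
def cumulative_edited_fraction(p_per_attempt: float, attempts: int) -> float:
    """Edited fraction after repeated attempts, assuming edits are irreversible.

    Illustrative model only: every attempt acts independently on the sites
    that remain unedited and converts a fraction p_per_attempt of them.
    """
    if not 0.0 <= p_per_attempt <= 1.0:
        raise ValueError("p_per_attempt must lie between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_per_attempt) ** attempts


# With the 50% single-pass ceiling discussed in the text:
for n in (1, 2, 3, 5):
    print(n, cumulative_edited_fraction(0.5, n))
```

Under these assumptions, three attempts at 50% per attempt would leave one site in eight unedited. Actual yields would depend on how often a site is re-engaged and on competing repair outcomes, which this sketch does not model.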

Preliminary Result

DNA flaps induce site-specific mutagenesis in plasmid model substrates in yeast and HEK cells. To test the proposed editing strategy, studies were initiated with model plasmid substrates containing mutagenic 3′ flaps that resemble the product of TPRT. A dual fluorescent protein reporter was created that encodes a stop codon between GFP and mCherry. Mutagenic flaps encode a correction to the stop codon (FIG. 9A), enabling mCherry synthesis. Thus, mutagenesis efficiency can be quantified by GFP:mCherry ratios. Plasmid substrates were prepared in vitro and introduced into yeast (S. cerevisiae) or human cells (HEK293). High frequency mutagenesis was observed in both systems (FIG. 9B), and isolated yeast colonies contained either the reverted base, the mutated base, or a mixture of both products (FIG. 9C). Detection of the latter suggests that plasmid replication occurred prior to MMR in these cases, and further suggests that flap excision and ligation precede MMR. This result establishes the feasibility of DNA editing using 3′ mutagenic strands.

(i) Systematic Studies with Model Flap Substrates

Based on the preliminary results described above, a broader spectrum of flap substrates will be evaluated in HEK cells to infer principles of efficient editing. 3′ ssDNA flaps will be systematically varied to determine the influence of mismatch pairings, the location of the mutagenic nucleotide along the flap, and the identity of the terminal nucleotide (FIG. 9D). Single nucleotide insertions and deletions will also be tested. Amplicon sequencing will be used to analyze editing precision. These results will help inform the design of gRNA reverse transcription templates.

In vitro TPRT on plasmid substrates leads to efficient editing outcomes. TPRT reactions developed in Experiment 1 were used to induce mutagenesis within a plasmid substrate. The reaction was carried out on circular DNA plasmid substrates (see FIG. 10). This rules out the possibility of DNA strand dissociation as the mechanism for RT extension in the previous in vitro experiments. It also allowed for the testing of DNA repair of flap substrates in cells. A dual-fluorescent reporter plasmid was constructed for yeast (S. cerevisiae) expression. This plasmid encodes GFP (green fluorescent protein) and mCherry (red fluorescent protein) with an intervening stop codon (TGA). Expression of this construct in yeast produces only GFP. The plasmid was used as a substrate for in vitro TRT [Cas9(H840A) nickase, engineered gRNA, MLV RT enzyme, dNTPs]. The gRNA extension encodes a mutation to the stop codon. The flap strand is used for repair of the stop codon and it is anticipated to produce a plasmid that expresses both GFP and mCherry as a fusion protein. Yeast dual-FP plasmid transformants are shown in FIG. 10. Transforming the parent plasmid or an in vitro Cas9(H840A) nicked plasmid results in only green GFP expressing colonies. TRT reaction with 5′-extended or 3′-extended gRNAs produces a mix of green and yellow colonies. The latter express both GFP and mCherry. More yellow colonies are observed with the 3′-extended gRNA. A positive control that contains no stop codon is shown as well.

This result establishes that long double stranded substrates can undergo TPRT, and that TPRT products induce editing in eukaryotic cells.

Another experiment similar to the foregoing prime editing experiment was carried out, but instead of installing a point mutation in the stop codon, prime editing installs a single nucleotide insertion (left) or deletion (right) that repairs a frameshift mutation and allows for synthesis of downstream mCherry (see FIG. 11). Both experiments used 3′ extended gRNAs. Individual colonies from the TRT transformations were selected and analyzed by Sanger sequencing (see FIG. 12). Green colonies contained plasmids with the original DNA sequence, while yellow colonies contained the precise mutation designed by the prime editing gRNA. No other point mutations or indels were observed.

(ii) Establish Prime Editing in HEK Cells Using RT-Cas9 Architectures

The optimized constructs from previous aims will be adapted for mammalian expression and editing at targeted sites in the human genome. Multiple RT enzymes and fusion architectures will be tested, in addition to adjacent targeting with secondary gRNAs (truncated to prevent nicking). Non-covalent RT recruitment will also be evaluated using the Sun-Tag system⁵² and MS2 aptamer system⁵³. Indel formation assays will be used to evaluate targeting efficiency with standard gRNAs and RT-Cas9 fusions (as above). Then, for each genomic site, extended gRNAs and RT-Cas9 pairs will be assayed for single nucleotide editing. Editing outcomes will be evaluated with MiSeq.

Initial experiments in HEK cells were performed using Cas9-RT fusions. Editing by components expressed within cells requires a Cas9(H840A) nickase, a reverse transcriptase (expressed as a fusion or supplied in trans), and an engineered gRNA with a 3′ extension (see FIG. 14). Preliminary studies indicated that the length of the primer binding site within the gRNA extension was important for increasing the efficiency of editing in human cells (see FIG. 15).

(iii) Optimize Prime Editing Parameters in HEK Cells

After identifying Cas9-RT architectures that can perform prime editing in cells, the components and design will be optimized to achieve high efficiency editing. The location and nucleotide identity of the encoded point mutation, and the total length of the newly synthesized DNA strand, will be varied to evaluate editing scope and potential limitations. Short insertion and deletion mutations will also be evaluated. Protein expression constructs will be codon optimized. If successful, this would establish efficient prime editing in mammalian cells.

Preliminary Result. Additional gRNAs were designed to bring the RT enzyme to a higher local concentration at the editing locus, in the event that intramolecular reverse transcription by the fused RT enzyme were not possible. These auxiliary guides are truncated at the 5′ end (14-15 nt spacer), which has previously been shown to prevent Cas9 cutting but retain binding (see FIG. 16). The HEK3 locus was chosen to explore this strategy.

(iv) Potential Difficulties and Alternatives

(1) gRNA degradation in cells: if extended gRNA termini are truncated in cells, stabilizing secondary structures could be installed, or synthetic gRNAs with stabilizing modifications could be tested. (2) No observed editing in human cells: additional strategies will be explored, including secondary targeting of RT-Cas9 fusions to adjacent genomic sites⁵⁴. In addition, potential directed evolution strategies in E. coli or S. cerevisiae could be explored.

Significance

If prime editing could be established in experimental cell lines, this would have an immediate impact for basic biomedical research by enabling the rapid generation and characterization of a large number of point mutations in human genes. The generality of the method, and its orthogonal editing window with respect to base editors, would provide an approach to installing many currently inaccessible mutations. Moreover, if prime editing could be optimized for high efficiency and product purity, its potential applicability to correcting disease mutations in other human cell types would be significant.

3. Achieve Site-Specific Editing of Pathogenic Mutations in Cultured Human Cells.
Background and Rationale.

A large number of pathogenic mutations cannot be corrected by current base editors due to PAM restrictions, or a need for transversion or indel mutation correction. With prime editing, all transitions and transversions are theoretically possible, as may be small insertions and deletions. Moreover, in relation to the PAM, the prime editing window (anticipated −3 to +4) is distinct from that of base editors (−18 to −12) (FIG. 13). Mendelian conditions not currently correctable by base editors include: (1) the sickle cell disease Glu6Val founder mutation in hemoglobin beta (requires A•T to T•A transversion); (2) the most common Wilson's disease variant His1069Gln in ATP7B (requires G•C to T•A transversion); and (3) the ΔPhe508 mutation in CFTR that causes cystic fibrosis (requires 3-nucleotide insertion). Each of these targets contains an appropriately positioned PAM for SpCas9 targeting and prime editing.

Preliminary Results.
(i) T to A Editing in HEK3 Cells is not Achievable by Current Base Editing but is Achievable by TPRT Editing (See FIGS. 17A-17C).

FIG. 17A shows a graph displaying the % T to A conversion at the target nucleotide after transfection of components in human embryonic kidney (HEK) cells. This data presents results using an N-terminal fusion of wild type MLV reverse transcriptase to Cas9(H840A) nickase (32-amino acid linker). Editing efficiency was improved dramatically when the length of the primer binding site is extended from 7 nucleotides to 11 or 12 nucleotides. Additionally, the auxiliary guide A, which is positioned just upstream of the editing locus (see FIG. 16), significantly improves editing activity, particularly for shorter length primer binding sites. Editing efficiency was quantified by amplicon sequencing using the Illumina MiSeq platform. FIG. 17B also shows % T to A conversion at the target nucleotide after transfection of components in human embryonic kidney (HEK) cells, but this data presents results using a C-terminal fusion of the RT enzyme. Here, the auxiliary guide A does not have as much of an effect, and editing efficiency is overall higher. FIG. 17C shows data presenting results using an N-terminal fusion of wild type MLV reverse transcriptase to Cas9(H840A) nickase similar to that used in FIG. 17A; however the linker between the MLV RT and Cas9 is 60 amino acids long instead of 32 amino acids.

(ii) T to A Editing at HEK3 Site by TPRT Editing Displays High Purity.

FIG. 18 shows the output of sequencing analysis by high-throughput amplicon sequencing. The output displays the most abundant genotypes of edited cells. Of note, no major indel products are obtained, and the desired point mutation (T to A) is cleanly installed without bystander edits. The first sequence shows the reference genotype. The top two products are the starting genotype containing an endogenous polymorphism (G or A). The bottom two products represent the correctly edited genotypes.

(iii) MLV RT Mutants Improve Editing.

Mutant reverse transcriptases, described in Baranauskas, et al. (doi:10.1093/protein/gzs034), were tested as C-terminal fusions to the Cas9(H840A) nickase for target nucleotide editing in human embryonic kidney (HEK) cells. Cas9-RT editor plasmid was co-transfected with a plasmid encoding a 3′-prime editor guide RNA that templates reverse transcription. Editing efficiency at the target nucleotide (blue bars) is shown alongside indel rates (orange bars) in FIG. 19. WT refers to the wild type MLV RT enzyme. The mutant enzymes (M1 through M4) contain the mutations listed to the right. Editing rates were quantified by high throughput sequencing of genomic DNA amplicons.

(iv) Complementary Strand Nicking with a Second gRNA Improves Editing.

This experiment evaluates editing efficiency of the target nucleotide when a single strand nick is introduced in the complementary DNA strand in proximity to the target nucleotide, with the hypothesis being that this would direct mismatch repair to preferentially remove the original nucleotide and convert the base pair to the desired edit. The Cas9(H840A)-RT editing construct was co-transfected with two guide RNA encoding plasmids, one of which templates the reverse transcription reaction, while the other targets the complementary DNA strand for nicking. Nicking at various distances from the target nucleotide was tested (orange triangles) (see FIG. 20). Editing efficiency at the target base pair (blue bars) is shown alongside the indel formation rate (orange bars). The “none” example does not contain a complementary strand nicking guide RNA. Editing rates were quantified by high throughput sequencing of genomic DNA amplicons.

FIG. 21 shows processed high throughput sequencing data showing the desired T to A transversion mutation and general absence of other major genome editing byproducts.
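The amplicon-based quantification referred to in these experiments can be illustrated with a deliberately simplified Python sketch. It assumes the reads have already been trimmed to the amplicon, treats any read whose length differs from the reference as an indel, and inspects only one target position. The function name, the toy reads, and these simplifications are hypothetical; a real analysis would align reads to the reference and filter on quality before counting.

```python
def summarize_amplicon_reads(reads, reference, target_index, edited_base):
    """Return the percentage of reads in each outcome class.

    Simplified illustration: reads are assumed to span exactly the
    reference amplicon, so a length difference is scored as an indel.
    """
    if not reads:
        raise ValueError("no reads supplied")
    counts = {"edited": 0, "unedited": 0, "indel": 0, "other": 0}
    for read in reads:
        if len(read) != len(reference):
            counts["indel"] += 1          # insertion or deletion
        elif read[target_index] == edited_base:
            counts["edited"] += 1         # desired base at target
        elif read[target_index] == reference[target_index]:
            counts["unedited"] += 1       # original base retained
        else:
            counts["other"] += 1          # some other substitution
    total = len(reads)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy data: target T (index 3) converted to A in one of four reads.
reference = "ACGTACGT"
reads = ["ACGAACGT", "ACGTACGT", "ACGTACGT", "ACGACGT"]
print(summarize_amplicon_reads(reads, reference,
                               target_index=3, edited_base="A"))
```

Reporting the edited and indel percentages side by side mirrors the way editing efficiency and indel rates are compared for each condition.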

Scope. The potential scope for the new editing technology is shown in FIG. 13 and is compared to deaminase-mediated base editor technologies. Previously developed base editors target a region ˜15±2 bp upstream of the PAM. By converting target C or A nucleotides to T or G, respectively, previously developed base editors enable all transition mutations (A:T to G:C conversions). However, previously developed base editors are unable to install transversion mutations (A to T, A to C, G to T, G to C, T to A, T to G, C to A, C to G). Moreover, if there are multiple target nucleotides in the editing window, additional undesired edits can result.
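The distinction between transitions and transversions drawn above follows from the chemical class of each base: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and a substitution between the two classes is a transversion. The short Python sketch below, with a hypothetical function name, enumerates the twelve possible single-base substitutions and recovers the eight transversions listed in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


bases = "ACGT"
transversions = [
    f"{r} to {a}"
    for r in bases for a in bases
    if r != a and classify_substitution(r, a) == "transversion"
]
print(transversions)
```

By this classification, the T to A conversion examined at the HEK3 site and the A•T to T•A correction discussed for sickle cell disease are both transversions.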

The new prime editing technology could theoretically install any nucleotide and base pair conversion, and potentially small insertion and deletion edits as well. With respect to the PAM, prime editing windows start at the site of DNA nicking (3 bases upstream of the PAM) and end at an as-of-yet undetermined position downstream of the PAM. Of note, this editing window is distinct from that of deaminase base editors. Because the TPRT system performs editing using DNA polymerase enzymes, it potentially has all of their benefits including generality, precision, and fidelity.

(v) Correct Pathogenic Mutations in Patient-Derived Cell Lines.

Cell lines harboring the relevant mutations (sickle cell disease: CD34+ hematopoietic stem cells; Wilson's disease: cultured fibroblasts; cystic fibrosis: cultured bronchial epithelia) will be obtained from ATCC, the Coriell Biobank, or collaborating Harvard/Broad affiliate laboratories. Editing efficiency will be evaluated by high throughput sequencing, and the efficacy of the corrected genotype will be tested using phenotypic assays (hemoglobin HPLC, ATP7B immunostaining, and CFTR membrane potential assays).

(vi) Characterize Off-Target Editing Activity.

Potential off-target editing will be screened with established methods such as GUIDE-seq⁵⁵ and CIRCLE-seq⁵⁶ using target gRNAs paired with wild type Cas9. If potential off-targets are identified, these loci will be probed in TPRT edited cells to identify true off-target editing events.

(vii) Potential Difficulties and Alternatives.

(1) Low editing efficiency: prime editors may require optimization for each target. In this case, gRNA libraries can be tested to identify the highest functioning variants for specific applications. RT-Cas fusion expression and nuclear localization can be optimized. Liposomal RNP delivery could be used to limit off-target editing.

(viii) Upcoming Experiments.

Optimization of gRNA designs can be achieved by further exploration of the primer binding site length and extension of the edit template. Testing scope and generality will include different nucleotide conversions, small insertions and deletions, as well as different editing positions with respect to the PAM, and multiple sites in the human genome. Optimization of the RT component will include exploring mutations in MLV RT to enhance activity (RNase H inactivation, increased primer-template binding affinity, adjustments to processivity), and new RT enzymes (group II intron RTs, other retroviral RTs).

Significance.

Myriad genetic disorders result from single nucleotide changes in individual genes. Developing the genome editing technology described here, and applying it in disease-relevant cell types, would establish a foundation for translation to the clinic. For some diseases, such as Sickle Cell Disease, a single point mutation represents the dominant genotype throughout the population. However, for many other genetic disorders, a large heterogeneity of different point mutations within a single gene is observed throughout the patient population, each of which gives rise to a similar disease phenotype. Therefore, as a general genome editing method that could in theory target a large number of such mutations, this technology could provide enormous potential benefit to many of these patients and their families. If proof of principle for these applications could be established in cells, it would establish the foundation for studies in animal models of disease.

Advantages

Precision: the desired edit is encoded directly in nucleic acid sequence. Generality: in theory, it could be possible to make any base pair conversion, including transversion edits, as well as small insertions or deletions. There is a distinct editing window from that of base editors with respect to the Cas9 protospacer adjacent motif (PAM) sequence. This method achieves many of the editing capabilities of homology-directed repair (HDR), but without the major limitations of HDR (which is inefficient in most cell types and is usually accompanied by an excess of undesired byproducts such as indels). Also, it does not make double-stranded DNA breaks (DSBs), so there are few indels, translocations, large deletions, p53 activation events, etc.

Example 2. A Process for Designing Therapeutic Prime Editor Guide RNAs (PEgRNA) and Potential Targets and PEgRNA for Correcting Pathogenic Human Gene Variants with Prime Editing
Introduction

Prime editing is a transformative tool for genome editing. Among its many possible applications, prime editing represents a novel strategy for correcting pathogenic mutations, with potential therapeutic benefit. Clinvar is a publicly accessible database of reported human mutations that may be associated with disease.

This Example demonstrates the design of a computer program/algorithm that determines PEgRNA sequences using a simple procedure, and its application to 436,042 unique Clinvar mutations. This Example first describes the PEgRNA structure and functional elements, then describes the design procedure, including definitions and a description of the computer program/algorithm that uses a simple procedure to design potentially curative PEgRNA, and finally provides a discussion on variations of PEgRNA design.

In addition, this Example provides the Sequence Listing filed herewith comprising exemplary PEgRNA sequences for correcting potentially pathogenic mutations from Clinvar. A description of the Sequence Listing is provided above.

PEgRNA Structure

FIG. 27 and FIG. 28 provide schematics of two possible configurations of PEgRNA that may be designed in accordance with this Example.

The PEgRNA structural elements can be described, as follows:

Spacer: The portion of the gRNA, typically 20-nt in length, that is complementary to the target DNA sequence in the genome. The spacer is complementary to the protospacer in the DNA. The spacer can be modified in various non-limiting ways. For example, the spacer may be modified to add a 5′-G to form a 21-nt spacer when the 5′-most nucleotide is not a G. This modification improves transcription from a U6 promoter.
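The 5′-G modification described above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `add_5prime_g` and the example spacers are hypothetical and are not taken from the program described in this Example.

```python
def add_5prime_g(spacer: str) -> str:
    """Prepend a 5'-G when the spacer does not already begin with G.

    Hypothetical helper illustrating the U6-promoter convention in the text:
    a 20-nt spacer that starts with A, C, or T becomes a 21-nt spacer.
    """
    spacer = spacer.upper()
    return spacer if spacer.startswith("G") else "G" + spacer


# A spacer starting with A gains a 5'-G; one starting with G is unchanged.
print(add_5prime_g("ACCCAGACTGAGCACGTGAT"))
print(add_5prime_g("GGCCCAGACTGAGCACGTGA"))
```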

gRNA core or “gRNA backbone”: The portion of the gRNA that is generally positioned on the 3′ side of the spacer and includes several hairpins that interact with the Cas9 protein. Exemplary gRNA cores can include, for example:

gRNA backbone sequence 1:

(SEQ ID NO: 1361579)

GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGT

CCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (86 nt).

gRNA backbone sequence 2:

(SEQ ID NO: 1361580)

GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAA

CTTGAAAAAGTGGCACCGAGTCGGTGC (76 nt).

The PEgRNA contemplated herein may employ other gRNA core sequences which maintain the same or similar capacity of the gRNA to bind to Cas9.

Extension arm: The PEgRNA contemplated herein comprise an extension arm, which comprises various functional elements including the primer binding site, the edit template, and the homology arm. The extension arm can be positioned at the 3′ end of the gRNA core. In other embodiments, the extension arm can be positioned at the 5′ end of the spacer. The extension arm is almost complementary to the genomic sequence context, with the desired edit representing the non-complementary section. The sequence of the extension arm can be varied to include the following variations: variations in the length of the primer binding site (see below); variations in the length of the encoded regions.

Primer binding site: The primer binding site is at the 3′ end of the PEgRNA extension arm and is complementary to ssDNA flap displaced by the spacer following the Cas-mediated nick. The ssDNA:RNA hybrid serves as a primer for reverse transcription of the remainder of the PEgRNA extension in the 5′ to 3′ polymerization direction. For SpCas9, the RuvC nick is positioned between protospacer positions −4 and −3, where −1 is the 3′-most nucleotide and −20 is the 5′-most nucleotide of a 20-nt spacer. A primer binding site of length X is designed as the X nucleotides starting 3′ of spacer position −4 up to and including spacer position −4. Common variations include different lengths, ranging from between about 8 nt to about 20 nt (e.g., between 8 and 17 nts).

Edit template: The edit template is adjacent the 5′ end of the primer binding site and encodes the desired edited sequence (e.g., single nucleotide change, deletion, or insertion).

Homology arm: The homology arm is positioned on the 5′ end of the edit template and is complementary to the native genomic context.

Reverse transcription template or the region encoding the ssDNA flap that becomes incorporated into the endogenous DNA: The PEgRNA extension excluding the primer binding site (since the primer binding site does not become a template for ssDNA polymerization by reverse transcriptase). The reverse transcriptase template contains the edit template and the homology arm. Following unbinding of the spacer from the protospacer, the reverse transcriptase primer ssDNA sequence rebinds to its native genomic context, leaving the reverse transcriptase template as a 3′ ssDNA flap. Common variations include: flap length can vary, e.g., from 7 nt to 34 nt (but a wide range of flap lengths are contemplated, as described herein); the position of the edit within the flap can vary. Successful editing has been observed with homology arm length as short as 2 nt. Edits are typically, but do not have to be, positioned as close to the nick as possible.
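As a rough illustration of how the elements above fit together, the following Python sketch builds an extension arm (reverse transcription template followed by primer binding site, read 5′ to 3′) as the reverse complement of the edited strand around the nick. The function names, the use of the DNA alphabet, the default lengths, and the toy sequence are assumptions made for this example; the sketch does not reproduce the program described in this Example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_extension_arm(edited_strand: str, nick_index: int,
                        pbs_len: int = 13, rt_template_len: int = 13) -> dict:
    """Sketch of extension-arm construction (hypothetical helper).

    edited_strand   : desired output allele on the nicked (PAM-containing)
                      strand, written 5'->3'.
    nick_index      : number of bases lying 5' of the nick on that strand.
    pbs_len         : primer binding site length.
    rt_template_len : length of the flap-encoding region (edit template plus
                      homology arm).
    """
    if nick_index < pbs_len:
        raise ValueError("not enough sequence 5' of the nick for the PBS")
    if nick_index + rt_template_len > len(edited_strand):
        raise ValueError("not enough sequence 3' of the nick for the template")
    primer = edited_strand[nick_index - pbs_len:nick_index]        # anneals to the PBS
    flap = edited_strand[nick_index:nick_index + rt_template_len]  # newly synthesized DNA
    pbs = reverse_complement(primer)
    rt_template = reverse_complement(flap)
    # Read 5'->3', the extension arm is the RT template followed by the PBS.
    return {"pbs": pbs, "rt_template": rt_template,
            "extension": rt_template + pbs}


# Toy edited strand: 17 nt 5' of the nick, then the edited flap sequence.
toy = "GGCCCAGACTGAGCACG" + "CTTTGATGGCAGAGGAAAGG"
arm = build_extension_arm(toy, nick_index=17)
print(arm["extension"])
```

Keeping the primer binding site and the reverse transcription template as separate return values makes it straightforward to vary their lengths independently, as contemplated above.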

Transcription terminator sequence: A sequence at the 3′ end of the PEgRNA that terminates transcription during the production of the PEgRNA, e.g., when expressed from a U6 promoter or other promoter. An exemplary terminator sequence is TTTTTTGTTTT (SEQ ID NO: 1361581).

Additional variations on PEgRNA design are feasible and contemplated herein. For example, it is conceivable that PEgRNA may be designed with PEgRNA extension arms containing non-complementary sequences 5′ of the homology arm or 3′ of the primer binding site or both, for instance, to form a kissing loop interaction, or to act as a protecting hairpin for RNA stability. These sequences are represented in FIGS. 27 and 28 at sequence elements e1 and e2 and referred to as “optional 3′ or 5′ end modifier regions.” In addition, it is conceivable that PEgRNA may be designed using strategies and methods that prioritize between multiple design candidates. Examples include avoiding PEgRNA extensions where the 5′-most nucleotide is a cytosine due to interrupting native nucleotide-protein interactions in the sgRNA:Cas9 complex, or using RNA secondary structure prediction tools to select a preferred PBS length and flap length given a spacer and desired edit.

PEgRNA Design Algorithm

Given an input allele, an output allele, and a CRISPR system for prime editing (importantly, the PAM motif and the relative position of the prime editor's nick), the algorithm designs a PEgRNA or a list of PEgRNA that are capable of editing the input allele to the output allele. The sequence difference between the output and input alleles is referred to as the desired edit. An embodiment of this may be an input allele representing a pathogenic mutation and an output allele representing the corrected wild-type sequence.

The classes of edits that can be induced by a single PEgRNA include single nucleotide substitutions, insertions from 1 nt up to approximately 40 nt, deletions from 1 nt up to approximately 30 nt, and a combination or mixture of all of the above. Prime editing is known to support edits of these types primarily from spacer position −3 (immediately 3′ of the nick) to spacer position +27 (30 nt 3′ of the nick in the input allele). Each of these specified numbers represents a parameter of the algorithm that may change over time as knowledge about prime editing increases. As an aside, edits at protospacer position −4 using the SpCas9 system have been observed with prime editing which are likely caused by occasional RuvC cleavage between protospacer positions −5 and −4.

The algorithm enumerates all spacers compatible with the PAM motif of the selected CRISPR system in the input allele on both strands, then filters out protospacers whose associated nick positions are incompatible with prime editing to the output allele, either because the nick is on the 3′ side of the desired edit on that strand, or because the distance between the nick and the desired edit is too large (greater than a user-defined threshold, for example 30 nt).
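The enumeration and filtering step can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the source program: the edit is represented by a single 0-based position on the top strand, the PAM is NGG, the nick lies 3 nt 5′ of the PAM, and the function and parameter names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def candidate_protospacers(allele: str, edit_pos: int, max_dist: int = 30,
                           spacer_len: int = 20, pam: str = "[ACGT]GG",
                           nick_offset: int = 3) -> list:
    """Enumerate protospacers on both strands and keep those whose nick is
    5' of the edit and within max_dist of it (hypothetical sketch).

    Returns tuples (strand, protospacer, nick_index, distance), where
    nick_index counts the bases 5' of the nick on that strand and
    distance 0 means the edit is the first base 3' of the nick.
    """
    allele = allele.upper()
    n = len(allele)
    # Lookahead so that overlapping protospacer+PAM sites are all reported.
    site = re.compile(f"(?=([ACGT]{{{spacer_len}}}{pam}))")
    hits = []
    for strand, seq in (("+", allele), ("-", reverse_complement(allele))):
        # Coordinate of the edited base on the strand being scanned.
        edit_on_strand = edit_pos if strand == "+" else n - 1 - edit_pos
        for match in site.finditer(seq):
            start = match.start()
            nick = start + spacer_len - nick_offset
            dist = edit_on_strand - nick
            if 0 <= dist <= max_dist:  # drop nicks 3' of the edit or too far away
                hits.append((strand, seq[start:start + spacer_len], nick, dist))
    return hits


toy_allele = "TTTTT" + "GGCCCAGACTGAGCACGTGA" + "TGG" + "TTTTT"
print(candidate_protospacers(toy_allele, edit_pos=22))
```

Insertions and deletions would need a start and end coordinate rather than a single position; the single-position form is used here only to keep the filter logic visible.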

For each spacer, the algorithm constructs a spacer and edit template sequence using the sequence of the input allele, the position of the nick, and the sequence of the desired edit. The algorithm then selects one or more values for primer binding site length, which can vary from 8 nt to 17 nt; homology arm length, which can vary from 2 nt to 33 nt; and gRNA backbone sequence, which can be GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 1361579) or GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 1361580) or another gRNA backbone sequence that retains wild-type RNA secondary structure. For each parameter combination selected, the algorithm constructs a homology arm, primer binding site sequence, and gRNA backbone using the given parameters. The algorithm then forms a PEgRNA sequence by concatenating the protospacer, gRNA backbone, PEgRNA extension, and terminator sequence.
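The parameter selection and concatenation described above might look like the following sketch. The backbone and terminator strings are the sequences given in this Example (SEQ ID NOs: 1361580 and 1361581); the function names, the default choice of backbone, and the example spacer and extension are assumptions made for illustration.

```python
from itertools import product

# gRNA backbone sequence 2 (SEQ ID NO: 1361580) and the exemplary
# terminator (SEQ ID NO: 1361581), as given in this Example.
BACKBONE_2 = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAA"
              "CTTGAAAAAGTGGCACCGAGTCGGTGC")
TERMINATOR = "TTTTTTGTTTT"


def parameter_grid(pbs_lengths=range(8, 18), homology_lengths=range(2, 34)):
    """All (PBS length, homology arm length) pairs in the stated ranges:
    PBS 8 to 17 nt and homology arm 2 to 33 nt (hypothetical helper)."""
    return list(product(pbs_lengths, homology_lengths))


def assemble_pegrna(spacer: str, extension: str,
                    backbone: str = BACKBONE_2,
                    terminator: str = TERMINATOR) -> str:
    """Concatenate spacer, gRNA backbone, extension arm, and terminator."""
    return spacer + backbone + extension + terminator


# Illustrative inputs only: a 20-nt spacer and a 26-nt extension arm.
pegrna = assemble_pegrna("GGCCCAGACTGAGCACGTGA", "TCTGCCATCAAAGCGTGCTCAGTCTG")
print(len(parameter_grid()), len(pegrna))
```

In practice only one or a few parameter combinations would be selected from the grid, as the text notes.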

After basic filtering to germline mutations annotated as pathogenic or likely pathogenic, 72,020 unique Clinvar mutations were identified that are compatible with prime editing with Cas9-NG, and 63,496 unique Clinvar mutations were identified that are compatible with prime editing with SpCas9 with an NGG PAM. Note that additional mutations would be correctable if one were to use a prime editor containing a different Cas9 variant with different PAM compatibility.

These mutations were classified into four classes of clinical significance using minor allele frequency, number of submitters, whether or not submitters conflicted in their interpretations, and whether or not the mutation was reviewed by an expert panel.

Among the 63,496 SpCas9-compatible mutations:

- 4,627 mutations are identified at the most significant level (four)
- 13,943 mutations are identified at significance levels three or four
- 44,385 mutations are identified at significance levels two, three, or four.

The provided Sequence Listing enumerates a single PEgRNA per unique mutation, selected as the PEgRNA with the shortest distance between the nick and the edit. PEgRNA were designed with a homology arm length of 13 and a primer binding site length of 13. Spacers with nick sites farther than 20 nt from the edit were disregarded. The gRNA backbone sequence used was GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 1361579). The terminator sequence used was TTTTTTGTTTT (SEQ ID NO: 1361581).
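The selection rule stated above (discard spacers whose nick is more than 20 nt from the edit, then keep the candidate with the shortest nick-to-edit distance) can be sketched as follows. The `Candidate` type and the function name are hypothetical, and tie-breaking between equally distant candidates is not specified in the text; this sketch simply keeps the first one encountered.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical record for one design candidate."""
    spacer: str
    nick_to_edit: int  # distance in nt between the nick and the edit


def pick_pegrna(candidates: Sequence[Candidate],
                max_dist: int = 20) -> Optional[Candidate]:
    """Return the eligible candidate with the shortest nick-to-edit distance,
    or None when every candidate exceeds max_dist."""
    eligible = [c for c in candidates if c.nick_to_edit <= max_dist]
    return min(eligible, key=lambda c: c.nick_to_edit, default=None)


options = [Candidate("A" * 20, 25), Candidate("C" * 20, 7), Candidate("G" * 20, 12)]
print(pick_pegrna(options))
```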

The Sequence Listing includes a description of each of the PEgRNA determined in accordance with Example 2. In total, Example 2 determines the sequence of 135514 exemplary PEgRNA complete sequences. Each of these sequences is presented/included in the Sequence Listing and identified as SEQ ID NOs: 1-135514. In addition, and as described elsewhere, the PEgRNA are each comprised of a spacer (SEQ ID NOs: 135515-271028) and an extension arm (SEQ ID NOs: 271029-406542). In addition, each PEgRNA comprises a gRNA core, for example, as defined by SEQ ID NOs: 1361579-1361580. The extension arms of SEQ ID NOs: 271029-406542 are further each comprised of a primer binding site (SEQ ID NOs.: 406543-542056), an edit template (SEQ ID NOs.: 542057-677570), and a homology arm (SEQ ID NOs.: 677571-813084). The PEgRNA optionally may comprise a 5′ end modifier region (SEQ ID NOs: EE-EE) and/or a 3′ end modifier region (SEQ ID NOs: FF-FF). The PEgRNA may also comprise a reverse transcription termination signal (e.g., SEQ ID NOs: 1361560-1361565) at the 3′ end of the PEgRNA.

For each PEgRNA sequence (e.g., SEQ ID NO: 1) in the Sequence Listing, the following sequences in the Sequence Listing constitute a set of corresponding subsequences: the (1) spacer, (2) extension arm, (3) primer binding site, (4) edit template, and (5) homology arm, as follows:

The total number of sequences provided in the Sequence Listing is 813084. There are a total of 135514 PEgRNA complete sequences (each comprising at least a spacer, gRNA core, and extension arm). There are the same number of (1) spacers, (2) extension arms, (3) primer binding sites, (4) edit templates, and (5) homology arms, with sets of each defined as above.
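Because each class of subsequence occupies a contiguous block of 135514 SEQ ID NOs, the identifiers belonging to a given PEgRNA follow from simple offsets. The short Python sketch below illustrates that arithmetic; the function and constant names are hypothetical.

```python
N_PEGRNA = 135_514  # number of complete PEgRNA sequences in the Sequence Listing

# Order of the subsequence blocks that follow the complete PEgRNA block.
COMPONENTS = ("spacer", "extension arm", "primer binding site",
              "edit template", "homology arm")


def subsequence_seq_ids(pegrna_seq_id: int) -> dict:
    """Map a complete-PEgRNA SEQ ID NO (1..135514) to the SEQ ID NOs of its
    corresponding subsequences, using one block-sized offset per component."""
    if not 1 <= pegrna_seq_id <= N_PEGRNA:
        raise ValueError("complete PEgRNA SEQ ID NOs run from 1 to 135514")
    return {name: pegrna_seq_id + (i + 1) * N_PEGRNA
            for i, name in enumerate(COMPONENTS)}


print(subsequence_seq_ids(1))
print(subsequence_seq_ids(135_514))
```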

Examples of other PEgRNA sequence sets (i.e., comprising any given PEgRNA and the corresponding spacer, extension arm, primer binding site, edit template, and homology arm) are presented in the following table:

               complete PEgRNA   spacer    extension arm   primer binding site   edit template   homology arm

SEQ ID NOs.:   1                 135,515   271,029         406,543               542,057         677,571
SEQ ID NOs.:   2                 135,516   271,030         406,544               542,058         677,572
SEQ ID NOs.:   3                 135,517   271,031         406,545               542,059         677,573
SEQ ID NOs.:   4                 135,518   271,032         406,546               542,060         677,574
SEQ ID NOs.:   5                 135,519   271,033         406,547               542,061         677,575
SEQ ID NOs.:   6                 135,520   271,034         406,548               542,062         677,576
. . .
SEQ ID NOs.:   135,509           271,023   406,537         542,051               677,565         813,079
SEQ ID NOs.:   135,510           271,024   406,538         542,052               677,566         813,080
SEQ ID NOs.:   135,511           271,025   406,539         542,053               677,567         813,081
SEQ ID NOs.:   135,512           271,026   406,540         542,054               677,568         813,082
SEQ ID NOs.:   135,513           271,027   406,541         542,055               677,569         813,083
SEQ ID NOs.:   135,514           271,028   406,542         542,056               677,570         813,084

Variations on the provided PEgRNA designs include all variations previously discussed, including varying the gRNA backbone sequence, primer binding site length, flap length, etc.

REFERENCES (FOR EXAMPLE 2)

Landrum, M. J., Lee, J. M., Riley, G. R., Jang, W., Rubinstein, W. S., Church, D. M., & Maglott, D. R. (2014). ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic acids research, 42(Database issue), D980-D985. doi:10.1093/nar/gkt1113

Stenson, P. D., Mort, M., Ball, E. V., Evans, K., Hayden, M., Heywood, S., . . . Cooper, D. N. (2017). The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Human genetics, 136(6), 665-677. doi:10.1007/s00439-017-1779-6

Nishimasu, H., Ran, F. A., Hsu, P. D., Konermann, S., Shehata, S. I., Dohmae, N., . . . Nureki, O. (2014). Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell, 156(5), 935-949. doi:10.1016/j.cell.2014.02.001

Lorenz, R., Bernhart, S. H., Höner Zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA Package 2.0. Algorithms for molecular biology: AMB, 6, 26. doi:10.1186/1748-7188-6-26

Example 3. Design and Engineering of PEgRNA for Prime Editing
Summary

Described herein is a series of PEgRNA designs and strategies that can improve prime editing (PE) efficiency.

Background

Prime Editing (PE) is a genome editing technology that can replace, insert, or remove defined DNA sequences within a targeted genetic locus using information encoded within a prime editor guide RNA (PEgRNA). Prime editors (PEs) consist of a sequence-programmable DNA binding protein with nuclease activity (Cas9) fused to a reverse transcriptase (RT) enzyme. PEs form complexes with PEgRNA, which contain the information for targeting specific DNA loci within their spacer sequences, as well as information specifying the desired edit in an engineered extension built into a standard sgRNA scaffold. PE:PEgRNA complexes bind and nick the programmed target DNA locus, allowing hybridization of the nicked DNA strand to the engineered primer binding sequence (PBS) of the PEgRNA. The reverse transcriptase domain then copies the edit-encoding information within the RT template portion of the PEgRNA, using the nicked genomic DNA as a primer for DNA polymerization. Subsequent DNA repair processes incorporate the newly synthesized edited DNA strand into the genomic locus. While the versatility of prime editing holds great promise as a research tool and potential therapeutic, several limitations in efficiency and scope exist due to the multi-step process required for editing. For example, unfavorable RNA structures that form within the PEgRNA can inhibit the copying of DNA edits from the PEgRNA to the genomic locus. One potential way to improve PE technology is through redesign and engineering of the critical PEgRNA component. Improvements to the design of these PEgRNA are likely to be necessary to improve PE efficiency, as well as to enable installation of longer inserted sequences into the genome.

Description

Described herein is a series of PEgRNA designs that are envisioned to improve the efficacy of PE. These designs take advantage of a number of previously published approaches for improving sgRNA efficacy and/or stability, as well as utilize a number of novel strategies. These improvements can belong to one or more of a number of different categories: i) designs to enable efficient expression of functional PEgRNA from non-polymerase III (pol III) promoters, which would enable the expression of longer PEgRNA without burdensome sequence requirements; ii) improvements to the core, Cas9-binding PEgRNA scaffold, which could improve efficacy; iii) modifications to the PEgRNA to improve RT processivity, enabling the insertion of longer sequences at targeted genomic loci; iv) addition of RNA motifs to the 5′ or 3′ termini of the PEgRNA that improve PEgRNA stability, enhance RT processivity, prevent misfolding of the PEgRNA, or recruit additional factors important for genome editing. Described herein are a number of potential such PEgRNA designs in each category. Several of these designs have been previously described for improving sgRNA activity with Cas9 and are indicated as such. Also described herein is a platform for the evolution of PEgRNA for given sequence targets that would enable the polishing of the PEgRNA scaffold and enhance PE activity (v). Notably, these designs could also be readily applied to improve PEgRNA recognized by any Cas9 or evolved variant thereof.

(i) Expression of PEgRNA from Non-Pol III Promoters

sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expression of short RNAs that are retained within the nucleus. However, pol III is not highly processive and is unable to express RNAs longer than a few hundred nucleotides in length at the levels required for efficient genome editing¹⁸³. Additionally, pol III can stall or terminate at stretches of U's, potentially limiting the sequence diversity that could be inserted using a PEgRNA. Other promoters that recruit polymerase II (such as pCMV) or polymerase I (such as the U1 snRNA promoter) have been examined for their ability to express longer sgRNAs¹⁸³. However, these promoters are typically partially transcribed, which would result in extra sequence 5′ of the spacer in the expressed PEgRNA, which has been shown to result in markedly reduced Cas9:sgRNA activity in a site-dependent manner. Additionally, while pol III-transcribed PEgRNA can simply terminate in a run of 6-7 U's, PEgRNA transcribed from pol II or pol I would require a different termination signal. Often such signals also result in polyadenylation, which would result in undesired transport of the PEgRNA from the nucleus. Similarly, RNAs expressed from pol II promoters such as pCMV are typically 5′-capped, also resulting in their nuclear export.
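Because pol III can stall or terminate at stretches of U's, a designer might screen a candidate PEgRNA extension for such runs before choosing a pol III promoter. The Python sketch below flags runs of T (or U) at or above a chosen length. The function name and the default threshold of 4 are hypothetical choices for illustration; the text does not specify the run length at which stalling becomes a concern.

```python
import re


def poly_t_runs(seq: str, min_run: int = 4) -> list:
    """Return (start_index, run_length) for every run of T/U that is at
    least min_run long. The threshold is a user-chosen assumption."""
    dna = seq.upper().replace("U", "T")
    return [(m.start(), len(m.group()))
            for m in re.finditer(f"T{{{min_run},}}", dna)]


# Flags the six-T stretch beginning at index 3.
print(poly_t_runs("ACGTTTTTTGA", min_run=6))
```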

Previously, Rinn and coworkers screened a variety of expression platforms for the production of long noncoding RNA (lncRNA)-tagged sgRNAs¹⁸³. These platforms include RNAs expressed from pCMV and that terminate in the ENE element from the MALAT1 ncRNA from humans¹⁸⁴, the PAN ENE element from KSHV¹⁸⁵, or the 3′ box from U1 snRNA¹⁸⁶. Notably, the MALAT1 ncRNA and PAN ENEs form triple helices protecting the polyA-tail^{184, 187}. It is anticipated that, in addition to enabling expression of RNAs, these constructs could also enhance RNA stability (see section iv). Using the promoter from the U1 snRNA to enable expression of these longer sgRNAs¹⁸³ was also explored. It is anticipated that these expression systems will also enable the expression of longer PEgRNA. In addition, a series of methods have been designed for the cleavage of the portion of the pol II promoter that would be transcribed as part of the PEgRNA, adding either a self-cleaving ribozyme such as the hammerhead¹⁸⁸, pistol¹⁸⁹, hatchet¹⁸⁹, hairpin¹⁹⁰, VS¹⁹¹, twister¹⁹², or twister sister¹⁹² ribozymes, or other self-cleaving elements to process the transcribed guide, or a hairpin that is recognized by Csy4¹⁹³ and also leads to processing of the guide. Also, it is hypothesized that incorporation of multiple ENE motifs could lead to improved PEgRNA expression and stability, as previously demonstrated for the KSHV PAN RNA and element¹⁸⁵. It is also anticipated that circularizing the PEgRNA in the form of a circular intronic RNA (ciRNA) could also lead to enhanced RNA expression and stability, as well as nuclear localization¹⁹⁴.

Sequences:

PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and MALAT1 ENE

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTTAGGGTCATGAAGGTTTTTCTTTTCCTGAGAAAA

CAACACGTATTGTTTTCTCAGGTTTTGCTTTTTGGCCTTTTTCTAGCTTAAAAAAAAAAAAAGC

AAAAGATGCTGGTGGTTGGCACTCCTGGTTTCCAGGACGGGGTTCAAATCCCTGCGGCGTCTTT

GCTTTGACT (SEQ ID NO: 1361567)

PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and PAN ENE

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGA

CACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTT

TTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAAC

ATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAA (SEQ ID NO: 1361568)

PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3xPAN ENE

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGA

CACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTT

TTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAAC

ATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAAACACACTGTTTTGGCTGG

GTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAGGTTTTTATCCCAGTGTAT

ATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTAGAGCTCAAATTAAACTAATACC

ATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAA

AAAATCTCTCTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGG

CAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTA

GAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTT

AATCCATAAAAAAAAAAAAAAAAAAA (SEQ ID NO: 1361569)

PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3′ box

TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTT

ACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAA

TAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTA

TTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATT

GACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTC

CTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTA

CATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTC

AATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCC

CATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGT

GAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTC

CTCTGCCATCAAAGCGTGCTCAGTCTGTTTGTTTCAAAAGTAGACTGTACGCTAAGGGTCATAT

CTTTTTTTGTTTGGTTTGTGTCTTGGTTGGCGTCTTAAA (SEQ ID NO: 1361570)

PEgRNA expression platform consisting of pU1, Csy4 hairpin, the PEgRNA, and 3′ box

CTAAGGACCAGCTTCTTTGGGAGAGAACAGACGCAGGGGGGGGAGGGAAAAAGGGAGAGGCAGA

CGTCACTTCCCCTTGGCGGCTCTGGCAGCAGATTGGTCGGTTGAGTGGCAGAAAGGCAGACGGG

GACTGGGCAAGGCACTGTCGGTGACATCACGGACAGGGCGACTTCTATGTAGATGAGGCAGCGC

AGAGGCTGCTGCTTCGCCACTTGCTGCTTCACCACGAAGGAGTTCCCGTGCCCTGGGAGCGGGT

TCAGGACCGCTGATCGGAAGTGAGAATCCCAGCTGTGTGTCAGGGCTGGAAAGGGCTCGGGAGT

GCGCGGGGCAAGTGACCGTGTGTGTAAAGAGTGAGGCGTATGAGGCTGTGTCGGGGCAGAGGCC

CAAGATCTCAGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGA

AATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTC

TGCCATCAAAGCGTGCTCAGTCTGTTTCAGCAAGTTCAGAGAAATCTGAACTTGCTGGATTTTT

GGAGCAGGGAGATGGAATAGGAGCTTGCTCCGTCCACTCCACGCATCGACCTGGTATTGCAGTA

CCTCCAGGAACGGTGCACCCACTTTCTGGAGTTTCAAAAGTAGACTGTACGCTAAGGGTCATAT

CTTTTTTTGTTTGGTTTGTGTCTTGGTTGGCGTCTTAAA (SEQ ID NO: 1361571)

(ii) Improvements to the PEgRNA Scaffold

Sequences:

PEgRNA containing a 6 nt extension to P1

(SEQ ID NO: 1361572)

GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGCTCATGAAAATGAGCT

AGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACC

GAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTTTTT

PEgRNA containing a T-A to G-C mutation within P1

(SEQ ID NO: 1361573)

GGCCCAGACTGAGCACGTGAGTTTGAGAGCTAGAAATAGCAAGTTTAAA

TAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTC

TGCCATCAAAGCGTGCTCAGTCTGTTTTTTT

(iii) Improvement of RT Processivity Via Modifications to the Template Region of the PEgRNA

As the size of the insertion templated by the PEgRNA increases, it is more likely to be degraded by endonucleases, undergo spontaneous hydrolysis, or fold into secondary structures unable to be reverse-transcribed by the RT or that disrupt folding of the PEgRNA scaffold and subsequent Cas9-RT binding. Accordingly, it is likely that modification to the template of the PEgRNA might be necessary to effect large insertions, such as the insertion of whole genes. Some strategies to do so include the incorporation of modified nucleotides within a synthetic or semi-synthetic PEgRNA that render the RNA more resistant to degradation or hydrolysis or less likely to adopt inhibitory secondary structures¹⁹⁶. Such modifications could include 8-aza-7-deazaguanosine, which would reduce RNA secondary structure in G-rich sequences; locked-nucleic acids (LNA) that reduce degradation and enhance certain kinds of RNA secondary structure; and 2′-O-methyl, 2′-fluoro, or 2′-O-methoxyethoxy modifications that enhance RNA stability. Such modifications could also be included elsewhere in the PEgRNA to enhance stability and activity. Alternatively or additionally, the template of the PEgRNA could be designed such that it both encodes for a desired protein product and is also more likely to adopt simple secondary structures that are able to be unfolded by the RT. Such simple structures would act as a thermodynamic sink, making it less likely that more complicated structures that would prevent reverse transcription would occur. Finally, one could also imagine splitting the template into two separate PEgRNA. In such a design, a PE would be used to initiate transcription and also recruit a separate template RNA to the targeted site via an RNA-binding protein fused to Cas9 or an RNA recognition element on the PEgRNA itself such as the MS2 aptamer. The RT could either directly bind to this separate template RNA, or initiate reverse transcription on the original PEgRNA before swapping to the second template. Such an approach could enable long insertions by both preventing misfolding of the PEgRNA upon addition of the long template and also by not requiring dissociation of Cas9 from the genome for long insertions to occur, which could possibly be inhibiting PE-based long insertions.

(iv) Installation of Additional RNA Motifs at the 5′ or 3′ Termini

PEgRNA designs could also be improved via the installation of additional motifs at either terminus of the RNA. Several such motifs, such as the PAN ENE from KSHV and the ENE from MALAT1, were discussed earlier in part (i)^184,185 as possible means to terminate expression of longer PEgRNAs from non-pol III promoters. These elements form RNA triple helices that engulf the polyA tail, resulting in their being retained within the nucleus^184,187. However, by forming complex structures at the 3′ terminus of the PEgRNA that occlude the terminal nucleotide, these structures would also likely help prevent exonuclease-mediated degradation of PEgRNAs. Other structural elements inserted at the 3′ terminus could also enhance RNA stability, albeit without enabling termination from non-pol III promoters. Such motifs could include hairpins or RNA quadruplexes that would occlude the 3′ terminus¹⁹⁷, or self-cleaving ribozymes such as HDV that would result in the formation of a 2′,3′-cyclic phosphate at the 3′ terminus and also potentially render the PEgRNA less likely to be degraded by exonucleases¹⁹⁸. Inducing the PEgRNA to cyclize via incomplete splicing, to form a ciRNA, could also increase PEgRNA stability and result in the PEgRNA being retained within the nucleus¹⁹⁴.

Additional RNA motifs could also improve RT processivity or enhance PEgRNA activity by enhancing RT binding to the DNA-RNA duplex. Addition of the native sequence bound by the RT in its cognate retroviral genome could enhance RT activity¹⁹⁹. This could include the native primer binding site (PBS), polypurine tract (PPT), or kissing loops involved in retroviral genome dimerization and initiation of transcription¹⁹⁹. Addition of dimerization motifs (such as kissing loops or a GNRA tetraloop/tetraloop receptor pair²⁰⁰) at the 5′ and 3′ termini of the PEgRNA could also result in effective circularization of the PEgRNA, improving stability. Additionally, it is envisioned that addition of these motifs could enable the physical separation of the PEgRNA spacer and primer binding site, preventing occlusion of the spacer, which would hinder PE activity. Short 5′ extensions to the PEgRNA that form a small toehold hairpin in the spacer region could also compete favorably against the annealing region of the PEgRNA binding the spacer. Finally, kissing loops could also be used to recruit other template RNAs to the genomic site and enable swapping of RT activity from one RNA to the other (section iii).
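By way of non-limiting illustration, the placement of optional terminal motifs described in this section can be sketched in Python. The class name `PEgRNADesign`, its field names, and the short placeholder sequences are hypothetical and are not taken from the sequence listing. The 5′-to-3′ ordering used here (optional 5′ motif, spacer, scaffold, RT template, PBS, optional 3′ motif, terminator) is an assumption inferred from the exemplary sequences that follow.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PEgRNADesign:
    """Hypothetical container for PEgRNA parts, written as DNA (5'->3')."""
    spacer: str
    scaffold: str
    rt_template: str
    pbs: str
    five_prime_motif: str = ""    # e.g., a kissing loop or toehold hairpin
    three_prime_motif: str = ""   # e.g., a hairpin, quadruplex, or ribozyme
    terminator: str = "TTTTTTT"   # assumed poly(T) termination signal

    def assemble(self) -> str:
        """Concatenate all parts in the assumed 5'->3' order."""
        parts = [self.five_prime_motif, self.spacer, self.scaffold,
                 self.rt_template, self.pbs, self.three_prime_motif,
                 self.terminator]
        seq = "".join(parts).upper()
        unexpected = set(seq) - set("ACGTU")
        if unexpected:
            raise ValueError(f"unexpected characters: {sorted(unexpected)}")
        return seq


# Placeholder sequences only, chosen for readability.
plain = PEgRNADesign(spacer="ACGTACGTACGTACGTACGT", scaffold="GTTTTAGAGC",
                     rt_template="TCTGCC", pbs="CGTGCTCAG")
# The same design with hypothetical motifs added at both termini.
capped = replace(plain, five_prime_motif="GGTTCC", three_prime_motif="GGAACC")
```

Keeping the motifs as separate optional fields makes it straightforward to compare a given PEgRNA with and without a terminal motif, since the spacer, scaffold, and extension are left untouched.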

Sequences

PEgRNA-HDV fusion

(SEQ ID NO: 1361574)

GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAA

TAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTC

TGCCATCAAAGCGTGCTCAGTCTGGGCCGGCATGGTCCCAGCCTCCTCG

CTGGCGCCGGCTGGGCAACATGCTTCGGCATGGCGAATGGGACTTTTTT

T

PEgRNA-MMLV kissing loop

(SEQ ID NO: 1361575)

GGTGGGAGACGTCCCACCGGCCCAGACTGAGCACGTGAGTTTTAGAGCT

AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGT

GGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTG

GTGGGAGACGTCCCACCTTTTTTT

PEgRNA-VS ribozyme kissing loop

(SEQ ID NO: 1361576)

GAGCAGCATGGCGTCGCTGCTCACGGCCCAGACTGAGCACGTGAGTTTT

AGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGA

AAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTC

AGTCTCCATCAGTTGACACCCTGAGGTTTTTTT

PEgRNA-GNRA tetraloop/tetraloop receptor

(SEQ ID NO: 1361577)

GCAGACCTAAGTGGUGACATATGGTCTGGGCCCAGACTGAGCACGTGAG

TTTTAGAGCTAUACGTAGCAAGTTAAAATAAGGCTAGTCCGTTATCAAC

TTUACGAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGT

GCTCAGTCTGCATGCGATTAGAAATAATCGCATGTTTTTTT

PEgRNA template switching secondary RNA-HDV fusion

(SEQ ID NO: 1361578)

TCTGCCATCAAAGCTGCGACCGTGCTCAGTCTGGTGGGAGACGTCCCAC

CGGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGC

TTCGGCATGGCGAATGGGACTTTTTTT

(v) Evolution of PEgRNA

It is likely that the PEgRNA scaffold can be further improved via directed evolution, in a fashion analogous to how SpCas9 and base editors have been improved²⁰¹. Directed evolution could enhance PEgRNA recognition by Cas9 or evolved Cas9 variants. Additionally, it is likely that different PEgRNA scaffold sequences would be optimal at different genomic loci, either enhancing PE activity at the site in question, reducing off-target activities, or both. Finally, evolution of PEgRNA scaffolds to which other RNA motifs have been added would almost certainly improve the activity of the fused PEgRNA relative to the unevolved fusion RNA. For instance, evolution of allosteric ribozymes composed of c-di-GMP-I aptamers and hammerhead ribozymes led to dramatically improved activity²⁰², suggesting that evolution would improve the activity of hammerhead-PEgRNA fusions as well. In addition, while Cas9 currently does not generally tolerate 5′ extension of the sgRNA, directed evolution will likely generate enabling mutations that mitigate this intolerance, allowing additional RNA motifs to be utilized.

Competing Approaches

As described herein, a number of these approaches have already been described for use with Cas9:sgRNA complexes, but no designs for improving PEgRNA activity have been reported. Other strategies for the installation of programmable mutations into the genome include base-editing, homology-directed recombination (HDR), precise microhomology-mediated end-joining (MMEJ), or transposase-mediated editing. However, all of these approaches have significant drawbacks when compared to PEs. Current base editors, while more efficient than existing PEs, can only install certain classes of genomic mutations and can result in additional, undesired nucleotide conversions at the site of interest. HDR is only feasible in a very small minority of cell types and results in comparably high rates of random insertion and deletion mutations (indels). Precise MMEJ can lead to predictable repair of double-strand breaks, but is largely limited to installation of deletions, is very site-dependent, and can also have comparably high rates of undesired indels. Transposase-mediated editing has to date only been shown to function in bacteria. As such, improvements to PE represent possibly the best path forward for the therapeutic correction of a wide swath of genomic mutations.

Example 4. Incorporation of 3′ Toe Loop in the Primer Binding Site (PBS) Improves PEgRNA Activity

In order to further improve PE activity, the inventors contemplated adding a toeloop sequence at the 3′ end of a PEgRNA having a 3′ extension arm. FIG. 37A provides an example of a generic SpCas9 PEgRNA having a 3′ extension arm (top molecule). The 3′ extension arm, in turn, comprises an RT template (that includes the desired edit) and a primer binding site (PBS) at the 3′ end of the molecule. The molecule terminates with a poly(U) sequence comprising three U nucleobases (i.e., 5′-UUU-3′).

By contrast, the bottom portion of FIG. 37A shows the same PEgRNA molecule as the top portion of FIG. 37A, but wherein a 9-nucleobase sequence of 5′-GAAANNNNN-3′ has been inserted between the 3′ end of the primer binding site and the 5′ end of the terminal poly(U) sequence. This structure folds back on itself by 180° to form a “toeloop” RNA structure, wherein the 5′-NNNNN-3′ sequence of the 9-nucleobase insertion anneals with a complementary sequence in the primer binding site, and wherein the 5′-GAAA-3′ portion forms the 180° turn. The features of the toeloop sequence depicted in FIG. 37A are not intended to limit or narrow the scope of possible toeloops that could be used in its place. Further, the sequence of the toeloop will depend upon the complementary sequence of the primer binding site. Essentially though, the toeloop sequence, in various embodiments, may have a first sequence portion that forms a 180° turn, and a second sequence portion that has a sequence that is complementary to a portion of the primer binding site.
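As a non-limiting illustration of how the 5′-NNNNN-3′ portion could be chosen for a given primer binding site, the following Python sketch builds a 9-nucleobase insert of the form 5′-GAAANNNNN-3′. The function names are hypothetical, the example PBS is illustrative only, and the sketch assumes that the NNNNN portion anneals to the 3′-most five nucleotides of the PBS, which is only one of the possible arrangements.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def toeloop_insert(pbs: str, stem_len: int = 5, turn: str = "GAAA") -> str:
    """Build a 5'-<turn><stem>-3' insert whose stem pairs with the PBS.

    Assumption: the stem anneals to the last `stem_len` nucleotides of the PBS.
    """
    if not 0 < stem_len <= len(pbs):
        raise ValueError("stem_len must be between 1 and len(pbs)")
    return turn + reverse_complement_rna(pbs[-stem_len:])


def add_toeloop(pegrna_without_tail: str, pbs: str, tail: str = "UUU") -> str:
    """Append the toeloop insert and the terminal poly(U) after the PBS."""
    if not pegrna_without_tail.endswith(pbs):
        raise ValueError("the PEgRNA must end with the PBS before the tail")
    return pegrna_without_tail + toeloop_insert(pbs) + tail


pbs = "CGUGCUCAGUCUG"        # illustrative 13-nt PBS
print(toeloop_insert(pbs))   # GAAACAGAC
```

Because the stem is computed from the PBS itself, lengthening or changing the PBS changes the insert accordingly, consistent with the toeloop sequence depending upon the sequence of the primer binding site.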

Without being bound by theory, the toeloop sequence is thought to enable the use of PEgRNAs with increasingly longer primer binding sites than would otherwise be possible. Longer PBS sequences, in turn, are thought to improve PE activity. More particularly, the likely function of the toeloop is to occlude the PBS from interacting with the spacer, or at least to minimize that interaction. Stable hairpin formation between the PBS and the spacer can lead to an inactive PEgRNA. Without a toeloop, this interaction may require restricting the length of the PBS. Blocking or minimizing the interaction between the spacer and the PBS using a 3′ end toeloop may lead to an improvement in PE activity.
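The reason a longer PBS can pair more extensively with the spacer can be illustrated with a short Python sketch. It assumes an SpCas9-type nick after protospacer position 17 (three nucleotides 5′ of the PAM) and a spacer that matches the protospacer, so that the PBS is the reverse complement of the protospacer nucleotides immediately 5′ of the nick. The function names are hypothetical, and only perfect Watson-Crick pairing is considered. The spacer used is the first 20 nucleotides of the exemplary PEgRNA-HDV fusion sequence above, and the 13-nucleotide PBS derived from it appears to coincide with the nucleotides immediately 5′ of the HDV ribozyme in that sequence.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def derive_pbs(spacer: str, pbs_len: int, nick_index: int = 17) -> str:
    """Return a PBS complementary to the protospacer just 5' of the nick.

    Assumption: the nick falls after protospacer position `nick_index`
    (17 for SpCas9 with a 20-nt spacer).
    """
    if not 0 < pbs_len <= nick_index <= len(spacer):
        raise ValueError("require 0 < pbs_len <= nick_index <= len(spacer)")
    return reverse_complement(spacer[nick_index - pbs_len:nick_index])


def spacer_pbs_pairs(spacer: str, pbs: str) -> int:
    """Length of the longest perfect duplex the PBS can form with the spacer."""
    target = reverse_complement(pbs)
    for n in range(len(pbs), 0, -1):
        # Try every window of length n from the PBS complement.
        if any(target[i:i + n] in spacer for i in range(len(target) - n + 1)):
            return n
    return 0


spacer = "GGCCCAGACTGAGCACGTGA"
pbs_13 = derive_pbs(spacer, 13)
print(pbs_13, spacer_pbs_pairs(spacer, pbs_13))   # CGTGCTCAGTCTG 13
```

Under these assumptions every nucleotide added to the PBS adds one potential base pair with the spacer, which is the interaction a 3′ toeloop is thought to block or minimize.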

FIG. 37B shows the results of Example 4, which demonstrates that the efficiency of prime editing in HEK cells or EMX cells is increased using PEgRNA containing toeloop elements, whereas the percent of indel formation is largely unchanged.

Embodiments

The following embodiments are within the scope of the present disclosure. Furthermore, the disclosure encompasses all variations, combinations, and permutations of these embodiments in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed embodiments is introduced into another listed embodiment in this section. For example, any listed embodiment that is dependent on another embodiment can be modified to include one or more limitations found in any other listed embodiment in this section that is dependent on the same base embodiment. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the disclosure, or aspects of the disclosure, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

1. A guide RNA comprising a spacer, a gRNA core, and an extension arm, wherein the guide RNA comprises a sequence selected from the group consisting of SEQ ID NOs: 1-135514, or a sequence having at least 90% sequence identity with any of SEQ ID NOs: 1-135514.

2. A guide RNA comprising a spacer, a gRNA core, and an extension arm, wherein the spacer comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs: 135515-271028, or a spacer having a nucleotide sequence having at least 90% sequence identity with any of SEQ ID NOs: 135515-271028.

3. A guide RNA comprising a spacer, a gRNA core, and an extension arm, wherein the extension arm has a nucleotide sequence selected from the group consisting of SEQ ID NOs: 271029-406542, or an extension arm having a nucleotide sequence having at least 90% sequence identity with any of SEQ ID NOs: 271029-406542.

4. A guide RNA comprising a spacer, a gRNA core, and an extension arm, wherein the extension arm comprises (i) a primer binding site, (ii) an edit template, and (iii) a homology arm.

5. A guide RNA comprising a spacer, a gRNA core, and an extension arm, wherein the extension arm comprises a primer binding site having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 406543-542056, or a primer binding site having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 406543-542056.

6. A guide RNA comprising a spacer, a gRNA core, and an extension arm, wherein the extension arm comprises an edit template comprising a nucleotide sequence selected from the group consisting of SEQ ID NOs: 542057-677570, or an edit template having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 542057-677570.

7. A guide RNA comprising a spacer, a gRNA core, and an extension arm, wherein the extension arm comprises a homology arm having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 677571-813084, or a homology arm having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 677571-813084.

8. A guide RNA comprising:

- (i) a spacer having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 135515-271028, or a spacer having a nucleotide sequence having at least 90% sequence identity with any of SEQ ID NOs: 135515-271028, and
- (ii) an extension arm selected from the group consisting of SEQ ID NOs: 271029-406542, or an extension arm having a nucleotide sequence having at least 90% sequence identity with any of SEQ ID NOs: 271029-406542.
  
  9. A guide RNA comprising:
- (i) a spacer having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 135515-271028, or a spacer having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 135515-271028, and
- (ii) a primer binding site selected from the group consisting of SEQ ID NOs: 406543-542056, or a primer binding site having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 406543-542056.
  
  10. A guide RNA comprising:
- (i) a spacer having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 135515-271028, or a spacer having a nucleotide sequence having at least 90% sequence identity with any of SEQ ID NOs: 135515-271028, and
- (ii) an edit template having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 542057-677570, or an edit template having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 542057-677570.
  
  11. A guide RNA comprising:
- (i) a spacer having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 135515-271028, or a spacer having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 135515-271028, and
- (ii) a homology arm having a nucleotide sequence selected from the group consisting of SEQ ID NOs: 677571-813084, or a homology arm having a nucleotide sequence that is at least 90% identical to any of SEQ ID NOs: 677571-813084.
  
  12. The guide RNA of any of embodiments 1-11 further comprising a termination signal of SEQ ID NO: 813086, or a termination signal having at least 90% sequence identity with SEQ ID NO: 813086.
  
  13. The guide RNA of any of embodiments 1-12 further comprising a 5′ end modifier region comprising a hairpin sequence, a stem/loop sequence, or a toeloop sequence.
  
  14. The guide RNA of any of embodiments 1-13 further comprising a 3′ end modifier region comprising a hairpin sequence, a stem/loop sequence, or a toeloop sequence.
  
  15. The guide RNA of any of embodiments 1-14 further comprising a gRNA core comprising SEQ ID NO: 813085, or a gRNA core having at least 90% sequence identity with SEQ ID NO: 813085.
  
  16. The guide RNA of any of embodiments 1-15, wherein the guide RNA is capable of binding to a napDNAbp suitable for prime editing and directing the napDNAbp to a target DNA sequence.
  
  17. The guide RNA of embodiment 16, wherein the target nucleic acid sequence comprises a target strand (or PAM strand) and a complementary non-target strand (or non-PAM strand), wherein the spacer of the guide RNA hybridizes to the complementary non-target strand (non-PAM strand) to form an RNA-DNA hybrid and an R-loop.
  
  18. The guide RNA of any of embodiments 1-17, wherein the primer binding site is between approximately 8 and approximately 20 nucleotides in length.
  
  19. The guide RNA of any of embodiments 1-18, wherein the primer binding site is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
  
  20. The guide RNA of any of the above embodiments, wherein the primer binding site is at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, or at least 20 nucleotides in length.
  
  21. The guide RNA of any of embodiments 1-19, wherein the homology arm is complementary to a strand of the target DNA.
  
  22. The guide RNA of any of embodiments 1-10, wherein the extension arm is between approximately 7 and approximately 500 nucleotides in length.
  
  23. The guide RNA of any of the above embodiments, wherein the extension arm is at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  24. The guide RNA of any of the above embodiments, wherein the edit template is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  25. The guide RNA of any of the above embodiments, wherein the homology arm is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides in length.
  
  26. The guide RNA of any of the above embodiments, wherein the edit template and homology arm can be used by a reverse transcriptase as a template sequence for the synthesis of a corresponding single-strand DNA flap having a 3′ end, wherein the DNA flap is complementary to a strand of the endogenous target DNA sequence adjacent to a nick site, and wherein the single-strand DNA flap comprises a nucleotide change encoded by the edit template.
  
  27. The guide RNA of embodiment 25, wherein the single-strand DNA flap displaces an endogenous single-strand DNA having a 5′ end in the target DNA sequence that has been nicked.
  
  28. The guide RNA of embodiment 26, wherein the endogenous single-strand DNA having the free 5′ end is excised by the cell.
  
  29. The guide RNA of embodiment 27, whereby cellular repair of the single-strand DNA flap results in installation of the nucleotide change, thereby forming a desired product.
  
  30. The guide RNA of embodiment 28, wherein the desired nucleotide change is an insertion.
  
  31. The guide RNA of embodiment 29, wherein the insertion is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  32. The guide RNA of embodiment 29, wherein the insertion is a sequence encoding a polypeptide.
  
  33. A prime editing complex comprising a napDNAbp, a reverse transcriptase, and any one of the guide RNAs of embodiments 1-32.
  
  34. The prime editing complex of embodiment 32, wherein the napDNAbp and the reverse transcriptase are formed as a fusion protein.
  
  35. The prime editing complex of embodiment 32, wherein the napDNAbp is a Cas9.
  
  36. The prime editing complex of embodiment 34, wherein the Cas9 is selected from the group consisting of Cas9 nickases or variants thereof.
  
  37. The prime editing complex of embodiment 34, wherein the Cas9 has an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-135514.
  
  38. The prime editing complex of embodiment 33, wherein the fusion protein has an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-135514.
  
  39. The prime editing complex of embodiment 33, wherein the fusion protein comprises a linker joining the napDNAbp and reverse transcriptase.
  
  40. The prime editing complex of embodiment 38, wherein the linker has an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-135514.
  
  41. One or more polynucleotides encoding the prime editing complex of any of embodiments 32-39.
  
  42. A vector comprising the polynucleotide of embodiment 41 and one or more promoters that drive the expression of the guide RNA and the fusion protein of the prime editing complex.
  
  43. A cell comprising the vector of embodiment 41.
  
  44. A cell comprising a prime editing complex of any of embodiments 32-39.
  
  45. A pharmaceutical composition comprising: (i) a guide RNA of any of embodiments 1-31, a prime editing complex of embodiments 32-39, a polynucleotide of embodiment 40, or a vector of embodiment 41; and (ii) a pharmaceutically acceptable excipient.
  
  46. A method for installing a nucleotide change in a nucleic acid sequence, the method comprising: contacting the nucleic acid sequence with a complex comprising a fusion protein and a guide RNA of any of embodiments 1-31 or any of embodiments 83-85, wherein the fusion protein comprises a napDNAbp and a polymerase, and wherein the guide RNA comprises a spacer, gRNA core, and an extension arm that comprises an edit template encoding a nucleotide change; thereby
- (i) nicking the double-stranded DNA sequence on the target strand (or the PAM strand), and generating a free single-strand DNA having a 3′ end;
- (ii) hybridizing the 3′ end of the free single-strand DNA to the guide RNA at the primer binding site, thereby priming the polymerase;
- (iii) polymerizing a strand of DNA from the 3′ end, thereby generating a single-strand DNA flap comprising the nucleotide change; and
- (iv) replacing the endogenous DNA strand immediately adjacent downstream of the cut site on the target strand (or PAM strand) with the single-strand DNA flap, thereby installing the desired nucleotide change in the double-stranded DNA sequence.
  
  47. The method of embodiment 46, wherein the nucleotide change is a single nucleotide substitution, a deletion, an insertion, or a combination thereof.
  
  48. The method of embodiment 46, wherein the single nucleotide substitution is a transition or a transversion.
  
  49. The method of embodiment 46, wherein the nucleotide change is (1) a G to T substitution, (2) a G to A substitution, (3) a G to C substitution, (4) a T to G substitution, (5) a T to A substitution, (6) a T to C substitution, (7) a C to G substitution, (8) a C to T substitution, (9) a C to A substitution, (10) an A to T substitution, (11) an A to G substitution, or (12) an A to C substitution.
  
  50. The method of embodiment 46, wherein the nucleotide change converts (1) a G:C basepair to a T:A basepair, (2) a G:C basepair to an A:T basepair, (3) a G:C basepair to a C:G basepair, (4) a T:A basepair to a G:C basepair, (5) a T:A basepair to an A:T basepair, (6) a T:A basepair to a C:G basepair, (7) a C:G basepair to a G:C basepair, (8) a C:G basepair to a T:A basepair, (9) a C:G basepair to an A:T basepair, (10) an A:T basepair to a T:A basepair, (11) an A:T basepair to a G:C basepair, or (12) an A:T basepair to a C:G basepair.
  
  51. The method of embodiment 46, wherein the nucleotide change is an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
  
  52. The method of embodiment 46, wherein the nucleotide change is an insertion of a polypeptide-encoding sequence.
  
  53. The method of embodiment 46, wherein the nucleotide change corrects a disease-associated gene.
  
  54. The method of embodiment 46, wherein the disease-associated gene is associated with a monogenic disorder selected from the group consisting of: Adenosine Deaminase (ADA) Deficiency; Alpha-1 Antitrypsin Deficiency; Cystic Fibrosis; Duchenne Muscular Dystrophy; Galactosemia; Hemochromatosis; Huntington's Disease; Maple Syrup Urine Disease; Marfan Syndrome; Neurofibromatosis Type 1; Pachyonychia Congenita; Phenylketonuria; Severe Combined Immunodeficiency; Sickle Cell Disease; Smith-Lemli-Opitz Syndrome; and Tay-Sachs Disease.
  
  55. The method of embodiment 46, wherein the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; high blood pressure; Alzheimer's disease; arthritis; diabetes; cancer; and obesity.
  
  56. A computerized method for determining a prime editor guide RNA (PEgRNA) structure, the method comprising using at least one computer hardware processor to perform:
- accessing data indicative of:
  - an input allele;
  - an output allele; and
  - a fusion protein comprising a nucleic acid programmable DNA binding protein and a polymerase (e.g., a reverse transcriptase); and
- determining the PEgRNA structure based on the input allele, the output allele, and the fusion protein, wherein the PEgRNA structure is designed to be associated with the fusion protein to change the input allele to the output allele, comprising determining for the PEgRNA structure one or more of the following features:
  - a spacer complementary to a target nucleotide sequence in the input allele;
  - a gRNA backbone for interacting with the fusion protein; and
  - an extension comprising one or more of:
    - a DNA synthesis template sequence comprising a desired nucleotide change to change the input allele to the output allele;
    - a primer binding site;
    - optionally, a termination signal adjacent to the DNA synthesis template;
    - optionally, a first modifier adjacent to the termination signal; and
    - optionally, a second modifier adjacent to the primer binding site.
      
      57. The method of embodiment 56 further comprising determining the spacer and the extension, and determining the spacer is at the 5′ end of the PEgRNA structure, and the extension is at the 3′ end of the PEgRNA structure.
      
      58. The method of embodiment 56 further comprising determining the spacer and the extension, wherein the spacer is at the 5′ end of the PEgRNA structure, and the extension is 3′ to the spacer.
      
      59. The method of embodiment 56, wherein accessing data indicative of the input allele and the output allele comprises accessing a database comprising a set of input alleles and associated output alleles.
      
      60. The method of embodiment 59, wherein accessing the database comprises accessing a ClinVar database comprising a plurality of entries, wherein each entry comprises an input allele from the set of input alleles and an output allele from the set of output alleles.
      
      61. The method of embodiment 59, wherein determining the PEgRNA structure comprises determining one or more PEgRNA structures for each input allele and associated output allele in the set.
      
      62. The method of embodiment 56, wherein accessing data indicative of the fusion protein comprises determining the fusion protein from a plurality of fusion proteins.
      
      63. The method of embodiment 56, wherein the fusion protein comprises a Cas9 protein.
      
      64. The method of embodiment 63, wherein the fusion protein comprises a Cas9-NG protein or a SpCas9 protein.
      
      65. The method of embodiment 56, wherein changing the input allele to the output allele comprises a single nucleotide change, an insertion of one or more nucleotides, a deletion of one or more nucleotides, or a combination thereof.
      
      66. The method of embodiment 56, further comprising determining the spacer, wherein the spacer comprises a nucleotide sequence of approximately 20 nucleotides.
      
      67. The method of embodiment 66, further comprising determining the spacer based on the position of the change in a corresponding protospacer nucleotide sequence.
      
      68. The method of embodiment 67, wherein the change is installed in an editing window that is between about protospacer position −3 and protospacer position +27.
      
      69. The method of embodiment 67 further comprising:
- determining a set of initial candidate protospacers based on the input allele and the fusion protein, wherein each initial candidate protospacer comprises a PAM of the fusion protein in the input allele;
- determining one or more initial candidate protospacers from the set of initial candidate protospacers, wherein each comprises an incompatible nick position;
- removing the determined one or more initial candidate protospacers from the set to generate a set of remaining candidate protospacers; and
- wherein determining the PEgRNA structure comprises determining a plurality of PEgRNA structures, wherein each of the PEgRNA structures comprises a different spacer determined based on a corresponding protospacer from the set of remaining candidate protospacers.
  
  70. The method of embodiment 55, further comprising determining the extension and the DNA synthesis template (e.g., RT template sequence), wherein the DNA synthesis template (e.g., RT template sequence) comprises approximately 7 nucleotides to approximately 34 nucleotides.
  
  71. The method of embodiment 56, wherein determining the PEgRNA comprises: determining the spacer based on the input allele and/or the fusion protein; and determining the DNA synthesis template (e.g., RT template sequence) based on the spacer.
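One way to picture determining the DNA synthesis template from the spacer, as in embodiment 71, is the short Python sketch below. It assumes that the nick position has already been fixed by the chosen spacer, that the output allele is given 5′ to 3′ on the PAM strand, and that the extension is written 5′ to 3′ as DNA synthesis template followed by primer binding site. The function name `design_extension` and its default lengths are hypothetical; the defaults merely fall inside the approximate length ranges recited in these embodiments.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_rna(dna):
    """Reverse complement of a DNA string, written as RNA (U instead of T)."""
    return dna.translate(_DNA_COMPLEMENT)[::-1].replace("T", "U")


def design_extension(output_allele, nick, pbs_len=13, rtt_len=15):
    """Return (rt_template, primer_binding_site, extension) as RNA strings.

    `nick` is the 0-based index of the first base 3' of the nick on the
    PAM strand of the output (edited) allele.
    """
    if nick - pbs_len < 0 or nick + rtt_len > len(output_allele):
        raise ValueError("allele too short for the requested lengths")
    # The primer binding site pairs with the DNA just 5' of the nick.
    pbs = reverse_complement_rna(output_allele[nick - pbs_len:nick])
    # The template encodes the edited DNA just 3' of the nick.
    rtt = reverse_complement_rna(output_allele[nick:nick + rtt_len])
    return rtt, pbs, rtt + pbs


if __name__ == "__main__":
    print(design_extension("TTTTTACGTCGATTCAAAAA", nick=10, pbs_len=5, rtt_len=5))
```

Short lengths are used in the usage line only to keep the example readable.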
  
  72. The method of embodiment 56, wherein the DNA synthesis template (e.g., RT template sequence) encodes a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to a nick site, wherein the single-strand DNA flap comprises the desired nucleotide change.
  
  73. The method of embodiment 72, wherein the single-strand DNA flap is capable of hybridizing to the endogenous DNA sequence adjacent to the nick site, thereby leading to the installation of the desired nucleotide change.
  
  74. The method of embodiment 72, wherein the single-stranded DNA flap is capable of displacing the endogenous DNA sequence adjacent to the nick site.
  
  75. The method of embodiment 72, whereby cellular repair of the single-strand DNA flap results in installation of the desired nucleotide change, thereby forming a desired product.
  
  76. The method of embodiment 56, wherein the fusion protein when complexed with the PEgRNA is capable of binding to a target DNA sequence.
  
  77. The method of embodiment 76, wherein the target DNA sequence comprises a target strand in which the change occurs and a complementary non-target strand.
  
  78. The method of embodiment 56, wherein the input allele comprises a pathogenic DNA mutation, and the output allele comprises a corrected DNA sequence.
  
  79. The method of embodiment 56, wherein the input allele is any one of the disease alleles of SEQ ID NOs: 1217353-1289387.
  
  80. The method of embodiment 56, wherein the output allele is any one of the healthy alleles of SEQ ID NOs: 1289388-1361420.
  
  81. A system comprising:
- at least one processor; and
- at least one computer-readable storage medium having encoded thereon instructions which, when executed, cause the at least one processor to perform the method of any of embodiments 56-81.
  
  82. At least one computer-readable storage medium having encoded thereon instructions which, when executed, cause at least one processor to perform the method of any of embodiments 56-81.
  
  83. A method of base editing using the PEgRNA structure determined according to the method of any of embodiments 56-81.
  
  84. A PEgRNA determined according to the method of any one of embodiments 56-81.
  
  85. A guide RNA for use in prime editing to correct a disease allele at a target DNA sequence to form a healthy allele, said guide comprising a spacer, a gRNA core, and an extension arm, wherein the spacer is capable of binding to a ˜20 nucleotide region within SEQ ID NOs: 1217353-1289387 or the complement strand thereof.
  
  86. A guide RNA comprising a spacer, gRNA core, and an extension arm, wherein the extension arm comprises a DNA synthesis template and a primer binding site effective to conduct prime editing.
  
  87. The guide RNA of embodiment 85, wherein the edit site in any of the nucleotide sequences of SEQ ID NOs: 1217353-1289387 begins at position 201 in the 5′ to 3′ orientation.
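The position convention of embodiment 87 can be checked with a few lines of Python. The sketch below assumes that a disease allele and its corresponding healthy allele are available as plain strings in the 5′ to 3′ orientation; the function name `first_difference` and the toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def first_difference(allele_a, allele_b):
    """Return the 1-based position of the first mismatch, or None if identical."""
    for index, (base_a, base_b) in enumerate(zip(allele_a, allele_b)):
        if base_a != base_b:
            return index + 1
    if len(allele_a) != len(allele_b):
        # One allele is a prefix of the other; they diverge just past the prefix.
        return min(len(allele_a), len(allele_b)) + 1
    return None


if __name__ == "__main__":
    disease = "A" * 200 + "G" + "C" * 200  # toy sequence with 200 nt of 5' flank
    healthy = "A" * 200 + "T" + "C" * 200
    print(first_difference(disease, healthy))
```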
  
  88. A guide RNA for prime editing comprising a spacer, a gRNA core, and an extension arm, wherein the extension arm comprises a primer binding site and a DNA synthesis template.
  
  89. The guide RNA of embodiment 88, wherein the primer binding site has a nucleotide sequence selected from the group consisting of SEQ ID NOs: 406543-542056 (primer binding site), or a nucleotide sequence that has at least 90% sequence identity with any of SEQ ID NOs: 406543-542056.
  
  90. The guide RNA of embodiment 88, wherein the DNA synthesis template comprises a nucleotide sequence of SEQ ID NOs: 542057-677570 (edit template), or a nucleotide sequence that has at least 90% sequence identity with any of SEQ ID NOs: 542057-677570.
  
  91. The guide RNA of embodiment 88, wherein the DNA synthesis template comprises a nucleotide sequence of SEQ ID NOs: 677571-813084 (homology arm), or a nucleotide sequence that has at least 90% sequence identity with any of SEQ ID NOs: 677571-813084.
  
  92. The guide RNA of embodiment 88, wherein the DNA synthesis template comprises an edit template and a homology arm, wherein the edit template comprises a nucleotide sequence of SEQ ID NOs: 542057-677570, and the homology arm comprises a nucleotide sequence of SEQ ID NOs: 677571-813084.
  
  93. The guide RNA of any of embodiments 86-92 further comprising a termination signal of SEQ ID NO: 813086, or a termination signal having at least 90% sequence identity with SEQ ID NO: 813086.
  
  94. The guide RNA of any of embodiments 86-93 further comprising a 5′ end modifier region comprising a hairpin sequence, stem/loop sequence, or a toeloop sequence.
  
  95. The guide RNA of any of embodiments 86-94 further comprising a 3′ end modifier region comprising a hairpin sequence, stem/loop sequence, or a toeloop sequence.
  
  96. The guide RNA of any of embodiments 86-95, further comprising a gRNA core comprising SEQ ID NO: 813085, or a gRNA core having at least 90% sequence identity with SEQ ID NO: 813085.
  
  97. The guide RNA of any of embodiments 86-96, wherein the guide RNA is capable of binding to a napDNAbp suitable for prime editing and directing the napDNAbp to a target DNA sequence.
  
  98. The guide RNA of embodiment 97, wherein the target nucleic acid sequence comprises a target strand (or PAM or edit strand) and a complementary non-target strand (or non-PAM or non-edit strand) wherein the spacer of the guide RNA hybridizes to the non-PAM strand to form an RNA-DNA hybrid and an R-loop.
  
  99. The guide RNA of any of embodiments 86-98, wherein the primer binding site is between approximately 8 and approximately 20 nucleotides in length.
  
  100. The guide RNA of any of embodiments 86-99, wherein the primer binding site is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
  
  101. The guide RNA of any of embodiments 86-100, wherein the extension arm is at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  102. The guide RNA of any of embodiments 86-101, wherein the primer binding site is at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, or at least 20 nucleotides in length.
  
  103. The guide RNA of any of embodiments 86-102, wherein the DNA synthesis template is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  104. The guide RNA of any of embodiments 86-103, wherein the DNA synthesis template can be used by an RNA-dependent DNA polymerase (e.g., reverse transcriptase) as a template for the synthesis of a corresponding single-strand DNA flap having a 3′ end, wherein the DNA flap is complementary to a strand of the endogenous target DNA sequence adjacent to a nick site, and wherein the single-strand DNA flap comprises a desired nucleotide change encoded by the DNA synthesis template.
  
  105. The guide RNA of embodiment 104, wherein the single-strand DNA flap displaces an endogenous single-strand DNA having a 5′ end in the target DNA sequence that has been nicked.
  
  106. The guide RNA of embodiment 105, wherein the endogenous single-strand DNA having the free 5′ end is excised by the cell.
  
  107. The guide RNA of embodiment 105, whereby cellular repair of the single-strand DNA flap results in installation of the nucleotide change, thereby forming an edited DNA product.
  
  108. The guide RNA of embodiment 107, wherein the nucleotide change is an insertion.
  
  109. The guide RNA of embodiment 108, wherein the insertion is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  110. The guide RNA of embodiment 108, wherein the insertion is a sequence encoding a polypeptide.
  
  111. A prime editing complex comprising a napDNAbp, an RNA-dependent DNA polymerase, and the guide RNA of any one of embodiments 86-110.
  
  112. The prime editing complex of embodiment 111, wherein the napDNAbp and the RNA-dependent DNA polymerase are formed as a fusion protein.
  
  113. The prime editing complex of embodiment 111, wherein the napDNAbp is a Cas9.
  
  114. The prime editing complex of embodiment 113, wherein the Cas9 is a Cas9 nickase or variant thereof.
  
  115. The prime editing complex of embodiment 113, wherein the Cas9 has an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-135514.
  
  116. The prime editing complex of embodiment 112, wherein the fusion protein has an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-135514.
  
  117. The prime editing complex of embodiment 112, wherein the fusion protein comprises a linker joining the napDNAbp and RNA-dependent DNA polymerase.
  
  118. The prime editing complex of embodiment 117, wherein the linker has an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-135514.
  
  119. One or more polynucleotides encoding the prime editing complex of any of embodiments 111-118.
  
  120. A vector comprising the polynucleotide of embodiment 119 and one or more promoters that drive the expression of the guide RNA and the fusion protein of the prime editing complex.
  
  121. A cell comprising the vector of embodiment 120.
  
  122. A cell comprising a prime editing complex of any of embodiments 111-118.
  
  123. A pharmaceutical composition comprising: (i) a guide RNA of any of embodiments 84-110, a prime editing complex of embodiments 111-118, a polynucleotide of embodiment 119, or a vector of embodiment 120; and (ii) a pharmaceutically acceptable excipient.
  
  124. A method for installing a nucleotide change in a nucleic acid sequence, the method comprising: contacting the nucleic acid sequence with a complex comprising a fusion protein and a guide RNA of any of embodiments 109-116, wherein the fusion protein comprises a napDNAbp and an RNA-dependent DNA polymerase, wherein the guide RNA comprises a spacer, gRNA core, and an extension arm that comprises a DNA synthesis template and primer binding site, said DNA synthesis template encoding a nucleotide change, and wherein the spacer is capable of annealing to the non-PAM strand proximal to an available PAM and protospacer, thereby
- (i) nicking the double-stranded DNA sequence on the PAM strand, thereby generating a free single-strand DNA having a 3′ end;
- (ii) hybridizing the 3′ end of the free single-strand DNA to the guide RNA at the primer binding site, thereby priming the RNA-dependent DNA polymerase;
- (iii) polymerizing a strand of DNA from the 3′ end of DNA, coding from the DNA synthesis template, thereby generating a single-strand DNA flap extended from the 3′ end of the DNA, wherein the flap comprises the nucleotide change;
- (iv) replacing an endogenous DNA strand immediately downstream of the cut site on the PAM strand with the single-strand DNA flap, thereby installing the nucleotide change in the double-stranded DNA sequence.
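Steps (i) through (iv) above can be followed at the level of plain strings with the Python sketch below. It is a simplified illustration, not a model of the biochemistry: it assumes the PAM strand is given 5′ to 3′, that the nick index is already known, and that the flap replaces an endogenous stretch of equal length (as for a substitution; insertions and deletions would replace a stretch of different length). The function names and the toy sequences are hypothetical.

```python
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def dna_reverse_complement_of_rna(rna):
    """DNA strand (5' to 3') that would pair with the given RNA (5' to 3')."""
    return rna.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


def simulate_prime_edit(pam_strand, nick, pbs, rt_template):
    """Return the edited PAM strand after flap synthesis and replacement."""
    # Step (i): nicking frees a 3' end just 5' of the nick.
    primer = pam_strand[:nick]
    # Step (ii): that 3' end must pair with the primer binding site.
    if not primer.endswith(dna_reverse_complement_of_rna(pbs)):
        raise ValueError("primer binding site does not match the nicked 3' end")
    # Step (iii): the polymerase copies the template into a DNA flap.
    flap = dna_reverse_complement_of_rna(rt_template)
    # Step (iv): the flap replaces the endogenous DNA 3' of the nick.
    return primer + flap + pam_strand[nick + len(flap):]


if __name__ == "__main__":
    target = "TTTTTACGTCGATTCAAAAA"
    print(simulate_prime_edit(target, nick=10, pbs="GACGU", rt_template="GAGUC"))
```

In the usage line, the template encodes a single T to C substitution two bases 3′ of the nick.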
  
  125. The method of embodiment 124, wherein when step (v) is completed within a cell, the cell repairs the non-edited strand through cellular DNA repair and/or replication.
  
  126. The method of embodiment 124, wherein the nucleotide change is a single nucleotide substitution, a deletion, an insertion, or a combination thereof.
  
  127. The method of embodiment 124, wherein the single nucleotide substitution is a transition or a transversion.
  
  128. The method of embodiment 124, wherein the single nucleotide substitution is (1) a G to T substitution, (2) a G to A substitution, (3) a G to C substitution, (4) a T to G substitution, (5) a T to A substitution, (6) a T to C substitution, (7) a C to G substitution, (8) a C to T substitution, (9) a C to A substitution, (10) an A to T substitution, (11) an A to G substitution, or (12) an A to C substitution.
  
  129. The method of embodiment 124, wherein the single nucleotide substitution converts (1) a G:C basepair to a T:A basepair, (2) a G:C basepair to an A:T basepair, (3) a G:C basepair to C:G basepair, (4) a T:A basepair to a G:C basepair, (5) a T:A basepair to an A:T basepair, (6) a T:A basepair to a C:G basepair, (7) a C:G basepair to a G:C basepair, (8) a C:G basepair to a T:A basepair, (9) a C:G basepair to an A:T basepair, (10) an A:T basepair to a T:A basepair, (11) an A:T basepair to a G:C basepair, or (12) an A:T basepair to a C:G basepair.
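The twelve substitutions of embodiment 128 and the corresponding basepair conversions of embodiment 129 follow mechanically from the four bases, as the Python sketch below shows. The names `SUBSTITUTIONS`, `classify`, and `basepair_change` are hypothetical helpers introduced for illustration; the transition and transversion labels use the standard purine and pyrimidine definition.

```python
from itertools import permutations

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# All ordered pairs of distinct bases: 4 * 3 = 12 single nucleotide substitutions.
SUBSTITUTIONS = list(permutations("GTCA", 2))


def classify(ref, alt):
    """Transition: purine to purine or pyrimidine to pyrimidine; else transversion."""
    if ref == alt:
        raise ValueError("not a substitution")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def basepair_change(ref, alt):
    """Describe the substitution as a basepair conversion."""
    return f"{ref}:{COMPLEMENT[ref]} to {alt}:{COMPLEMENT[alt]}"


if __name__ == "__main__":
    for ref, alt in SUBSTITUTIONS:
        print(f"{ref} to {alt}: {basepair_change(ref, alt)} ({classify(ref, alt)})")
```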
  
  130. The method of embodiment 124, wherein the nucleotide change is an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
  
  131. The method of embodiment 124, wherein the nucleotide change is an insertion of a polypeptide-encoding sequence.
  
  132. The method of embodiment 124, wherein the nucleotide change corrects a disease-associated gene.
  
  133. The method of embodiment 132, wherein the disease-associated gene is associated with a monogenetic disorder selected from the group consisting of: Adenosine Deaminase (ADA) Deficiency; Alpha-1 Antitrypsin Deficiency; Cystic Fibrosis; Duchenne Muscular Dystrophy; Galactosemia; Hemochromatosis; Huntington's Disease; Maple Syrup Urine Disease; Marfan Syndrome; Neurofibromatosis Type 1; Pachyonychia Congenita; Phenylketonuria; Severe Combined Immunodeficiency; Sickle Cell Disease; Smith-Lemli-Opitz Syndrome; and Tay-Sachs Disease.
  
  134. The method of embodiment 132, wherein the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; high blood pressure; Alzheimer's disease; arthritis; diabetes; cancer; and obesity.
  
  135. A guide RNA for use in prime editing to alter the nucleotide sequence of a target DNA molecule with an insertion, deletion, inversion, substitution, or combination thereof to produce a corresponding edited DNA molecule, wherein:
- (i) the guide RNA is capable of forming a complex with a fusion protein comprising a napDNAbp and a domain comprising an RNA-dependent DNA polymerase activity;
- (ii) the guide RNA comprises (a) a spacer that is capable of annealing to the non-PAM strand proximal to an available PAM and protospacer on the PAM strand on the target DNA molecule, and (b) a gRNA core;
- (iii) the guide RNA further comprises an extension arm at the 5′ or 3′ end of the guide RNA;
- (iv) the extension arm comprises (a) a primer binding site and (b) a DNA synthesis template, wherein the DNA synthesis template codes for a single-strand DNA flap that includes an edit to be integrated in place of the endogenous strand immediately downstream of the cut site on the PAM strand;
- (v) the target DNA molecule is selected from the group consisting of SEQ ID NOs: 1217353-1289387; and
- (vi) the corresponding edited DNA molecule is selected from the group consisting of SEQ ID NOs: 1289388-1361420.
  
  136. The guide RNA of embodiment 135, wherein the target DNA molecule is a ClinVar variant sequence.
  
  137. The guide RNA of embodiment 135, wherein the napDNAbp is Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, or Argonaute, or a variant of Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas13a, Cas12c, or Argonaute.
  
  138. The guide RNA of embodiment 135, wherein the napDNAbp domain comprises nickase activity.
  
  139. The guide RNA of embodiment 135, wherein the napDNAbp is a Cas9 or variant thereof.
  
  140. The guide RNA of embodiment 135, wherein the napDNAbp is a nuclease active Cas9, a nuclease inactive Cas9 (dCas9), or a Cas9 nickase (nCas9).
  
  141. The guide RNA of embodiment 135, wherein the napDNAbp is Cas9 nickase (nCas9).
  
  142. The guide RNA of embodiment 135, wherein the napDNAbp comprises the amino acid
  
  143. The guide RNA of embodiment 135, wherein the napDNAbp is SpCas9 wild type or a variant thereof of any one of amino acid sequences 1361421-1361428, or an amino acid sequence having at least 80% sequence identity with any of SEQ ID NOs: 1361421-1361428.
  
  144. The guide RNA of embodiment 135, wherein the napDNAbp is an SpCas9 ortholog of any one of amino acid sequences 1361429-1361442, or an amino acid sequence having at least 80% sequence identity with any of SEQ ID NOs: 1361429-1361442.
  
  145. The guide RNA of embodiment 135, wherein the napDNAbp is any one of amino acid sequences 1361421-1361484, or an amino acid sequence having at least 80% sequence identity with any of SEQ ID NOs: 1361421-1361484.
  
  146. The guide RNA of embodiment 135, wherein the domain comprising an RNA-dependent DNA polymerase activity is a reverse transcriptase.
  
  147. The guide RNA of embodiment 146, wherein the reverse transcriptase is a naturally occurring wild type reverse transcriptase having an amino acid sequence of any one of SEQ ID NOs: 1361485-1361496, or an amino acid sequence having at least 80% sequence identity with any of SEQ ID NOs: 1361485-1361496.
  
  148. The guide RNA of embodiment 146, wherein the reverse transcriptase is a variant reverse transcriptase having an amino acid sequence of any one of SEQ ID NOs: 1361497-1361514, or an amino acid sequence having at least 80% sequence identity with any of SEQ ID NOs: 1361497-1361514.
  
  149. The guide RNA of embodiment 135, wherein the fusion protein comprises an amino acid sequence of any one of SEQ ID NOs: 1361515-1361519, or an amino acid sequence having at least 80% sequence identity with any of SEQ ID NOs: 1361515-1361519.
  
  150. The guide RNA of embodiment 135, wherein the fusion protein comprises an amino acid sequence of SEQ ID NO: 1361515 (PE1) or 1361516 (PE2), or an amino acid sequence having at least 80% sequence identity with any of SEQ ID NOs: 1361515 or 1361516.
  
  151. The guide RNA of embodiment 135, wherein the available PAM sequence is a function of the napDNAbp used in step (i).
  
  152. The guide RNA of embodiment 135, wherein the available PAM sequence is selected from the group consisting of: (a) 5′-NGG-3′ (the canonical PAM sequence), (b) 5′-NNG-3′, (c) 5′-NNA-3′, (d) 5′-NNC-3′, (e) 5′-NNT-3′, (f) 5′-NGT-3′, (g) 5′-NGA-3′, (h) 5′-NGC-3′, (i) 5′-NAA-3′, (j) 5′-NAC-3′, (k) 5′-NAG-3′, and (l) 5′-NAT-3′, the selection of which is a function of the choice of napDNAbp.
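Whether a given three-nucleotide site satisfies one of the PAM patterns listed in embodiment 152 can be tested with the short Python sketch below, which treats N as matching any base. The names `PAM_PATTERNS`, `pam_matches`, and `matching_pams` are hypothetical; which patterns a particular napDNAbp actually recognizes is not encoded here.

```python
# The twelve patterns listed in embodiment 152, written 5' to 3'.
PAM_PATTERNS = [
    "NGG", "NNG", "NNA", "NNC", "NNT", "NGT",
    "NGA", "NGC", "NAA", "NAC", "NAG", "NAT",
]


def pam_matches(pattern, site):
    """True if `site` fits `pattern`, where N in the pattern matches any base."""
    if len(pattern) != len(site):
        return False
    return all(p == "N" or p == s for p, s in zip(pattern, site))


def matching_pams(site, patterns=PAM_PATTERNS):
    """Return the listed patterns that the site satisfies, in list order."""
    return [pattern for pattern in patterns if pam_matches(pattern, site)]


if __name__ == "__main__":
    print(matching_pams("TGG"))
    print(matching_pams("CAT"))
```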
  
  153. The guide RNA of embodiment 135, wherein the edit site in any of the nucleotide sequences of SEQ ID NOs: 1217353-1289387 of step (v) begins at position 201 in the 5′ to 3′ orientation.
  
  154. The guide RNA of embodiment 135, wherein the nucleotide change is a nucleotide substitution, a deletion, an insertion, or a combination thereof.
  
  155. The guide RNA of embodiment 135, wherein the nucleotide substitution is a transition or a transversion.
  
  156. The guide RNA of embodiment 135, wherein the single nucleotide substitution is (1) a G to T substitution, (2) a G to A substitution, (3) a G to C substitution, (4) a T to G substitution, (5) a T to A substitution, (6) a T to C substitution, (7) a C to G substitution, (8) a C to T substitution, (9) a C to A substitution, (10) an A to T substitution, (11) an A to G substitution, or (12) an A to C substitution.
  
  157. The guide RNA of embodiment 135, wherein the single nucleotide substitution converts (1) a G:C basepair to a T:A basepair, (2) a G:C basepair to an A:T basepair, (3) a G:C basepair to C:G basepair, (4) a T:A basepair to a G:C basepair, (5) a T:A basepair to an A:T basepair, (6) a T:A basepair to a C:G basepair, (7) a C:G basepair to a G:C basepair, (8) a C:G basepair to a T:A basepair, (9) a C:G basepair to an A:T basepair, (10) an A:T basepair to a T:A basepair, (11) an A:T basepair to a G:C basepair, or (12) an A:T basepair to a C:G basepair.
  
  158. The guide RNA of embodiment 135, wherein the desired nucleotide change is an insertion or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides.
  
  159. The guide RNA of embodiment 135, wherein the nucleotide change is an insertion of a polypeptide-encoding sequence.
  
  160. The guide RNA of embodiment 135, wherein the nucleotide change corrects a disease-associated gene.
  
  161. The guide RNA of embodiment 160, wherein the disease-associated gene is associated with a monogenetic disorder selected from the group consisting of: Adenosine Deaminase (ADA) Deficiency; Alpha-1 Antitrypsin Deficiency; Cystic Fibrosis; Duchenne Muscular Dystrophy; Galactosemia; Hemochromatosis; Huntington's Disease; Maple Syrup Urine Disease; Marfan Syndrome; Neurofibromatosis Type 1; Pachyonychia Congenita; Phenylketonuria; Severe Combined Immunodeficiency; Sickle Cell Disease; Smith-Lemli-Opitz Syndrome; and Tay-Sachs Disease.
  
  162. The guide RNA of embodiment 160, wherein the disease-associated gene is associated with a polygenic disorder selected from the group consisting of: heart disease; high blood pressure; Alzheimer's disease; arthritis; diabetes; cancer; and obesity.
  
  163. A method for installing a nucleotide change in a nucleic acid sequence, the method comprising: contacting the nucleic acid sequence with a complex comprising a fusion protein and a guide RNA of any of embodiments 1-32 or 135-162.
  
  164. The method of embodiment 163, wherein the fusion protein comprises a napDNAbp and an RNA-dependent DNA polymerase.
  
  165. The method of embodiment 163, wherein the guide RNA comprises a spacer, gRNA core, and an extension arm that comprises a DNA synthesis template and primer binding site.
  
  166. The method of embodiment 165, wherein the DNA synthesis template encodes a nucleotide change.
  
  167. The method of any of embodiments 163-166, wherein the guide RNA is capable of binding to a napDNAbp suitable for prime editing and directing the napDNAbp to a target DNA sequence.
  
  168. The method of embodiment 167, wherein the target nucleic acid sequence comprises a target strand (or PAM or edit strand) and a complementary non-target strand (or non-PAM or non-edit strand) wherein the spacer of the guide RNA hybridizes to the non-PAM strand to form an RNA-DNA hybrid and an R-loop.
  
  169. The method of embodiment 165, wherein the primer binding site is between approximately 8 and approximately 20 nucleotides in length.
  
  170. The method of embodiment 165, wherein the primer binding site is 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length.
  
  171. The method of embodiment 165, wherein the extension arm is at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  172. The method of embodiment 165, wherein the primer binding site is at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, or at least 20 nucleotides in length.
  
  173. The method of embodiment 165, wherein the DNA synthesis template is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  174. The method of embodiment 165, wherein the DNA synthesis template can be used by an RNA-dependent DNA polymerase (e.g., reverse transcriptase) as a template for the synthesis of a corresponding single-strand DNA flap having a 3′ end, wherein the DNA flap is complementary to a strand of the endogenous target DNA sequence adjacent to a nick site, and wherein the single-strand DNA flap comprises a desired nucleotide change encoded by the DNA synthesis template.
  
  175. The method of embodiment 174, wherein the single-strand DNA flap displaces an endogenous single-strand DNA having a 5′ end in the target DNA sequence that has been nicked.
  
  176. The method of embodiment 175, wherein the endogenous single-strand DNA having the free 5′ end is excised by the cell.
  
  177. The method of embodiment 175, whereby cellular repair of the single-strand DNA flap results in installation of the nucleotide change, thereby forming an edited DNA product.
  
  178. The method of embodiment 177, wherein the nucleotide change is an insertion.
  
  179. The method of embodiment 178, wherein the insertion is at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, at least 30 nucleotides, at least 31 nucleotides, at least 32 nucleotides, at least 33 nucleotides, at least 34 nucleotides, at least 35 nucleotides, at least 36 nucleotides, at least 37 nucleotides, at least 38 nucleotides, at least 39 nucleotides, at least 40 nucleotides, or at least 100 nucleotides in length.
  
  180. The method of embodiment 178, wherein the insertion is a sequence encoding a polypeptide.

REFERENCES

The following references are each incorporated herein by reference in their entireties.

1. Jinek, M. et al. A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science 337, 816-821 (2012).
2. Cong, L. et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819-823 (2013).
3. Komor, A. C., Badran, A. H. & Liu, D. R. CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes. Cell 168, 20-36 (2017).
4. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016).
5. Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729 (2016).
6. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017).
7. Stenson, P. D. et al. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum. Genet. 136, 665-677 (2017).
8. Dunbar, C. E. et al. Gene therapy comes of age. Science 359, eaan4672 (2018).
9. Cox, D. B. T., Platt, R. J. & Zhang, F. Therapeutic genome editing: prospects and challenges. Nat. Med. 21, 121-131 (2015).
10. Adli, M. The CRISPR tool kit for genome editing and beyond. Nat. Commun. 9, 1911 (2018).
11. Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485 (2015).
12. Kleinstiver, B. P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490-495 (2016).
13. Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57-63 (2018).
14. Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259-1262 (2018).
15. Jasin, M. & Rothstein, R. Repair of strand breaks by homologous recombination. Cold Spring Harb. Perspect. Biol. 5, a012740 (2013).
16. Paquet, D. et al. Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9. Nature 533, 125-129 (2016).
17. Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765-771 (2018).
18. Haapaniemi, E., Botla, S., Persson, J., Schmierer, B. & Taipale, J. CRISPR-Cas9 genome editing induces a p53-mediated DNA damage response. Nat. Med. 24, 927-930 (2018).
19. Ihry, R. J. et al. p53 inhibits CRISPR-Cas9 engineering in human pluripotent stem cells. Nat. Med. 24, 939-946 (2018).
20. Richardson, C. D., Ray, G. J., DeWitt, M. A., Curie, G. L. & Corn, J. E. Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA. Nat. Biotechnol. 34, 339-344 (2016).
21. Srivastava, M. et al. An Inhibitor of Nonhomologous End-Joining Abrogates Double-Strand Break Repair and Impedes Cancer Progression. Cell 151, 1474-1487 (2012).
22. Chu, V. T. et al. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat. Biotechnol. 33, 543-548 (2015).
23. Maruyama, T. et al. Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining. Nat. Biotechnol. 33, 538-542 (2015).
24. Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371-376 (2017).
25. Li, X. et al. Base editing with a Cpf1-cytidine deaminase fusion. Nat. Biotechnol. 36, 324-327 (2018).
26. Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. (2018). doi:10.1038/nbt.4199
27. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 1 (2018). doi:10.1038/s41576-018-0059-1.
28. Ostertag, E. M. & Kazazian Jr, H. H. Biology of Mammalian L1 Retrotransposons. Annu. Rev. Genet. 35, 501-538 (2001).
29. Zimmerly, S., Guo, H., Perlman, P. S. & Lambowitz, A. M. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82, 545-554 (1995).
30. Luan, D. D., Korman, M. H., Jakubczak, J. L. & Eickbush, T. H. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595-605 (1993).
31. Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905-916 (1996).
32. Jinek, M. et al. Structures of Cas9 Endonucleases Reveal RNA-Mediated Conformational Activation. Science 343, 1247997 (2014).
33. Jiang, F. et al. Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science aad8282 (2016). doi:10.1126/science.aad8282
34. Qi, L. S. et al. Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression. Cell 152, 1173-1183 (2013).
35. Tang, W., Hu, J. H. & Liu, D. R. Aptazyme-embedded guide RNAs enable ligand-responsive genome editing and transcriptional activation. Nat. Commun. 8, 15939 (2017).
36. Shechner, D. M., Hacisuleyman, E., Younger, S. T. & Rinn, J. L. Multiplexable, locus-specific targeting of long RNAs with CRISPR-Display. Nat. Methods 12, 664-670 (2015).
37. Anders, C. & Jinek, M. Chapter One—In Vitro Enzymology of Cas9. in Methods in Enzymology (eds. Doudna, J. A. & Sontheimer, E. J.) 546, 1-20 (Academic Press, 2014).
38. Briner, A. E. et al. Guide RNA Functional Modules Direct Cas9 Activity and Orthogonality. Mol. Cell 56, 333-339 (2014).
39. Nowak, C. M., Lawson, S., Zerez, M. & Bleris, L. Guide RNA engineering for versatile Cas9 functionality. Nucleic Acids Res. 44, 9555-9564 (2016).
40. Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62-67 (2014).
41. Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958-970 (2013).
42. Stamos, J. L., Lentzsch, A. M. & Lambowitz, A. M. Structure of a Thermostable Group II Intron Reverse Transcriptase with Template-Primer and Its Functional and Evolutionary Implications. Mol. Cell 68, 926-939.e4 (2017).
43. Zhao, C. & Pyle, A. M. Crystal structures of a group II intron maturase reveal a missing link in spliceosome evolution. Nat. Struct. Mol. Biol. 23, 558-565 (2016).
44. Zhao, C., Liu, F. & Pyle, A. M. An ultraprocessive, accurate reverse transcriptase encoded by a metazoan group II intron. RNA 24, 183-195 (2018).
45. Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281-2308 (2013).
46. Liu, Y., Kao, H.-I. & Bambara, R. A. Flap endonuclease 1: a central component of DNA metabolism. Annu. Rev. Biochem. 73, 589-615 (2004).
47. Krokan, H. E. & Bjørås, M. Base Excision Repair. Cold Spring Harb. Perspect. Biol. 5, (2013).
48. Kelman, Z. PCNA: structure, functions and interactions. Oncogene 14, 629-640 (1997).
49. Choe, K. N. & Moldovan, G.-L. Forging Ahead through Darkness: PCNA, Still the Principal Conductor at the Replication Fork. Mol. Cell 65, 380-392 (2017).
50. Li, X., Li, J., Harrington, J., Lieber, M. R. & Burgers, P. M. Lagging strand DNA synthesis at the eukaryotic replication fork involves binding and stimulation of FEN-1 by proliferating cell nuclear antigen. J. Biol. Chem. 270, 22109-22112 (1995).
51. Tom, S., Henricksen, L. A. & Bambara, R. A. Mechanism whereby proliferating cell nuclear antigen stimulates flap endonuclease 1. J. Biol. Chem. 275, 10498-10505 (2000).
52. Tanenbaum, M. E., Gilbert, L. A., Qi, L. S., Weissman, J. S. & Vale, R. D. A protein-tagging system for signal amplification in gene expression and fluorescence imaging. Cell 159, 635-646 (2014).
53. Bertrand, E. et al. Localization of ASH1 mRNA particles in living yeast. Mol. Cell 2, 437-445 (1998).
54. Dahlman, J. E. et al. Orthogonal gene knockout and activation with a catalytically active Cas9 nuclease. Nat. Biotechnol. 33, 1159-1161 (2015).
55. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187-197 (2015).
56. Tsai, S. Q. et al. CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat. Methods 14, 607-614 (2017).

EQUIVALENTS AND SCOPE

In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all, of the group members are present in, employed in, or otherwise relevant to a given product or process.

Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.

Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.

Provisional Application Number	Filing Date	Country
63100548	Mar 2020	US
62991069	Mar 2020	US
62944231	Dec 2019	US
62974537	Dec 2019	US
62931195	Nov 2019	US
62913553	Oct 2019	US
62973558	Oct 2019	US
62889996	Aug 2019	US
62922654	Aug 2019	US
62858958	Jun 2019	US
62820813	Mar 2019	US
