CONSTRUCTS FOR IMPROVED HDR-DEPENDENT GENOMIC EDITING

REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS-WEB

This application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Nov. 8, 2021, is named B119570062US01-SUBSEQ-GJM and is 107,005 bytes in size.

BACKGROUND OF THE INVENTION

Genome editing has revolutionized the life sciences and offers the potential to cure genetic diseases. Genome editing involves the use of a site-specific nuclease (e.g., CRISPR/Cas9 nucleases, zinc finger nucleases (ZFNs), and transcription activator-like effector-based nucleases (TALENs)) which creates site-specific double-strand breaks (DSBs) at a targeted site in a genome, the location of which is determined by the nuclease itself. ZFNs and TALENs both bind to preferred target DNA sequences through amino acid sequence regions which interact directly with specific DNA sequences. In contrast to ZFNs and TALENs, which rely on protein domains to confer DNA-binding specificity, Cas9 forms a complex with a small guide RNA that directs the enzyme to its DNA target via Watson-Crick base pairing. Consequently, the Cas9 system is simple and requires only the production of a short RNA molecule (the guide RNA) to direct DNA binding to almost any locus.

Cas9 facilitates genome editing by inducing double-strand breaks (DSBs) at its target site, which in turn stimulates endogenous DNA damage repair pathways that lead to DNA editing. In one mechanism of repair, the double-strand break is repaired by homology directed repair (HDR), which requires the presence of an exogenous template DNA that is homologous to the target site. The exogenous template DNA typically includes the desired genetic change (e.g., a single nucleobase pair change) and regions that are homologous to the target site DNA. The HDR machinery results in the integration and exchange of the target site DNA with the exogenous template DNA carrying the corrected or altered genetic element (e.g., a single nucleobase pair change). This recombination and repair by the HDR machinery at double-strand breaks typically occurs with high fidelity, but at very low efficiency since the HDR machinery is active only in dividing cells. Double-strand breaks may also be processed by the non-homologous end-joining (NHEJ) DNA repair system, which functions without a template and frequently produces insertions or deletions (indels) as a consequence of the repair mechanism.

In addition to its low efficiency, HDR-directed repair using Cas9 is associated with undesirable levels of off-target editing at sites which share sequence homology with the on-target site. Nucleases with off-target DSB activity could induce undesirable mutations with potentially deleterious effects, an unacceptable outcome in most clinical settings. The rate of off-target editing can be alleviated by converting Cas9 to a “nickase” so that only one of the strands of DNA of a target site is cut. The wild-type Cas9 enzyme makes use of two conserved nuclease domains, HNH and RuvC, to cleave DNA by nicking the guide RNA-complementary and non-complementary strands, respectively. These two domains function together to generate blunt-ended, double-strand breaks (DSBs) by cleaving opposite strands of double-stranded DNA (dsDNA). A nickase mutant (nCas9) can be produced by inactivating the RuvC or the HNH domain (e.g., via mutation of one or more key catalytic residues), resulting in an enzyme variant that cleaves only one of the strands of DNA at a target site.

Because single-stranded nicks are generally repaired via the non-mutagenic base-excision repair pathway, nCas9 mutants can be leveraged to mediate highly specific genome editing. In one application, tandem nCas9 systems, appropriately spaced and oriented at the same locus, effectively generate DSBs, creating 3′ or 5′ overhangs along the target as opposed to a blunt DSB as in the wild-type case. The on-target modification efficiency of the double-nicking strategy is comparable to wild-type, but indels at predicted off-target sites are reduced below the threshold of detection by deep sequencing (Ran et al., 2013). In another strategy, engineered genome editors produced from nCas9 variants have been recently developed. Genome editors are fusions of a catalytically disabled Cas moiety (e.g., nCas9) and a nucleobase modification enzyme (e.g., natural or evolved nucleobase deaminases, such as cytidine deaminases that include APOBEC1 (“apolipoprotein B mRNA editing enzyme, catalytic polypeptide 1”), CDA (“cytidine deaminase”), or AID (“activation-induced cytidine deaminase”) domains). In some cases, genome editors may also include proteins that alter cellular DNA repair processes to increase the efficiency and stability of the resulting single-nucleotide change, e.g., inclusion of a UGI domain.

Despite this progress, there continues to be a need for the development of “next-generation” Cas9 editing systems that begin to address various limitations of these existing systems, including Cas9 editing systems with different or expanded PAM compatibilities, high-fidelity Cas9 editing systems with reduced off-target activity, Cas9 editing systems with narrower editing windows (normally ~5 nucleotides wide), Cas9 editing systems with loosened sequence-context preferences, and Cas9 editing systems with expanded nucleobase editing capabilities (e.g., transition editors (e.g., a purine to a purine) and transversion editors (e.g., a purine to a pyrimidine)).

Another limitation that needs to be addressed is the low efficiency of homology-directed repair (HDR) by double-strand breaks introduced by the Cas9 system or by the dual nickase approach. For many applications, HDR is desired (because of its high fidelity) but is crippled by the low efficiency of recombination. Thus, various approaches have been developed to boost HDR frequency. For example, small molecules inhibiting NHEJ and/or upregulating HDR pathways have been reported to enhance HDR (Song, J. et al., 2016, Nat. Commun. 7, 10548). In another example, HDR efficiency can be altered by regulating cell cycle progression or by controlled timing of Cas9 delivery (Lin et al., 2014, Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery, Elife 3, e04766; and Gutschner et al., 2016, Post-translational regulation of Cas9 during G1 enhances homology-directed repair. Cell Rep. 14, 1555-1566). While these approaches and others have been reported to increase HDR efficiencies by as much as 10-fold, they suffer both from negative impacts on cell growth and inconsistencies between cell type and targeted loci (Zhang et al., 2017, Efficient precise knockin with a double cut HDR donor after CRISPR/Cas9-mediated double-stranded DNA cleavage, Genome Biol. 18, 35).

Given the high fidelity of HDR in terms of producing a desired genetic alteration, it would be a significant advance in the art to develop a Cas9 system which is capable of altering the sequence of a target DNA (e.g., a genome) using an HDR-dependent approach which has a much higher efficiency, and/or which could operate in non-dividing cells, and/or which minimizes the formation of unwanted indels at target and/or off-target sites.

SUMMARY OF THE INVENTION

The instant specification provides a genome editing system which is capable of editing a target sequence in an HDR-dependent manner (i.e., “HDR-dependent genome editors”) with increased efficiency and/or reduced indel formation. In certain embodiments, the system does not require a dividing cell to operate (e.g., neurons). In particular, the disclosure provides a new fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) (e.g., Cas9) with nickase activity and a single-stranded DNA binding protein which edits a target DNA in an HDR-dependent manner. In certain embodiments, the editing occurs with greater efficiency (e.g., increased rate of induced HDR) and/or with a lower rate or occurrence of indel formation. In certain embodiments, the napDNAbp is a nickase Cas9 enzyme, and the single-stranded DNA binding protein is Rad51 or a protein having the same function or effect as Rad51 (e.g., a Rad51 homolog). In certain other embodiments, the Cas9 nickase is a Cas9-D10A variant, e.g., a D10A mutation in the RuvC1 nuclease domain, relative to the wild type Cas9 sequence—SEQ ID NO: 9. In still other embodiments, the nickase Cas9 enzyme is a Cas9-H840A variant, e.g., a H840A mutation in the HNH nuclease domain, relative to the wild type Cas9 sequence—SEQ ID NO: 9. Given the high fidelity of HDR in terms of producing a desired genetic alteration (e.g., transitions, transversions, insertions, deletions, etc.), it represents a significant advance in the art to develop a Cas9 system which is capable of altering the sequence of a target DNA (e.g., a genome) using an HDR-dependent approach, which has a much higher efficiency and/or which could operate in non-dividing cells, and/or which minimizes the formation of unwanted indels at target and/or off-target sites.

In addition, the instant specification provides for nucleic acid molecules encoding and/or expressing the HDR-dependent genome editors as described herein, as well as expression vectors and constructs for expressing the improved HDR-dependent genome editors described herein, host cells comprising said nucleic acid molecules and expression vectors, and compositions for delivering and/or administering nucleic acid-based embodiments described herein. In addition, the disclosure provides for isolated improved HDR-dependent genome editors, as well as compositions comprising said isolated improved HDR-dependent genome editors as described herein. Still further, the present disclosure provides for methods of making or developing the improved HDR-dependent genome editors, as well as methods of using the improved HDR-dependent genome editors or nucleic acid molecules encoding the improved HDR-dependent genome editors in applications including editing a nucleic acid molecule, e.g., a gene, vector, genome, with improved efficiency, increased HDR induction rate, reduced off-target effects, and/or reduced indel formation, as compared to prior art genome editors. The specification also provides methods for efficiently editing a target nucleic acid molecule, e.g., a single nucleobase of a genome, with an HDR-dependent genome editor described herein (e.g., in the form of an isolated HDR-dependent genome editor as described herein or a vector or construct encoding same) and conducting base editing, in a manner characterized by a higher rate of HDR and/or a lower rate of indel formation, relative to editors known in the art.

Still further, the specification provides therapeutic methods for treating a genetic disease and/or for altering or changing a genetic trait or condition by contacting a target nucleic acid molecule, e.g., a genome, with an HDR-dependent genome editor and conducting base editing to treat the genetic disease and/or change the genetic trait (e.g., eye color).

In various aspects, the specification provides an HDR-dependent genome editor comprising: (i) a nucleic acid programmable DNA binding protein (napDNAbp) (e.g., nCas9), and (ii) a single-stranded DNA binding protein (e.g., Rad51). The nucleic acid programmable DNA binding protein (napDNAbp) can be a nCas9 domain. The napDNAbp can also be a Cpf1, a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a, a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, a circularly permuted Cas9, or an Argonaute (Ago) domain, preferably engineered to have nickase activity (i.e., wherein only a single strand is cut).

In various embodiments, the nCas9 (i.e., “nickase Cas9”) can have an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, 99%, 99.5%, or 99.9% identical to SEQ ID NO: 1.
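
By way of non-limiting illustration only, a percent-identity threshold such as those recited above may be checked with a calculation along the following lines. This is a minimal sketch that assumes the two amino acid sequences have already been aligned by a separate tool and padded with "-" gap characters to equal length. The function name percent_identity, the choice of denominator (all aligned columns in which at least one sequence has a residue), and the toy sequences are assumptions made for illustration and are not taken from the Sequence Listing. Other identity conventions exist and may give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column is a gap in both sequences; ignore it
        compared += 1
        if a == b:
            matches += 1  # a gap never equals a residue, so gaps count as mismatches
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical 10-residue toy peptides differing at one position.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
identity = percent_identity(reference, variant)
print(f"{identity:.1f}% identical; meets 90% threshold: {identity >= 90.0}")
```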

In various embodiments of the HDR-dependent genome editor, the single-stranded DNA binding protein is a wild-type Rad51 protein, or a variant thereof.

In certain embodiments, the single-stranded DNA binding protein (“SSDBP”) is a variant of Rad51 that comprises one or more mutations, such as, K133X, R235X, G151X, or R310X mutations, relative to the wildtype Rad51 polypeptide of SEQ ID NOs: 13-18.

In certain embodiments, the single-stranded DNA binding protein (“SSDBP”) is a variant of Rad51 that comprises one or more mutations, such as, K133R, R235E, G151D, or R310A mutations, relative to the wildtype Rad51 polypeptide of SEQ ID NOs: 13-18.
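
By way of non-limiting illustration only, the substitution notation used above (e.g., K133R, meaning that the lysine at position 133 of the reference polypeptide is replaced with arginine) may be applied to a sequence as sketched below. The function name apply_mutation and the five-residue toy peptide are hypothetical and are not the Rad51 sequences of SEQ ID NOs: 13-18. In this sketch, a target residue of "X" (any amino acid, as in K133X) is rejected because a concrete residue must be chosen before a sequence can be written out.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutation(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` carrying a substitution such as 'K133R'."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if replacement == "X":
        raise ValueError("'X' denotes any amino acid; choose a concrete residue")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    # Positions in the notation are 1-based; Python indices are 0-based.
    return sequence[: position - 1] + replacement + sequence[position:]


# Hypothetical toy peptide, used only to show the indexing.
print(apply_mutation("MKTAY", "K2R"))
```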

In various embodiments, the single-stranded DNA binding protein comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 13-18. In certain aspects, a variant of the single-stranded DNA binding protein (e.g., hRad51) is produced by evolving a starting-point single-stranded DNA binding protein (e.g., hRad51) using a directed evolution methodology. In certain embodiments, the directed evolution methodology comprises phage assisted continuous evolution (PACE).

In various embodiments, the HDR-dependent editor fusion proteins described herein can comprise any of the following structures: NH₂-[napDNAbp]-[SSDBP]-COOH; NH₂-[SSDBP]-[napDNAbp]-COOH; NH₂-[napDNAbp]-[Rad51]-COOH; NH₂-[Rad51]-[napDNAbp]-COOH; NH₂-[nCas9]-[Rad51]-COOH; or NH₂-[Rad51]-[nCas9]-COOH; wherein each instance of “]-[” comprises an optional linker.

In various embodiments, the linkers fusing the napDNAbp and the single-stranded DNA binding protein (e.g., Rad51) can be any suitable amino acid linker sequence, including, for example, any of the following amino acid sequences:

(SEQ ID NO: 2)

SGGSSGGSSGSETPGTSESATPESSGGSSGGS;

(SEQ ID NO: 3)

SGGSGGSGGS;

GGG;

(SEQ ID NO: 4)

GGGS;

(SEQ ID NO: 5)

SGGGS;

(SEQ ID NO: 6)

SGSETPGTSESATPES;

or

(SEQ ID NO: 7)

SGGS.
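
By way of non-limiting illustration only, the N-terminal to C-terminal arrangement of two domains joined by an optional linker may be sketched as a simple string concatenation. The linker sequences below are copied from the list above. The function name assemble_fusion and the placeholder domain strings (written with "X" for unspecified residues) are hypothetical and do not represent the actual nCas9 or Rad51 sequences.

```python
# Linker amino acid sequences as recited above.
LINKERS = {
    "SEQ ID NO: 2": "SGGSSGGSSGSETPGTSESATPESSGGSSGGS",
    "SEQ ID NO: 3": "SGGSGGSGGS",
    "GGG": "GGG",
    "SEQ ID NO: 4": "GGGS",
    "SEQ ID NO: 5": "SGGGS",
    "SEQ ID NO: 6": "SGSETPGTSESATPES",
    "SEQ ID NO: 7": "SGGS",
}


def assemble_fusion(n_terminal_domain: str, c_terminal_domain: str, linker: str = "") -> str:
    """Concatenate NH2-[domain]-[linker]-[domain]-COOH; the linker is optional."""
    if not n_terminal_domain or not c_terminal_domain:
        raise ValueError("both domains are required")
    return n_terminal_domain + linker + c_terminal_domain


# Placeholder domains only (assumption): 'X' stands for unspecified residues.
RAD51_PLACEHOLDER = "MXXXX"
NCAS9_PLACEHOLDER = "XXXXX"

# NH2-[Rad51]-[nCas9]-COOH with the SEQ ID NO: 7 linker, and without any linker.
print(assemble_fusion(RAD51_PLACEHOLDER, NCAS9_PLACEHOLDER, LINKERS["SEQ ID NO: 7"]))
print(assemble_fusion(RAD51_PLACEHOLDER, NCAS9_PLACEHOLDER))
```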

In various embodiments, the disclosure provides nucleic acid molecules encoding any of the HDR-dependent editor fusion proteins, or domains thereof. The nucleic acid sequences may be codon-optimized for expression in a mammalian cell.

In certain embodiments, the specification provides vectors with appropriate promoters for driving expression of the nucleic acid sequences encoding the HDR-dependent editor fusion proteins (or one or more individual components thereof).

In other aspects, the present specification provides a complex comprising the HDR-dependent editor fusion proteins described herein and an RNA bound to the napDNAbp of the fusion protein, such as a guide RNA (gRNA).

In various aspects, the complex may further comprise an exogenous donor nucleotide sequence, e.g., a double stranded molecule of DNA, which comprises a sequence that is homologous to the target site in the DNA being edited, and further comprises the desired genetic alteration (e.g., a desired nucleobase pair change). For example, the exogenous donor nucleotide sequence can be a double-stranded DNA molecule comprising (i) a first region comprising a nucleotide sequence homologous to the target nucleotide sequence to be altered/edited (e.g., a genomic locus), (ii) a second region comprising a desired genetic alteration (e.g., a single nucleobase pair change), and (iii) a third region comprising a nucleotide sequence homologous to the target nucleotide sequence to be altered/edited (e.g., a genomic locus), wherein the exogenous donor sequence has the structure [region (i)-region (ii)-region (iii)]. The exogenous donor molecule can be double-stranded or single-stranded DNA or RNA of any suitable length, including from about 10-1600 nucleotides in length. In addition, in some embodiments, the exogenous donor molecule can be provided separately, or it can be covalently attached to the fusion construct by a linker, thereby increasing the effective concentration of the donor molecule (e.g., as described in Aird et al., “Increasing Cas9-mediated homology-directed repair efficiency through covalent tethering of DNA repair template,” Communications Biology, May 31, 2018, which is incorporated herein by reference).
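
By way of non-limiting illustration only, the donor structure [region (i)-region (ii)-region (iii)] may be sketched as two homology arms flanking the desired alteration. The function name build_donor, the toy arm sequences, and the use of 10 and 1600 nucleotides as default length bounds are assumptions made for this example. The disclosure describes that range as one suitable length range, not as a requirement, and this sketch covers only a DNA donor in which region (ii) is non-empty.

```python
DNA_BASES = set("ACGT")


def build_donor(left_arm: str, alteration: str, right_arm: str,
                min_len: int = 10, max_len: int = 1600) -> str:
    """Join region (i), region (ii), and region (iii) into one donor strand."""
    regions = {"region (i)": left_arm, "region (ii)": alteration, "region (iii)": right_arm}
    for name, seq in regions.items():
        if not seq:
            raise ValueError(f"{name} is empty")
        unexpected = set(seq.upper()) - DNA_BASES
        if unexpected:
            raise ValueError(f"{name} contains non-DNA characters: {sorted(unexpected)}")
    donor = (left_arm + alteration + right_arm).upper()
    # Illustrative bounds only; see the lead-in for why these are assumptions.
    if not min_len <= len(donor) <= max_len:
        raise ValueError(f"donor length {len(donor)} is outside {min_len}-{max_len} nt")
    return donor


# Hypothetical 10-nt homology arms flanking a single-base change.
donor = build_donor("ACGTACGTAC", "T", "GGCATTGCAA")
print(donor, len(donor))
```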

In some embodiments, the target sequence (the sequence to be edited) is a DNA sequence, including a genome. The organism can be a prokaryote or a eukaryote, such as a vertebrate, a mammal, or a human.

In other embodiments, the disclosure provides cells that comprise the herein disclosed HDR-dependent editor fusion proteins, the complexes disclosed herein, the nucleic acid molecules encoding same, or a vector comprising the nucleic acid molecules.

In still other embodiments, the disclosure provides kits comprising nucleic acid constructs comprising: a nucleic acid sequence encoding an HDR-dependent editor fusion protein disclosed herein; and a heterologous promoter that drives expression of the fusion protein. The kits can also comprise an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone.

In other embodiments, the disclosure provides a pharmaceutical composition comprising an HDR-dependent editor fusion protein described herein (or a nucleic acid molecule or vector encoding same), and a pharmaceutically acceptable excipient, and optionally a lipid, such as a cationic lipid. The pharmaceutical compositions can also comprise a polymer.

The disclosure also provides methods of using the compositions described herein, including the HDR-dependent editor fusion proteins, or nucleic acid molecules encoding same, for editing a target nucleotide sequence (e.g., in a genome). In various embodiments, a method is provided for editing a nucleobase pair of a double-stranded DNA sequence, the method comprising:

(i) contacting a double-stranded DNA sequence with a complex comprising an HDR-dependent editor fusion protein, a guide nucleic acid, and a donor nucleotide molecule, wherein the double-stranded DNA comprises a target region (e.g., including one or more nucleobase pairs) to be edited; and

(ii) inducing the endogenous HDR, thereby replacing the target region with the donor nucleotide molecule or a portion thereof. In certain aspects, the method further comprises (iii) cutting (or nicking) one strand of the double-stranded DNA.

The target nucleotide sequence can comprise a target sequence (e.g., a sequence comprising a point mutation) associated with a disease or disorder. The target sequence can encode a protein, and wherein the point mutation is in a codon which results in a change in the amino acid encoded by the mutant codon as compared to a wild-type codon. The target sequence can also be at a splice site, and wherein the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript. In addition, the target can be at a promoter of a gene, and wherein the point mutation results in increased or decreased expression of the gene.

In the methods described herein, contacting an HDR-dependent genome editor with a target nucleotide sequence can occur in vitro or in vivo in a subject. The subject can be someone who has been diagnosed with a disease or disorder.

In certain embodiments, the target is in the genome of an organism. In certain embodiments, the organism is a prokaryote. In certain embodiments, the organism is a eukaryote. In certain embodiments, the organism is a vertebrate. In certain embodiments, the vertebrate is a mammal. In certain embodiments, the mammal is a human.

In one aspect, the specification discloses a cell comprising any one of the presently disclosed improved HDR-dependent genome editors. In certain embodiments, the cell is a dividing cell. In other embodiments, the cell is not dividing.

In one aspect, the specification discloses a cell comprising any one of the presently disclosed nucleic acids.

In one aspect, the specification discloses a cell comprising any one of the presently disclosed vectors.

In one aspect, the specification discloses a cell comprising any one of the presently disclosed complexes.

In one aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed fusion proteins.

In one aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed complexes.

In one aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed nucleic acids.

In one aspect, the specification discloses a method comprising contacting a nucleic acid molecule with any of the presently disclosed complexes. In certain embodiments, the nucleic acid is DNA. In certain embodiments, the nucleic acid is double-stranded DNA. In certain embodiments, the nucleic acid comprises a target sequence associated with a disease or disorder. In certain embodiments, the target sequence comprises a point mutation associated with a disease or disorder.

In certain embodiments, the target sequence comprises a C-to-G point mutation associated with a disease or disorder, and wherein the exchange of the C-to-G nucleobase pair with a T-to-A nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transition edit (pyrimidine <-> pyrimidine).

In certain embodiments, the target sequence comprises a T-to-A point mutation associated with a disease or disorder, and wherein the exchange of the T-to-A nucleobase pair with a C-to-G nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transition edit (pyrimidine <-> pyrimidine).

In certain embodiments, the target sequence comprises an A-to-T point mutation associated with a disease or disorder, and wherein the exchange of the A-to-T nucleobase pair with a G-to-C nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transition edit (purine <-> purine).

In certain embodiments, the target sequence comprises a G-to-C point mutation associated with a disease or disorder, and wherein the exchange of the G-to-C nucleobase pair with an A-to-T nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transition edit (purine <-> purine).

In certain embodiments, the target sequence comprises a C-to-G point mutation associated with a disease or disorder, and wherein the exchange of the C-to-G nucleobase pair with a G-to-C nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transversion edit (pyrimidine <-> purine).

In certain embodiments, the target sequence comprises a G-to-C point mutation associated with a disease or disorder, and wherein the exchange of the G-to-C nucleobase pair with a C-to-G nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transversion edit (purine <-> pyrimidine).

In certain embodiments, the target sequence comprises an A-to-T point mutation associated with a disease or disorder, and wherein the exchange of the A-to-T nucleobase pair with a T-to-A nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transversion edit (purine <-> pyrimidine).

In certain embodiments, the target sequence comprises a T-to-A point mutation associated with a disease or disorder, and wherein the exchange of the T-to-A nucleobase pair with an A-to-T nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transversion edit (pyrimidine <-> purine).

In certain embodiments, the target sequence comprises a G-to-C point mutation associated with a disease or disorder, and wherein the exchange of the G-to-C nucleobase pair with a T-to-A nucleobase pair through HDR-dependent editing using a fusion protein or complex disclosed herein results in a sequence that is not associated with a disease or disorder. This is classified as a transversion edit (purine <-> pyrimidine).
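
By way of non-limiting illustration only, the transition/transversion classification used in the foregoing embodiments follows from whether the original and replacement nucleobases belong to the same chemical class: purine to purine or pyrimidine to pyrimidine is a transition, and any exchange between a purine and a pyrimidine is a transversion. The sketch below applies that rule to the first-named base of each nucleobase pair. The function name classify_substitution is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-nucleobase change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; nothing to classify")
    # Same chemical class on both sides of the change means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# First-named base of the original pair -> first-named base of the replacement pair.
EXCHANGES = [("C", "T"), ("T", "C"), ("A", "G"), ("G", "A"), ("C", "G"),
             ("G", "C"), ("A", "T"), ("T", "A"), ("G", "T")]
for ref, alt in EXCHANGES:
    print(f"{ref} -> {alt}: {classify_substitution(ref, alt)}")
```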

In certain embodiments, the target sequence in which the desired editing is to occur, is sequence agnostic. That is, the genome editors described herein may carry out efficient and accurate editing without requiring a specific sequence context at the target site.

In certain embodiments, the target sequence encodes a protein, and wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutant codon as compared to a wild-type codon. In certain embodiments, the target sequence is at a splice site, and wherein the point mutation results in a change in the splicing of an mRNA transcript as compared to a wild-type transcript. In certain embodiments, the target sequence is at a promoter of a gene, and wherein the point mutation results in an increased expression of the gene. In certain embodiments, the target sequence is at a promoter of a gene, and wherein the point mutation results in a decreased expression of the gene.

In one aspect, the specification discloses a kit comprising a nucleic acid construct, comprising (a) a nucleic acid sequence encoding any one of the presently disclosed fusion proteins; and (b) a heterologous promoter that drives expression of the sequence of (a). In certain embodiments, the kit further comprises an expression construct encoding a guide RNA backbone, wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide RNA backbone.

In one aspect, the specification discloses a pharmaceutical composition comprising any one of the presently disclosed vectors. In certain embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable excipient. In certain embodiments, the pharmaceutical composition further comprises a lipid or protein (e.g., cationic lipids and cationic proteins). In certain embodiments, the lipid is a cationic lipid. In certain embodiments, the pharmaceutical composition further comprises a polymer.

It should be appreciated that the foregoing concepts, and additional concepts discussed below, may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further, other advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.

FIG. 1 provides a schematic of traditional Cas9-mediated nucleobase editing by way of the homology-directed repair pathway which is triggered by double-strand breaks. Step 1 shows the cleavage of a desired strand by Cas9 RNA guided nuclease. Step 2 shows the addition of a desired insert DNA sequence flanked by regions homologous to each side of the cut site. Step 3 shows the action of the endogenous homology-directed repair (HDR) mechanism, which uses homologous regions to rejoin cleaved DNA to result in the creation of the intended modified DNA.

FIG. 2 provides a schematic of an embodiment of the improved nuclease editor construct for homology-directed repair of a target nucleobase. The schematic depicts a generalized process (100) of editing a double-stranded target DNA (101) having an X′:X target nucleobase pair (e.g., a G:C nucleobase pair). The target DNA (101) also is depicted with a PAM sequence on one strand that is approximately 12-17 base pairs from the target base pair X′:X. In this embodiment, the fusion protein comprises a nucleic acid programmable DNA binding protein (napDNAbp) with nickase activity (e.g., a Cas9 nickase domain) (102) that is translationally fused to a single-stranded DNA binding protein (e.g., Rad51) (108). The fusion protein is complexed with a sgRNA (105) that comprises a region that is complementary to and binds a region of the target DNA (101) comprising the target base pair X′:X within the ssDNA bubble formed by the napDNAbp nickase (102). The napDNAbp nickase (102) cleaves a single strand of the target DNA sequence at (104). The nicked DNA induces the homology-directed repair (HDR) (107), and in the presence of a donor double stranded DNA (106) having a donor second nucleobase pair (Y′:Y) (e.g., an A:T nucleobase pair), the X′:X target nucleobase pair (e.g., a G:C nucleobase pair) is replaced by the donor nucleobase pair (Y′:Y) (e.g., an A:T nucleobase pair). Thus, in this example, a G:C nucleobase pair is replaced with an A:T nucleobase pair. As demonstrated in the Examples, the single-stranded DNA binding protein (e.g., Rad51) improves the rate of homology-directed repair as compared to the rate of homology-directed repair obtained with the nickase alone.

FIGS. 3A-3C demonstrate that Cas9 nickases generate unpredictable levels of indels. FIG. 3A is a graph comparing the percent (%) of sequencing reads containing indels using eight different sgRNAs (118, 119, A18, A19, A20, 167, 171, and 184) and four different editing constructs (Cas9, D10A-nCas9, H840A-nCas9, and dead Cas9 (D10A, H840A)). The nickases are the D10A-nCas9 and H840A-nCas9 constructs. They each nick different strands of a target cut site. FIG. 3B is a graph comparing the percent (%) of sequencing reads with indels at one DNA locus using a Cas9 nuclease (high levels of indels), a Cas9 D10A nickase, a Cas9 H840A nickase, a dCas9 (dead Cas9), and a K133R-D10A construct. FIG. 3C shows the target sequence and binding sites for sgRNA210 and sgRNA211 and A18 (HEK2).

FIGS. 4A-4B demonstrate that Cas9 nickases generate a favorable HDR:indel ratio when a donor ssODN (single-stranded oligodeoxynucleotide) is supplied. FIG. 4A shows the rate of homology-directed repair (Y-axis) triggered by various constructs at a range of different target site loci (X-axis). DSB-induced editing generates an excess of indels (i.e., the Cas9 construct). The nickases (D10A and H840A) also trigger HDR but at a much lower rate. The control, dCas9, does not trigger HDR. FIG. 4B shows the rate of indel formation in HDR-oligo-treated cells. The graph shows that the rate of indel formation remains high with the Cas9 construct, but relatively low to non-existent in the nickase constructs (D10A and H840A), similar to the dCas9 control. Thus, the absolute rate of HDR remains low with nickases, but the relative rate of HDR as compared to indel formation is higher with nickases than when a double-stranded DNA break (Cas9) is used to stimulate HDR.
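
By way of non-limiting illustration only, the difference between the absolute rate of HDR and the rate of HDR relative to indel formation may be expressed as a short calculation on sequencing read counts. The function name editing_summary is hypothetical, and the read counts below are invented solely to show the arithmetic. They are not data from FIGS. 4A-4B or from the Examples.

```python
def editing_summary(total_reads: int, hdr_reads: int, indel_reads: int) -> dict:
    """Absolute HDR and indel percentages plus the HDR:indel ratio for one sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if hdr_reads < 0 or indel_reads < 0 or hdr_reads + indel_reads > total_reads:
        raise ValueError("read counts are inconsistent")
    return {
        "hdr_pct": 100.0 * hdr_reads / total_reads,
        "indel_pct": 100.0 * indel_reads / total_reads,
        # With no indel reads the ratio is unbounded; report infinity.
        "hdr_to_indel": hdr_reads / indel_reads if indel_reads else float("inf"),
    }


# Invented counts (assumption): a nuclease-like and a nickase-like sample.
nuclease_like = editing_summary(total_reads=10_000, hdr_reads=500, indel_reads=2_500)
nickase_like = editing_summary(total_reads=10_000, hdr_reads=150, indel_reads=50)
print(nuclease_like)
print(nickase_like)
```

In this invented example, the nickase-like sample has the lower absolute HDR percentage but the higher HDR:indel ratio, which is the pattern the paragraph above describes in qualitative terms.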

FIG. 5 demonstrates that fusion of hRad51 (human Rad51) to D10A nickase increases the rate of HDR. To be especially useful, the absolute rate of HDR must be increased. N-terminal fusion of hRad51 to a nickase, or mutants thereof, increases the rate of nickase-induced HDR. In general, the absolute rate of HDR with hRad51-D10A fusions exceeds the rate with a Cas9 DSB.

FIG. 6 demonstrates that indel rates increase slightly but remain low with hRad51-D10A fusions as compared to nickase alone. The rate of indel formation is approximately the same among several hRad51 mutants.

FIGS. 7A-7B demonstrate that hRad51 when fused to Cas9 does not have a significant effect on (FIG. 7A) the rate of HDR or (FIG. 7B) the rate of indel formation relative to Cas9 alone.

FIGS. 8A-8B demonstrate that hRad51 fused to the H840A nickase has a negligible effect on (FIG. 8A) the rate of HDR and (FIG. 8B) the rate of indel formation.

FIG. 9 demonstrates that alternate single-strand DNA binding proteins (SSB) or proteins involved in HDR (e.g., ExoI or BCCIP) did not improve the rate of HDR.

FIG. 10 demonstrates nick-induced indels. Unexpectedly, changing the strand nicked by Cas9 (with D10A or H840A mutations) led to site-dependent differences in indel generation. Previous reports, overwhelmingly in GFP reporter assays, have focused on D10A nickase and reported that D10A is more efficient for indel generation than H840A.

FIG. 11 demonstrates nick-induced HDR. Rates of HDR when Cas9/nickases were transfected along with an ssODN designed according to ‘CORRECT’ principles were investigated. Again, a site-dependence was observed as to which nickase is more efficient. Nickase-induced HDR has a lower mean efficiency than DSB-mediated HDR. However, for the ‘more efficient nickase’ the difference is small (often not statistically significant).

FIG. 12 demonstrates nick-induced genome editing. Combining the previous two insights, if a Cas9 nickase can mediate indel formation at a particular locus, combining the nickase with an HDR template leads to relatively efficient HDR, without the excess of indels observed with Cas9 nuclease editing. This data inspired the employment of nick-directed HDR to address three issues with traditional double-strand break (DSB) induced HDR: (1) HDR is accompanied by an excess of undesired indel by-products, (2) HDR is low in efficiency, even in dividing cells, and (3) DSB generation is toxic and can lead to activation of cellular DSB response pathways. However, using this simple nick-induced editing strategy, the absolute rate of nick-induced HDR remains low.

FIG. 13 demonstrates new editing constructs for improving nick-induced HDR. At these 5 test loci, using both hRad51-nickase fusions and hRad51(K133R)-nickase fusions enhanced rates of HDR. HEK site 2 is intriguing; it is the only locus where H840A nickase is significantly more efficient at generating indels or HDR. However, even for HEK2, the most effective fusion construct is the hRad51[K133R]-Cas9[D10A nickase].

FIG. 14 demonstrates curiosities of HEK site 2. The high efficiency of hRad51[K133R]-Cas9[D10A nickase] at this locus (where D10A is not effective) is particularly encouraging for pursuing this construct further.

FIG. 15 demonstrates challenging loci. One advantage of HDR is that there is reduced restriction on the target site choice, so pathogenically relevant mutations can be generated from the outset. Three genes were investigated with 2 sgRNAs for each; the nickase strategy failed for PAH and SERPA1 but the hRad51 fusions enabled HDR at LDLR.

FIG. 16 demonstrates the titration experiment. This oligo:plasmid titration experiment was conducted under optimized conditions to ascertain the sensitivity of the system to fluctuations in oligo and plasmid amount. It appears that there is not a great deal of difference between 100-200 ng of donor ssODN and 200-800 ng plasmid (total plasmid) with 1.4 μL of Lipofectamine 2000; that is, the system is not exquisitely sensitive to fluctuations in plasmid/donor ssODN amount.

FIGS. 17A-17F show indel formation and HDR in HEK293T cells mediated by Cas9 or Cas9 nickases. FIG. 17A shows a DSB-mediated HDR using Cas9 and a 100-mer ssODN. FIG. 17B shows the DNA nicks resulting from a Cas9(D10A) or Cas9(H840A) nickase. FIG. 17C is a graph of the % HTS reads with indels resulting from Cas9 nuclease, Cas9 nickase, or dead Cas9 at eight loci in HEK293T cells. FIG. 17D is a graph showing a comparison of indel frequencies associated with three sgRNAs in close proximity. The sgRNA sequences used are shown, with arrows marked with a ‘*’ indicating nicks induced by Cas9(D10A) nickase, and unmarked arrows showing nicks by Cas9(H840A) nickase. FIG. 17E is a graph showing HDR frequencies measured by high-throughput DNA sequencing of unsorted HEK293T cells at eight endogenous genomic loci. FIG. 17F is a graph showing the HDR:indel ratio associated with editing at eight loci. All data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days.

FIGS. 18A-18E show the manipulation of HDR frequencies by global manipulation of cellular repair proteins. FIG. 18A is an overview of the experimental procedure. FIGS. 18B and 18D show HDR frequencies, measured by high-throughput DNA sequencing of unsorted HEK293T cells at eight endogenous genomic loci. FIGS. 18C and 18E show the HDR:indel ratio at eight loci. FIGS. 18B and 18C show data associated with treatment with Cas9(D10A) nickase, and FIGS. 18D and 18E show data associated with treatment with Cas9 nuclease. All data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days. A two-tailed t-test was used to determine statistical significance between the indicated sample and Cas9(D10A) alone (as shown in FIG. 18B) or Cas9 alone (as shown in FIG. 18D). (*): 0.01<p<0.05; (**): 0.001<p<0.01.
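
By way of non-limiting illustration only, the summary statistics and significance notation used in this legend may be sketched in Python as follows. The function names are hypothetical, the use of the sample (n−1) standard deviation is an assumption, and the p-value is assumed to have been computed separately by a two-tailed t-test. Values outside the two ranges stated in the legend are left unlabeled.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample standard deviation) for replicate measurements.

    Assumption: s.d. is the sample standard deviation (n - 1 denominator).
    """
    return mean(replicates), stdev(replicates)


def significance_label(p_value):
    """Map a p-value to the star notation given in the legend.

    (*):  0.01 < p < 0.05
    (**): 0.001 < p < 0.01
    Any other value returns an empty string, since the legend does not
    define a label for it.
    """
    if 0.001 < p_value < 0.01:
        return "**"
    if 0.01 < p_value < 0.05:
        return "*"
    return ""
```

For example, `summarize([1.0, 2.0, 3.0])` returns a mean of 2.0 and a standard deviation of 1.0 for n=3 replicates.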

FIGS. 19A-19G show the HDR frequencies associated with fusion constructs between hRad51 and its mutants, and Cas9 or Cas9 nickases. FIG. 19A shows catalytic activity and protein-protein binding interactions associated with hRad51, mutants of hRad51 and the homologous protein recA. ‘+’ indicates activity has been validated; ‘−’ indicates the absence of activity has been validated; ‘?’ indicates activity is unknown; ‘(+)’ indicates activity has not been explicitly validated but is expected from structural data; and ‘++’ indicates improved activity relative to wild type. FIGS. 19B-19D are dot plots depicting the average HDR frequencies and the average HDR:indel ratio associated with the indicated construct measured by high-throughput sequencing in unsorted HEK293T cells at eight loci. FIG. 19B is a comparison of fusion constructs between Cas9(D10A) and hRad51(K133R) with different fusion architectures. FIG. 19C is a comparison between catalytic mutants of hRad51 bound to the N-terminus of Cas9(D10A). FIG. 19D is a comparison between binding mutants of hRad51 bound to the N-terminus of Cas9(D10A). FIG. 19E shows the HDR frequencies associated with hRad51 and the mutants depicted in FIG. 19D, plotted by genomic locus. FIG. 19F shows the HDR:indel ratio associated with editing at eight loci. For FIGS. 19E and 19F, data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days. FIG. 19G is a model of possible editing outcomes from hRad51-Cas9(D10A) nickase fusions.

FIGS. 20A-20B are graphs offering a characterization of positional dependence and off-target editing of nick-mediated HDR. FIG. 20A shows the HDR frequencies measured by high-throughput sequencing in unsorted HEK293T cells using ssODNs with point mutations distributed along the sgRNA protospacer sequence of the HEK 3 sgRNA site. In previous figures, an oligonucleotide with a different PAM-blocking mutation at HEK Site 3 was used to measure an SNP incorporated at position 12 in the protospacer. FIG. 20B shows the indel frequencies at off-target genomic loci in cells treated with Cas9 nuclease, Cas9(D10A) nickase, or Cas9(D10A) fusions with hRad51 or the indicated mutants thereof. Dead Cas9 (dCas9) treated cells were included as a negative control. All data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days.

FIGS. 21A-21J show hRad51-Cas9(D10A) nickase activity in K562, U2OS, HeLa and hiPS cells. FIGS. 21A, 21C, 21E, and 21G show HDR frequencies measured by high-throughput sequencing in unsorted, nucleofected cells at three loci. FIGS. 21B, 21D, 21F, and 21H show the HDR:indel ratios associated with editing at the same three loci. FIGS. 21I and 21J show the HDR frequency and HDR:indel ratios in iPS cells nucleofected with P2A-GFP tagged constructs and sorted for GFP-positive cells. All data are shown as individual data points and mean±s.d. for n=3 biological replicates, performed independently. An HDR:indel ratio was not reported if the HDR frequency was <0.1%.
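
As a non-limiting sketch, the reporting rule stated in this legend (no HDR:indel ratio when the HDR frequency is below 0.1%) may be expressed in Python as follows. The function name and parameter names are hypothetical, and returning infinity when no indels are detected is an assumption made only so that the example is complete; the legend does not state how that case was handled.

```python
def hdr_indel_ratio(hdr_percent, indel_percent, min_hdr_percent=0.1):
    """Return the HDR:indel ratio, or None when it should not be reported.

    hdr_percent and indel_percent are percentages of sequencing reads.
    The ratio is withheld (None) when the HDR frequency is below
    min_hdr_percent, following the 0.1% cutoff in the legend.
    """
    if hdr_percent < min_hdr_percent:
        return None
    if indel_percent == 0:
        # Assumption: treat "no indels detected" as an unbounded ratio.
        return float("inf")
    return hdr_percent / indel_percent
```

For instance, an HDR frequency of 6% with an indel frequency of 2% gives a ratio of 3.0, whereas an HDR frequency of 0.05% yields no reported ratio.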

FIGS. 22A-22B show the frequency of nick-induced indels in HeLa and U2OS cells. Cells were lipofected with Cas9, D10A, H840A nickase or dCas9 plasmid and a plasmid expressing the indicated sgRNA. DNA was harvested and sequenced from unsorted cells and subjected to HTS. FIG. 22A shows indel frequencies in HeLa cells. FIG. 22B shows indel frequencies in U2OS cells. All data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days.

FIGS. 23A-23B show correlations between HDR and indel frequencies and between indel frequencies and microhomology with Cas9 nuclease and Cas9 nickases. FIG. 23A shows indel frequencies in the absence of an ssODN plotted against HDR frequency. These data are also represented in FIGS. 17C and 17E. FIG. 23B shows indel frequencies correlated to the microhomology score predicted by inDelphi⁷⁰ for each of the eight loci shown in FIGS. 17C and 17D. For both FIGS. 23A and 23B, p-values were calculated in Prism. For FIG. 23A, p-values represent a linear regression analysis to determine whether the slope is significantly non-zero. For FIG. 23B, p-values represent a two-tailed test to determine whether the MH score is significantly correlated to the indel frequency. Data shown are the mean±s.d. for n=3 independent biological replicates in HEK293T cells, performed on different days.

FIGS. 24A-24D show titrations of plasmids and ssODN quantities for lipofection-mediated transfections. HDR and indel frequencies are shown for the indicated quantities of plasmid or ssODN, targeted to the indicated genomic locus. FIGS. 24A and 24C show HDR and indel frequencies associated with D10A nickase; FIGS. 24B and 24D show HDR and indel frequencies associated with the hRad51(K133R)-D10A fusion. 1.4 μL Lipofectamine 2000 was used for all conditions. All data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days.

FIG. 25 shows an assessment of the effect of ssODN sense on HDR frequencies in HEK293T cells. Related to FIG. 20A. The single-stranded oligonucleotide donor (ssODN) sense (forward or reverse) was varied in the context of introducing single point mutations at different locations at the HEK 3 locus. Forward ssODN indicates that the ssODN donor is in the same sense as the sgRNA, and reverse ssODN indicates that the ssODN donor is in the reverse sense relative to the sgRNA (see Table 2). Data are shown as mean±s.d. for n=3 independent biological replicates, performed on different days.
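
By way of illustration only, the relationship between a forward and a reverse ssODN carrying the same point mutation may be sketched in Python as follows. The function names, the 0-based position convention, and the short example sequence are hypothetical and are not taken from Table 2; the sketch merely shows that the reverse donor is the reverse complement of the forward (sgRNA-sense) donor.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_donor(template, position, new_base, sense="forward"):
    """Build a donor sequence carrying a single point mutation.

    template: sequence written in the same sense as the sgRNA.
    position: 0-based index of the base to change (an assumed convention).
    sense:    "forward" keeps the sgRNA sense; "reverse" returns the
              reverse complement of the edited sequence.
    """
    if sense not in ("forward", "reverse"):
        raise ValueError("sense must be 'forward' or 'reverse'")
    if template[position] == new_base:
        raise ValueError("new_base must differ from the original base")
    edited = template[:position] + new_base + template[position + 1:]
    return edited if sense == "forward" else reverse_complement(edited)
```

With the toy template `"ACGTT"`, changing index 1 to G gives `"AGGTT"` as the forward donor and `"AACCT"` as the reverse donor.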

FIGS. 26A-26J show site-by-site plots of HDR frequencies and HDR:indel ratios in HEK293T cells, as described in FIGS. 19A-19G. FIGS. 26A, 26C, 26E, 26G, and 26I show site-by-site plots of HDR frequencies. FIGS. 26B, 26D, 26F, 26H, and 26J show site-by-site plots of HDR:indel ratios. These data are processed and plotted in FIGS. 19B and 19C. Data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days. Note that RDN specifically refers to the construct containing a hRad51 monomer N-terminally fused to the Cas9(D10A) nickase (i.e. hRad51-Cas9(D10A)).

FIGS. 27A-27B show indel formation and base editing in HEK293T cells at the same genomic loci as shown in FIGS. 17A-17F. FIG. 27A shows indel frequencies associated with genome editors and D10A nickase. FIG. 27B shows base editing rates associated with genome editors. All data are shown as individual data points and mean±s.d. for n=3 independent biological replicates, performed on different days.

FIGS. 28A-28B show comparisons of apparent HDR frequencies with and without magnetic bead-based purification of genomic DNA. FIG. 28A shows results from HEK293T cells that were lipofected with a plasmid encoding dCas9, a plasmid encoding the indicated sgRNA, and 50 ng of a homologous 100-mer ssODN. Cells were lysed 4 days after treatment and crude cell lysate was saved before genomic DNA purification was performed with DNAdvance beads. The purified and unpurified genomic DNA samples were amplified by PCR and subjected to HTS. FIG. 28B shows artifactual HDR frequencies recorded from addition of 100-mer ssODN to genomic DNA isolated from untreated HEK293T cells. The indicated ssODN was added to 600 ng genomic DNA and the resulting mixture subjected to PCR and HTS as described in the Methods (“unpurified samples”). A sample of each ssODN and genomic DNA mixture was purified using Agencourt DNAdvance magnetic beads as described in the Methods (“purified samples”) to assess the extent to which bead-based purification can separate genomic DNA from ssODN donor.

FIG. 29 shows gating examples for flow sorting human iPSC cells (hiPSC). Related to FIGS. 21I and 21J. Examples of flow sorting gates for single cells and for GFP+ cells are shown.

DEFINITIONS

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

a/an/the

As used herein and in the claims, the singular forms “a,” “an,” and “the” include the singular and the plural reference unless the context clearly indicates otherwise. Thus, for example, a reference to “an agent” includes a single agent and a plurality of such agents.

Cas9

The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A “Cas9 domain” as used herein, is a protein fragment comprising an active or inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full length Cas9 protein. A Cas9 nuclease is also referred to sometimes as a casn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self.
Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.

A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell. 28; 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science. 337:816-821(2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 9). 
In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 9). In some embodiments, the Cas9 variant comprises a fragment of SEQ ID NO: 9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 9). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 9).
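
As a non-limiting illustration, the percent identity and the fractional length referred to above may be computed along the following lines. This Python sketch assumes that the two sequences have already been aligned to equal length with “-” as the gap character, and that identity is counted over all aligned columns; other conventions for percent identity exist, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Assumptions: both strings have equal length, '-' marks a gap, and
    identity is the share of aligned columns (excluding columns that are
    gaps in both sequences) in which the residues match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def percent_of_length(fragment, full_length):
    """Length of an ungapped fragment as a percentage of the full length."""
    return 100.0 * len(fragment) / len(full_length)
```

For example, the toy aligned sequences `"MDKK"` and `"MDKR"` differ at one of four positions and are therefore 75% identical under this convention.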

CRISPR

CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by viruses that have invaded the prokaryote. The snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively compose, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. CRISPR biology, as well as Cas9 nuclease sequences and structures, are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.

In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular nucleic acid target complementary to the RNA. Specifically, the target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species, the guide RNA.

In general, a “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.

Effective Amount

The term “effective amount,” as used herein, refers to an amount of a biologically active agent that is sufficient to elicit a desired biological response. For example, in some embodiments, an effective amount of an HDR-dependent genome editor may refer to the amount of the editor that is sufficient to edit a target site nucleotide sequence, e.g., a genome. In some embodiments, an effective amount of an HDR-dependent genome editor provided herein, e.g., of a fusion protein comprising a nickase Cas9 domain and a Rad51 domain, may refer to the amount of the fusion protein that is sufficient to induce editing of a target site specifically bound and edited by the fusion protein. As will be appreciated by the skilled artisan, the effective amount of an agent, e.g., a fusion protein, a nuclease, a hybrid protein, a protein dimer, a complex of a protein (or protein dimer) and a polynucleotide, or a polynucleotide, may vary depending on various factors, for example, on the desired biological response, e.g., on the specific allele, genome, or target site to be edited, on the cell or tissue being targeted, and on the agent being used.

Effector Domain

The term “effector domain,” as used herein, or the equivalent terms “nucleobase modification moiety” or “nucleic acid effector domain,” embrace any protein, enzyme, or polypeptide (or functional fragment thereof) which is capable of modifying a DNA or RNA molecule. Nucleobase modification moieties can be naturally occurring, or can be recombinant. For example, a nucleobase modification moiety can include one or more DNA repair enzymes, for example, an enzyme or protein involved in base excision repair (BER), nucleotide excision repair (NER), homology-dependent recombinational repair (HR), non-homologous end-joining repair (NHEJ), microhomology-mediated end-joining repair (MMEJ), mismatch repair (MMR), direct reversal repair, or other known DNA repair pathway. A nucleobase modification moiety can have one or more types of enzymatic activities, including, but not limited to, endonuclease activity, polymerase activity, ligase activity, replication activity, and proofreading activity. The “nucleic acid effector domain” (e.g., a DNA effector domain or an RNA effector domain) as used herein may also refer to a protein or enzyme capable of making one or more modifications to a nucleic acid (e.g., DNA or RNA). Exemplary nucleic acid editing domains include, but are not limited to, a nuclease, a nickase, a recombinase, a methyltransferase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain.

Gene of Interest

The term “gene of interest,” as used herein, refers to a nucleic acid construct comprising a nucleotide sequence encoding a gene product of interest, for example, a gene product (e.g., a genome editor or component/domain thereof) to be evolved in a continuous evolution process as provided herein. The term includes any variations of a gene of interest that are the result of a continuous evolution process according to methods provided herein. For example, in some embodiments, a gene of interest is a nucleic acid construct comprising a nucleotide sequence encoding a protein to be evolved, cloned into a viral vector, for example, a phage genome, so that the expression of the encoding sequence is under the control of one or more promoters in the viral genome. In other embodiments, a gene of interest is a nucleic acid construct comprising a nucleotide sequence encoding a protein to be evolved and a promoter operably linked to the encoding sequence. When cloned into a viral vector, for example, a phage genome, the expression of the encoding sequence of such genes of interest is under the control of the heterologous promoter and, in some embodiments, may also be influenced by one or more promoters comprised in the viral genome.

Gene Function

The term “function of a gene of interest,” as interchangeably used with the term “gene function” or “activity of a gene of interest,” refers to a function or activity of a gene product, for example, a nucleic acid, or a protein, encoded by the gene of interest. For example, a function of a gene of interest may be an enzymatic activity (e.g., an enzymatic activity resulting in the generation of a reaction product, phosphorylation activity, phosphatase activity, etc.), an ability to activate transcription (e.g., transcriptional activation activity targeted to a specific promoter sequence), a bond-forming activity (e.g., an enzymatic activity resulting in the formation of a covalent bond), or a binding activity (e.g., a protein, DNA, or RNA binding activity).

Functional Equivalent

The term “functional equivalent” refers to a second biomolecule that is equivalent in function, but not necessarily equivalent in structure to a first biomolecule. For example, a “Cas9 equivalent” refers to a protein that has the same or substantially the same functions as Cas9, but not necessarily the same amino acid sequence. In the context of the disclosure, the specification refers throughout to “a protein X, or a functional equivalent thereof.” In this context, a “functional equivalent” of protein X embraces any homolog, paralog, fragment, naturally occurring, engineered, mutated, or synthetic version of protein X which bears an equivalent function.

Fusion Protein

The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. Another example includes a Cas9 or equivalent thereof fused to a reverse transcriptase. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

HDR-Dependent Genome Editing

As used herein, the term “HDR-dependent genome editing” (wherein the acronym HDR refers to homology-directed repair) refers to a mode of editing performed by the fusion constructs described herein. In particular, the fusion constructs (e.g., nCas9 (nickase) fused to a single-stranded DNA binding protein, such as Rad51) are capable of nicking a target DNA sequence (i.e., on one strand only, rather than creating a double-strand break), which, in the presence of Rad51 and a donor nucleotide sequence (e.g., a donor double-stranded DNA molecule), is capable of inducing homology-directed repair to replace or otherwise exchange the target DNA sequence (i.e., the sequence being edited) with the donor sequence (i.e., carrying the desired edited genetic alteration, e.g., a new nucleobase pair). Since many genetic diseases arise from point mutations, this technology has important implications in the study of human health and disease. This process is distinct from pure “base editing,” as defined above, as the source of the altered change in the target sequence (e.g., a new nucleobase pair) is a donor sequence that comprises the desired genetic alteration, and which is then incorporated into the target sequence through homology-directed repair.

Guide RNA (“gRNA”)

As used herein, the term “guide RNA” is a particular type of guide nucleic acid which is most commonly associated with a Cas protein of a CRISPR-Cas9 system and which associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the protospacer sequence of the guide RNA. However, this term also embraces the equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas9 equivalent to localize to a specific target nucleotide sequence. The Cas9 equivalents may include other napDNAbp from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system) and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing appropriate guide RNA sequences are provided herein.

Guide RNAs may comprise various structural elements that include, but are not limited to:

Spacer sequence—the sequence in the guide RNA (having about 20 nts in length) which binds to the protospacer in the target DNA.

gRNA core (or gRNA scaffold or backbone sequence)—refers to the sequence within the gRNA that is responsible for Cas9 binding; it does not include the approximately 20-nt spacer/targeting sequence that is used to guide Cas9 to target DNA.
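The relationship between the two structural elements above can be illustrated with a minimal Python sketch. It assumes the layout commonly used for Cas9 single-guide RNAs, in which the spacer sits 5′ of the gRNA core; other napDNAbp (e.g., Cpf1) may use a different arrangement. The function names, the example spacer, and the truncated scaffold string are hypothetical placeholders and are not sequences from this disclosure.

```python
# Illustrative sketch only: all names and sequences here are hypothetical.

def dna_to_rna(seq: str) -> str:
    """Write a DNA-alphabet string in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def assemble_guide_rna(spacer: str, scaffold: str, spacer_len: int = 20) -> str:
    """Return spacer + gRNA core, written 5' to 3' in the RNA alphabet.

    Assumes a Cas9-style single guide RNA with the spacer at the 5' end.
    """
    spacer = dna_to_rna(spacer)
    scaffold = dna_to_rna(scaffold)
    if len(spacer) != spacer_len:
        raise ValueError(f"expected a {spacer_len}-nt spacer, got {len(spacer)} nt")
    if set(spacer + scaffold) - set("ACGU"):
        raise ValueError("sequences may contain only A, C, G, U (or T)")
    return spacer + scaffold


# Placeholder inputs, not sequences from the disclosure.
example_spacer = "GACGTTACCGGATCAAGCTT"   # 20 nt, given in the DNA alphabet
example_scaffold = "GUUUUAGAGCUA"         # truncated stand-in for a gRNA core
guide = assemble_guide_rna(example_spacer, example_scaffold)
```

Keeping the spacer and the core as separate inputs mirrors the definitions above: retargeting the guide changes only the spacer, while the Cas9-binding core is reused unchanged.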

Inteins

As used herein, the term “intein” refers to auto-processing polypeptide domains found in organisms from all domains of life. An intein (intervening protein) carries out a unique auto-processing event known as protein splicing in which it excises itself out from a larger precursor polypeptide through the cleavage of two peptide bonds and, in the process, ligates the flanking extein (external protein) sequences through the formation of a new peptide bond. This rearrangement occurs post-translationally (or possibly co-translationally), as intein genes are found embedded in frame within other protein-coding genes. Furthermore, intein-mediated protein splicing is spontaneous; it requires no external factor or energy source, only the folding of the intein domain. This process is also known as cis-protein splicing, as opposed to the natural process of trans-protein splicing with “split inteins.” Inteins are the protein equivalent of the self-splicing RNA introns (see Perler et al., Nucleic Acids Res. 22:1125-1127 (1994)), which catalyze their own excision from a precursor protein with the concomitant fusion of the flanking protein sequences, known as exteins (reviewed in Perler et al., Curr. Opin. Chem. Biol. 1:292-299 (1997); Perler, F. B. Cell 92(1):1-4 (1998); Xu et al., EMBO J. 15(19):5146-5153 (1996)).

The elucidation of the mechanism of protein splicing has led to a number of intein-based applications (Comb, et al., U.S. Pat. No. 5,496,714; Comb, et al., U.S. Pat. No. 5,834,247; Camarero and Muir, J. Amer. Chem. Soc., 121:5597-5598 (1999); Chong, et al., Gene, 192:271-281 (1997), Chong, et al., Nucleic Acids Res., 26:5109-5115 (1998); Chong, et al., J. Biol. Chem., 273:10567-10577 (1998); Cotton, et al. J. Am. Chem. Soc., 121:1100-1101 (1999); Evans, et al., J. Biol. Chem., 274:18359-18363 (1999); Evans, et al., J. Biol. Chem., 274:3923-3926 (1999); Evans, et al., Protein Sci., 7:2256-2264 (1998); Evans, et al., J. Biol. Chem., 275:9091-9094 (2000); Iwai and Pluckthun, FEBS Lett. 459:166-172 (1999); Mathys, et al., Gene, 231:1-13 (1999); Mills, et al., Proc. Natl. Acad. Sci. USA 95:3543-3548 (1998); Muir, et al., Proc. Natl. Acad. Sci. USA 95:6705-6710 (1998); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999); Severinov and Muir, J. Biol. Chem., 273:16205-16209 (1998); Shingledecker, et al., Gene, 207:187-195 (1998); Southworth, et al., EMBO J. 17:918-926 (1998); Southworth, et al., Biotechniques, 27:110-120 (1999); Wood, et al., Nat. Biotechnol., 17:889-892 (1999); Wu, et al., Proc. Natl. Acad. Sci. USA 95:9226-9231 (1998a); Wu, et al., Biochim Biophys Acta 1387:422-432 (1998b); Xu, et al., Proc. Natl. Acad. Sci. USA 96:388-393 (1999); Yamazaki, et al., J. Am. Chem. Soc., 120:5591-5592 (1998)). Each reference is incorporated herein by reference.

Although inteins are most frequently found as a contiguous domain, some exist in a naturally split form. In this case, the two fragments are expressed as separate polypeptides and must associate before splicing takes place, so-called protein trans-splicing.

An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely, DnaE-N and DnaE-C. The two different subunits are encoded by separate genes, namely dnaE-n and dnaE-c, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C.

Additional naturally occurring or engineered split-intein sequences are known in the art or can be made from whole-intein sequences described herein or those available in the art. Examples of split-intein sequences can be found in Stevens et al., “A promiscuous split intein with expanded protein engineering applications,” PNAS, 2017, Vol. 114: 8538-8543; Iwai et al., “Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme,” FEBS Lett, 580: 1853-1858, each of which is incorporated herein by reference. Additional split intein sequences can be found, for example, in WO 2013/045632, WO 2014/055782, WO 2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference.

In addition, protein splicing in trans has been described in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998), Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b), Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998), Evans, et al., J. Biol. Chem. 275:9091 (2000); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999)) and provides the opportunity to express a protein as two inactive fragments that subsequently undergo ligation to form a functional product, e.g., as shown in FIGS. 66 and 67 with regard to the formation of a complete PE fusion protein from two separately-expressed halves.
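As a purely conceptual aid, trans-splicing of two separately expressed halves can be modeled as a string operation in Python. This toy sketch tracks only which segments are retained in the product and does not model intein chemistry, folding, or kinetics; the class and function names, the intein labels used as tags, and the short extein strings are hypothetical.

```python
# Toy model only: names, tags, and extein strings are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    extein: str   # portion of the protein of interest carried by this half
    intein: str   # label of the split-intein subunit fused to it


def trans_splice(n_half: Fragment, c_half: Fragment,
                 intein_n: str = "DnaE-N", intein_c: str = "DnaE-C") -> str:
    """Join the two exteins if the halves carry a matching split-intein pair.

    The N-terminal half is modeled as extein followed by the N-subunit,
    and the C-terminal half as the C-subunit followed by extein. Both
    intein subunits are excised and are absent from the product.
    """
    if n_half.intein != intein_n or c_half.intein != intein_c:
        raise ValueError("fragments do not carry a matching split-intein pair")
    return n_half.extein + c_half.extein


n_half = Fragment(extein="MKTAYIAK", intein="DnaE-N")
c_half = Fragment(extein="QRQISFVK", intein="DnaE-C")
product = trans_splice(n_half, c_half)
```

In this model neither half alone corresponds to the full-length product, which reflects the point made above that a protein can be expressed as two inactive fragments and reconstituted only after ligation.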

Ligand-Dependent Intein

The term “ligand-dependent intein,” as used herein refers to an intein that comprises a ligand-binding domain. Typically, the ligand-binding domain is inserted into the amino acid sequence of the intein, resulting in a structure intein (N)—ligand-binding domain—intein (C). Typically, ligand-dependent inteins exhibit no or only minimal protein splicing activity in the absence of an appropriate ligand, and a marked increase of protein splicing activity in the presence of the ligand. In some embodiments, the ligand-dependent intein does not exhibit observable splicing activity in the absence of ligand but does exhibit splicing activity in the presence of the ligand. In some embodiments, the ligand-dependent intein exhibits an observable protein splicing activity in the absence of the ligand, and a protein splicing activity in the presence of an appropriate ligand that is at least 5 times, at least 10 times, at least 50 times, at least 100 times, at least 150 times, at least 200 times, at least 250 times, at least 500 times, at least 1000 times, at least 1500 times, at least 2000 times, at least 2500 times, at least 5000 times, at least 10000 times, at least 20000 times, at least 25000 times, at least 50000 times, at least 100000 times, at least 500000 times, or at least 1000000 times greater than the activity observed in the absence of the ligand. In some embodiments, the increase in activity is dose dependent over at least 1 order of magnitude, at least 2 orders of magnitude, at least 3 orders of magnitude, at least 4 orders of magnitude, or at least 5 orders of magnitude, allowing for fine-tuning of intein activity by adjusting the concentration of the ligand. Suitable ligand-dependent inteins are known in the art, and include those provided below and those described in published U.S. Patent Application U.S. 2014/0065711 A1; Mootz et al., “Protein splicing triggered by a small molecule.” J. Am. Chem. Soc. 2002; 124, 9044-9045; Mootz et al., “Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo.” J. Am. Chem. Soc. 2003; 125, 10561-10569; Buskirk et al., Proc. Natl. Acad. Sci. USA. 2004; 101, 10505-10510; Skretas & Wood, “Regulation of protein activity with small-molecule-controlled inteins.” Protein Sci. 2005; 14, 523-532; Schwartz, et al., “Post-translational enzyme activation in an animal via optimized conditional protein splicing.” Nat. Chem. Biol. 2007; 3, 50-54; Peck et al., Chem. Biol. 2011; 18 (5), 619-630; the entire contents of each are hereby incorporated by reference.

Linker

The term “linker,” as used herein, refers to a molecule linking two other molecules or moieties. The linker can be an amino acid sequence in the case of a linker joining two fusion proteins. For example, a Cas9 can be fused to a Rad51 by an amino acid linker sequence. The linker can also be a nucleotide sequence in the case of joining two nucleotide sequences together. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.

Mutation

The term “mutation,” as used herein, refers to a substitution of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. Mutations are typically described herein by identifying the original residue followed by the position of the residue within the sequence and by the identity of the newly substituted residue. Various methods for making the amino acid substitutions (mutations) provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). Mutations can include a variety of categories, such as single base polymorphisms, microduplication regions, indels, and inversions, and this list is not meant to be limiting in any way. Mutations can include “loss-of-function” mutations, which are the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive, because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation. There are some exceptions where a loss-of-function mutation is dominant, one example being haploinsufficiency, where the organism is unable to tolerate the approximately 50% reduction in protein activity suffered by the heterozygote. This is the explanation for a few genetic diseases in humans. Mutations also embrace “gain-of-function” mutations, which confer an abnormal activity on a protein or cell that is otherwise not present in a normal condition. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, these tissues gaining functions that they normally lack. Alternatively, the mutation could lead to overexpression of one or more genes involved in control of the cell cycle, thus leading to uncontrolled cell division and hence to cancer. Because of their nature, gain-of-function mutations are usually dominant.
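The substitution naming convention described above (original residue, then position, then new residue) can be illustrated with a short Python sketch. It assumes single-letter residue codes and 1-based positions, and handles substitutions only, not insertions or deletions. The function names and the toy peptide are hypothetical and are not taken from the Sequence Listing.

```python
# Illustrative sketch only: names and the toy peptide are hypothetical.
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str      # residue present before the mutation
    position: int      # 1-based position within the sequence
    replacement: str   # residue present after the mutation


_LABEL = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'A4G' into its three parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1).upper(), int(match.group(2)),
                        match.group(3).upper())


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return the sequence with the substitution applied, checking the original residue."""
    index = sub.position - 1
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {sub.position} is outside the sequence")
    if sequence[index] != sub.original:
        raise ValueError(
            f"expected {sub.original} at position {sub.position}, found {sequence[index]}")
    return sequence[:index] + sub.replacement + sequence[index + 1:]


toy_peptide = "MKTAYIAKQR"
mutant = apply_substitution(toy_peptide, parse_substitution("A4G"))
```

Checking the original residue before substituting guards against applying a label to the wrong reference sequence or numbering scheme.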

napDNAbp

As used herein, the term “nucleic acid programmable DNA binding protein” or “napDNAbp,” of which Cas9 is an example, refers to a protein which uses RNA:DNA hybridization to target and bind to specific sequences in a DNA molecule. Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA). In other words, the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence.

Without being bound by theory, the binding mechanism of a napDNAbp-guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase” (“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”). Exemplary sequences for these and other napDNAbp are provided herein.
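The relationship between nuclease activity and the resulting lesion described above can be summarized in a minimal Python sketch. It encodes only the logical outcomes stated in the paragraph (both strands cut, one strand cut, or neither) and says nothing about which domain cuts which strand. The enum and function names are hypothetical.

```python
# Illustrative sketch only: the names below are hypothetical.
from enum import Enum


class Lesion(Enum):
    DOUBLE_STRAND_BREAK = "double-strand break"
    NICK = "nick"
    NONE = "no cut"


def classify_lesion(cuts_target_strand: bool, cuts_non_target_strand: bool) -> Lesion:
    """Map the strand-cutting activities of a napDNAbp to the lesion they leave."""
    if cuts_target_strand and cuts_non_target_strand:
        return Lesion.DOUBLE_STRAND_BREAK   # both strands cut
    if cuts_target_strand or cuts_non_target_strand:
        return Lesion.NICK                  # exactly one strand cut, as for a nickase
    return Lesion.NONE                      # binds without cutting, as for dCas9
```

For example, `classify_lesion(True, False)` and `classify_lesion(False, True)` both return `Lesion.NICK`, corresponding to a nickase acting on either strand.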

The related terms “RNA-programmable nuclease” and “RNA-guided nuclease” are used interchangeably herein and refer to a nuclease that forms a complex with (e.g., binds or associates with) one or more RNA that is not a target for cleavage (e.g., a Cas9 or homolog or variant thereof). In some embodiments, an RNA-programmable nuclease, when in a complex with an RNA, may be referred to as a nuclease:RNA complex. Typically, the bound RNA(s) is referred to as a guide RNA (gRNA). gRNAs can exist as a complex of two or more RNAs, or as a single RNA molecule. gRNAs that exist as a single RNA molecule may be referred to as single-guide RNAs (sgRNAs), though “gRNA” is used interchangeably to refer to guide RNAs that exist as either single molecules or as a complex of two or more molecules. Typically, gRNAs that exist as single RNA species comprise two domains: (1) a domain that shares homology to a target nucleic acid (e.g., and directs binding of a Cas9 (or equivalent) complex to the target); and (2) a domain that binds a Cas9 protein. In some embodiments, domain (2) corresponds to a sequence known as a tracrRNA, and comprises a stem-loop structure. For example, in some embodiments, domain (2) is homologous to a tracrRNA as depicted in FIG. 1E of Jinek et al., Science 337:816-821(2012), the entire contents of which are incorporated herein by reference. Other examples of gRNAs (e.g., those including domain 2) can be found in U.S. Provisional Patent Application, U.S. Ser. No. 61/874,682, filed Sep. 6, 2013, entitled “Switchable Cas9 Nucleases And Uses Thereof,” and U.S. Provisional Patent Application, U.S. Ser. No. 61/874,746, filed Sep. 6, 2013, entitled “Delivery System For Functional Nucleases,” the entire contents of each are hereby incorporated by reference in their entirety.
In some embodiments, a gRNA comprises two or more of domains (1) and (2), and may be referred to as an “extended gRNA.” For example, an extended gRNA will, e.g., bind two or more Cas9 proteins and bind a target nucleic acid at two or more distinct regions, as described herein. The gRNA comprises a nucleotide sequence that complements a target site, which mediates binding of the nuclease/RNA complex to said target site, providing the sequence specificity of the nuclease:RNA complex. In some embodiments, the RNA-programmable nuclease is the (CRISPR-associated system) Cas9 endonuclease, for example Cas9 (Csn1) from Streptococcus pyogenes (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference.

Because RNA-programmable nucleases (e.g., Cas9) use RNA:DNA hybridization to target DNA cleavage sites, these proteins are able to be targeted, in principle, to any sequence specified by the guide RNA. Methods of using RNA-programmable nucleases, such as Cas9, for site-specific cleavage (e.g., to modify a genome) are known in the art (see e.g., Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013); Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013); Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature biotechnology 31, 227-229 (2013); Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013); Dicarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic acids research (2013); Jiang, W. et al. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature biotechnology 31, 233-239 (2013); the entire contents of each of which are incorporated herein by reference).

Non-Naturally Occurring

The terms “non-naturally occurring” or “engineered” are used interchangeably and indicate the involvement of the hand of man. The terms, when referring to nucleic acid molecules or polypeptides (e.g., Cas9 or single-stranded DNA binding protein), mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which it is naturally associated and/or as found in nature (e.g., an amino acid sequence not found in nature).

Nickase

The term “nickase” refers to a napDNAbp (e.g., Cas9) with one of the two nuclease domains inactivated. This enzyme is capable of cleaving only one strand of a target DNA.

Nuclear Localization Sequence (NLS)

The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., International PCT application, PCT/EP2000/011690, filed Nov. 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 19), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 20), KRTADGSEFESPKKKRKV (SEQ ID NO: 21), or KRTADGSEFEPKKKRKV (SEQ ID NO: 22).
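By way of illustration, the four NLS sequences recited above can be located in a polypeptide by exact string matching, as in the following Python sketch. The function name and the toy polypeptide are hypothetical. Exact matching is a simplifying assumption: it will not detect NLS variants, and a match says nothing about whether the motif is functional in a given protein context.

```python
# Illustrative sketch only: function name and toy polypeptide are hypothetical.
from typing import List, Tuple

# NLS sequences as recited in the definition above.
NLS_MOTIFS = {
    "SEQ ID NO: 19": "PKKKRKV",
    "SEQ ID NO: 20": "MDSLLMNRRKFLYQFKNVRWAKGRRETYLC",
    "SEQ ID NO: 21": "KRTADGSEFESPKKKRKV",
    "SEQ ID NO: 22": "KRTADGSEFEPKKKRKV",
}


def find_nls(protein: str) -> List[Tuple[str, int]]:
    """Return (label, 1-based start) for every exact occurrence of a listed NLS."""
    protein = protein.upper()
    hits = []
    for label, motif in NLS_MOTIFS.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((label, start + 1))
            start = protein.find(motif, start + 1)
    # Order hits by position along the polypeptide.
    return sorted(hits, key=lambda hit: hit[1])


toy_fusion = "MKTAYIAK" + "PKKKRKV"   # toy N-terminal segment followed by an NLS
hits = find_nls(toy_fusion)
```

Because SEQ ID NO: 19 is contained within SEQ ID NO: 21 and SEQ ID NO: 22, a polypeptide carrying one of the longer sequences is reported under both labels.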

Nucleic Acid Molecule

The term “nucleic acid,” as used herein, refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7 deazaadenosine, 7 deazaguanosine, 8 oxoadenosine, 8 oxoguanosine, O(6) methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine, N6-methyl adenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, 2′-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5′ N phosphoramidite linkages).

Oligonucleotide

As used herein, the terms “oligonucleotide” and “polynucleotide” can be used interchangeably to refer to a polymer of nucleotides (e.g., a string of at least three nucleotides). In some embodiments, “nucleic acid” encompasses RNA as well as single and/or double-stranded DNA. Nucleic acids may be naturally occurring, for example, in the context of a genome, a transcript, an mRNA, tRNA, rRNA, siRNA, snRNA, a plasmid, cosmid, chromosome, chromatid, or other naturally occurring nucleic acid molecule. On the other hand, a nucleic acid molecule may be a non-naturally occurring molecule, e.g., a recombinant DNA or RNA, an artificial chromosome, an engineered genome, or fragment thereof, or a synthetic DNA, RNA, DNA/RNA hybrid, or including non-naturally occurring nucleotides or nucleosides. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, e.g., analogs having other than a phosphodiester backbone. Nucleic acids can be purified from natural sources, produced using recombinant expression systems and optionally purified, chemically synthesized, etc. Where appropriate, e.g., in the case of chemically synthesized molecules, nucleic acids can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, and backbone modifications. A nucleic acid sequence is presented in the 5′ to 3′ direction unless otherwise indicated. In some embodiments, a nucleic acid is or comprises natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine); nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine); chemically modified bases; biologically modified bases (e.g., methylated bases); intercalated bases; modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose); and/or modified phosphate groups (e.g., phosphorothioates and 5′-N-phosphoramidite linkages).

PACE and PANCE

The term “phage-assisted continuous evolution (PACE),” as used herein, refers to continuous evolution that employs phage as viral vectors which may be used to evolve a component of the HDR-dependent genome editors, e.g., a napDNAbp or a single-stranded DNA binding protein (e.g., Rad51). The general concept of PACE technology has been described, for example, in International PCT Application, PCT/US2009/056194, filed Sep. 8, 2009, published as WO 2010/028347 on Mar. 11, 2010; International PCT Application, PCT/US2011/066747, filed Dec. 22, 2011, published as WO 2012/088381 on Jun. 28, 2012; U.S. application, U.S. Pat. No. 9,023,594, issued May 5, 2015, International PCT Application, PCT/US2015/012022, filed Jan. 20, 2015, published as WO 2015/134121 on Sep. 11, 2015, and International PCT Application, PCT/US2016/027795, filed Apr. 15, 2016, published as WO 2016/168631 on Oct. 20, 2016, the entire contents of each of which are incorporated herein by reference.

The term “phage-assisted non-continuous evolution (PANCE),” as used herein, refers to non-continuous evolution that employs phage as viral vectors. PANCE is a simplified technique for rapid in vivo directed evolution using serial flask transfers of evolving ‘selection phage’ (SP), which contain a gene of interest to be evolved, across fresh E. coli host cells, thereby allowing genes inside the host E. coli to be held constant while genes contained in the SP continuously evolve. Serial flask transfers have long served as a widely-accessible approach for laboratory evolution of microbes, and, more recently, analogous approaches have been developed for bacteriophage evolution. The PANCE system features lower stringency than the PACE system.

The term “reference genome editor,” as used herein, refers to the version of an HDR-dependent genome editor or component thereof (e.g., the Cas9 domain or the single-stranded DNA binding protein) that is used as the starting point for a directed evolution process, e.g., PACE, to achieve or obtain an evolved HDR-dependent genome editor. The reference HDR-dependent genome editor (or component thereof) may include naturally-occurring polypeptide sequences (e.g., hRad51). The reference genome editor may also include non-naturally-occurring polypeptide sequences, e.g., genome editors that have one or more changes in the amino acid sequence (e.g., one or more mutated residues, an insertion of one or more amino acids, or a deletion of one or more amino acids relative to a wildtype or canonical polypeptide). In other words, a reference genome editor can comprise genome editor components (e.g., single-stranded DNA binding protein and Cas9) that are naturally occurring (e.g., wildtype human, mouse, rat, horse, or rabbit polypeptide sequences or naturally occurring variants thereof) or they may also include genome editors which have already been modified relative to the naturally-occurring sequences, and which are desired to be further evolved and/or changed and/or improved using a continuous evolution process, e.g., PACE, described herein. Analogous definitions will be observed when referring to the individual components of a genome editor. For example, a “reference Cas9 domain” or a “reference single-stranded DNA binding protein” or other such individual components of a genome editor refers to the version of that component or domain that is used as the starting point for a continuous evolution process, e.g., PACE, to achieve or obtain an evolved version or variant of that component or domain.

Promoter

The term “promoter” is art-recognized and refers to a nucleic acid molecule with a sequence recognized by the cellular transcription machinery and able to initiate transcription of a downstream gene. A promoter can be constitutively active, meaning that the promoter is always active in a given cellular context, or conditionally active, meaning that the promoter is only active in the presence of a specific condition. For example, a conditional promoter may only be active in the presence of a specific protein that connects a protein associated with a regulatory element in the promoter to the basic transcriptional machinery, or only in the absence of an inhibitory molecule. A subclass of conditionally active promoters are inducible promoters that require the presence of a small molecule “inducer” for activity. Examples of inducible promoters include, but are not limited to, arabinose-inducible promoters, Tet-on promoters, and tamoxifen-inducible promoters. A variety of constitutive, conditional, and inducible promoters are well known to the skilled artisan, and the skilled artisan will be able to ascertain a variety of such promoters useful in carrying out the instant invention, which is not limited in this respect.

Protein, Peptide, and Polypeptide

The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein, and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.

Protein Splicing

As used herein, the term “protein splicing” refers to a process in which an interior region of a precursor protein (an intein) is excised and the flanking regions of the protein (exteins) are ligated to form the mature protein. This natural process has been observed in numerous proteins from both prokaryotes and eukaryotes (Perler, F. B., Xu, M. Q., Paulus, H. Current Opinion in Chemical Biology 1997, 1, 292-299; Perler, F. B. Nucleic Acids Research 1999, 27, 346-347). The intein unit contains the necessary components needed to catalyze protein splicing and often contains an endonuclease domain that participates in intein mobility (Perler, F. B., Davis, E. O., Dean, G. E., Gimble, F. S., Jack, W. E., Neff, N., Noren, C. J., Thomer, J., Belfort, M. Nucleic Acids Research 1994, 22, 1125-1127). The resulting proteins are linked, however, rather than expressed as separate proteins. Protein splicing may also be conducted in trans, in which split inteins expressed on separate polypeptides spontaneously combine to form a single intein, which then undergoes the protein splicing process to join two separate proteins.

Protospacer

As used herein, the term “protospacer” refers to the sequence (˜20 bp) in DNA adjacent to the PAM (protospacer adjacent motif) sequence which is complementary to the spacer sequence of the guide RNA. The guide RNA anneals to the protospacer sequence on the target DNA (specifically, one strand thereof, i.e., the “target strand” versus the “non-target strand” of the target sequence). In order for Cas9 to function, it also requires a specific protospacer adjacent motif (PAM) that varies depending on the bacterial species of the Cas9 gene. The most commonly used Cas9 nuclease, derived from S. pyogenes, recognizes a PAM sequence of NGG that is found directly downstream of the target sequence in the genomic DNA, on the non-target strand. The skilled person will appreciate that the literature in the state of the art sometimes refers to the “protospacer” as the ˜20-nt target-specific guide sequence on the guide RNA itself, rather than referring to it as a “spacer.” Thus, in some cases, the term “protospacer” as used herein may be used interchangeably with the term “spacer.” The context of the description surrounding the appearance of either “protospacer” or “spacer” will help inform the reader as to whether the term refers to the gRNA or the DNA target. Both usages of these terms are acceptable since the state of the art uses both terms in each of these ways.

Protospacer Adjacent Motif (PAM)

As used herein, the term “protospacer adjacent motif” or “PAM” refers to an approximately 2-6 base pair DNA sequence (or a 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, or 12-nucleotide-long sequence) that is an important targeting component of a Cas9 nuclease. Typically, the PAM sequence is on either strand, and is downstream in the 5′ to 3′ direction of the Cas9 cut site. The canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes or SpCas9) is 5′-NGG-3′, wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases. Different PAM sequences can be associated with different Cas9 nucleases or equivalent proteins from different organisms. In addition, any given Cas9 nuclease, e.g., SpCas9, may be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes an alternative PAM sequence.

For example, with reference to the canonical SpCas9 amino acid sequence of SEQ ID NO: 9, the PAM specificity can be modified by introducing one or more mutations, including (a) D1135V, R1335Q, and T1337R (“the VQR variant”), which alters the PAM specificity to NGAN or NGNG, (b) D1135E, R1335Q, and T1337R (“the EQR variant”), which alters the PAM specificity to NGAG, and (c) D1135V, G1218R, R1335E, and T1337R (“the VRER variant”), which alters the PAM specificity to NGCG. In addition, the D1135E variant of canonical SpCas9 still recognizes NGG, but it is more selective compared to the wild type SpCas9 protein.

It will also be appreciated that Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) can have varying PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. In addition, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In still another example, Cas9 from Treponema denticola (TdCas) recognizes NAAAAC. These examples are not meant to be limiting. It will be further appreciated that non-SpCas9s bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site. Furthermore, non-SpCas9s may have other characteristics that make them more useful than SpCas9. For example, Cas9 from Staphylococcus aureus (SaCas9) is about 1 kilobase smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV). Further reference may be made to Shah et al., “Protospacer recognition motifs: mixed identities and functional diversity,” RNA Biology, 10(5): 891-899 (which is incorporated herein by reference).
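By way of illustration only, the PAM specificities recited above can be written as IUPAC degenerate patterns (N = any base, R = A or G, W = A or T) and searched along one strand of a DNA sequence. The following Python sketch is not part of the disclosed methods. The names `PAMS`, `pam_to_regex`, and `find_pam_sites`, the example sequence, and the fixed 20-nucleotide protospacer length are assumptions made for the example. The sketch scans only the strand supplied; a practical search would typically also scan the reverse complement.

```python
import re

# IUPAC degenerate codes that appear in the PAM sequences recited in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "W": "[AT]",
}

# PAM specificities as recited in the text, written 5'->3'.
# The dictionary keys are illustrative labels only.
PAMS = {
    "SpCas9": "NGG",
    "SpCas9-EQR": "NGAG",
    "SpCas9-VRER": "NGCG",
    "SaCas9": "NGRRT",
    "NmCas": "NNNNGATT",
    "StCas9": "NNAGAAW",
    "TdCas": "NAAAAC",
}


def pam_to_regex(pam):
    """Translate an IUPAC PAM string into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(strand, pam, protospacer_len=20):
    """Return (protospacer, pam) pairs found on the given strand.

    A site is reported when a PAM match is immediately preceded (5') by at
    least `protospacer_len` nucleotides; the protospacer is taken as the
    `protospacer_len` nucleotides directly upstream of the PAM.
    """
    strand = strand.upper()
    # Zero-width lookahead so that overlapping PAM matches are all reported.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    sites = []
    for match in pattern.finditer(strand):
        pam_start = match.start(1)
        if pam_start >= protospacer_len:
            protospacer = strand[pam_start - protospacer_len:pam_start]
            sites.append((protospacer, match.group(1)))
    return sites


# Hypothetical example sequence: a 20-nt protospacer followed by an NGG PAM.
example = "GATTACAGATTACAGATTAC" + "AGG" + "TT"
for protospacer, pam in find_pam_sites(example, PAMS["SpCas9"]):
    print(protospacer, pam)
```

Swapping the `pam` argument (e.g., `PAMS["SaCas9"]`) shows how an ortholog with a different PAM specificity may reach target sites that lack a suitable NGG motif.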

As used herein, the term “sequence-context agnostic” refers to a desired property or characteristic of the genome editors described herein in which the sequence proximate (upstream and/or downstream) to the desired target editing site has little or no impact or effect on the efficiency of the evolved genome editor to edit the desired target editing site.

A small fraction (less than 5%) of the identified intein genes encode split inteins. Unlike the more common contiguous inteins, these are transcribed and translated as two separate polypeptides, the N-intein and C-intein, each fused to one extein. Upon translation, the intein fragments spontaneously and non-covalently assemble into the canonical intein structure to carry out protein splicing in trans.

Spacer Sequence

As used herein, the term “spacer sequence” in connection with a guide RNA refers to the portion of the guide RNA of about 20 nucleotides (or from about 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30, or more) which contains a nucleotide sequence that is complementary to the protospacer sequence in the target DNA sequence. The spacer sequence anneals to the protospacer sequence to form a ssRNA/ssDNA hybrid structure at the target site and a corresponding R-loop ssDNA structure of the endogenous DNA strand that is complementary to the protospacer sequence.

Subject

The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development.

Target Site

The term “target site” refers to a sequence within a nucleic acid molecule that is edited by a prime editor (PE) disclosed herein. The target site further refers to the sequence within a nucleic acid molecule to which a complex of the prime editor (PE) and gRNA binds.

Transitions

As used herein, “transitions” refer to the interchange of purine nucleobases (A↔G) or the interchange of pyrimidine nucleobases (C↔T). This class of interchanges involves nucleobases of similar shape. These changes involve A↔G, G↔A, C↔T, or T↔C. In the context of a double-strand DNA with Watson-Crick paired nucleobases, transitions refer to the following base pair exchanges: A:T↔G:C, G:C↔A:T, C:G↔T:A, or T:A↔C:G. The compositions and methods disclosed herein are capable of inducing one or more transitions in a target DNA molecule. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions in the same target DNA molecule, as well as other nucleotide changes, including deletions and insertions.

Transversions

As used herein, “transversions” refer to the interchange of purine nucleobases for pyrimidine nucleobases, or the reverse, and thus involve the interchange of nucleobases with dissimilar shape. These changes involve T↔A, T↔G, C↔G, C↔A, A↔T, A↔C, G↔C, and G↔T. In the context of a double-strand DNA with Watson-Crick paired nucleobases, transversions refer to the following base pair exchanges: T:A↔A:T, T:A↔G:C, C:G↔G:C, C:G↔A:T, A:T↔T:A, A:T↔C:G, G:C↔C:G, and G:C↔T:A. The compositions and methods disclosed herein are capable of inducing one or more transversions in a target DNA molecule. The compositions and methods disclosed herein are also capable of inducing both transitions and transversions in the same target DNA molecule, as well as other nucleotide changes, including deletions and insertions.
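The distinction between transitions and transversions can be sketched in a few lines of Python. This is an illustrative helper only; the function name `classify_substitution` and the return labels are assumptions made for the example and do not appear elsewhere in this specification.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-nucleobase change on one DNA strand.

    Returns "transition" for purine<->purine or pyrimidine<->pyrimidine
    changes, "transversion" for purine<->pyrimidine changes, and "none"
    when the two nucleobases are identical.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        return "none"
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Tally all twelve possible single-nucleobase substitutions by class.
counts = {"transition": 0, "transversion": 0}
for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            counts[classify_substitution(ref, alt)] += 1
print(counts)
```

Of the twelve possible substitutions, four fall into the transition class and eight into the transversion class, consistent with the changes enumerated in the two definitions above.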

Treatment

As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.

Variant

As used herein, the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Cas9 is a Cas9 comprising one or more changes in amino acid residues as compared to a wild type Cas9 amino acid sequence. The term “variant” encompasses homologous proteins having at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence. The term also encompasses mutants, truncations, or domains of a reference sequence which display the same or substantially the same functional activity or activities as the reference sequence.
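As a rough illustration of how a percent identity figure might be computed, the following Python sketch compares two sequences that have already been aligned to equal length (with “-” marking gaps). It is a simplification under stated assumptions: the function names `percent_identity` and `thresholds_met` are hypothetical, the alignment step itself is not shown, and the choice to divide by the number of aligned columns (excluding columns that are gaps in both sequences) is only one of several conventions in use.

```python
# Identity thresholds recited in the definition of "variant" above.
THRESHOLDS = (75, 80, 85, 90, 95, 99)


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns in which both sequences carry a gap are ignored. A column
    counts as a match only when both residues are identical and not gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def thresholds_met(identity):
    """Return the recited identity thresholds satisfied by `identity`."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical 10-residue fragments differing at a single position.
identity = percent_identity("MDKKYSIGLD", "MDKKYSIGLA")
print(identity, thresholds_met(identity))
```

Because the result depends on both the alignment and the denominator convention, any figure produced this way should be read together with the alignment method used to obtain it.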

Vector

The term “vector,” as used herein, refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter into a host cell, mutate and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.

Viral Life Cycle

The term “viral life cycle,” as used herein, refers to the viral reproduction cycle comprising insertion of the viral genome into a host cell, replication of the viral genome in the host cell, and packaging of a replication product of the viral genome into a viral particle by the host cell.

Viral Particle

The term “viral particle,” as used herein, refers to a viral genome, for example, a DNA or RNA genome, that is associated with a coat of a viral protein or proteins, and, in some cases, with an envelope of lipids. For example, a phage particle comprises a phage genome packaged into a protein encoded by the wild type phage genome.

Viral Vector

The term “viral vector,” as used herein, refers to a nucleic acid comprising a viral genome that, when introduced into a suitable host cell, can be replicated and packaged into viral particles able to transfer the viral genome into another host cell. The term viral vector extends to vectors comprising truncated or partial viral genomes. For example, in some embodiments, a viral vector is provided that lacks a gene encoding a protein essential for the generation of infectious viral particles. In suitable host cells, for example, host cells comprising the lacking gene under the control of a conditional promoter, however, such truncated viral vectors can replicate and generate viral particles able to transfer the truncated viral genome into another host cell. In some embodiments, the viral vector is a phage, for example, a filamentous phage (e.g., an M13 phage). In some embodiments, a viral vector, for example, a phage vector, is provided that comprises a gene of interest to be evolved.

Wild Type

As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

The inventors have surprisingly discovered an improved genome editing construct which is capable of editing a target sequence in an HDR-dependent manner (i.e., “HDR-dependent genome editors”) with increased efficiency and reduced indel formation and which does not require a dividing cell. In particular, the inventors surprisingly discovered a new fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) (e.g., Cas9) with nickase activity fused to a single-stranded DNA binding protein (e.g., Rad51) which edits a target DNA, in the presence of a donor sequence, in an HDR-dependent manner with greater efficiency (e.g., increased rate of induced HDR) and/or with a lower rate or occurrence of indel formation. In certain embodiments, the napDNAbp is a nickase Cas9 enzyme, and the single-stranded DNA binding protein is Rad51. In certain other embodiments, the nickase Cas9 enzyme is a Cas9-D10A variant, e.g., a D10A mutation in the RuvC1 nuclease domain, relative to the wild-type Cas9 sequence of SEQ ID NO: 9. In still other embodiments, the nickase Cas9 enzyme is a Cas9-H840A variant, e.g., a H840A mutation in the HNH nuclease domain, relative to the wild-type Cas9 sequence of SEQ ID NO: 9.

In addition, the instant specification provides for nucleic acid molecules encoding and/or expressing the improved HDR-dependent genome editors as described herein, as well as expression vectors or constructs for expressing the improved HDR-dependent genome editors described herein, host cells comprising said nucleic acid molecules and expression vectors, and compositions for delivering and/or administering nucleic acid-based embodiments described herein. In addition, the disclosure provides for isolated improved HDR-dependent genome editors, as well as compositions comprising said isolated improved HDR-dependent genome editors as described herein. Still further, the present disclosure provides for methods of making the improved HDR-dependent genome editors, as well as methods of using the improved HDR-dependent genome editors or nucleic acid molecules encoding the improved HDR-dependent genome editors in applications including editing a nucleic acid molecule, e.g., a genome, with improved efficiency, increased HDR induction rate, and reduced indel formation, as compared to prior art genome editors. In some embodiments, the method of making provided herein is a directed evolution methodology, e.g., a phage-assisted continuous evolution (PACE) system or a phage-assisted non-continuous evolution (PANCE) system, which may be utilized to evolve one or more components of an HDR-dependent genome editor described herein (e.g., a single-stranded DNA binding protein) in a rapid manner. The specification also provides methods for efficiently editing a target nucleic acid molecule, e.g., a single nucleobase of a genome, with an HDR-dependent genome editor described herein (e.g., in the form of an isolated HDR-dependent genome editor as described herein or a vector or construct encoding same) and conducting base editing, in a manner characterized by a higher rate of HDR and/or a lower rate of indel formation, relative to a control sequence. Still further, the specification provides therapeutic methods for treating a genetic disease and/or for altering or changing a genetic trait or condition by contacting a target nucleic acid molecule, e.g., a genome, with an HDR-dependent genome editor (e.g., in the form of an isolated evolved genome editor protein or a vector encoding same) and conducting base editing to treat the genetic disease and/or change the genetic trait (e.g., eye color) in the presence of a donor sequence.

HDR-Dependent Genome Editors

In various aspects, the specification provides an HDR-dependent genome editor comprising: (i) a nucleic acid programmable DNA binding protein (napDNAbp) (e.g., nCas9), and (ii) a single-stranded DNA binding protein (e.g., Rad51). The nucleic acid programmable DNA binding protein (napDNAbp) can be a nCas9 domain. The napDNAbp can also be a Cpf1, a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a, a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, a circularly permuted Cas9, or an Argonaute (Ago) domain, preferably engineered to have a nickase activity.

In various embodiments of the genome editor, the single-stranded DNA binding protein is a wild-type Rad51 protein, or a variant thereof.

In certain aspects, the methods described herein for evolving the HDR-dependent genome editors (or a component thereof, e.g., a Cas9 or a Rad51 moiety) begin with a genome editor or components known in the art. The state of the art has described numerous genome editors and/or components thereof as of this filing. The methods and approaches herein described for improving genome editors may be applied to any previously known genome editor (or components thereof, e.g., Rad51), or to genome editors that may be developed in the future but which lack the beneficial characteristics imparted by the instant methods and modification approaches. Exemplary genome editors that may be modified by the methods described herein to achieve the genome editors of the invention can include, for example, those described in the following references and/or patent publications, each of which is incorporated by reference in its entirety: (a) PCT/US2014/070038 (published as WO2015/089406, on Jun. 18, 2015) and its equivalents in the US or around the world; (b) PCT/US2016/058344 (published as WO2017/070632, on Apr. 27, 2017) and its equivalents in the US or around the world; (c) PCT/US2016/058345 (published as WO2017/070633, on Apr. 27, 2017) and its equivalents in the US or around the world; (d) PCT/US2017/045381 (published as WO2018/027078, on Feb. 8, 2018) and its equivalents in the US or around the world; (e) PCT/US2017/056671 (published as WO2018/071868, on Apr. 19, 2018) and its equivalents in the US or around the world; PCT/2017/048390 (WO2017/048390, on Mar. 23, 2017) and its equivalents in the US or around the world; (f) PCT/US2017/068114 and its equivalents in the US or around the world; (g) PCT/US2017/068105 and its equivalents in the US or around the world; (h) PCT/US2017/046144 (WO2018/031683, on Feb. 15, 2018) and its equivalents in the US or around the world; (i) PCT/US2018/024208 and its equivalents in the US or around the world; (j) PCT/2018/021878 (WO2018/021878, on Feb. 1, 2018) and its equivalents in the US and around the world; (k) Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-(2016); (l) Gaudelli, N. M. et al. Programmable base editing of A.T to G.C in genomic DNA without DNA cleavage. Nature 551, 464-(2017); (m) any of the references listed in this specification under the heading “References” that report or describe a genome editor known in the art.

In various aspects, the HDR-dependent genome editors described herein have the following generalized structure: A-B-C, wherein “A” is a nickase Cas moiety or napDNAbp nickase, “B” is a single-stranded DNA binding protein (e.g., Rad51), and “C” represents an optional additional genome editor functional domain (e.g., an NLS domain). In addition, the “-” represents a linker that covalently joins moieties A, B, and C. The linkers can be any suitable type (e.g., amino acid sequences or other biopolymers, or synthetic chemical linkages in the case where the moieties are bioconjugated to one another) or length. In addition, a functional improved genome editor of the invention could also include one or more “R” or guide sequences (e.g., guide RNA in the case of a Cas9 or Cas9 equivalent) in order to carry out the R/DNA-programmable functionality of genome editors for targeting specific sites to be corrected. In addition, the HDR-dependent genome editors comprise (either covalently attached or separately provided) one or more donor nucleotide sequences (e.g., a double-stranded DNA sequence) comprising the desired genetic change (e.g., a single replacement nucleobase) and regions homologous to the target sequence.

The order of linkage of the moieties is not meant to be particularly limiting so long as the particular arrangement of the moieties produces a functional HDR-dependent genome editor. That is, the HDR-dependent genome editors of the invention may also include editors represented by the following structures: B-A-C; B-C-A; C-B-A; C-A-B; and A-C-B. In various embodiments, the HDR-dependent genome editors may comprise at least one domain of the genome editors (e.g., a nCas9 domain or a Rad51 domain) that has been evolved by a continuous evolution process (e.g., PACE). Thus, in one embodiment, the specification provides an evolved genome editor that comprises an evolved nCas9 domain relative to a reference nCas9 domain, but where the other domains of the genome editor have not been evolved. In another embodiment, the specification provides an evolved genome editor that comprises an evolved single-stranded DNA binding protein (e.g., Rad51), but where the other domains of the genome editor have not been evolved. In still other embodiments, the genome editors may comprise combinations of domains which are evolved by the continuous evolution process described herein.
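The six possible linear orders of the three moieties (the generalized structure A-B-C together with the five alternatives recited above) can be enumerated mechanically. The short Python sketch below is offered only to make that enumeration explicit; the `MOIETIES` labels and the `arrangements` function are hypothetical names chosen for this example, and the sketch says nothing about which arrangements are functional.

```python
from itertools import permutations

# Labels follow the generalized structure described in the text.
MOIETIES = {
    "A": "napDNAbp nickase (e.g., nCas9)",
    "B": "single-stranded DNA binding protein (e.g., Rad51)",
    "C": "optional additional functional domain (e.g., an NLS domain)",
}


def arrangements(labels=("A", "B", "C"), linker="-"):
    """Return every linear ordering of the moieties, joined by a linker symbol."""
    return [linker.join(order) for order in permutations(labels)]


for structure in arrangements():
    print(structure)
```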

(A) napDNAbp

In one aspect, the methods and compositions described herein involve a nucleic acid programmable DNA binding protein (napDNAbp). Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the spacer of a guide RNA). In other words, the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence.

Without being bound by theory, the binding mechanism of a napDNAbp-guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. The guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase” (“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”).

The below description of various napDNAbps which can be used in connection with the presently disclosed HDR-dependent genome editors is not meant to be limiting in any way. The HDR-dependent genome editor may comprise the canonical SpCas9, or any ortholog Cas9 protein, or any variant Cas9 protein (including any naturally occurring variant, mutant, or otherwise engineered version of Cas9) that is known or which can be made or evolved through a directed evolutionary or otherwise mutagenic process. In various embodiments, the Cas9 or Cas9 variants have a nickase activity, i.e., cleave only one strand of the target DNA sequence. In other embodiments, the Cas9 or Cas9 variants have inactive nucleases, i.e., are “dead” Cas9 proteins. Other variant Cas9 proteins that may be used are those having a smaller molecular weight than the canonical SpCas9 (e.g., for easier delivery) or having modified or rearranged primary amino acid structure (e.g., the circular permutant formats). The prime editors described herein may also comprise Cas9 equivalents, including Cas12a/Cpf1 and Cas12b proteins which are the result of convergent evolution. The napDNAbps used herein (e.g., SpCas9, Cas9 variant, or Cas9 equivalents) may also contain various modifications that alter/enhance their PAM specificities. Lastly, the application contemplates any Cas9, Cas9 variant, or Cas9 equivalent which has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity to a reference Cas9 sequence, such as a reference SpCas9 canonical sequence or a reference Cas9 equivalent (e.g., Cas12a/Cpf1).

The napDNAbp can be a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. As outlined above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M. et al., Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference.

In some embodiments, the napDNAbp directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the napDNAbp directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, a vector encodes a napDNAbp that is mutated with respect to a corresponding wild-type enzyme such that the mutated napDNAbp lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A in reference to the canonical SpCas9 sequence, or to equivalent amino acid positions in other Cas9 variants or Cas9 equivalents.
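The substitution notation used above (e.g., D10A: the aspartate at position 10 is replaced by alanine) can be applied to a sequence programmatically. The following Python sketch is illustrative only. The function name `apply_substitution` is hypothetical, and the 15-residue fragment is an assumed stand-in for the N-terminus of a wild-type SpCas9 numbered from the initiator methionine; it is not a reproduction of SEQ ID NO: 9.

```python
import re


def apply_substitution(sequence, mutation):
    """Apply a single amino acid substitution written as, e.g., "D10A".

    Positions are 1-based. The function checks that the residue named in
    the mutation is actually present at that position before replacing it.
    """
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError("mutation must look like 'D10A'")
    ref, pos, alt = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    if pos < 1 or pos > len(sequence):
        raise ValueError("position %d is outside the sequence" % pos)
    if sequence[pos - 1] != ref:
        raise ValueError(
            "expected %s at position %d, found %s" % (ref, pos, sequence[pos - 1])
        )
    return sequence[:pos - 1] + alt + sequence[pos:]


# Assumed N-terminal fragment (numbering starts at the initiator methionine).
fragment = "MDKKYSIGLDIGTNS"
print(apply_substitution(fragment, "D10A"))
```

The reference-residue check guards against numbering mismatches, which matter when a mutation defined relative to one reference sequence is applied at the equivalent position of a differently numbered variant or ortholog.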

The HDR-dependent genome editors provided by the instant specification include any suitable Cas9 moiety or equivalent protein, such as a CRISPR associated protein 9, or functional fragment thereof, and embraces any naturally occurring Cas9 from any organism, any naturally-occurring Cas9 equivalent or functional fragment thereof, any Cas9 homolog, ortholog, or paralog from any organism, and any mutant or variant of a Cas9, naturally-occurring or engineered. Preferably, the Cas9 moiety is a nickase, i.e., catalyzes the cleavage of only a single strand of double-stranded target DNA.

These Cas9 moieties or equivalent proteins may be evolved using a continuous evolution method (e.g., PACE) described herein. Preferably, the Cas9 moieties are configured as a nickase for introducing a nick in a target double-stranded sequence. The genome editors include those in which only the Cas9 moiety is evolved using PACE, or those in which the Cas9 moiety is evolved along with one or more other genome editor domains (e.g., a Rad51 or homolog thereof). The genome editors described herein may also include those fusion proteins in which the Cas9 moiety or domain has not been evolved using PACE, but wherein one or more other HDR-dependent genome editor domains (e.g., a Rad51 or homolog thereof) have been evolved using PACE.

More broadly, a Cas9 is a type of “RNA-programmable nuclease” or “RNA-guided nuclease” or “nucleic acid programmable DNA-binding protein.” The terms napDNAbp or Cas9 are not meant to be particularly limiting. The present disclosure is unlimited with regard to the particular napDNAbp, Cas9 or Cas9 equivalent that is employed in the genome editors of the invention.

As will be understood in the context of the present disclosure, any Cas9 domain is generally to be regarded as a possible reference polypeptide (i.e., starting point) for processing using the continuous evolution methods (e.g., PACE) described herein. Otherwise, those Cas9 domains which have been evolved using the continuous evolution methods described herein are indicated as such.

In some embodiments, the napDNAbp is a Cas moiety having a nickase activity (nCas9).

In various embodiments, the Cas moiety is a S. pyogenes Cas9, which has been most widely used as a tool for genome engineering. This Cas9 protein is a large, multi-domain protein containing two distinct nuclease domains. Point mutations can be introduced into Cas9 to abolish one or both nuclease activities, resulting in a nickase Cas9 (nCas9) or dead Cas9 (dCas9), respectively, that still retains its ability to bind DNA in a sgRNA-programmed manner. In principle, when fused to another protein or domain, Cas9 or a variant thereof (e.g., nCas9) can target that protein to virtually any DNA sequence simply by co-expression with an appropriate sgRNA.

In one embodiment, the Cas9 is Cas9 nickase D10A, comprising the following amino acid sequence, or a polypeptide that is at least 75%, or 80%, or 85%, or 90%, or 95%, or 99% identical to:

(SEQ ID NO: 1)

DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALL

FDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEE

SFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI

YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASG

VDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF

DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILR

VNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNG

YAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGS

IPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNS

RFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKH

SLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVK

QLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENE

DILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSR

KLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSG

QGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARE

NQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQ

NGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSD

NVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKR

QLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDF

QFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKM

IAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEI

VWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARK

KDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSF

EKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNE

LALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEF

SKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYF

DTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD.

In one embodiment, the Cas9 is Cas9 nickase D10A, comprising the following amino acid sequence, or a polypeptide that is at least 75%, or 80%, or 85%, or 90%, or 95%, or 99% identical to:

Accession No. AMQ45845 (Cas9(D10A)nickase)

(SEQ ID NO: 8)

1
MYPYDVPDYA SPKKKRKVEA SDKKYSIGLA IGTNSVGWAV

ITDEYKVPSK KFKVLGNTDR

61
HSIKKNLIGA LLFDSGETAE ATRLKRTARR RYTRRKNRIC

YLQEIFSNEM AKVDDSFFHR

121
LEESFLVEED KKHERHPIFG NIVDEVAYHE KYPTIYHLRK

KLVDSTDKAD LRLIYLALAH

181
MIKFRGHFLI EGDLNPDNSD VDKLFIQLVQ TYNQLFEENP

INASGVDAKA ILSARLSKSR

241
RLENLIAQLP GEKKNGLFGN LIALSLGLTP NFKSNFDLAE

DAKLQLSKDT YDDDLDNLLA

301
QIGDQYADLF LAAKNLSDAI LLSDILRVNT EITKAPLSAS

MIKRYDEHHQ DLTLLKALVR

361
QQLPEKYKEI FFDQSKNGYA GYIDGGASQE EFYKFIKPIL

EKMDGTEELL VKLNREDLLR

421
KQRTFDNGSI PHQIHLGELH AILRRQEDFY PFLKDNREKI

EKILTFRIPY YVGPLARGNS

481
RFAWMTRKSE ETITPWNFEE VVDKGASAQS FIERMTNFDK

NLPNEKVLPK HSLLYEYFTV

541
YNELTKVKYV TEGMRKPAFL SGEQKKAIVD LLFKTNRKVT

VKQLKEDYFK KIECFDSVEI

601
SGVEDRFNAS LGTYHDLLKI IKDKDFLDNE ENEDILEDIV

LTLTLFEDRE MIEERLKTYA

661
HLFDDKVMKQ LKRRRYTGWG RLSRKLINGI RDKQSGKTIL

DFLKSDGFAN RNFMQLIHDD

721
SLTFKEDIQK AQVSGQGDSL HEHIANLAGS PAIKKGILQT

VKVVDELVKV MGRHKPENIV

781
IEMARENQTT QKGQKNSRER MKRIEEGIKE LGSQILKEHP

VENTQLQNEK LYLYYLQNGR

841
DMYVDQELDI NRLSDYDVDH IVPQSFLKDD SIDNKVLTRS

DKNRGKSDNV PSEEVVKKMK

901
NYWRQLLNAK LITQRKFDNL TKAERGGLSE LDKAGFIKRQ

LVETRQITKH VAQILDSRMN

961
TKYDENDKLI REVKVITLKS KLVSDFRKDF QFYKVREINN

YHHAHDAYLN AVVGTALIKK

1021
YPKLESEFVY GDYKVYDVRK MIAKSEQEIG KATAKYFFYS

NIMNFFKTEI TLANGEIRKR

1081
PLIETNGETG EIVWDKGRDF ATVRKVLSMP QVNIVKKTEV

QTGGFSKESI LPKRNSDKLI

1141
ARKKDWDPKK YGGFDSPTVA YSVLVVAKVE KGKSKKLKSV

KELLGITIME RSSFEKNPID

1201
FLEAKGYKEV KKDLIIKLPK YSLFELENGR KRMLASAGEL

QKGNELALPS KYVNFLYLAS

1261
HYEKLKGSPE DNEQKQLFVE QHKHYLDEII EQISEFSKRV

ILADANLDKV LSAYNKHRDK

1321
PIREQAENII HLFTLTNLGA PAAFKYFDTT IDRKRYTSTK

EVLDATLIHQ SITGLYETRI

1381
DLSQLGGDSP KKKRKVEAS

In other embodiments, the Cas moiety is a Cas9 from: Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1), and preferably comprising a nickase mutation (e.g., a mutation corresponding to the D10A mutation of the wild type Cas9 polypeptide of SEQ ID NO: 9).

In still other embodiments, the Cas moiety may include any CRISPR associated protein, including but not limited to, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, homologs thereof, or modified versions thereof, and preferably comprising a nickase mutation (e.g., a mutation corresponding to the D10A mutation of the wild type Cas9 polypeptide of SEQ ID NO: 9). These enzymes are known; for example, the amino acid sequence of S. pyogenes Cas9 protein may be found in the SwissProt database under accession number Q99ZW2 (having the following amino acid sequence).

(SEQ ID NO: 9)

01
MDKKYSIGLD IGTNSVGWAV ITDEYKVPSK KFKVLGNTDR

HSIKKNLIGA LLFDSGETAE

61
ATRLKRTARR RYTRRKNRIC YLQEIFSNEM AKVDDSFFHR

LEESFLVEED KKHERHPIFG

121
NIVDEVAYHE KYPTIYHLRK KLVDSTDKAD LRLIYLALAH

MIKFRGHFLI EGDLNPDNSD

181
VDKLFIQLVQ TYNQLFEENP INASGVDAKA ILSARLSKSR

RLENLIAQLP GEKKNGLFGN

241
LIALSLGLTP NFKSNFDLAE DAKLQLSKDT YDDDLDNLLA

QIGDQYADLF LAAKNLSDAI

301
LLSDILRVNT EITKAPLSAS MIKRYDEHHQ DLTLLKALVR

QQLPEKYKEI FFDQSKNGYA

361
GYIDGGASQE EFYKFIKPIL EKMDGTEELL VKLNREDLLR

KQRTFDNGSI PHQIHLGELH

421
AILRRQEDFY PFLKDNREKI EKILTFRIPY YVGPLARGNS

RFAWMTRKSE ETITPWNFEE

481
VVDKGASAQS FIERMTNFDK NLPNEKVLPK HSLLYEYFTV

YNELTKVKYV TEGMRKPAFL

541
SGEQKKAIVD LLFKTNRKVT VKQLKEDYFK KIECFDSVEI

SGVEDRFNAS LGTYHDLLKI

601
IKDKDFLDNE ENEDILEDIV LTLTLFEDRE MIEERLKTYA

HLFDDKVMKQ LKRRRYTGWG

661
RLSRKLINGI RDKQSGKTIL DFLKSDGFAN RNFMQLIHDD

SLTFKEDIQK AQVSGQGDSL

721
HEHIANLAGS PAIKKGILQT VKVVDELVKV MGRHKPENIV

IEMARENQTT QKGQKNSRER

781
MKRIEEGIKE LGSQILKEHP VENTQLQNEK LYLYYLQNGR

DMYVDQELDI NRLSDYDVDH

841
IVPQSFLKDD SIDNKVLTRS DKNRGKSDNV PSEEVVKKMK

NYWRQLLNAK LITQRKFDNL

901
TKAERGGLSE LDKAGFIKRQ LVETRQITKH VAQILDSRMN

TKYDENDKLI REVKVITLKS

961
KLVSDFRKDF QFYKVREINN YHHAHDAYLN AVVGTALIKK

YPKLESEFVY GDYKVYDVRK

1021
MIAKSEQEIG KATAKYFFYS NIMNFFKTEI TLANGEIRKR

PLIETNGETG EIVWDKGRDF

1081
ATVRKVLSMP QVNIVKKTEV QTGGFSKESI LPKRNSDKLI

ARKKDWDPKK YGGFDSPTVA

1141
YSVLVVAKVE KGKSKKLKSV KELLGITIME RSSFEKNPID

FLEAKGYKEV KKDLIIKLPK

1201
YSLFELENGR KRMLASAGEL QKGNELALPS KYVNFLYLAS

HYEKLKGSPE DNEQKQLFVE

1261
QHKHYLDEII EQISEFSKRV ILADANLDKV LSAYNKHRDK

PIREQAENII HLFTLTNLGA

1321
PAAFKYFDTT IDRKRYTSTK EVLDATLIHQ SITGLYETRI

DLSQLGGD

In some embodiments, the unmodified CRISPR enzyme has DNA cleavage activity, such as Cas9. In some embodiments, the CRISPR enzyme is Cas9, and may be Cas9 from S. pyogenes or S. pneumoniae. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the CRISPR enzyme directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence. In some embodiments, a vector encodes a CRISPR enzyme that is mutated with respect to a corresponding wild-type enzyme such that the mutated CRISPR enzyme lacks the ability to cleave one or both strands of a target polynucleotide containing a target sequence. For example, an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 from S. pyogenes converts Cas9 from a nuclease that cleaves both strands to a nickase (which cleaves a single strand). Other examples of mutations that render Cas9 a nickase include, without limitation, H840A, N854A, and N863A.
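By way of illustration only, the D10A substitution described above can be sketched as a simple string operation on an amino acid sequence. The function name `apply_substitution` and the 20-residue prefix used here are hypothetical conveniences, not part of the disclosure; the prefix is taken from the start of SEQ ID NO: 9, and positions are assumed to be numbered from the initial methionine.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution written like 'D10A' (1-based position).

    The wild-type residue named in the mutation string is checked against
    the sequence first, so a numbering offset is caught rather than
    silently producing the wrong variant.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wild_type = match.group(1)
    index = int(match.group(2)) - 1  # convert to 0-based index
    replacement = match.group(3)
    if index < 0 or index >= len(protein):
        raise ValueError(f"position out of range for {mutation!r}")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, "
            f"found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# First 20 residues of SEQ ID NO: 9 (illustrative prefix only).
WT_PREFIX = "MDKKYSIGLDIGTNSVGWAV"
nickase_prefix = apply_substitution(WT_PREFIX, "D10A")
print(nickase_prefix)
```

The residue check matters in practice because some of the sequences herein (e.g., SEQ ID NO: 1) omit the initial methionine, which shifts every position by one.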

A Cas moiety may also be referred to as a Casn1 nuclease or a CRISPR (clustered regularly interspaced short palindromic repeat)-associated nuclease. As outlined above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA target complementary to the spacer. The target strand not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference.

Cas9 and equivalents recognize a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. As noted herein, Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J. J., McShan W. M., Ajdic D. J., Savic D. J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A. N., Kenton S., Lai H. S., Lin S. P., Qian Y., Jia H. G., Najar F. Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S. W., Roe B. A., McLaughlin R. E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C. M., Gonzales K., Chao Y., Pirzada Z. A., Eckert M. R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J. A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference).

The Cas moiety may include any suitable homologs and/or orthologs. Cas9 homologs and/or orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Preferably, the Cas moiety is configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain, that is, the Cas9 is a nickase.

In various embodiments, the genome editors may comprise a nuclease-inactivated Cas protein, which may interchangeably be referred to as a “dCas” or “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 protein (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell. 28; 152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science. 337:816-821(2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of Cas9 are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9.
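The relationship between these subdomain mutations and the resulting nuclease state can be summarized in a short sketch. This is a simplified illustration, not a definitive classifier: the function name `classify_cas9` is hypothetical, the assignment of N854A and N863A to the HNH subdomain is an assumption drawn from general knowledge rather than from this text, and only the D10A plus H840A combination is expressly described herein as fully inactivating.

```python
# Mutations assumed to inactivate each nuclease subdomain of S. pyogenes Cas9.
# D10A (RuvC1) and H840A (HNH) follow the text; placing N854A and N863A
# in the HNH set is an assumption made for this illustration.
RUVC_INACTIVATING = {"D10A"}
HNH_INACTIVATING = {"H840A", "N854A", "N863A"}


def classify_cas9(mutations):
    """Return a coarse nuclease state for a collection of mutation names."""
    present = set(mutations)
    ruvc_inactive = bool(present & RUVC_INACTIVATING)
    hnh_inactive = bool(present & HNH_INACTIVATING)
    if ruvc_inactive and hnh_inactive:
        return "dCas9"  # neither strand is cleaved
    if ruvc_inactive or hnh_inactive:
        return "nCas9"  # only one strand is cleaved
    return "nuclease-active Cas9"


print(classify_cas9([]))
print(classify_cas9(["D10A"]))
print(classify_cas9(["D10A", "H840A"]))
```

Under these assumptions, a D10A nickase retains HNH activity and therefore nicks the strand complementary to the gRNA, whereas an H840A nickase nicks the non-complementary strand.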

In some embodiments, proteins comprising Cas9 or fragments thereof are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to wild type Cas9. In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more amino acid changes compared to a wild type Cas9. In some embodiments, the Cas9 variant comprises a fragment of Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9. In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9.
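As a rough illustration of how a percent identity threshold of this kind might be evaluated, the following sketch counts matching positions between two sequences that have already been aligned to equal length. It is a simplified stand-in under stated assumptions: the name `percent_identity` is hypothetical, gaps are written as "-", a gap opposite a residue is counted as a mismatch, and the disclosure does not prescribe this or any particular alignment method. In practice, identity is typically reported by an alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the aligned columns of two equal-length strings.

    Columns where both sequences carry a gap ('-') are ignored; a gap
    opposite a residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Illustrative 20-residue prefixes: wild type (from SEQ ID NO: 9) and D10A.
wild_type = "MDKKYSIGLDIGTNSVGWAV"
variant = "MDKKYSIGLAIGTNSVGWAV"
identity = percent_identity(wild_type, variant)
print(f"{identity:.1f}% identical")
```

A single substitution in this 20-residue window leaves 19 of 20 positions matching; over a full-length Cas9 of more than 1,300 residues, the same single change would leave the variant well above 99.9% identical.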

In some embodiments, the Cas9 fragment is at least 100 amino acids in length. In some embodiments, the fragment is at least 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, or at least 1300 amino acids in length. In some embodiments, wild type Cas9 corresponds to Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1). In other embodiments, wild type Cas9 corresponds to Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_002737.2). In still other embodiments, dCas9 corresponds to, or comprises in part or in whole, a Cas9 amino acid sequence having one or more mutations that inactivate the Cas9 nuclease activity.

In some embodiments, the Cas9 domain comprises a nickase D10A mutation, while the residue at position 840 remains a histidine, relative to a wild type sequence such as Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1).

In other embodiments, dCas9 variants having mutations other than D10A and H840A are provided, which, e.g., result in nuclease inactivated Cas9 (dCas9). Such mutations, by way of example, include other amino acid substitutions at D10 and H840, or other substitutions within the nuclease domains of Cas9 (e.g., substitutions in the HNH nuclease subdomain and/or the RuvC1 subdomain) with reference to a wild type sequence such as Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1). In some embodiments, variants or homologues of dCas9 (e.g., variants of Cas9 from Streptococcus pyogenes (NCBI Reference Sequence: NC_017053.1)) are provided which are at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to NCBI Reference Sequence: NC_017053.1. In some embodiments, variants of dCas9 (e.g., variants of NCBI Reference Sequence: NC_017053.1) are provided having amino acid sequences which are shorter or longer than NC_017053.1 by about 5 amino acids, by about 10 amino acids, by about 15 amino acids, by about 20 amino acids, by about 25 amino acids, by about 30 amino acids, by about 40 amino acids, by about 50 amino acids, by about 75 amino acids, by about 100 amino acids or more.

In some embodiments, the genome editors as provided herein comprise the full-length amino acid sequence of a Cas9 protein, e.g., one of the Cas9 sequences provided herein. In other embodiments, however, fusion proteins as provided herein do not comprise a full-length Cas9 sequence, but only a fragment thereof. For example, in some embodiments, a Cas9 fusion protein provided herein comprises a Cas9 fragment, wherein the fragment binds crRNA and tracrRNA or sgRNA, but does not comprise a functional nuclease domain, e.g., in that it comprises only a truncated version of a nuclease domain or no nuclease domain at all. Exemplary amino acid sequences of suitable Cas9 domains and Cas9 fragments are provided herein, and additional suitable sequences of Cas9 domains and fragments will be apparent to those of skill in the art.

It should be appreciated that additional Cas9 proteins (e.g., a nuclease dead Cas9 (dCas9), a Cas9 nickase (nCas9), or a nuclease active Cas9), including variants and homologs thereof, are within the scope of this disclosure. Exemplary Cas9 proteins include, without limitation, those provided below. In some embodiments, the Cas9 protein is a nuclease dead Cas9 (dCas9). In some embodiments, the dCas9 comprises the amino acid sequence (SEQ ID NO: 10). In some embodiments, the Cas9 protein is a Cas9 nickase (nCas9).

In certain embodiments, the genome editors of the invention can include a catalytically inactive Cas9 (dCas9) having the following reference sequence:

(SEQ ID NO: 10)

DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALL

FDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEE

SFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI

YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASG

VDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF

DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILR

VNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNG

YAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGS

IPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNS

RFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKH

SLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVK

QLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENE

DILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSR

KLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSG

QGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARE

NQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQ

NGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSD

NVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKR

QLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDF

QFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKM

IAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEI

VWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARK

KDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSF

EKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNE

LALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEF

SKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYF

DTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD,

or an evolved variant thereof that has been evolved using the continuous evolution process (e.g., PACE) described herein, or modified to include a nickase mutation D10A or H840A.

In other embodiments, the genome editors can comprise a Cas9 nickase (nCas9) that comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO: 1, and may be an evolved version thereof.

In still other embodiments, the genome editors can comprise a catalytically active Cas9 that comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of SEQ ID NO: 9.

In some embodiments, a Cas moiety refers to a Cas9 or Cas9 homolog from archaea (e.g. nanoarchaea), which constitute a domain and kingdom of single-celled prokaryotic microbes. In some embodiments, Cas9 refers to CasX or CasY, which have been described in, for example, Burstein et al., “New CRISPR-Cas systems from uncultivated microbes.” Cell Res. 2017 Feb. 21. doi: 10.1038/cr.2017.21, the entire contents of which is hereby incorporated by reference. Using genome-resolved metagenomics, a number of CRISPR-Cas systems were identified, including the first reported Cas9 in the archaeal domain of life. This divergent Cas9 protein was found in little-studied nanoarchaea as part of an active CRISPR-Cas system. In bacteria, two previously unknown systems were discovered, CRISPR-CasX and CRISPR-CasY, which are among the most compact systems yet discovered. In some embodiments, Cas9 refers to CasX, or a variant of CasX. In some embodiments, Cas9 refers to a CasY, or a variant of CasY. It should be appreciated that other RNA-guided DNA binding proteins may be used as a nucleic acid programmable DNA binding protein (napDNAbp), and are within the scope of this disclosure.

In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) of any of the fusion proteins provided herein may be a CasX or CasY protein. In some embodiments, the napDNAbp is a CasX protein. In some embodiments, the napDNAbp is a CasY protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring CasX or CasY protein. In some embodiments, the napDNAbp is a naturally-occurring CasX or CasY protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a wild-type Cas moiety or any Cas moiety provided herein.

In various embodiments, the nucleic acid programmable DNA binding proteins include, without limitation, a Cas9 (e.g., dCas9 and nCas9), a Cpf1, a CasX, a CasY, a C2c1, a C2c2, a C2c3, a GeoCas9, a CjCas9, a Cas12a, a Cas12b, a Cas12g, a Cas12h, a Cas12i, a Cas13b, a Cas13c, a Cas13d, a Cas14, a Csn2, an xCas9, an SpCas9-NG, a circularly permuted Cas9, or an Argonaute (Ago) domain. One example of a nucleic acid programmable DNA-binding protein that has different PAM specificity than Cas9 is Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 (Cpf1). Similar to Cas9, Cpf1 is also a class 2 CRISPR effector. It has been shown that Cpf1 mediates robust DNA interference with features distinct from Cas9. Cpf1 is a single RNA-guided endonuclease lacking tracrRNA, and it utilizes a T-rich protospacer-adjacent motif (TTN, TTTN, or YTN). Moreover, Cpf1 cleaves DNA via a staggered DNA double-stranded break. Out of 16 Cpf1-family proteins, two enzymes from Acidaminococcus and Lachnospiraceae are shown to have efficient genome-editing activity in human cells. Cpf1 proteins are known in the art and have been described previously, for example in Yamano et al., “Crystal structure of Cpf1 in complex with guide RNA and target DNA.” Cell (165) 2016, p. 949-962; the entire contents of which are hereby incorporated by reference.

Also useful in the present compositions and methods are nuclease-inactive Cpf1 (dCpf1) variants that may be used as a guide nucleotide sequence-programmable DNA-binding protein domain. The Cpf1 protein has a RuvC-like endonuclease domain that is similar to the RuvC domain of Cas9 but does not have an HNH endonuclease domain, and the N-terminus of Cpf1 does not have the alpha-helical recognition lobe of Cas9. It was shown in Zetsche et al., Cell, 163, 759-771, 2015 (which is incorporated herein by reference) that the RuvC-like domain of Cpf1 is responsible for cleaving both DNA strands and that inactivation of the RuvC-like domain inactivates Cpf1 nuclease activity.

In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) is a nucleic acid programmable DNA binding protein that does not require a canonical (NGG) PAM sequence. In some embodiments, the napDNAbp is an argonaute protein. One example of such a nucleic acid programmable DNA binding protein is an Argonaute protein from Natronobacterium gregoryi (NgAgo). NgAgo is a ssDNA-guided endonuclease. NgAgo binds 5′ phosphorylated ssDNA of ˜24 nucleotides (gDNA) to guide it to its target site and will make DNA double-strand breaks at the gDNA site. In contrast to Cas9, the NgAgo—gDNA system does not require a protospacer-adjacent motif (PAM). Using a nuclease inactive NgAgo (dNgAgo) can greatly expand the bases that may be targeted. The characterization and use of NgAgo have been described in Gao et al., Nat Biotechnol., 2016 July; 34(7):768-73. PubMed PMID: 27136078; Swarts et al., Nature. 507(7491) (2014):258-61; and Swarts et al., Nucleic Acids Res. 43(10) (2015):5120-9, each of which is incorporated herein by reference.

In some embodiments, the napDNAbp is a prokaryotic homolog of an Argonaute protein. Prokaryotic homologs of Argonaute proteins are known and have been described, for example, in Makarova K., et al., “Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements”, Biol Direct. 2009 Aug. 25; 4:29. doi: 10.1186/1745-6150-4-29, the entire contents of which are hereby incorporated by reference. In some embodiments, the napDNAbp is a Marinitoga piezophila Argonaute (MpAgo) protein. The CRISPR-associated Marinitoga piezophila Argonaute (MpAgo) protein cleaves single-stranded target sequences using 5′-hydroxylated guides rather than the 5′-phosphorylated guides used by other known Argonautes. The crystal structure of an MpAgo-RNA complex shows a guide strand binding site comprising residues that block 5′ phosphate interactions. These data suggest the evolution of an Argonaute subclass with noncanonical specificity for a 5′-hydroxylated guide. See, e.g., Kaya et al., “A bacterial Argonaute with noncanonical guide RNA specificity”, Proc Natl Acad Sci USA. 2016 Apr. 12; 113(15):4057-62, the entire contents of which are hereby incorporated by reference. It should be appreciated that other Argonaute proteins may be used, and are within the scope of this disclosure.

In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) is a single effector of a microbial CRISPR-Cas system. Single effectors of microbial CRISPR-Cas systems include, without limitation, Cas9, Cpf1, C2c1, C2c2, and C2c3. Typically, microbial CRISPR-Cas systems are divided into Class 1 and Class 2 systems. Class 1 systems have multisubunit effector complexes, while Class 2 systems have a single protein effector. For example, Cas9 and Cpf1 are Class 2 effectors. In addition to Cas9 and Cpf1, three distinct Class 2 CRISPR-Cas systems (C2c1, C2c2, and C2c3) have been described by Shmakov et al., “Discovery and Functional Characterization of Diverse Class 2 CRISPR Cas Systems”, Mol. Cell, 2015 Nov. 5; 60(3): 385-397, the entire contents of which are hereby incorporated by reference. Effectors of two of the systems, C2c1 and C2c3, contain RuvC-like endonuclease domains related to Cpf1. A third system, C2c2, contains an effector with two predicted HEPN RNase domains. Production of mature CRISPR RNA is tracrRNA-independent, unlike production of CRISPR RNA by C2c1. C2c1 depends on both CRISPR RNA and tracrRNA for DNA cleavage. Bacterial C2c2 has been shown to possess a unique RNase activity for CRISPR RNA maturation distinct from its RNA-activated single-stranded RNA degradation activity. These RNase functions are different from each other and from the CRISPR RNA-processing behavior of Cpf1. See, e.g., East-Seletsky, et al., “Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection”, Nature, 2016 Oct. 13; 538(7624):270-273, the entire contents of which are hereby incorporated by reference. In vitro biochemical analysis of C2c2 in Leptotrichia shahii has shown that C2c2 is guided by a single CRISPR RNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. Catalytic residues in the two conserved HEPN domains mediate cleavage. Mutations in the catalytic residues generate catalytically inactive RNA-binding proteins. See, e.g., Abudayyeh et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector”, Science, 2016 Aug. 5; 353(6299), the entire contents of which are hereby incorporated by reference.

The crystal structure of Alicyclobacillus acidoterrestris C2c1 (AacC2c1) has been reported in complex with a chimeric single-molecule guide RNA (sgRNA). See, e.g., Liu et al., “C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism”, Mol. Cell, 2017 Jan. 19; 65(2):310-322, the entire contents of which are hereby incorporated by reference. The crystal structure has also been reported for Alicyclobacillus acidoterrestris C2c1 bound to target DNAs as ternary complexes. See, e.g., Yang et al., “PAM-dependent Target DNA Recognition and Cleavage by C2C1 CRISPR-Cas endonuclease”, Cell, 2016 Dec. 15; 167(7):1814-1828, the entire contents of which are hereby incorporated by reference. Catalytically competent conformations of AacC2c1, both with target and non-target DNA strands, have been captured independently positioned within a single RuvC catalytic pocket, with C2c1-mediated cleavage resulting in a staggered seven-nucleotide break of target DNA. Structural comparisons between C2c1 ternary complexes and previously identified Cas9 and Cpf1 counterparts demonstrate the diversity of mechanisms used by CRISPR-Cas9 systems.

In some embodiments, the nucleic acid programmable DNA binding protein (napDNAbp) of any of the fusion proteins provided herein may be a C2c1, a C2c2, or a C2c3 protein. In some embodiments, the napDNAbp is a C2c1 protein. In some embodiments, the napDNAbp is a C2c2 protein. In some embodiments, the napDNAbp is a C2c3 protein. In some embodiments, the napDNAbp comprises an amino acid sequence that is at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to a naturally-occurring C2c1, C2c2, or C2c3 protein. In some embodiments, the napDNAbp is a naturally-occurring C2c1, C2c2, or C2c3 protein.

Some aspects of the disclosure provide Cas9 domains that have different PAM specificities. Typically, Cas9 proteins, such as Cas9 from S. pyogenes (spCas9), require a canonical NGG PAM sequence to bind a particular nucleic acid region. This may limit the ability to edit desired bases within a genome. In some embodiments, the base editing fusion proteins provided herein may need to be placed at a precise location, for example where a target base is placed within a 4 base region (e.g., an “editing window”), which is approximately 15 bases upstream of the PAM. See Komor, A. C., et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage” Nature 533, 420-424 (2016), the entire contents of which are hereby incorporated by reference. Accordingly, in some embodiments, any of the fusion proteins provided herein may contain a Cas9 domain that is capable of binding a nucleotide sequence that does not contain a canonical (e.g., NGG) PAM sequence. Cas9 domains that bind to non-canonical PAM sequences have been described in the art and would be apparent to the skilled artisan. For example, Cas9 domains that bind non-canonical PAM sequences have been described in Kleinstiver, B. P., et al., “Engineered CRISPR-Cas9 nucleases with altered PAM specificities” Nature 523, 481-485 (2015); and Kleinstiver, B. P., et al., “Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition” Nature Biotechnology 33, 1293-1298 (2015); the entire contents of each of which are hereby incorporated by reference.
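To illustrate how PAM requirements constrain the positions that can be targeted, the following sketch scans one strand of a DNA sequence for occurrences of a PAM written in IUPAC notation, such as the canonical NGG or the T-rich TTTN. The function name `find_pam_sites`, the abbreviated IUPAC table, and the example sequence are hypothetical and supplied for illustration only; the sketch searches the given strand alone and does not model protospacer placement or the editing window.

```python
import re

# Subset of IUPAC nucleotide codes sufficient for the PAMs named herein.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any base
    "Y": "[CT]",    # pyrimidine
    "R": "[AG]",    # purine
}


def find_pam_sites(dna: str, pam: str = "NGG") -> list:
    """Return 0-based start positions of every PAM match on the given strand.

    A zero-width lookahead is used so that overlapping matches are reported.
    """
    pattern = "".join(IUPAC[base] for base in pam.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna.upper())]


example = "ATGCGGTTTACCGGAGG"  # made-up sequence for illustration
ngg_sites = find_pam_sites(example, "NGG")
tttn_sites = find_pam_sites(example, "TTTN")
print(ngg_sites, tttn_sites)
```

A sequence with few NGG sites on either strand offers few positions for spCas9, which is the practical motivation for Cas9 domains with altered PAM specificity; a fuller search would also scan the reverse complement.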

In some embodiments, the Cas9 domain is a Cas9 domain from Staphylococcus aureus (SaCas9). In some embodiments, the SaCas9 domain is a nuclease active SaCas9, a nuclease inactive SaCas9 (SaCas9d), or a SaCas9 nickase (SaCas9n).

(B) Single-Stranded DNA Binding Protein (e.g., Rad51)

In various embodiments, the improved HDR-dependent editing constructs comprise a Cas9 domain (e.g., a nickase Cas9) fused to a single-stranded DNA binding protein (e.g., Rad51).

In various embodiments, the single-stranded DNA binding protein is Rad51, or a homolog or functional variant thereof. As will be understood, Rad51 is involved in the repair of DNA double-strand breaks and also plays important roles in recombination repair and various SOS responses to DNA damage by γ-irradiation and alkylating reagents. Rad51 also plays a role in several cellular processes, including genomic integrity, cell cycle regulation, apoptosis, and tumor formation. Rad51 is widely conserved among organisms, including human, Rhesus monkey, dog, cow, mouse, rat, chicken, zebrafish, fruit fly, mosquito, C. elegans, S. cerevisiae, K. lactis, E. gossypii, S. pombe, M. oryzae, N. crassa, A. thaliana, rice, and frog. Rad51 equivalently may refer to an accepted alias recognized in the art, including BRCC5, FANCR, HRAD51, HsRad51, HsT16930, MRMV2A, and RecA.

The following Rad51 amino acid sequences are contemplated by the instantly disclosed improved HDR-dependent editing constructs:

NP_001157741.1 (human Rad51, isoform 1)

(SEQ ID NO: 13)

MAMQMQLEANADTSVEEESFGPQPISRLEQCGINANDVKKLEEAGFHTVEA

VAYAPKKELINIKGISEAKADKILTESRSVARLECNSVILVYCTLRLSGSS

DSPASASRVVGTTGGIETGSITEMFGEFRTGKTQICHTLAVTCQLPIDRGG

GEGKAMYIDTEGTFRPERLLAVAERYGLSGSDVLDNVAYARAFNTDHQTQL

LYQASAMMVESRYALLIVDSATALYRTDYSGRGELSARQMHLARFLRMLLR

LADEFGVAVVITNQVVAQVDGAAMFAADPKKPIGGNIIAHASTTRLYLRKG

RGETRICKIYDSPCLPEAEAMFAINADGVGDAKD

NP_001157742 (human Rad51, isoform 3)

(SEQ ID NO: 14)

MAMQMQLEANADTSVEEESFGPQPISRLEQCGINANDVKKLEEAGFHTVEA

VAYAPKKELINIKGISEAKADKILAEAAKLVPMGFTTATEFHQRRSEIIQI

TTGSKELDKLLQGGIETGSITEMFGEFRTGKTQICHTLAVTCQLPIDRGGG

EGKAMYIDTEGTFRPERLLAVAERYGLSGSDVLDNVAYARAFNTDHQTQLL

YQASAMMVESRYALLIVDSATALYRTDYSGRGELSARQMHLARFLRMLLRL

ADEIVSEERKRGNQNLQNLRLSLSS

XP_003314669 (chimpanzee Rad51)

(SEQ ID NO: 15)

MAMQMQLEANADTSVEEESFGPQPISRLEQCGINANDVKKLEEAGFHTVEA

VAYAPKKELINIKGISEAKADKILAEAAKLVPMGFTTATEFHQRRSEIIQI

TTGSKELDKLLQGGIETGSITEMFGEFRTGKTQICHTLAVTCQLPIDRGGG

EGKAMYIDTEGTFRPERLLAVAERYGLSGSDVLDNVAYARAFNTDHQTQLL

YQASAMMVESRYALLIVDSATALYRTDYSGRGELSARQMHLARFLRMLLRL

ADEIVSEERKRGNQNLQNLRLSLSS

NP_033040 (mouse Rad51)

(SEQ ID NO: 16)

MSSKKLRRVGLSPELCDRLSRYQIVNCQHFLSLSPLELMKVTGLSYRGVHE

LLHTVSKACAPQMQTAYELKTRRSAHLSPAFLSTTLCALDEALHGGVPCGS

LTEITGPPGCGKTQFCIMMSVLATLPTSLGGLEGAVVYIDTESAFTAERLV

EIAESRFPQYFNTEEKLLLTSSRVHLCRELTCEGLLQRLESLEEEIISKGV

KLVIVDSIASVVRKEFDPKLQGNIKERNKFLGKGASLLKYLAGEFSIPVIL

TNQITTHLSGALPSQADLVSPADDLSLSEGTSGSSCLVAALGNTWGHCVNT

RLILQYLDSERRQILIAKSPLAAFTSFVYTIKGEGLVLQGHERP

NP_001075493 (rabbit Rad51)

(SEQ ID NO: 17)

MAMQMQLEANADTSVEEESFGPQPVSRLEQCGINANDVKKLEEAGFHTEEA

VAYAPKKELINIKGISEAKADKILTEAAKLVPMGFTTATEFHQRRSEIIQI

TTGSKELDKLLQGGIETGSITEMFGEFRTGKTQICHTLAVTCQLPIDRGGG

EGKAMYIDTEGTFRPERLLAVAERYGLSGSDVLDNVAYARGFNTDHQTQLL

YQASAMMVESRYALLIVDSATALYRTDYSGRGELSARQMHLARFLRMLLRL

ADEFGVTVVITNQVVAQVDGAAMFAADPKKPIGGNIIAHASTTRLYLRKGR

GETRICKIYDSPCLPEAEAMFAINADGVGDAKD

XP_026671085 (bee Rad51)

(SEQ ID NO: 18)

MSALPPTVQLEENEEVEEQNHVRLIKTLEGNGITAGDIKKLEEAGYYTIES

VAYATKKCLLTIKGISEAKADKILQEASKLVVMGFKSANEIHQIRLNIVFI

STGSTEMDRLLGGGIETGSITEIFGEFASGKTQICHTLAVNCQLPIDMGGA

EGKCLYIDTEGTFRPERLIAVAERYKIKGDSVLDNVACARAYNTDHQTKLV

VQASAMMAESRYALLIVDSATGLYRSEYNGRGELAARQTHLNKFLRMLLRV

ADEHGVAVVITNQVVAQVDGAASMFGGDTKKPIGGHILAHSSTTRLYLRKG

RRETRICKIYDSPCLPESEAMFAINGDGIGDVKE.

In various embodiments, the single-stranded DNA binding protein comprises an amino acid sequence that is at least 80%, 85%, 90%, 95%, 98%, or 99% identical to the amino acid sequence of any one of SEQ ID NOs: 13-18 provided above. In certain aspects, a variant of the single-stranded DNA binding protein (e.g., hRad51) is produced by evolving a starting-point single-stranded DNA binding protein (e.g., hRad51) using a directed evolution methodology. In certain embodiments, the directed evolution methodology comprises phage assisted continuous evolution (PACE).
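
By way of non-limiting illustration, a percent identity of the kind recited above can be estimated with a simple global alignment. The following Python sketch is a minimal Needleman-Wunsch implementation; the scoring values (match +1, mismatch -1, gap -1) and the choice to divide the number of identical aligned positions by the alignment length are assumptions for illustration, and established alignment tools may use different scoring matrices and identity conventions. The two fragments compared are the first 51 residues of SEQ ID NO: 13 and SEQ ID NO: 17 as printed above.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring substitutions.
    al_a, al_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            al_a.append(a[i - 1]); al_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            al_a.append(a[i - 1]); al_b.append("-"); i -= 1
        else:
            al_a.append("-"); al_b.append(b[j - 1]); j -= 1
    return "".join(reversed(al_a)), "".join(reversed(al_b))

def percent_identity(a, b):
    """Identical aligned positions divided by alignment length, as a percentage."""
    al_a, al_b = global_align(a, b)
    if not al_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)

# First 51 residues of SEQ ID NO: 13 (human) and SEQ ID NO: 17 (rabbit).
HUMAN_FRAGMENT = "MAMQMQLEANADTSVEEESFGPQPISRLEQCGINANDVKKLEEAGFHTVEA"
RABBIT_FRAGMENT = "MAMQMQLEANADTSVEEESFGPQPVSRLEQCGINANDVKKLEEAGFHTEEA"

if __name__ == "__main__":
    print(f"{percent_identity(HUMAN_FRAGMENT, RABBIT_FRAGMENT):.1f}% identical")
```

The two fragments differ at two of 51 positions, so under these assumptions a candidate variant with this level of divergence would satisfy the "at least 95%" threshold but not the "at least 98%" threshold.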

In various embodiments, the additional genome editor functional domain (or nucleobase modification moieties) may be any protein, enzyme, or polypeptide (or functional fragment thereof) which is capable of modifying a DNA or RNA molecule. Nucleobase modification moieties can be naturally occurring, or can be recombinant. For example, a nucleobase modification moiety can include one or more DNA repair enzymes, for example, an enzyme or protein involved in base excision repair (BER), nucleotide excision repair (NER), homology-dependent recombinational repair (HR), non-homologous end-joining repair (NHEJ), microhomology end-joining repair (MMEJ), mismatch repair (MMR), direct reversal repair, or other known DNA repair pathway. A nucleobase modification moiety can have one or more types of enzymatic activities, including, but not limited to, endonuclease activity, polymerase activity, ligase activity, replication activity, and proofreading activity. Nucleobase modification moieties can also include DNA or RNA-modifying enzymes and/or mutagenic enzymes which covalently modify nucleobases, leading in some cases to mutagenic corrections by way of normal cellular DNA repair and replication processes. The “nucleic acid effector domain” (e.g., a DNA effector domain or an RNA effector domain) as used herein may also refer to a protein or enzyme capable of making one or more modifications (e.g., alkylation of a guanine residue) to a nucleic acid (e.g., DNA or RNA). Exemplary nucleic acid editing domains include, but are not limited to, a nuclease, a nickase, a recombinase, a methylase, an acetylase, an acetyltransferase, a transcriptional activator, or a transcriptional repressor domain.

Some exemplary suitable nucleic-acid editing domains that can be fused to Cas9 domains or to the double-stranded DNA binding proteins according to aspects of this disclosure are provided below. It should be understood that, in some embodiments, the active domain of the respective sequence can be used, e.g., the domain without a localizing signal (e.g., without a nuclear localization sequence, a nuclear export signal, or a cytoplasmic localizing signal).

In various embodiments, the improved HDR-dependent genome editors disclosed herein further comprise one or more, preferably at least two, nuclear localization signals. In a preferred embodiment, the genome editors comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same NLSs or they can be different NLSs. In addition, the NLSs may be expressed as part of a fusion protein with the remaining portions of the genome editors. The location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a genome editor (e.g., inserted between the encoded napDNAbp component (e.g., Cas9) and a DNA effector moiety (e.g., a single-stranded DNA binding protein)).

The NLSs may be any known NLS sequence in the art. The NLSs may also be any future-discovered NLSs for nuclear localization. The NLSs also may be any naturally-occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).

A nuclear localization signal or sequence (NLS) is an amino acid sequence that tags, designates, or otherwise marks a protein for import into the cell nucleus by nuclear transport. Typically, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of a nuclear export signal (NES), which targets proteins out of the nucleus. A nuclear localization signal can also target the exterior surface of a cell. Thus, a single nuclear localization signal can direct the entity with which it is associated to the exterior of a cell and to the nucleus of a cell. Such sequences can be of any size and composition, for example more than 25, 25, 15, 12, 10, 8, 7, 6, 5 or 4 amino acids, but will preferably comprise at least a four to eight amino acid sequence known to function as a nuclear localization signal (NLS).

The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed Nov. 23, 2000, published as WO 2001/038547, on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 19), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 20), KRTADGSEFESPKKKRKV (SEQ ID NO: 21), or KRTADGSEFEPKKKRKV (SEQ ID NO: 22).

In one aspect of the invention, an improved HDR-dependent genome editor described herein may be modified with one or more nuclear localization signals (NLS), preferably at least two NLSs. In preferred embodiments, the genome editors are modified with two or more NLSs. The invention contemplates the use of any nuclear localization signal known in the art at the time of the invention, or any nuclear localization signal that is identified or otherwise made available in the state of the art after the time of the instant filing. A representative nuclear localization signal is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization signals often comprise proline residues. A variety of nuclear localization signals have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7442-46; Moede et al., (1999) FEBS Lett. 461:229-34, each of which is incorporated by reference. Translocation is currently thought to involve nuclear pore proteins.

Most NLSs can be classified in three general groups: (i) a monopartite NLS exemplified by the SV40 large T antigen NLS (PKKKRKV; SEQ ID NO: 19); (ii) a bipartite motif consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXXKKKL; SEQ ID NO: 11); and (iii) noncanonical sequences such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991).
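
By way of non-limiting illustration, the first two NLS classes can be located in a protein sequence by simple pattern matching. The Python sketch below searches for the exact SV40 monopartite sequence and for the bipartite pattern as written above, with X treated as any residue and the spacer fixed at the ten residues shown in the exemplar. The function name, the fixed spacer length, and the second test sequence are assumptions for illustration; noncanonical NLSs are not amenable to this kind of search and are not handled.

```python
import re

SV40_MONOPARTITE = "PKKKRKV"                      # SEQ ID NO: 19
# Bipartite exemplar KRXXXXXXXXXXKKKL: "KR", ten spacer residues, then "KKKL".
BIPARTITE_PATTERN = re.compile(r"KR[A-Z]{10}KKKL")

def find_nls(protein):
    """Return a list of (nls_class, start_index, matched_sequence), sorted by position."""
    protein = protein.upper()
    hits = []
    for m in re.finditer(SV40_MONOPARTITE, protein):
        hits.append(("monopartite", m.start(), m.group()))
    for m in BIPARTITE_PATTERN.finditer(protein):
        hits.append(("bipartite", m.start(), m.group()))
    return sorted(hits, key=lambda hit: hit[1])

if __name__ == "__main__":
    # SEQ ID NO: 21 from this disclosure ends with the SV40 monopartite NLS.
    print(find_nls("KRTADGSEFESPKKKRKV"))
    # Hypothetical toy sequence carrying a bipartite motif, for illustration only.
    print(find_nls("MAKRPAATKKAGQAKKKLGS"))
```

A search of this kind can be used to confirm that an NLS remains intact after it is fused to, or inserted within, a genome editor sequence.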

Nuclear localization signals appear at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, the C-terminus, and in the central region of proteins. Thus, the specification provides genome editors that may be modified with one or more NLSs at the C-terminus, the N-terminus, as well as at an internal region of the genome editor. The residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example ionically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice, such a sequence can be functionally limited in length and composition.

The present disclosure contemplates any suitable means by which to modify a genome editor to include one or more NLSs. In one aspect, the genome editors can be engineered to express a genome editor protein that is translationally fused at its N-terminus or its C-terminus (or both) to one or more NLSs, i.e., to form a genome editor-NLS fusion construct. In other embodiments, the genome editor-encoding nucleotide sequence can be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded genome editor. In addition, the NLSs may include various amino acid linkers or spacer regions encoded between the genome editor and the N-terminally, C-terminally, or internally-attached NLS amino acid sequence (e.g., an NLS attached in the central region of the protein). Thus, the present disclosure also provides for nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise a genome editor and one or more NLSs.

The improved HDR-dependent genome editors described herein may also comprise nuclear localization signals which are linked to a genome editor through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element. The linkers within the contemplated scope of the disclosure are not intended to have any limitations and can be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker moiety) and be joined to the genome editor by any suitable strategy that effectuates forming a bond (e.g., covalent linkage, hydrogen bonding) between the genome editor and the one or more NLSs.

The improved HDR-dependent genome editors described herein also may include one or more additional functionalities. In certain embodiments, the additional functionalities may include an effector of base repair.

In certain embodiments, the improved HDR-dependent genome editors described herein may comprise an inhibitor of base repair. The term “inhibitor of base repair” or “IBR” refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example a base excision repair enzyme. In some embodiments, the IBR is an inhibitor of OGG base excision repair. In some embodiments, the IBR is an inhibitor of alkylation lesion repair enzyme (“ALRE”) base excision repair. Exemplary inhibitors of base repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 EndoI, T4PDG, UDG, hSMUG1, and hAAG. In some embodiments, the IBR is an inhibitor of Endo V or hAAG. In some embodiments, the IBR is a catalytically inactive EndoV or a catalytically inactive hAAG.

In some embodiments, the improved HDR-dependent genome editors described herein may comprise one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the genome editor components). A genome editor may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Examples of protein domains that may be fused to a genome editor or component thereof (e.g., the napDNAbp moiety, the nucleic acid effector moiety, or the NLS moiety) include, without limitation, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, and nucleic acid binding activity. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). A genome editor may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including but not limited to maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. Additional domains that may form part of a fusion protein comprising a genome editor are described in US20110059502, incorporated herein by reference.
In some embodiments, a tagged genome editor is used to identify the location of a target sequence.

In an aspect of the invention, a reporter gene, which includes but is not limited to glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP), may be introduced into a cell to encode a gene product which serves as a marker by which to measure the alteration or modification of expression of the gene product. In a further embodiment of the invention, the DNA molecule encoding the gene product may be introduced into the cell via a vector. In a preferred embodiment of the invention, the gene product is luciferase. In a further embodiment of the invention, the expression of the gene product is decreased.

Other exemplary features that may be present are localization sequences, such as cytoplasmic localization sequences, export sequences, such as nuclear export sequences, or other localization sequences, as well as sequence tags that are useful for solubilization, purification, or detection of the fusion proteins. Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags, also referred to as histidine tags or His-tags, maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Additional suitable sequences will be apparent to those of skill in the art. In some embodiments, the fusion protein comprises one or more His tags.

(D) Intein Domains

In various embodiments described herein, the improved HDR-dependent genome editors, or any particular component thereof, can be engineered with one or more inteins or split-inteins, which can be utilized for a variety of purposes, such as inactivation/activation states of the fusion protein, or as part of the directed evolution process for evolving the genome editors or one or more components thereof (e.g., a Cas9 or a Rad51). For example, the continuous evolution methods (e.g., PACE) may be used to evolve a first portion of a genome editor. A first portion could include a single component or domain, e.g., a Cas9 domain, a Rad51 domain, or an additional nucleic acid effector domain. The separately evolved component or domain can then be fused to the remaining portions of the genome editor within a cell by separately expressing both the evolved portion and the remaining non-evolved portions with split-intein polypeptide domains. The first portion could more broadly include any first amino acid portion of a genome editor that is desired to be evolved using a continuous evolution method described herein. The second portion would in this embodiment refer to the remaining amino acid portion of the genome editor that is not evolved using the methods described herein. The evolved first portion and the second portion of the genome editor could each be expressed with split-intein polypeptide domains in a cell. The natural protein splicing mechanisms of the cell would reassemble the evolved first portion and the non-evolved second portion to form a single fusion protein evolved genome editor. The evolved first portion may comprise either the N- or C-terminal part of the single fusion protein. In an analogous manner, use of a second orthogonal trans-splicing intein pair could allow the evolved first portion to comprise an internal part of the single fusion protein.

Thus, any of the evolved and non-evolved components of the genome editors herein described may be expressed with split-intein tags in order to facilitate the formation of a complete genome editor comprising the evolved and non-evolved component within a cell.

The mechanism of the protein splicing process has been studied in great detail (Chong, et al., J. Biol. Chem. 1996, 271, 22159-22168; Xu, M-Q & Perler, F. B. EMBO Journal, 1996, 15, 5146-5153) and conserved amino acids have been found at the intein and extein splicing points (Xu, et al., EMBO Journal, 1994, 13, 5517-5522). The constructs described herein contain an intein sequence fused to the 5′-terminus of the first gene (e.g., the evolved portion of the genome editor). Suitable intein sequences can be selected from any of the proteins known to contain protein splicing elements. A database containing all known inteins can be found on the World Wide Web (Perler, F. B. Nucleic Acids Research, 1999, 27, 346-347). The intein sequence is fused at the 3′ end to the 5′ end of a second gene. For targeting of this gene to a certain organelle, a peptide signal can be fused to the coding sequence of the gene. After the second gene, the intein-gene sequence can be repeated as often as desired for expression of multiple proteins in the same cell. For multi-intein containing constructs, it may be useful to use intein elements from different sources. After the sequence of the last gene to be expressed, a transcription termination sequence must be inserted. In one embodiment, a modified intein splicing unit is designed so that it can both catalyze excision of the exteins from the inteins as well as prevent ligation of the exteins. Mutagenesis of the C-terminal extein junction in the Pyrococcus species GB-D DNA polymerase was found to produce an altered splicing element that induces cleavage of exteins and inteins but prevents subsequent ligation of the exteins (Xu, M-Q & Perler, F. B. EMBO Journal, 1996, 15, 5146-5153). Mutation of serine 538 to either an alanine or glycine induced cleavage but prevented ligation.
Mutation of equivalent residues in other intein splicing units should also prevent extein ligation due to the conservation of amino acids at the C-terminal extein junction to the intein. A preferred intein not containing an endonuclease domain is the Mycobacterium xenopi GyrA protein (Telenti, et al. J. Bacteriol. 1997, 179, 6378-6382). Others have been found in nature or have been created artificially by removing the endonuclease domains from endonuclease-containing inteins (Chong, et al. J. Biol. Chem. 1997, 272, 15587-15590). In a preferred embodiment, the intein is selected so that it consists of the minimal number of amino acids needed to perform the splicing function, such as the intein from the Mycobacterium xenopi GyrA protein (Telenti, A., et al., J. Bacteriol. 1997, 179, 6378-6382). In an alternative embodiment, an intein without endonuclease activity is selected, such as the intein from the Mycobacterium xenopi GyrA protein or the Saccharomyces cerevisiae VMA intein that has been modified to remove endonuclease domains (Chong, 1997). Further modification of the intein splicing unit may allow the reaction rate of the cleavage reaction to be altered, allowing protein dosage to be controlled by simply modifying the gene sequence of the splicing unit.

Inteins can also exist as two fragments encoded by two separately transcribed and translated genes. These so-called split inteins self-associate and catalyze protein-splicing activity in trans. Split inteins have been identified in diverse cyanobacteria and archaea (Caspi et al., Mol Microbiol. 50:1569-1577 (2003); Choi J. et al., J Mol Biol. 356:1093-1106 (2006); Dassa B. et al., Biochemistry. 46:322-330 (2007); Liu X. and Yang J., J Biol Chem. 278:26315-26318 (2003); Wu H. et al., Proc Natl Acad Sci USA. 95:9226-9231 (1998); and Zettler J. et al., FEBS Letters. 583:909-914 (2009)), but have not been found in eukaryotes thus far. Recently, a bioinformatic analysis of environmental metagenomic data revealed 26 different loci with a novel genomic arrangement. At each locus, a conserved enzyme coding region is interrupted by a split intein, with a freestanding endonuclease gene inserted between the sections coding for intein subdomains. Among them, five loci were completely assembled: DNA helicases (gp41-1, gp41-8); Inosine-5′-monophosphate dehydrogenase (IMPDH-1); and Ribonucleotide reductase catalytic subunits (NrdA-2 and NrdJ-1). This fractured gene organization appears to be present mainly in phages (Dassa et al., Nucleic Acids Research. 37:2560-2573 (2009)).

The split intein Npu DnaE was characterized as having the highest rate reported for the protein trans-splicing reaction. In addition, the Npu DnaE protein splicing reaction is considered robust and high-yielding with respect to different extein sequences, temperatures from 6 to 37° C., and the presence of up to 6 M urea (Zettler J. et al., FEBS Letters. 583:909-914 (2009); Iwai I. et al., FEBS Letters 580:1853-1858 (2006)). As expected, when the Cys1Ala mutation at the N-domain of these inteins was introduced, the initial N-to-S acyl shift, and therefore protein splicing, was blocked. Unfortunately, the C-terminal cleavage reaction was also almost completely inhibited. The dependence of the asparagine cyclization at the C-terminal splice junction on the acyl shift at the N-terminal scissile peptide bond seems to be a unique property common to the naturally split DnaE intein alleles (Zettler J. et al., FEBS Letters. 583:909-914 (2009)).

The mechanism of protein splicing typically has four steps: 1) an N—S or N—O acyl shift at the intein N-terminus, which breaks the upstream peptide bond and forms an ester bond between the N-extein and the side chain of the intein's first amino acid (Cys or Ser); 2) a transesterification relocating the N-extein to the intein C-terminus, forming a new ester bond linking the N-extein to the side chain of the C-extein's first amino acid (Cys, Ser, or Thr); 3) Asn cyclization breaking the peptide bond between the intein and the C-extein; and 4) a S—N or O—N acyl shift that replaces the ester bond with a peptide bond between the N-extein and C-extein.

Protein trans-splicing, catalyzed by split inteins, provides an entirely enzymatic method for protein ligation. A split intein is essentially a contiguous intein (e.g., a mini-intein) split into two pieces named N-intein and C-intein, respectively. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction in essentially the same way as a contiguous intein does. Split inteins have been found in nature and also engineered in laboratories. As used herein, the term “split intein” refers to any intein in which one or more peptide bond breaks exist between the N-terminal and C-terminal amino acid sequences such that the N-terminal and C-terminal sequences become separate molecules that can non-covalently reassociate, or reconstitute, into an intein that is functional for trans-splicing reactions. Any catalytically active intein, or fragment thereof, may be used to derive a split intein for use in the methods of the invention. For example, in one aspect the split intein may be derived from a eukaryotic intein. In another aspect, the split intein may be derived from a bacterial intein. In another aspect, the split intein may be derived from an archaeal intein. Preferably, the split intein so-derived will possess only the amino acid sequences essential for catalyzing trans-splicing reactions.

As used herein, the “N-terminal split intein (In)” refers to any intein sequence that comprises an N-terminal amino acid sequence that is functional for trans-splicing reactions. An In thus also comprises a sequence that is spliced out when trans-splicing occurs. An In can comprise a sequence that is a modification of the N-terminal portion of a naturally occurring intein sequence. For example, an In can comprise additional amino acid residues and/or mutated residues so long as the inclusion of such additional and/or mutated residues does not render the In non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the In.

As used herein, the “C-terminal split intein (Ic)” refers to any intein sequence that comprises a C-terminal amino acid sequence that is functional for trans-splicing reactions. In one aspect, the Ic comprises 4 to 7 contiguous amino acid residues, at least 4 amino acids of which are from the last β-strand of the intein from which it was derived. An Ic thus also comprises a sequence that is spliced out when trans-splicing occurs. An Ic can comprise a sequence that is a modification of the C-terminal portion of a naturally occurring intein sequence. For example, an Ic can comprise additional amino acid residues and/or mutated residues so long as the inclusion of such additional and/or mutated residues does not render the Ic non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the Ic.

In some embodiments of the invention, a peptide linked to an Ic or an In can comprise an additional chemical moiety including, among others, fluorescence groups, biotin, polyethylene glycol (PEG), amino acid analogs, unnatural amino acids, phosphate groups, glycosyl groups, radioisotope labels, and pharmaceutical molecules. In other embodiments, a peptide linked to an Ic can comprise one or more chemically reactive groups including, among others, ketone, aldehyde, Cys residues and Lys residues. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction when an “intein-splicing polypeptide (ISP)” is present. As used herein, “intein-splicing polypeptide (ISP)” refers to the portion of the amino acid sequence of a split intein that remains when the Ic, In, or both, are removed from the split intein. In certain embodiments, the In comprises the ISP. In another embodiment, the Ic comprises the ISP. In yet another embodiment, the ISP is a separate peptide that is not covalently linked to In or to Ic.

Split inteins may be created from contiguous inteins by engineering one or more split sites in the unstructured loop or intervening amino acid sequence between the ~12 conserved beta-strands found in the structure of mini-inteins. Some flexibility in the position of the split site within regions between the beta-strands may exist, provided that creation of the split will not disrupt the structure of the intein, the structured beta-strands in particular, to a sufficient degree that protein splicing activity is lost.

In protein trans-splicing, one precursor protein consists of an N-extein part followed by the N-intein, another precursor protein consists of the C-intein followed by a C-extein part, and a trans-splicing reaction (catalyzed by the N- and C-inteins together) excises the two intein sequences and links the two extein sequences with a peptide bond. Protein trans-splicing, being an enzymatic reaction, can work with very low (e.g., micromolar) concentrations of proteins and can be carried out under physiological conditions.
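
By way of non-limiting illustration, the net outcome of protein trans-splicing can be modeled at the sequence level as a simple string operation: the N-intein is removed from the end of the first precursor, the C-intein is removed from the start of the second precursor, and the two exteins are joined. The Python sketch below does only that bookkeeping and does not model reaction chemistry or kinetics. The function name and all sequences are hypothetical placeholders rather than real intein or genome editor sequences, and the residue checks reflect the typical four-step mechanism described above (an intein beginning with Cys or Ser and a C-extein beginning with Cys, Ser, or Thr).

```python
def trans_splice(precursor_n, precursor_c, n_intein, c_intein):
    """Return the spliced product (N-extein joined to C-extein).

    precursor_n: N-extein followed by the N-intein.
    precursor_c: C-intein followed by the C-extein.
    Raises ValueError if the precursors do not have that layout or if the
    junction residues are incompatible with the typical splicing mechanism.
    """
    if not n_intein or not c_intein:
        raise ValueError("both intein fragments must be non-empty")
    if not precursor_n.endswith(n_intein):
        raise ValueError("first precursor must end with the N-intein")
    if not precursor_c.startswith(c_intein):
        raise ValueError("second precursor must start with the C-intein")

    n_extein = precursor_n[:-len(n_intein)]
    c_extein = precursor_c[len(c_intein):]

    # Step 1 of the mechanism needs Cys or Ser as the first intein residue.
    if n_intein[0] not in "CS":
        raise ValueError("N-intein should begin with Cys or Ser")
    # Step 2 needs Cys, Ser, or Thr as the first C-extein residue.
    if not c_extein or c_extein[0] not in "CST":
        raise ValueError("C-extein should begin with Cys, Ser, or Thr")

    # Both intein fragments are excised; the exteins are linked by a peptide bond.
    return n_extein + c_extein

if __name__ == "__main__":
    # Hypothetical placeholder sequences for illustration only.
    evolved_portion = "MKTAYIAK"       # stands in for an evolved first portion
    remaining_portion = "SGGSPKKKRKV"  # stands in for the non-evolved second portion
    toy_n_intein, toy_c_intein = "CAAAA", "GGGGN"
    product = trans_splice(evolved_portion + toy_n_intein,
                           toy_c_intein + remaining_portion,
                           toy_n_intein, toy_c_intein)
    print(product)
```

In this model the evolved first portion occupies the N-terminal part of the product; exchanging the roles of the two precursors would place it at the C-terminal part instead.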

(E) Guide RNA

In various embodiments, the improved HDR-dependent genome editors can be complexed, bound, or otherwise associated with (e.g., via any type of covalent or non-covalent bond) one or more guide sequences, i.e., the sequence which becomes associated or bound to the genome editor and directs its localization to a specific target sequence having complementarity to the guide sequence or a portion thereof. The particular design aspects of a guide sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the desired site to be edited) and the type of napDNAbp (e.g., type of Cas protein) present in the genome editor, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.

In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
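By way of illustration only, the degree of complementarity described above can be sketched for the simplest case of an ungapped, full-length pairing between a guide sequence and the strand to which it hybridizes. The function name percent_complementarity and the ungapped comparison are assumptions made for this sketch; it does not perform optimal alignment and is not a substitute for the Smith-Waterman or Needleman-Wunsch algorithms named above.

```python
# Watson-Crick partners; U is treated like T so an RNA guide can be
# compared against a DNA strand.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Ungapped percent complementarity between a guide (written 5'->3')
    and the strand it hybridizes to (also written 5'->3').

    Hypothetical helper for illustration; no gaps or alignment search.
    """
    guide = guide.upper()
    # Reverse the target so that position i of the guide faces
    # position i of the antiparallel target strand.
    target = target_strand.upper().replace("U", "T")[::-1]
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for g, t in zip(guide, target) if _PAIR.get(g) == t)
    return 100.0 * matches / len(guide)
```

For example, under these assumptions a 20-nucleotide guide with two mismatched positions against its target strand would score 90%.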

In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a genome editor to a target sequence may be assessed by any suitable assay. For example, the components of a genome editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a genome editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a genome editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.

A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG (SEQ ID NO: 12) where NNNNNNNNNNNNXGG (SEQ ID NO: 46) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGG (SEQ ID NO: 23) where NNNNNNNNNNNXGG (SEQ ID NO: 24) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. For the S. thermophilus CRISPR1Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXXAGAAW (SEQ ID NO: 25) where NNNNNNNNNNNNXXAGAAW (SEQ ID NO: 26) (N is A, G, T, or C; X can be anything; and W is A or T) has a single occurrence in the genome. A unique target sequence in a genome may include an S. thermophilus CRISPR 1 Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXXAGAAW (SEQ ID NO: 27) where NNNNNNNNNNNXXAGAAW (SEQ ID NO: 28) (N is A, G, T, or C; X can be anything; and W is A or T) has a single occurrence in the genome. For the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG (SEQ ID NO: 29) where NNNNNNNNNNNNXGGXG (SEQ ID NO: 30) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGGXG (SEQ ID NO: 31) where NNNNNNNNNNNXGGXG (SEQ ID NO: 32) (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. In each of these sequences “M” may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
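As a non-limiting sketch of the uniqueness criterion described above for S. pyogenes Cas9, the following example counts every occurrence of a 12-nucleotide sequence followed by XGG on both strands of a sequence and reports those that occur exactly once. The function names, the use of a plain string as the "genome", and the decision to scan both strands are assumptions of this sketch; a practical search over a real genome would typically use an indexed aligner.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_spcas9_sites(genome: str, seed_len: int = 12) -> list[str]:
    """Return every seed+XGG site that occurs exactly once on either strand.

    seed_len=12 mirrors the NNNNNNNNNNNNXGG form; the leading "M"
    positions are ignored, as they need not be considered for uniqueness.
    """
    genome = genome.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(rf"(?=([ACGT]{{{seed_len}}}[ACGT]GG))")
    counts: Counter = Counter()
    for strand in (genome, reverse_complement(genome)):
        counts.update(m.group(1) for m in pattern.finditer(strand))
    return sorted(site for site, n in counts.items() if n == 1)
```

Passing seed_len=11 would correspond to the 11-nucleotide form also recited above.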

In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and GM Church, 2009, Nature Biotechnology 27(12): 1151-62). Further algorithms may be found in U.S. application Ser. No. 61/836,080 (Broad Reference BI-2013/004A), incorporated herein by reference.

In general, a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins. 
In preferred embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a guide sequence, the first block of lower case letters represents the tracr mate sequence, the second block of lower case letters represents the tracr sequence, and the final poly-T sequence represents the transcription terminator:

(1)

(SEQ ID NO: 33)

NNNNNNNNgtttttgtactctcaagatttaGAAAtaaatcttgcagaagct

acaaagataaggcttcatgccgaaatcaacaccctgtcattttatggcagg

gtgttttcgttatttaaTTTTTT;

(2)

(SEQ ID NO: 34)

NNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctacaa

agataaggcttcatgccgaaatcaacaccctgtcattttatggcagggtgt

tttcgttatttaaTTTTTT;

(3)

(SEQ ID NO: 35)

NNNNNNNNNNNNNNNNNNNNgtttttgtactctcaGAAAtgcagaagctac

aaagataaggcttcatgccgaaatcaacaccctgtcattttatggcaggg

tgtTTTTT;

(4)

(SEQ ID NO: 36)

NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAAtagcaagttaaaata

aggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgcTTTTT

T;

(5)

(SEQ ID NO: 37)

NNNNNNNNNNNNNNNNNNNNgttttagagctaGAAATAGcaagttaaaata

aggctagtccgttatcaacttgaaaaagtgTTTTTTT;

and

(6)

(SEQ ID NO: 38)

NNNNNNNNNNNNNNNNNNNNgttttagagctagAAATAGcaagttaaaata

aggctagtccgttatcaTTTTTTTT

In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.

It will be apparent to those of skill in the art that in order to target any of the fusion proteins comprising a Cas9 domain and a single-stranded DNA binding protein, as disclosed herein, to a target site, e.g., a site comprising a point mutation to be edited, it is typically necessary to co-express the fusion protein together with a guide RNA, e.g., an sgRNA. As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.

In some embodiments, the guide RNA comprises a structure 5′-[guide sequence]-guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaaguggcaccgagucggugcuuuuu-3′ (SEQ ID NO: 39), wherein the guide sequence comprises a sequence that is complementary to the target sequence. The guide sequence is typically 20 nucleotides long. The sequences of suitable guide RNAs for targeting Cas9:nucleic acid editing enzyme/domain fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the instant disclosure. Such suitable guide RNA sequences typically comprise guide sequences that are complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein. Additional guide sequences are well known in the art and can be used with the genome editors described herein.
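The 5′-[guide sequence]-[scaffold]-3′ structure described above can be illustrated with a short sketch that joins a guide sequence to the scaffold of SEQ ID NO: 39. The function name build_sgrna and the choice to accept the guide as either DNA or RNA letters are assumptions of this example; no guide length is enforced because the guide sequence is described only as typically 20 nucleotides long.

```python
# Scaffold portion of SEQ ID NO: 39 as recited in the text (RNA, 5'->3').
SCAFFOLD = (
    "guuuuagagcuagaaauagcaaguuaaaauaaggcuaguccguuaucaacuugaaaaagug"
    "gcaccgagucggugcuuuuu"
)


def build_sgrna(guide: str) -> str:
    """Return 5'-[guide]-[scaffold]-3' as an RNA string.

    The guide may be supplied with T or U; it is emitted in uppercase RNA
    so that it remains visually distinct from the lowercase scaffold.
    Hypothetical helper for illustration only.
    """
    guide_rna = guide.upper().replace("T", "U")
    if not guide_rna or set(guide_rna) - set("ACGU"):
        raise ValueError("guide must be a non-empty A/C/G/T/U sequence")
    return guide_rna + SCAFFOLD
```

For instance, build_sgrna("ACGTACGTACGTACGTACGT") yields a transcript whose first 20 nucleotides are the guide written as RNA, followed by the scaffold.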

(F) Linkers

In certain embodiments, linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., moiety A covalently linked to moiety B which is covalently linked to moiety C).

As defined above, the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease and the catalytic domain of a recombinase. In some embodiments, a linker joins a dCas9 and single-stranded DNA binding protein (e.g., Rad51). Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.

The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide or based on amino acids. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol moiety (PEG). In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a phenyl ring. The linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.

In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 40), (G)n (SEQ ID NO: 41), (EAAAK)n (SEQ ID NO: 42), (GGS)n (SEQ ID NO: 43), (SGGS)n (SEQ ID NO: 44), (XP)n (SEQ ID NO: 45), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 43), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 6). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 2). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 3). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 7).
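As a minimal sketch of the repeat-unit notation used above, the following example expands a unit such as GGS or SGGS into the amino acid sequence of a (unit)n linker, with n limited to the recited range of 1 to 30. The function name repeat_linker is an assumption of this example, and the (XP)n form is not handled because X would first have to be chosen.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Expand a (unit)n linker, e.g. ("GGS", 3) -> "GGSGGSGGS".

    n is restricted to 1..30, matching the range recited in the text.
    Hypothetical helper for illustration only.
    """
    if not unit:
        raise ValueError("unit must be a non-empty amino acid string")
    if not 1 <= n <= 30:
        raise ValueError("n must be an integer between 1 and 30")
    return unit * n


# The 32-residue linker recited above can be written as two SGGS repeats,
# the 16-residue SGSETPGTSESATPES sequence, and two further SGGS repeats.
LONG_LINKER = (
    repeat_linker("SGGS", 2) + "SGSETPGTSESATPES" + repeat_linker("SGGS", 2)
)
```

Composing linkers this way makes it straightforward to vary n when comparing linker lengths between two fused domains.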

In various embodiments, the HDR-dependent editor fusion proteins described herein can comprise any of the following structures: NH₂-[napDNAbp]-[SSDBP]-COOH; NH₂-[SSDBP]-[napDNAbp]-COOH; NH₂-[napDNAbp]-[Rad51]-COOH; NH₂-[Rad51]-[napDNAbp]-COOH; NH₂-[nCas9]-[Rad51]-COOH; or NH₂-[Rad51]-[nCas9]-COOH, wherein each instance of “]-[” comprises an optional linker.

Some aspects of this disclosure provide methods of making the improved HDR-dependent genome editors disclosed herein, or improved HDR-dependent genome editor complexes comprising one or more napDNAbp-programming nucleic acid molecules (e.g., Cas9 guide RNAs) and a single-stranded DNA binding protein provided herein. In addition, some aspects of the disclosure provide methods of using the improved HDR-dependent genome editors for editing a target nucleotide sequence (e.g., a genome).

II. Directed Evolution Methods (e.g., PACE or PANCE)

Various aspects of the disclosure relate to providing directed evolution methods and systems (e.g., appropriate vectors, cells, phage, flow vessels, etc.) for making the improved HDR-dependent genome editors described herein.

The directed evolution methods provided herein allow for a gene of interest (e.g., an improved HDR-dependent genome editor gene) in a viral vector to be evolved over multiple generations of viral life cycles in a flow of host cells to acquire a desired function or activity.

Some aspects of this invention provide a method of continuous evolution of a gene of interest, comprising (a) contacting a population of host cells with a population of viral vectors comprising the gene of interest, wherein (1) the host cell is amenable to infection by the viral vector; (2) the host cell expresses viral genes required for the generation of viral particles; (3) the expression of at least one viral gene required for the production of an infectious viral particle is dependent on a function of the gene of interest; and (4) the viral vector allows for expression of the protein in the host cell, and can be replicated and packaged into a viral particle by the host cell. In some embodiments, the method comprises (b) contacting the host cells with a mutagen. In some embodiments, the method further comprises (c) incubating the population of host cells under conditions allowing for viral replication and the production of viral particles, wherein host cells are removed from the host cell population, and fresh, uninfected host cells are introduced into the population of host cells, thus replenishing the population of host cells and creating a flow of host cells. The cells are incubated in all embodiments under conditions allowing for the gene of interest to acquire a mutation. In some embodiments, the method further comprises (d) isolating a mutated version of the viral vector, encoding an evolved gene product (e.g., protein), from the population of host cells.

In some embodiments, a method of phage-assisted continuous evolution is provided comprising (a) contacting a population of bacterial host cells with a population of phages that comprise a gene of interest to be evolved and that are deficient in a gene required for the generation of infectious phage, wherein (1) the phage allows for expression of the gene of interest in the host cells; (2) the host cells are suitable host cells for phage infection, replication, and packaging; and (3) the host cells comprise an expression construct encoding the gene required for the generation of infectious phage, wherein expression of the gene is dependent on a function of a gene product of the gene of interest. In some embodiments the method further comprises (b) incubating the population of host cells under conditions allowing for the mutation of the gene of interest, the production of infectious phage, and the infection of host cells with phage, wherein infected cells are removed from the population of host cells, and wherein the population of host cells is replenished with fresh host cells that have not been infected by the phage. In some embodiments, the method further comprises (c) isolating a mutated phage replication product encoding an evolved protein from the population of host cells.

In some embodiments, the viral vector or the phage is a filamentous phage, for example, an M13 phage, such as an M13 selection phage as described in more detail elsewhere herein. In some such embodiments, the gene required for the production of infectious viral particles is the M13 gene III (gIII).

In some embodiments, the viral vector infects mammalian cells. In some embodiments, the viral vector is a retroviral vector. In some embodiments, the viral vector is a vesicular stomatitis virus (VSV) vector. As a negative-sense single-stranded RNA virus, VSV has a high mutation rate, and can carry cargo, including a gene of interest, of up to 4.5 kb in length. The generation of infectious VSV particles requires the envelope protein VSV-G, a viral glycoprotein that mediates phosphatidylserine attachment and cell entry. VSV can infect a broad spectrum of host cells, including mammalian and insect cells. VSV is therefore a highly suitable vector for continuous evolution in human, mouse, or insect host cells. Similarly, other retroviral vectors that can be pseudotyped with VSV-G envelope protein are equally suitable for continuous evolution processes as described herein.

It is known to those of skill in the art that many retroviral vectors, for example, Murine Leukemia Virus vectors, or Lentiviral vectors can efficiently be packaged with VSV-G envelope protein as a substitute for the virus's native envelope protein. In some embodiments, such VSV-G packageable vectors are adapted for use in a continuous evolution system in that the native envelope (env) protein (e.g., VSV-G in VSV vectors, or env in MLV vectors) is deleted from the viral genome, and a gene of interest is inserted into the viral genome under the control of a promoter that is active in the desired host cells. The host cells, in turn, express the VSV-G protein, another env protein suitable for vector pseudotyping, or the viral vector's native env protein, under the control of a promoter the activity of which is dependent on an activity of a product encoded by the gene of interest, so that a viral vector with a mutation leading to increased activity of the gene of interest will be packaged with higher efficiency than a vector with baseline activity or a loss-of-function mutation.

In some embodiments, mammalian host cells are subjected to infection by a continuously evolving population of viral vectors, for example, VSV vectors comprising a gene of interest and lacking the VSV-G encoding gene, wherein the host cells comprise a gene encoding the VSV-G protein under the control of a conditional promoter. Such a retrovirus-based system could be a two-vector system (the viral vector and an expression construct comprising a gene encoding the envelope protein), or, alternatively, a helper virus can be employed, for example, a VSV helper virus. A helper virus typically comprises a truncated viral genome deficient of structural elements required to package the genome into viral particles, but including viral genes encoding proteins required for viral genome processing in the host cell, and for the generation of viral particles. In such embodiments, the viral vector-based system could be a three-vector system (the viral vector, the expression construct comprising the envelope protein driven by a conditional promoter, and the helper virus comprising viral functions required for viral genome propagation but not the envelope protein). In some embodiments, expression of the five genes of the VSV genome from a helper virus or expression construct in the host cells allows for production of infectious viral particles carrying a gene of interest. This indicates that unbalanced gene expression permits viral replication at a reduced rate, and suggests that reduced expression of VSV-G would indeed serve as a limiting step in efficient viral production.

One advantage of using a helper virus is that the viral vector can be deficient in genes encoding proteins or other functions provided by the helper virus, and can, accordingly, carry a longer gene of interest. In some embodiments, the helper virus does not express an envelope protein, because expression of a viral envelope protein is known to reduce the infectability of host cells by some viral vectors via receptor interference. Viral vectors, for example retroviral vectors, suitable for continuous evolution processes, their respective envelope proteins, and helper viruses for such vectors, are well known to those of skill in the art. For an overview of some exemplary viral genomes, helper viruses, host cells, and envelope proteins suitable for continuous evolution procedures as described herein, see Coffin et al., Retroviruses, CSHL Press 1997, ISBN 0-87969-571-4, incorporated herein in its entirety.

In some embodiments, the incubating of the host cells is for a time sufficient for at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral life cycles. In certain embodiments, the viral vector is an M13 phage, and the length of a single viral life cycle is about 10-20 minutes.
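To give a sense of scale, the number of consecutive life cycles can be converted into an approximate incubation time using the 10-20 minute M13 life cycle stated above. The sketch below is simple arithmetic under the assumption that life cycles follow one another without delay; the function name incubation_hours is hypothetical.

```python
def incubation_hours(
    n_cycles: int,
    cycle_min_low: float = 10.0,
    cycle_min_high: float = 20.0,
) -> tuple[float, float]:
    """Return (shortest, longest) incubation time in hours for n_cycles
    back-to-back viral life cycles of the given duration range.

    Defaults reflect the approximately 10-20 minute M13 life cycle.
    """
    if n_cycles < 0 or cycle_min_low <= 0 or cycle_min_high < cycle_min_low:
        raise ValueError("invalid cycle count or cycle duration range")
    return (n_cycles * cycle_min_low / 60.0, n_cycles * cycle_min_high / 60.0)
```

Under these assumptions, 100 consecutive M13 life cycles correspond to roughly 17 to 33 hours of incubation.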

In some embodiments, the cells are contacted and/or incubated in suspension culture. For example, in some embodiments, bacterial cells are incubated in suspension culture in liquid culture media. Suitable culture media for bacterial suspension culture will be apparent to those of skill in the art, and the invention is not limited in this regard. See, for example, Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch, and Maniatis (Cold Spring Harbor Laboratory Press: 1989); Elizabeth Kutter and Alexander Sulakvelidze: Bacteriophages: Biology and Applications. CRC Press; 1st edition (December 2004), ISBN: 0849313368; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 1: Isolation, Characterization, and Interactions (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1588296822; Martha R. J. Clokie and Andrew M. Kropinski: Bacteriophages: Methods and Protocols, Volume 2: Molecular and Applied Aspects (Methods in Molecular Biology) Humana Press; 1st edition (December 2008), ISBN: 1603275649; all of which are incorporated herein in their entirety by reference for disclosure of suitable culture media for bacterial host cell culture. Suspension culture typically requires the culture media to be agitated, either continuously or intermittently. This is achieved, in some embodiments, by agitating or stirring the vessel comprising the host cell population. In some embodiments, the outflow of host cells and the inflow of fresh host cells is sufficient to maintain the host cells in suspension. This is the case in particular if the flow rate of cells into and/or out of the lagoon is high.

In some embodiments, a viral vector/host cell combination is chosen in which the life cycle of the viral vector is significantly shorter than the average time between cell divisions of the host cell. Average cell division times and viral vector life cycle times are well known in the art for many cell types and vectors, allowing those of skill in the art to ascertain such host cell/vector combinations. In certain embodiments, host cells are being removed from the population of host cells contacted with the viral vector at a rate that results in the average time of a host cell remaining in the host cell population before being removed to be shorter than the average time between cell divisions of the host cells, but to be longer than the average life cycle of the viral vector employed. The result of this is that the host cells, on average, do not have sufficient time to proliferate during their time in the host cell population while the viral vectors do have sufficient time to infect a host cell, replicate in the host cell, and generate new viral particles during the time a host cell remains in the cell population. This assures that the only replicating nucleic acid in the host cell population is the viral vector, and that the host cell genome, the accessory plasmid, or any other nucleic acid constructs cannot acquire mutations allowing for escape from the selective pressure imposed.

For example, in some embodiments, the average time a host cell remains in the host cell population is about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 70, about 80, about 90, about 100, about 120, about 150, or about 180 minutes.

In some embodiments, the average time a host cell remains in the host cell population depends on how fast the host cells divide and how long infection (or conjugation) requires. In general, the flow rate should be fast enough that host cells are, on average, removed before they have time to divide, but slow enough to allow viral (or conjugative) propagation. The time required for cell division will vary, for example, with the media type, and cell division can be delayed by adding cell division inhibitor antibiotics (e.g., FtsZ inhibitors in E. coli). Since the limiting step in continuous evolution is production of the protein required for gene transfer from cell to cell, the flow rate at which the vector washes out will depend on the current activity of the gene(s) of interest. In some embodiments, titratable production of the protein required for the generation of infectious particles, as described herein, can mitigate this problem. In some embodiments, an indicator of phage infection allows computer-controlled optimization of the flow rate for the current activity level in real-time.
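The timing constraint described above, in which the average time a host cell remains in the population is longer than the viral life cycle but shorter than the time between host cell divisions, can be sketched as a simple check. This example assumes an ideally mixed vessel, for which the mean residence time equals the vessel volume divided by the volumetric flow rate; the function names and the example times are hypothetical.

```python
def residence_time_min(flow_volumes_per_hour: float) -> float:
    """Mean residence time, in minutes, of a cell in an ideally mixed
    vessel that is exchanged at the given number of volumes per hour."""
    if flow_volumes_per_hour <= 0:
        raise ValueError("flow rate must be positive")
    return 60.0 / flow_volumes_per_hour


def flow_is_selective(
    flow_volumes_per_hour: float,
    viral_cycle_min: float,
    division_time_min: float,
) -> bool:
    """True if the mean residence time is longer than one viral life cycle
    but shorter than the average time between host cell divisions."""
    t = residence_time_min(flow_volumes_per_hour)
    return viral_cycle_min < t < division_time_min
```

For example, with an assumed 15-minute viral life cycle and an assumed 30-minute division time, a flow of 3 volumes per hour (20-minute mean residence) satisfies the constraint, whereas 1 volume per hour (60 minutes) gives host cells time to divide and 6 volumes per hour (10 minutes) removes cells faster than the vector can propagate.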

III. Apparatus for Continued Evolution (e.g., PACE)

The invention also provides apparatuses for continuous evolution of a nucleic acid. The core element of such an apparatus is a lagoon allowing for the generation of a flow of host cells in which a population of viral vectors can replicate and propagate. In some embodiments, the lagoon comprises a cell culture vessel comprising an actively replicating population of viral vectors, for example, phage vectors comprising a gene of interest, and a population of host cells, for example, bacterial host cells. In some embodiments, the lagoon comprises an inflow for the introduction of fresh host cells into the lagoon and an outflow for the removal of host cells from the lagoon. In some embodiments, the inflow is connected to a turbidostat comprising a culture of fresh host cells. In some embodiments, the outflow is connected to a waste vessel, or a sink. In some embodiments, the lagoon further comprises an inflow for the introduction of a mutagen into the lagoon. In some embodiments that inflow is connected to a vessel holding a solution of the mutagen. In some embodiments, the lagoon comprises an inflow for the introduction of an inducer of gene expression into the lagoon, for example, of an inducer activating an inducible promoter within the host cells that drives expression of a gene promoting mutagenesis (e.g., as part of a mutagenesis plasmid), as described in more detail elsewhere herein. In some embodiments, that inflow is connected to a vessel comprising a solution of the inducer, for example, a solution of arabinose.

In some embodiments, the lagoon comprises a population of viral vectors. In some embodiments, the viral vectors are phage, for example, M13 phages deficient in a gene required for the generation of infectious viral particles as described herein. In some such embodiments, the host cells are prokaryotic cells amenable to phage infection, replication, and propagation of phage, for example, host cells comprising an accessory plasmid comprising a gene required for the generation of infectious viral particles under the control of a conditional promoter as described herein.

In some embodiments, the lagoon comprises a controller for regulation of the inflow and outflow rates of the host cells, the inflow of the mutagen, and/or the inflow of the inducer. In some embodiments, a visual indicator of phage presence, for example, a fluorescent marker, is tracked and used to govern the flow rate, keeping the total infected population constant. In some embodiments, the visual marker is a fluorescent protein encoded by the phage genome, or an enzyme encoded by the phage genome that, once expressed in the host cells, results in a visually detectable change in the host cells. In some embodiments, the visual tracking of infected cells is used to adjust a flow rate to keep the system flowing as fast as possible without risk of vector washout.

In some embodiments, the expression of the gene required for the generation of infectious particles is titratable. In some embodiments, this is accomplished with an accessory plasmid producing pIII proportional to the amount of anhydrotetracycline added to the lagoon. In some embodiments, such a titratable expression construct can be combined with another accessory plasmid as described herein, allowing simultaneous selection for activity and titratable control of pIII. This permits the evolution of activities too weak to otherwise survive in the lagoon, as well as allowing neutral drift to escape local fitness peak traps. In some embodiments, negative selection is applied during a continuous evolution method as described herein, by penalizing undesired activities. In some embodiments, this is achieved by causing the undesired activity to interfere with pIII production. For example, expression of an antisense RNA complementary to the gIII RBS and/or start codon is one way of applying negative selection, while expressing a protease (e.g., TEV) and engineering the protease recognition sites into pIII is another.

In some embodiments, the apparatus comprises a turbidostat. In some embodiments, the turbidostat comprises a cell culture vessel in which the population of fresh host cells is situated, for example, in liquid suspension culture. In some embodiments, the turbidostat comprises an outflow that is connected to an inflow of the lagoon, allowing the introduction of fresh cells from the turbidostat into the lagoon. In some embodiments, the turbidostat comprises an inflow for the introduction of fresh culture media into the turbidostat. In some embodiments, the inflow is connected to a vessel comprising sterile culture media. In some embodiments, the turbidostat further comprises an outflow for the removal of host cells from the turbidostat. In some embodiments, that outflow is connected to a waste vessel or drain.

In some embodiments, the turbidostat comprises a turbidity meter for measuring the turbidity of the culture of fresh host cells in the turbidostat. In some embodiments, the turbidostat comprises a controller that regulates the inflow of sterile liquid media and the outflow into the waste vessel based on the turbidity of the culture liquid in the turbidostat.
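By way of non-limiting illustration, the turbidity-based regulation described above can be sketched as a simple threshold controller. The function name, the `PumpCommand` type, and the setpoint value of 0.4 are hypothetical and are not taken from the specification; the sketch only assumes that media inflow and waste outflow are switched together when the measured turbidity exceeds a setpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PumpCommand:
    """Pump states chosen for one control step (hypothetical type)."""
    media_inflow_on: bool
    waste_outflow_on: bool


def turbidostat_step(turbidity: float, setpoint: float = 0.4) -> PumpCommand:
    """Decide pump states from a single turbidity reading.

    The setpoint of 0.4 is an illustrative assumption. When the culture
    is denser than the setpoint, sterile media is pumped in and culture
    is pumped out to waste; otherwise both pumps stay off and the
    culture is left to grow.
    """
    dilute = turbidity > setpoint
    return PumpCommand(media_inflow_on=dilute, waste_outflow_on=dilute)


if __name__ == "__main__":
    for reading in (0.30, 0.40, 0.45):
        print(reading, turbidostat_step(reading))
```

Switching both pumps together keeps the culture volume approximately constant in this sketch; a practical controller might add hysteresis or proportional control.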

In some embodiments, the lagoon and/or the turbidostat comprises a shaker or agitator for constant or intermittent agitation, for example, a shaker, mixer, stirrer, or bubbler, allowing for the population of host cells to be continuously or intermittently agitated and oxygenated.

In some embodiments, the controller regulates the rate of inflow of fresh host cells into the lagoon to be substantially the same (volume/volume) as the rate of outflow from the lagoon. In some embodiments, the rate of inflow of fresh host cells into and/or the rate of outflow of host cells from the lagoon is regulated to be substantially constant over the time of a continuous evolution experiment. In some embodiments, the rate of inflow and/or the rate of outflow is from about 0.1 lagoon volumes per hour to about 25 lagoon volumes per hour. In some embodiments, the rate of inflow and/or the rate of outflow is approximately 0.1 lagoon volumes per hour (lv/h), approximately 0.2 lv/h, approximately 0.25 lv/h, approximately 0.3 lv/h, approximately 0.4 lv/h, approximately 0.5 lv/h, approximately 0.6 lv/h, approximately 0.7 lv/h, approximately 0.75 lv/h, approximately 0.8 lv/h, approximately 0.9 lv/h, approximately 1 lv/h, approximately 2 lv/h, approximately 2.5 lv/h, approximately 3 lv/h, approximately 4 lv/h, approximately 5 lv/h, approximately 7.5 lv/h, approximately 10 lv/h, or more than 10 lv/h.
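As a non-limiting worked example, a flow rate expressed in lagoon volumes per hour can be converted to a volumetric pump rate once a lagoon volume is chosen. The function names below are hypothetical, and the residence-time relation assumes a well-mixed lagoon of constant volume, which is an assumption of this sketch rather than a statement of the specification.

```python
def pump_flow_ml_per_min(rate_lv_per_h: float, lagoon_volume_ml: float) -> float:
    """Volumetric flow (ml/min) for a rate given in lagoon volumes per hour."""
    if rate_lv_per_h < 0 or lagoon_volume_ml <= 0:
        raise ValueError("rate must be non-negative and volume positive")
    return rate_lv_per_h * lagoon_volume_ml / 60.0


def mean_residence_time_min(rate_lv_per_h: float) -> float:
    """Mean residence time (min) in a well-mixed, constant-volume lagoon.

    For such a vessel the mean residence time is volume / flow,
    i.e. 1 / (lagoon volumes per hour) hours.
    """
    if rate_lv_per_h <= 0:
        raise ValueError("rate must be positive")
    return 60.0 / rate_lv_per_h


if __name__ == "__main__":
    # Example: a 100 ml lagoon run at 2 lagoon volumes per hour.
    print(pump_flow_ml_per_min(2.0, 100.0))   # ml/min
    print(mean_residence_time_min(2.0))       # minutes
```

Under these assumptions, a higher rate in lagoon volumes per hour shortens the average time a cell spends in the lagoon, which is why the rate is balanced against the risk of vector washout.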

In some embodiments, the inflow and outflow rates are controlled based on a quantitative assessment of the population of host cells in the lagoon, for example, by measuring the cell number, cell density, wet biomass weight per volume, turbidity, or cell growth rate. In some embodiments, the lagoon inflow and/or outflow rate is controlled to maintain a host cell density of from about 10² cells/ml to about 10¹² cells/ml in the lagoon. In some embodiments, the inflow and/or outflow rate is controlled to maintain a host cell density of about 10² cells/ml, about 10³ cells/ml, about 10⁴ cells/ml, about 10⁵ cells/ml, about 5×10⁵ cells/ml, about 10⁶ cells/ml, about 5×10⁶ cells/ml, about 10⁷ cells/ml, about 5×10⁷ cells/ml, about 10⁸ cells/ml, about 5×10⁸ cells/ml, about 10⁹ cells/ml, about 5×10⁹ cells/ml, about 10¹⁰ cells/ml, about 5×10¹⁰ cells/ml, or more than 5×10¹⁰ cells/ml, in the lagoon. In some embodiments, the density of fresh host cells in the turbidostat and the density of host cells in the lagoon are substantially identical.

In some embodiments, the lagoon inflow and outflow rates are controlled to maintain a substantially constant number of host cells in the lagoon. In some embodiments, the inflow and outflow rates are controlled to maintain a substantially constant frequency of fresh host cells in the lagoon. In some embodiments, the population of host cells is continuously replenished with fresh host cells that are not infected by the phage. In some embodiments, the replenishment is semi-continuous or by batch-feeding fresh cells into the cell population.

In some embodiments, the lagoon volume is from approximately 1 ml to approximately 100 l, for example, the lagoon volume is approximately 1 ml, approximately 10 ml, approximately 50 ml, approximately 100 ml, approximately 200 ml, approximately 250 ml, approximately 500 ml, approximately 750 ml, approximately 1 l, approximately 2 l, approximately 2.5 l, approximately 3 l, approximately 4 l, approximately 5 l, approximately 10 l, approximately 1 ml-10 ml, approximately 10 ml-50 ml, approximately 50 ml-100 ml, approximately 100 ml-250 ml, approximately 250 ml-500 ml, approximately 500 ml-1 l, approximately 1 l-2 l, approximately 2 l-5 l, approximately 5 l-10 l, approximately 10 l-50 l, approximately 50 l-100 l, or more than 100 l.

In some embodiments, the lagoon and/or the turbidostat further comprises a heater and a thermostat controlling the temperature. In some embodiments, the temperature in the lagoon and/or the turbidostat is controlled to be from about 4° C. to about 55° C., preferably from about 25° C. to about 39° C., for example, about 37° C.

In some embodiments, the inflow rate and/or the outflow rate is controlled to allow for the incubation and replenishment of the population of host cells for a time sufficient for at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1250, at least 1500, at least 1750, at least 2000, at least 2500, at least 3000, at least 4000, at least 5000, at least 7500, at least 10000, or more consecutive viral vector or phage life cycles. In some embodiments, the time sufficient for one phage life cycle is about 10 minutes.

Therefore, in some embodiments, the time of the entire evolution procedure is about 12 hours, about 18 hours, about 24 hours, about 36 hours, about 48 hours, about 50 hours, about 3 days, about 4 days, about 5 days, about 6 days, about 7 days, about 10 days, about two weeks, about 3 weeks, about 4 weeks, or about 5 weeks.
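As a non-limiting illustration of the relationship between the number of consecutive phage life cycles and the total duration of an evolution procedure, the arithmetic can be sketched as follows. The function name is hypothetical; the default of 10 minutes per life cycle is the figure stated above.

```python
def evolution_hours(life_cycles: int, minutes_per_cycle: float = 10.0) -> float:
    """Total run time in hours for a number of consecutive life cycles."""
    if life_cycles < 0 or minutes_per_cycle <= 0:
        raise ValueError("life_cycles must be >= 0 and minutes_per_cycle > 0")
    return life_cycles * minutes_per_cycle / 60.0


if __name__ == "__main__":
    for cycles in (72, 144, 1000):
        print(cycles, "cycles ->", evolution_hours(cycles), "hours")
```

For example, at about 10 minutes per life cycle, 72 consecutive life cycles correspond to about 12 hours and 144 life cycles to about 24 hours.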

In other embodiments, a PANCE apparatus is provided. The PANCE method begins by first growing the host strain of E. coli until A600=0.3-0.5 in a large volume and then storing the cells at 4° C. An aliquot of 50 mL is then transferred to a smaller flask, supplemented with BocK and the inducing agent arabinose (Ara) for the mutagenesis plasmid, and transfected with the selection phage (SP). This culture is incubated at 37° C. for 8-12 h to facilitate phage growth, which is confirmed by determination of the phage titer. Following phage growth, an aliquot of infected cells is used to transfect a subsequent flask containing host E. coli. This process is repeated for as many transfers as are required to evolve the desired phenotype.

IV. Increasing Expression

The improved HDR-dependent genome editors contemplated herein can include modifications that result in increased expression through codon optimization and ancestral reconstruction analysis.

In some embodiments, the genome editors (or a component thereof) are codon optimized for expression in particular cells, such as eukaryotic cells. The eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human primate. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database”, and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a CRISPR enzyme correspond to the most frequently used codon for a particular amino acid.
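By way of non-limiting illustration, the codon replacement described above (substituting codons while maintaining the native amino acid sequence) can be sketched as follows. The function names are hypothetical, and the `preferred` mapping in the example holds illustrative placeholder choices that are not taken from the Codon Usage Database or any other source; a real application would derive preferred codons from a usage table for the host cell of interest. The translation table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_optimize(dna: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given.

    `preferred` maps a one-letter amino acid code to a codon. Amino acids
    absent from the mapping keep their native codon.
    """
    for aa, codon in preferred.items():
        if CODON_TABLE.get(codon.upper()) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon).upper())
    return "".join(out)


if __name__ == "__main__":
    native = "ATGTTAGCTCGTTAA"                          # Met-Leu-Ala-Arg-stop
    preferred = {"L": "CTG", "A": "GCC", "R": "AGA"}    # placeholder choices
    optimized = codon_optimize(native, preferred)
    print(optimized)
    assert translate(optimized) == translate(native)    # protein unchanged
```

The final assertion checks the defining property of codon optimization in this sketch: the nucleotide sequence changes while the encoded amino acid sequence does not.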

In other embodiments, the genome editors of the invention have improved expression (as compared to non-modified or state of the art counterpart editors) as a result of ancestral sequence reconstruction analysis. Ancestral sequence reconstruction (ASR) is the process of analyzing modern sequences within an evolutionary/phylogenetic context to infer the ancestral sequences at particular nodes of a tree. These ancient sequences are most often then synthesized, recombinantly expressed in laboratory microorganisms or cell lines, and then characterized to reveal the ancient properties of the extinct biomolecules (2, 3, 4, 5, 6). This process has produced tremendous insights into the mechanisms of molecular adaptation and functional divergence (7). Despite such insights, a major criticism of ASR is the general inability to benchmark accuracy of the implemented algorithms. It is difficult to benchmark ASR for many reasons. Notably, genetic material is not preserved in fossils on a long enough time scale to satisfy most ASR studies (many millions to billions of years ago), and it is not yet physically possible to travel back in time to collect samples. Reference can be made to Cai et al., “Reconstruction of ancestral protein sequences and its applications,” BMC Evolutionary Biology 2004, 4:33 and Zakas et al., “Enhancing the pharmaceutical properties of protein drugs by ancestral sequence reconstruction,” Nature Biotechnology, 35, pp. 35-37 (2017), each of which is incorporated herein by reference.

There are many software packages available which can perform ancestral state reconstruction. Generally, these software packages have been developed and maintained through the efforts of scientists in related fields and released under free software licenses. The following list is not meant to be a comprehensive itemization of all available packages, but provides a representative sample of the extensive variety of packages that implement methods of ancestral reconstruction with different strengths and features: PAML (Phylogenetic Analysis by Maximum Likelihood, available at //abacus.gene.ucl.ac.uk/software/paml.html), BEAST (Bayesian evolutionary analysis by sampling trees, available at //www.beast2.org/wiki/index.php/Main_Page), Diversitree (FitzJohn RG, 2012. Diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution), and HyPhy (Hypothesis testing using phylogenies, available at //hyphy.org/w/index.php/Main_Page).

The above description is meant to be non-limiting with regard to making genome editors having increased expression and, thereby, increased editing efficiencies.

V. Increasing Nuclear Localization

In one aspect, the specification provides a strategy for improving the HDR-dependent genome editors by incorporating one or more nuclear localization signals (NLS) therein, e.g., as an N-terminal or C-terminal fusion protein. Preferably, at least two NLSs are incorporated into a genome editor. The inventors explored whether sub-optimal nuclear localization could be a basis for poor editing efficiency. Six combinations of the genome editor “BE4” were tested as N- and/or C-terminal fusions to either the SV40 NLS or the bipartite NLS (bpNLS). All the variants using one or two bpNLSs showed improvements in editing efficiency. The presence of a bpNLS at both the N- and C-terminus (referred to hereafter as “bis-bpNLS”) performed best, resulting in a 1.3-fold average improvement in BE4-mediated CG-to-TA editing efficiency at five exemplary tested genomic loci (48±8.0% average editing compared to 37±5.6% for the C-terminal SV40 NLS used in BE4). These results together suggest that modifying genome editors with one or more nuclear localization signals, e.g., a bis-bpNLS, can significantly improve the editing efficiency of previously described or known genome editors, such as BE3 and BE4 (6, 7).

The invention is not intended to be limiting with regard to which NLS is employed, and the manner by which the NLS is attached to or otherwise coupled to a genome editor. NLS sequences are known in the art and examples are disclosed herein.

VI. Increasing Genome Editor Efficiencies/Reducing Indel Formation

An “indel”, as used herein, refers to the insertion or deletion of a nucleobase within a nucleic acid. Such insertions or deletions can lead to frame shift mutations within a coding region of a gene. In some embodiments, it is desirable to generate genome editors that efficiently modify (e.g. oxidize or methylate) a specific nucleotide within a nucleic acid, without generating a large number of insertions or deletions (i.e., indels) in the nucleic acid. In certain embodiments, any of the genome editors provided herein are capable of generating a greater proportion of intended modifications (e.g., point mutations) versus indels. In some embodiments, the genome editors provided herein are capable of generating a ratio of intended point mutations to indels that is greater than 1:1. In some embodiments, the genome editors provided herein are capable of generating a ratio of intended point mutations to indels that is at least 1.5:1, at least 2:1, at least 2.5:1, at least 3:1, at least 3.5:1, at least 4:1, at least 4.5:1, at least 5:1, at least 5.5:1, at least 6:1, at least 6.5:1, at least 7:1, at least 7.5:1, at least 8:1, at least 10:1, at least 12:1, at least 15:1, at least 20:1, at least 25:1, at least 30:1, at least 40:1, at least 50:1, at least 100:1, at least 200:1, at least 300:1, at least 400:1, at least 500:1, at least 600:1, at least 700:1, at least 800:1, at least 900:1, or at least 1000:1, or more. The number of intended mutations and indels may be determined using any suitable method, for example the methods used in the below Examples. In some embodiments, to calculate indel frequencies, sequencing reads are scanned for exact matches to two 10-bp sequences that flank both sides of a window in which indels might occur. If no exact matches are located, the read is excluded from analysis. If the length of this indel window exactly matches the reference sequence the read is classified as not containing an indel. 
If the indel window is two or more bases longer or shorter than the reference sequence, then the sequencing read is classified as an insertion or deletion, respectively.
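By way of non-limiting illustration, the read classification described above can be sketched as follows. The function name and return labels are hypothetical. The description does not state how a read whose window differs from the reference by exactly one base is handled, so this sketch returns a separate "unclassified" label for that case as an assumption. It also assumes that the right flank is searched for only after the first occurrence of the left flank.

```python
def classify_read(read: str, left_flank: str, right_flank: str,
                  ref_window_len: int) -> str:
    """Classify one sequencing read by the length of its indel window.

    left_flank and right_flank are the two sequences (e.g., 10 bp each)
    that border the window in the reference; ref_window_len is the
    length of the window in the reference sequence.
    """
    left_pos = read.find(left_flank)
    if left_pos == -1:
        return "excluded"                 # no exact match to the left flank
    window_start = left_pos + len(left_flank)
    right_pos = read.find(right_flank, window_start)
    if right_pos == -1:
        return "excluded"                 # no exact match to the right flank
    diff = (right_pos - window_start) - ref_window_len
    if diff == 0:
        return "no_indel"
    if diff >= 2:
        return "insertion"
    if diff <= -2:
        return "deletion"
    return "unclassified"                 # +/-1 base: not specified (assumption)


if __name__ == "__main__":
    left, right = "AAAAACCCCC", "GGGGGTTTTT"
    reference_window = "ACGTACGT"
    reads = {
        "wild type": left + reference_window + right,
        "2-bp insertion": left + "ACGTTTACGT" + right,
        "2-bp deletion": left + "ACGACG" + right,
        "no flanks": "ACGT" * 7,
    }
    for name, read in reads.items():
        print(name, "->", classify_read(read, left, right, len(reference_window)))
```

An indel frequency could then be estimated, for example, as the number of reads labeled "insertion" or "deletion" divided by the number of reads that were not excluded; that denominator choice is likewise an assumption of this sketch.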

In some embodiments, the genome editors provided herein are capable of limiting formation of indels in a region of a nucleic acid. In some embodiments, the region is at a nucleotide targeted by a genome editor or a region within 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides of a nucleotide targeted by a genome editor. In some embodiments, any of the genome editors provided herein are capable of limiting the formation of indels at a region of a nucleic acid to less than 1%, less than 1.5%, less than 2%, less than 2.5%, less than 3%, less than 3.5%, less than 4%, less than 4.5%, less than 5%, less than 6%, less than 7%, less than 8%, less than 9%, less than 10%, less than 12%, less than 15%, or less than 20%. The number of indels formed at a nucleic acid region may depend on the amount of time a nucleic acid (e.g., a nucleic acid within the genome of a cell) is exposed to a genome editor. In some embodiments, a number or proportion of indels is determined after at least 1 hour, at least 2 hours, at least 6 hours, at least 12 hours, at least 24 hours, at least 36 hours, at least 48 hours, at least 3 days, at least 4 days, at least 5 days, at least 7 days, at least 10 days, or at least 14 days of exposing a nucleic acid (e.g., a nucleic acid within the genome of a cell) to a genome editor.

Some aspects of the disclosure are based on the recognition that any of the genome editors provided herein are capable of efficiently generating an intended mutation, such as a point mutation, in a nucleic acid (e.g. a nucleic acid within a genome of a subject) without generating a significant number of unintended mutations, such as unintended point mutations. In some embodiments, an intended mutation is a mutation that is generated by a specific genome editor bound to a gRNA, specifically designed to generate the intended mutation. In some embodiments, the intended mutation is a mutation associated with a disease or disorder. In some embodiments, the intended mutation is a guanine (G) to thymine (T) point mutation associated with a disease or disorder. In some embodiments, the intended mutation is a thymine (T) to cytosine (C) point mutation associated with a disease or disorder. In some embodiments, the intended mutation is a guanine (G) to thymine (T) point mutation within the coding region of a gene. In some embodiments, the intended mutation is a thymine (T) to cytosine (C) point mutation within the coding region of a gene. In some embodiments, the intended mutation is a point mutation that generates a stop codon, for example, a premature stop codon within the coding region of a gene. In some embodiments, the intended mutation is a mutation that eliminates a stop codon. In some embodiments, the intended mutation is a mutation that alters the splicing of a gene. In some embodiments, the intended mutation is a mutation that alters the regulatory sequence of a gene (e.g., a gene promoter or gene repressor). In some embodiments, any of the genome editors provided herein are capable of generating a ratio of intended mutations to unintended mutations (e.g., intended point mutations:unintended point mutations) that is greater than 1:1. 
In some embodiments, any of the genome editors provided herein are capable of generating a ratio of intended mutations to unintended mutations (e.g., intended point mutations:unintended point mutations) that is at least 1.5:1, at least 2:1, at least 2.5:1, at least 3:1, at least 3.5:1, at least 4:1, at least 4.5:1, at least 5:1, at least 5.5:1, at least 6:1, at least 6.5:1, at least 7:1, at least 7.5:1, at least 8:1, at least 10:1, at least 12:1, at least 15:1, at least 20:1, at least 25:1, at least 30:1, at least 40:1, at least 50:1, at least 100:1, at least 150:1, at least 200:1, at least 250:1, at least 500:1, or at least 1000:1, or more. It should be appreciated that the characteristics of the genome editors described in the “Genome editor Efficiency” section, herein, may be applied to any of the fusion proteins, or methods of using the fusion proteins provided herein.

VII. Vectors

Several aspects of making and using the improved HDR-dependent genome editors of the invention relate to vector systems comprising one or more vectors, or vectors as such. Vectors can be designed to clone and/or express the genome editors of the disclosure. Vectors can also be designed to transfect the genome editors of the disclosure into one or more cells, e.g., a target diseased eukaryotic cell for treatment with the genome editor systems and methods disclosed herein.

Vectors can be designed for expression of genome editor transcripts (e.g. nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells. For example, genome editor transcripts can be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Alternatively, expression vectors encoding one or more genome editors described herein can be transcribed and translated in vitro, for example, using T7 promoter regulatory sequences and T7 polymerase.

Vectors may be introduced and propagated in a prokaryotic cell. In some embodiments, a prokaryote is used to amplify copies of a vector to be introduced into a eukaryotic cell or as an intermediate vector in the production of a vector to be introduced into a eukaryotic cell (e.g. amplifying a plasmid as part of a viral vector packaging system). In some embodiments, a prokaryote is used to amplify copies of a vector and express one or more nucleic acids, such as to provide a source of one or more proteins for delivery to a host cell or host organism. Expression of proteins in prokaryotes is most often carried out in Escherichia coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins.

Fusion expression vectors also may be used to express the genome editors of the disclosure. Such vectors generally add a number of amino acids to a protein encoded therein, such as to the amino terminus of the recombinant protein. Such fusion vectors may serve one or more purposes, such as: (i) to increase expression of recombinant protein; (ii) to increase the solubility of the recombinant protein; and (iii) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein. Such enzymes, and their cognate recognition sequences, include Factor Xa, thrombin and enterokinase. Example fusion expression vectors include pGEX (Pharmacia Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein.

Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann et al., (1988) Gene 69:301-315) and pET 11d (Studier et al., GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990) 60-89).

In some embodiments, a vector is a yeast expression vector for expressing the genome editors described herein. Examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30: 933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (Invitrogen Corp, San Diego, Calif.).

In some embodiments, a vector drives protein expression in insect cells using baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g., Sf9 cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL series (Luckow and Summers, 1989. Virology 170: 31-39).

In some embodiments, a vector is capable of driving expression of one or more sequences in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6: 187-195). When used in mammalian cells, the expression vector's control functions are typically provided by one or more regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus, simian virus 40, and others disclosed herein and known in the art. For other suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook, et al., MOLECULAR CLONING: A LABORATORY MANUAL. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

In some embodiments, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; Pinkert, et al., 1987. Genes Dev. 1: 268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43: 235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J. 8: 729-733) and immunoglobulins (Banerji, et al., 1983. Cell 33: 729-740; Queen and Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter; Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific promoters (Edlund, et al., 1985. Science 230: 912-916), and mammary gland-specific promoters (e.g., milk whey promoter, U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally-regulated promoters are also encompassed, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein promoter (Camper and Tilghman, 1989. Genes Dev. 3: 537-546).

The invention provides viral vectors for the continuous evolution processes. In some embodiments, phage vectors for phage-assisted continuous evolution are provided. In some embodiments, a selection phage is provided that comprises a phage genome deficient in at least one gene required for the generation of infectious phage particles and a gene of interest to be evolved.

For example, in some embodiments, the selection phage comprises an M13 phage genome deficient in a gene required for the generation of infectious M13 phage particles, for example, a full-length gIII. In some embodiments, the selection phage comprises a phage genome providing all other phage functions required for the phage life cycle except the gene required for generation of infectious phage particles. In some such embodiments, an M13 selection phage is provided that comprises a gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and a gX gene, but not a full-length gIII. In some embodiments, the selection phage comprises a 3′-fragment of gIII, but no full-length gIII. The 3′-end of gIII comprises a promoter (see FIG. 16) and retaining this promoter activity is beneficial, in some embodiments, for an increased expression of gVI, which is immediately downstream of the gIII 3′-promoter, or a more balanced (wild-type phage-like) ratio of expression levels of the phage genes in the host cell, which, in turn, can lead to more efficient phage production. In some embodiments, the 3′-fragment of gIII gene comprises the 3′-gIII promoter sequence. In some embodiments, the 3′-fragment of gIII comprises the last 180 bp, the last 150 bp, the last 125 bp, the last 100 bp, the last 50 bp, or the last 25 bp of gIII. In some embodiments, the 3′-fragment of gIII comprises the last 180 bp of gIII.

In some embodiments, an M13 selection phage is provided that comprises a gene of interest in the phage genome, for example, inserted downstream of the gVIII 3′-terminator and upstream of the gIII-3′-promoter. In some embodiments, an M13 selection phage is provided that comprises a multiple cloning site for cloning a gene of interest into the phage genome, for example, a multiple cloning site (MCS) inserted downstream of the gVIII 3′-terminator and upstream of the gIII-3′-promoter.

Some aspects of this invention provide a vector system for continuous evolution procedures, comprising a viral vector, for example, a selection phage, and a matching accessory plasmid. In some embodiments, a vector system for phage-based continuous directed evolution is provided that comprises (a) a selection phage comprising a gene of interest to be evolved, wherein the phage genome is deficient in a gene required to generate infectious phage; and (b) an accessory plasmid comprising the gene required to generate infectious phage particles under the control of a conditional promoter, wherein the conditional promoter is activated by a function of a gene product encoded by the gene of interest.

In some embodiments, the selection phage is an M13 phage as described herein. For example, in some embodiments, the selection phage comprises an M13 genome including all genes required for the generation of phage particles, for example, gI, gII, gIV, gV, gVI, gVII, gVIII, gIX, and gX gene, but not a full-length gIII gene. In some embodiments, the selection phage genome comprises an F1 or an M13 origin of replication. In some embodiments, the selection phage genome comprises a 3′-fragment of gIII gene. In some embodiments, the selection phage comprises a multiple cloning site upstream of the gIII 3′-promoter and downstream of the gVIII 3′-terminator.

In some embodiments, the selection phage does not comprise a full length gVI. Like gIII, gVI is required for infection and, thus, can be used for selection in a similar fashion as described for gIII herein. However, it was found that continuous expression of pIII renders some host cells resistant to infection by M13. Accordingly, it is desirable that pIII is produced only after infection. This can be achieved by providing a gene encoding pIII under the control of an inducible promoter, for example, an arabinose-inducible promoter as described herein, and providing the inducer in the lagoon, where infection takes place, but not in the turbidostat, or otherwise before infection takes place. In some embodiments, multiple genes required for the generation of infectious phage are removed from the selection phage genome, for example, gIII and gVI, and provided by the host cell, for example, in an accessory plasmid as described herein.

The vector system may further comprise a helper phage, wherein the selection phage does not comprise all genes required for the generation of phage particles, and wherein the helper phage complements the genome of the selection phage, so that the helper phage genome and the selection phage genome together comprise at least one functional copy of all genes required for the generation of phage particles, but are deficient in at least one gene required for the generation of infectious phage particles.

In some embodiments, the accessory plasmid of the vector system comprises an expression cassette comprising the gene required for the generation of infectious phage under the control of a conditional promoter. In some embodiments, the accessory plasmid of the vector system comprises a gene encoding pIII under the control of a conditional promoter the activity of which is dependent on a function of a product of the gene of interest.

In some embodiments, the vector system further comprises a mutagenesis plasmid, for example, an arabinose-inducible mutagenesis plasmid as described herein.

In some embodiments, the vector system further comprises a helper plasmid providing expression constructs of any phage gene not comprised in the phage genome of the selection phage or in the accessory plasmid.

VIII. DNA Editing

Some aspects of the disclosure provide methods for editing a nucleic acid using the improved HDR-dependent genome editors described herein to effectuate a nucleobase change, e.g., a G:C base pair to a T:A base pair. In some embodiments, the method is a method for editing a nucleobase of a nucleic acid (e.g., a base pair of a double-stranded DNA sequence) mediated by homology-directed repair. In some embodiments, the method comprises the steps of: a) contacting a target region of a nucleic acid (e.g., a double-stranded DNA sequence) with a complex comprising an improved HDR-dependent genome editor fusion protein (e.g., nCas9-Rad51 fusion), a guide nucleic acid (e.g., gRNA), and an exogenous donor DNA molecule (comprising the corrected genetic element), wherein the target region comprises a targeted nucleobase pair, b) inducing strand separation of said target region and nicking one strand of the DNA, and c) inducing homology-directed repair to homologously recombine the donor sequence with the target sequence, thereby incorporating the desired genetic change from the donor sequence into the target sequence.

In some embodiments, the ratio of intended products to unintended products in the target nucleotide is at least 2:1, 5:1, 10:1, 20:1, 30:1, 40:1, 50:1, 60:1, 70:1, 80:1, 90:1, 100:1, or 200:1, or more. In some embodiments, the ratio of intended point mutation to indel formation is greater than 1:1, 10:1, 50:1, 100:1, 500:1, or 1000:1, or more. In some embodiments, the cut single strand (nicked strand) is hybridized to the guide nucleic acid. In some embodiments, the cut single strand is opposite to the strand comprising the first nucleobase. In some embodiments, the genome editor comprises nickase activity. In some embodiments, the intended edited base pair is upstream of a PAM site. In some embodiments, the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides upstream of the PAM site. In some embodiments, the intended edited base pair is downstream of a PAM site. In some embodiments, the intended edited base pair is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides downstream of the PAM site. In some embodiments, the method does not require a canonical (e.g., NGG) PAM site. In some embodiments, the genome editor comprises a linker. In some embodiments, the linker is 1-25 amino acids in length. In some embodiments, the linker is 5-20 amino acids in length. In some embodiments, the linker is 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acids in length. In some embodiments, the target region comprises a target window, wherein the target window comprises the target nucleobase pair. In some embodiments, the target window comprises 1-10 nucleotides. In some embodiments, the target window is 1-9, 1-8, 1-7, 1-6, 1-5, 1-4, 1-3, 1-2, or 1 nucleotides in length. In some embodiments, the target window is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In some embodiments, the intended edited base pair is within the target window. In some embodiments, the target window comprises the intended edited base pair. In some embodiments, the method is performed using any of the genome editors provided herein. In some embodiments, a target window is an editing window.
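As a non-limiting illustration of the positional and ratio criteria recited above, the following Python sketch locates canonical NGG PAM sites on one strand, checks whether an edited position lies 1 to 20 nucleotides upstream of a PAM, and computes a ratio of intended to unintended products. The function names, the example sequence, and the read counts are hypothetical. The counting convention (a distance of 1 denotes the base immediately 5' of the PAM) is an assumption made for the sketch and is not defined by the disclosure.

```python
import re


def find_ngg_pams(seq: str) -> list[int]:
    """Return 0-based start indices of every NGG motif on the given strand."""
    # A lookahead is used so that overlapping motifs are all reported.
    return [m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq)]


def is_upstream_within(edit_index: int, pam_start: int, max_distance: int = 20) -> bool:
    """True if the edited base lies 1..max_distance nucleotides 5' of the PAM.

    Assumed convention: distance 1 is the base directly adjacent to the PAM.
    """
    distance = pam_start - edit_index
    return 1 <= distance <= max_distance


def product_ratio(intended: int, unintended: int) -> float:
    """Ratio of intended to unintended product counts (infinite if none unintended)."""
    if intended < 0 or unintended < 0:
        raise ValueError("counts must be non-negative")
    if unintended == 0:
        return float("inf")
    return intended / unintended


# Hypothetical 25-nt strand with a single AGG PAM starting at index 20.
example_seq = "ACGT" * 5 + "AGGTT"
pams = find_ngg_pams(example_seq)
print(pams, is_upstream_within(7, pams[0]), product_ratio(200, 10))
```

A ratio computed this way could be compared against any of the thresholds listed above (e.g., at least 20:1); how intended and unintended products are counted experimentally is outside the scope of the sketch.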

In some embodiments, the target DNA sequence comprises a sequence associated with a disease or disorder. In some embodiments, the target DNA sequence comprises a point mutation associated with a disease or disorder. In some embodiments, the activity of the fusion protein (e.g., a Cas9 domain fused to a Rad51), or the complex, results in a correction of the point mutation as mediated by homology-directed repair in the presence of a donor DNA sequence comprising the desired genetic change.

Some embodiments provide methods for using the genome editors provided herein. In some embodiments, the genome editors are used to correct a genetic defect that is associated with a disease or disorder, e.g., a lysosomal storage disorder or a metabolic disease, such as, for example, type I diabetes. In some embodiments, the methods provided herein are used to introduce a deactivating point mutation into a gene or allele that encodes a gene product that is associated with a disease or disorder. For example, in some embodiments, methods are provided herein that employ a DNA editing fusion protein to introduce a deactivating point mutation into an oncogene (e.g., in the treatment of a proliferative disease). A deactivating mutation may, in some embodiments, generate a premature stop codon in a coding sequence, which results in the expression of a truncated gene product, e.g., a truncated protein lacking the function of the full-length protein.

In some embodiments, the purpose of the methods provided herein is to restore the function of a dysfunctional gene via genome editing. The nucleobase editing proteins provided herein can be validated for gene editing-based human therapeutics in vitro, e.g., by correcting a disease-associated mutation in human cell culture.

The successful correction of point mutations in disease-associated genes and alleles opens up new strategies for gene correction with applications in therapeutics and basic research. Site-specific single-base modification systems like the disclosed fusions of a nucleic acid programmable DNA binding protein and a single-stranded DNA binding protein (e.g., Rad51) also have applications in “reverse” gene therapy, where certain gene functions are purposely suppressed or abolished. In these cases, site-specifically mutating residues so as to introduce inactivating mutations in a protein, or mutations that inhibit function of the protein, can be used to abolish or inhibit protein function.

IX. Methods of Treatment

The instant disclosure provides methods for the treatment of a subject diagnosed with a disease associated with or caused by a point mutation that can be corrected by the improved HDR-dependent genome editors provided herein. For example, in some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of an improved HDR-dependent genome editor that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene as mediated by homology-directed repair in the presence of a donor DNA molecule comprising the desired genetic change. In some embodiments, a method is provided that comprises administering to a subject having such a disease, e.g., a cancer associated with a point mutation as described above, an effective amount of an HDR-dependent fusion protein that corrects the point mutation or introduces a deactivating mutation into a disease-associated gene. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neoplastic disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease. Other diseases that can be treated by correcting a point mutation or introducing a deactivating mutation into a disease-associated gene will be known to those of skill in the art, and the disclosure is not limited in this respect.

The instant disclosure provides methods for the treatment of additional diseases or disorders, e.g., diseases or disorders that are associated with or caused by a point mutation that can be corrected by HDR-mediated gene editing. Some such diseases are described herein, and additional suitable diseases that can be treated with the strategies and fusion proteins provided herein will be apparent to those of skill in the art based on the instant disclosure. Exemplary suitable diseases and disorders are listed below. It will be understood that the numbering of the specific positions or residues in the respective sequences depends on the particular protein and numbering scheme used. Numbering might be different, e.g., in precursors of a mature protein and the mature protein itself, and differences in sequences from species to species may affect numbering. One of skill in the art will be able to identify the respective residue in any homologous protein and in the respective encoding nucleic acid by methods well known in the art, e.g., by sequence alignment and determination of homologous residues.
Exemplary suitable diseases and disorders include, without limitation: 2-methyl-3-hydroxybutyric aciduria; 3 beta-Hydroxysteroid dehydrogenase deficiency; 3-Methylglutaconic aciduria; 3-Oxo-5 alpha-steroid delta 4-dehydrogenase deficiency; 46,XY sex reversal, type 1, 3, and 5; 5-Oxoprolinase deficiency; 6-pyruvoyl-tetrahydropterin synthase deficiency; Aarskog syndrome; Aase syndrome; Achondrogenesis type 2; Achromatopsia 2 and 7; Acquired long QT syndrome; Acrocallosal syndrome, Schinzel type; Acrocapitofemoral dysplasia; Acrodysostosis 2, with or without hormone resistance; Acroerythrokeratoderma; Acromicric dysplasia; Acth-independent macronodular adrenal hyperplasia 2; Activated PI3K-delta syndrome; Acute intermittent porphyria; deficiency of Acyl-CoA dehydrogenase family, member 9; Adams-Oliver syndrome 5 and 6; Adenine phosphoribosyltransferase deficiency; Adenylate kinase deficiency; hemolytic anemia due to Adenylosuccinate lyase deficiency; Adolescent nephronophthisis; Renal-hepatic-pancreatic dysplasia; Meckel syndrome type 7; Adrenoleukodystrophy; Adult junctional epidermolysis bullosa; Epidermolysis bullosa, junctional, localisata variant; Adult neuronal ceroid lipofuscinosis; Adult neuronal ceroid lipofuscinosis; Adult onset ataxia with oculomotor apraxia; ADULT syndrome; Afibrinogenemia and congenital Afibrinogenemia; autosomal recessive Agammaglobulinemia 2; Age-related macular degeneration 3, 6, 11, and 12; Aicardi Goutieres syndromes 1, 4, and 5; Chilbain lupus 1; Alagille syndromes 1 and 2; Alexander disease; Alkaptonuria; Allan-Herndon-Dudley syndrome; Alopecia universalis congenital; Alpers encephalopathy; Alpha-1-antitrypsin deficiency; autosomal dominant, autosomal recessive, and X-linked recessive Alport syndromes; Alzheimer disease, familial, 3, with spastic paraparesis and apraxia; Alzheimer disease, types, 1, 3, and 4; hypocalcification type and hypomaturation type, IIA1 Amelogenesis imperfecta; Aminoacylase 1 deficiency; Amish infantile 
epilepsy syndrome; Amyloidogenic transthyretin amyloidosis; Amyloid Cardiomyopathy, Transthyretin-related; Cardiomyopathy; Amyotrophic lateral sclerosis types 1, 6, 15 (with or without frontotemporal dementia), 22 (with or without frontotemporal dementia), and 10; Frontotemporal dementia with TDP43 inclusions, TARDBP-related; Andermann syndrome; Andersen Tawil syndrome; Congenital long QT syndrome; Anemia, nonspherocytic hemolytic, due to G6PD deficiency; Angelman syndrome; Severe neonatal-onset encephalopathy with microcephaly; susceptibility to Autism, X-linked 3; Angiopathy, hereditary, with nephropathy, aneurysms, and muscle cramps; Angiotensin i-converting enzyme, benign serum increase; Aniridia, cerebellar ataxia, and mental retardation; Anonychia; Antithrombin III deficiency; Antley-Bixler syndrome with genital anomalies and disordered steroidogenesis; Aortic aneurysm, familial thoracic 4, 6, and 9; Thoracic aortic aneurysms and aortic dissections; Multisystemic smooth muscle dysfunction syndrome; Moyamoya disease 5; Aplastic anemia; Apparent mineralocorticoid excess; Arginase deficiency; Argininosuccinate lyase deficiency; Aromatase deficiency; Arrhythmogenic right ventricular cardiomyopathy types 5, 8, and 10; Primary familial hypertrophic cardiomyopathy; Arthrogryposis multiplex congenita, distal, X-linked; Arthrogryposis renal dysfunction cholestasis syndrome; Arthrogryposis, renal dysfunction, and cholestasis 2; Asparagine synthetase deficiency; Abnormality of neuronal migration; Ataxia with vitamin E deficiency; Ataxia, sensory, autosomal dominant; Ataxia-telangiectasia syndrome; Hereditary cancer-predisposing syndrome; Atransferrinemia; Atrial fibrillation, familial, 11, 12, 13, and 16; Atrial septal defects 2, 4, and 7 (with or without atrioventricular conduction defects); Atrial standstill 2; Atrioventricular septal defect 4; Atrophia bulborum hereditaria; ATR-X syndrome; Auriculocondylar syndrome 2; Autoimmune disease, multisystem, infantile-onset; 
Autoimmune lymphoproliferative syndrome, type 1a; Autosomal dominant hypohidrotic ectodermal dysplasia; Autosomal dominant progressive external ophthalmoplegia with mitochondrial DNA deletions 1 and 3; Autosomal dominant torsion dystonia 4; Autosomal recessive centronuclear myopathy; Autosomal recessive congenital ichthyosis 1, 2, 3, 4A, and 4B; Autosomal recessive cutis laxa type IA and 1B; Autosomal recessive hypohidrotic ectodermal dysplasia syndrome; Ectodermal dysplasia 11b; hypohidrotic/hair/tooth type, autosomal recessive; Autosomal recessive hypophosphatemic bone disease; Axenfeld-Rieger syndrome type 3; Bainbridge-Ropers syndrome; Bannayan-Riley-Ruvalcaba syndrome; PTEN hamartoma tumor syndrome; Baraitser-Winter syndromes 1 and 2; Barakat syndrome; Bardet-Biedl syndromes 1, 11, 16, and 19; Bare lymphocyte syndrome type 2, complementation group E; Bartter syndrome antenatal type 2; Bartter syndrome types 3, 3 with hypocalciuria, and 4; Basal ganglia calcification, idiopathic, 4; Beaded hair; Benign familial hematuria; Benign familial neonatal seizures 1 and 2; Seizures, benign familial neonatal, 1, and/or myokymia; Seizures, Early infantile epileptic encephalopathy 7; Benign familial neonatal-infantile seizures; Benign hereditary chorea; Benign scapuloperoneal muscular dystrophy with cardiomyopathy; Bernard-Soulier syndrome, types A1 and A2 (autosomal dominant); Bestrophinopathy, autosomal recessive; beta Thalassemia; Bethlem myopathy and Bethlem myopathy 2; Bietti crystalline corneoretinal dystrophy; Bile acid synthesis defect, congenital, 2; Biotinidase deficiency; Birk Barel mental retardation dysmorphism syndrome; Blepharophimosis, ptosis, and epicanthus inversus; Bloom syndrome; Borjeson-Forssman-Lehmann syndrome; Boucher Neuhauser syndrome; Brachydactyly types A1 and A2; Brachydactyly with hypertension; Brain small vessel disease with hemorrhage; Branched-chain ketoacid dehydrogenase kinase deficiency; Branchiootic syndromes 2 and 3; Breast cancer, 
early-onset; Breast-ovarian cancer, familial 1, 2, and 4; Brittle cornea syndrome 2; Brody myopathy; Bronchiectasis with or without elevated sweat chloride 3; Brown-Vialetto-Van laere syndrome and Brown-Vialetto-Van Laere syndrome 2; Brugada syndrome; Brugada syndrome 1; Ventricular fibrillation; Paroxysmal familial ventricular fibrillation; Brugada syndrome and Brugada syndrome 4; Long QT syndrome; Sudden cardiac death; Bull eye macular dystrophy; Stargardt disease 4; Cone-rod dystrophy 12; Bullous ichthyosiform erythroderma; Burn-Mckeown syndrome; Candidiasis, familial, 2, 5, 6, and 8; Carbohydrate-deficient glycoprotein syndrome type I and II; Carbonic anhydrase VA deficiency, hyperammonemia due to; Carcinoma of colon; Cardiac arrhythmia; Long QT syndrome, LQT1 subtype; Cardioencephalomyopathy, fatal infantile, due to cytochrome c oxidase deficiency; Cardiofaciocutaneous syndrome; Cardiomyopathy; Danon disease; Hypertrophic cardiomyopathy; Left ventricular noncompaction cardiomyopathy; Carnevale syndrome; Carney complex, type 1; Carnitine acylcarnitine translocase deficiency; Carnitine palmitoyltransferase I, II, II (late onset), and II (infantile) deficiency; Cataract 1, 4, autosomal dominant, autosomal dominant, multiple types, with microcornea, coppock-like, juvenile, with microcornea and glucosuria, and nuclear diffuse nonprogressive; Catecholaminergic polymorphic ventricular tachycardia; Caudal regression syndrome; Cd8 deficiency, familial; Central core disease; Centromeric instability of chromosomes 1, 9 and 16 and immunodeficiency; Cerebellar ataxia infantile with progressive external ophthalmoplegi and Cerebellar ataxia, mental retardation, and dysequilibrium syndrome 2; Cerebral amyloid angiopathy, APP-related; Cerebral autosomal dominant and recessive arteriopathy with subcortical infarcts and leukoencephalopathy; Cerebral cavernous malformations 2; Cerebrooculofacioskeletal syndrome 2; Cerebro-oculo-facio-skeletal syndrome; Cerebroretinal 
microangiopathy with calcifications and cysts; Ceroid lipofuscinosis neuronal 2, 6, 7, and 10; Chédiak-Higashi syndrome, Chediak-Higashi syndrome, adult type; Charcot-Marie-Tooth disease types 1B, 2B2, 2C, 2F, 2I, 2U (axonal), 1C (demyelinating), dominant intermediate C, recessive intermediate A, 2A2, 4C, 4D, 4H, IF, IVF, and X; Scapuloperoneal spinal muscular atrophy; Distal spinal muscular atrophy, congenital nonprogressive; Spinal muscular atrophy, distal, autosomal recessive, 5; CHARGE association; Childhood hypophosphatasia; Adult hypophosphatasia; Cholecystitis; Progressive familial intrahepatic cholestasis 3; Cholestasis, intrahepatic, of pregnancy 3; Cholestanol storage disease; Cholesterol monooxygenase (side-chain cleaving) deficiency; Chondrodysplasia Blomstrand type; Chondrodysplasia punctata 1, X-linked recessive and 2 X-linked dominant; CHOPS syndrome; Chronic granulomatous disease, autosomal recessive cytochrome b-positive, types 1 and 2; Chudley-McCullough syndrome; Ciliary dyskinesia, primary, 7, 11, 15, 20 and 22; Citrullinemia type I; Citrullinemia type I and II; Cleidocranial dysostosis; C-like syndrome; Cockayne syndrome type A; Coenzyme Q10 deficiency, primary 1, 4, and 7; Coffin Siris/Intellectual Disability; Coffin-Lowry syndrome; Cohen syndrome; Cold-induced sweating syndrome 1; COLE-CARPENTER SYNDROME 2; Combined cellular and humoral immune defects with granulomas; Combined d-2- and l-2-hydroxyglutaric aciduria; Combined malonic and methylmalonic aciduria; Combined oxidative phosphorylation deficiencies 1, 3, 4, 12, 15, and 25; Combined partial and complete 17-alpha-hydroxylase/17,20-lyase deficiency; Common variable immunodeficiency 9; Complement component 4, partial deficiency of, due to dysfunctional c1 inhibitor; Complement factor B deficiency; Cone monochromatism; Cone-rod dystrophy 2 and 6; Cone-rod dystrophy amelogenesis imperfecta; Congenital adrenal hyperplasia and Congenital adrenal hypoplasia, X-linked; Congenital 
amegakaryocytic thrombocytopenia; Congenital aniridia; Congenital central hypoventilation; Hirschsprung disease 3; Congenital contractural arachnodactyly; Congenital contractures of the limbs and face, hypotonia, and developmental delay; Congenital disorder of glycosylation types 1B, 1D, 1G, 1H, 1J, 1K, 1N, 1P, 2C, 2J, 2K, IIm; Congenital dyserythropoietic anemia, type I and II; Congenital ectodermal dysplasia of face; Congenital erythropoietic porphyria; Congenital generalized lipodystrophy type 2; Congenital heart disease, multiple types, 2; Congenital heart disease; Interrupted aortic arch; Congenital lipomatous overgrowth, vascular malformations, and epidermal nevi; Non-small cell lung cancer; Neoplasm of ovary; Cardiac conduction defect, nonspecific; Congenital microvillous atrophy; Congenital muscular dystrophy; Congenital muscular dystrophy due to partial LAMA2 deficiency; Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, types A2, A7, A8, A11, and A14; Congenital muscular dystrophy-dystroglycanopathy with mental retardation, types B2, B3, B5, and B15; Congenital muscular dystrophy-dystroglycanopathy without mental retardation, type B5; Congenital muscular hypertrophy-cerebral syndrome; Congenital myasthenic syndrome, acetazolamide-responsive; Congenital myopathy with fiber type disproportion; Congenital ocular coloboma; Congenital stationary night blindness, type 1A, 1B, 1C, 1E, 1F, and 2A; Coproporphyria; Cornea plana 2; Corneal dystrophy, Fuchs endothelial, 4; Corneal endothelial dystrophy type 2; Corneal fragility keratoglobus, blue sclerae and joint hypermobility; Cornelia de Lange syndromes 1 and 5; Coronary artery disease, autosomal dominant 2; Coronary heart disease; Hyperalphalipoproteinemia 2; Cortical dysplasia, complex, with other brain malformations 5 and 6; Cortical malformations, occipital; Corticosteroid-binding globulin deficiency; Corticosterone methyloxidase type 2 deficiency; Costello syndrome; Cowden syndrome 
1; Coxa plana; Craniodiaphyseal dysplasia, autosomal dominant; Craniosynostosis 1 and 4; Craniosynostosis and dental anomalies; Creatine deficiency, X-linked; Crouzon syndrome; Cryptophthalmos syndrome; Cryptorchidism, unilateral or bilateral; Cushing symphalangism; Cutaneous malignant melanoma 1; Cutis laxa with osteodystrophy and with severe pulmonary, gastrointestinal, and urinary abnormalities; Cyanosis, transient neonatal and atypical nephropathic; Cystic fibrosis; Cystinuria; Cytochrome c oxidase i deficiency; Cytochrome-c oxidase deficiency; D-2-hydroxyglutaric aciduria 2; Darier disease, segmental; Deafness with labyrinthine aplasia microtia and microdontia (LAMM); Deafness, autosomal dominant 3a, 4, 12, 13, 15, autosomal dominant nonsyndromic sensorineural 17, 20, and 65; Deafness, autosomal recessive 1A, 2, 3, 6, 8, 9, 12, 15, 16, 18b, 22, 28, 31, 44, 49, 63, 77, 86, and 89; Deafness, cochlear, with myopia and intellectual impairment, without vestibular involvement, autosomal dominant, X-linked 2; Deficiency of 2-methylbutyryl-CoA dehydrogenase; Deficiency of 3-hydroxyacyl-CoA dehydrogenase; Deficiency of alpha-mannosidase; Deficiency of aromatic-L-amino-acid decarboxylase; Deficiency of bisphosphoglycerate mutase; Deficiency of butyryl-CoA dehydrogenase; Deficiency of ferroxidase; Deficiency of galactokinase; Deficiency of guanidinoacetate methyltransferase; Deficiency of hyaluronoglucosaminidase; Deficiency of ribose-5-phosphate isomerase; Deficiency of steroid 11-beta-monooxygenase; Deficiency of UDPglucose-hexose-1-phosphate uridylyltransferase; Deficiency of xanthine oxidase; Dejerine-Sottas disease; Charcot-Marie-Tooth disease, types ID and IVF; Dejerine-Sottas syndrome, autosomal dominant; Dendritic cell, monocyte, B lymphocyte, and natural killer lymphocyte deficiency; Desbuquois dysplasia 2; Desbuquois syndrome; DFNA 2 Nonsyndromic Hearing Loss; Diabetes mellitus and insipidus with optic atrophy and deafness; Diabetes mellitus, type 2, and 
insulin-dependent, 20; Diamond-Blackfan anemia 1, 5, 8, and 10; Diarrhea 3 (secretory sodium, congenital, syndromic) and 5 (with tufting enteropathy, congenital); Dicarboxylic aminoaciduria; Diffuse palmoplantar keratoderma, Bothnian type; Digitorenocerebral syndrome; Dihydropteridine reductase deficiency; Dilated cardiomyopathy 1A, 1AA, 1C, 1G, 1BB, 1DD, 1FF, 1HH, 1I, 1KK, 1N, 1S, 1Y, and 3B; Left ventricular noncompaction 3; Disordered steroidogenesis due to cytochrome p450 oxidoreductase deficiency; Distal arthrogryposis type 2B; Distal hereditary motor neuronopathy type 2B; Distal myopathy Markesbery-Griggs type; Distal spinal muscular atrophy, X-linked 3; Distichiasis-lymphedema syndrome; Dominant dystrophic epidermolysis bullosa with absence of skin; Dominant hereditary optic atrophy; Donnai Barrow syndrome; Dopamine beta hydroxylase deficiency; Dopamine receptor d2, reduced brain density of; Dowling-degos disease 4; Doyne honeycomb retinal dystrophy; Malattia leventinese; Duane syndrome type 2; Dubin-Johnson syndrome; Duchenne muscular dystrophy; Becker muscular dystrophy; Dysfibrinogenemia; Dyskeratosis congenita autosomal dominant and autosomal dominant, 3; Dyskeratosis congenita, autosomal recessive, 1, 3, 4, and 5; Dyskeratosis congenita X-linked; Dyskinesia, familial, with facial myokymia; Dysplasminogenemia; Dystonia 2 (torsion, autosomal recessive), 3 (torsion, X-linked), 5 (Dopa-responsive type), 10, 12, 16, 25, 26 (Myoclonic); Seizures, benign familial infantile, 2; Early infantile epileptic encephalopathy 2, 4, 7, 9, 10, 11, 13, and 14; Atypical Rett syndrome; Early T cell progenitor acute lymphoblastic leukemia; Ectodermal dysplasia skin fragility syndrome; Ectodermal dysplasia-syndactyly syndrome 1; Ectopia lentis, isolated autosomal recessive and dominant; Ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome 3; Ehlers-Danlos syndrome type 7 (autosomal recessive), classic type, type 2 (progeroid), hydroxylysine-deficient, type 4, 
type 4 variant, and due to tenascin-X deficiency; Eichsfeld type congenital muscular dystrophy; Endocrine-cerebroosteodysplasia; Enhanced s-cone syndrome; Enlarged vestibular aqueduct syndrome; Enterokinase deficiency; Epidermodysplasia verruciformis; Epidermolysa bullosa simplex and limb girdle muscular dystrophy, simplex with mottled pigmentation, simplex with pyloric atresia, simplex, autosomal recessive, and with pyloric atresia; Epidermolytic palmoplantar keratoderma; Familial febrile seizures 8; Epilepsy, childhood absence 2, 12 (idiopathic generalized, susceptibility to) 5 (nocturnal frontal lobe), nocturnal frontal lobe type 1, partial, with variable foci, progressive myoclonic 3, and X-linked, with variable learning disabilities and behavior disorders; Epileptic encephalopathy, childhood-onset, early infantile, 1, 19, 23, 25, 30, and 32; Epiphyseal dysplasia, multiple, with myopia and conductive deafness; Episodic ataxia type 2; Episodic pain syndrome, familial, 3; Epstein syndrome; Fechtner syndrome; Erythropoietic protoporphyria; Estrogen resistance; Exudative vitreoretinopathy 6; Fabry disease and Fabry disease, cardiac variant; Factor H, VII, X, v and factor viii, combined deficiency of 2, xiii, a subunit, deficiency; Familial adenomatous polyposis 1 and 3; Familial amyloid nephropathy with urticaria and deafness; Familial cold urticarial; Familial aplasia of the vermis; Familial benign pemphigus; Familial cancer of breast; Breast cancer, susceptibility to; Osteosarcoma; Pancreatic cancer 3; Familial cardiomyopathy; Familial cold autoinflammatory syndrome 2; Familial colorectal cancer; Familial exudative vitreoretinopathy, X-linked; Familial hemiplegic migraine types 1 and 2; Familial hypercholesterolemia; Familial hypertrophic cardiomyopathy 1, 2, 3, 4, 7, 10, 23 and 24; Familial hypokalemia-hypomagnesemia; Familial hypoplastic, glomerulocystic kidney; Familial infantile myasthenia; Familial juvenile gout; Familial Mediterranean fever and Familial 
mediterranean fever, autosomal dominant; Familial porencephaly; Familial Porphyria cutanea tarda; Familial pulmonary capillary hemangiomatosis; Familial renal glucosuria; Familial renal hypouricemia; Familial restrictive cardiomyopathy 1; Familial type 1 and 3 hyperlipoproteinemia; Fanconi anemia, complementation group E, I, N, and O; Fanconi-Bickel syndrome; Favism, susceptibility to; Febrile seizures, familial, 11; Feingold syndrome 1; Fetal hemoglobin quantitative trait locus 1; FG syndrome and FG syndrome 4; Fibrosis of extraocular muscles, congenital, 1, 2, 3a (with or without extraocular involvement), 3b; Fish-eye disease; Fleck corneal dystrophy; Floating-Harbor syndrome; Focal epilepsy with speech disorder with or without mental retardation; Focal segmental glomerulosclerosis 5; Forebrain defects; Frank Ter Haar syndrome; Borrone Di Rocco Crovato syndrome; Frasier syndrome; Wilms tumor 1; Freeman-Sheldon syndrome; Frontometaphyseal dysplasia 1 and 3; Frontotemporal dementia; Frontotemporal dementia and/or amyotrophic lateral sclerosis 3 and 4; Frontotemporal Dementia Chromosome 3-Linked and Frontotemporal dementia ubiquitin-positive; Fructose-biphosphatase deficiency; Fuhrmann syndrome; Gamma-aminobutyric acid transaminase deficiency; Gamstorp-Wohlfart syndrome; Gaucher disease type 1 and Subacute neuronopathic; Gaze palsy, familial horizontal, with progressive scoliosis; Generalized dominant dystrophic epidermolysis bullosa; Generalized epilepsy with febrile seizures plus 3, type 1, type 2; Epileptic encephalopathy Lennox-Gastaut type; Giant axonal neuropathy; Glanzmann thrombasthenia; Glaucoma 1, open angle, e, F, and G; Glaucoma 3, primary congenital, d; Glaucoma, congenital and Glaucoma, congenital, Coloboma; Glaucoma, primary open angle, juvenile-onset; Glioma susceptibility 1; Glucose transporter type 1 deficiency syndrome; Glucose-6-phosphate transport defect; GLUT1 deficiency syndrome 2; Epilepsy, idiopathic generalized, susceptibility to, 12; 
Glutamate formiminotransferase deficiency; Glutaric acidemia IIA and IIB; Glutaric aciduria, type 1; Gluthathione synthetase deficiency; Glycogen storage disease 0 (muscle), II (adult form), IXa2, IXc, type 1A; type II, type IV, IV (combined hepatic and myopathic), type V, and type VI; Goldmann-Favre syndrome; Gordon syndrome; Gorlin syndrome; Holoprosencephaly sequence; Holoprosencephaly 7; Granulomatous disease, chronic, X-linked, variant; Granulosa cell tumor of the ovary; Gray platelet syndrome; Griscelli syndrome type 3; Groenouw corneal dystrophy type I; Growth and mental retardation, mandibulofacial dysostosis, microcephaly, and cleft palate; Growth hormone deficiency with pituitary anomalies; Growth hormone insensitivity with immunodeficiency; GTP cyclohydrolase I deficiency; Hajdu-Cheney syndrome; Hand foot uterus syndrome; Hearing impairment; Hemangioma, capillary infantile; Hematologic neoplasm; Hemochromatosis type 1, 2B, and 3; Microvascular complications of diabetes 7; Transferrin serum level quantitative trait locus 2; Hemoglobin H disease, nondeletional; Hemolytic anemia, nonspherocytic, due to glucose phosphate isomerase deficiency; Hemophagocytic lymphohistiocytosis, familial, 2; Hemophagocytic lymphohistiocytosis, familial, 3; Heparin cofactor II deficiency; Hereditary acrodermatitis enteropathica; Hereditary breast and ovarian cancer syndrome; Ataxia-telangiectasia-like disorder; Hereditary diffuse gastric cancer; Hereditary diffuse leukoencephalopathy with spheroids; Hereditary factors II, IX, VIII deficiency disease; Hereditary hemorrhagic telangiectasia type 2; Hereditary insensitivity to pain with anhidrosis; Hereditary lymphedema type I; Hereditary motor and sensory neuropathy with optic atrophy; Hereditary myopathy with early respiratory failure; Hereditary neuralgic amyotrophy; Hereditary Nonpolyposis Colorectal Neoplasms; Lynch syndrome I and II; Hereditary pancreatitis; Pancreatitis, chronic, susceptibility to; Hereditary sensory and 
autonomic neuropathy type IIB and IIA; Hereditary sideroblastic anemia; Hermansky-Pudlak syndrome 1, 3, 4, and 6; Heterotaxy, visceral, 2, 4, and 6, autosomal; Heterotaxy, visceral, X-linked; Heterotopia; Histiocytic medullary reticulosis; Histiocytosis-lymphadenopathy plus syndrome; Holocarboxylase synthetase deficiency; Holoprosencephaly 2, 3, 7, and 9; Holt-Oram syndrome; Homocysteinemia due to MTHFR deficiency, CBS deficiency, and Homocystinuria, pyridoxine-responsive; Homocystinuria-Megaloblastic anemia due to defect in cobalamin metabolism, cblE complementation type; Howel-Evans syndrome; Hurler syndrome; Hutchinson-Gilford syndrome; Hydrocephalus; Hyperammonemia, type III; Hypercholesterolaemia and Hypercholesterolemia, autosomal recessive; Hyperekplexia 2 and Hyperekplexia hereditary;

Hyperferritinemia cataract syndrome; Hyperglycinuria; Hyperimmunoglobulin D with periodic fever; Mevalonic aciduria; Hyperimmunoglobulin E syndrome; Hyperinsulinemic hypoglycemia familial 3, 4, and 5; Hyperinsulinism-hyperammonemia syndrome; Hyperlysinemia; Hypermanganesemia with dystonia, polycythemia and cirrhosis; Hyperornithinemia-hyperammonemia-homocitrullinuria syndrome; Hyperparathyroidism 1 and 2; Hyperparathyroidism, neonatal severe; Hyperphenylalaninemia, bh4-deficient, a, due to partial pts deficiency, BH4-deficient, D, and non-pku; Hyperphosphatasia with mental retardation syndrome 2, 3, and 4; Hypertrichotic osteochondrodysplasia; Hypobetalipoproteinemia, familial, associated with apob32; Hypocalcemia, autosomal dominant 1; Hypocalciuric hypercalcemia, familial, types 1 and 3; Hypochondrogenesis; Hypochromic microcytic anemia with iron overload; Hypoglycemia with deficiency of glycogen synthetase in the liver; Hypogonadotropic hypogonadism 11 with or without anosmia; Hypohidrotic ectodermal dysplasia with immune deficiency; Hypohidrotic X-linked ectodermal dysplasia; Hypokalemic periodic paralysis 1 and 2; Hypomagnesemia 1, intestinal; Hypomagnesemia, seizures, and mental retardation; Hypomyelinating leukodystrophy 7; Hypoplastic left heart syndrome; Atrioventricular septal defect and common atrioventricular junction; Hypospadias 1 and 2, X-linked; Hypothyroidism, congenital, nongoitrous, 1; Hypotrichosis 8 and 12; Hypotrichosis-lymphedema-telangiectasia syndrome; I blood group system; Ichthyosis bullosa of Siemens; Ichthyosis exfoliativa; Ichthyosis prematurity syndrome; Idiopathic basal ganglia calcification 5; Idiopathic fibrosing alveolitis, chronic form; Dyskeratosis congenita, autosomal dominant, 2 and 5; Idiopathic hypercalcemia of infancy; Immune dysfunction with T-cell inactivation due to calcium entry defect 2; Immunodeficiency 15, 16, 19, 30, 31C, 38, 40, 8, due to defect in cd3-zeta, with hyper IgM type 1 and 2, and X-Linked, with magnesium 
defect, Epstein-Barr virus infection, and neoplasia; Immunodeficiency-centromeric instability-facial anomalies syndrome 2; Inclusion body myopathy 2 and 3; Nonaka myopathy; Infantile convulsions and paroxysmal choreoathetosis, familial; Infantile cortical hyperostosis; Infantile GM1 gangliosidosis; Infantile hypophosphatasia; Infantile nephronophthisis; Infantile nystagmus, X-linked; Infantile Parkinsonism-dystonia; Infertility associated with multi-tailed spermatozoa and excessive DNA; Insulin resistance; Insulin-resistant diabetes mellitus and acanthosis nigricans; Insulin-dependent diabetes mellitus secretory diarrhea syndrome; Interstitial nephritis, karyomegalic; Intrauterine growth retardation, metaphyseal dysplasia, adrenal hypoplasia congenita, and genital anomalies; Iodotyrosyl coupling defect; IRAK4 deficiency; Iridogoniodysgenesis dominant type and type 1; Iron accumulation in brain; Ischiopatellar dysplasia; Islet cell hyperplasia; Isolated 17,20-lyase deficiency; Isolated lutropin deficiency; Isovaleryl-CoA dehydrogenase deficiency; Jankovic Rivera syndrome; Jervell and Lange-Nielsen syndrome 2; Joubert syndrome 1, 6, 7, 9/15 (digenic), 14, 16, and 17, and Orofaciodigital syndrome xiv; Junctional epidermolysis bullosa gravis of Herlitz; Juvenile GM1 gangliosidosis; Juvenile polyposis syndrome; Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome; Juvenile retinoschisis; Kabuki make-up syndrome; Kallmann syndrome 1, 2, and 6; Delayed puberty; Kanzaki disease; Karak syndrome; Kartagener syndrome; Kenny-Caffey syndrome type 2; Keppen-Lubinsky syndrome; Keratoconus 1; Keratosis follicularis; Keratosis palmoplantaris striata 1; Kindler syndrome; L-2-hydroxyglutaric aciduria; Larsen syndrome, dominant type; Lattice corneal dystrophy Type III; Leber amaurosis; Zellweger syndrome; Peroxisome biogenesis disorders; Zellweger syndrome spectrum; Leber congenital amaurosis 11, 12, 13, 16, 4, 7, and 9; Leber optic atrophy; Aminoglycoside-induced 
deafness; Deafness, nonsyndromic sensorineural, mitochondrial; Left ventricular noncompaction 5; Left-right axis malformations; Leigh disease; Mitochondrial short-chain Enoyl-CoA Hydratase 1 deficiency; Leigh syndrome due to mitochondrial complex I deficiency; Leiner disease; Leri Weill dyschondrosteosis; Lethal congenital contracture syndrome 6; Leukocyte adhesion deficiency type I and III; Leukodystrophy, Hypomyelinating, 11 and 6; Leukoencephalopathy with ataxia, with Brainstem and Spinal Cord Involvement and Lactate Elevation, with vanishing white matter, and progressive, with ovarian failure; Leukonychia totalis; Lewy body dementia; Lichtenstein-Knorr Syndrome; Li-Fraumeni syndrome 1; Lig4 syndrome; Limb-girdle muscular dystrophy, type 1B, 2A, 2B, 2D, C1, C5, C9, C14; Congenital muscular dystrophy-dystroglycanopathy with brain and eye anomalies, type A14 and B14; Lipase deficiency combined; Lipid proteinosis; Lipodystrophy, familial partial, type 2 and 3; Lissencephaly 1, 2 (X-linked), 3, 6 (with microcephaly), X-linked; Subcortical laminar heterotopia, X-linked; Liver failure acute infantile; Loeys-Dietz syndrome 1, 2, 3; Long QT syndrome 1, 2, 2/9, 2/5, (digenic), 3, 5 and 5, acquired, susceptibility to; Lung cancer; Lymphedema, hereditary, id; Lymphedema, primary, with myelodysplasia; Lymphoproliferative syndrome 1, 1 (X-linked), and 2; Lysosomal acid lipase deficiency; Macrocephaly, macrosomia, facial dysmorphism syndrome; Macular dystrophy, vitelliform, adult-onset; Malignant hyperthermia susceptibility type 1; Malignant lymphoma, non-Hodgkin; Malignant melanoma; Malignant tumor of prostate; Mandibuloacral dysostosis; Mandibuloacral dysplasia with type A or B lipodystrophy, atypical; Mandibulofacial dysostosis, Treacher Collins type, autosomal recessive; Mannose-binding protein deficiency; Maple syrup urine disease type 1A and type 3; Marden Walker like syndrome; Marfan syndrome; Marinesco-Sjögren syndrome; Martsolf syndrome; Maturity-onset 
diabetes of the young, type 1, type 2, type 11, type 3, and type 9; May-Hegglin anomaly; MYH9 related disorders; Sebastian syndrome; McCune-Albright syndrome; Somatotroph adenoma; Sex cord-stromal tumor; Cushing syndrome; McKusick Kaufman syndrome; McLeod neuroacanthocytosis syndrome; Meckel-Gruber syndrome; Medium-chain acyl-coenzyme A dehydrogenase deficiency; Medulloblastoma; Megalencephalic leukoencephalopathy with subcortical cysts 1 and 2a; Megalencephaly cutis marmorata telangiectatica congenita; PIK3CA Related Overgrowth Spectrum; Megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome 2; Megaloblastic anemia, thiamine-responsive, with diabetes mellitus and sensorineural deafness; Meier-Gorlin syndromes 1 and 4; Melnick-Needles syndrome; Meningioma; Mental retardation, X-linked, 3, 21, 30, and 72; Mental retardation and microcephaly with pontine and cerebellar hypoplasia; Mental retardation X-linked syndromic 5; Mental retardation, anterior maxillary protrusion, and strabismus; Mental retardation, autosomal dominant 12, 13, 15, 24, 3, 30, 4, 5, 6, and 9; Mental retardation, autosomal recessive 15, 44, 46, and 5; Mental retardation, stereotypic movements, epilepsy, and/or cerebral malformations; Mental retardation, syndromic, Claes-Jensen type, X-linked; Mental retardation, X-linked, nonspecific, syndromic, Hedera type, and syndromic, wu type; Merosin deficient congenital muscular dystrophy; Metachromatic leukodystrophy juvenile, late infantile, and adult types; Metachromatic leukodystrophy; Metatrophic dysplasia; Methemoglobinemia types I and 2; Methionine adenosyltransferase deficiency, autosomal dominant; Methylmalonic acidemia with homocystinuria; Methylmalonic aciduria cblB type; Methylmalonic aciduria due to methylmalonyl-CoA mutase deficiency; METHYLMALONIC ACIDURIA, mut(0) TYPE; Microcephalic osteodysplastic primordial dwarfism type 2; Microcephaly with or without chorioretinopathy, lymphedema, or mental retardation; Microcephaly, hiatal 
hernia and nephrotic syndrome; Microcephaly; Hypoplasia of the corpus callosum; Spastic paraplegia 50, autosomal recessive; Global developmental delay; CNS hypomyelination; Brain atrophy; Microcephaly, normal intelligence and immunodeficiency; Microcephaly-capillary malformation syndrome; Microcytic anemia; Microphthalmia syndromic 5, 7, and 9; Microphthalmia, isolated 3, 5, 6, 8, and with coloboma 6; Microspherophakia; Migraine, familial basilar; Miller syndrome; Minicore myopathy with external ophthalmoplegia; Myopathy, congenital with cores; Mitchell-Riley syndrome; mitochondrial 3-hydroxy-3-methylglutaryl-CoA synthase deficiency; Mitochondrial complex I, II, III, III (nuclear type 2, 4, or 8) deficiency; Mitochondrial DNA depletion syndrome 11, 12 (cardiomyopathic type), 2, 4B (MNGIE type), 8B (MNGIE type); Mitochondrial DNA-depletion syndrome 3 and 7, hepatocerebral types, and 13 (encephalomyopathic type); Mitochondrial phosphate carrier and pyruvate carrier deficiency; Mitochondrial trifunctional protein deficiency; Long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency; Miyoshi muscular dystrophy 1; Myopathy, distal, with anterior tibial onset; Mohr-Tranebjaerg syndrome; Molybdenum cofactor deficiency, complementation group A; Mowat-Wilson syndrome; Mucolipidosis III Gamma; Mucopolysaccharidosis type VI, type VI (severe), and type VII; Mucopolysaccharidosis, MPS-I-H/S, MPS-II, MPS-III-A, MPS-III-B, MPS-III-C, MPS-IV-A, MPS-IV-B; Retinitis Pigmentosa 73; Gangliosidosis GM1 type1 (with cardiac involvement) 3; Multicentric osteolysis nephropathy; Multicentric osteolysis, nodulosis and arthropathy; Multiple congenital anomalies; Atrial septal defect 2; Multiple congenital anomalies-hypotonia-seizures syndrome 3; Multiple Cutaneous and Mucosal Venous Malformations; Multiple endocrine neoplasia, types 1 and 4; Multiple epiphyseal dysplasia 5 or Dominant; Multiple gastrointestinal atresias; Multiple pterygium syndrome Escobar type; Multiple sulfatase deficiency; 
Multiple synostoses syndrome 3; Muscle AMP guanine oxidase deficiency; Muscle eye brain disease; Muscular dystrophy, congenital, megaconial type; Myasthenia, familial infantile, 1; Myasthenic Syndrome, Congenital, 11, associated with acetylcholine receptor deficiency; Myasthenic Syndrome, Congenital, 17, 2A (slow-channel), 4B (fast-channel), and without tubular aggregates; Myeloperoxidase deficiency; MYH-associated polyposis; Endometrial carcinoma; Myocardial infarction 1; Myoclonic dystonia; Myoclonic-Atonic Epilepsy; Myoclonus with epilepsy with ragged red fibers; Myofibrillar myopathy 1 and ZASP-related; Myoglobinuria, acute recurrent, autosomal recessive; Myoneural gastrointestinal encephalopathy syndrome; Cerebellar ataxia infantile with progressive external ophthalmoplegia; Mitochondrial DNA depletion syndrome 4B, MNGIE type; Myopathy, centronuclear, 1, congenital, with excess of muscle spindles, distal, 1, lactic acidosis, and sideroblastic anemia 1, mitochondrial progressive with congenital cataract, hearing loss, and developmental delay, and tubular aggregate, 2; Myopia 6; Myosclerosis, autosomal recessive; Myotonia congenital; Congenital myotonia, autosomal dominant and recessive forms; Nail-patella syndrome; Nance-Horan syndrome; Nanophthalmos 2; Navajo neurohepatopathy; Nemaline myopathy 3 and 9; Neonatal hypotonia; Intellectual disability; Seizures; Delayed speech and language development; Mental retardation, autosomal dominant 31; Neonatal intrahepatic cholestasis caused by citrin deficiency; Nephrogenic diabetes insipidus, Nephrogenic diabetes insipidus, X-linked; Nephrolithiasis/osteoporosis, hypophosphatemic, 2; Nephronophthisis 13, 15 and 4; Infertility; Cerebello-oculo-renal syndrome (nephronophthisis, oculomotor apraxia and cerebellar abnormalities); Nephrotic syndrome, type 3, type 5, with or without ocular abnormalities, type 7, and type 9; Nestor-Guillermo progeria syndrome; Neu-Laxova syndrome 1; Neurodegeneration with brain iron 
accumulation 4 and 6; Neuroferritinopathy; Neurofibromatosis, type 1 and type 2; Neurofibrosarcoma; Neurohypophyseal diabetes insipidus; Neuropathy, Hereditary Sensory, Type IC; Neutral 1 amino acid transport defect; Neutral lipid storage disease with myopathy; Neutrophil immunodeficiency syndrome; Nicolaides-Baraitser syndrome; Niemann-Pick disease type C1, C2, type A, and type C1, adult form; Non-ketotic hyperglycinemia; Noonan syndrome 1 and 4, LEOPARD syndrome 1; Noonan syndrome-like disorder with or without juvenile myelomonocytic leukemia; Normokalemic periodic paralysis, potassium-sensitive; Norum disease; Epilepsy, Hearing Loss, And Mental Retardation Syndrome; Mental Retardation, X-Linked 102 and syndromic 13; Obesity; Ocular albinism, type I; Oculocutaneous albinism type 1B, type 3, and type 4; Oculodentodigital dysplasia; Odontohypophosphatasia; Odontotrichomelic syndrome; Oguchi disease; Oligodontia-colorectal cancer syndrome; Opitz G/BBB syndrome; Optic atrophy 9; Oral-facial-digital syndrome; Ornithine aminotransferase deficiency; Orofacial cleft 11 and 7, Cleft lip/palate-ectodermal dysplasia syndrome; Orstavik Lindemann Solberg syndrome; Osteoarthritis with mild chondrodysplasia; Osteochondritis dissecans; Osteogenesis imperfecta type 12, type 5, type 7, type 8, type I, type III, with normal sclerae, dominant form, recessive perinatal lethal; Osteopathia striata with cranial sclerosis; Osteopetrosis autosomal dominant type 1 and 2, recessive 4, recessive 1, recessive 6; Osteoporosis with pseudoglioma; Oto-palato-digital syndrome, types I and II; Ovarian dysgenesis 1; Ovarioleukodystrophy; Pachyonychia congenita 4 and type 2; Paget disease of bone, familial; Pallister-Hall syndrome; Palmoplantar keratoderma, nonepidermolytic, focal or diffuse; Pancreatic agenesis and congenital heart disease; Papillon-Lefèvre syndrome; Paragangliomas 3; Paramyotonia congenita of von Eulenburg; Parathyroid carcinoma; Parkinson disease 14, 15, 19 
(juvenile-onset), 2, 20 (early-onset), 6, (autosomal recessive early-onset, and 9; Partial albinism; Partial hypoxanthine-guanine phosphoribosyltransferase deficiency; Patterned dystrophy of retinal pigment epithelium; PC-K6a; Pelizaeus-Merzbacher disease; Pendred syndrome; Peripheral demyelinating neuropathy, central dysmyelination; Hirschsprung disease; Permanent neonatal diabetes mellitus; Diabetes mellitus, permanent neonatal, with neurologic features; Neonatal insulin-dependent diabetes mellitus; Maturity-onset diabetes of the young, type 2; Peroxisome biogenesis disorder 14B, 2A, 4A, 5B, 6A, 7A, and 7B; Perrault syndrome 4; Perry syndrome; Persistent hyperinsulinemic hypoglycemia of infancy; familial hyperinsulinism; Phenotypes; Phenylketonuria; Pheochromocytoma; Hereditary Paraganglioma-Pheochromocytoma Syndromes; Paragangliomas 1; Carcinoid tumor of intestine; Cowden syndrome 3; Phosphoglycerate dehydrogenase deficiency; Phosphoglycerate kinase 1 deficiency; Photosensitive trichothiodystrophy; Phytanic acid storage disease; Pick disease; Pierson syndrome; Pigmentary retinal dystrophy; Pigmented nodular adrenocortical disease, primary, 1; Pilomatrixoma; Pitt-Hopkins syndrome; Pituitary dependent hypercortisolism; Pituitary hormone deficiency, combined 1, 2, 3, and 4; Plasminogen activator inhibitor type 1 deficiency; Plasminogen deficiency, type I; Platelet-type bleeding disorder 15 and 8; Poikiloderma, hereditary fibrosing, with tendon contractures, myopathy, and pulmonary fibrosis; Polycystic kidney disease 2, adult type, and infantile type; Polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy; Polyglucosan body myopathy 1 with or without immunodeficiency; Polymicrogyria, asymmetric, bilateral frontoparietal; Polyneuropathy, hearing loss, ataxia, retinitis pigmentosa, and cataract; Pontocerebellar hypoplasia type 4; Popliteal pterygium syndrome; Porencephaly 2; Porokeratosis 8, disseminated superficial actinic type; Porphobilinogen 
synthase deficiency; Porphyria cutanea tarda; Posterior column ataxia with retinitis pigmentosa; Posterior polar cataract type 2; Prader-Willi-like syndrome; Premature ovarian failure 4, 5, 7, and 9; Primary autosomal recessive microcephaly 10, 2, 3, and 5; Primary ciliary dyskinesia 24; Primary dilated cardiomyopathy; Left ventricular noncompaction 6; 4, Left ventricular noncompaction 10; Paroxysmal atrial fibrillation; Primary hyperoxaluria, type I, type II, and type III; Primary hypertrophic osteoarthropathy, autosomal recessive 2; Primary hypomagnesemia; Primary open angle glaucoma juvenile onset 1; Primary pulmonary hypertension; Primrose syndrome; Progressive familial heart block type 1B; Progressive familial intrahepatic cholestasis 2 and 3; Progressive intrahepatic cholestasis; Progressive myoclonus epilepsy with ataxia; Progressive pseudorheumatoid dysplasia; Progressive sclerosing poliodystrophy; Prolidase deficiency; Proline dehydrogenase deficiency; Schizophrenia 4; Properdin deficiency, X-linked; Propionic acidemia; Proprotein convertase 1/3 deficiency; Prostate cancer, hereditary, 2; Protan defect; Proteinuria; Finnish congenital nephrotic syndrome; Proteus syndrome; Breast adenocarcinoma; Pseudoachondroplastic spondyloepiphyseal dysplasia syndrome; Pseudohypoaldosteronism type 1 autosomal dominant and recessive and type 2; Pseudohypoparathyroidism type 1A, Pseudopseudohypoparathyroidism; Pseudoneonatal adrenoleukodystrophy; Pseudoprimary hyperaldosteronism; Pseudoxanthoma elasticum; Generalized arterial calcification of infancy 2; Pseudoxanthoma elasticum-like disorder with multiple coagulation factor deficiency; Psoriasis susceptibility 2; PTEN hamartoma tumor syndrome; Pulmonary arterial hypertension related to hereditary hemorrhagic telangiectasia; Pulmonary Fibrosis And/Or Bone Marrow Failure, Telomere-Related, 1 and 3; Pulmonary hypertension, primary, 1, with hereditary hemorrhagic telangiectasia; Purine-nucleoside phosphorylase deficiency; 
Pyruvate carboxylase deficiency; Pyruvate dehydrogenase E1-alpha deficiency; Pyruvate kinase deficiency of red cells; Raine syndrome; Rasopathy; Recessive dystrophic epidermolysis bullosa; Nail disorder, nonsyndromic congenital, 8; Reifenstein syndrome; Renal adysplasia; Renal carnitine transport defect; Renal coloboma syndrome; Renal dysplasia; Renal dysplasia, retinal pigmentary dystrophy, cerebellar ataxia and skeletal dysplasia; Renal tubular acidosis, distal, autosomal recessive, with late-onset sensorineural hearing loss, or with hemolytic anemia; Renal tubular acidosis, proximal, with ocular abnormalities and mental retardation; Retinal cone dystrophy 3B; Retinitis pigmentosa; Retinitis pigmentosa 10, 11, 12, 14, 15, 17, and 19; Retinitis pigmentosa 2, 20, 25, 35, 36, 38, 39, 4, 40, 43, 45, 48, 66, 7, 70, 72; Retinoblastoma; Rett disorder; Rhabdoid tumor predisposition syndrome 2; Rhegmatogenous retinal detachment, autosomal dominant; Rhizomelic chondrodysplasia punctata type 2 and type 3; Roberts-SC phocomelia syndrome; Robinow Sorauf syndrome; Robinow syndrome, autosomal recessive, autosomal recessive, with brachy-syn-polydactyly; Rothmund-Thomson syndrome; Rapadilino syndrome; RRM2B-related mitochondrial disease; Rubinstein-Taybi syndrome; Salla disease; Sandhoff disease, adult and infantile types; Sarcoidosis, early-onset; Blau syndrome; Schindler disease, type 1; Schizencephaly; Schizophrenia 15; Schneckenbecken dysplasia; Schwannomatosis 2; Schwartz Jampel syndrome type 1; Sclerocornea, autosomal recessive; Sclerosteosis; Secondary hypothyroidism; Segawa syndrome, autosomal recessive; Senior-Loken syndrome 4 and 5; Sensory ataxic neuropathy, dysarthria, and ophthalmoparesis; Sepiapterin reductase deficiency; SeSAME syndrome; Severe combined immunodeficiency due to ADA deficiency, with microcephaly, growth retardation, and sensitivity to ionizing radiation, atypical, autosomal recessive, T cell-negative, B cell-positive, NK cell-negative or NK-positive; 
Severe congenital neutropenia; Severe congenital neutropenia 3, autosomal recessive or dominant; Severe congenital neutropenia and 6, autosomal recessive; Severe myoclonic epilepsy in infancy; Generalized epilepsy with febrile seizures plus, types 1 and 2; Severe X-linked myotubular myopathy; Short QT syndrome 3; Short stature with nonspecific skeletal abnormalities; Short stature, auditory canal atresia, mandibular hypoplasia, skeletal abnormalities; Short stature, onychodysplasia, facial dysmorphism, and hypotrichosis; Primordial dwarfism; Short-rib thoracic dysplasia 11 or 3 with or without polydactyly; Sialidosis type I and II; Silver spastic paraplegia syndrome; Slowed nerve conduction velocity, autosomal dominant; Smith-Lemli-Opitz syndrome; Snyder Robinson syndrome; Somatotroph adenoma; Prolactinoma; familial, Pituitary adenoma predisposition; Sotos syndrome 1 or 2; Spastic ataxia 5, autosomal recessive, Charlevoix-Saguenay type, 1, 10, or 11, autosomal recessive; Amyotrophic lateral sclerosis type 5; Spastic paraplegia 15, 2, 3, 35, 39, 4, autosomal dominant, 55, autosomal recessive, and 5A; Bile acid synthesis defect, congenital, 3; Spermatogenic failure 11, 3, and 8; Spherocytosis types 4 and 5; Spheroid body myopathy; Spinal muscular atrophy, lower extremity predominant 2, autosomal dominant; Spinal muscular atrophy, type II; Spinocerebellar ataxia 14, 21, 35, 40, and 6; Spinocerebellar ataxia autosomal recessive 1 and 16; Splenic hypoplasia; Spondylocarpotarsal synostosis syndrome; Spondylocheirodysplasia, Ehlers-Danlos syndrome-like, with immune dysregulation, Aggrecan type, with congenital joint dislocations, short limb-hand type, Sedaghatian type, with cone-rod dystrophy, and Kozlowski type; Parastremmatic dwarfism; Stargardt disease 1; Cone-rod dystrophy 3; Stickler syndrome type 1; Kniest dysplasia; Stickler syndrome, types 1(nonsyndromic ocular) and 4; Sting-associated vasculopathy, infantile-onset; Stormorken syndrome; Sturge-Weber syndrome, 
Capillary malformations, congenital, 1; Succinyl-CoA acetoacetate transferase deficiency; Sucrase-isomaltase deficiency; Sudden infant death syndrome; Sulfite oxidase deficiency, isolated; Supravalvar aortic stenosis; Surfactant metabolism dysfunction, pulmonary, 2 and 3; Symphalangism, proximal, 1b; Syndactyly Cenani Lenz type; Syndactyly type 3; Syndromic X-linked mental retardation 16; Talipes equinovarus; Tangier disease; TARP syndrome; Tay-Sachs disease, B1 variant, Gm2-gangliosidosis (adult), Gm2-gangliosidosis (adult-onset); Temtamy syndrome; Tenorio Syndrome; Terminal osseous dysplasia; Testosterone 17-beta-dehydrogenase deficiency; Tetraamelia, autosomal recessive; Tetralogy of Fallot; Hypoplastic left heart syndrome 2; Truncus arteriosus; Malformation of the heart and great vessels; Ventricular septal defect 1; Thiel-Behnke corneal dystrophy; Thoracic aortic aneurysms and aortic dissections; Marfanoid habitus; Three M syndrome 2; Thrombocytopenia, platelet dysfunction, hemolysis, and imbalanced globin synthesis; Thrombocytopenia, X-linked; Thrombophilia, hereditary, due to protein C deficiency, autosomal dominant and recessive; Thyroid agenesis; Thyroid cancer, follicular; Thyroid hormone metabolism, abnormal; Thyroid hormone resistance, generalized, autosomal dominant; Thyrotoxic periodic paralysis and Thyrotoxic periodic paralysis 2; Thyrotropin-releasing hormone resistance, generalized; Timothy syndrome; TNF receptor-associated periodic fever syndrome (TRAPS); Tooth agenesis, selective, 3 and 4; Torsades de pointes; Townes-Brocks-branchiootorenal-like syndrome; Transient bullous dermolysis of the newborn; Treacher collins syndrome 1; Trichomegaly with mental retardation, dwarfism and pigmentary degeneration of retina; Trichorhinophalangeal dysplasia type I; Trichorhinophalangeal syndrome type 3; Trimethylaminuria; Tuberous sclerosis syndrome; Lymphangiomyomatosis; Tuberous sclerosis 1 and 2; Tyrosinase-negative oculocutaneous albinism; 
Tyrosinase-positive oculocutaneous albinism; Tyrosinemia type I; UDPglucose-4-epimerase deficiency; Ullrich congenital muscular dystrophy; Ulna and fibula absence of with severe limb deficiency; Upshaw-Schulman syndrome; Urocanate hydratase deficiency; Usher syndrome, types 1, 1B, 1D, 1G, 2A, 2C, and 2D; Retinitis pigmentosa 39; UV-sensitive syndrome; Van der Woude syndrome; Van Maldergem syndrome 2; Hennekam lymphangiectasia-lymphedema syndrome 2; Variegate porphyria; Ventriculomegaly with cystic kidney disease; Verheij syndrome; Very long chain acyl-CoA dehydrogenase deficiency; Vesicoureteral reflux 8; Visceral heterotaxy 5, autosomal; Visceral myopathy; Vitamin D-dependent rickets, types 1 and 2; Vitelliform dystrophy; von Willebrand disease type 2M and type 3; Waardenburg syndrome type 1, 4C, and 2E (with neurologic involvement); Klein-Waardenburg syndrome; Walker-Warburg congenital muscular dystrophy; Warburg micro syndrome 2 and 4; Warts, hypogammaglobulinemia, infections, and myelokathexis; Weaver syndrome; Weill-Marchesani syndrome 1 and 3; Weill-Marchesani-like syndrome; Weissenbacher-Zweymuller syndrome; Werdnig-Hoffmann disease; Charcot-Marie-Tooth disease; Werner syndrome; WFS1-Related Disorders; Wiedemann-Steiner syndrome; Wilson disease; Wolfram-like syndrome, autosomal dominant; Worth disease; Van Buchem disease type 2; Xeroderma pigmentosum, complementation group b, group D, group E, and group G; X-linked agammaglobulinemia; X-linked hereditary motor and sensory neuropathy; X-linked ichthyosis with steryl-sulfatase deficiency; X-linked periventricular heterotopia; Oto-palato-digital syndrome, type I; X-linked severe combined immunodeficiency; Zimmermann-Laband syndrome and Zimmermann-Laband syndrome 2; and Zonular pulverulent cataract 3.

The instant disclosure provides lists of genes comprising pathogenic G to T or C to A mutations. Such pathogenic G to T or C to A mutations may be corrected using the methods and compositions provided herein, for example by mutating the T back to a G, and/or the A back to a C, thereby restoring gene function. Table 2 includes exemplary mutations that can be corrected using genome editors described herein. Table 2 includes the gene symbol, the associated phenotype, the mutation to be corrected, and exemplary gRNA sequences which may be used to correct the mutations. The gRNA sequences provided in Table 2 are sequences that encode RNA that can direct Cas9, or any of the genome editors provided herein, to a target site. For example, the gRNA sequences provided in Table 2 may be cloned into a gRNA expression vector, such as pFYF, to encode a gRNA that targets Cas9, or any of the genome editors provided herein, to a target site in order to correct a disease-related mutation. It should be appreciated, however, that additional mutations may be corrected to treat additional diseases associated with a G to T or C to A mutation. Furthermore, additional gRNAs may be designed based on the disclosure and the knowledge in the art, as would be appreciated by the skilled artisan.
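
As a purely illustrative aid, the relationship between the two mutation classes named above can be sketched in a few lines of Python. The sketch assumes simple single-nucleobase substitutions and standard Watson-Crick pairing; the helper names `other_strand` and `correcting_edit` are hypothetical and are not part of the disclosure or of any genome editor described herein. It shows that a G to T substitution on one strand reads as a C to A substitution on the complementary strand, and that reverting either substitution is the inverse base change.

```python
# Illustrative sketch only; helper names are hypothetical assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def other_strand(ref: str, alt: str) -> tuple[str, str]:
    """Return the same substitution as read on the complementary strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def correcting_edit(ref: str, alt: str) -> tuple[str, str]:
    """Return the base change that reverts a ref-to-alt substitution."""
    return alt, ref


for ref, alt in [("G", "T"), ("C", "A")]:
    fix_from, fix_to = correcting_edit(ref, alt)
    comp_ref, comp_alt = other_strand(ref, alt)
    print(
        f"{ref} to {alt}: reverted by {fix_from} to {fix_to}; "
        f"complementary strand reads {comp_ref} to {comp_alt}"
    )
```

Under these assumptions, the two classes describe the same base-pair change viewed from opposite strands, which is why a single template carrying the original base pair can address either description.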

X. Pharmaceutical Compositions

Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the fusion proteins or complexes described herein. The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., agents for specific delivery, agents for increasing half-life, or other therapeutic compounds).

As used herein, the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. Terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein.

In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration.

In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a Silastic membrane, or a fiber.

In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Rev. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used. (See, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Langer and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61. See also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105.) Other controlled release systems are discussed, for example, in Langer, supra.

In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the pharmaceutical composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.

A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer's or Hank's solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.

The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE) and low levels (5-10 mol %) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Pat. Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.

The pharmaceutical composition described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent; i.e., carrier, or vehicle.

Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease described herein and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.

XI. Delivery Methods

In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells. In some embodiments, a genome editor as described herein in combination with (and optionally complexed with) a guide sequence is delivered to a cell. Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a genome editor to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology Doerfler and Bihm (eds) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).

Methods of non-viral delivery of nucleic acids include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).

The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems could include retroviral, lentiviral, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.

The tropism of a virus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700). In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin, et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).

Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817, incorporated herein by reference.

XII. Kits, Vectors, Cells

Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding an improved HDR-dependent genome editor described herein. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the improved HDR-dependent genome editors.

Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding an improved HDR-dependent genome editor capable of modifying a target DNA sequence in the presence of a donor DNA sequence via homology-directed repair. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the improved HDR-dependent genome editors.

Some aspects of this disclosure provide kits comprising a nucleic acid construct, comprising (a) a nucleotide sequence encoding a napDNAbp (e.g., a Cas9 domain) fused to a single-stranded DNA binding protein (e.g., Rad51), and (b) a heterologous promoter that drives expression of the sequence of (a). In some embodiments, the kit further comprises an expression construct encoding a guide nucleic acid backbone (e.g., a guide RNA backbone), wherein the construct comprises a cloning site positioned to allow the cloning of a nucleic acid sequence identical or complementary to a target sequence into the guide nucleic acid backbone (e.g., the guide RNA backbone).

Some aspects of this disclosure provide cells comprising any of the constructs disclosed herein. In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts, 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A 172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr−/−, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepalc1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MDCK 11, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof. Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used in assessing one or more test compounds.

EXAMPLES
Example 1. Construction of an Improved HDR-Dependent Genome Editor Comprising a Nickase Cas9 Domain Fused to a Single-Stranded DNA Binding Protein (e.g., Rad51) Results in Increased Rate of HDR and Decreased Rate of Indel Formation

The most common method to make precise changes to the genomic DNA of mammalian cells is HDR. A nuclease (e.g. Cas9) makes a double stranded DNA break (DSB) at the target site. A donor DNA encoding the desired DNA change and with homology arms which overlap with the genomic target site is incorporated at the target site to make the target change to genomic DNA. However, major problems exist with HDR, including (a) generating a DSB leads to a great excess of random indels relative to the desired change, (b) generating a DSB leads to translocations, and large deletions, and can lead to off-target DNA modifications, and (c) the absolute rate of precise HDR is low in unperturbed cells.

The inventors have surprisingly discovered through experimentation an improved genome editing construct which is capable of editing a target sequence in an HDR-dependent manner (i.e., “HDR-dependent genome editors”) with increased efficiency and reduced indel formation and which does not require a dividing cell. In particular, the inventors surprisingly discovered a new fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) (e.g., Cas9) with a nickase activity fused to a single-stranded DNA binding protein (e.g., Rad51) which edits a target DNA, in the presence of a donor sequence, in an HDR-dependent manner with greater efficiency (e.g., increased rate of induced HDR) and with a lower rate or occurrence of indel formation. The following data supports these findings.

FIG. 1 provides a schematic of traditional Cas9-mediated nucleobase editing by way of the homology-directed repair pathway which is triggered by double-strand breaks. Step 1 shows the cleavage of desired strand by Cas9 RNA guided nuclease. Step 2 shows the addition of a desired insert DNA sequence flanked by regions homologous to each side of cut-site. Step 3 shows the action of the endogenous homology-directed repair (HDR) mechanism, which uses homologous regions to rejoin cleaved DNA to result in the creation of the intended modified DNA.

FIG. 2 provides a schematic of an embodiment of the improved nuclease editor construct for homology-directed repair of a target nucleobase. The schematic depicts a generalized process 100 of editing a double-stranded target DNA 101 having an X′:X target nucleobase pair (e.g., a G:C nucleobase pair). The target DNA 101 also is depicted with a PAM sequence on one strand that is approximately 12-17 base pairs from the target base pair X′:X. In this embodiment, the fusion protein comprises a nucleic acid programmable DNA binding protein (napDNAbp) with a nickase activity (e.g., a Cas9 nickase domain) 102 that is translationally fused to a single-stranded DNA binding protein (e.g., Rad51) 108. The fusion protein is complexed with an sgRNA 105 that comprises a region that is complementary to and binds a region of the target DNA 101 comprising the target base pair X′:X within the ssDNA bubble formed by the napDNAbp nickase 102. The napDNAbp nickase 102 cleaves a single strand of the target DNA sequence at 104. The nicked DNA induces homology-directed repair (HDR) (107), and in the presence of a donor double stranded DNA 106 having a donor nucleobase pair (Y′:Y) (e.g., an A:T nucleobase pair), the X′:X target nucleobase pair (e.g., a G:C nucleobase pair) is replaced by the donor nucleobase pair (Y′:Y) (e.g., an A:T nucleobase pair). Thus, in this example, a G:C nucleobase pair is replaced with an A:T nucleobase pair. As demonstrated in the Examples, the single-stranded DNA binding protein (e.g., Rad51) improves the rate of homology-directed repair as compared to the rate of homology-directed repair

FIGS. 4A-4B demonstrate that Cas9 nickases generate a favorable HDR:indel ratio when a donor ssODN (single-stranded oligodeoxynucleotide) is supplied. FIG. 4A shows the rate of homology-directed repair (Y-axis) triggered by various constructs at a range of different target site loci (X-axis). DSB-induced editing (i.e., with the Cas9 construct) generates an excess of indels. The nickases (D10A and H840A) also trigger HDR but at a much lower rate. The control, dCas9, does not trigger HDR. FIG. 4B shows the rate of indel formation in HDR-oligo-treated cells. The graph shows that the rate of indel formation remains high with the Cas9 construct, but is relatively low to non-existent with the nickase constructs (D10A and H840A), similar to the dCas9 control. Thus, the absolute rate of HDR remains low with nickases, but the relative rate of HDR as compared to indel formation is higher with the nickases than when a double-stranded DNA break (Cas9) is used to stimulate HDR.

FIG. 5 demonstrates that fusion of hRad51 (human Rad51) to the D10A nickase increases the rate of HDR. To be especially useful, the absolute rate of HDR must be increased. N-terminal fusion of hRad51 to a nickase, or mutants thereof, increases the rate of nickase-induced HDR. The absolute rate of HDR with hRad51-D10A fusions generally exceeds the rate with a Cas9 DSB.

FIGS. 7A-7B demonstrate that hRad51, when fused to Cas9, does not have a significant effect on (FIG. 7A) the rate of HDR or (FIG. 7B) the rate of indel formation relative to Cas9 alone.

FIGS. 8A-8B demonstrate that hRad51, fused to the H840A nickase, has a negligible effect on (FIG. 8A) the rate of HDR and (FIG. 8B) the rate of indel formation.

FIG. 9 demonstrates that alternate single-strand DNA binding proteins (SSB) or proteins involved in HDR (e.g., ExoI or BCCIP) did not improve the rate of HDR.

CONCLUSIONS

These data indicate that the constructs that have been developed offer an improved procedure to perform genome editing through homology-directed repair (HDR). Thus far, one promising construct is a fusion between a mutant hRad51 (R310A) and a Cas9 nickase (D10A), noted as: hRad51[R310A]-Cas9[D10A nickase]. Other promising data is shown for the alternative hRad constructs—hRad51-Cas9[D10A nickase], hRad51[K133R]-Cas9[D10A nickase], hRad51[R235E]-Cas9[D10A nickase], hRad51[G151D]-Cas9[D10A nickase].

When delivered into human cells with a single stranded DNA donor template encoding for the desired genome edit, the new construct(s) offer three advantages over current double-stranded DNA break-inducing methods to perform HDR: (1) Improved product purity—a dramatic reduction in undesired indel formation, (2) Improved absolute rates of homology directed repair, and (3) The new constructs do not directly generate toxic double-stranded DNA breaks.

Example 2. Development of Genome Editing Constructs for Improved Homology-Directed Repair

The inventors have surprisingly discovered through experimentation an improved genome editing construct which is capable of editing a target sequence in an HDR-dependent manner (i.e., “HDR-mediated genome editors”) with increased efficiency and reduced indel formation. In particular, the inventors surprisingly discovered a new fusion protein comprising a nucleic acid programmable DNA binding protein (napDNAbp) (e.g., Cas9) with a nickase activity fused to a single-stranded DNA binding protein (e.g., Rad51) which edits a target DNA, in the presence of a donor sequence, in an HDR-mediated manner with greater efficiency (e.g., increased rate of induced HDR) and with a lower rate or occurrence of indel formation. The following data supports these findings.

FIG. 14 demonstrates curiosities of HEK site 2. The high efficiency of hRad51[K133R]-Cas9[D10A nickase] at this locus (where D10A is not effective) is particularly encouraging for pursuing this construct further.

FIG. 16 shows the results of the titration experiment. This oligo:plasmid titration experiment was conducted under optimized conditions to ascertain the sensitivity of the system to fluctuations in oligo and plasmid amount. There does not appear to be a great deal of difference across 100-200 ng of donor ssODN and 200-800 ng of total plasmid with 1.4 µL of L2000, indicating that the system is not exquisitely sensitive to fluctuations in plasmid/donor ssODN amount.

These data indicate that the constructs developed herein offer an improved procedure to perform genome editing through homology-directed repair (HDR). Thus far, the most promising construct is a fusion between a mutant hRad51 (K133R) and a Cas9 nickase (D10A), noted as: hRad51[K133R]-Cas9[D10A nickase]. Slightly less promising data is shown for the alternative nickase construct hRad51[K133R]-Cas9[H840A nickase], as well as for the same nickases fused to wild type hRad51. Further improvements to these constructs are currently being generated and tested.

When delivered into human cells with a single stranded DNA donor template encoding for the desired genome edit, the new construct(s) offer three advantages over current double-stranded DNA break-inducing methods to perform HDR: (1) Improved product purity—a dramatic reduction in undesired indel formation; (2) Improved absolute rates of homology directed repair; (3) The new constructs do not directly generate toxic double-stranded DNA breaks.

Example 3: Development of hRad51-Cas9 Nickase Fusions that Mediate HDR without Double-Stranded Breaks

In mammalian cells, double-stranded DNA breaks (DSBs) are preferentially repaired through end-joining processes that generally lead to mixtures of insertions and deletions (indels) or other rearrangements at the cleavage site. In the presence of homologous DNA, homology-directed repair (HDR) can generate specific mutations, albeit typically with modest efficiency and a low ratio of HDR products:indels. Here, hRad51 mutants fused to Cas9 (D10A) nickase (RDN) that mediate HDR, while minimizing indels, were developed. RDN was used to install disease-associated point mutations in HEK293T cells with comparable or better efficiency than Cas9 nuclease and a 2.7-to-53-fold higher ratio of desired HDR product:undesired byproducts. Across five different human cell types, RDN variants generally result in higher HDR:indel ratios and lower off-target activity than Cas9 nuclease, although HDR efficiencies remain strongly site- and cell type-dependent. RDN variants provide precision editing options in cell types amenable to HDR, especially when byproducts of DSBs must be minimized.

Widely used genome editing strategies include gene disruption by generating insertions and deletions (indels) at a targeted locus following a double-stranded DNA break (DSB)¹², homology-directed repair (HDR) following a targeted DSB¹³, and base editing, which enables the precise installation of transition point mutations (C to T, G to A, A to G, or T to C) without creating DSBs^14-16. Among these three strategies, HDR offers access to the broadest possible range of changes to genomic DNA in mammalian cells (FIG. 17A)¹⁷. The use of single-stranded DNA oligonucleotides containing PAM-blocking mutations as donor templates can improve HDR outcomes by preventing re-cutting of the target site after successful HDR (FIG. 17A)¹⁸. Nevertheless, because HDR is usually initiated by a DSB, HDR is accompanied by undesired cellular side-effects including high levels of indel formation^18-19, DNA translocations²⁰, large deletions²¹, and p53 activation^22-23.

An improvement of the ratios of desired:undesired HDR products was sought by exploring the initiation of HDR from a DNA nick rather than a DSB. In contrast to DSBs, DNA nicks generally do not induce undesired genome modification^24-26, a principle exploited by genome editors to minimize editing byproducts^{14, 16, 27}. Mutating catalytic residues in programmable nucleases can result in programmable nickases that cleave only one of the two strands of DNA at the target locus^28-31. Although single nicks can lead to more favorable HDR:indel ratios than double-stranded DNA breaks^{30, 32, 33}, nicks usually lead to much lower frequencies of genome editing when compared to DSBs (typically 5-20-fold)³⁴, making nickases substantially less useful than nucleases as genome editing tools^{28, 31, 33, 35}.

DSB-free HDR with minimal byproducts and reduced off-target editing was achieved by fusing hRad51 variants to a programmable nickase to generate hRad51-Cas9 (D10A) nickase fusions (RDN variants). hRad51 was selected due to its known involvement in the repair of nicked DNA^{28, 33}. RDN is capable of stimulating HDR at a DNA nick, resulting in a much higher ratio of HDR product:indel formation in human cells (up to 53-fold at the eight genomic loci tested here) and substantially lower off-target editing. A known mutant of hRad51 that cannot bind BRCA2^{36, 37} can be used in RDN to further increase the HDR:indel ratio. A second known hRad51 mutant that cannot self-associate^{36, 37} increases overall HDR efficiency while slightly lowering HDR:indel ratios. RDN-mediated HDR is a one-step procedure that does not require inclusion of PAM-blocking mutations¹⁸ and can use readily synthesized 100-mer single stranded DNA (ssDNA) oligonucleotides as donor templates. Although RDN remains limited by its dependence on cellular DNA repair processes underlying HDR, RDN may be useful for applications that require precise genome edits not accessible to base editing while minimizing undesired consequences of DSBs.

Results

Indels caused by single Cas9 nickases. Cas9 contains two independent nuclease domains, either of which can be disabled to generate a nickase that selectively cleaves either the guide RNA-paired strand (Cas9(D10A) nickase) or the opposite strand (Cas9(H840A) nickase) (FIG. 17B)³⁸. High-throughput DNA sequencing (HTS) was used to systematically compare the editing outcomes of Cas9, Cas9(D10A), or Cas9(H840A) nickases at eight genomic loci in three human cell lines.

While both nickases resulted in substantially fewer indels than intact Cas9, nick-induced indel formation was highly strand-dependent and locus-dependent (FIG. 17C). The Cas9(D10A) and Cas9(H840A) nickases displayed different relative activities when paired with different sgRNAs; for example, at HEK site 2 the Cas9(H840A) nickase generated 24±5% indels (± values represent standard deviations for three biological replicates) and the Cas9(D10A) nickase generated only 1.1±0.2% indels, while at HEK site 3 the Cas9(H840A) nickase resulted in only 0.73±0.38% indels but Cas9(D10A) nickase treatment generated 7.9±1.4% indels (FIG. 17C). One of the eight sgRNAs tested, targeted to the SERPA1 locus, did not lead to detectable indels when combined with either nickase despite robust indel formation when combined with Cas9 (FIG. 17C). A similar pattern of indel formation at nicked sites was observed in HeLa and U2OS cells (FIGS. 22A-22B), and with the ABEmax genome editor, which contains a Cas9(D10A) nickase, although other genome editors resulted in reduced indel frequencies compared to their component nickase domains alone (FIGS. 27A-27B). Observed indel frequencies did not correlate with the presence of microhomology as predicted using inDelphi³⁹ (FIG. 23B). These results suggest that the cellular response to single nick generation is site-dependent and not predictable from microhomology, though in general it leads to substantially lower indel formation than the cellular response to DSBs.
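The "mean ± standard deviation for three biological replicates" reporting convention used above can be sketched as follows. This is only an illustration: the replicate values are hypothetical placeholders rather than data from this disclosure, and the use of the sample (n−1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def summarize_replicates(percent_indels):
    """Return (mean, sample SD) for indel percentages from replicate samples.

    Assumption: the sample (n-1) standard deviation is intended; the source
    text only says "standard deviations for three biological replicates".
    """
    if len(percent_indels) < 2:
        raise ValueError("at least two replicates are needed for an SD")
    return mean(percent_indels), stdev(percent_indels)

# Hypothetical replicate values, chosen only to make the arithmetic obvious.
toy_replicates = [1.0, 2.0, 3.0]
m, s = summarize_replicates(toy_replicates)
summary = f"{m:.1f}±{s:.1f}%"
print(summary)
```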

It was hypothesized that the site dependence of nickase-induced indels could be explained if the induced nicks were converted to DSBs by a separate cellular process such as DNA replication. When a replication fork encounters a nick, it becomes a DSB⁴⁰. To test this possibility, two sgRNAs (211 and 210) were analyzed that target DNA either 28 bp upstream (sgRNA 210) or 18 bp downstream (sgRNA 211) of HEK site 2, a particularly asymmetric locus that results in high levels of Cas9(H840A) nickase-mediated indels but low levels of Cas9(D10A) nickase-induced indels. While Cas9(H840A) nickase and the HEK site 2 sgRNA resulted in high indel levels (24±5%), nicking the same strand slightly upstream or downstream of HEK site 2 resulted in 17-fold lower indel formation (FIG. 17D). These observations indicated that the high indel frequency generated by Cas9(H840A) nickase when paired with the HEK site 2 sgRNA was strongly dependent on the exact site being targeted. These data suggest that the cellular response to nicks is highly sgRNA-dependent. The high degree of sgRNA-dependence associated with nick-induced indels may explain previously conflicting reports of the relative inactivity of the H840A nickase in human cells^41-43.

HDR stimulated by single Cas9 nickases. The use of HDR for precision genome editing in mammalian cells is limited by low efficiency in many cell types (T cells being a notable exception⁴⁴), and by the excess of indels and other undesired cellular outcomes that result from DSB formation. Previous work with Cas9 nickases^{29, 31, 33, 35, 43}, homing endonucleases converted to nickases²⁸, and zinc finger nickases³⁰ demonstrates that nicks can induce low levels of HDR when combined with a donor DNA template.

To assess whether the observed variability among nick-induced indel formation also applied to nick-induced HDR, 100-mer single-stranded DNA oligonucleotide (ssODN) templates were designed for each of the eight genomic loci and were co-delivered with Cas9 nuclease, Cas9 nickases, and catalytically dead Cas9 (dCas9). For three loci (HBB, SERPA1, and LDLR), the ssODN encoded a single human pathogenic SNP located in the protospacer. For the remaining five loci, the donor templates were designed to incorporate an SNP within the protospacer as well as a PAM-altering SNP, as described in the CORRECT method for HDR donor template design¹⁸. A plasmid encoding Cas9, Cas9 nickase, or dead Cas9, a plasmid expressing the indicated sgRNA, and the corresponding ssODN donor template were lipofected into HEK293T cells. Four days post-lipofection, genomic DNA was purified and analyzed by high throughput sequencing (HTS). CRISPResso2^{45, 46} was used to filter out reads containing indels from the alignment prior to assessing HDR efficiency to ensure that reads containing both indels and HDR did not contribute to tabulated HDR efficiencies. DNA strands that contained both indels and HDR events were counted as indels during tabulations.
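The counting rule described above (a read carrying both an indel and the HDR edit is tabulated as an indel, never as HDR) can be illustrated with a minimal sketch. This is not the CRISPResso2 interface; the function names, the boolean read representation, and the toy input are all hypothetical and serve only to show the precedence of indels over HDR during tabulation.

```python
from collections import Counter

def classify_read(has_indel, has_hdr_edit):
    """Classify one aligned read; an indel takes precedence over an HDR edit."""
    if has_indel:
        return "indel"      # includes reads that also carry the HDR edit
    if has_hdr_edit:
        return "hdr"
    return "unedited"

def tabulate(reads):
    """reads: iterable of (has_indel, has_hdr_edit) pairs.

    Returns the percentage of all reads falling in each class.
    """
    counts = Counter(classify_read(indel, hdr) for indel, hdr in reads)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to tabulate")
    return {k: 100.0 * counts[k] / total for k in ("hdr", "indel", "unedited")}

# Hypothetical toy input: 2 clean HDR reads, 1 read with both indel and HDR,
# 1 indel-only read, and 6 unedited reads.
toy_reads = (
    [(False, True)] * 2
    + [(True, True)]
    + [(True, False)]
    + [(False, False)] * 6
)
result = tabulate(toy_reads)
print(result)
```

In this toy input the read carrying both events raises the indel percentage and leaves the HDR percentage unchanged, which is the conservative behavior the text describes.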

At seven of eight sites, HDR was detected with one or both Cas9 nickases (FIG. 17E). Linear regression analysis identified a weak positive correlation (R²=0.57, p=0.031 for the Cas9(D10A) nickase; R²=0.51, p=0.045 for the Cas9(H840A) nickase) between indel formation and HDR frequencies with nickases, but no significant correlation with Cas9 nuclease (R²=0.08, p=0.475) (FIG. 23A). Although the absolute frequencies of HDR were 2.0-fold to 2.5-fold higher with Cas9 nuclease than with either Cas9 nickase (average across eight sites of 10% HDR product for Cas9, 5.0% for Cas9(H840A), and 4.0% for Cas9(D10A)), the HDR:indel ratio was 9.1-fold to 9.6-fold higher when using a nickase than when using Cas9 nuclease (the average HDR:indel ratio was 0.23 for Cas9, 2.1 for Cas9(H840A), and 2.2 for Cas9(D10A)) (FIG. 17F). Importantly, HDR was not detected above a frequency of 0.2% when dCas9 was paired with the same sgRNAs and donor templates (FIG. 17E), indicating that the observed HDR frequencies were strongly dependent on Cas9 nicking and were not artifacts of the donor template acting as a primer during the PCR reaction prior to HTS, a source of artificially high apparent HDR frequencies (FIGS. 28A-28B). To ensure that the donor templates did not participate in the PCR reactions, a size-selective DNA purification step was used (FIGS. 28A-28B). This example establishes that nick-induced HDR results in improved HDR:indel ratios compared to DSB-mediated HDR. However, the unpredictable nature of whether a nickase will be able to mediate HDR at a particular locus, as well as generally low efficiency, limits the utility of simple nickase-mediated HDR.
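The fold differences quoted above follow directly from the reported averages. The short Python sketch below reproduces that arithmetic; the dictionary and function names are illustrative choices, and the numbers are the eight-site averages stated in the preceding paragraph.

```python
# Eight-site averages reported in the text above.
mean_hdr_percent = {"Cas9": 10.0, "Cas9(H840A)": 5.0, "Cas9(D10A)": 4.0}
mean_hdr_indel_ratio = {"Cas9": 0.23, "Cas9(H840A)": 2.1, "Cas9(D10A)": 2.2}

def fold_change(numerator: float, denominator: float) -> float:
    """Simple ratio of two averages."""
    return numerator / denominator

nickases = ("Cas9(H840A)", "Cas9(D10A)")

# Absolute HDR frequency: Cas9 nuclease relative to each nickase.
hdr_fold = {
    n: fold_change(mean_hdr_percent["Cas9"], mean_hdr_percent[n]) for n in nickases
}

# HDR:indel ratio: each nickase relative to Cas9 nuclease.
ratio_fold = {
    n: fold_change(mean_hdr_indel_ratio[n], mean_hdr_indel_ratio["Cas9"])
    for n in nickases
}

for n in nickases:
    print(f"{n}: HDR {hdr_fold[n]:.1f}-fold lower, HDR:indel {ratio_fold[n]:.1f}-fold higher")
```

The two views trade off against each other: the nuclease gives more HDR product in absolute terms, while the nickases give more HDR product per indel.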

Modulating HDR by manipulating cellular repair proteins. To address these limitations, a better understanding of the cellular proteins involved in catalyzing nick-induced HDR was sought. To date, several studies have manipulated cellular DNA repair processes to favor HDR over NHEJ^{28, 33, 35, 47-49}. Previous efforts have identified key cellular DNA repair modulators that can be inhibited (such as p53 binding protein 1 (53BP1))^{47, 48} or over-expressed (such as Rad52⁴⁷) to improve HDR:indel ratios in response to a targeted DSB. Knockdown of cellular hRad51, or inhibition of hRad51 by overexpression of the dominant negative mutant hRad51(K133R), increases both indel and HDR frequencies at targeted nicks^{28, 33, 35}. Guided by these observations, DNA repair modulators were manipulated and the resulting effects on DSB-induced and nick-induced HDR were studied.

Human hRad51 or hRad51(K133R) was overexpressed in conjunction with Cas9 or the Cas9(D10A) nickase (FIG. 18A). Overexpression of hRad51 led to a significant (p<0.05, two-tailed t-test; see Table 4) decrease in HDR frequency at two of eight tested loci for Cas9(D10A) nick-mediated HDR (FIG. 18B) and at five of eight loci for DSB-mediated HDR (FIG. 18D). Conversely, overexpression of hRad51(K133R), which inhibits cellular hRad51 activity, led to an increase in the efficiency of nick-induced HDR, but not DSB-induced HDR (FIGS. 18B, 18D). Finally, HDR:indel ratios remained largely unchanged by overexpression of hRad51 or hRad51(K133R). Together, these data demonstrate that hRad51 inhibition increased both HDR and indel frequencies at nick sites, but not at Cas9-induced DSBs (FIGS. 18C, 18E). Intriguingly, over-expression of hRad51(K133R) led to low but detectable levels of HDR at the previously refractory SERPA1 site (FIG. 18B).

To test the potential effect of p53 binding protein 1 (53BP1) on nick-induced HDR, i53, a protein inhibitor of 53BP1⁴⁸, was overexpressed. 53BP1 directs DSBs towards NHEJ-mediated repair by preventing end resection, a key event on the HDR pathway⁵⁰. Overexpression of i53 with Cas9 led to a significant (defined as p<0.05, two-tailed t-test) increase in the absolute frequency of HDR at four of eight tested loci and an improvement in the HDR:indel ratio at six of eight loci compared to Cas9 alone (FIGS. 18D, 18E, Table 4, and Table 5). However, no such HDR improvements from i53 overexpression were observed when Cas9(D10A) nickase was used instead of Cas9 (FIGS. 18B-18C), indicating that 53BP1 is unlikely to be a key modulator of nick-mediated HDR. Overexpression of Rad52, an interaction partner of hRad51, did not increase the efficiency of HDR arising from nicks or DSBs, but significantly improved the HDR:indel ratio at four of eight loci when HDR was stimulated by a nick (FIGS. 18B-18E). Together, these findings suggest that global inhibition of cellular hRad51, but not inhibition of 53BP1 or elevating Rad52 levels, can increase the frequency of HDR in response to a DNA nick.

Development of Cas9(D10A) nickase fusions that promote HDR. Based on the above findings, fusion constructs between the Cas9(D10A) nickase or the Cas9(H840A) nickase and hRad51(K133R) were generated to mediate local inhibition of hRad51 at the target site. It was anticipated that such fusion constructs might be more effective and less perturbative than global inhibition of hRad51, which causes chromosomal instability⁵¹, and that the hRad51 fusion partner would serve to modulate repair of the DNA nick. Under normal circumstances, cellular hRad51 binds to exposed genomic ssDNA after end-resection at the nick, leading to perfect, non-mutagenic repair of the nick^{24, 33}. This non-mutagenic repair process was inhibited by the dominant negative hRad51(K133R) mutant, which forms mixed filaments with wild-type hRad51 that can perform a DNA homology search, but cannot hydrolyze ATP to initiate DNA strand invasion⁵², even when low levels of the mutant protein are present⁵³.

To begin, the transfection parameters were optimized by performing a titration of plasmid and donor template quantities, and by measuring HDR and indel efficiencies at two loci with both the Cas9(D10A) nickase and the hRad51(K133R)-Cas9(D10A) fusion (FIGS. 24A-24C). Surprisingly, a small quantity of ssODN (50 ng) was sufficient for efficient HDR, and increasing the ssODN amount to 400 ng reduced HDR efficiency. Fusion of hRad51(K133R) to the N-terminus of the Cas9(D10A) nickase increased HDR efficiency in HEK293T cells by an average of 2.4-fold without altering the favorable HDR:indel ratio observed with the Cas9(D10A) nickase alone (FIG. 19B). This fusion construct, hRad51(K133R)-Cas9(D10A) nickase, is referred to as RDN(K133R). Compared to RDN(K133R), moving the position of hRad51(K133R) to the C-terminus of the Cas9(D10A) nickase did not significantly alter HDR frequencies (FIG. 19B), nor did fusing an additional monomer of hRad51(K133R) to the N-terminus of Cas9(D10A) (FIG. 19B). Fusion of one hRad51(K133R) monomer to the N-terminus and one to the C-terminus, however, reduced both HDR and indel formation, possibly due to the association of multiple fusion proteins into an extended multimer (FIG. 19B). Consistent with the data showing that inhibition or overexpression of hRad51 does not have a substantial effect on DSB-mediated HDR, fusion between Cas9 and hRad51(K133R) led to a slight reduction in average HDR frequency at the loci tested (FIG. 19B; FIGS. 25A-25B). Fusion between hRad51(K133R) and the Cas9(H840A) nickase also did not improve HDR frequency or HDR:indel ratios (FIGS. 26E-26F). The nickase strand preference of HDR enhancement upon hRad51(K133R) fusion may have arisen from the position of the nick introduced by Cas9(H840A) in the R-loop of displaced genomic DNA, compared with the position of the nick from Cas9(D10A) in the DNA:RNA duplex (FIG. 17B).

Surprisingly, fusion of wild-type hRad51 to Cas9(D10A), hereafter referred to as RDN, also resulted in increased HDR efficiency (FIG. 19C), even though overexpression of hRad51 in trans with the Cas9(D10A) nickase led to slightly decreased HDR efficiency (FIG. 18B). These results suggest that increased HDR frequency mediated by RDN results from a mechanism distinct from global inhibition of hRad51. Together, these data demonstrate that localizing hRad51 to a targeted DNA nick through the RDN fusion increases nick-mediated HDR efficiency without inhibition of strand invasion mediated by cellular hRad51.

Next, it was asked whether the HDR frequency enhancement associated with RDN and RDN(K133R) arose from simple steric occlusion of DNA repair proteins from accessing the nick, or whether the affinity of hRad51 for single-stranded DNA led to localization of the single-stranded DNA donor to the nick. To distinguish between these possibilities, fusions were created between the Cas9(D10A) nickase and RecA or bacteriophage T4-derived single-stranded binding protein (SSB). RecA is a bacterial homolog of hRad51 that catalyzes strand invasion between homologous strands of DNA. Neither RecA-Cas9(D10A) nor SSB-Cas9(D10A) resulted in HDR enhancement (FIG. 19C). Furthermore, the constructs RDN(R310A), RDN(R235E), and RDN(G151D), generated by incorporating three additional hRad51 mutants (R310A, R235E, and G151D) into RDN, all displayed HDR enhancement indistinguishable from that of RDN and RDN(K133R) (FIG. 19C, and FIGS. 26I and 26J), in spite of their differing catalytic and DNA-binding characteristics (FIG. 19A)^{54-56}. Taken together, these observations revealed that neither the fusion orientation of hRad51 relative to Cas9(D10A) nor the strand invasion and strand exchange activities of hRad51 are critical for the ability of RDN to mediate HDR.

Donor template optimization. When possible, including a PAM-altering mutation together with the target mutation in a donor template is an effective approach to improve HDR efficiency^{18, 57} by preventing re-cutting and subsequent modification of the desired HDR product. HDR efficiencies are highly dependent on the distance between the DNA cleavage site and the mutation that is being incorporated^{18, 57}. The above experiments used donor templates that contain PAM-blocking mutations at five of the eight loci tested (sgRNA 1, sgRNA 2, HEK site 2, HEK site 3, and HEK site 4), and donor templates that lacked PAM-blocking mutations at the remaining three sites (LDLR, HBB, and SERPA1) because no silent PAM-blocking mutation was available in addition to the target point mutation. Since indels are generated much less efficiently with nick-induced HDR than with DSB-induced HDR (FIG. 17E), it was tested whether PAM-blocking mutations are necessary for nick-induced HDR, and the region between the PAM and the target mutation that can support efficient HDR was defined.

A series of eight ssODN templates was designed targeting the HEK site 3 locus, each containing an SNP located in a different position within the protospacer from position 7 to 25, counting the PAM as positions 21-23. Two sets of donor templates were used. The first set of ssODNs incorporated a PAM mutation (replacing the TGG PAM with TTT) alongside the target mutation, while the second set only encoded each target mutation. As expected, an increase in the frequency of Cas9-mediated HDR was observed when the PAM-blocking template was used compared to the non-PAM-blocking template (FIG. 20A). By contrast, incorporating a PAM mutation into the donor ssODN did not lead to increased HDR frequency for nick-induced HDR, mediated either by Cas9(D10A) or RDN(K133R), as long as the target mutation was located within the sgRNA protospacer sequence (FIG. 20A).
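The position numbering used above (protospacer at positions 1-20, PAM at positions 21-23) can be made concrete with a short Python sketch. The HEK site 3 target sequence and the position-16 edit are taken from Table 1; the function names and the 1-based indexing helper are illustrative assumptions and are not part of the described method.

```python
# HEK site 3 sgRNA + PAM sequence as listed in Table 1 (SEQ ID NO: 55).
HEK3_TARGET = "GGCCCAGACTGAGCACGTGATGG"

def pam_of(target: str) -> str:
    """Return positions 21-23 (1-based), i.e. the PAM."""
    return target[20:23]

def edited_target(target: str, position: int, new_base: str, new_pam: str = None) -> str:
    """Install a single-base change at a 1-based position and optionally replace the PAM."""
    bases = list(target)
    bases[position - 1] = new_base
    if new_pam is not None:
        bases[20:23] = list(new_pam)
    return "".join(bases)

# SNP at position 16 (C to A) without a PAM mutation.
snp16_no_pam = edited_target(HEK3_TARGET, 16, "A")

# The same SNP combined with the TGG-to-TTT PAM-blocking mutation.
snp16_with_pam = edited_target(HEK3_TARGET, 16, "A", new_pam="TTT")
```

The first result reproduces the HEK site 3 sequence listed in Table 1 for the extended cell line experiments, which is one way to check that the numbering convention has been applied correctly.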

The frequency of HDR at HEK site 3 was previously measured using a donor template with a PAM-blocking mutation (replacing the TGG PAM with TCC, Table 1 and Table 2) using Cas9 (FIG. 17E), Cas9(D10A) (FIG. 17E), or RDN(K133R) (FIG. 26C). The HDR frequencies from Cas9 and RDN(K133R) were very similar when these different oligonucleotides were used. For example, Cas9 yielded 4.7±0.5% HDR with a TTT-blocking mutation, and 5.7±0.9% with a TCC-blocking mutation. However, the mean value for Cas9(D10A) increased from 2.6±1.0% with the TCC PAM blocking mutation to 7.9±3.2% with the TTT PAM blocking mutation, an unexpected result that suggested some ssODN dependence for Cas9(D10A) mediated HDR.

Unlike DSB-induced HDR, in which HDR efficiency steeply declines as the distance between the DSB and the incorporated mutation increases^{18, 57} (FIG. 20A), comparable HDR efficiencies were observed when RDN(K133R) was paired with different donor templates that introduced mutations from position 7 to 18 in the protospacer (FIG. 20A). This greater apparent independence of HDR efficiency from the location of the mutation to be installed relative to the protospacer suggests that RDN may offer more flexibility with regard to guide RNA choice than Cas9 nuclease-mediated HDR.

Donor template oligonucleotides that were oriented in the same sense as the sgRNA (forward template, which was used for all other experiments in this example) and in the opposite sense (reverse template) were also tested. No significant differences were observed (two-tailed t-test) in the resulting HDR efficiencies mediated by Cas9(D10A), Cas9, Cas9(H840A), or RDN(K133R) (FIG. 25), indicating that ssODN orientation is not a substantial determinant of HDR efficiencies under the conditions tested.
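The relationship between the forward and reverse templates can be checked computationally: each reverse ssODN in Table 2 should be the reverse complement of its forward counterpart. The Python sketch below does this for the "PAM mutation only" pair (SEQ ID NO: 100 and SEQ ID NO: 113); the helper name is an illustrative choice, and only this one pair is shown.

```python
# Forward "PAM mutation only" ssODN from Table 2 (SEQ ID NO: 100).
FORWARD_PAM_ONLY = (
    "GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGATTTCAGAGGAAAGGAAGCCCTGCTTCC"
    "TCCAGAGGGCGTCGCAGGAC"
)

# Reverse "PAM mutation only" ssODN from Table 2 (SEQ ID NO: 113).
REVERSE_PAM_ONLY = (
    "GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGAAATCACGTGCTCAGTCTGGGCCCCAAGGATTGACC"
    "CAGGCCAGGGCTGGAGAAGC"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

is_reverse_complement = reverse_complement(FORWARD_PAM_ONLY) == REVERSE_PAM_ONLY
```

The same check could be applied to the other forward and reverse pairs listed in Table 2.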

RDN with additional hRad51 mutants. Although the development of RDN as a tool to mediate HDR led to consistently improved HDR:indel ratios, the overall frequency of RDN-mediated HDR is similar to that of Cas9-mediated HDR (FIG. 19C). In an attempt to improve overall HDR efficiency further while maintaining favorable HDR:indel ratios, four additional mutants of hRad51 in RDN constructs were assessed.

In addition to their role in catalyzing DNA strand invasion, hRad51 monomers directly bind to BRCA2^{58-60}, or to other hRad51 monomers^{36, 61}. Mutants of hRad51 that have lost either or both of these capabilities have been engineered^{36, 37} (FIG. 19A). These mutations were installed into the RDN context, and the HDR and indel outcomes of the resulting constructs were assayed to assess whether these binding interactions influence editing outcomes (FIGS. 19D-19F). The results revealed that using hRad51 mutants incapable of self-association, but which maintain BRCA2 binding, increased HDR efficiency in HEK293T cells at the eight tested sites to an average of 14% (F86E mutant, RDN(F86E)) or 15% (A89E mutant, RDN(A89E)), compared to 10% for RDN. Both of these mutants were associated with a modest reduction in HDR:indel ratio, from an average of 1.9 for RDN to 0.93 for RDN(F86E) or 0.98 for RDN(A89E).

In contrast, removing the BRCA2-binding ability of hRad51 using the double mutant (RDN(S208E A209D)) only slightly improved HDR efficiency relative to RDN (to an average of 12%), but substantially improved the HDR:indel ratio (to 3.3), suggesting that abolishing recruitment of BRCA2 to the nick promotes more favorable HDR:indel partitioning. It should be noted that even with these improvements, the efficiency of nick-induced HDR remains more sgRNA-dependent than the efficiency of DSB-induced HDR. For example, pairing original or mutant RDN constructs with sgRNA SERPA1 leads to modest (<3%) HDR frequencies compared with Cas9 (11.1±0.6%).

A final hRad51 A190L A192L double mutant was tested that lacked both BRCA2-binding and hRad51 self-association ability. RDN(A190L A192L) mediated HDR with an average efficiency of 14% and an HDR:indel ratio of 1.6, offering intermediate levels of HDR efficiency and HDR:indel ratio compared to the above RDN variants.

These analyses inform potential mechanisms by which RDN can mediate efficient HDR with favorable HDR:indel ratios. The data are consistent with a model in which self-association of hRad51 is important to maintain a high HDR:indel ratio but also limits HDR efficiency by promoting perfect repair of the DNA nick. In contrast, recruitment of BRCA2 to the nick site reduces the rate of perfect repair of the nick (FIG. 19G). For applications that benefit most from maintaining the highest possible HDR efficiency, RDN(A89E) is the most useful, whereas applications that require maximizing the HDR:indel ratio will benefit from use of the RDN(S208E A209D) variant.

Off-target modification by RDN variants. Cas9 nuclease⁶² and Cas9-derived proteins such as genome editors^{14, 16, 63} can induce off-target editing in an sgRNA-dependent fashion. Off-target editing was characterized at known off-target sites associated with three well-studied sgRNAs⁶²: HEK site 2, HEK site 3, and HEK site 4, which is a notoriously promiscuous sgRNA^{64, 65}. Although the homology required between the target genomic locus and the ssODN prevents significant off-target HDR products from being generated by Cas9 combined with an ssODN, indel formation from Cas9 nuclease activity at off-target sites under these conditions is common. Off-target indel formation was measurable (>0.1%) with Cas9 treatment at all tested Cas9 off-target sites, and off-target indel formation ranged in efficiency from 0.12 to 98% (FIG. 20B). In contrast, Cas9(D10A) nickase and RDN edited only two of the 12 off-target loci (>0.1% indel formation) (FIG. 20B). The more efficient RDN(A89E) edited four of 12 off-target sites at a frequency >0.1%, all of which are associated with the promiscuous HEK site 4 guide RNA. These results indicate that RDN-mediated HDR offers substantially lower off-target DNA modification than nuclease-based HDR, and that this trend even applies to RDN(A89E), which typically results in higher on-target HDR frequencies than Cas9.

HDR in other human cell types. HEK293 and HEK293T cells are known to be particularly amenable to ssODN-mediated HDR⁶⁶. Indeed, some other commonly used immortalized cell lines including HeLa and U2OS are thought to be completely refractory to ssODN-mediated HDR⁶⁶. RDN- and Cas9-mediated HDR outcomes were compared in other immortalized cell lines and in primary human cells, including HeLa cells, U2OS cells, human induced pluripotent stem (hiPS) cells and K562 cells.

In HEK293T cells, RDN(A89E) offered the highest HDR frequency (FIG. 19E) and RDN(S208E A209D) offered the highest HDR:indel ratio (FIG. 19F) of all the constructs tested, so these two constructs were tested in the wider range of cell types. For this comparison, oligonucleotides designed without PAM mutations were used to maximize the generality of the results and because nick-mediated HDR was found not to benefit from PAM-blocking mutations (FIG. 20A). Unless otherwise specified, results are reported from unsorted cells as percentages of the entire cell population, not as percentages of edited or modified cells, which would greatly increase apparent editing efficiencies.

RDN (containing wild-type hRad51) led to substantially reduced HDR frequencies when compared to Cas9 in all non-HEK293T cell types tested. For example, in K562 cells the average reduction in efficiency was from a mean of 16% with Cas9 to 3.8% with RDN (FIG. 21A). The mean HDR:indel ratio, however, was improved 87-fold in K562 cells and 3-fold in HeLa cells (FIGS. 21B and 21D). RDN(S208E A209D) demonstrated slightly improved HDR:indel ratios when compared to RDN, but the overall efficiency of HDR remained low compared to that achieved by Cas9 (FIGS. 21A-21J).

When RDN(A89E) was used, however, the average HDR efficiency was substantially improved, with mean HDR frequencies of 8.3% in K562, 1.3% in U2OS, and 1.8% in HeLa cells (FIGS. 21A, 21C, 21E). HDR efficiencies in these three non-HEK293T cell types were on average 2.1-fold lower than those following Cas9 treatment. RDN(A89E) was associated with a 15-fold improvement in HDR:indel ratio in K562 cells and a 7-fold improvement in HeLa cells compared to Cas9 treatment. This improvement was not observed in U2OS cells, which exhibited a slight reduction in HDR:indel ratio when RDN(A89E) was used (FIG. 21F). In hiPS cells, only one of the three tested loci was amenable to RDN(A89E)-mediated HDR, demonstrating that this approach may be more site-dependent in hiPS cells than in immortalized cell lines. To test if this limitation was due to poor expression of RDN(A89E) in hiPS cells, Cas9 and RDN(A89E) constructs tagged with P2A-GFP were generated to enable isolation of Cas9-expressing or RDN(A89E)-expressing cells. With Cas9-P2A-GFP, isolating GFP-positive cells resulted in 1.8% average HDR efficiencies in hiPS cells with an average HDR:indel ratio of 0.03 (FIGS. 21I and 21J). Among GFP-positive cells expressing RDN(A89E)-P2A-GFP, average HDR efficiencies were 1.0%, with an average HDR:indel ratio of 46, reflecting a modest decrease in HDR efficiency but a >1000-fold improvement in HDR:indel ratio (FIGS. 21I and 21J, FIG. 29). Among GFP-positive cells isolated with the RDN(A89E)-P2A-GFP construct, average indel frequency was 1.6% and the vast majority showed no target site modification. This observation suggests that the majority of nicks induced by the RDN(A89E) construct are perfectly repaired in hiPS cells; in contrast, GFP-positive cells containing Cas9-P2A-GFP contained an average of 77% indels.

These data together reveal that RDN(A89E) mediates more efficient HDR than Cas9 nuclease in HeLa and HEK293T cells, maintains similar levels of HDR efficiency in K562 cells, and offers improved HDR:indel ratios in HeLa, HEK293T, K562, and hiPS cells. Neither an efficiency nor a product purity advantage from any tested RDN variant was observed in U2OS cells, possibly as a result of unusual regulation of DNA repair in U2OS cells^{66, 67}. This variability is likely due to the reliance of RDN on cellular repair processes that are highly cell type-dependent.

Indel formation and base editing arising from commonly used genome editors at the genomic loci shown in FIGS. 17A-17F. The Cas9 D10A nickase is a component of many DNA genome editors, which are generally associated with low or undetectable indel rates^{71, 72}. Indel formation induced by D10A nickase was compared to that associated with the recently reported expression-optimized genome editors (ABEmax and BE4max)⁷³ and their predecessors (BE4⁷⁵ and ABE7.10⁷¹) (FIG. 27A). Genome editors BE4max, ABE7.10, and BE4 are associated with lower indel rates than the D10A nickase alone (average indel generation across 8 loci was 3.7±2.8% for the D10A nickase, 1.2±0.5% for BE4max, 1.2±0.7% for BE4, and 0.37±0.2% for ABE7.10). ABEmax generated indel levels similar to those of the D10A nickase alone, an average of 3.7±3.1%.

The basis for the elevated indel rates from optimizing ABE7.10 expression, which were not observed upon optimizing BE4 expression, is unclear, but may be attributed to increased levels of D10A nickase domain expression in ABEmax compared to ABE7.10, BE4max, or BE4⁷³. These findings confirm that genome editors generally induce lower indel rates than D10A nickase alone, and the elevated indel rates associated with ABEmax can be avoided by using ABE7.10⁷¹.

TABLE 1

Single guide RNA (sgRNA) sequences and HDR products.

sgRNA name | sgRNA + PAM sequence | Change made using ssODN template in HEK293T experiments | Change made using ssODN template in extended cell line experiments | Protein change induced by HDR

sgRNA 1 | GAGCAAAGAGAATAGACTGTAGG (SEQ ID NO: 47) | GAGCAAAGAGAATAGACTCTA custom-character (SEQ ID NO: 48) | |

sgRNA 2 | GGATTGACCCAGGCCAGGGCTGG (SEQ ID NO: 54) | GGATTGACCCAGGCGAGGGCTG custom-character (SEQ ID NO: 49) | |

HEK2 | GAACACAAAGCATAGACTGCCGG (SEQ ID NO: 81) | GAACACAAAGCAGAGACTGCC custom-character (SEQ ID NO: 82) | |

HEK3 | GGCCCAGACTGAGCACGTGATGG (SEQ ID NO: 55) | GGCCCAGACTGTGCACGTGAT custom-character (SEQ ID NO: 83) | GGCCCAGACTGAGCAAGTGATGG (SEQ ID NO: 84) |

HEK4 | GGCACTGCGGCTGGAGGTGGGGG (SEQ ID NO: 68) | GGCACTGCGGCTGGAGTTGGG custom-character (SEQ ID NO: 85) | |

HBB | GTAACGGCAGACTTCTCCTCAGG (SEQ ID NO: 52) | GTAACGGCAGACTTCTCCACAGG (SEQ ID NO: 86) | GTAACGGCAGACTTCTCCACAGG (SEQ ID NO: 86) | HBB: Glu6Val

SERPA1 | [G]TGCTGACCATCGACGAGAAAGGG (SEQ ID NO: 87) | TGCTGACCATCGACAAGAAAGGG (SEQ ID NO: 88) | | SERPINA1: Glu366Lys

LDLR | [G]CAGAGCACTGGAATTCGTCAGGG (SEQ ID NO: 89) | CAGAGCACTGGAATTAGTCAGGG (SEQ ID NO: 90) | CAGAGCACTGGAATTAGTCAGGG (SEQ ID NO: 90) | LDLR: Glu240STOP

PAM sequences are in italics. Nucleotides mutated through HDR are bold. For sgRNAs SERPA1 and LDLR, a 5′ G was included in the sgRNA expression cassette to enable efficient expression of the sgRNA from the U6 promoter. This 5′ G is indicated as [G] in the sgRNA sequence column. For sgRNAs 1, 2, HEK2, HEK3 and HEK4, a PAM mutation was incorporated into the ssODN template as well as the SNP indicated in the sgRNA protospacer sequence. HDR was quantified by the proportion of cells undergoing HDR that resulted in incorporation of the SNP in the protospacer, not in the PAM. The PAM mutation was not incorporated into the ssODN template used for extended cell line experiments at HEK site 3. Protein coding changes that would result from successful HDR have been listed in the final column.
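The 5′ G convention noted above can be sketched in a few lines of Python. The protospacers are the first 20 nucleotides of the sgRNA + PAM sequences in Table 1; the rule "prepend a G only when the protospacer does not already begin with G" is an assumption inferred from which rows carry the [G] annotation, and the function name is illustrative.

```python
# Protospacers (positions 1-20 of the sgRNA + PAM sequences in Table 1).
PROTOSPACERS = {
    "HBB": "GTAACGGCAGACTTCTCCTC",
    "SERPA1": "TGCTGACCATCGACGAGAAA",
    "LDLR": "CAGAGCACTGGAATTCGTCA",
}

def u6_expressed_spacer(protospacer: str) -> str:
    """Prepend a 5' G when the protospacer does not already start with G.

    Assumption for this sketch: a leading G is added only when absent,
    matching the [G] annotations on the SERPA1 and LDLR rows of Table 1.
    """
    if protospacer.startswith("G"):
        return protospacer
    return "G" + protospacer

expressed = {name: u6_expressed_spacer(seq) for name, seq in PROTOSPACERS.items()}
```

Under this assumption, HBB is left unchanged while SERPA1 and LDLR each gain one extra 5′ nucleotide that is not part of the genomic target.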

TABLE 2

Donor template sequences used for HDR.

8 core sites: used in FIGS. 17A-17F, 18A-18E, 19A-19G, 23A-23B, 24A-24D, 26A-26J, and 28A-28B

sgRNA associated with oligo | ssODN Oligo Sequence

sgRNA 1 | ATTTTAAGCTGTAGTATTATGAAGGGAAATCTGGAGCAAAGAGAATAGACTCTACAGAAACCAGTTAAGAAATAGGACATGGAGGCTAGGTGCAGTGGCT (SEQ ID NO: 91)

sgRNA 2 | TTTCCTCTGCCATCACGTGCTCAGTCTGGGCCCCAAGGATTGACCCAGGACAGGGCTCGAGAAGCAGAAAAAAAGCATCAAGCCTACAAATGCATGCTTA (SEQ ID NO: 92)

HEK2 | TTTTCCAGCCCGCTGGCCCTGTAAAGGAAACTGGAACACAAAGCAGAGACTGCGACGCGGGCCAGCCTGAATAGCTGCAAACAAGTGCAGAATATCTGAT (SEQ ID NO: 93)

HEK3 | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCAAGTGATTTCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 94)

HEK4 | GGATGACAGGCAGGGGCACCGCGGCGCCCCGGTGGCACTGCGGCTGGAGTTGGGAATTAAAGCGGAGACTCTGGTGCTGTGTGACTACAGTGGGGGCCCT (SEQ ID NO: 95)

HBB | ACTTCATCCACGTTCACCTTGCCCCACAGGGCAGTAACGGCAGACTTCTCCACAGGAGTCAGATGCACCATGGTGTCTGTTTGAGGTTGCTAGTGAACAC (SEQ ID NO: 96)

SERPA1 | CATGGGTATGGCCTCTAAAAACATGGCCCCAGCAGCTTCAGTCCCTTTCTTGTCGATGGTCAGCACAGCCTTATGCACGGCCTGGAGGGGAGAGAAGCAG (SEQ ID NO: 97)

LDLR | ATCAACACACTCTGTCCTGTTTTCCAGCTGTGGCCACCTGTCGCCCTGACTAATTCCAGTGCTCTGATGGAAACTGCATCCATGGCAGCCGGCAGTGTGA (SEQ ID NO: 98)

FIGS. 21A-21J

sgRNA associated with oligo | ssODN Oligo Sequence

HBB | ACTTCATCCACGTTCACCTTGCCCCACAGGGCAGTAACGGCAGACTTCTCCACAGGAGTCAGATGCACCATGGTGTCTGTTTGAGGTTGCTAGTGAACAC (SEQ ID NO: 96)

LDLR | ATCAACACACTCTGTCCTGTTTTCCAGCTGTGGCCACCTGTCGCCCTGACTAATTCCAGTGCTCTGATGGAAACTGCATCCATGGCAGCCGGCAGTGTGA (SEQ ID NO: 98)

HEK3 | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCAAGTGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 99)

FIG. 20A and FIG. 25

Mutation in oligo | ssODN Oligo Sequence

ssODNs in the “forward” sense - used in FIG. 20A and FIG. 25

PAM mutation only | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGATTTCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 100)

SNP at 20 + PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGTTTTCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 101)

SNP at 18 + PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGGGATTTCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 102)

SNP at 16 + PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCAAGTGATTTCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 94)

SNP at 12 + PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGTGCACGTGATTTCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 103)

SNP at 7 + PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCACACTGAGCACGTGATTTCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 104)

SNP at 25 + PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGATTTCGGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 105)

SNP at 28 + PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGATTTCAGATGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 106)

SNP at 20 no PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGTTGGCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 99)

SNP at 18 no PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGGGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 107)

SNP at 16 no PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCAAGTGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 108)

SNP at 12 no PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGTGCACGTGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 109)

SNP at 7 no PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCACACTGAGCACGTGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 110)

SNP at 25 no PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGATGGCGGAGGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 111)

SNP at 28 no PAM mutation | GCTTCTCCAGCCCTGGCCTGGGTCAATCCTTGGGGCCCAGACTGAGCACGTGATGGCAGATGAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGAC (SEQ ID NO: 112)

reverse ssODNs used in FIG. 25

PAM mutation
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGAAATCACGTGCTCAGTCTGGGCCCCAAGGATTGACC

only
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 113)

SNP at 20 +
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGAAAACACGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 114)

SNP at 18 +
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGAAATCCCGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 115)

SNP at 16 +
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGAAATCACTTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 116)

SNP at 12 +
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGAAATCACGTGCACAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 117)

SNP at 7 + PAM
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGAAATCACGTGCTCAGTGTGGGCCCCAAGGATTGACC

mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 118)

SNP at 25 +
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCCGAAATCACGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 119)

SNP at 28 +
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCATCTGAAATCACGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 120)

SNP at 20 no
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGCCAACACGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 121)

SNP at 18 no
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGCCATCCCGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 122)

SNP at 16 no
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGCCATCACTTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 123)

SNP at 12 no
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGCCATCACGTGCACAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 124)

SNP at 7 no
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCTGCCATCACGTGCTCAGTGTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 125)

SNP at 25 no
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCCTCCGCCATCACGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 126)

SNP at 28 no
GTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCATCTGCCATCACGTGCTCAGTCTGGGCCCCAAGGATTGACC

PAM mutation
CAGGCCAGGGCTGGAGAAGC (SEQ ID NO: 127)

TABLE 3

DNA primers used for amplification of genomic DNA prior to HTS.

Primers for amplification of genomic DNA

LDLR forward | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGCCCTGCTTCTTTTTCTCTGGT (SEQ ID NO: 128)

LDLR reverse | TGGAGTTCAGACGTGTGCTCTTCCGATCTACCATTAACGCAGCCAACTTCA (SEQ ID NO: 129)

HBB forward | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGTCTTCTCTGTCTCCACATGCC (SEQ ID NO: 130)

HBB reverse | TGGAGTTCAGACGTGTGCTCTTCCGATCTTAGGGTTGGCCAATCTACTCCC (SEQ ID NO: 131)

HEK site 3 and sgRNA 2 forward | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGGAAACGCCCATGCAATTAGTC (SEQ ID NO: 132)

HEK site 3 and sgRNA 2 reverse | TGGAGTTCAGACGTGTGCTCTTCCGATCTCTTGTCAACCAGTATCCCGGTG (SEQ ID NO: 133)

HEK site 2 forward | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNTGAATGGATTCCTTGGAAACAATG (SEQ ID NO: 134)

HEK site 2 reverse | TGGAGTTCAGACGTGTGCTCTTCCGATCTCCAGCCCCATCTGTCAAACT (SEQ ID NO: 135)

HEK site 4 forward | TGGAGTTCAGACGTGTGCTCTTCCGATCTTCCTTTCAACCCGAACGGAG (SEQ ID NO: 136)

HEK site 4 reverse | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGCTGGTCTTCTTTCCCCTCC (SEQ ID NO: 137)

sgRNA 1 forward | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGAGTTACTGCTCAGACATGTAA (SEQ ID NO: 138)

sgRNA 1 reverse | TGGAGTTCAGACGTGTGCTCTTCCGATCTGACCTCGTGATCCACCTGCC (SEQ ID NO: 139)

SERPA1 forward | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNTTTGTTGAACTTGACCTCGGGG (SEQ ID NO: 140)

SERPA1 reverse | TGGAGTTCAGACGTGTGCTCTTCCGATCTCATCAGCCAAAGCCTTGAGGAG (SEQ ID NO: 141)

TABLE 4

P-values for comparisons between conditions for absolute HDR frequencies in HEK293T cells.

| | sgRNA1 | sgRNA2 | HEK Site 2 | HEK Site 3 | HEK Site 4 | HBB | SERPA1 | LDLR |
|---|---|---|---|---|---|---|---|---|
| Relevant to FIGs. 18A-18E | | | | | | | | |
| Cas9 vs. i53 | 0.027 | 0.002 | 0.494 | 0.083 | 0.812 | 0.017 | 0.394 | 0.020 |
| Cas9 vs. hRad51 | 0.000 | 0.002 | 0.055 | 0.022 | 0.038 | 0.051 | 0.000 | 0.254 |
| Cas9 vs. hRadK133R | 0.022 | 0.951 | 0.007 | 0.206 | 0.050 | 0.855 | 0.031 | 0.343 |
| Cas9 vs. hRad52 | 0.251 | 0.029 | 0.017 | 0.278 | 0.292 | 0.041 | 0.516 | 0.549 |
| D10A vs. hRad52 | 0.946 | 0.892 | 0.534 | 0.643 | 0.034 | 0.818 | 0.026 | 0.276 |
| D10A vs. hRad-K133R | 0.142 | 0.004 | 0.003 | 0.100 | 0.020 | 0.003 | 0.011 | 0.052 |
| D10A vs. i53 | 0.028 | 0.274 | 0.136 | 0.204 | 0.683 | 0.004 | 0.572 | 0.843 |
| D10A vs. hRad51 | 0.013 | 0.146 | 0.084 | 0.111 | 0.110 | 0.001 | 0.042 | 0.906 |
| Relevant to FIGs. 19A-19G | | | | | | | | |
| Cas9 vs. hRad51(K133R)-Cas9 | 0.004 | 0.806 | 0.008 | 0.486 | 0.553 | 0.024 | 0.005 | 0.001 |
| D10A vs. hRad51(K133R)-D10A | 0.242 | 0.007 | 0.009 | 0.039 | 0.021 | 0.014 | 0.003 | 0.028 |
| H840A vs. hRad51(K133R)-H840A | 0.002 | 0.057 | 0.025 | 0.859 | 0.052 | 0.008 | 0.374 | 0.374 |
| hRad51(K133R)-D10A vs. hRad51-D10A | 0.836 | 0.443 | 0.282 | 0.929 | 0.734 | 0.862 | 0.251 | 0.787 |
| hRad51(F86E)-D10A vs. hRad51-D10A | 0.001 | 0.287 | 0.019 | 0.514 | 0.007 | 0.044 | 0.015 | 0.001 |
| hRad51(A89E)-D10A vs. hRad51-D10A | 0.000 | 0.066 | 0.008 | 0.322 | 0.035 | 0.049 | 0.015 | 0.001 |
| hRad51(A190L, A192L)-D10A vs. hRad51-D10A | 0.000 | 0.115 | 0.060 | 0.198 | 0.042 | 0.327 | 0.442 | 0.006 |
| hRad51(SA208, 209ED)-D10A vs. hRad51-D10A | 0.001 | 0.847 | 0.331 | 0.414 | 0.119 | 0.426 | 0.000 | 0.022 |

A two-tailed, two-sample equal variance t test was performed in Excel. Values have been formatted according to p-value: underlined = p>0.05, italics = 0.01<p<0.05, and bold = p<0.01.
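For readers who want to reproduce this kind of comparison outside a spreadsheet, the following is a minimal sketch of a two-tailed, two-sample equal variance (pooled) t test using only the Python standard library. It is an illustration under stated assumptions, not the procedure used to generate the tables: the function names and the replicate values are hypothetical, and the p-value is obtained by numerically integrating the Student t density rather than by whatever routine Excel uses internally.

```python
import math

def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for a two-sample equal-variance t test."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)  # sum of squared deviations, group a
    ss_b = sum((x - mean_b) ** 2 for x in b)  # sum of squared deviations, group b
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    # Note: identical values in every replicate give zero variance and a
    # ZeroDivisionError here; this sketch does not handle that case.
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean_a - mean_b) / se, df

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson integration of the density on [0, |t|].

    steps must be even; accuracy degrades for very large |t|.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * central)

# Hypothetical HDR percentages from three replicates per condition.
nuclease_hdr = [1.0, 2.0, 3.0]
fusion_hdr = [2.0, 3.0, 4.0]
t, df = pooled_t_statistic(nuclease_hdr, fusion_hdr)
p_value = two_tailed_p(t, df)
```

With three replicates per condition the test has four degrees of freedom, so modest differences between means often fail to reach p<0.05, which is worth keeping in mind when reading the tables.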

TABLE 5

P-values for comparisons between conditions for HDR:indel ratios in HEK293T cells.

| | sgRNA1 | sgRNA2 | HEK Site 2 | HEK Site 3 | HEK Site 4 | HBB | SERPA1 |
|---|---|---|---|---|---|---|---|
| Relevant to FIGs. 18A-18E | | | | | | | |
| Cas9 vs. i53 | 0.01 | 0.03 | 0.09 | 0.12 | 0.64 | 0.00 | 0.07 |
| Cas9 vs. hRad51 | 0.00 | 0.00 | 0.01 | 0.05 | 0.11 | 0.18 | 0.01 |
| Cas9 vs. hRadK133R | 0.03 | 0.11 | 0.81 | 0.02 | 0.27 | 0.08 | 0.02 |
| Cas9 vs. hRad52 | 0.24 | 0.12 | 0.05 | 0.75 | 0.30 | 0.59 | 0.06 |
| D10A vs. hRad52 | 0.00 | 0.14 | 0.17 | 0.04 | 0.04 | 0.02 | #Div0 |
| D10A vs. hRad-K133R | 0.54 | 0.38 | 0.03 | 0.58 | 0.81 | 0.28 | #Div0 |
| D10A vs. i53 | 0.21 | 0.33 | 0.20 | 0.57 | 0.30 | 0.20 | #Div0 |
| D10A vs. hRad51 | 0.01 | 0.12 | 0.02 | 0.07 | 0.12 | 0.18 | #Div0 |
| Relevant to FIGs. 19A-19G | | | | | | | |
| Cas9 vs. hRad51(K133R)-Cas9 | 0.00 | 0.22 | 0.05 | 0.20 | 0.74 | 0.03 | 0.11 |
| D10A vs. hRad51(K133R)-D10A | 0.53 | 0.85 | 0.01 | 0.83 | 0.42 | 0.13 | #Div0 |
| H840A vs. hRad51(K133R)-H840A | 0.02 | 0.11 | 0.70 | #Div0 | 0.37 | #Div0 | #Div0 |
| hRad51(K133R)-D10A vs. hRad51-D10A | 0.64 | 0.74 | 0.06 | 0.19 | 0.90 | 0.14 | #Div0 |
| hRad51(F86E)-D10A vs. hRad51-D10A | 0.33 | 0.25 | 0.02 | 0.01 | 0.16 | 0.14 | 0.10 |
| hRad51(A89E)-D10A vs. hRad51-D10A | 0.23 | 0.20 | 0.01 | 0.01 | 0.07 | 0.33 | 0.13 |
| hRad51(A190L, A192L)-D10A vs. hRad51-D10A | 0.23 | 0.58 | 0.80 | 0.26 | 0.37 | 0.05 | 0.24 |
| hRad51(SA208, 209ED)-D10A vs. hRad51-D10A | 0.13 | 0.05 | 0.00 | 0.07 | 0.99 | 0.00 | 0.07 |

A two-tailed, two-sample equal variance t test was performed in Excel. Values have been formatted according to p-value: underlined = p>0.05, italics = 0.01<p<0.05, and bold = p<0.01. A #Div0 error occurred when the HDR:indel ratio could not be calculated; ratios were not calculated for HDR values lower than 1%, to avoid inflating HDR:indel ratios for low values of HDR.

DISCUSSION

The method developed herein enables precise and specific changes to be made to genomic DNA through homology-directed repair without generating a double-stranded DNA break. Use of the fusion construct hRad51-Cas9(D10A) (RDN), or of variants of this construct in which hRad51 has been replaced by hRad51 mutants, can address some of the challenges associated with using HDR to make precise changes to genomic DNA in certain human cell types.

The HDR:indel ratio generated by RDN is generally improved compared to that which can be achieved using a DSB. This improvement in the purity of editing outcomes is particularly important for genome editing applications in which gene knockout resulting from indel formation opposes desired biological outcomes, or in which mixtures of many different edited genotypes (the typical cellular response to DSBs) are undesired. The RDN(S208E A209D) construct is particularly useful under such circumstances since it offers ~3.2-fold more HDR product than indels (FIG. 19D). In addition, the efficiency of HDR mediated by RDN and RDN(A89E) is higher than that of Cas9 in some (but not all) cell types (FIGS. 19E and 21A), although HDR efficiency remains modest, likely limited by dependence on cellular DNA repair processes. RDN and its variants also offer substantially higher DNA specificity (lower off-target indel formation) compared to Cas9 nucleases combined with the same sgRNAs, even when applied to a notoriously promiscuous guide RNA with many known off-target loci (FIG. 20B). RDN with wild-type hRad51 offers the greatest degree of DNA specificity among the mutants tested, but this difference was only notable at the promiscuous HEK Site 4, as we were not able to detect off-target editing at frequencies above 0.2% at any other tested loci following use of RDN, RDN(A89E) or RDN(S208E A209D) (FIG. 20B). Finally, since RDN variants cannot directly generate DSBs, it is anticipated that the likelihood of inducing translocations, large deletions, or p53 activation will be greatly reduced compared to nuclease-based genome editing methods. Additional studies are needed to fully characterize the scope of cellular responses to targeted nicks compared to targeted DSBs.

It is anticipated that RDN(A89E) or RDN(S208E A209D) will be useful for applications in which efficiency or cleanliness of genome editing is critical. Recent work in which saturation genome editing was performed to investigate variants of unknown significance in BRCA1⁶⁸ highlights the utility of a tool with the ability to generate mutations with single-nucleotide resolution. Nuclease-mediated approaches to saturation editing can only be performed on essential genes because of the requirement that cells in which indels are induced must be excluded from the analysis. The favorable HDR:indel ratio and HDR efficiency offered by RDN may permit mutagenesis with nucleotide-level resolution on non-essential genes.

Methods

Plasmid cloning. All mammalian cell expression plasmids were constructed by USER cloning from gBlock gene fragments (Integrated DNA Technologies) with USER junctions sized between 14 and 20 nucleotides⁶⁹. Phusion U Green Multiplex PCR Master Mix (ThermoFisher) was used for amplification of DNA. sgRNA plasmids were constructed by blunt-end ligation of a linear PCR product generated by encoding the 20-nt variable protospacer sequence onto the 5′ end of an amplification primer and treating the resulting product with KLD Enzyme Mix (New England Biolabs) according to the manufacturer's instructions. Mach1 chemically competent E. coli cells (ThermoFisher) were used.

Preparation of plasmids for mammalian cell transfection. To obtain endotoxin-free plasmids for transfection, 45 mL of Mach1 cells expressing freshly-transformed plasmid were pelleted by centrifugation (6000×g, 10 minutes, 4° C.) and purified using ZymoPURE II Plasmid Midi Prep Kits (Zymo Research), according to the manufacturer's instructions with the inclusion of the optional step of passing the plasmid across the EndoZero Spin column (Zymo Research). Plasmid yield was quantified using a Nanodrop and by electrophoresis on a 1% agarose Tris/Borate/EDTA gel supplemented with ethidium bromide.

Mammalian cell culture. All cells were cultured and maintained at 37° C. with 5% CO₂. Antibiotics were not used for cell culture. HEK293T cells (ATCC CRL-3216) and HeLa cells (ATCC CCL-2) were cultured in Dulbecco's modified Eagle's medium (DMEM) plus GlutaMax (ThermoFisher) supplemented with 10% (v/v) fetal bovine serum (FBS). K562 cells (ATCC CCL-243) were cultured in Roswell Park Memorial Institute (RPMI) 1640 Medium plus GlutaMax (ThermoFisher) supplemented with 10% (v/v) FBS. U2OS cells were cultured in McCoy's 5A Medium plus GlutaMax (ThermoFisher) supplemented with 10% (v/v) FBS.

hiPS cells (human episomal iPS cell line; A18945; ThermoFisher) were cultured in Essential 8 Flex Medium (ThermoFisher), supplemented with RevitaCell (ThermoFisher) after passaging, according to the manufacturer's directions. Versene (ThermoFisher) was used for cell passaging and dissociation. Prior to nucleofection, cells were harvested with Accutase (ThermoFisher).

For data shown in FIGS. 21I and 21J, nuclease expression plasmids were constructed whereby the Cas-enzyme construct (Cas9 or RDN(A89E)) was proceeded by P2A-GFP to enable isolation of transfected cells. iPS cells were flow sorted at the MIT FACS core 3-5 days after nucleofection and genomic DNA was isolated directly after sorting.

Mammalian cell lipofection and genomic DNA isolation. HEK293T cells were seeded on 48-well poly-D-lysine coated plates (Corning) 16-20 hours before lipofection. Lipofection was performed at a cell density of 65%. Unless otherwise stated, cells were transfected with 231 ng of nuclease-editor or base-editor expression plasmid DNA, 69 ng of sgRNA expression plasmid DNA, 50 ng (1.51 pmol) of 100-nt ssODN (PAGE-purified; Integrated DNA Technologies) and 1.4 μL of Lipofectamine 2000 (ThermoFisher) per well. For experiments in which global inhibition or over-expression of a cellular HDR component was performed, 100 ng of the appropriate plasmid was included. Cells were harvested 4 days post-transfection, and genomic DNA isolation and purification were performed with the Agencourt DNAdvance Kit (Beckman Coulter), according to the manufacturer's protocol. Size-selective DNA purification was necessary to prevent contamination of gDNA with donor ssODN HDR templates. For analysis of indel formation in FIGS. 22A-22B, HeLa and U2OS cells were transfected according to the above protocol, except that they were transfected at a density of 80% with 1.4 μL of Lipofectamine 3000 and 1 μL of P3000 (ThermoFisher) per well.

Nucleofection of mammalian cells. For data generated in FIGS. 21A-21J, nucleofection of K562, HeLa and U2OS cells was performed. For these three cell types, 350 ng of nuclease-expression plasmid, 150 ng of sgRNA-expression plasmid and 200 pmol (6.6 μg) of 100-nt ssODN (PAGE-purified; Integrated DNA Technologies) were nucleofected in a final volume of 20 μL per sample in a 16-well Nucleocuvette strip (Lonza). K562 cells were nucleofected using the SF Cell Line 4D-Nucleofector X Kit (Lonza) with 5×10⁵ cells per sample (program FF-120), according to the manufacturer's protocol. U2OS cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit (Lonza) with 3-4×10⁵ cells per sample (program DN-100), according to the manufacturer's protocol. HeLa cells were nucleofected using the SE Cell Line 4D-Nucleofector X Kit (Lonza) with 2×10⁵ cells per sample (program CN-114), according to the manufacturer's protocol. Cells were harvested 48 hours after nucleofection; genomic DNA was purified using the Agencourt DNAdvance Kit (Beckman Coulter), according to the manufacturer's protocol.

hiPS cells were nucleofected with 400 ng of nuclease-expression plasmid, 400 ng of sgRNA-expression plasmid and 200 pmol (6.6 μg) of 100-nt ssODN (PAGE-purified; Integrated DNA Technologies) in a final volume of 20 μL per sample in a 16-well Nucleocuvette strip (Lonza) using the CB-150 program in the P3 Primary Cell 4D-Nucleofector X Kit (Lonza) with 0.75-1.5×10⁶ cells per sample.
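The mass and molar amounts quoted for the 100-nt ssODN (50 ng as 1.51 pmol for lipofection, 200 pmol as 6.6 μg for nucleofection) can be cross-checked with a short calculation. The sketch below assumes an average mass of about 330 g/mol per nucleotide of single-stranded DNA; that figure is a common rule of thumb and is not stated in the source, and the function names are hypothetical. The exact mass of a given oligonucleotide depends on its sequence and modifications.

```python
AVG_NT_MASS = 330.0  # g/mol per ssDNA nucleotide; rule-of-thumb assumption, not from the source

def ssdna_ng_to_pmol(mass_ng, length_nt, nt_mass=AVG_NT_MASS):
    """Convert a mass of single-stranded DNA (ng) to an amount (pmol)."""
    # ng divided by g/mol gives nmol; multiply by 1000 to obtain pmol.
    return mass_ng / (length_nt * nt_mass) * 1000

def ssdna_pmol_to_ug(amount_pmol, length_nt, nt_mass=AVG_NT_MASS):
    """Convert an amount of single-stranded DNA (pmol) to a mass (micrograms)."""
    # pmol multiplied by g/mol gives pg; divide by 1e6 to obtain micrograms.
    return amount_pmol * length_nt * nt_mass / 1e6

lipofection_pmol = ssdna_ng_to_pmol(50, 100)    # about 1.5 pmol
nucleofection_ug = ssdna_pmol_to_ug(200, 100)   # about 6.6 micrograms
```

Under this assumption both conversions agree with the amounts stated in the text to within rounding.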

Preparation of genomic DNA for high throughput sequencing (HTS). Sites of interest were amplified using the primers listed (Table 3). Amplification primers for the first PCR reaction (PCR1) were designed with primer2 and had 5′ extensions to enable amplification with an Illumina barcoding primer in a second PCR reaction (PCR2). Phusion U Green Multiplex PCR Master Mix (ThermoFisher) was used for both PCR1 and PCR2. For PCR1, each reaction contained 0.5 μM of the appropriate forward and reverse primer (Table 3), and 30-100 ng of genomic DNA was used as a template. Cycling conditions were 98° C. for 1 minute and 30 seconds, then 30 cycles of (98° C. for 10 seconds, 61° C. for 15 seconds, and 72° C. for 15 seconds) followed by a final extension of 1 minute at 72° C. per 30 μL reaction. PCR1 products were verified on a 2% agarose Tris/Borate/EDTA gel supplemented with ethidium bromide. For PCR2, 1 μL of unpurified PCR1 product plus 0.5 μM of each of a unique forward and reverse barcoding primer pair were added to each sample for a final volume of 30 μL. Cycling conditions were 98° C. for 1 minute and 30 seconds, then 7 cycles of (98° C. for 10 seconds, 61° C. for 15 seconds, and 72° C. for 15 seconds) followed by a final extension of 1 minute at 72° C. PCR2 products were purified by gel electrophoresis on a 2% agarose gel using the QIAquick Gel Extraction Kit (Qiagen). Purified product was passed over a second MinElute column (Qiagen) for a further round of purification before quantification with the Qubit ssDNA HS Assay Kit (ThermoFisher), and was sequenced using an Illumina MiSeq with 230-270-bp single-end reads according to the manufacturer's instructions.
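The PCR1 primer layout described above (a 5′ extension for the barcoding PCR followed by a locus-specific sequence) can be illustrated with a small sketch that splits a Table 3 primer into its parts. The two extension strings below are inferred from the prefixes shared by the Table 3 primers and are an assumption of this example, as are the function name and the reading of the NNNN run as a degenerate spacer; the source does not label these segments explicitly.

```python
# 5' extensions inferred from the prefixes shared by the Table 3 primers (assumption).
FWD_EXTENSION = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
REV_EXTENSION = "TGGAGTTCAGACGTGTGCTCTTCCGATCT"

def split_pcr1_primer(primer):
    """Split a PCR1 primer into (5' extension, degenerate N spacer, locus-specific part)."""
    for extension in (FWD_EXTENSION, REV_EXTENSION):
        if primer.startswith(extension):
            rest = primer[len(extension):]
            # Count the leading run of N bases (empty for primers without a spacer).
            spacer_len = len(rest) - len(rest.lstrip("N"))
            return extension, rest[:spacer_len], rest[spacer_len:]
    raise ValueError("primer does not start with a known 5' extension")

# LDLR primers as listed in Table 3 (SEQ ID NOs: 128 and 129).
ldlr_forward = "ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNGCCCTGCTTCTTTTTCTCTGGT"
ldlr_reverse = "TGGAGTTCAGACGTGTGCTCTTCCGATCTACCATTAACGCAGCCAACTTCA"

fwd_parts = split_pcr1_primer(ldlr_forward)
rev_parts = split_pcr1_primer(ldlr_reverse)
```

Separating the locus-specific portion in this way is convenient when checking primers against a reference amplicon, since only that portion is expected to match genomic sequence.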

Analysis of HTS data. Demultiplexing of pooled sequencing reads was performed using the MiSeq Reporter software (Illumina). Crispresso-v2⁴⁶ was used to perform alignments between sequenced amplicons and reference amplicons. Indels were quantified in a 10-bp window surrounding the expected cut site for each sgRNA. For quantification of HDR, reads that contained indels from the alignment to the reference sequence were discarded using the “discard-indel-reads” filter. This approach ensured that reads containing both an SNP incorporated through HDR and an indel were not erroneously counted as an HDR event, as has been previously described¹⁸. The resulting alignment contained only reads that do not contain an indel within the 10-bp window around the sgRNA cleavage site. Separately from the alignment matrix, the output of Crispresso-v2 reported the percentage of reads that had been excluded from the alignment because they contained an indel (% cells with indel). For each target point mutation that was incorporated via HDR, the alignment alone could be used to determine the % of non-indel-containing cells (% indel-free cells with target mutation) that had successfully incorporated the target mutation. In order to assess the % of all cells that had the target mutation, the following correction was performed:

$\%\ \text{cells with target mutation} = \%\ \text{indel-free cells with target mutation} \times \frac{100\% - \%\ \text{cells with indel}}{100\%}$
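As a worked illustration of the correction above, the sketch below rescales the mutation frequency measured among indel-free reads to a frequency among all cells. The function name and the input values are hypothetical and chosen only for arithmetic clarity; they are not data from this example.

```python
def percent_cells_with_target_mutation(pct_indel_free_with_mutation, pct_cells_with_indel):
    """Rescale a percentage measured among indel-free reads to a percentage of all cells."""
    for value in (pct_indel_free_with_mutation, pct_cells_with_indel):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie between 0 and 100")
    # Fraction of cells without an indel, applied to the indel-free mutation frequency.
    return pct_indel_free_with_mutation * (100.0 - pct_cells_with_indel) / 100.0

# Hypothetical sample: 20% of indel-free reads carry the mutation and 25% of reads have an indel.
corrected = percent_cells_with_target_mutation(20.0, 25.0)  # 15.0
```

The corrected value is never larger than the uncorrected one, and the two coincide only when no reads contain an indel.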

For calculation of HDR:indel ratio, the % cells with indel-free HDR at the indicated sequence was divided by the % cells with an indel in the 10-bp window surrounding the cleavage site. For experiments with HEK293T cells, where robust (>1%) HDR and indel percentages were detectable for many conditions, HDR:indel ratios were not calculated if HDR frequency was less than 1% for a particular sample, to avoid reporting artificially high HDR:indel ratios that could accompany very low frequency events. For the data shown in FIGS. 21A-21J, HDR and indel frequencies were measured in cell types less able than HEK293T cells to support HDR. For these instances, an HDR:indel ratio was not reported if the HDR frequency was <0.1% for the same reason. For calculations in the text in which averages across sites were made, if an HDR:indel ratio was not calculated due to a low HDR rate, then the HDR:indel ratio was set to zero when calculating the mean to avoid artificially inflating HDR:indel ratios.
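The reporting rules for HDR:indel ratios described above can be sketched as follows. The function names, the use of None for an unreported ratio, and the example values are hypothetical. The source does not say how a sample with no detectable indels was handled, so this sketch simply rejects that input rather than guessing.

```python
def hdr_indel_ratio(pct_hdr, pct_indel, min_hdr=1.0):
    """Return the HDR:indel ratio, or None when HDR is below the reporting threshold.

    min_hdr is 1.0 (%) for HEK293T experiments and 0.1 (%) for the cell types
    less able to support HDR, following the thresholds given in the text.
    """
    if pct_hdr < min_hdr:
        return None  # ratio not reported, to avoid inflated values at very low HDR
    if pct_indel <= 0:
        # Handling of a zero indel rate is not described in the source.
        raise ValueError("indel percentage must be positive to form a ratio")
    return pct_hdr / pct_indel

def mean_ratio_across_sites(ratios):
    """Average ratios across sites, counting unreported (None) ratios as zero."""
    if not ratios:
        raise ValueError("at least one site is required")
    return sum(0.0 if r is None else r for r in ratios) / len(ratios)

# Hypothetical (HDR %, indel %) values for three sites in HEK293T cells.
site_ratios = [hdr_indel_ratio(6.0, 2.0), hdr_indel_ratio(0.5, 2.0), hdr_indel_ratio(1.0, 1.0)]
site_mean = mean_ratio_across_sites(site_ratios)
```

Counting unreported ratios as zero makes the cross-site mean conservative: a site with very little HDR lowers the average instead of being dropped from it.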

Data Availability. Plasmids encoding the constructs used in this example are available on Addgene; the accession numbers are listed in Table 6. The source data underlying FIGS. 17A-17F, 18A-18E, 19A-19F, 20A-20B, and 21A-21E and FIGS. 22A-22B, 24A-24D, 25, 26A-26J, and 27A-27B are provided. High-throughput DNA sequencing data have been deposited in the NCBI Sequence Read Archive with BioProject accession number PRJNA515942 (SRP180368).

REFERENCES

1. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424, doi:10.1038/nature17946 (2016).

2. Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotech. 35, 371-376, doi:10.1038/nbt.3803 (2017).

3. Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A genome editors with higher efficiency and product purity. Sci Adv 3, eaao4774, doi:10.1126/sciadv.aao4774 (2017).

4. Rees, H. A. et al. Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat Commun 8, 15790, doi:10.1038/ncomms15790 (2017).

5. Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, doi:10.1126/science.aaf8729 (2016).

6. Esvelt, K. M., Carlson, J. C. & Liu, D. R. A system for the continuous directed evolution of biomolecules. Nature 472, 499-503, doi:10.1038/nature09929 (2011).

7. Carlson, J. C., Badran, A. H., Guggiana-Nilo, D. A. & Liu, D. R. Negative selection and stringency modulation in phage-assisted continuous evolution. Nat Chem Biol 10, 216-222, doi:10.1038/nchembio.1453 (2014).

8. Leconte, A. M. et al. A population-based experimental model for protein evolution: effects of mutation rate and selection stringency on evolutionary outcomes. Biochemistry 52, 1490-1499, doi:10.1021/bi3016185 (2013).

9. Hubbard, B. P. et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nat Methods 12, 939-942, doi:10.1038/nmeth.3515 (2015).

10. Bryson, D. I. et al. Continuous directed evolution of aminoacyl-tRNA synthetases. Nat Chem Biol, doi:10.1038/nchembio.2474 (2017).

11. Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464-471, doi:10.1038/nature24644 (2017).

12. Rouet, P., Smih, F. & Jasin, M. Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol. Cell Biol. 14, 8096-8106 (1994).

13. Doudna, J. A. & Charpentier, E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096 (2014).

14. Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464-471 (2017).

15. Komor, A. C., Badran, A. H. & Liu, D. R. CRISPR-based technologies for the manipulation of eukaryotic genomes. Cell 168, 20-36 (2017).

16. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016).

17. Rouet, P., Smih, F. & Jasin, M. Expression of a site-specific endonuclease stimulates homologous recombination in mammalian cells. Proc. Natl Acad. Sci. USA 91, 6064-6068 (1994).

18. Paquet, D. et al. Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9. Nature 533, 125-129 (2016).

19. Chapman, J. R., Taylor, M. R. & Boulton, S. J. Playing the end game: DNA double-strand break repair pathway choice. Mol. Cell 47, 497-510 (2012).

20. Richardson, C. & Jasin, M. Frequent chromosomal translocations induced by DNA double-strand breaks. Nature 405, 697-700 (2000).

21. Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765-771 (2018).

22. Haapaniemi, E., Botla, S., Persson, J., Schmierer, B. & Taipale, J. CRISPR-Cas9 genome editing induces a p53-mediated DNA damage response. Nat. Med. 24, 927-930 (2018).

23. Ihry, R. J. et al. p53 inhibits CRISPR-Cas9 engineering in human pluripotent stem cells. Nat. Med. 24, 939-946 (2018).

24. Maizels, N. & Davis, L. Initiation of homologous recombination at DNA nicks. Nucleic Acids Res. 46, 6962-6973 (2018).

25. Caldecott, K. W. Single-strand break repair and genetic disease. Nat. Rev. Genet. 9, 619-631 (2008).

26. Lindahl, T. Instability and decay of the primary structure of DNA. Nature 362, 709-715 (1993).

27. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet., https://doi.org/10.1038/s41576-018-0059-1 (2018).

28. Davis, L. & Maizels, N. Homology-directed repair of DNA nicks via pathways distinct from canonical double-strand break repair. Proc. Natl Acad. Sci. USA 111, E924-E932 (2014).

29. Davis, L., Zhang, Y. & Maizels, N. Assaying repair at DNA nicks. Methods Enzymol. 601, 71-89 (2018).

30. Ramirez, C. L. et al. Engineered zinc finger nickases induce homology-directed repair with reduced mutagenic effects. Nucleic Acids Res. 40, 5560-5568 (2012).

31. Kan, Y., Ruis, B., Takasugi, T. & Hendrickson, E. A. Mechanisms of precise genome editing using oligonucleotide donors. Genome Res. 27, 1099-1111 (2017).

32. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013).

33. Davis, L. & Maizels, N. Two distinct pathways support gene correction by single-stranded donors at DNA nicks. Cell Rep. 17, 1872-1881 (2016).

34. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013).

35. Bothmer, A. et al. Characterization of the interplay between DNA repair and CRISPR/Cas9-induced DNA lesions at an endogenous locus. Nat. Commun. 8, 13905 (2017).

36. Pellegrini, L. et al. Insights into DNA recombination from the structure of a RAD51-BRCA2 complex. Nature 420, 287-293 (2002).

37. Yu, D. S. et al. Dynamic control of Rad51 recombinase by self-association and interaction with BRCA2. Mol. Cell 12, 1029-1041 (2003).

38. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821 (2012).

39. Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646-651 (2018).

40. Payen, C., Koszul, R., Dujon, B. & Fischer, G. Segmental duplications arise from Pol32-dependent repair of broken forks through two alternative replication-based mechanisms. PLoS Genet. 4, e1000175 (2008).

41. Shen, B. et al. Efficient genome modification by CRISPR-Cas9 nickase with minimal off-target effects. Nat. Methods 11, 399-402 (2014).

42. Ran, F. A. et al. Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154, 1380-1389 (2013).

43. Miyaoka, Y. et al. Systematic quantification of HDR and NHEJ reveals effects of locus, nuclease, and cell type on genome-editing. Sci. Rep. 6, 23549 (2016).

44. Roth, T. L. et al. Reprogramming human T cell function and specificity with non-viral genome targeting. Nature 559, 405-409 (2018).

45. Pinello, L. et al. Analyzing CRISPR genome-editing experiments with CRISPResso. Nat. Biotechnol. 34, 695-697 (2016).

46. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224-226 (2019).

47. Paulsen, B. S. et al. Ectopic expression of RAD52 and dn53BP1 improves homology-directed repair during CRISPR-Cas9 genome editing. Nat. Biomed. Eng. 1, 878-888 (2017).

48. Canny, M. D. et al. Inhibition of 53BP1 favors homology-dependent DNA repair and increases CRISPR-Cas9 genome-editing efficiency. Nat. Biotechnol. 36, 95-102 (2018).

49. Lin, S., Staahl, B. T., Alla, R. K. & Doudna, J. A. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. Elife 3, e04766 (2014).

50. San Filippo, J., Sung, P. & Klein, H. Mechanism of eukaryotic homologous recombination. Annu. Rev. Biochem. 77, 229-257 (2008).

51. Schlacher, K. et al. Double-strand break repair-independent role for BRCA2 in blocking stalled replication fork degradation by MRE11. Cell 145, 529-542 (2011).

52. Stark, J. M. et al. ATP hydrolysis by mammalian RAD51 has a key role during homology-directed DNA repair. J. Biol. Chem. 277, 20185-20194 (2002).

53. Kim, T. M. et al. RAD51 mutants cause replication defects and chromosomal instability. Mol. Cell Biol. 32, 3663-3680 (2012).

54. Prasad, T. K., Yeykal, C. C. & Greene, E. C. Visualizing the assembly of human Rad51 filaments on double-stranded DNA. J. Mol. Biol. 363, 713-728 (2006).

55. Mason, J. et al. Non-enzymatic roles of human RAD51 at stalled replication forks. bioRxiv, https://doi.org/10.1101/359380 (2018).

56. Marsden, C. G. et al. The tumor-associated variant RAD51 G151D induces a hyper-recombination phenotype. PLoS Genet 12, e1006208 (2016).

57. Kwart, D., Paquet, D., Teo, S. & Tessier-Lavigne, M. Precise and efficient scarless genome editing in stem cells using CORRECT. Nat. Protoc. 12, 329-354 (2017).

58. Yang, H. et al. BRCA2 function in DNA binding and recombination from a BRCA2-DSS1-ssDNA structure. Science 297, 1837-1848 (2002).

59. Yang, H., Li, Q., Fan, J., Holloman, W. K. & Pavletich, N. P. The BRCA2 homologue Brh2 nucleates RAD51 filament formation at a dsDNA-ssDNA junction. Nature 433, 653-657 (2005).

60. Liu, J., Doty, T., Gibson, B. & Heyer, W. D. Human BRCA2 protein promotes RAD51 filament formation on RPA-covered single-stranded DNA. Nat. Struct. Mol. Biol. 17, 1260-1262 (2010).

61. Ma, C. J., Kwon, Y., Sung, P. & Greene, E. C. Human RAD52 interactions with replication protein A and the RAD51 presynaptic complex. J. Biol. Chem. 292, 11702-11713 (2017).

62. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187-197 (2015).

63. Rees, H. A. et al. Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat. Commun. 8, 15790 (2017).

64. Vakulskas, C. A. et al. A high-fidelity Cas9 mutant delivered as a ribonucleoprotein complex enables efficient gene editing in human hematopoietic stem and progenitor cells. Nat. Med. 24, 1216-1224 (2018).

65. Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57-63 (2018).

66. Richardson, C. D. et al. CRISPR-Cas9 genome editing in human cells occurs via the Fanconi anemia pathway. Nat. Genet. 50, 1132-1139 (2018).

67. Tay, Y., Tan, S. M., Karreth, F. A., Lieberman, J. & Pandolfi, P. P. Characterization of dual PTEN and p53-targeting microRNAs identifies microRNA-638/Dnm2 as a two-hit oncogenic locus. Cell Rep. 8, 714-722 (2014).

68. Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217-222 (2018).

69. Badran, A. H. et al. Continuous evolution of Bacillus thuringiensis toxins overcomes insect resistance. Nature 533, 58-63 (2016).

70. Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature, doi:10.1038/s41586-018-0686-x (2018).

71. Gaudelli, N. M. et al. Programmable base editing of A*T to G*C in genomic DNA without DNA cleavage. Nature 551, 464-471, doi:10.1038/nature24644 (2017).

72. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424, doi:10.1038/nature17946 (2016).

73. Koblan, L. W. et al. Improving cytidine and adenine genome editors by expression optimization and ancestral reconstruction. Nat Biotechnol 36, 843-846, doi:10.1038/nbt.4172 (2018).

74. Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A genome editors with higher efficiency and product purity. Sci Adv 3, eaao4774, doi:10.1126/sciadv.aao4774 (2017).

EQUIVALENTS AND SCOPE

In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.

Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.

Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.
