Described herein are fusion proteins containing adenosine deaminases, cytidine deaminases, catalytically impaired CRISPR-Cas proteins (e.g., Cas9, CasX or Cas12 nucleases), linkers, nuclear localization signals (NLSs) and uracil-N-glycosylase inhibitors (UGIs) that enable the CRISPR-guided programmable introduction of simultaneous A-to-G (T-to-C) and C-to-T (G-to-A) substitutions in DNA.
DNA base editors represent a new class of genome editing tools that enable the programmable installation of single or multiple base substitutions. Current cytosine (CBE) and adenine (ABE) generations of base editors allow for the targeted deamination of cytosines and adenines that get exposed on ssDNA by RNA-guided CRISPR-Cas proteins1-4. The majority of disease-associated genetic perturbations known to date are point mutations, also known as single nucleotide variants (SNVs). Current iterations of CBEs and ABEs can target disease-relevant transition mutations and revert them to the original genotype, e.g., correcting G-to-A (C-to-T) mutations using ABE. However, if the experimental or therapeutic goal of base editing is to modify amino acid (AA) residues, both CBEs and ABEs are limited.
Fusion proteins that contain both adenine and cytidine deaminases expand the potential for AA modification by enabling the programmable alteration of one to three neighboring codons by installing both A-to-G and C-to-T mutations side-by-side. These bifunctional adenine and cytosine editors (BACE) also allow for the correction of double or triple nucleotide variants (DNVs or TNVs) that are associated with disease.
Described herein are CRISPR-guided bifunctional adenine and cytosine base editors (BACEs) that enable the simultaneous installation of adenine-to-guanine and cytosine-to-thymine base edits within the same editing window at the ssDNA bubble generated by RNA-guided fusion proteins that contain both adenine (e.g., E. coli TadA) and cytosine (e.g., pmCDA1 or rAPOBEC1) deaminases as well as CRISPR-Cas proteins (e.g., S. pyogenes Cas9). The exemplary SpCas9-based synchronous programmable adenine and cytosine base editor (SPACE) fusion protein system described herein comprises a programmable DNA-binding domain fused to an adenosine deaminase, e.g., E. coli TadA or previously described engineered TadA variants to decrease RNA editing activity while still preserving DNA editing activity (SECURE or RRE variants) as well as to a cytidine deaminase, e.g., pmCDA1 or rAPOBEC1 or human APOBEC3A or human AID or human APOBEC3G or previously described engineered variants of these deaminases (e.g., rAPOBEC1 mutations from SECURE-BE3) with reduced RNA editing activity and preserved DNA editing capabilities5-9.
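The idea of installing both edit types within one editing window can be illustrated with a minimal Python sketch. It assumes an idealized, maximal outcome in which every A and every C inside the window on the exposed strand is converted; real editing is partial and context dependent. The function names, the example protospacer, and the window boundaries (positions 2-9) are hypothetical placeholders, not values taken from this disclosure.

```python
def simulate_dual_edit(protospacer, window=(2, 9)):
    """Convert every A to G and every C to T inside the window.

    Positions are 1-based and counted from the PAM-distal end.
    The default window is an illustrative assumption only.
    """
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        in_window = start <= pos <= end
        if in_window and base == "A":
            edited.append("G")  # adenosine deaminase outcome (A-to-G)
        elif in_window and base == "C":
            edited.append("T")  # cytidine deaminase outcome (C-to-T)
        else:
            edited.append(base)
    return "".join(edited)


def list_edits(original, edited):
    """Return (position, original base, edited base) for every changed position."""
    pairs = zip(original.upper(), edited)
    return [(i, o, e) for i, (o, e) in enumerate(pairs, start=1) if o != e]


site = "GACCAGTTACGGATCCTGAA"  # hypothetical 20-nt protospacer
edited_site = simulate_dual_edit(site)
print(edited_site)
print(list_edits(site, edited_site))
```

Bases outside the window are left untouched, which is why only positions 2, 3, 4, 5 and 9 change in this example.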
Thus, provided herein are bifunctional adenine and cytosine base editors (BACEs) comprising:
(i) an adenosine deaminase domain, e.g., a wild type (SEQ ID NO: 98) and/or engineered adenosine deaminase TadA monomer or dimer (e.g., homodimeric or heterodimeric TadA domains from ABEmax (SEQ ID NO:226), ABE7.10 (SEQ ID NO:227), or ABE8e (SEQ ID NO: 145); other options include monomer or dimer TadAs from ABEs 0.1, 0.2, 1.1, 1.2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.10, 2.11, 2.12, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 4.1, 4.2, 4.3, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 5.10, 5.11, 5.12, 5.13, 5.14, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 7.10 (SEQ ID NO: 139), or ABEmax (SEQ ID NO: 228), or ABE8.8 (SEQ ID NO: 148), ABE8.13 (SEQ ID NO: 149), ABE8.17 (SEQ ID NO: 150), ABE8.20 (SEQ ID NO: 151), ABE8e (SEQ ID NO: 145), as well as K20A/R21A, V82G, or V106W variants thereof), E. coli TadA monomer, or homo- or heterodimers thereof fused to the N or C terminus, bearing one or more mutations in either or both monomers (e.g., the TadA mutant used in miniABEmax-V82G (SEQ ID 223), miniABEmax-K20A/R21A (SEQ ID 224), miniABEmax-V106W (SEQ ID 225) or any other variant from Tables C, N, and O), that decrease RNA editing activity while preserving DNA editing activity;
(ii) a cytidine deaminase from Tables A and B (e.g., pmCDA1, rat APOBEC1, human APOBEC3A, or human AID) or variations thereof with reduced RNA off-target editing, and one or multiple uracil-N-glycosylase inhibitors (UGIs); and
(iii) a programmable DNA binding domain (e.g., Cas9-D10A); and (iv) optionally further comprising one or more nuclear localization sequences (e.g., NLSs such as a bipartite NLS comprising the sequence KRTADGSEFEPKKKRKV (SEQ ID NO:229); an SV40 large T antigen NLS (PKKKRRV (SEQ ID NO:221)); or a nucleoplasmin NLS (KRPAATKKAGQAKKKK (SEQ ID NO:222))).
Exemplary BACEs include those provided in SEQ ID NOs:140-144.
In some embodiments, the adenosine deaminase comprises one or more mutations corresponding to E. coli TadA mutations in one or more TadA monomers shown in Table N, or a homologue or orthologue thereof (e.g., a TadA protein in Table C).
In some embodiments, the cytidine deaminase rat APOBEC1 (or any one of its ortho- or paralogues listed in Tables A or B) bears one or more mutations that decrease RNA editing activity while preserving DNA editing activity, wherein the mutations are at amino acid positions that correspond to residues P29, R33, K34, E181, and/or L182 of rat apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 1 (rAPOBEC1, SEQ ID NO:67).
In some embodiments, the cytidine deaminase rat APOBEC1 (or any one of its ortho- or paralogues listed in Tables A or B) bears one or more of the following mutations: P29F, P29T, R33A, K34A, R33A+K34A (double mutant), E181Q and/or L182A of SEQ ID NO:67 (rAPOBEC1, Rattus norvegicus APOBEC1).
In some embodiments, the BACE further includes one or more mutations at its cytidine deaminase rat APOBEC1 (or any one of its ortho- or paralogues listed in Tables A or B) residues corresponding to E24, V25; R118, Y120, H121, R126; W224-K229; P168-1186; L173+L180; R15, R16, R17, to K15-17 & A15-17; Deletion E181-L210; P190+P191; Deletion L210-K229 (C-terminal); and/or Deletion S2-L14 (N-terminal) of SEQ ID NO:67, Table O.
In some embodiments, the BACE includes a linker between the adenosine deaminase monomers and/or between the adenosine deaminase monomer or single-chain dimer and the programmable DNA binding domain.
In some embodiments, the BACE includes a linker between the programmable DNA binding domain and the cytidine deaminase monomer (e.g., pmCDA1 or rAPOBEC1 or hA3A or hAID) or dimer.
In some embodiments, the BACE includes an N-terminal adenosine deaminase fusion (e.g., mutant TadA* monomer or dimer) and a C-terminal cytidine deaminase fusion (e.g., pmCDA1 or rAPOBEC1 or hA3A or AID).
In some embodiments, the BACE includes an N-terminal cytidine deaminase fusion (e.g., pmCDA1 or rAPOBEC1 or hA3A or AID) and a C-terminal adenosine deaminase fusion (e.g., mutant TadA* monomer or dimer).
In some embodiments, the BACE includes a heterodimeric combined N-terminal adenosine and cytidine deaminase fusion (e.g., pmCDA1 or rAPOBEC1 or hA3A or AID fused to TadA monomers or dimers with a linker) or a heterodimeric combined C-terminal adenosine and cytidine deaminase fusion (e.g., pmCDA1 or rAPOBEC1 or hA3A or AID fused to TadA monomers or dimers with a linker). In both N- and C-terminal positions of these “hybrid fusion deaminase designs” the deaminases can be fused in either of these two orders: NH2-cytidine deaminase-linker-adenosine deaminase or NH2-adenosine deaminase-linker-cytidine deaminase.
In some embodiments, the programmable DNA binding domain is selected from the group consisting of engineered C2H2 zinc-fingers, transcription activator-like effectors (TALEs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Cas RNA-guided nucleases (RGNs) and variants thereof (e.g., Tables L and M) (e.g., a “SPACE” as described herein).
In some embodiments, the CRISPR RGN is an ssDNA nickase or is catalytically inactive, e.g., a Cas9, CasX or Cas12a that has ssDNA nickase activity or is catalytically inactive.
Further provided herein are base editing systems comprising: (i) a BACE as described herein, wherein the programmable DNA binding domain is a CRISPR Cas RGN or a variant thereof; and (ii) at least one guide RNA compatible with the base editor that directs the base editor to a target sequence.
Also described herein are isolated nucleic acids encoding any of the BACEs or base editing systems described herein, as well as vectors comprising an isolated nucleic acid described herein.
Also provided are isolated host cells, preferably mammalian (e.g., human) host cells, comprising any of the nucleic acids described herein. In some embodiments, the isolated host cells described herein express a BACE as described herein.
Further, provided herein are methods for deaminating a selected adenine and/or cytosine in a nucleic acid; the methods include contacting the nucleic acid with a BACE, a base editing system, an isolated nucleic acid, a vector, or an isolated host cell described herein.
Additionally provided herein are compositions comprising a purified BACE, a base editing system, an isolated nucleic acid, a vector, and/or an isolated host cell described herein. In some embodiments, the composition includes one or more ribonucleoprotein (RNP) complexes, e.g., comprising a BACE and compatible gRNA.
Also provided herein are methods of inducing an amino acid change in a polypeptide. The methods include contacting a nucleic acid that encodes the polypeptide whose amino acid sequence is to be changed with a BACE, a base editing system, an isolated nucleic acid, a vector, or an isolated host cell described herein, preferably wherein the amino acid change comprises one of the amino acid changes listed in Table D, and optionally wherein the amino acid change is one that can or cannot be targeted by CBE and/or ABE.
In some embodiments, the BACE is used to correct or model (create) specific disease-related mutations as shown in Tables E-K.
In some embodiments, the BACE or SPACE comprises one or more uracil-N-glycosylase inhibitors (UGIs). In some embodiments, the BACE or SPACE comprises a linker between the adenosine deaminase and the programmable DNA binding domain as well as between the cytidine deaminase and the DNA binding domain. In some embodiments, the TadA domain can be monomeric, homodimeric or heterodimeric and contain all combinations of wild type (WT) E. coli TadA, or mutant variants of TadA.
In some embodiments the two deaminase domains can be located at the C-terminus (e.g., pmCDA1) and N-terminus (TadA) or vice versa or they can both be located at the C- or N-terminus.
In some embodiments, the programmable DNA binding domain is selected from the group consisting of engineered C2H2 zinc-fingers, transcription activator-like effectors (TALEs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas RNA-guided nucleases (RGNs) and variants thereof.
In some embodiments, the CRISPR-Cas RGN is an ssDNA nickase or is catalytically inactive, e.g., a Cas9 or Cas12a that is catalytically inactive or has ssDNA nickase activity.
Also provided herein are base editing systems comprising (i) the adenine base editors described herein, wherein the programmable DNA binding domain is a CRISPR Cas RGN or a variant thereof; and (ii) at least one guide RNA compatible with the base editor that directs the base editor to a target sequence.
Also provided are isolated nucleic acids encoding the adenine base editors; vectors comprising the isolated nucleic acids; and isolated host cells, preferably mammalian host cells, comprising the nucleic acids. In some embodiments, the isolated host cell expresses an adenine base editor.
Further, provided herein are methods for deaminating a selected adenine in a nucleic acid, the method comprising contacting the nucleic acid with an adenine base editor or base editing system as described herein.
Also provided are compositions comprising a purified adenine base editor or base editing system as described herein. In some embodiments, the composition comprises one or more ribonucleoprotein (RNP) complexes, e.g., comprising a SPACE and compatible guide RNA.
These methods can be used for generating two or more sets of nucleic acids, each set comprising a plurality of sequences, wherein each set comprises one or more nucleic acids having the same sequence, and wherein each set differs from each of the other sets by at least one nucleotide. These methods include (i) providing a first nucleic acid comprising a first sequence, e.g., a reference or wild type sequence; (ii) contacting the first nucleic acid with a BACE as described herein, wherein the programmable DNA binding domain is a CRISPR Cas RGN or a variant thereof, and at least one guide RNA compatible with the base editor that directs the base editor to modify a selected nucleotide in the first sequence; and (iii) isolating a second nucleic acid comprising a sequence comprising the selected modification in the nucleotide sequence, to provide a second set of nucleic acids. The methods can include amplifying the second nucleic acids. Steps (i)-(iii) can optionally be repeated until a desired number of sets is obtained, e.g., until enough sets are obtained to include at least one set with a mutation at each position in a selected region of the sequence of the nucleic acid.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
ABEs install A-to-G transitions in DNA while CBEs allow for the installation of C-to-T mutations. However, a certain subset of mutations (double or triple nucleotide variants), as well as amino acid mutations that cannot be targeted with available CBEs and ABEs (e.g., transversion mutations) evade current base editing strategies.
We sought to investigate if the combination of two separate deaminases in one BE architecture would allow for efficient parallel A-to-G and C-to-T editing in one editing window in cells treated with this bifunctional adenine and cytosine editor (BACE). In the first iteration, we fused a mutant E. coli TadA monomer (adenosine deaminase), previously described in the context of miniABEmax-V82G, as well as pmCDA1 (cytidine deaminase) and two UGIs to a catalytically impaired Cas9 (D10A), which resulted in the engineering of the SpCas9-based adenine and cytosine editor (SPACE).
Both the adenosine deaminase and the cytidine deaminase domain used in the exemplary versions of SPACE described herein were chosen due to their relatively small size (e.g., TadA*-V82G monomer instead of TadA-TadA* heterodimer used in WT-ABE) and their dramatically reduced RNA off-target editing (Grunewald et al, Nature Biotechnology 2019).
Thus, described herein are variants of base editor fusion proteins that enable the parallel installation of two distinct transition mutations (C-to-T and A-to-G) simultaneously in one editing window, by fusing two deaminase domains to one DNA-binding protein (Cas9 nickase) and one or multiple uracil-glycosylase inhibitor (UGI) proteins. BACE editing with SPACE enables this combinatorial editing without introducing two separate ABE and CBE base editors into the cell. This allows easier stoichiometric delivery of the two deaminases together with the Cas9 and UGI components, and a smaller packaging size compared to the delivery of two separate BEs. Additionally, the possibility to install two substitutions side-by-side in one or multiple codons will enable more expanded in vitro and in vivo amino acid and protein modifications. A table of potentially targetable amino acid changes is shown in Table D and a list of potential disease targets (using Cas proteins compatible with NGG, NG, NAA, and NGA PAMs) is shown in Tables E-K.
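To make the codon-level argument concrete, the following Python sketch enumerates, for a single codon, which amino acid changes are reachable with A-to-G edits alone, with C-to-T edits alone, and only when both edit types are combined. It is a simplified illustration and not a reproduction of Table D: it assumes the edits occur on the coding strand, ignores the editing window and sequence context, and uses the standard genetic code. All function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edited_codons(codon, rules):
    """All codons reachable by applying any non-empty subset of the allowed substitutions."""
    options = [(base, rules[base]) if base in rules else (base,) for base in codon]
    return {"".join(p) for p in product(*options)} - {codon}


def reachable_amino_acids(codon):
    """Group reachable amino acid changes by the edit types they require."""
    wild_type = CODON_TABLE[codon]
    abe = {CODON_TABLE[c] for c in edited_codons(codon, {"A": "G"})}
    cbe = {CODON_TABLE[c] for c in edited_codons(codon, {"C": "T"})}
    dual = {CODON_TABLE[c] for c in edited_codons(codon, {"A": "G", "C": "T"})}
    return {
        "abe_only": abe - {wild_type},
        "cbe_only": cbe - {wild_type},
        "dual_exclusive": dual - abe - cbe - {wild_type},
    }


print(reachable_amino_acids("ACA"))  # threonine codon used as an example
```

For the threonine codon ACA, A-to-G editing alone yields alanine and C-to-T editing alone yields isoleucine, whereas valine (GTA, GTG) and methionine (ATG) appear only when both edit types occur in the same codon.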
In some embodiments, the adenosine deaminase is TadA from E. coli, or an orthologue from a different prokaryote, e.g., S. aureus, or a homologue from the eukaryotic domain, such as yeast TAD1/2 or a mammalian species such as human (e.g., ADAT2; Table C). The tRNA-specific adenosine deaminase family members have high sequence homology and many of these orthologues may be compatible with one or more of the amino acid substitutions in E. coli TadA expected to cause an RRE phenotype and would be desirable in a SPACE or BACE architecture.
The sequence of wild type E. coli TadA, available in UniProt at P68398, is as follows:
The engineered E. coli TadA sequence present in ABE7.10 and ABEmax is as follows (SEQ ID: 226):
In the most commonly used ABEs (ABE7.10 and ABEmax), these two proteins are fused using a 32 amino acid linker (bolded in sequence below), forming a heterodimer, the sequence of which is as follows (SEQ ID: 227):
SSGGSSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGW
Other exemplary sequences are shown in Table C. These tRNA-specific adenosine deaminase orthologues and homologues also represent candidates for inclusion of the mutations previously described at analogous positions in these proteins.
In some embodiments, the base editors include catalytically dead adenine deaminase variants, e.g., E59A (Gaudelli et al, 2017, PMID: 29160308), as part of a heterodimer.
In some embodiments, the adenosine deaminase domain (monomeric or dimeric) from ABE8s (also with additional V106W mutation) could be used (Gaudelli et al, Nature Biotechnology 2020, PMID: 32284586; SEQ-IDs: 148-151).
In some embodiments, the adenosine deaminase domain (monomeric or dimeric) from ABE8e (with V82G, K20A/R21A, or V106W mutations) could be used (Richter et al, Nature Biotechnology 2020, DOI: https://doi.org/10.1038/s41587-020-0453-z; SEQ-IDs: 145-147).
A number of cytidine deaminase domains that can be used in the proteins described herein are known in the art. In some embodiments, the cytidine deaminase is pmCDA1 (sea lamprey) or APOBEC1 from rat, or from a different species (Table A), e.g., a different mammalian species such as H. sapiens. The APOBEC, AICDA (AID) and CDA1 family members have high sequence homology and represent potential candidates for BACE and SPACE BE architectures (Table B)2, 10-13.
Specifically, rAPOBEC1, enhanced human A3A, and human AID6, 12, 14 or pmCDA12 (Grünewald et al, Nature Biotechnology, in press, preprint on bioRxiv doi: https://doi.org/10.1101/631721) are candidates for inclusion into the SPACE and BACE architectures. Alternatively, the CBE variants FERNY and evoFERNY (Thuronyi et al, Nature Biotechnology 2019, PMID 31332326; SEQ-IDs: 152-153) as well as the CBE variants YE1, YE2, and YEE (Kim et al, Nature Biotechnology 2017, PMID 28191901; SEQ-IDs: 154-156) could be used, e.g., to achieve increased cytosine deamination in a 5′G context (Thuronyi et al) or to achieve fewer DNA and RNA off-targets (Kim et al).
In some embodiments, the base editors include programmable DNA binding domains such as engineered C2H2 zinc-fingers, transcription activator-like effectors (TALEs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Cas RNA-guided nucleases (RGNs) and their variants, including ssDNA nickases (nCas9) or their analogs and catalytically inactive dead Cas9 (dCas9) and its analogs (e.g., as shown in Table L), and any engineered protospacer-adjacent motif (PAM) or high-fidelity variants (e.g., as shown in Table M). A programmable DNA binding domain is one that can be engineered to bind to a selected target sequence.
CRISPR-Cas Nucleases
Although herein we refer to Cas9, in general any Cas9-like nickase could be used (including the related Cpf1/Cas12a enzyme classes), unless specifically indicated.
See, e.g., Tables L and M. These orthologs, and mutants and variants thereof as known in the art, can be used in any of the fusion proteins described herein. See, e.g., WO 2017/040348 (which describes variants of SaCas9 and SpCas9 with increased specificity) and WO 2016/141224 (which describes variants of SaCas9 and SpCas9 with altered PAM specificity).
The Cas9 nuclease from S. pyogenes (hereafter simply Cas9) can be guided via simple base pair complementarity between 17-20 nucleotides of an engineered guide RNA (gRNA), e.g., a single guide RNA or crRNA/tracrRNA pair, and the complementary strand of a target genomic DNA sequence of interest that lies next to a protospacer adjacent motif (PAM), e.g., a PAM matching the sequence NGG or NAG (Shen et al., Cell Res (2013); Dicarlo et al., Nucleic Acids Res (2013); Jiang et al., Nat Biotechnol 31, 233-239 (2013); Jinek et al., Elife 2, e00471 (2013); Hwang et al., Nat Biotechnol 31, 227-229 (2013); Cong et al., Science 339, 819-823 (2013); Mali et al., Science 339, 823-826 (2013); Cho et al., Nat Biotechnol 31, 230-232 (2013); Jinek et al., Science 337, 816-821 (2012)). The engineered CRISPR from Prevotella and Francisella 1 (Cpf1, also known as Cas12a) nuclease can also be used, e.g., as described in Zetsche et al., Cell 163, 759-771 (2015); Schunder et al., Int J Med Microbiol 303, 51-60 (2013); Makarova et al., Nat Rev Microbiol 13, 722-736 (2015); Fagerlund et al., Genome Biol 16, 251 (2015). Unlike SpCas9, Cpf1/Cas12a requires only a single 42-nt crRNA, which has 23 nt at its 3′ end that are complementary to the protospacer of the target DNA sequence (Zetsche et al., 2015). Furthermore, whereas SpCas9 recognizes an NGG PAM sequence that is 3′ of the protospacer, AsCpf1 and LbCpf1 recognize TTTN PAMs that are found 5′ of the protospacer (Id.).
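The difference in PAM placement between SpCas9 (NGG, 3′ of the protospacer) and Cpf1/Cas12a (TTTN, 5′ of the protospacer) can be sketched as a simple site scan. The Python example below is a minimal illustration under stated assumptions: it scans only the strand as given (a practical design tool would also scan the reverse complement), it uses fixed protospacer lengths of 20 nt and 23 nt, and the function names and example sequence are hypothetical.

```python
import re


def find_spcas9_sites(seq, protospacer_len=20):
    """Return (start, protospacer, PAM) for each NGG PAM located 3' of a full-length protospacer."""
    seq = seq.upper()
    sites = []
    # A lookahead keeps overlapping PAM candidates from being skipped.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= protospacer_len:
            start = pam_start - protospacer_len
            sites.append((start, seq[start:pam_start], match.group(1)))
    return sites


def find_cas12a_sites(seq, protospacer_len=23):
    """Return (start, protospacer, PAM) for each TTTN PAM located 5' of a full-length protospacer."""
    seq = seq.upper()
    sites = []
    for match in re.finditer(r"(?=(TTT[ACGT]))", seq):
        start = match.start() + 4  # the protospacer begins right after the 4-nt PAM
        if start + protospacer_len <= len(seq):
            sites.append((start, seq[start:start + protospacer_len], match.group(1)))
    return sites


example = "TTTAGACCAGTTACGGATCCTGAACTGAGG"  # hypothetical 30-nt sequence
print(find_spcas9_sites(example))
print(find_cas12a_sites(example))
```

In this example the internal CGG is skipped for SpCas9 because fewer than 20 nt precede it, so each enzyme yields exactly one candidate site.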
In some embodiments, the present system utilizes a wild type or variant Cas9 protein from S. pyogenes or Staphylococcus aureus, or a wild type or variant Cpf1 protein from Acidaminococcus sp. BV3L6 or Lachnospiraceae bacterium ND2006 either as encoded in bacteria or codon-optimized for expression in mammalian cells and/or modified in its PAM recognition specificity and/or its genome-wide specificity. A number of variants have been described; see, e.g., WO 2016/141224, PCT/US2016/049147, Kleinstiver et al., Nat Biotechnol. 2016 August; 34(8):869-74; Tsai and Joung, Nat Rev Genet. 2016 May; 17(5):300-12; Kleinstiver et al., Nature. 2016 Jan. 28; 529(7587):490-5; Shmakov et al., Mol Cell. 2015 Nov. 5; 60(3):385-97; Kleinstiver et al., Nat Biotechnol. 2015 December; 33(12):1293-1298; Dahlman et al., Nat Biotechnol. 2015 November; 33(11):1159-61; Kleinstiver et al., Nature. 2015 Jul. 23; 523(7561):481-5; Wyvekens et al., Hum Gene Ther. 2015 July; 26(7):425-31; Hwang et al., Methods Mol Biol. 2015; 1311:317-34; Osborn et al., Hum Gene Ther. 2015 February; 26(2):114-26; Konermann et al., Nature. 2015 Jan. 29; 517(7536):583-8; Fu et al., Methods Enzymol. 2014; 546:21-45; and Tsai et al., Nat Biotechnol. 2014 June; 32(6):569-76, inter alia. Concerning rAPOBEC1 itself, a number of variants have been described, e.g., Chen et al, RNA. 2010 May; 16(5):1040-52; Chester et al, EMBO J. 2003 Aug. 1; 22(15):3971-82; Teng et al, J Lipid Res. 1999 April; 40(4):623-35; Navaratnam et al, Cell. 1995 Apr. 21; 81(2):187-95; MacGinnitie et al, J Biol Chem. 1995 Jun. 16; 270(24):14768-75; Yamanaka et al, J Biol Chem. 1994 Aug. 26; 269(34):21725-34. The guide RNA is expressed or present in the cell together with the Cas9 or Cpf1. Either the guide RNA or the nuclease, or both, can be expressed transiently or stably in the cell or introduced as a purified protein or nucleic acid.
In some embodiments, the Cas9 also includes one of the following mutations, which reduce nuclease activity of the Cas9; e.g., for SpCas9, the mutation D10A or H840A (each of which creates a single-strand nickase).
In some embodiments, the SpCas9 variants also include mutations at one of each of the two sets of the following amino acid positions, which together destroy the nuclease activity of the Cas9: D10, E762, D839, H983, or D986 and H840 or N863, e.g., D10A/D10N and H840A/H840N/H840Y, to render the nuclease portion of the protein catalytically inactive; substitutions at these positions could be alanine (as they are in Nishimasu et al., Cell 156, 935-949 (2014)), or other residues, e.g., glutamine, asparagine, tyrosine, serine, or aspartate, e.g., E762Q, H983N, H983Y, D986N, N863D, N863S, or N863H (see WO 2014/152432).
In some embodiments, the Cas9 is fused to one or more SV40 or bipartite (bp) nuclear localization sequences (NLSs); an exemplary bp NLS sequence is as follows: (KRTADGSEFES)PKKKRKV (SEQ ID NO: 24); others are known in the art and provided herein. Typically, the NLSs are at the N- and C-termini of a fusion protein, but can also be positioned at the N- or C-terminus, or between the DNA binding domain and the deaminase domain. Linkers as known in the art can be used to separate domains.
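The two-set logic described above can be expressed as a short lookup. The Python sketch below groups the SpCas9 positions exactly as they are grouped in the text and classifies a list of mutation labels accordingly. It rests on assumptions that should be read as such: it treats any single-set mutation as yielding a nickase (the text states this explicitly only for D10A and H840A), it does not check the wild type residue or the substituted amino acid, and the names `SET_ONE`, `SET_TWO`, and `classify_spcas9` are hypothetical labels for this illustration.

```python
import re

# Position sets as grouped in the text; the set names are labels for this sketch only.
SET_ONE = frozenset({10, 762, 839, 983, 986})
SET_TWO = frozenset({840, 863})


def mutated_positions(mutations):
    """Extract residue numbers from labels such as 'D10A' or 'H840A'."""
    positions = set()
    for mutation in mutations:
        match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {mutation!r}")
        positions.add(int(match.group(1)))
    return positions


def classify_spcas9(mutations):
    """Classify an SpCas9 variant by which of the two position sets it mutates."""
    positions = mutated_positions(mutations)
    hits_one = bool(positions & SET_ONE)
    hits_two = bool(positions & SET_TWO)
    if hits_one and hits_two:
        return "catalytically inactive"
    if hits_one or hits_two:
        return "nickase"  # assumption: generalized from the D10A and H840A examples
    return "nuclease-active"


print(classify_spcas9(["D10A"]))
print(classify_spcas9(["D10A", "H840A"]))
```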
Transcription activator-like effectors (TALEs) of plant pathogenic bacteria in the genus Xanthomonas play important roles in disease, or trigger defense, by binding host DNA and activating effector-specific host genes. Specificity depends on an effector-variable number of imperfect, typically ~33-35 amino acid repeats. Polymorphisms are present primarily at repeat positions 12 and 13, which are referred to herein as the repeat variable-diresidue (RVD). The RVDs of TAL effectors correspond to the nucleotides in their target sites in a direct, linear fashion, one RVD to one nucleotide, with some degeneracy and no apparent context dependence. In some embodiments, the polymorphic region that grants nucleotide specificity may be expressed as a triresidue or triplet.
Each DNA binding repeat can include a RVD that determines recognition of a base pair in the target DNA sequence, wherein each DNA binding repeat is responsible for recognizing one base pair in the target DNA sequence. In some embodiments, the RVD can comprise one or more of: HA for recognizing C; ND for recognizing C; HI for recognizing C; HN for recognizing G; NA for recognizing G; SN for recognizing G or A; YG for recognizing T; and NK for recognizing G, and one or more of: HD for recognizing C; NG for recognizing T; NI for recognizing A; NN for recognizing G or A; NS for recognizing A or C or G or T; N* for recognizing C or T, wherein * represents a gap in the second position of the RVD; HG for recognizing T; H* for recognizing T, wherein * represents a gap in the second position of the RVD; and IG for recognizing T.
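Because each RVD corresponds to one nucleotide, the RVD code listed above lends itself to a simple lookup. The following Python sketch assigns one RVD per target base and checks whether an RVD array is compatible with a given site. The specificity table is transcribed from the RVD list in the preceding paragraph; the choice of a single default RVD per base (NI, HD, NN, NG) and the function names are assumptions made for illustration, and the sketch says nothing about binding strength.

```python
# One default RVD per base, chosen for illustration from the RVDs listed in the text.
DEFAULT_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}

# Bases recognized by each RVD, as listed in the text ("*" denotes a gap at RVD position 2).
RVD_SPECIFICITY = {
    "HD": "C", "NG": "T", "NI": "A", "NN": "GA", "NS": "ACGT",
    "NK": "G", "HN": "G", "NA": "G", "SN": "GA",
    "HG": "T", "IG": "T", "YG": "T", "H*": "T",
    "HA": "C", "ND": "C", "HI": "C", "N*": "CT",
}


def design_rvds(target):
    """Return one RVD per nucleotide of the target, in order."""
    return [DEFAULT_RVD[base] for base in target.upper()]


def array_recognizes(rvds, site):
    """True if every RVD in the array accepts the base at its position in the site."""
    site = site.upper()
    if len(rvds) != len(site):
        return False
    return all(base in RVD_SPECIFICITY[rvd] for rvd, base in zip(rvds, site))


rvds = design_rvds("GATC")
print(rvds)
print(array_recognizes(rvds, "AATC"))  # NN also accepts A, reflecting the stated degeneracy
```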
TALE proteins may be useful in research and biotechnology as targeted chimeric nucleases that can facilitate homologous recombination in genome engineering (e.g., to add or enhance traits useful for biofuels or biorenewables in plants). These proteins also may be useful as, for example, transcription factors, and especially for therapeutic applications requiring a very high level of specificity such as therapeutics against pathogens (e.g., viruses) as non-limiting examples.
Methods for generating engineered TALE arrays are known in the art, see, e.g., the fast ligation-based automatable solid-phase high-throughput (FLASH) system described in U.S. Ser. No. 61/610,212, and Reyon et al., Nature Biotechnology 30, 460-465 (2012); as well as the methods described in Bogdanove & Voytas, Science 333, 1843-1846 (2011); Bogdanove et al., Curr Opin Plant Biol 13, 394-401 (2010); Scholze & Boch, Curr Opin Microbiol (2011); Boch et al., Science 326, 1509-1512 (2009); Moscou & Bogdanove, Science 326, 1501 (2009); Miller et al., Nat Biotechnol 29, 143-148 (2011); Morbitzer et al., Proc Natl Acad Sci USA 107, 21617-21622 (2010); Morbitzer et al., Nucleic Acids Res 39, 5790-5799 (2011); Zhang et al., Nat Biotechnol 29, 149-153 (2011); Geissler et al., PLoS ONE 6, e19509 (2011); Weber et al., PLoS ONE 6, e19722 (2011); Christian et al., Genetics 186, 757-761 (2010); Li et al., Nucleic Acids Res 39, 359-372 (2011); Mahfouz et al., Proc Natl Acad Sci USA 108, 2623-2628 (2011); Mussolino et al., Nucleic Acids Res (2011); Li et al., Nucleic Acids Res 39, 6315-6325 (2011); Cermak et al., Nucleic Acids Res 39, e82 (2011); Wood et al., Science 333, 307 (2011); Hockemeyer et al., Nat Biotechnol 29, 731-734 (2011); Tesson et al., Nat Biotechnol 29, 695-696 (2011); Sander et al., Nat Biotechnol 29, 697-698 (2011); and Huang et al., Nat Biotechnol 29, 699-700 (2011); all of which are incorporated herein by reference in their entirety.
Zinc finger (ZF) proteins are DNA-binding proteins that contain one or more zinc fingers, independently folded zinc-containing mini-domains, the structure of which is well known in the art and defined in, for example, Miller et al., 1985, EMBO J., 4:1609; Berg, 1988, Proc. Natl. Acad. Sci. USA, 85:99; Lee et al., 1989, Science, 245:635; and Klug, 1993, Gene, 135:83. Crystal structures of the zinc finger protein Zif268 and its variants bound to DNA show a semi-conserved pattern of interactions, in which typically three amino acids from the alpha-helix of the zinc finger contact three adjacent base pairs or a “subsite” in the DNA (Pavletich et al., 1991, Science, 252:809; Elrod-Erickson et al., 1998, Structure, 6:451). Thus, the crystal structure of Zif268 suggested that zinc finger DNA-binding domains might function in a modular manner with a one-to-one interaction between a zinc finger and a three-base-pair “subsite” in the DNA sequence. In naturally occurring zinc finger transcription factors, multiple zinc fingers are typically linked together in a tandem array to achieve sequence-specific recognition of a contiguous DNA sequence (Klug, 1993, Gene 135:83).
Multiple studies have shown that it is possible to artificially engineer the DNA binding characteristics of individual zinc fingers by randomizing the amino acids at the alpha-helical positions involved in DNA binding and using selection methodologies such as phage display to identify desired variants capable of binding to DNA target sites of interest (Rebar et al., 1994, Science, 263:671; Choo et al., 1994 Proc. Natl. Acad. Sci. USA, 91:11163; Jamieson et al., 1994, Biochemistry 33:5689; Wu et al., 1995 Proc. Natl. Acad. Sci. USA, 92: 344). Such recombinant zinc finger proteins can be fused to functional domains, such as transcriptional activators, transcriptional repressors, methylation domains, and nucleases to regulate gene expression, alter DNA methylation, and introduce targeted alterations into genomes of model organisms, plants, and human cells (Carroll, 2008, Gene Ther., 15:1463-68; Cathomen, 2008, Mol. Ther., 16:1200-07; Wu et al., 2007, Cell. Mol. Life Sci., 64:2933-44).
One existing method for engineering zinc finger arrays, known as “modular assembly,” advocates the simple joining together of pre-selected zinc finger modules into arrays (Segal et al., 2003, Biochemistry, 42:2137-48; Beerli et al., 2002, Nat. Biotechnol., 20:135-141; Mandell et al., 2006, Nucleic Acids Res., 34:W516-523; Carroll et al., 2006, Nat. Protoc. 1:1329-41; Liu et al., 2002, J. Biol. Chem., 277:3850-56; Bae et al., 2003, Nat. Biotechnol., 21:275-280; Wright et al., 2006, Nat. Protoc., 1:1637-52). Although straightforward enough to be practiced by any researcher, recent reports have demonstrated a high failure rate for this method, particularly in the context of zinc finger nucleases (Ramirez et al., 2008, Nat. Methods, 5:374-375; Kim et al., 2009, Genome Res. 19:1279-88), a limitation that typically necessitates the construction and cell-based testing of very large numbers of zinc finger proteins for any given target gene (Kim et al., 2009, Genome Res. 19:1279-88).
Combinatorial selection-based methods that identify zinc finger arrays from randomized libraries have been shown to have higher success rates than modular assembly (Maeder et al., 2008, Mol. Cell, 31:294-301; Joung et al., 2010, Nat. Methods, 7:91-92; Isalan et al., 2001, Nat. Biotechnol., 19:656-660). In preferred embodiments, the zinc finger arrays are described in, or are generated as described in, WO 2011/017293 and WO 2004/099366. Additional suitable zinc finger DBDs are described in U.S. Pat. Nos. 6,511,808, 6,013,453, 6,007,988, and 6,503,717 and U.S. patent application 2002/0160940.
In some embodiments, the DBD is fused to one or more Uracil glycosylase inhibitor (UGI) protein sequences; an exemplary UGI sequence is as follows:
Typically, the UGIs are at the C-terminus of a fusion protein as described herein, but can also be positioned at the N-terminus, or between the DNA binding domain and a deaminase domain. Linkers as known in the art can be used to separate domains.
In some embodiments, the components of the fusion proteins are at least 80%, e.g., at least 85%, 90%, 95%, 97%, or 99% identical to the amino acid sequence of an exemplary sequence (e.g., as provided herein), e.g., have up to 1%, 2%, 5%, 10%, 15%, or 20% of the residues of the exemplary sequence replaced, e.g., with conservative mutations, e.g., including or in addition to the mutations described herein. In preferred embodiments, the variant retains a desired activity of the parent, e.g., deaminase activity, and/or the ability to interact with a guide RNA and/or target DNA, optionally with improved specificity or altered substrate specificity.
To determine the percent identity of two amino acid or nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The amino acid residues or nucleotides at corresponding positions are then compared. When a position in the first sequence is occupied by the same residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein nucleic acid “identity” is equivalent to nucleic acid “homology”). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. Percent identity between two polypeptides or nucleic acid sequences is determined in various ways that are within the skill in the art, for instance, using publicly available computer software such as Smith Waterman Alignment (Smith, T. F. and M. S. Waterman (1981) J Mol Biol 147:195-7); “BestFit” (Smith and Waterman, Advances in Applied Mathematics, 482-489 (1981)) as incorporated into GeneMatcher Plus™ (Schwartz and Dayhoff (1979) Atlas of Protein Sequence and Structure, Dayhoff, M. O., Ed, pp 353-358); the BLAST program (Basic Local Alignment Search Tool; Altschul, S. F., W. Gish, et al. (1990) J Mol Biol 215: 403-10), BLAST-2, BLAST-P, BLAST-N, BLAST-X, WU-BLAST-2, ALIGN, ALIGN-2, CLUSTAL, or Megalign (DNASTAR) software.
In addition, those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the length of the sequences being compared. In general, for proteins or nucleic acids, the length of comparison can be any length, up to and including full length (e.g., 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100%). For purposes of the present compositions and methods, at least 80% of the full length of the sequence is aligned.
For purposes of the present disclosure, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
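The identity and alignment-length criteria above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by one of the tools named above and are supplied as equal-length gapped strings (with "-" marking a gap); the function name, the choice to count every alignment column (including gap columns) in the denominator, and the example strings are illustrative assumptions rather than a definition used in this disclosure.

```python
def identity_and_coverage(aligned_ref: str, aligned_query: str) -> tuple[float, float]:
    """Return (percent identity, percent of reference aligned) for a gapped alignment.

    Assumptions (illustrative only):
      * both inputs are rows of one pairwise alignment, '-' marks a gap;
      * identity = identical columns / all alignment columns, so gaps lower identity;
      * coverage = reference residues paired with a query residue / reference length.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_ref.upper(), aligned_query.upper()))
    ref_len = sum(1 for r, _ in columns if r != "-")
    if not columns or ref_len == 0:
        raise ValueError("alignment must contain at least one reference residue")

    matches = sum(1 for r, q in columns if r == q and r != "-")
    paired = sum(1 for r, q in columns if r != "-" and q != "-")

    identity = 100.0 * matches / len(columns)
    coverage = 100.0 * paired / ref_len
    return identity, coverage


if __name__ == "__main__":
    # Hypothetical toy alignment of two short peptide fragments.
    ident, cov = identity_and_coverage("GGGGS-GGGS", "GGGGSAGG-S")
    print(f"identity={ident:.1f}%  reference aligned={cov:.1f}%")
```

Under these assumptions, a variant would meet the thresholds discussed above when the first value is at least 80 and the second value is at least 80.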
Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
Also provided herein are isolated nucleic acids encoding the base editor fusion proteins, vectors comprising the isolated nucleic acids, optionally operably linked to one or more regulatory domains for expressing the variant proteins, and host cells, e.g., mammalian host cells, comprising the nucleic acids, and optionally expressing the variant proteins. In some embodiments, the host cells are stem cells, e.g., hematopoietic stem cells.
In some embodiments, the fusion proteins include a linker between the DNA binding domain (e.g., ZFN, TALE, or nCas9) and the BE domains. Linkers that can be used in these fusion proteins (or between fusion proteins in a concatenated structure) can include any sequence that does not interfere with the function of the fusion proteins. In preferred embodiments, the linkers are short, e.g., 2-20 amino acids, and are typically flexible (i.e., comprising amino acids with a high degree of freedom such as glycine, alanine, and serine). In some embodiments, the linker comprises one or more units consisting of GGGS (SEQ ID NO:135) or GGGGS (SEQ ID NO:136), e.g., two, three, four, or more repeats of the GGGS (SEQ ID NO:137) or GGGGS (SEQ ID NO:138) unit. Other linker sequences can also be used.
In some embodiments, the deaminase fusion protein includes a cell-penetrating peptide sequence that facilitates delivery to the intracellular space, e.g., HIV-derived TAT peptide, penetratins, transportans, or hCT derived cell-penetrating peptides, see, e.g., Caron et al., (2001) Mol Ther. 3(3):310-8; Langel, Cell-Penetrating Peptides: Processes and Applications (CRC Press, Boca Raton Fla. 2002); El-Andaloussi et al., (2005) Curr Pharm Des. 11(28):3597-611; and Deshayes et al., (2005) Cell Mol Life Sci. 62(16):1839-49.
Cell penetrating peptides (CPPs) are short peptides that facilitate the movement of a wide range of biomolecules across the cell membrane into the cytoplasm or other organelles, e.g., the mitochondria and the nucleus. Examples of molecules that can be delivered by CPPs include therapeutic drugs, plasmid DNA, oligonucleotides, siRNA, peptide-nucleic acid (PNA), proteins, peptides, nanoparticles, and liposomes. CPPs are generally 30 amino acids or less, are derived from naturally or non-naturally occurring protein or chimeric sequences, and contain either a high relative abundance of positively charged amino acids, e.g., lysine or arginine, or an alternating pattern of polar and non-polar amino acids. CPPs that are commonly used in the art include Tat (Frankel et al., (1988) Cell. 55:1189-1193, Vives et al., (1997) J. Biol. Chem. 272:16010-16017), penetratin (Derossi et al., (1994) J. Biol. Chem. 269:10444-10450), polyarginine peptide sequences (Wender et al., (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008, Futaki et al., (2001) J. Biol. Chem. 276:5836-5840), and transportan (Pooga et al., (1998) Nat. Biotechnol. 16:857-861).
CPPs can be linked with their cargo through covalent or non-covalent strategies. Methods for covalently joining a CPP and its cargo are known in the art, e.g., chemical cross-linking (Stetsenko et al., (2000) J. Org. Chem. 65:4900-4909, Gait et al. (2003) Cell. Mol. Life. Sci. 60:844-853) or cloning a fusion protein (Nagahara et al., (1998) Nat. Med. 4:1449-1453). Non-covalent coupling between the cargo and short amphipathic CPPs comprising polar and non-polar domains is established through electrostatic and hydrophobic interactions.
CPPs have been utilized in the art to deliver potentially therapeutic biomolecules into cells. Examples include cyclosporine linked to polyarginine for immunosuppression (Rothbard et al., (2000) Nature Medicine 6(11):1253-1257), siRNA against cyclin B1 linked to a CPP called MPG for inhibiting tumorigenesis (Crombez et al., (2007) Biochem Soc. Trans. 35:44-46), tumor suppressor p53 peptides linked to CPPs to reduce cancer cell growth (Takenobu et al., (2002) Mol. Cancer Ther. 1(12):1043-1049, Snyder et al., (2004) PLoS Biol. 2:E36), and dominant negative forms of Ras or phosphoinositol 3 kinase (PI3K) fused to Tat to treat asthma (Myou et al., (2003) J. Immunol. 171:4399-4405).
CPPs have been utilized in the art to transport contrast agents into cells for imaging and biosensing applications. For example, green fluorescent protein (GFP) attached to Tat has been used to label cancer cells (Shokolenko et al., (2005) DNA Repair 4(4):511-518). Tat conjugated to quantum dots has been used to successfully cross the blood-brain barrier for visualization of the rat brain (Santra et al., (2005) Chem. Commun. 3144-3146). CPPs have also been combined with magnetic resonance imaging techniques for cell imaging (Liu et al., (2006) Biochem. and Biophys. Res. Comm. 347(1):133-140). See also Ramsey and Flynn, Pharmacol Ther. 2015 Jul. 22. pii: S0163-7258(15)00141-2.
Alternatively or in addition, the deaminase fusion proteins can include a nuclear localization sequence, e.g., SV40 large T antigen NLS (PKKKRRV (SEQ ID NO:221)); nucleoplasmin NLS (KRPAATKKAGQAKKKK (SEQ ID NO:222)); or bipartite NLS (KRTADGSEFEPKKKRKV (SEQ ID NO:229) or (KRTADGSEFES)PKKKRKV (SEQ ID NO: 24)). Other NLSs are known in the art; see, e.g., Cokol et al., EMBO Rep. 2000 Nov. 15; 1(5): 411-415; Freitas and Cunha, Curr Genomics. 2009 December; 10(8): 550-557.
In some embodiments, the deaminase fusion proteins include a moiety that has a high affinity for a ligand, for example GST, FLAG or hexahistidine sequences. Such affinity tags can facilitate the purification of recombinant deaminase fusion proteins.
The deaminase fusion proteins described herein can be used for altering the genome of a cell. The methods generally include expressing or contacting the deaminase fusion proteins in the cells; in versions using one or two Cas9s, the methods include using a guide RNA having a region complementary to a selected portion of the genome of the cell. Methods for selectively altering the genome of a cell are known in the art, see, e.g., U.S. Pat. No. 8,993,233; US 20140186958; U.S. Pat. No. 9,023,649; WO/2014/099744; WO 2014/089290; WO2014/144592; WO144288; WO2014/204578; WO2014/152432; WO2115/099850; U.S. Pat. No. 8,697,359; US20160024529; US20160024524; US20160024523; US20160024510; US20160017366; US20160017301; US20150376652; US20150356239; US20150315576; US20150291965; US20150252358; US20150247150; US20150232883; US20150232882; US20150203872; US20150191744; US20150184139; US20150176064; US20150167000; US20150166969; US20150159175; US20150159174; US20150093473; US20150079681; US20150067922; US20150056629; US20150044772; US20150024500; US20150024499; US20150020223; US20140356867; US20140295557; US20140273235; US20140273226; US20140273037; US20140189896; US20140113376; US20140093941; US20130330778; US20130288251; US20120088676; US20110300538; US20110236530; US20110217739; US20110002889; US20100076057; US20110189776; US20110223638; US20130130248; US20150050699; US20150071899; US20150050699; US20150045546; US20150031134; US20150024500; US20140377868; US20140357530; US20140349400; US20140335620; US20140335063; US20140315985; US20140310830; US20140310828; US20140309487; US20140304853; US20140298547; US20140295556; US20140294773; US20140287938; US20140273234; US20140273232; US20140273231; US20140273230; US20140271987; US20140256046; US20140248702; US20140242702; US20140242700; US20140242699; US20140242664; US20140234972; US20140227787; US20140212869; US20140201857; US20140199767; US20140189896; US20140186958; US20140186919; US20140186843; US20140179770; US20140179006; 
US20140170753; WO/2008/108989; WO/2010/054108; WO/2012/164565; WO/2013/098244; WO/2013/176772; US 20150071899; Makarova et al., “Evolution and classification of the CRISPR-Cas systems” 9(6) Nature Reviews Microbiology 467-477 (1-23) (June 2011); Wiedenheft et al., “RNA-guided genetic silencing systems in bacteria and archaea” 482 Nature 331-338 (Feb. 16, 2012); Gasiunas et al., “Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria” 109(39) Proceedings of the National Academy of Sciences USA E2579-E2586 (Sep. 4, 2012); Jinek et al., “A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity” 337 Science 816-821 (Aug. 17, 2012); Carroll, “A CRISPR Approach to Gene Targeting” 20(9) Molecular Therapy 1658-1660 (September 2012); U.S. Appl. No. 61/652,086, filed May 25, 2012; Al-Attar et al., Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs): The Hallmark of an Ingenious Antiviral Defense Mechanism in Prokaryotes, Biol Chem. (2011) vol. 392, Issue 4, pp. 277-289; Hale et al., Essential Features and Rational Design of CRISPR RNAs That Function With the Cas RAMP Module Complex to Cleave RNAs, Molecular Cell, (2012) vol. 45, Issue 3, 292-302.
For methods in which the deaminase fusion proteins are delivered to cells, the proteins can be produced using any method known in the art, e.g., by in vitro translation, or expression in a suitable host cell from nucleic acid encoding the deaminase fusion protein; a number of methods are known in the art for producing proteins. For example, the proteins can be produced in and purified from yeast, E. coli, insect cell lines, plants, transgenic animals, or cultured mammalian cells; see, e.g., Palomares et al., “Production of Recombinant Proteins: Challenges and Solutions,” Methods Mol Biol. 2004; 267:15-52. In addition, the deaminase fusion proteins can be linked to a moiety that facilitates transfer into a cell, e.g., a lipid nanoparticle, optionally with a linker that is cleaved once the protein is inside the cell. See, e.g., LaFountaine et al., Int J Pharm. 2015 Aug. 13; 494(1):180-194.
To use the deaminase fusion proteins described herein, it may be desirable to express them from a nucleic acid that encodes them. This can be performed in a variety of ways. For example, the nucleic acid encoding the deaminase fusion can be cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the deaminase fusion for production of the deaminase fusion protein. The nucleic acid encoding the deaminase fusion protein can also be cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell. To obtain expression, a sequence encoding a deaminase fusion protein is typically subcloned into an expression vector that contains a promoter to direct transcription. Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010). Bacterial expression systems for expressing the engineered protein are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., 1983, Gene 22:229-235). Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
The promoter used to direct expression of a nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of fusion proteins. In contrast, when the deaminase fusion protein is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the deaminase fusion protein. In addition, a preferred promoter for administration of the deaminase fusion protein can be a weak promoter, such as HSV TK or a promoter having similar activity. The promoter can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tetracycline-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).
In addition to the promoter, the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic. A typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the deaminase fusion protein, and any signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination. Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
The particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the deaminase fusion protein, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc. Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available tag-fusion expression systems such as GST and LacZ.
Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
The vectors for expressing the deaminase fusion protein can include RNA Pol III promoters to drive expression of the guide RNAs, e.g., the H1, U6 or 7SK promoters. These human promoters allow for expression of the guide RNAs in mammalian cells following plasmid transfection.
Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the gRNA encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.
The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.
Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol. 132:349-351; Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).
Any of the known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, nucleofection, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the deaminase fusion protein.
In methods wherein the fusion proteins include a Cas9 domain, the methods also include delivering at least one gRNA that interacts with the Cas9, or a nucleic acid that encodes a gRNA.
Alternatively, the methods can include delivering the deaminase fusion protein and guide RNA together, e.g., as a complex. For example, the deaminase fusion protein can be overexpressed in a host cell and purified, then complexed with the guide RNA (e.g., in a test tube) to form a ribonucleoprotein (RNP), and delivered to cells. In some embodiments, the deaminase fusion protein can be expressed in and purified from bacteria through the use of bacterial expression plasmids. For example, His-tagged deaminase fusion protein can be expressed in bacterial cells and then purified using nickel affinity chromatography. The use of RNPs circumvents the necessity of delivering plasmid DNAs encoding the nuclease or the guide, or encoding the nuclease as an mRNA. RNP delivery may also improve specificity, presumably because the half-life of the RNP is shorter and there is no persistent expression of the nuclease and guide (as would occur with plasmid delivery). The RNPs can be delivered to the cells in vivo or in vitro, e.g., using lipid-mediated transfection or electroporation. See, e.g., Liang et al. “Rapid and highly efficient mammalian cell engineering via Cas9 protein transfection.” Journal of biotechnology 208 (2015): 44-53; Zuris, John A., et al. “Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo.” Nature biotechnology 33.1 (2015): 73-80; Kim et al. “Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins.” Genome research 24.6 (2014): 1012-1019.
The present invention also includes the vectors and cells comprising the vectors, as well as kits comprising the proteins and nucleic acids described herein, e.g., for use in a method described herein.
The base editors described herein can be used to deaminate a selected adenine and/or cytosine, enabling the parallel installation of two distinct transition mutations (C-to-T and A-to-G) simultaneously in one editing window, in a nucleic acid sequence, e.g., in a cell, e.g., a plant, bacterial, fungal, or animal cell. The cell can be isolated (e.g., ex vivo or in vitro) or in an animal (e.g., a mammal such as a human or veterinary subject); alternatively, the nucleic acid can be a synthetic nucleic acid substrate, e.g., in vitro. The nucleic acid sequence can be, e.g., genomic DNA or mitochondrial DNA. The methods include contacting the nucleic acid with a BACE as described herein; in some embodiments the methods can be used to induce a change as shown in Table D. Where the base editor includes a CRISPR Cas9 or Cas12a protein, the methods further include the use of one or more guide RNAs (gRNAs) that direct binding of the base editor to a sequence to be deaminated. The target sequence (editing window) is located on the non-target strand (NTS), i.e., the single-stranded DNA strand that is displaced when Cas9 unwinds the DNA and that is not bound by the gRNA. The NTS protospacer sequence in the target organism's DNA has the same sequence as the gRNA's spacer sequence (for both base and prime editing). Thus, when SPACE is used, the gRNA spacer directs the base editor to the target sequence, preferably wherein the target sequence comprises a cytosine to be edited at one of positions 2-7, or an adenine to be edited at one of positions 4-7.
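As an illustration of the preferred position ranges stated above, the following Python sketch scans a protospacer for cytosines at positions 2-7 and adenines at positions 4-7 and writes out an idealized outcome in which every such base is converted. It is a simplified sketch under stated assumptions: position 1 is taken to be the 5′ (PAM-distal) end of the NTS protospacer, which is a common convention but is an assumption here; the function names are hypothetical; and real editing outcomes are mixtures of alleles, not a single fully edited sequence. The example spacer is one of the spacer sequences listed in the Examples.

```python
# Preferred windows taken from the text; the numbering origin is an assumption
# (position 1 = 5' end of the non-target-strand protospacer).
C_WINDOW = range(2, 8)  # positions 2-7 inclusive
A_WINDOW = range(4, 8)  # positions 4-7 inclusive


def editable_positions(protospacer: str) -> dict[str, list[int]]:
    """Return 1-based positions of in-window cytosines and adenines."""
    seq = protospacer.upper()
    cytosines = [i for i in C_WINDOW if i <= len(seq) and seq[i - 1] == "C"]
    adenines = [i for i in A_WINDOW if i <= len(seq) and seq[i - 1] == "A"]
    return {"C": cytosines, "A": adenines}


def idealized_product(protospacer: str) -> str:
    """Convert every in-window C to T and every in-window A to G.

    This models the limiting case of complete editing and is illustrative only.
    """
    seq = list(protospacer.upper())
    hits = editable_positions(protospacer)
    for pos in hits["C"]:
        seq[pos - 1] = "T"
    for pos in hits["A"]:
        seq[pos - 1] = "G"
    return "".join(seq)


if __name__ == "__main__":
    spacer = "GAACACAAAGCATAGACTGC"
    print(editable_positions(spacer))
    print(idealized_product(spacer))
```

A guide whose scan returns at least one cytosine and at least one adenine is a candidate for installing both edit types within the same window.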
For example, the base editors described herein can be used for in vitro, in vivo or in situ directed evolution, e.g., to engineer polypeptides or proteins based on a synthetic selection framework, e.g., antibiotic resistance in E. coli or resistance to anti-cancer therapeutics being assayed in mammalian cells (e.g., CRISPR-X Hess et al, Nat Methods. 2016 December; 13(12):1036-1042, or BE-plus systems Jiang et al, Cell Res. 2018 August; 28(8):855-861). The BACEs described herein can also be used, e.g., for targeted sequence diversification.
In some embodiments, the BACEs can be used to correct or alter a disease-causing mutation, or to introduce a protective mutation, in a cell, e.g., in a human cell, e.g., in vitro/ex vivo or in vivo; exemplary mutations can include those listed in Table E. When the alteration is made ex vivo, the edited cell can then be re-introduced into the subject. These methods can be used to treat, reduce risk of developing, delay onset of, or ameliorate a disease, e.g., a disease listed in Table E.
The BACEs can also be used to generate a cell or animal model by introducing a mutation, e.g., a disease-causing mutation, e.g., a multinucleotide variant (MNV, i.e., a variant found in phase with another variant), e.g., a MNV mutation as listed in Tables F-K. See, e.g., Example 9.
BACE/SPACE could be used for introducing A>G, T>C (A>G on the other strand), C>T, or G>A (C>T on the other strand) for every nucleotide available across a coding/non-coding region to generate a comprehensive library. This can enable high-throughput saturation mutagenesis screening and highly complex genotype-phenotype correlation to study a protein or gene of interest.
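The strand bookkeeping described above can be sketched in Python as follows. The sketch enumerates, for each nucleotide of a region written on one reference strand, the single transition a BACE could in principle install and the strand on which deamination would occur ("+" for A>G and C>T on the written strand, "-" for T>C and G>A, which correspond to A>G and C>T on the opposite strand). The function and variable names are hypothetical, and the sketch deliberately ignores PAM availability and editing-window constraints, which determine which of these variants are reachable in practice.

```python
# Transition partner of each base, and the strand that would be deaminated
# to obtain it relative to the strand the region is written on.
TRANSITION = {"A": "G", "C": "T", "G": "A", "T": "C"}
DEAMINATED_STRAND = {"A": "+", "C": "+", "G": "-", "T": "-"}


def transition_library(region: str) -> list[tuple[int, str, str, str, str]]:
    """Return (position, ref, alt, deaminated strand, variant sequence) tuples.

    One entry is produced per A, C, G, or T in the region; other symbols
    (e.g., N) are skipped.
    """
    seq = region.upper()
    variants = []
    for pos, ref in enumerate(seq, start=1):
        if ref not in TRANSITION:
            continue
        alt = TRANSITION[ref]
        variant = seq[: pos - 1] + alt + seq[pos:]
        variants.append((pos, ref, alt, DEAMINATED_STRAND[ref], variant))
    return variants


if __name__ == "__main__":
    for entry in transition_library("ACGT"):
        print(entry)
```

Because every nucleotide has exactly one transition partner, the library contains one single-site variant per position, which is the sense in which such a library covers every nucleotide of the region.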
These methods can be used for generating two or more sets of nucleic acids, each set comprising a plurality of sequences, wherein each set comprises one or more nucleic acids having the same sequence, and wherein each set differs from each of the other sets by at least one nucleotide. These methods include (i) providing a first nucleic acid comprising a first sequence, e.g., a reference or wild type sequence; (ii) contacting the first nucleic acid with a BACE as described herein, wherein the programmable DNA binding domain is a CRISPR Cas RGN or a variant thereof, and at least one guide RNA compatible with the base editor that directs the base editor to modify a selected nucleotide in the first sequence; and (iii) isolating a second nucleic acid comprising a sequence comprising the selected modification in the nucleotide sequence, to provide a second set of nucleic acids. The methods can include amplifying the second nucleic acids. Steps (i)-(iii) can be repeated until a desired number of sets is obtained, e.g., until enough sets are obtained to include at least one set with a mutation at each position in a selection region of the sequence of the nucleic acid.
In some embodiments, each separate set of variants is expressed in a separate organism, and effects on phenotype can be evaluated, e.g., for programmable sequence diversification. As one example, the methods can be used to develop a plant with a desired characteristic (e.g., early harvest, pest resistance, drought tolerance, taste, sweetness, storage, resistance to browning). The methods can be used to mutate a region in a specific gene, e.g., to shuffle the region, to produce a number of variant plants. The plants can then be grown, and effects on the desired characteristic evaluated and selected. See, e.g., Li et al., Nat Biotechnol (2020). doi.org/10.1038/s41587-019-0393-7;
Loxodonta africana
Protopterus annectens
Alligator mississippiensis
Anolis carolinensis
Corvus brachyrhynchos
Calypte anna
Tursiops truncatus
Tyto alba
Pteropus alecto
Rhinopithecus bieti
Delphinapterus leucas
Lonchura striata domestica
Amazona aestiva
Saimiri boliviensis boliviensis
Pan paniscus
Pongo pygmaeus
Bos taurus
Myotis brandtii
Felis catus
Cebus capucinus imitator
Cebus capucinus imitator
Pan troglodytes
Alligator sinensis
Cricetulus griseus
Antrostomus carolinensis
Propithecus coquereli
Macaca fascicularis
Nipponia nippon
Pelecanus crispus
Fukomys damarensis
Myotis davidii
Canis lupus familiaris
Dryobates pubescens
Mandrillus leucophaeus
Balearica regulorum gibbericeps
Aptenodytes forsteri
Enhydra lutris kenyoni
Enhydra lutris kenyoni
Mustela putorius furo
Trichechus manatus latirostris
Ailuropoda melanoleuca
Manacus vitellinus
Mesocricetus auratus
Rhinopithecus roxellana
Chlorocebus sabaeus
Cavia porcellus
Neomonachus schauinslandi
Opisthocomus hoazin
Equus ferus caballus
Homo sapiens
Nestor notabilis
Egretta garzetta
Aotus nancymaae
Mus musculus
Heterocephalus glaber
Merops nubicus
Fulmarus glacialis
Nomascus leucogenys
Papio anubis
Monodelphis domestica
Dipodomys ordii
Odobenus rosmarus divergens
Patagioenas fasciata monilis
Patagioenas fasciata monilis
Colobus angolensis palliatus
Tarsius syrichta
Sus scrofa
Macaca nemestrina
Oryctolagus cuniculus
Rattus norvegicus
Cariama cristata
Gavia stellata
Macaca mulatta
Acanthisitta chloris chloris
Columba livia
Ovis aries
Otolemur garnettii
Stylophora pistillata
Cercocebus atys
Physeter macrocephalus
Pongo abelii
Eurypyga helias
Sarcophilus harrisii
Leptonychotes weddellii
Erinaceus europaeus
Haliaeetus albicilla
Callithrix jacchus
Bos mutus
Pterocles gutturalis
Petromyzon marinus cytosine
Petromyzon marinus cytosine
E. coli TadA
S. aureus TadA
S. pyogenes TadA
S. typhi TadA
A. aeolicus TadA
S. pombe TAD2
S. cerevisiae TAD1
S. cerevisiae TAD2
A. thaliana TAD2
X. laevis ADAT2
X. tropicalis ADAT2
D. rerio ADAT2
B. taurus ADAT2
M. musculus ADAT2
H. sapiens ADAT2
GGT, AAT,
AGT
GCT
GGT, AGT,
GTT, ATT,
TAG
CGT
TTG, TCA,
CTG
TTG, TTA,
GGT, GAT,
GTG, GTA,
TGT, TAT,
TTG, TTA,
AAC, AAT,
ACA, ATA,
CAC, CAT,
CCA, CTA,
AAC, AAT,
ACC, ACT,
AAC, AGT,
GAC
GCA
ATC
CAA, TAA,
CCA, TCA,
CGA
TAC
CCA, TTA,
TCA
S. pyogenes Cas9
S. aureus Cas9
S. thermophilus Cas9
S. pasteurianus Cas9
C. jejuni Cas9
F. novicida Cas9
P. lavamentivorans
C. lari Cas9 (ClCas9)
Pasteurella multocida
F. novicida Cpf1
M. bovoculi Cpf1
L. bacterium N2006
S. pyogenes Cas9
S. pyogenes Cas9
S. pyogenes Cas9
S. pyogenes Cas9
S. pyogenes Cas9
S. pyogenes Cas9
S. pyogenes Cas9
S. pyogenes Cas9
S. pyogenes Cas9
S. aureus Cas9
S. aureus Cas9 with PAM interaction
Streptococcus macacae (Smac) Cas9
N. meningitidis
The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
Constructs were cloned into the CMV backbone from ABEmax-P2A-EGFP-NLS (AgeI/NotI digest; Addgene #112101) or into the CAG backbone from SQT817 (AgeI/NotI/EcoRV digest; Addgene #53373). All constructs with P2A-EGFP were cloned using either P2A-EGFP-NLS from ABEmax-P2A-EGFP-NLS or P2A-EGFP from BPK4335 serving as the template. SPACE was cloned using Gibson assembly with bpNLS-TadA7.10 (V82G)-SpCas9 (D10A)-bpNLS from miniABEmax-V82G (Addgene #131313), pmCDA1 from Target-AID (Addgene #131300), and dual UGIs from BE4max (Addgene #112093). All guide RNA plasmids were cloned by ligation into the pUC19-based entry vector BPK1520 (BsmBI digest; Addgene #65777). All plasmids were midi or maxi prepped with the Qiagen Midi/Maxi Plus kits.
All gRNAs were of the form 5′-NNNNNNNNNNNNNNNNNNNNCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT-3′ (SEQ ID NO: 169).
GAGTATGAGGCATAGACTGC
GTCAAGAAAGCAGAGACTGC
GAGCAAAGAGAATAGACTGT
GAATACTAAGCATAGACTCC
GTAAACAAAGCATAGACTGA
GAAGACCAAGGATAGACTGC
GAACATAAAGAATAGAATGA
GGACAGGCAGCATAGACTGT
GGCTAAAGACCATAGACTGT
GTCTAGAAAGCTTAGACTGC
GACAAAGAGGAAGAGAGACG
ACACACACACTTAGAATCTG
CACACACACTTAGAATCTGT
GAGTCCGAGCAGAAGAAGAA
GTATTCACCTGAAAGTGTGC
GGAATCCCTTCTGCAGCACC
GAACACAAAGCATAGACTGC
GGCCCAGACTGAGCACGTGA
GGCACTGCGGCTGGAGGTGG
GCGTGACTTCCACATGAGCG
GGCACTCGGGGGCGAGAGGA
GAGCTCACTGAACGCTGGCA
GACCCTCAGCCGTGCTGCTC
GCTGACTCAGAGACCCTGAG
GGGGCTCAACATCGGAAGAG
GCTGGCTCAGGTTCAGGAGA
GTCATCTTAGTCATTACCTG
GAGGACGTGTGTGTCTGTGT
GAACACAATGCATAGATTGC
AAACATAAAGCATAGACTGC
CACCCAGACTGAGCACGTGC
GACACAGACTGGGCACGTGA
AGCTCAGACTGAGCAAGTGA
AGACCAGACTGAGCAAGAGA
GAGCCAGAATGAGCACGTGA
TGCACTGCGGCCGGAGGAGG
GGCTCTGCGGCTGGAGGGGG
GGCACGACGGCTGGAGGTGG
GGCATCACGGCTGGAGGTGG
GGCGCTGCGGCGGGAGGTGG
GAGTCTAAGCAGAAGAAGAA
GAGGCCGAGCAGAAGAAAGA
GAGTCCTAGCAGGAGAAGAA
GAGTCCGGGAAGGAGAAGAA
GAGCCGGAGCAGAAGAAGGA
GGAACCCCGTCTGCAGCACC
GGAGTCCCTCCTACAGCACC
AGAGGCCCCTCTGCAGCACC
ACCATCCCTCCTGCAGCACC
GGATTGCCATCCGCAGCACC
TGAATCCCATCTCCAGCACC
HEK293T cells (CRL-3216, ATCC) were grown in culture using Dulbecco's Modified Medium (Gibco) supplemented with 10% FBS (Gibco) and 1% penicillin-streptomycin solution (Gibco). Cells were passaged at ~80% confluency every 2-3 days to maintain an actively growing population. Cells were passaged at ~80% confluency every 4 days. Cells were used for experiments until passage 20 and were tested for mycoplasma every 4 weeks.
For DNA on-target experiments with 28 gRNAs, 1.25×10⁴ HEK293T cells were seeded into 96-well flat-bottom cell culture plates (Corning), transfected 24 h post-seeding with 30 ng base editor or control, 10 ng gRNA, and 0.3 μL TransIT-X2 (Mirus), and harvested 72 h after transfection to obtain genomic DNA (gDNA). For DNA off-target experiments, 6.25×10⁴ HEK293T cells were seeded into 24-well cell culture plates (Corning), transfected 24 h post-seeding with 150 ng base editor or control, 50 ng gRNA, and 1.5 μL TransIT-X2, and harvested 72 h after transfection to obtain gDNA. For RNA off-target experiments, 6.5×10⁶ HEK293T cells were seeded into 150 mm cell culture dishes (Corning), transfected 24 h post-seeding with 37.5 μg base editor or control, 12.5 μg gRNA, and 150 μL TransIT-293 (Mirus), and sorted 36-40 h after transfection. For co-expression of miniABEmax-V82G and Target-AID (ABE & CBE mix) vs SPACE experiments, 1.25×10⁴ HEK293T cells were seeded into 96-well cell culture plates, transfected 24 h post-seeding with 15 ng miniABEmax-V82G and 15 ng Target-AID for ABE & CBE mix, and 30 ng for both SPACE and the nCas9 control, 10 ng gRNA, and 0.3 μL TransIT-X2, and harvested 72 h after transfection to obtain gDNA.
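The 96-well and 24-well conditions above differ by a uniform factor of five in cell number, DNA mass, and reagent volume, and every condition uses a 3:1 mass ratio of base editor to gRNA plasmid. The Python sketch below only restates that arithmetic. The class and field names are hypothetical, and the five-fold relation is an observation drawn from the stated numbers rather than a scaling rule given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfection:
    """Per-well transfection amounts as stated in the text."""
    cells_seeded: float
    editor_ng: float   # base editor or control plasmid
    grna_ng: float     # gRNA plasmid
    reagent_ul: float  # TransIT-X2 volume

    def scaled(self, factor: float) -> "Transfection":
        # Multiply every quantity by the same factor.
        return Transfection(
            self.cells_seeded * factor,
            self.editor_ng * factor,
            self.grna_ng * factor,
            self.reagent_ul * factor,
        )

    def editor_to_grna_ratio(self) -> float:
        return self.editor_ng / self.grna_ng


# Values copied from the on-target (96-well) and off-target (24-well) setups.
PLATE_96 = Transfection(1.25e4, 30.0, 10.0, 0.3)
PLATE_24 = Transfection(6.25e4, 150.0, 50.0, 1.5)

if __name__ == "__main__":
    print(PLATE_96.scaled(5))
    print(PLATE_24)
    print(PLATE_96.editor_to_grna_ratio(), PLATE_24.editor_to_grna_ratio())
```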
Sorting of negative control and BE-expressing cells as well as RNA/DNA harvest was carried out on the same day. Cells were sorted on a BD FACSAria II 36-40 h after transfection. We gated on the cell population on forward/side scatter after exclusion of doublets. We then sorted all GFP-positive cells and/or the top 5% of cells with the highest FITC signal into pre-chilled 100% FBS and 5% of mean fluorescence intensity (MFI)-matched cells for nCas9-NLS negative controls, matching the MFI/GeoMean of the top 5% of ABE- or ABEmax-transfected cells. We used MFI-matching for these controls because the bpNLS-32AAlinker-nCas9-bpNLS-P2A-EGFP (control) plasmid was smaller than ABEmax-P2A-EGFP (due to the lack of the TadA-TadA* heterodimer) and thus yielded higher transfection efficiency and overall higher FITC signal. After sorting, cells were spun down and lysed using DNA lysis buffer (Laird et al., 1991) with DTT and Proteinase K or RNA lysis buffer (Macherey-Nagel). After overnight lysis, gDNA was extracted using magnetic beads (made from FisherSci Sera-Mag SpeedBeads Carboxyl Magnetic Beads, hydrophobic, according to Rohland & Reich, 2012). RNA was then extracted with Macherey-Nagel's NucleoSpin RNA Plus kit.
Genomic DNA was amplified using gene-specific DNA primers flanking the desired target sequence. These primers included Illumina-compatible adapter-flaps. The amplicons were molecularly indexed with NEBNext Dual Index Primers (NEB) or index primers with the same or similar sequence ordered from IDT. Samples were combined into libraries and sequenced on the Illumina MiSeq machine using the MiSeq Reagent Kit v2 or Micro Kit v2 (Illumina). Sequencing results were analyzed using a batch version of the software CRISPResso 2.0 (crispresso.rocks).
RNA library preparation was performed using Illumina's TruSeq Stranded Total RNA Gold Kit with initial input of ~500 ng of extracted RNA per sample, using SuperScript III for first-strand synthesis (Thermo Fisher). rRNA depletion was confirmed during library preparation by fluorometric quantitation using the Qubit HS RNA kit before and after depletion (Thermo Fisher). For indexing, we used IDT-Illumina Unique Dual Indexes (Illumina). Libraries were pooled based on qPCR quantification (NEBNext Library Quant Kit for Illumina) and loaded onto a NextSeq (at MGH Cancer Center, PE 2×150, 500/550 MidOutput Cartridge) or HiSeq2500 in High Output mode (Broad Institute, PE 2×76). Illumina fastq sequencing reads were aligned to the human hg38 reference genome with STAR (Dobin et al., 2013, PMID: 23104886) and processed with GATK best practices (McKenna et al., 2010, PMID: 20644199; DePristo et al., 2011, PMID: 21478889). RNA variants were called using HaplotypeCaller, and empirical editing efficiencies were established on PCR-de-duplicated alignment data. Variant loci in ABE/ABEmax overexpression experiments were further required to have comparable read coverage in the corresponding control experiment (read coverage for SNV in control >90th percentile of read coverage across all SNVs in overexpression). Additionally, the above loci were required to have a consensus of at least 99% of reads calling the reference allele in control.
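The two control-based filters described above (a coverage requirement and a 99% reference-allele consensus) can be sketched as follows. This is an illustrative Python sketch, not the pipeline actually used: the `Locus` record, the nearest-rank percentile definition, and the function names are assumptions, and in practice the inputs would be derived from the HaplotypeCaller output and the de-duplicated alignments.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Locus:
    name: str
    treated_depth: int           # read coverage in the overexpression sample
    control_depth: int           # read coverage at the same site in control
    control_ref_fraction: float  # fraction of control reads calling reference


def percentile(values: List[int], q: float) -> int:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def filter_loci(loci: List[Locus],
                coverage_percentile: float = 90.0,
                min_ref_fraction: float = 0.99) -> List[Locus]:
    """Keep loci whose control coverage exceeds the given percentile of
    coverage across all SNVs in the overexpression sample and whose control
    reads call the reference allele in at least min_ref_fraction of cases."""
    if not loci:
        return []
    threshold = percentile([l.treated_depth for l in loci], coverage_percentile)
    return [
        l for l in loci
        if l.control_depth > threshold
        and l.control_ref_fraction >= min_ref_fraction
    ]
```

A different percentile convention (for example, linear interpolation) would shift the threshold slightly, so the choice should match whatever the analysis software reports.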
A list of multi-nucleotide variants (MNVs) was obtained from Wang et al., “Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes” (pre-print at dx.doi.org/10.1101/573378; data file at storage.googleapis.com/gnomad-public/release/2.1/mnv/gnomad_mnv_coding.tsv). These were filtered to detect disease-correcting or disease-generating modifications enabled by SPACE, which are defined as annotated MNVs with a nearby PAM. Disease-correcting conversions are defined as having targetable Cs and As in the ALT position with matching Ts and Gs in the REF position, whereas disease-generating conversions are defined as the reverse scenario, with targetable Cs and As in the REF position with matching Ts and Gs in the ALT position. Patterns for selected disease-correcting MNV codons include “GNT>ANC”, “GTN>ACN”, “NGT>NAC”, “NTG>NCA”, “TGN>CAN”, and “TNG>CNA”, whereas patterns for disease-generating MNV codons include “ACN>GTN”, “ANC>GNT”, “CAN>TGN”, “CNA>TNG”, “NAC>NGT”, and “NCA>NTG”. PAMs considered include NGG, NGA, and NG.
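The codon patterns listed above can be applied to REF/ALT codon pairs with a small matcher. The Python sketch below is illustrative only: it assumes that a position written as N is the same base in REF and ALT, it does not perform the PAM proximity check, and the function names are hypothetical.

```python
from typing import Optional

# REF>ALT codon patterns copied from the text.
CORRECTING = ["GNT>ANC", "GTN>ACN", "NGT>NAC", "NTG>NCA", "TGN>CAN", "TNG>CNA"]
GENERATING = ["ACN>GTN", "ANC>GNT", "CAN>TGN", "CNA>TNG", "NAC>NGT", "NCA>NTG"]


def matches(ref: str, alt: str, pattern: str) -> bool:
    """True if the REF/ALT codon pair fits a 'REF>ALT' pattern.

    Assumption: an N position is unchanged between REF and ALT.
    """
    ref_pat, alt_pat = pattern.split(">")
    if not (len(ref) == len(alt) == len(ref_pat) == len(alt_pat)):
        return False
    for r, a, rp, ap in zip(ref, alt, ref_pat, alt_pat):
        if rp == "N":
            if ap != "N" or r != a:
                return False
        elif r != rp or a != ap:
            return False
    return True


def classify(ref: str, alt: str) -> Optional[str]:
    """Label a codon-level MNV, or return None if no listed pattern fits."""
    ref, alt = ref.upper(), alt.upper()
    if any(matches(ref, alt, p) for p in CORRECTING):
        return "disease-correcting"
    if any(matches(ref, alt, p) for p in GENERATING):
        return "disease-generating"
    return None
```

For example, a REF codon GAT with ALT codon AAC fits “GNT>ANC”: the A and C in ALT are the bases SPACE would convert back to G and T.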
Human HEK293T cells were transfected with plasmids encoding nCas9, miniABEmax-V82G, Target-AID and SPACE constructs (e.g., SEQ ID NOs: 140-144).
Guide RNAs will be tested to determine how different sequence contexts might affect SPACE editing in two cell lines, as well as using SPACE mRNAs (produced via IVT or by TriLink) to electroporate primary human CD34+ and T cells. SPACE constructs will be subcloned into pET vectors with an N-terminal 6xHis-tag and codon-optimized for expression in E. coli to enable protein purification. RNPs will be electroporated with a Lonza device into HEK293T and primary human T cells.
To determine whether the UGIs play a vital role in maintaining product purity in the context of SPACE, i.e., enabling high C-to-T and A-to-G editing yield without C-to-N, A-to-N, or indel byproducts, human HEK293T cells were transfected with SPACEΔUGI. NGS results indicated that SPACE constructs, which contain two UGIs, showed substantially reduced base editing byproducts, e.g., markedly decreased rates of C-to-G edits and indels.
We have recently described RNA off-target editing induced by DNA base editors (Grünewald et al., Nature 2019). In order to reduce the potential RNA off-target editing of SPACE, we fused miniABEmax-V82G and pmCDA1, two deaminase domains with markedly reduced or undetectable RNA off-target editing, respectively.
RNA off-target editing was assessed in an unbiased manner by RNA-seq. Cells were transfected in 15 cm dishes with two different gRNAs and SPACE constructs that were co-translationally expressed with P2A-EGFP, and were trypsinized 36 hours post-transfection. Subsequently, GFP+ cells were sorted on a BD FACSAria II and lysed to harvest both DNA and RNA. After efficient on-target editing was confirmed via targeted amplicon sequencing, RNA-seq was performed using a TruSeq stranded total RNA library prep and sequencing on a NextSeq 500 machine at the MGH.
To characterize the transcriptome-wide RNA off-target activity of SPACE, we performed RNA-seq from HEK293T cells co-expressing SPACE with a gRNA targeting HEK site 2 or RNF2 site 1. We also performed matched side-by-side RNA-seq experiments with HEK293T cells expressing miniABEmax-V82G, Target-AID, ABEmax (a positive control for RNA editing), and GFP (a negative control). Analysis of on-target DNA editing in the cells used for RNA-seq showed efficient editing with SPACE, miniABEmax-V82G, and Target-AID with both gRNAs. As expected, GFP negative control experiments showed very few RNA C-to-U edits (range of 1-3) and A-to-I edits (range of 7-12), while ABEmax induced relatively high numbers of A-to-I edits (range of 3,105-5,696), miniABEmax-V82G induced low numbers of A-to-I edits (range of 73-194), and Target-AID induced even lower numbers of C-to-U edits (range of 6-11). Cells expressing SPACE showed very few C-to-U edits (range of 0-4) and only small numbers of A-to-I edits (range of 4-37). The generally lower numbers of RNA edits we observed in our current experiments relative to previously published studies are due to the reduced sequencing depth we used here (~14-18 million reads/sample) compared with our earlier work (~80-120 million reads/sample). Based on these results, we conclude that SPACE retains the reduced RNA-editing activities observed with miniABEmax-V82G and Target-AID, inducing very low numbers of unwanted RNA edits throughout the transcriptome.
Cas9-dependent DNA off-target effects induced by SPACE were assessed by transfecting cells with HEK site 2, 3, and 4 as well as FANCF site 1 and EMX1 site 1 gRNAs. Twenty-three genomic sites previously described as known off-target sites for these gRNAs (ref. 15) were sequenced by NGS to detect potential off-target base editing by SPACE constructs. Cas9-dependent DNA off-target effects observed with SPACE were comparable to or lower than those observed with miniABEmax-V82G or Target-AID for 17 of these 23 off-target sites.
To test whether SPACE is more efficient at inducing dual edits than the combined effects of separate adenine and cytosine base editors, we also performed experiments in HEK293T cells in which we directly compared SPACE with co-expressed miniABEmax-V82G and Target-AID (“ABE & CBE mix”) for each of the 28 gRNAs. For 22 of the 28 gRNAs, the summed frequency of dual-edited on-target alleles was higher with SPACE than with the “ABE & CBE mix”. Interestingly, the summed frequencies of on-target alleles harboring only A-to-G edits were higher with the “ABE & CBE mix” condition than with SPACE for 21 of these same 22 gRNAs, whereas the summed frequencies of on-target alleles with only C-to-T edits were higher with SPACE than with the “ABE & CBE mix” for 16 of the 22 gRNAs. Unwanted indels induced by SPACE (range: 0.02-7.1%, mean: 1.44%) were lower than or comparable to those observed with the “ABE & CBE mix” (range: 0.13-11.92%, mean: 2.88%) at 27 out of 28 sites tested. Although we cannot rule out that differences in the architecture of SPACE, miniABEmax-V82G, and Target-AID may affect their expression levels and/or activities, our results demonstrate that SPACE generally yields higher frequencies of dual-edited alleles and lower frequencies of indels at on-target sites compared to co-expression of standard editors harboring the same adenosine and cytidine deaminases individually.
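The allele categories compared above (A-to-G only, C-to-T only, dual-edited) can be summed from an allele frequency table along the following lines. This Python sketch is an assumption-laden illustration rather than the analysis actually performed: it expects alleles written on the same strand as the reference window, treats any length difference as an indel, and uses hypothetical function names and category labels.

```python
from collections import defaultdict
from typing import Dict


def classify_allele(ref: str, allele: str) -> str:
    """Assign one allele to an editing category relative to the reference."""
    if len(ref) != len(allele):
        return "indel"
    has_ag = has_ct = has_other = False
    for r, a in zip(ref, allele):
        if r == a:
            continue
        if r == "A" and a == "G":
            has_ag = True
        elif r == "C" and a == "T":
            has_ct = True
        else:
            has_other = True
    if has_other:
        return "other"
    if has_ag and has_ct:
        return "dual"
    if has_ag:
        return "A-to-G only"
    if has_ct:
        return "C-to-T only"
    return "unedited"


def summed_frequencies(ref: str, allele_counts: Dict[str, int]) -> Dict[str, float]:
    """Sum read counts per category and express them as percent of all reads."""
    total = sum(allele_counts.values())
    sums: Dict[str, float] = defaultdict(float)
    if total == 0:
        return dict(sums)
    for allele, count in allele_counts.items():
        sums[classify_allele(ref, allele)] += 100.0 * count / total
    return dict(sums)
```

For protospacers on the opposite strand, where the same edits appear as T-to-C and G-to-A in the reference orientation, the reads would need to be reverse-complemented first.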
SPACE adds 60 additional codon changes (resulting in 18 amino acid substitutions) that cannot be created with existing single-action CBEs and ABEs.
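One way to arrive at a count of 60 is to enumerate every codon change that requires at least one adenine edit and at least one cytosine edit on the same strand. The Python sketch below does this under stated assumptions: a codon change is counted if all altered positions are A-to-G or C-to-T (editing the coding strand) or all are T-to-C or G-to-A (editing the opposite strand), with both edit types present. The function name is hypothetical, and the sketch does not attempt to reproduce the amino acid substitution count.

```python
from itertools import product
from typing import Set, Tuple

BASES = "ACGT"

# Edits as read on the coding strand: deamination of the coding strand
# itself, or of the opposite strand (seen as the complementary change).
STRAND_EDITS = (
    {"A": "G", "C": "T"},
    {"T": "C", "G": "A"},
)


def dual_only_codon_changes() -> Set[Tuple[str, str]]:
    """Enumerate (source, target) codon pairs needing both edit types."""
    changes: Set[Tuple[str, str]] = set()
    for edits in STRAND_EDITS:
        for src in map("".join, product(BASES, repeat=3)):
            for mask in product((False, True), repeat=3):
                edited = [src[i] for i in range(3) if mask[i]]
                # Every edited position must hold an editable base.
                if any(b not in edits for b in edited):
                    continue
                # Require both deaminase activities within the codon.
                if len(set(edited)) < 2:
                    continue
                tgt = "".join(edits[b] if m else b for b, m in zip(src, mask))
                changes.add((src, tgt))
    return changes


if __name__ == "__main__":
    print(len(dual_only_codon_changes()))
```

Under these assumptions the enumeration gives 30 codon changes per strand and 60 in total, consistent with the number stated above.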
SPACE could be useful for creating or reverting multi-nucleotide variants (MNVs), a newly emerging category of sequence variants associated with disease (also see: https://www.biorxiv.org/content/10.1101/573378v1) (Tables E-M). Notably, among MNVs, TG-to-CA and CA-to-TG (both inducible by SPACE) are the most frequent consecutively arising adjacent dinucleotide MNVs (Kaplanis et al, Genome Res 2019). Furthermore, the greater combinatorial diversity of mutations that result with SPACE as compared with single-deaminase base editors could make it attractive for molecular recording systems (e.g., lineage tracing; McKenna et al, Science 2016) as well as for saturation mutagenesis screens, directed evolution, and protein engineering (Canver et al, Nature 2015; Hess et al, Nat Methods 2016).
SPACE could be used for introducing A>G, T>C (A>G on the other strand), C>T, or G>A (C>T on the other strand) for every nucleotide available across a coding/non-coding region to generate a comprehensive library. This can enable high-throughput saturation mutagenesis screening and highly complex genotype-phenotype correlation to study a protein or gene of interest.
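As a rough illustration of how candidate targets for such a library might be listed, the Python sketch below scans a sequence for 20-nt protospacers followed by a PAM and reports the adenines and cytosines in each. It is a simplification under several assumptions: only the given strand is scanned, no editing window is applied because none is specified here, and the function name and return layout are hypothetical.

```python
import re
from typing import List, Tuple

Target = Tuple[int, str, str, List[Tuple[int, str]]]


def find_targets(seq: str, pam: str = "NGG", spacer_len: int = 20) -> List[Target]:
    """List protospacers on the given strand that are followed by the PAM.

    Returns tuples of (0-based start, spacer, PAM sequence, editable bases),
    where editable bases are (1-based position in spacer, base) for A and C.
    """
    seq = seq.upper()
    # Translate the PAM into a regular expression, with N matching any base.
    pam_re = re.compile("".join("[ACGT]" if c == "N" else c for c in pam.upper()))
    targets: List[Target] = []
    for start in range(len(seq) - spacer_len - len(pam) + 1):
        spacer = seq[start:start + spacer_len]
        pam_seq = seq[start + spacer_len:start + spacer_len + len(pam)]
        if pam_re.fullmatch(pam_seq):
            editable = [(i + 1, b) for i, b in enumerate(spacer) if b in "AC"]
            targets.append((start, spacer, pam_seq, editable))
    return targets


if __name__ == "__main__":
    # A spacer listed in this section followed by a hypothetical AGG PAM.
    for target in find_targets("GAGTATGAGGCATAGACTGC" + "AGG"):
        print(target)
```

Scanning the reverse complement in the same way would cover targets whose edits appear as T>C or G>A on the given strand.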
annectens OX = 7888 PE = 2 SV = 1
alecto OX = 9402 GN = PAL_GLEAN10015600 PE = 4 SV = 1
pygmaeus OX = 9600 GN = APOBEC1 PE = 3 SV = 2
brandtii OX = 109478 GN = D623_10002956 PE = 4 SV = 1
griseus OX = 10029 GN = I79_017346 PE = 4 SV = 1
davidii OX = 225400 GN = MDA_GLEAN10003736 PE = 4 SV = 1
sapiens OX = 9606 GN = APOBEC1 PE = 1 SV = 3
musculus OX = 10090 GN = Apobec1 PE = 1 SV = 1
cuniculus OX = 9986 GN = APOBEC1 PE = 1 SV = 1
norvegicus OX = 10116 GN = Apobec1 PE = 1 SV = 1
sapiens OX = 9606 GN = APOBEC2 PE = 1 SV = 1
Petromyzon marinus cytosine deaminase (pmCDA1), Genbank ABO15149.1
Petromyzon marinus cytosine deaminase (pmCDA1) R187W, as used in
E. coli TadA, SEQ ID NO: 98
S. aureus TadA, SEQ ID NO: 99
S. pyogenes TadA, SEQ ID NO: 100
S. typhi TadA, SEQ ID NO: 101
A. aeolicus TadA, SEQ ID NO: 102
S. pombe TAD2, SEQ ID NO: 103
S. cerevisiae TAD1, SEQ ID NO: 104
S. cerevisiae TAD2, SEQ ID NO: 105
A. thaliana TAD2, SEQ ID NO: 106
X. laevis ADAT2, SEQ ID NO: 107
X. tropicalis ADAT2, SEQ ID NO: 108
D. rerio ADAT2, SEQ ID NO: 109
B. taurus ADAT2, SEQ ID NO: 110
M. musculus ADAT2, SEQ ID NO: 111
H. sapiens ADAT2, SEQ ID NO: 112
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Patent Application Ser. Nos. 62/894,612, filed on 30 Aug. 2019, and 63/023,192, filed on 11 May 2020. The entire contents of the foregoing are hereby incorporated by reference.
This invention was made with Government support under Grant No. HG009490 awarded by the National Institutes of Health and contract HR0011-17-2-0042 awarded by the Defense Advanced Research Projects Agency of the Department of Defense. The Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2020/048825 | 8/31/2020 | WO |
Number | Date | Country | |
---|---|---|---|
63023192 | May 2020 | US | |
62894612 | Aug 2019 | US |