The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML file, created on Aug. 26, 2024, is named 751969_UM9-302_ST26.xml and is 215,313 bytes in size.
The disclosure relates to compositions and methods for genome editing with Nme2Cas9 and Nme2SmuCas9 variants.
Genome editing using CRISPR-Cas9 technologies has advanced genetic research and promises to revolutionize gene therapy. This includes nuclease editing which relies on DNA double strand breaks, in addition to alternative editing modalities such as base and prime editing. Most CRISPR-Cas9 gene editing technologies rely on efficient Cas9 nucleic acid binding, and/or cleavage and nicking of DNA strands at a genomic locus specified by the protospacer adjacent motif (PAM) and guide RNA. Type IIC Cas9 orthologues such as Nme2Cas9 and SmuCas9 often recognize favorable PAMs but are sometimes limited by their editing activity. This lower activity can sometimes result in limited efficacy for certain genome editing applications.
Accordingly, there exists a need in the art for Nme2Cas9 and Nme2SmuCas9 variants with increased genome editing activities in mammalian cells.
In one aspect, the disclosure provides a Neisseria meningitidis (Nme) 2 Cas9 (Nme2Cas9) variant comprising an amino acid substitution at one or more positions selected from the group consisting of E520, D873, D418, E471, D442, E844, E443, D470, E585, E552, D451, E587, E508, E932, D56, D1048, E1079, D660, E887, T72, and E186.
In certain embodiments, the Nme2Cas9 variant comprises 1, 2, 3, 4, or 5 amino acid substitutions.
In certain embodiments, the Nme2Cas9 variant comprises amino acid substitutions at positions E932 and D873; E932 and D56; E932 and E520; E932 and D1048; D873 and D56; D873 and E520; D873 and D1048; D56 and E520; D56 and D1048; E520 and D1048; E932, D873, and D56; E932, D873, and E520; E932, D873, and D1048; E932, D56, and E520; E932, D56, and D1048; E932, E520, and D1048; D873, D56, and E520; D873, D56, and D1048; D873, E520, and D1048; D56, E520, and D1048; E932, D873, D56, and E520; E932, D873, D56, and D1048; E932, D56, E520, and D1048; D873, D56, E520, and D1048; or E932, D873, D56, E520, and D1048.
In certain embodiments, the amino acid substitution is a positively charged amino acid. In certain embodiments, the amino acid substitution is an arginine (R), lysine (K), or histidine (H). In certain embodiments, the amino acid substitution is an arginine (R).
In certain embodiments, the Nme2Cas9 variant comprises an amino acid substitution of any one or more of E520R, D873R, D418R, E471R, D442R, E844R, E443R, D470R, E585R, E552R, D451R, E587R, E508R, E932R, D56R, D1048R, E1079R, D660R, E887R, T72R, and E186R.
In certain embodiments, the Nme2Cas9 variant comprises an amino acid substitution of any one or more of E520R, D873R, D418R, E471R, D442R, E844R, E443R, E932R, D56R, D1048R, E1079R, D660R, E887R, T72R, and E186R.
In certain embodiments, the Nme2Cas9 variant comprises amino acid substitutions E932R and D873R; E932R and D56R; E932R and E520R; E932R and D1048R; D873R and D56R; D873R and E520R; D873R and D1048R; D56R and E520R; D56R and D1048R; E520R and D1048R; E932R, D873R, and D56R; E932R, D873R, and E520R; E932R, D873R, and D1048R; E932R, D56R, and E520R; E932R, D56R, and D1048R; E932R, E520R, and D1048R; D873R, D56R, and E520R; D873R, D56R, and D1048R; D873R, E520R, and D1048R; D56R, E520R, and D1048R; E932R, D873R, D56R, and E520R; E932R, D873R, D56R, and D1048R; E932R, D56R, E520R, and D1048R; D873R, D56R, E520R, and D1048R; or E932R, D873R, D56R, E520R, and D1048R.
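By way of non-limiting illustration only, the combination embodiments above can be generated programmatically. The following is a minimal sketch; the function name `enumerate_combinations` is hypothetical and not part of the disclosure. A full enumeration of two or more of the five substitutions yields 26 sets, whereas the paragraph above recites 25 of them (the set E932R, D873R, E520R, and D1048R is not recited there).

```python
from itertools import combinations

# The five substitutions named in the combination embodiments above.
SUBSTITUTIONS = ["E932R", "D873R", "D56R", "E520R", "D1048R"]


def enumerate_combinations(subs, min_size=2):
    """Return every combination of `subs` with at least `min_size` members."""
    result = []
    for size in range(min_size, len(subs) + 1):
        result.extend(combinations(subs, size))
    return result


combos = enumerate_combinations(SUBSTITUTIONS)
for combo in combos:
    print(" + ".join(combo))

# Note: this full enumeration includes one four-member set
# (E932R, D873R, E520R, D1048R) that the text does not recite.
```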
In certain embodiments, the Nme2Cas9 variant comprises a protospacer adjacent motif interacting domain (PID) that interacts with an N4CC nucleotide sequence, an N4CA nucleotide sequence, an N4CG nucleotide sequence, an N4CT nucleotide sequence, or an N4C nucleotide sequence.
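By way of non-limiting illustration only, PAM recognition of the kind described above can be sketched as a pattern search. The sketch assumes that N denotes any nucleotide, that the PAM lies immediately 3′ of the protospacer on the searched strand, and that the protospacer is 24 nucleotides long; the 24-nucleotide length, the single-strand search, and the names `pam_regex` and `find_target_sites` are assumptions introduced here for illustration.

```python
import re

# N matches any base; only symbols needed for the PAMs above are mapped.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def pam_regex(pam):
    """Compile a PAM such as 'NNNNCC' (N4CC) into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam.upper()))


def find_target_sites(dna, pam="NNNNCC", protospacer_len=24):
    """Return (protospacer, pam) pairs found on the given strand.

    The PAM is assumed to sit immediately 3' of the protospacer.
    protospacer_len=24 is an illustrative assumption.
    """
    dna = dna.upper()
    pattern = pam_regex(pam)
    sites = []
    for start in range(protospacer_len, len(dna) - len(pam) + 1):
        candidate = dna[start:start + len(pam)]
        if pattern.fullmatch(candidate):
            sites.append((dna[start - protospacer_len:start], candidate))
    return sites


example = "A" * 24 + "GATTCC" + "TTT"   # toy sequence, not from the disclosure
print(find_target_sites(example, pam="NNNNCC"))
print(find_target_sites(example, pam="NNNNC"))
```

A more permissive PAM such as N4C generally matches more sites in the same sequence than N4CC, which is the practical reason the choice of PID matters for targeting range.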
In certain embodiments, the PID is an Nme2Cas9 PID or an SmuCas9 PID.
In certain embodiments, the Nme2Cas9 PID comprises an amino acid sequence set forth in (DNGDMVRVDVFCKVDKKGKNQYFIVPIYAWQVAENILPDIDCKGYRIDDSYTFCFSLHKYDLIAFQKDEKSKVEFAYYINCDSSNGRFYLAWHDKGSKEQQFRISTQNLVLIQKYQVNELGKEIRPCRLKKRPPVR) (SEQ ID NO: 27).
In certain embodiments, the SmuCas9 PID comprises an amino acid sequence set forth in (DNATMVRVDVYTKAGKNYLVPVYVWQVAQGILPNRAVTSGKSEADWDLIDESFEFKFSLSRGDLVEMISNKGRIFGYYNGLDRANGSIGIREHDLEKSKGKDGVHRVGVKTATAFNKYHVDPLGKEIHRCSSEPRPTLKIKSKK) (SEQ ID NO: 28).
In certain embodiments, the one or more amino acid positions are relative to an amino acid sequence of SEQ ID NO: 1 or SEQ ID NO: 2.
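By way of non-limiting illustration only, substitution notation such as E520R (wild-type residue, 1-based position in the reference sequence, replacement residue) can be applied to a reference sequence as sketched below. The function name `apply_substitutions` and the five-residue toy reference are hypothetical; SEQ ID NO: 1 and SEQ ID NO: 2 are not reproduced here.

```python
import re


def apply_substitutions(reference, substitutions):
    """Apply substitutions such as 'E520R' to a reference protein sequence.

    Positions are 1-based and relative to `reference`. A ValueError is
    raised when the stated wild-type residue is not found at the position.
    """
    residues = list(reference)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        wild_type, replacement = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: reference has {residues[position - 1]} at {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


toy_reference = "MDAEK"  # hypothetical stand-in for a reference sequence
print(apply_substitutions(toy_reference, ["D2R", "E4R"]))
```

Checking the wild-type residue before substituting guards against numbering a position against the wrong reference sequence.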
In certain embodiments, the Nme2Cas9 variant further comprises a nucleotide base editor (NBE) domain fused to the Nme2Cas9 variant.
In certain embodiments, the NBE domain is an inlaid NBE domain inserted into the Nme2Cas9 variant.
In certain embodiments, the inlaid NBE domain is inserted into a recognition (REC) domain of the Nme2Cas9 variant.
In certain embodiments, the inlaid NBE domain is inserted into a HNH domain of the Nme2Cas9 variant.
In certain embodiments, the inlaid NBE domain is inserted into a RuvC domain of the Nme2Cas9 variant.
In certain embodiments, the inlaid NBE domain is inserted between amino acid position 291 and amino acid position 292 of the Nme2Cas9 variant, relative to an amino acid sequence of SEQ ID NO: 1 or 2.
In certain embodiments, the inlaid NBE domain is inserted between amino acid position 761 and amino acid position 762 of the Nme2Cas9 variant, relative to an amino acid sequence of SEQ ID NO: 1 or 2.
In certain embodiments, the inlaid NBE domain is inserted between amino acid position 795 and amino acid position 796 of the Nme2Cas9 variant, relative to an amino acid sequence of SEQ ID NO: 1 or 2.
In certain embodiments, the inlaid NBE domain is flanked at the inlaid NBE domain N-terminus and/or C-terminus by an amino acid linker.
In certain embodiments, the amino acid linker comprises a (GGS)n(SEQ ID NO:40) linker, wherein n corresponds to 1-6.
In certain embodiments, the amino acid linker comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15).
In certain embodiments, the amino acid linker comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21).
In certain embodiments, the amino acid linker comprises ED.
In certain embodiments, the inlaid NBE domain is flanked at the inlaid NBE domain N-terminus by GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15) and at the inlaid NBE domain C-terminus by GSSGSETPGTSESATPESSG(SEQ ID NO: 21).
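By way of non-limiting illustration only, the arrangement of an inlaid domain flanked by linkers can be sketched as a string operation on amino acid sequences. The function name `insert_inlaid_domain`, the 300-residue toy host, and the three-residue toy domain are hypothetical; only the two linker sequences are taken from the text.

```python
N_LINKER = "GGSGGSGGSGGSGGSGGSGG"  # SEQ ID NO: 15, from the text
C_LINKER = "GSSGSETPGTSESATPESSG"  # SEQ ID NO: 21, from the text


def insert_inlaid_domain(host, domain, after_position,
                         n_linker="", c_linker=""):
    """Insert `domain` between residues `after_position` and
    `after_position + 1` (1-based) of `host`, flanked by optional linkers."""
    if not 0 < after_position < len(host):
        raise ValueError("insertion site must lie inside the host sequence")
    return (
        host[:after_position]
        + n_linker + domain + c_linker
        + host[after_position:]
    )


toy_host = "A" * 300    # hypothetical stand-in for the Cas9 sequence
toy_domain = "WWW"      # hypothetical stand-in for the NBE domain
fusion = insert_inlaid_domain(toy_host, toy_domain, 291, N_LINKER, C_LINKER)
print(len(fusion))
```

Passing empty strings for `n_linker` or `c_linker` corresponds to embodiments in which a linker is absent on that side.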
In certain embodiments, the NBE domain is linked via an amino acid linker to the N-terminus of the Nme2Cas9 variant.
In certain embodiments, the NBE domain is linked via an amino acid linker to the C-terminus of the Nme2Cas9 variant.
In certain embodiments, the amino acid linker comprises a (GGS)n(SEQ ID NO:40) linker, wherein n corresponds to 1-6.
In certain embodiments, the amino acid linker comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15).
In certain embodiments, the amino acid linker comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21).
In certain embodiments, the amino acid linker comprises ED.
In certain embodiments, the inlaid NBE domain is an adenine base editor (ABE) domain.
In certain embodiments, the inlaid ABE domain is an inlaid adenosine deaminase protein domain.
In certain embodiments, the inlaid adenosine deaminase protein domain is an adenosine deaminase 8e protein domain (TadA8e).
In certain embodiments, the TadA8e comprises an amino acid sequence set forth in (SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN) (SEQ ID NO: 9).
In certain embodiments, the TadA8e comprises a V105W amino acid substitution relative to the amino acid sequence set forth in (SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN) (SEQ ID NO: 9).
In certain embodiments, the TadA8e comprises an amino acid sequence set forth in (SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGWRNSKRGAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN) (SEQ ID NO: 29).
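By way of non-limiting illustration only, the V105W substitution can be checked against the TadA8e sequence recited above. The sketch below joins SEQ ID NO: 9 from the text, confirms that residue 105 (1-based, counted from the first residue as recited) is valine, and replaces it with tryptophan; the helper name `substitute` is hypothetical. The resulting sequence can be compared with SEQ ID NO: 29.

```python
# SEQ ID NO: 9 as recited in the text, written in three pieces for readability.
TADA8E = (
    "SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAH"
    "AEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAA"
    "GSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN"
)


def substitute(seq, wild_type, position, replacement):
    """Replace the residue at a 1-based position after verifying it."""
    if seq[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {seq[position - 1]}"
        )
    return seq[:position - 1] + replacement + seq[position:]


variant = substitute(TADA8E, "V", 105, "W")
# Show residues 101-108 before and after the substitution.
print(TADA8E[100:108], "->", variant[100:108])
```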
In certain embodiments, the inlaid NBE domain is a cytidine base editor (CBE) domain.
In certain embodiments, the inlaid CBE domain is an inlaid cytosine deaminase protein domain.
In certain embodiments, the cytosine deaminase protein domain is evoFERNY or rAPOBEC1.
In certain embodiments, the evoFERNY comprises an amino acid sequence set forth in (FERNYDPRELRKETYLLYEIKWGKSGKLWRHWCQNNRTQHAEVYFLENIFNARRFNPSTHCSITWYLSWSPCAECSQKIVDFLKEHPNVNLEIYVARLYYPENERNRQGLRDLVNSGVTIRIMDLPDYNYCWKTFVSDQGGDEDYWPGHFAPWIKQYSLKL) (SEQ ID NO: 13).
In certain embodiments, the rAPOBEC1 comprises an amino acid sequence set forth in (SSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLK) (SEQ ID NO: 11).
In certain embodiments, the Nme2Cas9 variant further comprises one or more nuclear localization signals (NLS).
In certain embodiments, the one or more NLS are any one or more of a nucleoplasmin NLS, an SV40 NLS, or a c-Myc NLS.
In certain embodiments, the one or more NLS comprise an amino acid sequence selected from the group consisting of MKRTADGSEFESPKKKRKV(SEQ ID NO:30), KRTADGSEFEPKKKRKV(SEQ ID NO:31), MKRPAATKKAGQAKKKK(SEQ ID NO:32), KRPAATKKAGQAKKKK(SEQ ID NO:33), MPKKKRKV(SEQ ID NO:34), and PKKKRKV(SEQ ID NO:35).
In certain embodiments, the one or more NLS are positioned at the N-terminus and/or C-terminus of the Nme2Cas9 variant.
In certain embodiments, the Nme2Cas9 variant further comprises a uracil glycosylase inhibitor (UGI).
In certain embodiments, the Nme2Cas9 variant further comprises a D16A substitution.
In one aspect, the disclosure provides a polynucleotide encoding the Nme2Cas9 variant described herein.
In certain embodiments, the polynucleotide is a messenger RNA (mRNA).
In one aspect, the disclosure provides a vector comprising the polynucleotide sequence described herein.
In one aspect, the disclosure provides a viral vector comprising the polynucleotide sequence described herein.
In certain embodiments, the viral vector is an adeno-associated virus (AAV) vector or a lentiviral vector.
In one aspect, the disclosure provides an adeno-associated virus (AAV) comprising the polynucleotide sequence described herein.
In one aspect, the disclosure provides a genome editing system comprising the Nme2Cas9 variant described herein or a polynucleotide encoding the Nme2Cas9 variant described herein, and a guide RNA (gRNA).
In certain embodiments, the gRNA comprises: (a) a crRNA portion comprising (i) a guide sequence capable of hybridizing to a target polynucleotide sequence, and (ii) a repeat sequence; and (b) a tracrRNA portion comprising an anti-repeat nucleotide sequence that is complementary to the repeat sequence.
In certain embodiments, the gRNA comprises at least one modified nucleotide.
In certain embodiments, the at least one modified nucleotide comprises a modification of a ribose group, a phosphate group, a nucleobase, or a combination thereof.
In certain embodiments, the modification of the ribose group is independently selected from the group consisting of 2′-O-methyl, 2′-fluoro, 2′-deoxy, 2′-O-(2-methoxyethyl) (MOE), 2′-NH2 (2′-amino), 4′-thio, a bicyclic nucleotide, a locked nucleic acid (LNA), a 2′-(S)-constrained ethyl (S-cEt), a constrained MOE, and a 2′-O,4′-C-aminomethylene bridged nucleic acid (2′,4′-BNANC).
In certain embodiments, the modification of the phosphate group is independently selected from the group consisting of a phosphorothioate, phosphonoacetate (PACE), thiophosphonoacetate (thioPACE), amide, triazole, phosphonate, and phosphotriester modification.
In certain embodiments, the modification of the nucleobase group is independently selected from the group consisting of 2-thiouridine, 4-thiouridine, N6-methyladenosine, pseudouridine, 2,6-diaminopurine, inosine, thymidine, 5-methylcytosine, 5-substituted pyrimidine, isoguanine, isocytosine, and halogenated aromatic groups.
In certain embodiments, the gRNA further comprises a nucleotide or non-nucleotide loop or linker linking the 3′ end of the crRNA portion to the 5′ end of the tracrRNA portion.
In certain embodiments, the nucleotide loop is chemically modified.
In certain embodiments, the nucleotide loop comprises the nucleotide sequence of GAAA.
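By way of non-limiting illustration only, the gRNA architecture described above (a crRNA portion of guide plus repeat, a GAAA loop, and a tracrRNA portion beginning with an anti-repeat) can be sketched as follows. The function names, the toy sequences, and the requirement of perfect Watson-Crick complementarity between repeat and anti-repeat are simplifying assumptions made here; natural repeat and anti-repeat duplexes may contain bulges or wobble pairs.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def assemble_sgrna(guide, repeat, anti_repeat, tracr_tail="", loop="GAAA"):
    """Join the 3' end of the crRNA portion (guide + repeat) to the 5' end
    of the tracrRNA portion (anti-repeat + tail) through a loop."""
    if anti_repeat != reverse_complement(repeat):
        raise ValueError("anti-repeat is not complementary to the repeat")
    return guide + repeat + loop + anti_repeat + tracr_tail


toy_guide = "ACGU" * 6                         # hypothetical 24-nt guide
toy_repeat = "GUUGUA"                          # hypothetical repeat
toy_anti_repeat = reverse_complement(toy_repeat)
print(assemble_sgrna(toy_guide, toy_repeat, toy_anti_repeat))
```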
In one aspect, the disclosure provides a method of editing a genome, comprising: (a) introducing into the genome the genome editing system described herein; and (b) incubating the genome editing system with the genome for a time sufficient to edit the genome.
In certain embodiments, the genome edit results from a single-strand and/or double-strand DNA break.
In certain embodiments, the genome edit is a base edit.
In one aspect, the disclosure provides a fusion protein comprising a Neisseria meningitidis (Nme) 2 Cas9 (Nme2Cas9) protein and an inlaid nucleotide base editor (NBE) domain, wherein the inlaid NBE domain is flanked at the inlaid NBE domain N-terminus and/or C-terminus by an amino acid linker, or a linker is absent, and wherein the total number of amino acid linker residues is less than 40 amino acids.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 20 amino acids long, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 19 amino acids long, 18 amino acids long, 17 amino acids long, 16 amino acids long, 15 amino acids long, 14 amino acids long, 13 amino acids long, 12 amino acids long, 11 amino acids long, 10 amino acids long, 9 amino acids long, 8 amino acids long, 7 amino acids long, 6 amino acids long, 5 amino acids long, 4 amino acids long, 3 amino acids long, 2 amino acids long, 1 amino acid long, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 20 amino acids long, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 10 amino acids long, 5 amino acids long, or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, and the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 19 amino acids long, 18 amino acids long, 17 amino acids long, 16 amino acids long, 15 amino acids long, 14 amino acids long, 13 amino acids long, 12 amino acids long, 11 amino acids long, 10 amino acids long, 9 amino acids long, 8 amino acids long, 7 amino acids long, 6 amino acids long, 5 amino acids long, 4 amino acids long, 3 amino acids long, 2 amino acids long, 1 amino acid long, or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, and the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 10 amino acids long, 5 amino acids long, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 10 amino acids long, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, 19 amino acids long, 18 amino acids long, 17 amino acids long, 16 amino acids long, 15 amino acids long, 14 amino acids long, 13 amino acids long, 12 amino acids long, 11 amino acids long, 10 amino acids long, 9 amino acids long, 8 amino acids long, 7 amino acids long, 6 amino acids long, 5 amino acids long, 4 amino acids long, 3 amino acids long, 2 amino acids long, 1 amino acid long, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 10 amino acids long, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, 10 amino acids long, 5 amino acids long, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 5 amino acids long, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, 19 amino acids long, 18 amino acids long, 17 amino acids long, 16 amino acids long, 15 amino acids long, 14 amino acids long, 13 amino acids long, 12 amino acids long, 11 amino acids long, 10 amino acids long, 9 amino acids long, 8 amino acids long, 7 amino acids long, 6 amino acids long, 5 amino acids long, 4 amino acids long, 3 amino acids long, 2 amino acids long, 1 amino acid long, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 5 amino acids long, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, 10 amino acids long, 5 amino acids long, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is absent, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, 19 amino acids long, 18 amino acids long, 17 amino acids long, 16 amino acids long, 15 amino acids long, 14 amino acids long, 13 amino acids long, 12 amino acids long, 11 amino acids long, 10 amino acids long, 9 amino acids long, 8 amino acids long, 7 amino acids long, 6 amino acids long, 5 amino acids long, 4 amino acids long, 3 amino acids long, 2 amino acids long, 1 amino acid long, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is absent, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids long, 10 amino acids long, 5 amino acids long, or is absent.
In certain embodiments, the amino acid linker comprises a sequence selected from the group consisting of: GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), and GTSES(SEQ ID NO: 25).
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises SGGSGGSGGS(SEQ ID NO: 17), and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGG(SEQ ID NO: 19), and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker is absent at the N-terminus of the inlaid NBE domain, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises ETPGTSESAT(SEQ ID NO: 23), and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GTSES(SEQ ID NO: 25), and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
In certain embodiments, the amino acid linker is absent at the C-terminus of the inlaid NBE domain, and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
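By way of non-limiting illustration only, the linker pairings recited above can be enumerated under the constraint that the total number of linker residues is less than 40. The sketch assigns SEQ ID NOs: 15, 17, and 19 to the N-terminal side and SEQ ID NOs: 21, 23, and 25 to the C-terminal side, as in the paragraphs above; the names `N_LINKERS`, `C_LINKERS`, and `allowed_pairs` are hypothetical.

```python
# Linker sequences from the text, keyed by SEQ ID NO; None means absent.
N_LINKERS = {15: "GGSGGSGGSGGSGGSGGSGG", 17: "SGGSGGSGGS", 19: "GGSGG", None: ""}
C_LINKERS = {21: "GSSGSETPGTSESATPESSG", 23: "ETPGTSESAT", 25: "GTSES", None: ""}

MAX_TOTAL_RESIDUES = 40  # total linker residues must be less than this


def allowed_pairs():
    """Return (n_id, c_id, total_length) for every pairing under the limit."""
    pairs = []
    for n_id, n_seq in N_LINKERS.items():
        for c_id, c_seq in C_LINKERS.items():
            total = len(n_seq) + len(c_seq)
            if total < MAX_TOTAL_RESIDUES:
                pairs.append((n_id, c_id, total))
    return pairs


for n_id, c_id, total in allowed_pairs():
    print(f"N-terminal: {n_id}, C-terminal: {c_id}, total residues: {total}")
```

Only the pairing of the two 20-residue linkers is excluded, since it totals exactly 40 residues.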
In certain embodiments, the Nme2Cas9 comprises the Nme2Cas9 variant described herein.
In one aspect, the disclosure provides a polynucleotide encoding the fusion protein described herein.
In certain embodiments, the polynucleotide is a messenger RNA (mRNA).
In one aspect, the disclosure provides a vector comprising the polynucleotide sequence described herein.
In one aspect, the disclosure provides a viral vector comprising the polynucleotide sequence described herein.
In certain embodiments, the viral vector is an adeno-associated virus (AAV) vector or a lentiviral vector.
In one aspect, the disclosure provides an adeno-associated virus (AAV) comprising the polynucleotide sequence described herein.
These and other aspects of the applicant's teachings are set forth herein.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Aspects, features, benefits, and advantages of the embodiments described herein will be apparent with regard to the following description, examples, claims, and accompanying drawings where:
It will be appreciated that for clarity, the following discussion will describe various aspects of embodiments of the applicant's teachings. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s).
Unless otherwise specified, nomenclature used in connection with cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those well-known and commonly used in the art. Unless otherwise specified, the methods and techniques provided herein are performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. Enzymatic reactions and purification techniques are performed according to manufacturer's specifications, as commonly accomplished in the art or as described herein. The nomenclature used in connection with, and the laboratory procedures and techniques of, analytical chemistry, synthetic organic chemistry, and medicinal and pharmaceutical chemistry described herein are those well-known and commonly used in the art. Standard techniques are used for chemical syntheses, chemical analyses, pharmaceutical preparation, formulation, delivery, and treatment of patients.
Unless otherwise defined herein, scientific and technical terms used herein have the meanings that are commonly understood by those of ordinary skill in the art. In the event of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. The use of “or” means “and/or” unless stated otherwise. The use of the term “including,” as well as other forms, such as “includes” and “included,” is not limiting.
So that the disclosure may be more readily understood, certain terms are first defined.
To facilitate the understanding of this invention, a number of terms are defined below. Terms defined herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as “a”, “an” and “the” are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.
As used herein, the term “edit”, “editing”, or “edited” refers to a method of altering a nucleic acid sequence of a polynucleotide (e.g., a wild type naturally occurring nucleic acid sequence or a mutated naturally occurring sequence) by selective deletion of a specific genomic target. Such a specific genomic target includes, but is not limited to, a chromosomal region, a gene, a promoter, an open reading frame or any nucleic acid sequence.
As used herein, the term “single base” refers to one, and only one, nucleotide within a nucleic acid sequence. When used in the context of single base editing, it is meant that the base at a specific position within the nucleic acid sequence is replaced with a different base. This replacement may occur by many mechanisms, including but not limited to, substitution or modification.
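By way of non-limiting illustration only, replacement of a single base at a specific position can be sketched as follows. The conversions shown (A to G for an adenine base editor, C to T for a cytosine base editor) are general background on these editor classes rather than statements from this paragraph, and the names `EDITOR_CONVERSIONS` and `edit_single_base` are hypothetical.

```python
# Substrate and product base for each editor class (general background).
EDITOR_CONVERSIONS = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def edit_single_base(sequence, position, editor):
    """Replace the base at a 1-based position if it is the editor's substrate."""
    substrate, product = EDITOR_CONVERSIONS[editor]
    if sequence[position - 1] != substrate:
        raise ValueError(
            f"{editor} expects {substrate} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + product + sequence[position:]


print(edit_single_base("GATTACA", 2, "ABE"))
print(edit_single_base("GATTACA", 6, "CBE"))
```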
As used herein, the term “target” or “target site” refers to a pre-identified nucleic acid sequence of any composition and/or length. Such target sites include, but are not limited to, a chromosomal region, a gene, a promoter, an open reading frame or any nucleic acid sequence. In some embodiments, the present invention interrogates these specific genomic target sequences with complementary sequences of gRNA.
The term “on-target binding sequence” as used herein, refers to a subsequence of a specific genomic target that may be completely complementary to a programmable DNA binding domain and/or a single guide RNA sequence.
The term “off-target binding sequence” as used herein, refers to a subsequence of a specific genomic target that may be partially complementary to a programmable DNA binding domain and/or a single guide RNA sequence.
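By way of non-limiting illustration only, the distinction between complete and partial complementarity can be sketched by counting mismatches between a guide sequence and a candidate site. The sketch compares the guide with the protospacer strand that has the same sense as the guide, which is equivalent to testing complementarity to the opposite strand. The threshold of three mismatches and the names `count_mismatches` and `classify_site` are hypothetical choices made here for illustration.

```python
def count_mismatches(guide, site):
    """Count positions at which an RNA guide differs from a same-sense
    DNA protospacer of equal length (U in the guide is treated as T)."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    guide_dna = guide.upper().replace("U", "T")
    return sum(1 for g, s in zip(guide_dna, site.upper()) if g != s)


def classify_site(guide, site, max_mismatches=3):
    """Label a site by its degree of complementarity to the guide.

    max_mismatches is an illustrative threshold, not a value from the text.
    """
    mismatches = count_mismatches(guide, site)
    if mismatches == 0:
        return "on-target"
    if mismatches <= max_mismatches:
        return "potential off-target"
    return "not a candidate"


print(classify_site("ACGUACGU", "ACGTACGT"))
print(classify_site("ACGUACGU", "ACGTACGA"))
```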
The terms “reduce,” “inhibit,” “diminish,” “suppress,” “decrease,” “prevent” and grammatical equivalents (including “lower,” “smaller,” etc.) when in reference to the expression of any symptom in an untreated subject relative to a treated subject, mean that the quantity and/or magnitude of the symptoms in the treated subject is lower than in the untreated subject by any amount that is recognized as clinically relevant by any medically trained personnel. In one embodiment, the quantity and/or magnitude of the symptoms in the treated subject is at least 10% lower than, at least 25% lower than, at least 50% lower than, at least 75% lower than, and/or at least 90% lower than the quantity and/or magnitude of the symptoms in the untreated subject.
The term “attached” as used herein, refers to any interaction between a medium (or carrier) and a drug. Attachment may be reversible or irreversible. Such attachment includes, but is not limited to, covalent bonding, ionic bonding, Van der Waals forces or friction, and the like. A drug is attached to a medium (or carrier) if it is impregnated, incorporated, coated, in suspension with, in solution with, mixed with, etc.
The term “administered” or “administering”, as used herein, refers to any method of providing a composition to a patient such that the composition has its intended effect on the patient. An exemplary method of administering is by a direct mechanism such as local tissue administration (e.g., extravascular placement), oral ingestion, transdermal patch, topical, inhalation, suppository, etc.
The term “patient” or “subject”, as used herein, is a human or animal and need not be hospitalized. For example, out-patients and persons in nursing homes are “patients.” A patient may comprise any age of a human or non-human animal and therefore includes both adults and juveniles (i.e., children). It is not intended that the term “patient” connote a need for medical treatment; therefore, a patient may voluntarily or involuntarily be part of experimentation whether clinical or in support of basic science studies.
The term “affinity” as used herein, refers to any attractive force between substances or particles that causes them to enter into and remain in chemical combination. For example, an inhibitor compound that has a high affinity for a receptor will provide greater efficacy in preventing the receptor from interacting with its natural ligands, than an inhibitor with a low affinity.
The terms “pharmaceutically acceptable” and “pharmacologically acceptable”, as used herein, refer to molecular entities and compositions that do not produce adverse, allergic, or other untoward reactions when administered to an animal or a human.
The term “pharmaceutically acceptable carrier”, as used herein, includes any and all solvents, or a dispersion medium including, but not limited to, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), suitable mixtures thereof, and vegetable oils, coatings, isotonic and absorption delaying agents, liposomes, commercially available cleansers, and the like. Supplementary bioactive ingredients also can be incorporated into such carriers.
The term “viral vector” encompasses any nucleic acid construct derived from a virus genome capable of incorporating heterologous nucleic acid sequences for expression in a host organism. For example, such viral vectors may include, but are not limited to, adeno-associated viral vectors, lentiviral vectors, SV40 viral vectors, retroviral vectors, and adenoviral vectors. Although viral vectors are occasionally created from pathogenic viruses, they may be modified in such a way as to minimize their overall health risk. This usually involves the deletion of a part of the viral genome involved with viral replication. Such a virus can efficiently infect cells but, once the infection has taken place, the virus may require a helper virus to provide the missing proteins for production of new virions. Preferably, viral vectors should have a minimal effect on the physiology of the cells they infect and exhibit genetically stable properties (e.g., do not undergo spontaneous genome rearrangement). Most viral vectors are engineered to infect as wide a range of cell types as possible. Even so, a viral receptor can be modified to target the virus to a specific kind of cell. Viruses modified in this manner are said to be pseudotyped. Viral vectors are often engineered to incorporate certain genes that help identify which cells took up the viral genes. These genes are called marker genes. For example, a common marker gene confers resistance to a certain antibiotic.
As used herein, the term “genetic disease” refers to any medical condition having a primary causative factor of a mutated gene. The gene mutation may comprise a nucleic acid sequence wherein at least one, if not more, nucleotides are not wild type.
As used herein, the term “CRISPRs” or “Clustered Regularly Interspaced Short Palindromic Repeats” refers to an acronym for DNA loci that contain multiple, short, direct repetitions of base sequences. Each repetition contains a series of bases followed by 30 or so base pairs known as “spacer DNA”. The spacers are short segments of DNA from a virus and may serve as a ‘memory’ of past exposures to facilitate an adaptive defense against future invasions.
As used herein, the term “Cas” or “CRISPR-associated (cas)” refers to genes often associated with CRISPR repeat-spacer arrays.
As used herein, the term “Cas9” refers to a nuclease from Type II CRISPR systems, an enzyme specialized for generating double-strand breaks in DNA, with two active cutting sites (the HNH and RuvC domains), one for each strand of the double helix. Jinek et al. combined tracrRNA and spacer RNA into a “single-guide RNA” (sgRNA) molecule that, mixed with Cas9, could find and cleave DNA targets through Watson-Crick pairing between the guide sequence within the sgRNA and the target DNA sequence.
As used herein, the term “N-terminal domain” refers to the fusion of a first peptide or protein at the N-terminal end of a second peptide or protein. For example, a nucleotide deaminase protein may be “N-terminally” fused to the first amino acid of a Cas9 nuclease protein.
As used herein, the term “inlaid domain” refers to the fusion of a first protein between the N-terminal and C-terminal ends of a second protein. For example, a nucleotide deaminase protein is an “inlaid domain” when inserted between the N-terminal and C-terminal ends of a Cas9 nuclease protein.
The term “protospacer adjacent motif” (or PAM) as used herein, refers to a DNA sequence that may be required for a Cas9/sgRNA to form an R-loop to interrogate a specific DNA sequence through Watson-Crick pairing of its guide RNA with the genome. The PAM specificity may be a function of the DNA-binding specificity of the Cas9 protein (e.g., a “protospacer adjacent motif recognition domain” at the C-terminus of Cas9).
As used herein, the term “sgRNA” refers to single guide RNA used in conjunction with CRISPR associated systems (Cas). sgRNAs are a fusion of crRNA and tracrRNA and contain nucleotides of sequence complementary to the desired target site (Jinek et al., “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity” Science 337(6096):816-821 (2012)). Watson-Crick pairing of the sgRNA with the target site permits R-loop formation, which in conjunction with a functional PAM permits DNA cleavage or, in the case of nuclease-deficient Cas9, allows binding to the DNA at that locus.
As used herein, the term “fluorescent protein” refers to a protein domain that comprises at least one organic compound moiety that emits fluorescent light in response to the appropriate wavelengths. For example, fluorescent proteins may emit red, blue and/or green light. Such proteins are readily commercially available including, but not limited to: i) mCherry (Clonetech Laboratories): excitation: 556/20 nm (wavelength/bandwidth); emission: 630/91 nm; ii) sfGFP (Invitrogen): excitation: 470/28 nm; emission: 512/23 nm; iii) TagBFP (Evrogen): excitation 387/11 nm; emission 464/23 nm.
As used herein, the term “sgRNA” refers to single guide RNA used in conjunction with CRISPR associated systems (Cas). sgRNAs contain nucleotides of sequence complementary to the desired target site. Watson-Crick pairing of the sgRNA with the target site recruits the nuclease-deficient Cas9 to bind the DNA at that locus.
As used herein, the term “orthogonal” refers to targets that are non-overlapping, uncorrelated, or independent. For example, if two orthogonal nuclease-deficient Cas9 genes fused to different effector domains were implemented, the sgRNAs coded for each would not cross-talk or overlap. Not all nuclease-deficient Cas9 genes operate the same, which enables the use of orthogonal nuclease-deficient Cas9 genes fused to different effector domains, provided the appropriate orthogonal sgRNAs are used.
As used herein, the term “phenotypic change” or “phenotype” refers to the composite of an organism's observable characteristics or traits, such as its morphology, development, biochemical or physiological properties, phenology, behavior, and products of behavior. Phenotypes result from the expression of an organism's genes as well as the influence of environmental factors and the interactions between the two.
“Nucleic acid sequence” and “nucleotide sequence” as used herein refer to an oligonucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, and represent the sense or antisense strand.
The term “an isolated nucleic acid”, as used herein, refers to any nucleic acid molecule that has been removed from its natural state (e.g., removed from a cell and is, in a preferred embodiment, free of other genomic nucleic acid).
The terms “amino acid sequence” and “polypeptide sequence” as used herein are interchangeable and refer to a sequence of amino acids.
As used herein the term “portion” when in reference to a protein (as in “a portion of a given protein”) refers to fragments of that protein. The fragments may range in size from four amino acid residues to the entire amino acid sequence minus one amino acid.
The term “portion” when used in reference to a nucleotide sequence refers to fragments of that nucleotide sequence. The fragments may range in size from 5 nucleotide residues to the entire nucleotide sequence minus one nucleic acid residue.
As used herein, the terms “complementary” or “complementarity” are used in reference to “polynucleotides” and “oligonucleotides” (which are interchangeable terms that refer to a sequence of nucleotides) related by the base-pairing rules. For example, the sequence “C-A-G-T,” is complementary to the sequence “G-T-C-A.” Complementarity can be “partial” or “total.” “Partial” complementarity is where one or more nucleic acid bases is not matched according to the base pairing rules. “Total” or “complete” complementarity between nucleic acids is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as detection methods which depend upon binding between nucleic acids.
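By way of illustration only, the distinction between “partial” and “total” complementarity can be sketched in Python. The function below scores two sequences position by position, in the same written alignment as the “C-A-G-T” and “G-T-C-A” example above. The function name, the restriction to DNA bases, and the requirement of equal lengths are assumptions made for this sketch and are not part of the disclosure.

```python
# Watson-Crick partners for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementarity(seq_a: str, seq_b: str) -> float:
    """Return the fraction of positions at which seq_a pairs with seq_b.

    Assumes (for illustration) that both sequences are DNA, have equal
    length, and are written so that paired bases share the same index.
    Hyphens used as visual separators are ignored.
    """
    a = seq_a.replace("-", "").upper()
    b = seq_b.replace("-", "").upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matched = sum(1 for x, y in zip(a, b) if PAIRS.get(x) == y)
    return matched / len(a)


total = complementarity("C-A-G-T", "G-T-C-A")    # every base is paired
partial = complementarity("C-A-G-T", "G-T-C-C")  # last base is mismatched
```

A score of 1.0 corresponds to “total” complementarity as defined above; any lower score corresponds to “partial” complementarity.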
The terms “homology” and “homologous” as used herein in reference to nucleotide sequences refer to a degree of complementarity with other nucleotide sequences. There may be partial homology or complete homology (i.e., identity). A nucleotide sequence which is partially complementary, i.e., “substantially homologous,” to a nucleic acid sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid sequence. The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target sequence under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target sequence which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.
The terms “homology” and “homologous” as used herein in reference to amino acid sequences refer to the degree of identity of the primary structure between two amino acid sequences. Such a degree of identity may be directed to a portion of each amino acid sequence, or to the entire length of the amino acid sequence. Two or more amino acid sequences that are “substantially homologous” may have at least 50% identity, preferably at least 75% identity, more preferably at least 85% identity, most preferably at least 95%, or 100% identity.
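As a minimal sketch of how a degree of identity might be computed, the following Python code counts identical residues between two sequences that are assumed to be pre-aligned and gap-free. Real comparisons generally rely on an alignment algorithm, which is outside the scope of this sketch. The function names and the toy peptides are hypothetical; the default 50% threshold mirrors the lowest “substantially homologous” tier recited above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical residues between two pre-aligned sequences.

    Assumes equal-length, gap-free input; no alignment is performed.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * same / len(seq_a)


def is_substantially_homologous(seq_a: str, seq_b: str,
                                threshold: float = 50.0) -> bool:
    """True if the percent identity meets the chosen threshold."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 10-residue peptides differing at one position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
```

Passing `threshold=75.0`, `85.0`, or `95.0` would apply the narrower tiers recited in the definition.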
An oligonucleotide sequence which is a “homolog” is defined herein as an oligonucleotide sequence which exhibits greater than or equal to 50% identity to a sequence, when sequences having a length of 100 bp or larger are compared.
Low stringency conditions comprise conditions equivalent to binding or hybridization at 42° C. in a solution consisting of 5×SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4·H2O and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5×Denhardt's reagent {50×Denhardt's contains per 500 mL: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)} and 100 μg/mL denatured salmon sperm DNA followed by washing in a solution comprising 5×SSPE, 0.1% SDS at 42° C. when a probe of about 500 nucleotides in length is employed. Numerous equivalent conditions may also be employed to comprise low stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the concentration of the salts and other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene glycol), as well as components of the hybridization solution may be varied to generate conditions of low stringency hybridization different from, but equivalent to, the above listed conditions. In addition, conditions which promote hybridization under conditions of high stringency (e.g., increasing the temperature of the hybridization and/or wash steps, the use of formamide in the hybridization solution, etc.) may also be used.
As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids using any process by which a strand of nucleic acid joins with a complementary strand through base pairing to form a hybridization complex. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids.
As used herein the term “hybridization complex” refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary G and C bases and between complementary A and T bases; these hydrogen bonds may be further stabilized by base stacking interactions. The two complementary nucleic acid sequences hydrogen bond in an antiparallel configuration. A hybridization complex may be formed in solution (e.g., C0t or R0t analysis) or between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized to a solid support (e.g., a nylon membrane or a nitrocellulose filter as employed in Southern and Northern blotting, dot blotting or a glass slide as employed in in situ hybridization, including FISH (fluorescent in situ hybridization)).
DNA molecules are said to have “5′ ends” and “3′ ends” because mononucleotides are reacted to make oligonucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide is referred to as the “5′ end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring. An end of an oligonucleotide is referred to as the “3′ end” if its 3′ oxygen is not linked to a 5′ phosphate of another mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5′ and 3′ ends. In either a linear or circular DNA molecule, discrete elements are referred to as being “upstream” or 5′ of the “downstream” or 3′ elements. This terminology reflects the fact that transcription proceeds in a 5′ to 3′ fashion along the DNA strand. The promoter and enhancer elements which direct transcription of a linked gene are generally located 5′ or upstream of the coding region. However, enhancer elements can exert their effect even when located 3′ of the promoter element and the coding region. Transcription termination and polyadenylation signals are located 3′ or downstream of the coding region.
The term “transfection” or “transfected” refers to the introduction of foreign DNA into a cell.
As used herein, the terms “nucleic acid molecule encoding”, “DNA sequence encoding,” and “DNA encoding” refer to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA sequence thus codes for the amino acid sequence.
As used herein, the term “gene” means the deoxyribonucleotide sequences comprising the coding region of a structural gene and including sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of about 1 kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5′ of the coding region and which are present on the mRNA are referred to as 5′ non-translated sequences. The sequences which are located 3′ or downstream of the coding region and which are present on the mRNA are referred to as 3′ non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene which are transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.
N. meningitidis Cas9 RNA-Guided Nucleases
N. meningitidis RNA-guided nucleases (e.g., Nme Cas9 or NmCas9) according to the present disclosure include, without limitation, any Cas9 nuclease obtained from N. meningitidis (e.g., Nme1Cas9, Nme2Cas9, or Nme3Cas9), as well as other Cas9 nucleases derived or obtained therefrom. N. meningitidis Cas9 nucleases belong to the Type II-C Cas9 nucleases, which are generally less than 1,100 amino acids in length and are capable of genome editing, including genome editing in mammalian cells. In functional terms, N. meningitidis RNA-guided nucleases are defined as those nucleases that: (a) interact with (e.g., complex with) a gRNA; and (b) together with the gRNA, associate with, and optionally cleave or modify, a target region of a DNA that includes (i) a sequence complementary to the targeting domain of the gRNA and, optionally, (ii) an additional sequence referred to as a “protospacer adjacent motif,” or “PAM,” which is described in greater detail below. As the following examples will illustrate, RNA-guided nucleases can be defined, in broad terms, by their PAM specificity and cleavage activity, even though variations may exist between individual RNA-guided nucleases that share the same PAM specificity or cleavage activity. Skilled artisans will appreciate that some aspects of the present disclosure relate to systems, methods and compositions that can be implemented using any suitable RNA-guided nuclease having a certain PAM specificity and/or cleavage activity. For this reason, unless otherwise specified, the term RNA-guided nuclease should be understood as a generic term, and not limited to any particular type (e.g., Nme1Cas9, Nme2Cas9, or Nme3Cas9), or variation (e.g., full-length vs. truncated or split; naturally-occurring PAM specificity vs. engineered PAM specificity). The PAM sequence recognized by the Nme2Cas9 nucleases of the disclosure includes N4CC (see, Sun et al., supra; Edraki et al., supra).
Nme Cas9 nucleases are described in further detail in Esvelt et al. (Nat. Methods. 10: 1116-1121. 2013); Hou et al. (PNAS. 110: 15644-15649. 2013); Lee et al. (Mol. Ther. 24: 645-654. 2016); Amrani et al. (Genome Biol. 19: 214. 2018); Edraki et al. (Mol. Cell. 73: 714-726. 2019); U.S. Patent Publication 2014/0349405; U.S. Pat. No. 10,190,106; U.S. Patent Publication 2018/0355331; and U.S. Patent Publication 2019/0338308, each of which is incorporated herein by reference.
Protospacer adjacent motif (PAM) recognition by Cas9 orthologs occurs predominantly through protein-DNA interactions between the PAM Interacting Domain (PID) and the nucleotides adjacent to the protospacer (Jiang and Doudna, 2017). PAM mutations often enable phage escape from type II CRISPR immunity (Paez-Espino et al., 2015), placing these systems under selective pressure not only to acquire new CRISPR spacers, but also to evolve new PAM specificities via PID mutations. In addition, some phages and MGEs express anti-CRISPR (Acr) proteins that inhibit Cas9 (Pawluk et al., 2016; Hynes et al., 2017; Rauch et al., 2017). PID binding is an effective inhibitory mechanism adopted by some Acrs (Dong et al., 2017; Shin et al., 2017; Yang and Patel, 2017), suggesting that PID variation may also be driven by selective pressure to escape Acr inhibition. Cas9 PIDs can evolve such that closely-related orthologs recognize distinct PAMs, as illustrated recently in two species of Geobacillus. The Cas9 encoded by G. stearothermophilus recognizes a N4CRAA PAM, but when its PID was swapped with that of strain LC300's Cas9, its PAM requirement changed to N4GMAA (Harrington et al., 2017b).
In one embodiment, the present disclosure contemplates a chimeric Nme2Cas9 protein in which the Nme2Cas9 PID is replaced with the PID of Simonsiella muelleri Cas9 (SmuCas9). This chimeric Nme2Cas9 is designated Nme2SmuCas9 herein. The PAM recognized by Nme2SmuCas9 is expanded beyond N4CC (the WT Nme2Cas9 PAM), to N4CN (e.g., N4CC, N4CT, N4CG, and N4CA), thereby greatly expanding the number of potential target sites in the genome. Exemplary Nme2Cas9 and Nme2SmuCas9 amino acid sequences are provided herein in Table 1. Nme2SmuCas9 is described in further detail in PCT/US22/48261, incorporated herein by reference.
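The difference between the N4CC and N4CN PAMs described above can be illustrated with a short Python sketch that tests whether the six nucleotides immediately 3′ of a protospacer satisfy a given PAM pattern. The helper names and example sequences are hypothetical, and the sketch handles only A, C, G, T, and the IUPAC wildcard N; it is not a description of any target-selection software.

```python
import re

# Minimal IUPAC mapping for this sketch; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

N4CC = "NNNNCC"  # PAM of wild-type Nme2Cas9, per the text above
N4CN = "NNNNCN"  # expanded PAM of Nme2SmuCas9, per the text above


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Compile a PAM written in IUPAC letters into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in pam.upper()))


def matches_pam(downstream: str, pam: str) -> bool:
    """True if the bases immediately 3' of the protospacer match the PAM."""
    candidate = downstream.upper()[: len(pam)]
    if len(candidate) != len(pam):
        return False  # not enough sequence to evaluate the PAM
    return pam_regex(pam).fullmatch(candidate) is not None


cc_site = "AGTCCC"  # C at positions 5 and 6
ca_site = "AGTCCA"  # C at position 5 only
```

In this sketch, `cc_site` satisfies both patterns, whereas `ca_site` satisfies only N4CN, which reflects how the expanded PAM increases the number of candidate target sites.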
In certain embodiments, the Nme2Cas9 (e.g., the Nme2Cas9 variant) comprises a PID that interacts with an N4CC nucleotide sequence, an N4CA nucleotide sequence, an N4CG nucleotide sequence, an N4CT nucleotide sequence, or an N4C nucleotide sequence.
In certain embodiments, the PID is an Nme2Cas9 PID or an SmuCas9 PID.
In certain embodiments, the Nme2Cas9 PID comprises an amino acid sequence set forth in (DNGDMVRVDVFCKVDKKGKNQYFIVPIYAWQVAENILPDIDCKGYRIDDSYTFCFSL HKYDLIAFQKDEKSKVEFAYYINCDSSNGRFYLAWHDKGSKEQQFRISTQNLVLIQKY QVNELGKEIRPCRLKKRPPVR) (SEQ ID NO:27).
In certain embodiments, the SmuCas9 PID comprises an amino acid sequence set forth in (DNATMVRVDVYTKAGKNYLVPVYVWQVAQGILPNRAVTSGKSEADWDLIDESFEFK FSLSRGDLVEMISNKGRIFGYYNGLDRANGSIGIREHDLEKSKGKDGVHRVGVKTATA FNKYHVDPLGKEIHRCSSEPRPTLKIKSKK) (SEQ ID NO:28).
Described herein are Nme2Cas9 and Nme2SmuCas9 variants with increased genome editing activities (e.g., nuclease and base editing efficiencies) in mammalian cells. Specific amino acid substitutions were selected by rational design and screening that increased editing activities of Nme2Cas9 and Nme2SmuCas9 for both nuclease editing and base editing.
In certain embodiments, one or more amino acid substitutions are introduced into amino acids that contact the target strand (TS) DNA. In certain embodiments, one or more amino acid substitutions are introduced into amino acids that contact the non-target strand (NTS) DNA. In certain embodiments, one or more amino acid substitutions are introduced into amino acids that contact the sgRNA (SG). This is exemplified in
In one aspect, the disclosure provides a Neisseria meningitidis (Nme) 2 Cas9 (Nme2Cas9) variant comprising an amino acid substitution at one or more positions selected from the group consisting of E520, D873, D418, E471, D442, E844, E443, D470, E585, E552, D451, E587, E508, E932, D56, D1048, E1079, D660, E887, T72, and E186. In certain embodiments, the target strand contacting positions correspond to E520, D873, D418, E471, D442, E844, E443, D470, E585, E552, D451, E587, and E508. In certain embodiments, the non-target strand and sgRNA contacting positions correspond to E932, D56, D1048, E1079, D660, E887, T72, and E186. The recited amino acid positions are relative to an amino acid sequence of SEQ ID NO: 1 (WT Nme2Cas9) or SEQ ID NO: 2 (Nme2SmuCas9). All of the recited amino acid positions are present in both Nme2Cas9 and Nme2SmuCas9, with the exception of positions D1048 and E1079, which are only present in the Smu PID of Nme2SmuCas9.
In certain embodiments, the Nme2Cas9 or Nme2SmuCas9 comprises 1, 2, 3, 4, or 5 amino acid substitutions (i.e., 1, 2, 3, 4, or 5 amino acid substitutions from the amino acid positions of E520, D873, D418, E471, D442, E844, E443, D470, E585, E552, D451, E587, E508, E932, D56, D1048, E1079, D660, E887, T72, and E186).
In certain embodiments, the Nme2Cas9 or Nme2SmuCas9 comprises or consists of amino acid substitutions at positions E932 and D873; E932 and D56; E932 and E520; E932 and D1048; D873 and D56; D873 and E520; D873 and D1048; D56 and E520; D56 and D1048; E520 and D1048; E932, D873, and D56; E932, D873, and E520; E932, D873, and D1048; E932, D56, and E520; E932, D56, and D1048; E932, E520, and D1048; D873, D56, and E520; D873, D56, and D1048; D873, E520, and D1048; D56, E520, and D1048; E932, D873, D56, and E520; E932, D873, D56, and D1048; E932, D56, E520, and D1048; D873, D56, E520, and D1048; or E932, D873, D56, E520, and D1048.
In certain embodiments, the amino acid substitution is a positively charged amino acid. In certain embodiments, the amino acid substitution is an arginine (R), lysine (K), or histidine (H). In certain embodiments, the amino acid substitution is an arginine (R). In certain embodiments, the Nme2Cas9 or Nme2SmuCas9 comprises an amino acid substitution of any one or more of E520R, D873R, D418R, E471R, D442R, E844R, E443R, D470R, E585R, E552R, D451R, E587R, E508R, E932R, D56R, D1048R, E1079R, D660R, E887R, T72R, and E186R.
In certain embodiments, the Nme2Cas9 or Nme2SmuCas9 comprises an amino acid substitution of any one or more of E520R, D873R, D418R, E471R, D442R, E844R, E443R, E932R, D56R, D1048R, E1079R, D660R, E887R, T72R, and E186R.
In certain embodiments, the Nme2Cas9 or Nme2SmuCas9 comprises amino acid substitutions E932R and D873R; E932R and D56R; E932R and E520R; E932R and D1048R; D873R and D56R; D873R and E520R; D873R and D1048R; D56R and E520R; D56R and D1048R; E520R and D1048R; E932R, D873R, and D56R; E932R, D873R, and E520R; E932R, D873R, and D1048R; E932R, D56R, and E520R; E932R, D56R, and D1048R; E932R, E520R, and D1048R; D873R, D56R, and E520R; D873R, D56R, and D1048R; D873R, E520R, and D1048R; D56R, E520R, and D1048R; E932R, D873R, D56R, and E520R; E932R, D873R, D56R, and D1048R; E932R, D56R, E520R, and D1048R; D873R, D56R, E520R, and D1048R; or E932R, D873R, D56R, E520R, and D1048R.
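The substitution notation used above (e.g., E520R: wild-type residue, 1-based position, replacement residue) can be applied programmatically. The following Python sketch is illustrative only: the function name is hypothetical, and the five-residue peptide is a toy sequence rather than any sequence of the disclosure. In practice the input would be a full-length sequence numbered relative to SEQ ID NO: 1 or SEQ ID NO: 2.

```python
import re


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply substitutions written as <wild-type><position><new residue>.

    Positions are 1-based. Each substitution is checked against the
    residue actually present, so a numbering mismatch raises an error
    instead of silently editing the wrong residue.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


toy_peptide = "MDAEK"  # hypothetical peptide, not from the disclosure
toy_variant = apply_substitutions(toy_peptide, ["D2R", "E4R"])
```

The wild-type check is a design choice for this sketch: it guards against off-by-one numbering errors when the same substitution list is applied to Nme2Cas9 and Nme2SmuCas9 sequences.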
The Nme2Cas9 and Nme2SmuCas9 variants described herein may serve as the Cas9 domain of a base editor fusion protein. Nucleotide base editors (NBEs), such as cytosine and adenine base editors (CBEs and ABEs), were developed as a way to precisely correct point mutations without inducing double-strand breaks or requiring a DNA donor. Base editor fusion proteins are comprised of a catalytically impaired Cas9 domain that is completely inactive or cleaves only one strand (a.k.a. dead/dCas9 or nickase/nCas9, respectively) fused to one or more cytosine deaminase (CBE) or adenine deaminase (ABE) domains. For efficient base editing to occur, the Cas9 base editor fusion must recognize a short sequence motif, called a PAM, adjacent to the target site, and a target nucleotide (e.g., a target adenine for an ABE) within an “editing window” upstream of the PAM. The PAM and editing window are defined by the Cas domain, deaminase, and the type of fusion between the two effectors.
In certain embodiments, the Nme2Cas9 and Nme2SmuCas9 variants of the disclosure further comprise a nucleotide base editor (NBE) domain fused to the Nme2Cas9 variant or Nme2SmuCas9 variant.
In certain embodiments, the NBE domain (i.e., an inlaid NBE domain or terminal NBE domain) is an adenine base editor (ABE) domain. In certain embodiments, the ABE domain is an inlaid adenosine deaminase protein domain. In certain embodiments, the adenosine deaminase protein domain is an adenosine deaminase 8e protein domain (TadA8e). In certain embodiments, the TadA8e comprises an amino acid sequence set forth in SEQ ID NO: 9 (SEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAH AEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKRGAA GSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPRQVFNAQKKAQSSIN), or an amino acid sequence comprising at least 80% identity (i.e., 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity) to SEQ ID NO: 9.
In certain embodiments, the NBE domain (i.e., an inlaid NBE domain or terminal NBE domain) is a cytidine base editor (CBE) domain. In certain embodiments, the CBE domain is an inlaid cytosine deaminase protein domain. In certain embodiments, the cytosine deaminase protein domain is evoFERNY or rAPOBEC1. In certain embodiments, the evoFERNY comprises an amino acid sequence set forth in SEQ ID NO: 13 (FERNYDPRELRKETYLLYEIKWGKSGKLWRHWCQNNRTQHAEVYFLENIFNARRFNP STHCSITWYLSWSPCAECSQKIVDFLKEHPNVNLEIYVARLYYPENERNRQGLRDLVN SGVTIRIMDLPDYNYCWKTFVSDQGGDEDYWPGHFAPWIKQYSLKL) or an amino acid sequence comprising at least 80% identity (i.e., 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity) to SEQ ID NO: 13.
In certain embodiments, the rAPOBEC1 comprises an amino acid sequence set forth in SEQ ID NO: 11 (SSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHSIWRHTSQNTNK HVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYH HADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLY VLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLK) or an amino acid sequence comprising at least 80% identity (i.e., 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity) to SEQ ID NO: 11.
Where the percent identity of any one of SEQ ID NO: 9, 11, and 13 is less than 100%, it will be understood that the NBE domain will retain the base editing activity described herein.
In certain embodiments, the Nme2Cas9 variant or Nme2SmuCas9 variant further comprises a uracil glycosylase inhibitor (UGI). A UGI may be expressed as a separate protein or may be linked to the fusion protein comprising the Nme2Cas9 protein and NBE domain. The UGI is capable of enhancing the base editing activity of a CBE domain. The CBE domain mediates a C to T change by creating a U on the free DNA strand. This U may be transformed into an apurinic/apyrimidinic (AP) site by various DNA glycosylases. A UGI may prevent the transformation of the U into an AP site.
In certain embodiments, the NBE domain is an inlaid NBE domain inserted into the Nme2Cas9 variant or Nme2SmuCas9 variant.
In certain embodiments, the inlaid NBE domain is inserted into a REC domain of the Nme2Cas9 variant or Nme2SmuCas9 variant.
In certain embodiments, the inlaid NBE domain is inserted into an HNH domain of the Nme2Cas9 variant or Nme2SmuCas9 variant.
In certain embodiments, the inlaid NBE domain is inserted into a RuvC domain of the Nme2Cas9 variant or Nme2SmuCas9 variant.
In certain embodiments, the inlaid NBE domain is inserted between amino acid position 291 and amino acid position 292 of the Nme2Cas9 variant or Nme2SmuCas9 variant, relative to an amino acid sequence of SEQ ID NO: 1 or 2. Base editor fusion proteins with an inlaid NBE domain inserted between amino acid position 291 and amino acid position 292 are referred to herein as NBE-i1 base editors (such as ABE8e-i1).
In certain embodiments, the inlaid NBE domain is inserted between amino acid position 761 and amino acid position 762 of the Nme2Cas9 variant or Nme2SmuCas9 variant, relative to an amino acid sequence of SEQ ID NO: 1 or 2. Base editor fusion proteins with an inlaid NBE domain inserted between amino acid position 761 and amino acid position 762 are referred to herein as NBE-i7 base editors (such as ABE8e-i7).
In certain embodiments, the inlaid NBE domain is inserted between amino acid position 795 and amino acid position 796 of the Nme2Cas9 variant or Nme2SmuCas9 variant, relative to an amino acid sequence of SEQ ID NO: 1 or 2. Base editor fusion proteins with an inlaid NBE domain inserted between amino acid position 795 and amino acid position 796 are referred to herein as NBE-i8 base editors (such as ABE8e-i8).
The inlaid NBE domain may be flanked at the NBE domain N-terminus and/or NBE domain C-terminus by an amino acid linker. In other embodiments, the NBE domain may be directly linked (i.e., no amino acid linker) to the Nme2Cas9 variant or Nme2SmuCas9 variant at the inlaid position (i.e., between amino acid positions 291 and 292, 761 and 762, or 795 and 796).
In certain embodiments, the amino acid linker comprises a (GGS)n(SEQ ID NO:41) linker, wherein n corresponds to 1-7. In certain embodiments, the amino acid linker comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15). In certain embodiments, the amino acid linker comprises GGS. In certain embodiments, the amino acid linker comprises GGSGGS(SEQ ID NO:36). In certain embodiments, the amino acid linker comprises GGSGGSGGS(SEQ ID NO:37). In certain embodiments, the amino acid linker comprises GGSGGSGGSGGS(SEQ ID NO:38). In certain embodiments, the amino acid linker comprises GGSGGSGGSGGSGGS(SEQ ID NO:39). In certain embodiments, the amino acid linker comprises SGGSGGSGGS(SEQ ID NO: 17). In certain embodiments, the amino acid linker comprises GGSGG(SEQ ID NO: 19).
In certain embodiments, the amino acid linker consists of the six hydrophilic, chemically stable amino acids A, E, G, P, S and T. In certain embodiments, the amino acid linker comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21). In certain embodiments, the amino acid linker comprises ETPGTSESAT(SEQ ID NO: 23). In certain embodiments, the amino acid linker comprises GTSES(SEQ ID NO: 25).
In certain embodiments, the amino acid linker comprises a sequence selected from the group consisting of: GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), and GTSES(SEQ ID NO: 25).
In certain embodiments, the amino acid linker comprises ED.
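For illustration, the (GGS)n linker family recited above, with n from 1 to 7, can be enumerated with a few lines of Python. The function name is hypothetical. The sketch also shows that the 20-residue linker of SEQ ID NO: 15, as written above, corresponds to six GGS repeats followed by GG.

```python
def ggs_linker(n: int) -> str:
    """Return the (GGS)n linker for n between 1 and 7, inclusive."""
    if not 1 <= n <= 7:
        raise ValueError("n must be between 1 and 7")
    return "GGS" * n


# All seven members of the family, keyed by repeat count.
ggs_linkers = {n: ggs_linker(n) for n in range(1, 8)}

# SEQ ID NO: 15 as written in the text: six GGS repeats plus a trailing GG.
seq_id_15 = ggs_linker(6) + "GG"
```

Each added GGS repeat lengthens the linker by three residues, which becomes relevant wherever transgene size is constrained.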
In certain embodiments, the inlaid NBE domain is flanked at the inlaid NBE domain N-terminus by an amino acid linker and the inlaid NBE domain C-terminus lacks an amino acid linker. In certain embodiments, the inlaid NBE domain is flanked at the inlaid NBE domain C-terminus by an amino acid linker and the inlaid NBE domain N-terminus lacks an amino acid linker. In certain embodiments, the amino acid linker at the inlaid NBE domain N-terminus is different than the amino acid linker at the inlaid NBE domain C-terminus. In certain embodiments, the amino acid linker at the inlaid NBE domain N-terminus is identical to the amino acid linker at the inlaid NBE domain C-terminus.
In certain embodiments, the inlaid NBE domain is flanked at the inlaid NBE domain N-terminus by GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15) and at the inlaid NBE domain C-terminus by GSSGSETPGTSESATPESSG(SEQ ID NO: 21). In certain embodiments, the inlaid NBE domain is flanked at the inlaid NBE domain N-terminus by GSSGSETPGTSESATPESSG(SEQ ID NO: 21) and at the inlaid NBE domain C-terminus by GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15).
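The inlaid architecture described above, in which an NBE domain optionally flanked by linkers is inserted between two adjacent Cas9 positions, can be sketched as simple string assembly. The following Python code is an assumption-laden illustration: the function name and the toy sequences are hypothetical. For an i1-type insertion, `after_position` would be 291 on a sequence numbered relative to SEQ ID NO: 1 or 2.

```python
def build_inlaid_fusion(cas9: str, domain: str, after_position: int,
                        n_linker: str = "", c_linker: str = "") -> str:
    """Insert `domain` between residues after_position and after_position + 1.

    Positions are 1-based. Either linker may be left empty to model a
    direct fusion on that side of the inlaid domain.
    """
    if not 1 <= after_position < len(cas9):
        raise ValueError("insertion must fall between two existing residues")
    return (
        cas9[:after_position]   # Cas9 residues 1..after_position
        + n_linker              # linker at the domain N-terminus
        + domain                # inlaid domain
        + c_linker              # linker at the domain C-terminus
        + cas9[after_position:] # remaining Cas9 residues
    )


# Toy sequences (hypothetical): insert "SEV" between residues 3 and 4.
toy_fusion = build_inlaid_fusion("MAAKRPLV", "SEV", 3,
                                 n_linker="GGS", c_linker="GTSES")
```

Leaving both linker arguments empty models the directly linked embodiment, and supplying only one models the embodiments in which a single terminus of the inlaid domain carries a linker.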
In certain embodiments, the NBE domain is linked via an amino acid linker to the N-terminus of the Nme2Cas9 variant or Nme2SmuCas9 variant (i.e., not an inlaid NBE domain).
In certain embodiments, the NBE domain is linked via an amino acid linker to the C-terminus of the Nme2Cas9 variant or Nme2SmuCas9 variant (i.e., not an inlaid NBE domain).
In certain embodiments, the amino acid linker comprises a (GGS)n(SEQ ID NO:41) linker, wherein n corresponds to 1-7. In certain embodiments, the amino acid linker comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15). In certain embodiments, the amino acid linker comprises GGS. In certain embodiments, the amino acid linker comprises GGSGGS(SEQ ID NO:36). In certain embodiments, the amino acid linker comprises GGSGGSGGS(SEQ ID NO:37). In certain embodiments, the amino acid linker comprises GGSGGSGGSGGS(SEQ ID NO:38). In certain embodiments, the amino acid linker comprises GGSGGSGGSGGSGGS(SEQ ID NO:39). In certain embodiments, the amino acid linker comprises SGGSGGSGGS(SEQ ID NO: 17). In certain embodiments, the amino acid linker comprises GGSGG(SEQ ID NO: 19).
In certain embodiments, the amino acid linker consists of the six hydrophilic, chemically stable amino acids A, E, G, P, S and T. In certain embodiments, the amino acid linker comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21). In certain embodiments, the amino acid linker comprises ETPGTSESAT(SEQ ID NO: 23). In certain embodiments, the amino acid linker comprises GTSES(SEQ ID NO: 25).
In certain embodiments, the amino acid linker comprises a sequence selected from the group consisting of: GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), and GTSES(SEQ ID NO: 25).
In certain embodiments, the amino acid linker comprises ED.
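The linker families described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`ggs_linker`, `uses_only_aegpst`, `LISTED_LINKERS`) are hypothetical, and only the sequences, the SEQ ID NO numbers, and the n = 1-7 range are taken from the text.

```python
# Hypothetical helpers; only the sequences and the n = 1-7 range come from the text.
AEGPST = set("AEGPST")  # the six residues named for the second linker family

# Linker sequences keyed by SEQ ID NO, as listed in the text.
LISTED_LINKERS = {
    15: "GGSGGSGGSGGSGGSGGSGG",
    17: "SGGSGGSGGS",
    19: "GGSGG",
    21: "GSSGSETPGTSESATPESSG",
    23: "ETPGTSESAT",
    25: "GTSES",
}


def ggs_linker(n: int) -> str:
    """Return the (GGS)n repeat for n between 1 and 7 inclusive."""
    if not 1 <= n <= 7:
        raise ValueError("n must be between 1 and 7")
    return "GGS" * n


def uses_only_aegpst(linker: str) -> bool:
    """Return True if every residue is one of A, E, G, P, S, or T."""
    return set(linker) <= AEGPST


# Report the length and composition check for each listed linker.
for seq_id, linker in LISTED_LINKERS.items():
    print(seq_id, len(linker), uses_only_aegpst(linker))
```

Under this sketch, SEQ ID NO: 15 is six GGS repeats followed by GG, which is why it is stored as a literal rather than generated by `ggs_linker`.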
Adeno-associated viruses (AAVs) are useful viral vectors for the delivery of therapeutic proteins to subjects. However, the packaging size limit of an AAV is 4.8 kb to 5.0 kb, which includes the 5′ ITR and 3′ ITR sequences, the promoter sequence, and the terminator sequence. The closer the AAV vector size is to 5.0 kb, the worse AAV packaging becomes. By way of example, an AAV9 Nme2Cas9-ABE-i1 has a vector size of ~4.9 kb, right against the packaging limit, with the Nme2Cas9-ABE-i1 transgene contributing 3987 bp to the vector size. The Nme2SmuCas9-ABE-i1 is 4011 bp, which is 24 bp (8 amino acids) larger than the Nme2Cas9-ABE-i1. Accordingly, there exists a need to reduce the transgene size of the inlaid base editors described herein to improve AAV compatibility without sacrificing base editor activity. To achieve this result, the instant disclosure describes the optimization of amino acid linker length between the inlaid NBE domain and the Nme2Cas9.
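The size arithmetic in the preceding paragraph can be checked with a short sketch. The two transgene sizes are taken from the text; the function names and the example linker lengths passed to `bp_saved_by_linker_trim` are illustrative assumptions, not values asserted by the disclosure for any particular construct.

```python
# One codon (3 bp) encodes one amino acid residue.
BP_PER_RESIDUE = 3

NME2_ABE_I1_BP = 3987     # Nme2Cas9-ABE-i1 transgene size, from the text
NME2SMU_ABE_I1_BP = 4011  # Nme2SmuCas9-ABE-i1 transgene size, from the text


def residues_from_bp(bp: int) -> int:
    """Convert a coding-length difference in bp to whole amino acid residues."""
    if bp % BP_PER_RESIDUE:
        raise ValueError("bp difference is not a whole number of codons")
    return bp // BP_PER_RESIDUE


def bp_saved_by_linker_trim(old_n: int, old_c: int, new_n: int, new_c: int) -> int:
    """bp removed from the transgene when N- and C-terminal linkers are shortened.

    Arguments are linker lengths in residues (hypothetical inputs).
    """
    return BP_PER_RESIDUE * ((old_n + old_c) - (new_n + new_c))


size_gap_bp = NME2SMU_ABE_I1_BP - NME2_ABE_I1_BP
print(size_gap_bp, residues_from_bp(size_gap_bp))
print(bp_saved_by_linker_trim(20, 20, 20, 0))
```

For instance, assuming a construct that starts with two 20-residue linkers, removing one of them entirely would save 60 bp, more than the 24 bp gap between the two editors.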
In one aspect, the disclosure provides a fusion protein comprising a Neisseria meningitidis (Nme) 2 Cas9 (Nme2Cas9) protein and an inlaid nucleotide base editor (NBE) domain, wherein the inlaid NBE domain is flanked at the inlaid NBE domain N-terminus and/or C-terminus by an amino acid linker, or a linker is absent, and wherein the total number of amino acid linker residues is less than 40 amino acids.
Non-limiting examples of the Nme2Cas9 protein include a WT Nme2Cas9 protein, a chimeric Nme2SmuCas9 protein described herein, an Nme2Cas9 variant described herein, or a Nme2SmuCas9 variant described herein.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 20 amino acids, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 19 amino acids, 18 amino acids, 17 amino acids, 16 amino acids, 15 amino acids, 14 amino acids, 13 amino acids, 12 amino acids, 11 amino acids, 10 amino acids, 9 amino acids, 8 amino acids, 7 amino acids, 6 amino acids, 5 amino acids, 4 amino acids, 3 amino acids, 2 amino acids, 1 amino acid, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 20 amino acids, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 10 amino acids, 5 amino acids, or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, and the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 19 amino acids, 18 amino acids, 17 amino acids, 16 amino acids, 15 amino acids, 14 amino acids, 13 amino acids, 12 amino acids, 11 amino acids, 10 amino acids, 9 amino acids, 8 amino acids, 7 amino acids, 6 amino acids, 5 amino acids, 4 amino acids, 3 amino acids, 2 amino acids, 1 amino acid, or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, and the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 10 amino acids, 5 amino acids, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 10 amino acids, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, 19 amino acids, 18 amino acids, 17 amino acids, 16 amino acids, 15 amino acids, 14 amino acids, 13 amino acids, 12 amino acids, 11 amino acids, 10 amino acids, 9 amino acids, 8 amino acids, 7 amino acids, 6 amino acids, 5 amino acids, 4 amino acids, 3 amino acids, 2 amino acids, 1 amino acid, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 10 amino acids, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, 10 amino acids, 5 amino acids, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 5 amino acids, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, 19 amino acids, 18 amino acids, 17 amino acids, 16 amino acids, 15 amino acids, 14 amino acids, 13 amino acids, 12 amino acids, 11 amino acids, 10 amino acids, 9 amino acids, 8 amino acids, 7 amino acids, 6 amino acids, 5 amino acids, 4 amino acids, 3 amino acids, 2 amino acids, 1 amino acid, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is 5 amino acids, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, 10 amino acids, 5 amino acids, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is absent, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, 19 amino acids, 18 amino acids, 17 amino acids, 16 amino acids, 15 amino acids, 14 amino acids, 13 amino acids, 12 amino acids, 11 amino acids, 10 amino acids, 9 amino acids, 8 amino acids, 7 amino acids, 6 amino acids, 5 amino acids, 4 amino acids, 3 amino acids, 2 amino acids, 1 amino acid, or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain is absent, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain is 20 amino acids, 10 amino acids, 5 amino acids, or is absent.
In certain embodiments, the amino acid linker comprises a sequence selected from the group consisting of: GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), and GTSES(SEQ ID NO: 25).
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises SGGSGGSGGS(SEQ ID NO: 17), and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGG(SEQ ID NO: 19), and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker is absent at the N-terminus of the inlaid NBE domain, and the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), ETPGTSESAT(SEQ ID NO: 23), GTSES(SEQ ID NO: 25), or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GSSGSETPGTSESATPESSG(SEQ ID NO: 21), and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises ETPGTSESAT(SEQ ID NO: 23), and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
In certain embodiments, the amino acid linker that is present at the C-terminus of the inlaid NBE domain comprises GTSES(SEQ ID NO: 25), and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
In certain embodiments, the amino acid linker is absent at the C-terminus of the inlaid NBE domain, and the amino acid linker that is present at the N-terminus of the inlaid NBE domain comprises GGSGGSGGSGGSGGSGGSGG(SEQ ID NO: 15), SGGSGGSGGS(SEQ ID NO: 17), GGSGG(SEQ ID NO: 19), or is absent.
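The linker pairings enumerated above can be generated programmatically. The following is a minimal sketch, assuming the N-terminal linker is drawn from the GGS-type sequences (SEQ ID NOs: 15, 17, 19) or is absent, and the C-terminal linker is drawn from SEQ ID NOs: 21, 23, 25 or is absent, as in the preceding paragraphs. An empty string stands for an absent linker, and the names `N_TERM_OPTIONS`, `C_TERM_OPTIONS`, and `allowed_pairs` are hypothetical.

```python
from itertools import product

# "" represents an absent linker.
N_TERM_OPTIONS = ["GGSGGSGGSGGSGGSGGSGG", "SGGSGGSGGS", "GGSGG", ""]
C_TERM_OPTIONS = ["GSSGSETPGTSESATPESSG", "ETPGTSESAT", "GTSES", ""]

# The total number of linker residues must be less than 40.
MAX_TOTAL_EXCLUSIVE = 40


def allowed_pairs(n_options, c_options, max_total=MAX_TOTAL_EXCLUSIVE):
    """Return (N-terminal, C-terminal) linker pairs whose combined length is below max_total."""
    return [
        (n_link, c_link)
        for n_link, c_link in product(n_options, c_options)
        if len(n_link) + len(c_link) < max_total
    ]


# Print each permitted pairing with its linker lengths.
for n_link, c_link in allowed_pairs(N_TERM_OPTIONS, C_TERM_OPTIONS):
    print(len(n_link), len(c_link), n_link or "(absent)", c_link or "(absent)")
```

Of the 16 possible pairings in this sketch, only the 20 + 20 combination is excluded by the less-than-40 constraint.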
Any of the Nme2Cas9 proteins described herein (i.e., WT Nme2Cas9, Nme2SmuCas9, Nme2Cas9 variants, Nme2SmuCas9 variants, and base editor fusions of the same), may further comprise one or more nuclear localization signals (NLS).
In certain embodiments, the NLS is any one or more of a nucleoplasmin NLS, an SV40 NLS or a C-myc NLS.
In certain embodiments, the NLS comprises an amino acid sequence selected from the group consisting of MKRTADGSEFESPKKKRKV(SEQ ID NO:30), KRTADGSEFEPKKKRKV(SEQ ID NO:31), MKRPAATKKAGQAKKKK(SEQ ID NO:32), KRPAATKKAGQAKKKK(SEQ ID NO:33), MPKKKRKV(SEQ ID NO:34), or PKKKRKV(SEQ ID NO:35).
In certain embodiments, the one or more NLS are positioned at the N-terminus and/or C-terminus of the Nme2Cas9 protein (i.e., WT Nme2Cas9, Nme2SmuCas9, Nme2Cas9 variant, Nme2SmuCas9 variant, and base editor fusions of the same).
Cas9 enzymes use their HNH and RuvC domains to cleave the guide-complementary and non-complementary strand of the target DNA, respectively. Cas9 nickases (nCas9s), in which either the HNH or RuvC domain is mutationally inactivated, have been used to induce homology-directed repair (HDR) and to improve genome editing specificity via DSB induction by dual nickases (Mali et al., 2013a; Ran et al., 2013).
Nme2Cas9 nickases include Nme2Cas9D16A (HNH nickase) and Nme2Cas9H588A (RuvC nickase), which possess alanine mutations in catalytic residues of the RuvC and HNH domains, respectively (Esvelt et al., 2013; Hou et al., 2013; Zhang et al., 2013).
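The relationship between the two nickase mutations, the domain each inactivates, and the strand that remains cleavable can be summarized as a small lookup. This is an illustrative sketch built only from the two preceding paragraphs; the class and variable names are hypothetical.

```python
from dataclasses import dataclass

# Strand cleaved by each catalytic domain, as stated in the text.
STRAND_BY_DOMAIN = {
    "HNH": "guide-complementary",
    "RuvC": "non-complementary",
}


@dataclass(frozen=True)
class Nickase:
    mutation: str            # alanine substitution in a catalytic residue
    inactivated_domain: str  # domain carrying the mutation

    @property
    def active_domain(self) -> str:
        """The remaining catalytically active domain."""
        return "HNH" if self.inactivated_domain == "RuvC" else "RuvC"

    @property
    def nicked_strand(self) -> str:
        """The strand still cleaved by the active domain."""
        return STRAND_BY_DOMAIN[self.active_domain]


NME2_NICKASES = [
    Nickase("D16A", inactivated_domain="RuvC"),   # HNH nickase
    Nickase("H588A", inactivated_domain="HNH"),   # RuvC nickase
]

for nickase in NME2_NICKASES:
    print(nickase.mutation, nickase.active_domain, nickase.nicked_strand)
```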
As used herein, the term “guide RNA” or “gRNA” refers to any nucleic acid that promotes the specific association (or “targeting”) of an RNA-guided nuclease such as a Cas9 to a target sequence (e.g., a genomic or episomal sequence) in a cell.
As used herein, a “modular” or “dual RNA” guide comprises more than one, and typically two, separate RNA molecules, such as a CRISPR RNA (crRNA) and a trans-activating crRNA (tracrRNA), which are usually associated with one another, for example by duplexing. gRNAs and their component parts are described throughout the literature (see, e.g., Briner et al. Mol. Cell, 56(2), 333-339 (2014), which is incorporated by reference).
As used herein, a “unimolecular gRNA,” “chimeric gRNA,” or “single guide RNA (sgRNA)” comprises a single RNA molecule. The sgRNA may be a crRNA and tracrRNA linked together. For example, the 3′ end of the crRNA may be linked to the 5′ end of the tracrRNA. A crRNA and a tracrRNA may be joined into a single unimolecular or chimeric gRNA, for example, by means of a four nucleotide (e.g., GAAA) “tetraloop” or “linker” sequence bridging complementary regions of the crRNA (at its 3′ end) and the tracrRNA (at its 5′ end).
As used herein, a “repeat” sequence or region is a nucleotide sequence at or near the 3′ end of the crRNA which is complementary to an anti-repeat sequence of a tracrRNA.
As used herein, an “anti-repeat” sequence or region is a nucleotide sequence at or near the 5′ end of the tracrRNA which is complementary to the repeat sequence of a crRNA.
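The layout of a unimolecular guide described above (guide sequence, repeat, GAAA tetraloop, anti-repeat, remainder of the tracrRNA) can be sketched as simple string assembly. All sequences below are toy placeholders, not Nme2Cas9 guide sequences, and the function names are hypothetical. The duplex check assumes perfect Watson-Crick pairing, which is a simplification, since natural repeat:anti-repeat duplexes may contain bulges or wobble pairs.

```python
# RNA complement table for Watson-Crick pairs only (simplifying assumption).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def is_perfect_duplex(repeat: str, anti_repeat: str) -> bool:
    """True if the anti-repeat pairs with the repeat at every position."""
    return anti_repeat == reverse_complement(repeat)


def assemble_sgrna(guide: str, repeat: str, anti_repeat: str,
                   tracr_tail: str = "", loop: str = "GAAA") -> str:
    """Join the crRNA 3' end to the tracrRNA 5' end through a tetraloop."""
    crrna = guide + repeat                # guide at the 5' end, repeat at the 3' end
    tracrrna = anti_repeat + tracr_tail   # anti-repeat at the 5' end
    return crrna + loop + tracrrna


# Toy placeholder sequences (hypothetical, for illustration only).
toy_guide = "ACGU" * 5
toy_repeat = "GUUGUAGC"
toy_anti_repeat = reverse_complement(toy_repeat)

print(assemble_sgrna(toy_guide, toy_repeat, toy_anti_repeat))
```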
Additional details regarding guide RNA structure and function, including the gRNA/Cas9 complex for genome editing, may be found in, at least, Mali et al. Science, 339(6121), 823-826 (2013); Jiang et al. Nat. Biotechnol. 31(3), 233-239 (2013); Jinek et al. Science, 337(6096), 816-821 (2012); and Sun et al. Mol. Cell, 76, 938-952 (2019), each of which is incorporated herein by reference.
As used herein, a “guide sequence” or “targeting sequence” refers to the nucleotide sequence of a gRNA, whether unimolecular or modular, that is fully or partially complementary to a target domain or target polynucleotide within a DNA sequence in the genome of a cell where editing is desired. Guide sequences are typically 10-30 nucleotides in length, preferably 16-26 nucleotides in length (for example, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 26 nucleotides in length), and are at or near the 5′ terminus of a Cas9 gRNA.
As used herein, a “target domain” or “target polynucleotide sequence” is the DNA sequence in a genome of a cell that is complementary to the guide sequence of the gRNA.
In addition to the targeting domains, gRNAs typically include a plurality of domains that influence the formation or activity of gRNA/Cas9 complexes. For example, as mentioned above, the duplexed structure formed by the first and second complementarity domains of a gRNA (also referred to as a repeat:anti-repeat duplex) interacts with the recognition (REC) lobe of Cas9 and may mediate the formation of Cas9/gRNA complexes (Nishimasu et al. Cell 156: 935-949 (2014); Nishimasu et al. Cell 162(2), 1113-1126 (2015); Sun et al., supra, each incorporated by reference herein). It should be noted that the first and/or second complementarity domains can contain one or more poly-U tracts, which can be recognized by RNA polymerases as a termination signal. The sequences of the first and second complementarity domains are, therefore, optionally modified to eliminate these tracts and promote the complete in vitro transcription of gRNAs, for example through the use of A-G swaps as described in Briner 2014, or A-U swaps. These and other similar modifications to the first and second complementarity domains are within the scope of the present disclosure.
Along with the first and second complementarity domains, Cas9 gRNAs typically include two or more additional duplexed regions that are necessary for nuclease activity in vivo but not necessarily in vitro (Nishimasu 2015, supra; Sun et al., supra). A first stem-loop near the 3′ portion of the second complementarity domain is referred to variously as the “proximal domain,” “stem loop 1” (Nishimasu 2014, supra; Nishimasu 2015, supra; Sun et al., supra) and the “nexus” (Briner 2014, supra). One or more additional stem loop structures are generally present near the 3′ end of the gRNA, with the number varying by species: N. meningitidis gRNAs typically include two 3′ stem loops (for a total of four stem loop structures including the repeat: anti-repeat duplex), while S. aureus and other species have only one (for a total of three). A description of conserved stem loop structures (and gRNA structures more generally) organized by species is provided in Briner 2014, which is incorporated herein by reference. Additional details regarding guide RNAs generally may be found in WO2018026976A1, which is incorporated herein by reference.
In certain embodiments, the gRNA comprises: (a) a crRNA portion comprising (i) a guide sequence capable of hybridizing to a target polynucleotide sequence, and (ii) a repeat sequence; and (b) a tracrRNA portion comprising an anti-repeat nucleotide sequence that is complementary to the repeat sequence.
In certain embodiments, the gRNA comprises at least one modified nucleotide. Chemically modified guide RNAs of the disclosure contain one or more modified nucleotides comprising a modification in a ribose group, a phosphate group, a nucleobase, or a combination thereof.
Chemical modifications to the ribose group may include, but are not limited to, 2′-O-methyl, 2′-fluoro, 2′-deoxy, 2′-O-(2-methoxyethyl) (MOE), 2′-NH2 (2′-amino), 4′-thio, 2′-O-Allyl, 2′-O-Ethylamine, 2′-O-Cyanoethyl, 2′-O-Acetalester, or a bicyclic nucleotide, such as locked nucleic acid (LNA), 2′-(S)-constrained ethyl (S-cEt), constrained MOE, or 2′-O,4′-C-aminomethylene bridged nucleic acid (2′,4′-BNANC).
The term “4′-thio” as used herein corresponds to a ribose group modification where the sugar ring oxygen of the ribose is replaced with a sulfur.
Chemical modifications to the phosphate group may include, but are not limited to, a phosphorothioate, phosphonoacetate (PACE), thiophosphonoacetate (thioPACE), amide, triazole, phosphonate, or phosphotriester modification.
Chemical modifications to the nucleobase may include, but are not limited to, 2-thiouridine, 4-thiouridine, N6-methyladenosine, pseudouridine, 2,6-diaminopurine, inosine, thymidine, 5-methylcytosine, 5-substituted pyrimidine, isoguanine, isocytosine, or halogenated aromatic groups.
The chemically modified guide RNAs may have one or more chemical modifications in the crRNA portion and/or the tracrRNA portion for a modular or dual RNA guide. The chemically modified guide RNAs may also have one or more chemical modifications in the single guide RNA for the unimolecular guide RNA.
In certain embodiments, the chemically modified Nme2Cas9 gRNA described above further comprises a nucleotide or non-nucleotide loop or linker linking the 3′ end of the crRNA portion to the 5′ end of the tracrRNA portion.
In certain embodiments, the nucleotide loop is chemically modified. In certain embodiments, the nucleotide loop comprises the nucleotide sequence of GAAA. In certain embodiments, the nucleotide loop comprises the nucleotide sequence of (mG)(mA)(mA)(mA), wherein mN corresponds to a 2′-O-methyl RNA and N corresponds to any nucleotide.
In certain embodiments, the non-nucleotide linker comprises an azide linker, an ethylene glycol oligomer, a tetrazine linker, an alkyl chain, a peptide, an amide, or a carbamate (see, e.g., Pils et al. Nucleic Acids Res. 28(9): 1859-1863 (2000)).
In one aspect, the disclosure provides a chemically modified Neisseria meningitidis (Nme) single guide RNA (sgRNA) comprising one or more chemical modifications.
The activity of a guide RNA can be readily determined by any means known in the art. In an embodiment, % activity is measured with the traffic light reporter (TLR) Multi-Cas Variant 1 system (TLR-MCV1), described below. The TLR-MCV1 system will provide a % fluorescent cells which is a measure of % activity.
Nme2Cas9 gRNAs and sgRNAs are described in further detail in WO2023064813, incorporated herein by reference.
FCKVDKKGKNQYFIVPIYAWQVAENILPDIDCKGYRIDDSYTFC
FSLHKYDLIAFQKDEKSKVEFAYYINCDSSNGRFYLAWHDKGS
KEQQFRISTONLVLIQKYQVNELGKEIRPCRLKKRPPVR
(SEQ
YTKAGKNYLVPVYVWQVAQGILPNRAVTSGKSEADWDLIDESF
EFKFSLSRGDLVEMISNKGRIFGYYNGLDRANGSIGIREHDLEK
SKGKDGVHRVGVKTATAFNKYHVDPLGKEIHRCSSEPRPTLKI
KSKK
(SEQ ID NO: 2)
EVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGL
VMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKR
GAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPR
QVFNAQKKAQSSIN
X
nGSERPLTDTERATLMDEPYRKSKLTYAQA
EVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGL
VMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNSKR
GAAGSLMNVLNYPGMNHRVEITEGILADECAALLCDFYRMPR
QVFNAQKKAQSSIN
X
nEFEEADTPEKLRTLLAEKLSSRPEAVHEYV
While several experimental Examples are contemplated, these Examples are intended to be non-limiting.
Nucleotide sequences of Nme2Cas9 and Nme2SmuCas9 base editors are provided in Table 1 and
In Vitro mRNA Synthesis
mRNAs used herein were in vitro transcribed as described by Zhang (see Zhang, H. et al. Adenine base editing in vivo with a single adeno-associated virus vector. GEN Biotechnol. 1, 285-299 (2022)), using the HiScribe T7 RNA synthesis kit (NEB #E2040S). In brief, 500 ng of linearized plasmid template was used for the reaction, with complete substitution of uridine with 1-methylpseudouridine and inclusion of CleanCap AG analog (N-1081 and N-7113, TriLink Biotechnologies).
Mouse N2A (ATCC #CCL-131) and HEK293T (ATCC #CRL-3216) cells and their reporter-transduced derivatives were cultured in Dulbecco's Modified Eagle's Medium (DMEM; Genesee Scientific #25-500) supplemented with 10% fetal bovine serum (FBS; Gibco #26140079). All cells were incubated at 37° C. with 5% CO2. For plasmid transfections, cells were seeded in 96-well plates at ~15,000 cells per well and incubated overnight. The following day, cells were transfected with plasmid DNA using Lipofectamine 2000 (ThermoFisher #11668019) following the manufacturer's protocol. For editing the mCherry reporter and endogenous target sites, 100 ng of effector plasmid and 100 ng of sgRNA plasmid were transfected with 0.75 μl Lipofectamine 2000. For the orthogonal R-loop assay, 125 ng of each effector and each sgRNA was used with 0.75 μl Lipofectamine 2000. For editing experiments with amplicon sequencing analysis, genomic DNA was extracted from cells 72 h post-transfection with QuickExtract (Lucigen #QE0905) following the manufacturer's protocol.
Rett syndrome PDFs were obtained from the Rett Syndrome Research Trust and cultured in Dulbecco's Modified Eagle's Medium (DMEM; Genesee Scientific #25-500) supplemented with 15% fetal bovine serum (Gibco #26140079) and 1× nonessential amino acids (Gibco #11140050). These cells were also incubated at 37° C. with 5% CO2. PDF electroporations were performed using the Neon Transfection System 10 μl kit (ThermoFisher #MPK1096) as described by Zhang (see Zhang, H. et al. Adenine base editing in vivo with a single adeno-associated virus vector. GEN Biotechnol. 1, 285-299 (2022)). A total of 500 ng ABE mRNA and 100 pmol sgRNA were electroporated into ~50,000 PDF cells. At 48 h post-electroporation, genomic DNA was extracted with QuickExtract (Lucigen #QE09050) for amplicon sequencing.
At 72 h post-transfection, cells were trypsinized, collected, and washed with FACS buffer (chilled PBS and 3% fetal bovine serum). Cells were resuspended in 300 μl FACS buffer for flow cytometry analysis using the MACSQuant VYB system. A total of 10,000 cells per sample were counted for analysis with FlowJo v10.
Amplicon sequencing, library preparation, and analysis were performed as described by Zhang (see Zhang, H. et al. Adenine base editing in vivo with a single adeno-associated virus vector. GEN Biotechnol. 1, 285-299 (2022)). Briefly, Q5 High-Fidelity polymerase (NEB #M0492) was used to amplify genomic DNA for library preparation, and libraries were pooled and purified twice after gel extraction with the Zymo gel extraction kit and DNA Clean and Concentrator (Zymo Research #11-301 and #11-303). Pooled amplicons were then sequenced on an Illumina MiniSeq system (300 cycles, Illumina sequencing kit #FC-420-1004) following the manufacturer's protocol. Sequencing data were analyzed with CRISPResso2 (see Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224-226 (2019)) (version 2.0.40) in BE output batch mode with the following flags: -w 12, -wc -12, -q 30.
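As a rough illustration, the analysis settings named above could be assembled into a command line as follows. This is a sketch under assumptions: it takes "BE output batch mode" to mean the `CRISPRessoBatch` entry point run with `--base_editor_output`, and the batch settings file name is a hypothetical placeholder. The three numeric flags are the ones stated in the text. The function only builds the argument list and does not run the tool.

```python
from typing import List


def crispresso_batch_command(batch_settings: str) -> List[str]:
    """Build an argument list for a CRISPResso2 batch run (assumed invocation)."""
    return [
        "CRISPRessoBatch",                  # assumed entry point for batch mode
        "--batch_settings", batch_settings, # hypothetical settings file
        "--base_editor_output",             # assumed meaning of "BE output"
        "-w", "12",                         # flag values from the text
        "-wc", "-12",
        "-q", "30",
    ]


# "batch_settings.tsv" is a placeholder path, not a file named in the source.
print(" ".join(crispresso_batch_command("batch_settings.tsv")))
```

The exact option names should be confirmed against the documentation for the installed CRISPResso2 version before use.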
A 200-member guide-target library was designed and ordered as an oligo pool from Twist Bioscience. The oligo pool was PCR-amplified according to the recommended Twist amplification protocol. The amplified pool was then cloned via Gibson assembly into p2Tol-U6-2×BbsI-sgRNA-HygR plasmid (Addgene, #71485) cut with XbaI and BbsI. The assembled product was column-purified and electroporated into 10-beta electrocompetent cells (NEB #C3020K) as described by Miller and Arbab (see Miller, S. M. et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat Biotechnol 38, 471-481 (2020); and Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463-480.e30 (2020)) with the following adaptations. Following electroporation, the plasmid library was grown in an overnight liquid culture and isolated by miniprep plasmid purification. The number of transformants was assessed by serial dilution and counted colonies were above 200,000 for >1,000× library coverage.
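The coverage figure quoted above follows from simple division: 200,000 transformants over 200 library members gives 1,000-fold coverage. A minimal sketch of that arithmetic, with hypothetical function names:

```python
LIBRARY_MEMBERS = 200  # size of the guide-target library, from the text


def fold_coverage(count: int, members: int = LIBRARY_MEMBERS) -> float:
    """Average number of transformants (or cells) per library member."""
    return count / members


def minimum_count(target_fold: int, members: int = LIBRARY_MEMBERS) -> int:
    """Smallest count that reaches the requested fold coverage."""
    return target_fold * members


print(fold_coverage(200_000))
print(minimum_count(1000))
```

The same calculation applies wherever a cell or read count is compared against a fold-coverage target for this library.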
Stable integration of the Tol2 guide-target library was achieved as described by Arbab (see Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463-480.e30 (2020)) with the following alterations. Approximately 6×10⁶ HEK293T cells in a 10-cm plate were transfected with 30 μg plasmid DNA at a 1:1 molar ratio of Tol2 transposase plasmid to guide-target plasmid library using Lipofectamine 2000 (ThermoFisher #11668019) and following the manufacturer's protocol. One day post-transfection, culture media was supplemented with hygromycin [50 μg mL⁻¹] for a minimum of 2 weeks before use in editing experiments. Library cells were maintained with over 200,000 cells for >1000× library coverage. The library cell line was transfected with ABE8e constructs that had been cloned into p2T-CMV-ABEmax-BlastR (Addgene, #152989) via Gibson assembly. For the transfections, cells were seeded with non-selective medium in 12-well plates at ~200,000 cells per well and incubated overnight. The following day, cells were transfected with 1.6 μg of plasmid DNA using Lipofectamine 2000 (ThermoFisher #11668019) following the manufacturer's protocol. One day post-transfection, culture media was supplemented with Blasticidin S [10 μg mL⁻¹]. After 3 days, genomic DNA was extracted from cells with QuickExtract (Lucigen #QE0905), column-purified and used for NGS library preparation.
NGS preparation and sequencing were done as described above with the following modifications. More than 1 μg of input DNA was used to ensure >500× library coverage (see Kim, H. K. et al. In vivo high-throughput profiling of CRISPR-Cpf1 activity. Nat Methods 14, 153-159 (2017)). Pooled amplicons were sequenced on an Illumina NextSeq 2000 system (200 cycles, Illumina sequencing kit #20046812) following the manufacturer's protocol. Sequencing data were further processed and binned by matching spacers and their barcode sequences using a custom demultiplexing script. Sequencing data were analyzed with CRISPResso2 (version 2.0.40) in BE output batch mode. Guide-target library members with <40 reads were omitted from analysis in all samples.
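The read-depth filter in the last sentence above can be sketched as follows. This reflects one reading of the text, namely that a library member falling below 40 reads in any sample is dropped from every sample. The data structure, the member and sample names, and the function name are hypothetical and are not taken from the custom script mentioned above.

```python
from typing import Dict, Set

MIN_READS = 40  # threshold from the text


def members_passing_everywhere(
    read_counts: Dict[str, Dict[str, int]], min_reads: int = MIN_READS
) -> Set[str]:
    """Return library members with at least min_reads reads in every sample.

    read_counts maps sample name -> {library member -> read count}.
    """
    samples = list(read_counts.values())
    if not samples:
        return set()
    keep = set(samples[0])
    for counts in samples:
        # Intersect with the members that pass the threshold in this sample.
        keep &= {member for member, reads in counts.items() if reads >= min_reads}
    return keep


# Toy read counts (hypothetical values for illustration).
toy_counts = {
    "replicate_1": {"member_1": 120, "member_2": 39, "member_3": 40},
    "replicate_2": {"member_1": 55, "member_2": 400, "member_3": 41},
}
print(sorted(members_passing_everywhere(toy_counts)))
```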
AAV vector packaging was done at the Viral Vector Core of the Horae Gene Therapy Center at the UMass Chan Medical School as described by Zhang (see Zhang, H. et al. Adenine base editing in vivo with a single adeno-associated virus vector. GEN Biotechnol. 1, 285-299 (2022)). Constructs were packaged in AAV9 capsids and viral titers were determined by digital droplet PCR and gel electrophoresis followed by silver staining.
All animal study protocols were approved by the Institutional Animal Care and Use Committee (IACUC) at UMass Chan Medical School. The 8-week-old C57BL/6J mice (Jackson Laboratory, Stock No. 000664) were tail-vein injected with a dosage of 4×10¹¹ vg per mouse (in 200 μl saline). Mice were euthanized at 6 weeks post-injection and perfused with PBS. Livers were harvested and pulverized in liquid nitrogen, and 15 mg of the tissue from each mouse liver was used for genomic DNA extraction. Genomic DNA from mouse liver or striatum (see below) was extracted using GenElute Mammalian Genomic DNA Miniprep Kit (Millipore Sigma #G1N350). Three mice per group were used to determine in vivo editing efficiency.
8-15-week-old C57BL/6J mice were weighed and anesthetized by intraperitoneal injection of a 0.1 mg/kg Fentanyl, 5 mg/kg Midazolam, and 0.25 mg/kg Dexmedetomidine mixture. Once pedal reflex ceased, mice were shaved and a total dose of 1×10¹⁰ vg of AAV was administered via bilateral intrastriatal injection (2 μL per side) performed at the following coordinates from bregma: +1.0 mm anterior-posterior (AP), ±2.0 mm mediolateral, and −3.0 mm dorsoventral. Once the injection was completed, mice were intraperitoneally injected with 0.5 mg/kg Flumazenil and 5.0 mg/kg Atipamezole and subcutaneously injected with 0.3 mg/kg Buprenorphine. Mice were euthanized at 6 weeks post-injection and perfused with PBS. Brains were harvested and biopsies at the striatum were taken for genomic DNA extraction.
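The vector concentrations implied by the two dosing regimens above (4×10¹¹ vg in 200 μl for tail-vein injection; 1×10¹⁰ vg in total across two 2 μL intrastriatal injections) can be worked out with a short sketch. The even split of the intrastriatal dose between the two sides is an assumption, and the names are hypothetical.

```python
def concentration_vg_per_ul(total_vg: int, volume_ul: float) -> float:
    """Vector genomes per microliter for a given dose and injection volume."""
    return total_vg / volume_ul


# Tail-vein injection: dose and volume from the text.
TAIL_VEIN_VG = 4 * 10**11
TAIL_VEIN_UL = 200

# Bilateral intrastriatal injection: total dose and per-side volume from the text.
STRIATAL_TOTAL_VG = 1 * 10**10
STRIATAL_SIDES = 2
STRIATAL_UL_PER_SIDE = 2

# Assumes the total dose is split evenly between the two sides.
striatal_vg_per_side = STRIATAL_TOTAL_VG / STRIATAL_SIDES

print(concentration_vg_per_ul(TAIL_VEIN_VG, TAIL_VEIN_UL))
print(striatal_vg_per_side)
print(concentration_vg_per_ul(STRIATAL_TOTAL_VG, STRIATAL_SIDES * STRIATAL_UL_PER_SIDE))
```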
Plasmids encoding C-terminal 6×-His(SEQ ID NO:42) tagged Nme2-ABE8e constructs were delivered with sgRNA into HEK293T cells via transient transfection as described above. Protein lysates were collected 72 h post-transfection by direct addition of 2× Laemmli sample buffer (BioRad #1610737EDU) followed by lysis at 95° C. for 10 min. Western blots were performed as described by Lee (see Lee, J. et al. Tissue-restricted genome editing in vivo specified by microRNA-repressible anti-CRISPR proteins. RNA 25, 1421-1431 (2019)). Primary mouse-anti-6×His(SEQ ID NO:42) (ThermoFisher #MA1-21315, 1:2000 dilution) was used for Nme2-ABE8e detection and rabbit-anti-LaminB1 (Abcam #AB16048, 1:10,000 dilution) was used for detection of the loading control. After incubation with secondary antibodies, goat-anti-mouse IRDye®800CW (LI-COR #925-32210, 1:20,000 dilution) and goat-anti-rabbit IRDye®680RD (LI-COR #926-68071, 1:20,000 dilution), blots were visualized using a BioRad imaging system.
Statistical analysis was performed using one- or two-way ANOVA using Dunnett's multiple comparisons test for correction in GraphPad Prism 9.4.0.
Nme2SmuCas9 effectors edit N4CN PAM targets, but with reduced activity. To improve the activity of the PID-chimeric Nme2 effectors, compensatory mutations were introduced via rational design and directed evolution. An Nme2SmuCas9 homology model was created using the SWISS-MODEL server. Negatively charged amino acids within 5-10 angstroms of the nucleic acid phosphate backbone were selected for arginine mutagenesis (
To test the efficacy of the novel Nme2SmuCas9 effectors, a modified, fluorescence-based Traffic Light Reporter (TLR2.0) was used (Certo et al., 2011). Briefly, a disrupted GFP is followed by an out-of-frame T2A peptide and mCherry cassette (
Activities of Nme2Smu-ABE8e-i1 and the target-strand (TS) and of Nme2Smu-ABE8e-i1, the single guide RNA (SG) and the non-target strand (NTS) interacting arginine mutants were tested in the mCherry ABE reporter cell line (activated upon A-to-G editing). Activities were measured by flow cytometry after plasmid transfection with an N4CC PAM targeting sgRNA plasmid and a base editor plasmid (n=2 biological replicates; data represent mean±SD). Nme2Smu-ABE8e-i1 variant comprising an arginine substitution at the following positions showed improved editing in the reporter assay: E520, D873, D418, E471, D442, and E844 in the Nme2Smu-ABE8e-i1 and the TS (
The top-performing arginine mutants (Nme2Smu-ABE8e-i1 variants comprising an arginine substitution at the following positions E932, D56, D873, D1048, E520R, E1079, D660, E887, E186, and T72Y) were further tested in the mCherry ABE reporter cell line (activated upon A-to-G editing) at N4CD PAM targets (
Characterization of the activity of Nme2Smu-ABE8e variants is also presented in
The HEK293T TLR-MCV1 reporter encodes a broken GFP, followed by an out-of-frame T2A and mCherry. DSBs within a specific region of the broken GFP can result in imprecise NHEJ repair events. In cases of a +1 frameshift, mCherry is expressed. Nme2Cas9 has four N4CN PAM target sites for mCherry activation in the TLR-MCV1 reporter via nuclease-mediated NHEJ (
Activities of four Nme2SmuCas9 nuclease single mutants were tested at N4CN PAM targets within the HEK293T TLR-MCV1 reporter and compared with Nme2Cas9, eNme2-C·NR (vliu), and Nme2SmuCas9. After parallel plasmid transfection with the associated sgRNA plasmid and a nuclease editor plasmid, activities were measured by flow cytometry (n=2 biological replicates; data represent mean±SD). The mean activity of each variant at a single PAM target site was then calculated to compare its performance with Nme2Cas9 and Nme2SmuCas9 as references. About half of the variants performed better than the WT (
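The mean±SD summary and the comparison against a reference can be sketched as follows. The sketch assumes the sample standard deviation (n-1 denominator), which the disclosure does not specify, and every construct name and value is a hypothetical placeholder rather than measured data.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Map each construct to (mean, sample SD) of its replicate measurements.

    The sample SD (n - 1 denominator) is an assumption; the source only
    states that data represent mean +/- SD for n = 2 biological replicates.
    """
    return {name: (mean(values), stdev(values)) for name, values in replicates.items()}


def fold_over_reference(summary, construct, reference):
    """Ratio of a construct's mean activity to the reference's mean activity."""
    return summary[construct][0] / summary[reference][0]


# Hypothetical percentages of mCherry-positive cells, two replicates each.
toy_replicates = {
    "reference": [10.0, 12.0],
    "variant_A": [20.0, 24.0],
}

summary = summarize(toy_replicates)
fold = fold_over_reference(summary, "variant_A", "reference")

for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.1f} +/- {sd:.2f}")
print(f"variant_A vs reference: {fold:.1f}-fold")
```

With only two replicates the standard deviation is a coarse estimate of spread, so small differences between constructs should be read with caution.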
Next, to understand whether an improvement in base editing activity was also related to an improvement in nuclease activity, the correlation between ABE and nuclease Nme2SmuCas9 effectors was measured. Indeed, the observed activities of the top-performing Nme2Smu arginine mutants correlated between nuclease and ABE editing when compared to wild-type Nme2SmuCas9 (nuclease) or Nme2Smu-ABE8e-i1 (ABE) in the reporter assays (
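One way to quantify such a relationship is a correlation coefficient computed on per-mutant activities normalized to the respective parental effector. The sketch below uses Pearson's r as an assumption, since the disclosure does not name the statistic used, and the fold-change values are hypothetical placeholders.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical fold changes for the same mutants in the two assays:
# ABE activity relative to the parental base editor, and nuclease
# activity relative to the parental nuclease.
abe_fold = [1.0, 2.0, 3.0, 4.0]
nuclease_fold = [1.5, 2.5, 3.5, 4.5]

r = pearson_r(abe_fold, nuclease_fold)
print(f"Pearson r = {r:.3f}")
```

A rank-based statistic such as Spearman's coefficient would be a reasonable alternative if the two assays are not expected to scale linearly with each other.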
Combination mutants of the nuclease variants were also tested within the HEK293T TLR-MCV1 reporter at N4CN PAM targets. The nuclease activities of Nme2Cas9, eNme2-C·NR (vliu), eNme2-C·NR (vEJS), Nme2SmuCas9, and the Nme2SmuCas9 combination mutants were tested at N4CA, N4CC, and N4CG PAM targets (
Characterization of the activity and specificity of Nme2- and Nme2SmuCas9 nuclease variants is also presented in
A-to-G edits were performed at endogenous HEK293T genomic loci with Nme2-ABE8e-i1 or Nme2Smu-ABE8e-i1 constructs by plasmid transfection to test adenine editing at each target. Maximum A-to-G editing rates (
The Nme2Cas9 all-in-one AAV delivery platform can, in principle, be used to target as wide a range of sites (
The editing windows of Nme2Smu-ABE-i1 (
The specificities of the domain-inlaid Nme2-ABEs were determined. Guide-dependent off-target editing is driven by Cas9 unwinding and R-loop formation at targets with high sequence similarity. Nme2-ABE8e-nt has a much lower propensity for guide-dependent off-target editing compared to Spy-ABE8e. Using the most active inlaid variant (Nme2-ABE8e-i1) as a prototype, guide-dependent specificity was examined using a series of double-mismatch guides targeting the mCherry reporter, with Spy-ABE8e and Nme2-ABE8e-nt used for comparison. In all cases, the target adenosine was at the eighth nt of the protospacer (
The specificity of domain-inlaid Nme2- and Nme2Smu-ABE8e's against their respective ABE8e-nt variants at bona fide endogenous off-target sites was then evaluated. Although Nme2Cas9 off-target sites are rare due to its intrinsic accuracy in mammalian genome editing, a few off-target sites have been identified for both nuclease and ABE variants via GUIDE-seq or in silico prediction. Four target sites for assessment were selected, of which three had been validated as detectably edited off-target sites (see Zhang, H. et al. Adenine base editing in vivo with a single adeno-associated virus vector. GEN Biotechnol. 1, 285-299 (2022); Edraki, A. et al. A compact, high-accuracy Cas9 with a dinucleotide PAM for in vivo genome editing. Molecular Cell 73, 714-726.e4 (2019); and Huang, T. P. et al. High-throughput continuous evolution of compact Cas9 variants targeting single-nucleotide-pyrimidine PAMs. Nat Biotechnol (2022)) (
Guide-independent off-target editing was then assessed. Similar to other domain-inlaid BE architectures, the internal positioning of the deaminase was expected to limit the propensity for off-target nucleic acid editing that occurs in trans. The orthogonal R-loop assay with HNH-nicking SauCas9 (nSauD10A) (see Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat Biotechnol. (2020); Chu, S. H. et al. Rationally designed base editors for precise editing of the sickle cell disease mutation. The CRISPR Journal 4, 169-177 (2021); and Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat Biotechnol 38, 620-628 (2020)) was used to generate off-target R-loops and capture the guide-independent DNA editing mediated by Spy-ABE8e or the Nme2-ABE8e variants (-nt and -i1). The on- and off-target activity of these ABE8e effectors was evaluated by amplicon deep sequencing at the guide-targeted genomic site in addition to three SauCas9D10A-generated R-loops. Nme2-ABE8e-i1 was found to be less prone to editing the orthogonal R-loops compared to Nme2-ABE8e-nt and Spy-ABE8e (
A compact AAV design that enables all-in-one delivery of Nme2-ABE8e-nt with a sgRNA for in vivo base editing was used (see Zhang, H. et al. Adenine base editing in vivo with a single adeno-associated virus vector. GEN Biotechnol. 1, 285-299 (2022)). At 4996 bp, the cassettes harboring the domain-inlaid Nme2-ABE8e variants and a guide RNA are also within the packaging limit of some single AAV vectors, making it possible to test whether they outperform Nme2-ABE8e-nt in an in vivo setting. For the in vivo assay, AAV genomes containing Nme2-ABE8e-nt, Nme2-ABE8e-i1 or Nme2-ABE8eV106W-i1 with an sgRNA targeting the Rosa26 locus were designed (
Two in vivo editing experiments with 9-week-old mice were conducted. The first assessed editing in the liver after systemic [intravenous (i.v.)] injection, whereas the second tested editing in the brain after intrastriatal injection. In both cases, mice were sacrificed 6 weeks after their respective injections and editing was quantified by amplicon sequencing. Within the liver, Nme2-ABE8e-i1 and Nme2-ABE-i1V106W had editing efficiencies of ~49% (p=0.015) and ~46% (p=0.04), respectively, outperforming Nme2-ABE8e-nt (editing efficiency ~34% at A6 of the Rosa26 target site) (one-way ANOVA) (
Whether the boost in on-target activity in the liver was also accompanied by increased sgRNA-dependent off-target activity was then determined. The Rosa26 sgRNA used in this assay is unusual among Nme2Cas9 guides in having a previously validated off-target site (Rosa26-OT1). Amplicon sequencing at Rosa26-OT1 on genomic DNA extracted from the mouse livers was conducted. It was found that both Nme2-ABE8e-i1 and the V106W variant increased off-target A-to-G editing (up to ~7% and ~5% respectively) compared to Nme2-ABE8e-nt (~0.2%) (
The contents of all cited references (including literature references, patents, patent applications, patent publications, and websites) that may be cited throughout this application are hereby expressly incorporated by reference in their entirety for any purpose, as are the references cited therein. The disclosure will employ, unless otherwise indicated, conventional techniques of immunology, molecular biology and cell biology, which are well known in the art.
The present disclosure also incorporates by reference in their entirety techniques well known in the field of molecular biology and drug delivery. These techniques include, but are not limited to, techniques described in the following publications:
All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in biological control, biochemistry, molecular biology, entomology, plankton, fishery systems, and fresh water ecology, or related fields are intended to be within the scope of the following claims.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/465,582, filed May 11, 2023. The entire contents of the above-referenced patent application are incorporated by reference in their entirety herein.
This invention was made with government support under GM143879 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63465582 | May 2023 | US