The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jun. 9, 2022, is named 132045-00401_SL.txt and is 903,580 bytes in size.
CRISPR (clustered regularly interspaced short palindromic repeats) is a family of DNA sequences found within the genomes of prokaryotic organisms such as bacteria and archaea. These sequences are understood to be derived from DNA fragments of bacteriophages that have previously infected the prokaryote, and are used to detect and destroy DNA or RNA from similar bacteriophages during subsequent infections of the prokaryotes.
The CRISPR-associated system is a set of homologous genes, or cas genes, some of which encode Cas proteins having helicase and nuclease activities. The Cas proteins are enzymes that utilize RNA derived from the CRISPR sequences (crRNA) as guide sequences to recognize and cleave specific strands of polynucleotide (e.g., DNA) that are complementary to the crRNA.
Together, the CRISPR-Cas system constitutes a primitive prokaryotic “immune system” that confers resistance or acquired immunity to foreign pathogenic genetic elements, such as those present within extrachromosomal DNA (e.g., plasmids) and bacteriophages, or foreign RNA encoded by foreign DNA.
In nature, the CRISPR/Cas system appears to be a widespread prokaryotic defense mechanism against foreign genetic materials, and is found in approximately 50% of sequenced bacterial genomes and nearly 90% of sequenced archaea. This prokaryotic system has since been developed to form the basis of a technology known as CRISPR-Cas that has found extensive use in numerous eukaryotic organisms, including humans, in a wide variety of applications including basic biological research, development of biotechnology products, and disease treatment.
The prokaryotic CRISPR-Cas systems comprise an extremely diverse group of effector proteins, non-coding elements, as well as loci architectures, some examples of which have been engineered and adapted to produce important biotechnologies.
The CRISPR locus structure has been studied in many systems. In these systems, the CRISPR array in the genomic DNA typically comprises an AT-rich leader sequence, followed by short DR sequences separated by unique spacer sequences. These CRISPR DR sequences typically range in size from 28 to 37 bps, though the range can be 23-55 bps. Some DR sequences show dyad symmetry, implying the formation of a secondary structure such as a stem-loop (“hairpin”) in the RNA, while others appear unstructured. The size of spacers in different CRISPR arrays is typically 28-38 bps (with a range of 21-72 bps). There are usually fewer than 50 units of the repeat-spacer sequence in a CRISPR array.
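The leader / repeat / spacer layout described above can be illustrated with a minimal Python sketch. It assumes the direct repeat (DR) sequence is already known and occurs as exact, non-overlapping copies; the function names and the toy sequences are hypothetical and are not taken from any disclosed locus.

```python
def split_crispr_array(array_seq: str, direct_repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of the DR.

    Assumption: the DR is known in advance and repeats exactly. Real arrays
    may carry degenerate repeats, which this sketch does not handle.
    """
    parts = array_seq.split(direct_repeat)
    # parts[0] precedes the first DR (e.g., the leader); parts[-1] follows the last DR.
    return [p for p in parts[1:-1] if p]


def spacer_in_typical_range(spacer: str, low: int = 28, high: int = 38) -> bool:
    """Check a spacer length against the typical 28-38 bp range quoted above."""
    return low <= len(spacer) <= high


# Toy example: AT-rich leader, then DR-spacer-DR-spacer-DR (all sequences invented).
LEADER = "AT" * 10
DR = "GTCGCACTCTTCATGGGTGCGTGGATTGAAAT"
SPACER_1 = "ACGT" * 7 + "AC"   # 30 nt
SPACER_2 = "TTGCA" * 6         # 30 nt
ARRAY = LEADER + DR + SPACER_1 + DR + SPACER_2 + DR

spacers = split_crispr_array(ARRAY, DR)
print([len(s) for s in spacers])
```

Splitting on the repeat keeps the sketch short; a locus-discovery tool would instead have to infer the repeat itself.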
Small clusters of cas genes are often found next to such CRISPR repeat-spacer arrays. So far, the 93 identified cas genes have been grouped into 35 families, based on sequence similarity of their encoded proteins. Eleven of the 35 families form the so-called cas core, which includes the protein families Cas1 through Cas9. A complete CRISPR-Cas locus has at least one gene belonging to the cas core.
CRISPR-Cas systems can be broadly divided into two classes: Class 1 systems use a complex of multiple Cas proteins to degrade foreign nucleic acids, while Class 2 systems use a single large Cas protein for the same purpose. The single-subunit effector compositions of the Class 2 systems provide a simpler component set for engineering and application translation, and have thus far been important sources of discovery, engineering, and optimization of novel powerful programmable technologies for genome engineering and beyond.
The Class 1 system is further divided into types I, III, and IV, and the Class 2 system is divided into types II, V, and VI. These 6 system types are additionally divided into 19 subtypes. Classification is also based on the complement of cas genes that are present. Most CRISPR-Cas systems have a Cas1 protein. Many prokaryotes contain multiple CRISPR-Cas systems, suggesting that they are compatible and may share components.
One of the first and best characterized Cas proteins, Cas9, is a prototypical member of Class 2, type II, and originates from Streptococcus pyogenes (SpCas9). Cas9 is a DNA endonuclease activated by a small crRNA molecule that complements a target DNA sequence, and a separate trans-activating CRISPR RNA (tracrRNA). The crRNA consists of a direct repeat (DR) sequence responsible for protein binding to the crRNA and a spacer sequence, which may be engineered to be complementary to any desired nucleic acid target sequence. In this way, CRISPR systems can be programmed to target DNA or RNA targets by modifying the spacer sequence of the crRNA. The crRNA and tracrRNA have been fused to form a single guide RNA (sgRNA) for better practical utility. When combined with Cas9, sgRNA hybridizes with its target DNA, and guides Cas9 to cut the target DNA. Other Cas9 effector proteins from other species have also been identified and used similarly, including Cas9 from the S. thermophilus CRISPR system. These CRISPR/Cas9 systems have been widely used in numerous eukaryotic organisms, including baker's yeast (Saccharomyces cerevisiae), the opportunistic pathogen Candida albicans, zebrafish (Danio rerio), fruit flies (Drosophila melanogaster), ants (Harpegnathos saltator and Ooceraea biroi), mosquitoes (Aedes aegypti), nematodes (Caenorhabditis elegans), plants, mice, monkeys, and human embryos.
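The programmability described above, in which only the spacer is changed to redirect the effector, can be sketched as follows for an RNA target. This is an illustration under stated assumptions: strict Watson-Crick pairing, a hypothetical DR placeholder, and a `dr_first` flag because the relative order of DR and spacer differs between CRISPR systems. None of the names or sequences come from the disclosure.

```python
# Watson-Crick complement table for RNA (wobble pairing is ignored in this sketch).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


def design_spacer(target_rna: str, start: int, length: int) -> str:
    """Return a spacer complementary to target_rna[start:start+length] (0-based)."""
    site = target_rna[start:start + length]
    if len(site) < length:
        raise ValueError("target site runs past the end of the target RNA")
    return reverse_complement_rna(site)


def assemble_guide(direct_repeat: str, spacer: str, dr_first: bool = True) -> str:
    """Join DR and spacer; the correct order depends on the CRISPR system used."""
    return direct_repeat + spacer if dr_first else spacer + direct_repeat


# Hypothetical toy inputs.
spacer = design_spacer("AUGGCCAAGUUU", start=3, length=6)
guide = assemble_guide("GGAC", spacer)
print(spacer, guide)
```

Retargeting then amounts to calling `design_spacer` with a different site while the DR portion stays fixed.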
Another recently characterized Cas effector protein is Cas12a (formerly known as Cpf1). Cas12a, together with C2c1 and C2c3, belongs to the Class 2, type V Cas proteins, which lack an HNH nuclease domain but have RuvC nuclease activity. Cas12a was initially characterized in the CRISPR/Cpf1 system of the bacterium Francisella novicida. Its original name reflects the prevalence of its CRISPR-Cas subtype in the Prevotella and Francisella lineages. Cas12a showed several key differences from Cas9, including: causing a “staggered” cut in double stranded DNA as opposed to the “blunt” cut produced by Cas9, relying on a “T rich” PAM sequence (which provides alternative targeting sites to Cas9), and requiring only a CRISPR RNA (crRNA) and no tracrRNA for successful targeting. Cas12a's small crRNAs are better suited than Cas9's sgRNAs for multiplexed genome editing, as more of them can be packaged in one vector. Further, the sticky 5′ overhangs left by Cas12a can be used for DNA assembly that is much more target-specific than traditional restriction enzyme cloning. Finally, Cas12a cleaves DNA 18-23 base pairs downstream from its PAM site, which means that the nuclease recognition sequence is not disrupted when the NHEJ system repairs the resulting double stranded break (DSB). Cas12a therefore enables multiple rounds of DNA cleavage. By contrast, Cas9 likely allows only one round, because the Cas9 cleavage site is only 3 base pairs upstream of the PAM site, and the NHEJ pathway typically results in indel mutations that destroy the recognition sequence, thereby preventing further rounds of cutting. In theory, repeated rounds of DNA cleavage are associated with an increased chance for the desired genomic editing to occur.
More recently, several Class 2, type VI Cas proteins, including Cas13 (also known as C2c2), Cas13b, Cas13c, Cas13d (including the engineered variant CasRx), Cas13e, and Cas13f have been identified, each of which is an RNA-guided RNase (i.e., these Cas proteins use their crRNA to recognize target RNA sequences, rather than target DNA sequences as in Cas9 and Cas12a). Overall, the CRISPR/Cas13 systems can achieve higher RNA digestion efficiency compared to the traditional RNAi and CRISPRi technologies, while simultaneously exhibiting much less off-target cleavage compared to RNAi.
CRISPR-Cas13 is quickly becoming a widely adopted RNA editing technology. This system can use its sequence specific guide RNA to selectively modify (e.g., cut or cleave via endonuclease activity) a target RNA, such as mRNA. Compared to the permanent genomic changes introduced by DNA-based editing, RNA-based editing controls gene expression at the transcript level, thus providing a safer and more controllable gene therapy approach. Because of the high RNA editing efficiency of the CRISPR/Cas13 systems, they have already been widely used in a number of organisms including yeast, plants, mammals, and zebrafish (see Abudayyeh et al., 2017; Aman et al., 2018; Cox et al., 2017; Jing et al., 2018; Konermann et al., 2018). An ortholog of CRISPR-Cas13d, CasRx, could mediate RNA knockdown in vivo and effectively alleviate disease phenotypes in various mouse models (He et al., Protein Cell 11:518-524, 2020; Zhou et al., Cell 181:590-603 e516, 2020; and Zhou et al., National Science Review 7:835-837, 2020).
One drawback of these currently identified Cas13 proteins, however, is that they all have non-specific/collateral RNase activity upon activation by crRNA-based target sequence recognition. This activity is particularly strong in Cas13a and Cas13b, and still detectably exists in Cas13d and, to a lesser extent, in Cas13e, for example. While this property can be advantageously used in nucleic acid detection methods, the non-specific/collateral RNase activity of these Cas13 proteins also causes undesirable collateral degradation of bystander RNAs, and has imposed a major barrier for their in vivo application, such as in gene therapy.
On the other hand, for practical utilities such as SHERLOCK, which relies on collateral activity for sensitive detection, it can be beneficial to have mutant Cas13 effector enzymes that exhibit even higher collateral activity compared to wild-type Cas13.
Thus, there is a need in the art to further optimize wild-type Cas13 for different purposes, e.g., either to lower collateral cleavage activity with acceptable on-target cleavage activity for certain uses such as therapeutic applications, or to enhance/increase collateral cleavage activity with acceptable on-target cleavage activity for certain other uses such as diagnostic applications.
One aspect of the invention provides an engineered Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas13 effector enzyme, wherein the engineered Cas13: (1) comprises a mutation in a region spatially close to an endonuclease catalytic domain (e.g., a HEPN domain) of the corresponding wild-type Cas13 effector enzyme; (2) substantially preserves (e.g., retains at least 50%, 60%, 70%, 72.5%, 75%, 80%, 85%, 87.5%, 90%, 95%, 96%, 97%, 97.5%, 98%, 99% or more of) guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards a target RNA complementary to the guide sequence; and, (3) substantially lacks (e.g., retains less than 50%, 40%, 35%, 30%, 27.5%, 25%, 22.5%, 20%, 17.5%, 15%, 12.5%, 10%, 7.5%, 5%, 4%, 3%, 2.5%, 2%, 1% or less of) guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards a non-target RNA that does not bind to the guide sequence.
Another aspect of the invention provides an engineered Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas13 effector enzyme, wherein the engineered Cas13: (1) comprises a mutation in a region spatially close to an endonuclease catalytic domain (e.g., a HEPN domain) of the corresponding wild-type Cas13 effector enzyme; (2) substantially preserves or has enhanced (e.g., retains at least 50%, 60%, 70%, 72.5%, 75%, 80%, 85%, 87.5%, 90%, 95%, 96%, 97%, 97.5%, 98%, 99%, 100%, 102%, 105%, 108%, 110% or more of) guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards a target RNA complementary to the guide sequence; and, (3) substantially enhances (e.g., has more than 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200% or more of) guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 towards a non-target RNA that does not bind to the guide sequence.
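The two activity profiles recited in the preceding aspects can be expressed as a simple decision rule. The sketch below is illustrative only: the function and parameter names are hypothetical, both activities are assumed to be already measured and normalized to wild-type (100%), and the defaults are simply the loosest thresholds listed above (at least 50% on-target activity; less than 50% or more than 100% collateral activity).

```python
def classify_variant(
    on_target_pct: float,
    collateral_pct: float,
    min_on_target: float = 50.0,        # loosest "substantially preserves" value above
    max_low_collateral: float = 50.0,   # loosest "substantially lacks" value above
    min_high_collateral: float = 100.0, # loosest "substantially enhances" value above
) -> str:
    """Classify an engineered Cas13 from activities expressed as % of wild-type."""
    if on_target_pct < min_on_target:
        return "on-target activity not preserved"
    if collateral_pct < max_low_collateral:
        return "reduced collateral"
    if collateral_pct > min_high_collateral:
        return "enhanced collateral"
    return "wild-type-like collateral"


# Hypothetical measurements, not data from the examples.
print(classify_variant(85.0, 3.0))
print(classify_variant(95.0, 150.0))
```

Stricter embodiments correspond to passing different thresholds, for example `min_on_target=75.0, max_low_collateral=25.0`.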
In certain embodiments, the Cas13 is a Cas13a, a Cas13b, a Cas13c, a Cas13d (including CasRx), a Cas13e, or a Cas13f.
In certain embodiments, the Cas13e has the amino acid sequence of SEQ ID NO: 4, and/or wherein the Cas13d has the amino acid sequence of SEQ ID NO: 101, and/or wherein the Cas13f has the amino acid sequence of SEQ ID NO: 52.
In certain embodiments, the region includes residues within 130, 125, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 amino acids from any residues of the endonuclease catalytic domain (e.g., an RXXXXH domain) in the primary sequence of the Cas13e, and residues within 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 amino acids from any residues of the endonuclease catalytic domain (e.g., an RXXXXH domain) in the primary sequence of the Cas13d; or residues within 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 amino acids from any residues of the endonuclease catalytic domain (e.g., an RXXXXH domain) in the primary sequence of the Cas13f.
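As an illustration of a region defined by distance in the primary sequence, the following Python sketch locates RXXXXH motifs and collects every residue within a chosen number of amino acids of a motif. It is a simplification that uses the six-residue motif as a stand-in for the catalytic domain; the function names and the toy sequence are hypothetical, and indices are 0-based.

```python
import re


def find_rxxxxh(seq: str) -> list[int]:
    """0-based start positions of every R-X-X-X-X-H motif (overlaps included)."""
    # A lookahead is used so that overlapping motifs are not skipped.
    return [m.start() for m in re.finditer(r"(?=R.{4}H)", seq)]


def residues_near_motifs(seq: str, window: int) -> list[int]:
    """0-based indices of residues in a motif or within `window` residues of one."""
    near: set[int] = set()
    for start in find_rxxxxh(seq):
        lo = max(0, start - window)
        hi = min(len(seq), start + 6 + window)  # the motif itself spans 6 residues
        near.update(range(lo, hi))
    return sorted(near)


# Toy sequence (not a real Cas13): one RNYFSH motif flanked by filler residues.
toy = "M" * 10 + "RNYFSH" + "A" * 10
print(find_rxxxxh(toy), residues_near_motifs(toy, window=3))
```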
In certain embodiments, the region includes residues that are more than 100, 110, 120, or 130 residues away from any residues of the endonuclease catalytic domain in the primary sequence of the Cas13, but that are spatially within 1-10 or 5 angstroms of a residue of the endonuclease catalytic domain.
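Spatial proximity of this kind can be checked against a structural model. The sketch below assumes one representative coordinate per residue (for example a C-alpha position, in angstroms) is already available as a list; how the coordinates are obtained, the function name, and the default cutoffs are assumptions made for illustration.

```python
import math

Coordinate = tuple[float, float, float]


def distal_but_spatially_close(
    coords: list[Coordinate],
    catalytic_idx: list[int],
    min_seq_sep: int = 100,
    max_dist: float = 10.0,
) -> list[int]:
    """Residues far from every catalytic residue in sequence but near one in space.

    coords[i] is one representative atom position for residue i (0-based).
    """
    hits = []
    for i, xyz in enumerate(coords):
        # Skip residues that are within min_seq_sep of any catalytic residue.
        if any(abs(i - c) <= min_seq_sep for c in catalytic_idx):
            continue
        # Keep residues lying within max_dist of at least one catalytic residue.
        if any(math.dist(xyz, coords[c]) <= max_dist for c in catalytic_idx):
            hits.append(i)
    return hits


# Toy model: an extended chain along x, with residue 250 folded back next to residue 5.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(300)]
toy_coords[250] = (23.0, 3.0, 0.0)
print(distal_but_spatially_close(toy_coords, catalytic_idx=[5]))
```

A per-residue minimum over all heavy atoms would be a stricter alternative to the single-atom distance used here.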
In certain embodiments, the endonuclease catalytic domain is a HEPN domain, optionally a HEPN domain comprising an RXXXXH motif.
In certain embodiments, the RXXXXH motif comprises a R{N/H/K/Q/R}X1X2X3H sequence (SEQ ID NO: 1024).
In certain embodiments, in the R{N/H/K/Q/R}X1X2X3H sequence (SEQ ID NO: 1025), X1 is R, S, D, E, Q, N, G, or Y; X2 is I, S, T, V, or L; and X3 is L, F, N, Y, V, I, S, D, E, or A.
In certain embodiments, the RXXXXH motif is an N-terminal RXXXXH motif comprising an RNXXXH sequence, such as an RN{Y/F}{F/Y}SH sequence (SEQ ID NO: 64).
In certain embodiments, the N-terminal RXXXXH motif has a RNYFSH sequence (SEQ ID NO: 65).
In certain embodiments, the N-terminal RXXXXH motif has a RNFYSH sequence (SEQ ID NO: 66).
In certain embodiments, the RXXXXH motif is a C-terminal RXXXXH motif comprising an R{N/A/R}{A/K/S/F}{A/L/F}{F/H/L}H sequence (SEQ ID NO: 1026).
In certain embodiments, the C-terminal RXXXXH motif has a RN(A/K)ALH sequence (SEQ ID NO: 67).
In certain embodiments, the C-terminal RXXXXH motif has a RAFFHH (SEQ ID NO: 68) or RRAFFH sequence (SEQ ID NO: 69).
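The motif definitions in the preceding embodiments translate directly into regular expressions, which can be convenient when scanning candidate sequences. The pattern names below are hypothetical labels; each character class is copied from the corresponding bracketed alternatives above.

```python
import re

# R{N/H/K/Q/R}X1X2X3H with the X1, X2, X3 alternatives listed above.
GENERAL_MOTIF = re.compile(r"R[NHKQR][RSDEQNGY][ISTVL][LFNYVISDEA]H")
# N-terminal motif: RN{Y/F}{F/Y}SH.
N_TERMINAL_MOTIF = re.compile(r"RN[YF][FY]SH")
# C-terminal motif: R{N/A/R}{A/K/S/F}{A/L/F}{F/H/L}H.
C_TERMINAL_MOTIF = re.compile(r"R[NAR][AKSF][ALF][FHL]H")


def matches(pattern: re.Pattern, hexapeptide: str) -> bool:
    """True if the six-residue string matches the pattern in full."""
    return pattern.fullmatch(hexapeptide) is not None


for peptide in ("RNYFSH", "RNFYSH", "RNAALH", "RNKALH", "RAFFHH", "RRAFFH"):
    print(peptide, matches(N_TERMINAL_MOTIF, peptide), matches(C_TERMINAL_MOTIF, peptide))
```

Using `fullmatch` tests an isolated hexapeptide; `search` or `finditer` would be used to scan a full-length protein sequence.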
In certain embodiments, said region comprises, consists essentially of, or consists of: (i) residues corresponding to residues between residues 1-194, 2-187, 227-242, 620-775, or 634-755 of SEQ ID NO: 4; or, (ii) residues corresponding to the HEPN1-1 domain (e.g., residues 90-292), Helical2 domain (e.g., residues 536-690), and the HEPN2 domain (e.g., residues 690-967) of SEQ ID NO: 101; or, (iii) residues corresponding to the HEPN1 domain (e.g., residues 1-168), Helical1 domain, Helical2 domain (e.g., residues 346-477), and the HEPN2 domain (e.g., residues 644-790) of SEQ ID NO: 52.
In certain embodiments, said region comprises, consists essentially of, or consists of residues corresponding to residues between residues 35-51, 52-67, 156-171, 666-682, or 712-727 of SEQ ID NO: 4.
In certain embodiments, said mutation comprises, consists essentially of, or consists of, within a stretch of 15-20 consecutive amino acids within the region: (a) substitution of one or more charged, nitrogen-containing side chain group, bulky (such as F or Y), aliphatic, and/or polar residues with a charge-neutral short chain aliphatic residue (such as A, V, or I); (b) one or more I/L to A substitution(s); and/or (c) one or more A to V substitution(s).
In certain embodiments, said stretch is about 16 or 17 residues.
In certain embodiments, substantially all, except for up to 1, 2, or 3, charged and polar residues within the stretch are substituted.
In certain embodiments, a total of about 7, 8, 9, or 10 charged and polar residues within the stretch are substituted.
In certain embodiments, the N- and C-terminal 2 residues of the stretch are substituted to amino acids the coding sequences of which contain a restriction enzyme recognition sequence.
In certain embodiments, the N-terminal two residues are VF, and the C-terminal 2 residues are ED, and the restriction enzyme is BpiI.
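One way to see why VF and ED flanks are compatible with BpiI is to look at possible codons. BpiI recognizes GAAGAC; the specific codons chosen below (GTC, TTC, GAA, GAC) are an assumption made for illustration, since other synonymous codons exist for each residue and the codons actually used are not stated here.

```python
BPII_SITE = "GAAGAC"  # BpiI recognition sequence, read 5'->3'

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


# Illustrative codon choices (assumed, not specified in the text).
CODON = {"V": "GTC", "F": "TTC", "E": "GAA", "D": "GAC"}

n_terminal_flank = CODON["V"] + CODON["F"]  # coding sequence for Val-Phe
c_terminal_flank = CODON["E"] + CODON["D"]  # coding sequence for Glu-Asp

# Glu-Asp carries the site on the coding strand; Val-Phe carries it on the opposite strand.
print(c_terminal_flank == BPII_SITE)
print(reverse_complement(n_terminal_flank) == BPII_SITE)
```

With these codons the two flanks present the site in opposite orientations on either side of the stretch.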
In certain embodiments, the one or more charged or polar residues comprise N, Q, R, K, H, D, E, Y, S, and T residues.
In certain embodiments, the one or more charged or polar residues comprise R, K, H, N, Y, and/or Q residues.
In certain embodiments, one or more Y residue(s) within said stretch is substituted.
In certain embodiments, said one or more Y residue(s) correspond to Y672, Y676, and/or Y715 of wild-type Cas13e.1 (SEQ ID NO: 4).
In certain embodiments, said stretch is residues 35-51, 52-67, 156-171, 666-682, or 712-727 of SEQ ID NO: 4.
In certain embodiments, the mutation comprises Ala substitution(s) corresponding to any one or more of SEQ ID NOs: 37-39, 45, and 48.
In certain embodiments, the charge-neutral short chain aliphatic residue is Ala (A).
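The substitution scheme of the preceding embodiments, replacing charged and polar residues within a defined stretch with Ala while optionally sparing a few positions, can be sketched as below. The function name, the `keep` parameter, and the toy sequence are hypothetical; positions are 1-based and inclusive to match the residue numbering used in this document.

```python
# Charged or polar residues as listed above: N, Q, R, K, H, D, E, Y, S, and T.
CHARGED_OR_POLAR = frozenset("NQRKHDEYST")


def substitute_stretch(seq: str, start: int, end: int, keep: frozenset[int] = frozenset()) -> str:
    """Replace charged/polar residues in positions start..end (1-based, inclusive) with Ala.

    Positions listed in `keep` are left unchanged, mirroring the option of
    leaving up to a few charged or polar residues unsubstituted.
    """
    residues = list(seq)
    for pos in range(start, end + 1):
        if pos in keep:
            continue
        if residues[pos - 1] in CHARGED_OR_POLAR:
            residues[pos - 1] = "A"
    return "".join(residues)


# Toy 17-residue sequence (not from SEQ ID NO: 4).
toy = "MKTLLRDEYSGNQAVIL"
print(substitute_stretch(toy, 2, 13))
print(substitute_stretch(toy, 2, 13, keep=frozenset({9})))
```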
In certain embodiments, said mutation with reduced collateral activity comprises, consists essentially of, or consists of: (a) substitutions within 1, 2, 3, 4, or 5 of said stretches of 15-20 consecutive amino acids within the region; (b) a mutation corresponding to a Cas13d mutation of Example 4 that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof), and exhibits less than about 27.5% collateral effect of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof); (c) a mutation corresponding to the N1V7, N2V7, N2V8 (cfCas13d), N3V7, or N15V4 mutation of Cas13d; (d) a mutation corresponding to a Cas13d mutation of Example 4 that retains between about 25-75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof), and exhibits less than about 27.5% collateral effect of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof); (e) a mutation corresponding to the N2V4, N2V5, N4V3, N6V3, N10V6, N15V2, N20V6, or N20-Y910A mutation of Cas13d; (f) a mutation corresponding to a Cas13e mutation of Example 1, 2, or 5 that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof), and exhibits less than about 25% collateral effect of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof); (g) a mutation corresponding to the M1V4, M2V2, M2V3, M2V4, M5V1, M6V2, M6V3, M6V4, M7V1, M7V2, M7V3, M7-Y55A, M7-Y61A, M11V1, M12V3, M15V1, M15V2, M15-Y643A, M15-Y647A, M16V1, M16V2, M17V2, M18V2, M18V3, M19V2, M19V3, or M19-IA mutation of Cas13e; (h) a mutation corresponding to a Cas13e mutation of Example 5 that retains between about 25-75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof), and exhibits less than about 25% collateral effect of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof); (i) a mutation corresponding to the M17YY (cfCas13e), M8V4, M9V1, M11V2, M11V3, M13V1, M13V2, M13V3, M15V3, or M20V2 mutation of Cas13e; (j) a mutation corresponding to a Cas13f mutation (e.g., that of Example 12) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof), and exhibits less than about 25 or 27.5% collateral effect of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof); (k) a mutation corresponding to the F7V2, F10V1, F10V4, F40V2, F40V4, F44V2, F10S19, F10S21, F10S24, F10S26, F10S27, F10S33, F10S34, F10S35, F10S36, F10S45, F10S46, F10S48, F10S49, F40S22, F40S23, F40S26, F40S27, or F40S36 mutation of Cas13f; (l) a mutation corresponding to a Cas13f mutation (e.g., that of Example 12) that retains between about 50-75% of guide RNA-specific cleavage of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof), and exhibits less than about 25 or 27.5% collateral effect of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof); and/or (m) a mutation corresponding to the F2V4, F3V1, F3V3, F3V4, F5V2, F5V3, F6V4, F7V1, F38V4, F40V1, F41V1, F41V3, F42V4, F43V1, F10S2, F10S11, F10S12, F10S18, F10S20, F10S23, F10S25, F10S28, F10S43, F10S44, F10S47, F10S50, F10S51, F10S52, F40S7, F40S9, F40S11, F40S21, F40S22, F40S24, F40S28, F40S29, F40S30, F40S35, or F40S37 mutation of Cas13f.
In certain embodiments, the mutation with enhanced collateral activity comprises, consists essentially of, or consists of: (a) substitutions within 1, 2, 3, 4, or 5 of said stretches of 15-20 consecutive amino acids within the region; (b) a mutation corresponding to a Cas13d mutation (e.g., that of Example 4) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof), and exhibits more than about 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 155%, 160%, 165%, 170%, 175%, 180% or more collateral effect of wild-type Cas13d (such as SEQ ID NO: 101); (c) a mutation corresponding to the N2-Y142A, N4-Y193A, N12-Y604A, or N21V7 mutation of Cas13d in Example 4; (d) a mutation corresponding to a Cas13e mutation (e.g., that of Example 5) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof), and exhibits more than about 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 155%, 160%, 165%, 170%, 175%, 180% or more collateral effect of wild-type Cas13e (such as SEQ ID NO: 4); (e) a mutation corresponding to the M4V2, M4V3, M4V4, M8V1, M8V2, M9V2, M9V3, M10V1, M10V2, M11V4, M12V2, M14V1, M14V2, M16V3, M18V1, M19-G712A, M19-C727A, M19T725A, or M21V2 mutation of Cas13e; (f) a mutation corresponding to a Cas13f mutation (e.g., that of Example 12) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof), and exhibits more than about 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 155%, 160%, 165%, 170%, 175%, 180% or more collateral effect of wild-type Cas13f (such as SEQ ID NO: 52); and/or (g) a mutation corresponding to the F38V2, F42V1, F46V3, F38S2, F38S4, F38S5, F38S6, F38S7, F38S8, F38S9, F38S10, F38S11, F38S12, F38S13, F38S15, F38S16, F38S17, F40S1, F40S2, F40S3, F40S4, F40S5, F40S6, F40S8, F40S16, F40S18, F46S1, F46S4, F46S6, F46S7, F46S10, F46S14, F46S15, F10S4, F10S5, F10S6, F10S9, F10S10, F10S7, F38S1, F38S13, or F46S2 mutation of Cas13f (e.g., that of Example 12).
In certain embodiments, the engineered Cas13 preserves at least about 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the target RNA.
In certain embodiments, the engineered Cas13 lacks at least about 70%, 72.5%, 75%, 77.5%, 80%, 82.5%, 85%, 87.5%, 90%, 92.5%, 95%, 96%, 97%, 98%, 99%, or 100% of the guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the non-target RNA.
In certain embodiments, the engineered Cas13 preserves at least about 80-90% of the guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the target RNA, and lacks at least about 95-100% of the guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the non-target RNA.
In certain embodiments, the engineered Cas13 of the invention has an amino acid sequence at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.86% identical to any one of SEQ ID NOs: 6-10 and Cas13d (e.g., SEQ ID NO: 101), excluding any one or more of the regions defined by SEQ ID NOs: 16, 20, 24, 28, and 32, and any of the mutation regions in Example 4 or 5.
In certain embodiments, said amino acid sequence contains up to 1, 2, 3, 4, or 5 differences (a) in each of one or more regions defined by SEQ ID NO: 16, 20, 24, 28, and 32, as compared to SEQ ID NOs: 17, 21, 25, 29, and 33, respectively, or (b) in any of the desired mutations in Cas13d and Cas13e disclosed herein.
In certain embodiments, the engineered Cas13 of the invention has the amino acid sequence of any one of SEQ ID NOs: 6-10.
In certain embodiments, the engineered Cas13 of the invention has the amino acid sequence of SEQ ID NO: 9 or 10.
In certain embodiments, the engineered Cas13 of the invention further comprises a nuclear localization signal (NLS) sequence or a nuclear export signal (NES).
In certain embodiments, the engineered Cas13 comprises an N- and/or a C-terminal NLS.
Another aspect of the invention provides a polynucleotide encoding the engineered Cas13 of the invention.
In certain embodiments, the polynucleotide of the invention is codon-optimized for expression in a eukaryote, a mammal, such as a human or a non-human mammal, a plant, an insect, a bird, a reptile, a rodent (e.g., mouse, rat), a fish, a worm/nematode, or a yeast.
Another aspect of the invention provides a polynucleotide that (i) has one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) nucleotide additions, deletions, or substitutions compared to the polynucleotide of the invention; (ii) has at least 50%, 60%, 70%, 80%, 90%, 95%, or 97% sequence identity to the polynucleotide of the invention; (iii) hybridizes under stringent conditions with the polynucleotide of the invention, or any of (i) and (ii); or (iv) is a complement of any of (i)-(iii).
Another aspect of the invention provides a vector comprising the polynucleotide of the invention.
In certain embodiments, the polynucleotide is operably linked to a promoter and optionally an enhancer.
In certain embodiments, the promoter is a constitutive promoter, an inducible promoter, a ubiquitous promoter, or a tissue specific promoter.
In certain embodiments, the vector is a plasmid.
In certain embodiments, the vector is a retroviral vector, a phage vector, an adenoviral vector, a herpes simplex viral (HSV) vector, an AAV vector, or a lentiviral vector.
In certain embodiments, the AAV vector is a recombinant AAV vector of the serotype AAV1, AAV2, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, or AAV13.
Another aspect of the invention provides a delivery system comprising (1) a delivery vehicle, and (2) the engineered Cas13 of the invention, the polynucleotide of the invention, or the vector of the invention.
In certain embodiments, the delivery vehicle is a nanoparticle, a liposome, an exosome, a microvesicle, or a gene-gun.
Another aspect of the invention provides a cell or a progeny thereof, comprising the engineered Cas13 of the invention, the polynucleotide of the invention, or the vector of the invention.
In certain embodiments, the cell or progeny thereof is a eukaryotic cell (e.g., a non-human mammalian cell, a human cell, or a plant cell) or a prokaryotic cell (e.g., a bacterial cell).
Another aspect of the invention provides a non-human multicellular eukaryote comprising the cell of the invention.
In certain embodiments, the non-human multicellular eukaryote is an animal (e.g., rodent or primate) model for a human genetic disorder.
Another aspect of the invention provides a method of modifying a target RNA, the method comprising contacting the target RNA with a CRISPR-Cas13 complex comprising the engineered Cas13 of the invention, and a spacer sequence complementary to at least 15 nucleotides of the target RNA; wherein upon binding of the complex to the target RNA through the spacer sequence, the engineered Cas13 modifies the target RNA.
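The requirement that the spacer be complementary to at least 15 nucleotides of the target RNA can be checked computationally. The sketch below makes two simplifying assumptions: the 15 nucleotides are treated as one contiguous run, and only strict Watson-Crick pairs count (no G-U wobble, no mismatches). The function names and sequences are hypothetical.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


def longest_complementary_run(spacer: str, target_rna: str) -> int:
    """Length of the longest contiguous stretch of the target that the spacer can pair with."""
    probe = reverse_complement_rna(spacer)  # the target segment the spacer would pair with
    target = target_rna.upper()
    best = 0
    prev = [0] * (len(target) + 1)
    # Longest-common-substring dynamic programme between probe and target.
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if probe[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def spacer_meets_minimum(spacer: str, target_rna: str, minimum: int = 15) -> bool:
    return longest_complementary_run(spacer, target_rna) >= minimum


# Hypothetical target and a 20-nt spacer designed against positions 5-24.
toy_target = "AUGGCUAGCUAGGAUCCGAUCGAUUAGC"
toy_spacer = reverse_complement_rna(toy_target[5:25])
print(longest_complementary_run(toy_spacer, toy_target))
```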
In certain embodiments, the target RNA is modified by cleavage by the engineered Cas13.
In certain embodiments, the target RNA is an mRNA, a tRNA, an rRNA, a non-coding RNA, an lncRNA, or a nuclear RNA.
In certain embodiments, upon binding of the complex to the target RNA, the engineered Cas13 does not exhibit substantial (or detectable) collateral RNase activity.
In certain embodiments, the target RNA is within a cell.
In certain embodiments, the cell is a cancer cell.
In certain embodiments, the cell is infected with an infectious agent.
In certain embodiments, the infectious agent is a virus, a prion, a protozoan, a fungus, or a parasite.
In certain embodiments, the cell is a neuronal cell (e.g., astrocyte, glial cell (e.g., Müller glia cell, oligodendrocyte, ependymal cell, Schwann cell, NG2 cell, or satellite cell)).
In certain embodiments, the CRISPR-Cas13 complex is encoded by a first polynucleotide encoding the engineered Cas13 of the invention, and a second polynucleotide comprising or encoding a spacer RNA capable of binding to the target RNA, wherein the first and the second polynucleotides are introduced into the cell.
In certain embodiments, the first and the second polynucleotides are introduced into the cell by the same vector.
In certain embodiments, the method causes one or more of: (i) in vitro or in vivo induction of cellular senescence; (ii) in vitro or in vivo cell cycle arrest; (iii) in vitro or in vivo cell growth inhibition; (iv) in vitro or in vivo induction of anergy; (v) in vitro or in vivo induction of apoptosis; and (vi) in vitro or in vivo induction of necrosis.
Another aspect of the invention provides a method of treating a condition or disease in a subject in need thereof, the method comprising administering to the subject a composition comprising a CRISPR-Cas complex comprising the engineered Cas13 of the invention or a polynucleotide encoding the same; and a spacer sequence complementary to at least 15 nucleotides of a target RNA associated with the condition or disease; wherein upon binding of the complex to the target RNA through the spacer sequence, the engineered Cas13 cleaves the target RNA, thereby treating the condition or disease in the subject.
In certain embodiments, the condition or disease is a neurological condition, a cancer or an infectious disease.
In certain embodiments, the cancer is Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or urinary bladder cancer.
In certain embodiments, the neurological condition is glaucoma, age-related RGC loss, optic nerve injury, retinal ischemia, Leber's hereditary optic neuropathy, a neurological condition associated with degeneration of RGC neurons, a neurological condition associated with degeneration of functional neurons in the striatum of a subject in need thereof, Parkinson's disease, Alzheimer's disease, Huntington's disease, schizophrenia, depression, drug addiction, a movement disorder such as chorea, choreoathetosis, or dyskinesia, bipolar disorder, autism spectrum disorder (ASD), or dysfunction.
In certain embodiments, the method is an in vitro method, an in vivo method, or an ex vivo method.
Another aspect of the invention provides a CRISPR-Cas complex comprising the engineered Cas13 of the invention, and a guide RNA comprising a DR sequence that binds the engineered Cas13 and a spacer sequence designed to be complementary to and bind a target RNA.
In certain embodiments, the target RNA is encoded by a eukaryotic DNA.
In certain embodiments, the eukaryotic DNA is a non-human mammalian DNA, a non-human primate DNA, a human DNA, a plant DNA, an insect DNA, a bird DNA, a reptile DNA, a rodent DNA, a fish DNA, a worm/nematode DNA, or a yeast DNA.
In certain embodiments, the target RNA is an mRNA.
In certain embodiments, the CRISPR-Cas complex further comprises a target RNA comprising a sequence capable of hybridizing to the spacer sequence.
Another aspect of the invention provides a method of identifying an engineered CRISPR/Cas effector enzyme of a corresponding wild-type Cas effector enzyme, wherein the engineered Cas substantially maintains guide-sequence-specific endonuclease activity and substantially lacks guide-sequence-independent collateral endonuclease activity, the method comprising: (1) in each of one or more regions of 15-20 consecutive amino acids (a) within 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, or 180 residues of any residues of an endonuclease catalytic domain of the wild-type Cas effector enzyme or (b) spatially within 1-10 Ångström of any residues of the endonuclease catalytic domain of the wild-type Cas effector enzyme, substituting one or more (e.g., substantially all, except for up to 1, 2, 3, 4, or 5) polar and charged residues with a charge neutral aliphatic side-chain residue (such as A); and, (2) identifying an engineered Cas that substantially maintains guide-sequence-specific endonuclease activity and substantially lacks guide-sequence-independent collateral endonuclease activity compared to the corresponding wild-type Cas.
In certain embodiments, the wild-type Cas effector enzyme is a Cas13.
In certain embodiments, the Cas13 is a Cas13a, a Cas13b, a Cas13c, a Cas13d (e.g., CasRx), a Cas13e, or a Cas13f.
In certain embodiments, the Cas13e has the amino acid sequence of SEQ ID NO: 4; or wherein the Cas13d has the amino acid sequence of SEQ ID NO: 101; or wherein the Cas13f has the amino acid sequence of SEQ ID NO: 52.
Another aspect of the invention provides a method of identifying an engineered Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas13 effector enzyme with altered guide sequence-independent collateral nuclease activity, the method comprising: in a region spatially close to an endonuclease catalytic domain of the corresponding wild-type Cas13 effector enzyme, substituting one or more charged or polar residues to a charge-neutral short chain aliphatic residue (such as A), to determine whether the resulting variant Cas13 effector enzyme: (1) has substantially preserved guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards a target RNA complementary to the guide sequence; and, (2) either substantially lacks or has enhanced guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards a non-target RNA that does not bind to the guide sequence, thereby identifying said engineered Cas13 effector enzyme with altered guide sequence-independent collateral nuclease activity.
In certain embodiments, the engineered Cas13 effector enzyme substantially lacks guide sequence-independent collateral nuclease activity.
In certain embodiments, the engineered Cas13 effector enzyme has enhanced guide sequence-independent collateral nuclease activity.
In certain embodiments, said one or more charged or polar residues are within a stretch of 15-20 (e.g., 16 or 17) consecutive amino acids within the region.
In certain embodiments, said one or more charged or polar residues comprise, consist essentially of, or consist of one or more (or all) Tyr (Y) residue(s) within the stretch.
It should be understood that any one embodiment of the invention described herein, including those described only in the examples or claims, or only in one aspect/section below, can be combined with any other one or more embodiments of the invention, unless explicitly disclaimed or improper.
A broad range of CRISPR-Cas systems has been discovered, and a classification system and a common nomenclature have been established for the associated Cas genes. Under such classification system, the CRISPR-Cas systems and the associated effector enzymes belong to two classes—Class 1 and Class 2—each further divided into three types and numerous subtypes based on their signature Cas genes. The Class 1 systems encompass types I, III, and IV systems, utilizing multisubunit RNA-Protein (RNP) complexes. The Class 2 systems encompass types II, V, and VI systems, utilizing single protein RNP complexes.
Cas9 is a Class 2, type II effector enzyme, while the recently discovered Cas13 enzymes, including Cas13a, Cas13b, Cas13c, Cas13d (including the engineered variant CasRx), Cas13e, and Cas13f are Class 2, type VI effector enzymes. Unlike any other CRISPR-Cas systems, Class 2 type VI effector proteins have been demonstrated to exclusively cleave RNA targets. Such Class 2 type VI effector enzymes have two distinct active sites, both conferring RNase activity: one involved in pre-crRNA processing, the other involved in target RNA degradation.
Several subtypes of Class 2 type VI exist, including at least subtype VI-A (Cas13a/C2c2), VI-B (Cas13b1 and Cas13b2), VI-C (Cas13c), VI-D (Cas13d, CasRx), VI-E (Cas13e), and VI-F (Cas13f). The Cas13 subtypes generally share very low sequence identity/similarity, but can all be classified as type VI Cas proteins (e.g., generally referred to herein as “Cas13”) based on the presence of two conserved HEPN-like RNase domains. See
The Cas13 CRISPR locus is initially transcribed into a long pre-crRNA transcript. The Cas13 proteins then cleave the pre-crRNA at fixed positions upstream of the stem-loop structure formed by the palindromic nature of the direct repeat (DR) sequences. Pre-crRNA processing in type VI involves metal-independent cleavages upstream of the stem-loop, and does not require a trans-activating crRNA (tracrRNA) or other host factors. The mature crRNA, which comprises a DR sequence and a guide sequence complementary to a target RNA, assembles with the Cas13 proteins to form a functional RNP complex, which then scans transcripts for the complementary RNA target. Once such RNA target is found and bound by the guide sequence, the RNA target is degraded by the Cas13 endonuclease.
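By way of non-limiting illustration only, the target-recognition step described above (a guide sequence locating a complementary site within a transcript) may be sketched computationally as follows. The function names and the toy sequences are hypothetical and are not taken from the Sequence Listing; the sketch assumes perfect Watson-Crick complementarity over the full guide length and ignores mismatch tolerance, RNA secondary structure, and any flanking-sequence preferences.

```python
# Illustrative sketch only: names and sequences are hypothetical assumptions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def find_target_sites(guide: str, transcript: str) -> list:
    """Return 0-based start positions in `transcript` that are fully
    complementary to `guide` (assumes perfect complementarity)."""
    site = reverse_complement(guide)
    transcript = transcript.upper()
    positions = []
    start = transcript.find(site)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping sites are found.
        start = transcript.find(site, start + 1)
    return positions
```

In this sketch, a transcript for which `find_target_sites` returns an empty list corresponds to a non-target RNA, i.e., an RNA that would be cleaved, if at all, only through collateral activity.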
The Cas13 effector enzymes display unprecedented sensitivity to recognize specific target RNAs within a heterogeneous population of non-target RNAs. It has been reported that Cas13 can detect target RNAs with femtomolar sensitivity. Thus on the one hand, the Class 2 type VI enzymes or Cas13 offer tremendous opportunity to knock down target gene products (e.g., mRNA) for gene therapy, yet on the other hand, such use is inherently limited by the so-called collateral activity that poses significant risk of cytotoxicity.
Specifically, in Class 2 type VI systems, a guide sequence non-specific RNA cleavage, referred to as “collateral activity,” is conferred by the higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domain in Cas13 after target RNA binding. Binding of its cognate target ssRNA complementary to the bound crRNA causes substantial conformational changes in Cas13 effector enzyme, leading to the formation of a single, composite catalytic site for guide-sequence independent “collateral” RNA cleavage, thus converting Cas13 into a sequence non-specific ribonuclease. This newly formed highly accessible active site would not only degrade the target RNA in cis if the target RNA is sufficiently long to reach this new active site, but also degrade non-target RNAs in trans based on this promiscuous RNase activity.
Most RNAs appear to be vulnerable to this promiscuous RNase activity of Cas13, and most (if not all) Cas13 effector enzymes possess this collateral endonuclease activity. It has been shown recently that the collateral effects by Cas13-mediated knockdown exist in mammalian cells and animals (manuscript submitted), suggesting that clinical application of Cas13-mediated target RNA knockdown will face a significant challenge in the presence of the collateral effect.
The existence of substantial collateral effects of Cas13-mediated RNA knockdown has been demonstrated using a dual-fluorescent reporter system of the invention as described herein. Such collateral effects have been observed for both exogenous and endogenous genes in mammalian cells. In particular, wild-type Cas13d with this collateral effect was found to induce transcriptome-wide off-target editing and cell growth arrest.
Thus, in order to use the Cas13 enzymes for specifically knocking down a target RNA in gene therapy, it is evident that this guide-sequence non-specific collateral activity must be tightly controlled to prevent unwanted spontaneous cellular toxicity. Through an unclear mechanism, subtype VI-B systems include a natural means to regulate the collateral activity of Cas13b via the type VI-associated genes csx27 and csx28, but such a natural regulatory mechanism appears to be unique to subtype VI-B, as a similar mechanism does not seem to exist in other subtypes such as type VI-A and VI-C.
Using this same reporter system of the invention, about 200 Cas13d and Cas13e variants obtained by structure-guided mutagenesis were screened. It was found that several variants with 2-4 mutations on the Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) domains retained undiminished on-target activity, but greatly reduced collateral effects. For the Cas13d variant with diminished collateral effect, the transcriptome-wide off-target editing and cell growth arrest observed in wild-type Cas13d were eliminated.
Interestingly, it was found that the majority of variants exhibited either low dual cleavage activity, or high on-target cleavage activity but low collateral cleavage activity. However, there are almost no variants showing low on-target cleavage activity but high collateral cleavage activity. These results suggest a distinct binding mechanism between on-target and collateral cleavage activity.
While not wishing to be bound by any particular theory, Applicant believes the following model of target (e.g., gRNA-specific) and collateral cleavage activity aids the rational design of collateral effect-free variants of the Cas13 effector enzymes. Specifically, as shown in
Thus, the invention described herein provides engineered high-fidelity Class 2 type VI or Cas13 (e.g., Cas13d, Cas13e, and Cas13f) effector enzyme variants with minimal residual collateral effects. These variants are useful, for example, in targeting degradation of RNAs in basic research and therapeutic applications.
On the other hand, multiple low-fidelity Cas13 variants exhibiting increased dual cleavage activity were identified. Such variants are useful for improved nucleic acid detection applications (such as those used in the SHERLOCK assay).
Specifically, in one aspect, the invention provides engineered Class 2 type VI or Cas13 (e.g., Cas13d, e, or f) effector enzymes that largely maintain their sequence-specific endonuclease activity against a target RNA, yet with diminished if not eliminated non-guide sequence-specific endonuclease activity against non-target RNAs. Such engineered Cas13 effector enzymes that substantially lack collateral effect pave the way for using Cas13 in target RNA-knock down-based utility, such as gene therapy. Such engineered Cas13 effector enzymes that substantially lack collateral effect are also useful for RNA-base editing, because a nuclease dead version (or “dCas13”) of such engineered Cas13 also has reduced off-target effect, which is still present in dCas13 without the mutations in the subject engineered Cas13.
While not wishing to be bound by any particular theory,
According to this model, off-target effect in RNA-base editing using a nuclease-deficient (dCas13) version of the engineered Cas13 can also be reduced or eliminated, because the loss of non-specific RNA binding in the engineered dCas13 reduces/eliminates unintended RNA base editing due to the proximity of the RNA base editing domain (e.g., ADAR or CDAR) and an off-target RNA substrate.
In a related aspect, the invention also provides engineered Class 2 type VI or Cas13 (e.g., Cas13d, Cas13e, or Cas13f) effector enzymes that largely maintain their sequence-specific endonuclease activity against a target RNA, yet with enhanced non-guide sequence-specific endonuclease activity against non-target RNAs compared to the corresponding wild-type Cas13. Such engineered Cas13 with enhanced collateral effect provides a better (e.g., more sensitive) variant, compared to the wild-type, in nucleic acid detection assays such as SHERLOCK, which takes advantage of the collateral activity to provide an extremely sensitive assay for detecting very small quantities of a guide sequence-specific target RNA in a sample, with or without pre-amplification of the initial nucleic acids in the sample.
More specifically, one aspect of the invention provides an engineered Class 2 type VI Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas effector enzyme, such as Cas13 (e.g., Cas13d, Cas13e, or Cas13f) wherein the engineered Class 2 type VI Cas effector enzyme: (1) comprises a mutation in a region spatially close to an endonuclease catalytic domain of the corresponding wild-type effector enzyme; (2) substantially preserves guide sequence-specific endonuclease cleavage activity of the wild-type effector enzyme (or theoretical maximum thereof) towards a target RNA complementary to the guide sequence; and, (3) either substantially lacks or has enhanced guide sequence-independent collateral endonuclease cleavage activity of the wild-type effector enzyme (or theoretical maximum thereof) towards a non-target RNA that is substantially not complementary to/does not bind to the guide sequence.
In certain embodiments, the guide sequence-specific endonuclease cleavage activity and the guide sequence-independent collateral endonuclease cleavage activity can both be measured as compared to the corresponding wild-type Cas13 effector enzymes (such as mutant Cas13e vs. wild-type Cas13e from which the mutant derives), as normalized against a corresponding nuclease-deficient Cas13 (such as dCas13e).
The nuclease-deficient Cas13 may lack a catalytic domain, motif, or key catalytic residues such that it exhibits no appreciable or detectable level of guide sequence-dependent target RNA endonuclease cleavage activity or guide sequence-independent collateral endonuclease cleavage activity. Thus in the dual reporter system described herein, dCas13 typically has 100% remaining/baseline EGFP signal as an indication of no appreciable or detectable level of guide sequence-dependent target RNA endonuclease cleavage activity, and has 100% remaining/baseline mCherry signal as an indication of no appreciable or detectable level of guide sequence-independent collateral endonuclease cleavage activity. Meanwhile, wild-type Cas13 typically exhibits strong guide sequence-dependent target RNA endonuclease cleavage activity (as reflected by nearly 80%, 90%, 95%, or close to 100% reduction of the dCas13 EGFP reference signal). The theoretical maximum of such guide sequence-dependent target RNA endonuclease cleavage activity is 100%, which is equivalent to complete elimination of all dCas13 EGFP reference signal.
Wild-type Cas13 also typically exhibits various levels of guide sequence-independent collateral endonuclease cleavage activity, leading to about 50%-70% reduction of the dCas13 mCherry reference signal. The theoretical maximum of such guide sequence-independent collateral endonuclease cleavage activity is 100%, which is equivalent to complete elimination of all dCas13 mCherry reference signal.
In certain embodiments, the engineered Cas13 effector enzyme of the invention exhibits reduced or diminished guide sequence-independent collateral endonuclease cleavage activity compared to the corresponding wild-type Cas13 (or theoretical maximum thereof) from which the engineered Cas13 derives. For example, the engineered Cas13 effector enzyme may substantially lack (e.g., retains less than 50%, 40%, 35%, 30%, 27.5%, 25%, 22.5%, 20%, 17.5%, 15%, 12.5%, 10%, 7.5%, 5%, 4%, 3%, 2.5%, 2%, 1% or less of) guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 towards a non-target RNA that does not bind to the guide sequence. For example, if the wild-type Cas13 eliminates about 70% (with the theoretical maximum being 100% elimination) of the dCas13 mCherry baseline signal due to collateral activity, and the mutant Cas13 with diminished collateral activity only eliminates about 10% of the dCas13 mCherry baseline signal due to remaining collateral activity, the mutant only exhibits or retains about 1/7 (or about 15%) of the wild-type collateral activity (or 10% of the theoretical maximum).
In certain embodiments, the engineered Cas13 effector enzyme of the invention exhibits increased or enhanced guide sequence-independent collateral endonuclease cleavage activity compared to the corresponding wild-type Cas13 from which the engineered Cas13 derives. For example, the engineered Cas13 effector enzyme may have substantially enhanced or increased (e.g., has more than 100%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200% or more of) guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 towards a non-target RNA that does not bind to the guide sequence. For example, if the wild-type Cas13 eliminates about 50% of the dCas13 mCherry baseline signal due to collateral activity, and the mutant Cas13 with enhanced collateral activity eliminates about 90% of the dCas13 mCherry baseline signal due to its enhanced collateral activity, the mutant exhibits about 90/50 (or about 180%) of the wild-type collateral activity.
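By way of non-limiting illustration only, the arithmetic used in the two preceding worked examples may be sketched as follows. The function names are hypothetical; the sketch assumes that activity is expressed as the percent reduction of the dCas13 reference signal (dCas13 being 100% remaining signal), as described above for the dual reporter system.

```python
# Illustrative sketch only: function names are hypothetical assumptions.
def activity_from_signal(remaining_signal_pct: float) -> float:
    """Cleavage activity expressed as percent reduction of the dCas13
    reference signal, where dCas13 itself retains 100% of the signal."""
    return 100.0 - remaining_signal_pct


def relative_activity(mutant_reduction_pct: float,
                      wild_type_reduction_pct: float) -> float:
    """Mutant activity as a percentage of the wild-type activity."""
    if wild_type_reduction_pct <= 0:
        raise ValueError("wild-type reduction must be positive")
    return 100.0 * mutant_reduction_pct / wild_type_reduction_pct


# Diminished collateral activity: wild-type removes 70%, mutant removes 10%.
diminished = relative_activity(10.0, 70.0)   # 1/7 of wild-type
# Enhanced collateral activity: wild-type removes 50%, mutant removes 90%.
enhanced = relative_activity(90.0, 50.0)     # 90/50 of wild-type
```

Relative to the theoretical maximum (100% elimination of the reference signal), the mutant activity is simply the mutant's own percent reduction, e.g., 10% in the first example.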
In certain embodiments, the mutation occurs within a region, e.g., within one of two RNA binding domains at, near, or proximal to one of the HEPN-type catalytic domains, of a wild-type Cas13 (such as Cas13a, Cas13b, Cas13c, Cas13d, Cas13e, Cas13f etc). In certain embodiments, the mutation weakens (e.g., significantly weakens or eliminates) binding of the wild-type Cas13 to a non-specific RNA target (e.g., one not substantially complementary to a guide RNA), but substantially retains binding to a target RNA substantially complementary to the guide RNA. In certain embodiments, the mutation causes steric hindrance effects and/or change in charge, polarity, and/or size of the sidechain of the involved residues, leading to weakened interactions between activated Cas13 and promiscuous RNA, but not much (if any) effect between activated Cas13 and the on-target RNA.
As used herein, “Cas13” is a Class 2 type VI CRISPR-Cas effector enzyme that displays collateral activity as wild-type enzyme upon binding to a cognate target RNA complementary to a guide sequence of its crRNA. The collateral activity of a wild-type Class 2 type VI effector enzyme enables it to exert RNase or endonuclease activity against a non-target RNA that is not, or substantially not, complementary to the guide sequence of the crRNA. The wild-type Class 2 type VI effector enzyme may also exhibit one or more of the following characteristics: having one or two conserved HEPN-like RNase domains, such as HEPN domains having the conserved RXXXXH motif (with X being any amino acid), e.g., the RXXXXH motifs described herein below; having a “clenched fist”-like structure when the Class 2 type VI effector enzyme (e.g., Cas13) binds a cognate crRNA; having a bi-lobed structure with a nuclease (NUC) lobe and a crRNA recognition (REC) lobe, optionally, the REC lobe has a variable N-terminal domain (NTD), followed by a helical domain (Helical-1), and/or optionally, the NUC lobe consists of the two HEPN domains (HEPN-1 and HEPN-2) separated by a linker domain (Helical-3), wherein the HEPN-1 domain is optionally split into two subdomains by another helical domain (Helical-2); processes pre-crRNA transcript into crRNA; does not require a trans-activating crRNA (tracrRNA) or other host factors for pre-crRNA processing; and exhibits femtomolar sensitivity to recognize guide sequence-specific target RNAs within a heterogeneous population of non-target RNAs.
In certain embodiments, the Class 2 type VI effector enzyme (e.g., Cas13) has one of the RXXXXN motifs in the HEPN-like domains located at or close to (e.g., within 50-160 residues, or within 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 or 160 residues of) the N-terminus. In certain embodiments, the Class 2 type VI effector enzyme (e.g., Cas13) has one of the RXXXXN motifs in the HEPN-like domains located at or close to (e.g., within 50-160 residues, or within 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 or 160 residues of) the C-terminus. In certain embodiments, the Class 2 type VI effector enzyme (e.g., Cas13) has one of the RXXXXN motifs of the HEPN-like domains located at or close to (e.g., within 50-160 residues, or within 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 or 160 residues of) the N-terminus, while the other of the RXXXXN of the HEPN-like domains is located at or close to (e.g., within 50-160 residues, or within 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 or 160 residues of) the C-terminus. An RXXXXN motif is “at or near” the N- or C-terminus, if either the R or the N residue of the RXXXXN motif is at or near the N- or C-terminus.
Based on biological and cellular experimental data, the engineered Class 2 type VI effector enzymes (e.g., Cas13, particularly Cas13e) have drastically reduced non-sequence-specific endonuclease activity against non-target RNAs, yet simultaneously exhibit substantially the same if not higher sequence-specific endonuclease activity against a target RNA that substantially complements the guide sequence of the crRNA. The engineered effector enzymes enable high fidelity RNA targeting/editing.
In certain embodiments, the Class 2 type VI effector enzyme is Cas13a, Cas13b, Cas13c, Cas13d (including the engineered variant CasRx), Cas13e, or Cas13f, or an ortholog, paralog, homolog, natural or engineered variant thereof, or functional fragment thereof that substantially maintains the guide sequence-specific endonuclease activity.
In certain embodiments, the variant or functional fragment thereof maintains at least one function of the corresponding wild-type effector enzyme. Such functions include, but are not limited to, the ability to bind a guide RNA/crRNA of the invention (described herein below) to form a complex, the guide sequence-specific RNase activity, and the ability to bind to and cleave a target RNA at a specific site under the guidance of the crRNA that is at least partially complementary to the target RNA.
In certain embodiments, the Cas13 protein is a Cas13a protein. In some embodiments, the Cas13a protein is from a species of the genus Bacteroides, Blautia, Butyrivibrio, Carnobacterium, Chloroflexus, Clostridium, Demequina, Eubacterium, Herbinix, Insolitispirillum, Lachnospiraceae, Leptotrichia, Listeria, Paludibacter, Porphyromonadaceae, Pseudobutyrivibrio, Rhodobacter, or Thalassospira. In certain embodiments, the Cas13a protein is from a species of Leptotrichia shahii, Listeria seeligeri, Lachnospiraceae bacterium (such as Lb MA2020, Lb NK4A179, Lb NK4A144), Clostridium aminophilum (such as Ca DSM 10710), Carnobacterium gallinarum (such as Cg DSM 4847), Paludibacter propionicigenes (such as Pp WB4), Listeria weihenstephanensis (such as Lw FSL R9-0317), Listeriaceae bacterium (such as Lb FSL M6-0635), Leptotrichia wadei (such as Lw F0279), Rhodobacter capsulatus (such as Rc SB 1003, Rc R121, Rc DE442), Leptotrichia buccalis (such as Lb C-1013-b), Herbinix hemicellulosilytica, Eubacteriaceae bacterium (such as Eb CHKCI004), Blautia sp. Marseille-P2398, Leptotrichia sp. oral taxon 879 str. F0557, Chloroflexus aggregans, Demequina aurantiaca, Thalassospira sp. TSLS-1, Pseudobutyrivibrio sp. OR37, Butyrivibrio sp. YAB3001, Leptotrichia sp. Marseille-P3007, Bacteroides ihuae, Porphyromonadaceae bacterium (such as Pb KH3CP3RA), Listeria riparia, or Insolitispirillum peregrinum.
In certain embodiments, the Cas13a is any one of Cas13a disclosed in WO2020/028555 (incorporated herein by reference).
In some embodiments, the Cas13 protein is a Cas13b protein. In some embodiments, the Cas13b protein is from a species of the genus Alistipes, Bacteroides, Bacteroidetes, Bergeyella, Capnocytophaga, Chryseobacterium, Flavobacterium, Myroides, Paludibacter, Phaeodactylibacter, Porphyromonas, Prevotella, Psychroflexus, Reichenbachiella, Riemerella, or Sinomicrobium. In certain embodiments, the Cas13b protein is from a species of Alistipes sp. ZOR0009, Bacteroides pyogenes (such as Bp F0041), Bacteroidetes bacterium (such as Bb GWA2319), Bergeyella zoohelcum (such as Bz ATCC 43767), Capnocytophaga canimorsus, Capnocytophaga cynodegmi, Chryseobacterium carnipullorum, Chryseobacterium jejuense, Chryseobacterium ureilyticum, Flavobacterium branchiophilum, Flavobacterium columnare, Flavobacterium sp. 316, Myroides odoratimimus (such as Mo CCUG 10230, Mo CCUG 12901, Mo CCUG 3837), Paludibacter propionicigenes, Phaeodactylibacter xiamenensis, Porphyromonas gingivalis (such as Pg F0185, Pg F0568, Pg JCVI SC001, Pg W4087), Porphyromonas gulae, Porphyromonas sp. COT-052 OH4946, Prevotella aurantiaca, Prevotella buccae (such as Pb ATCC 33574), Prevotella falsenii, Prevotella intermedia (such as Pi 17, Pi ZT), Prevotella pallens (such as Pp ATCC 700821), Prevotella pleuritidis, Prevotella saccharolytica (such as Ps F0055), Prevotella sp. MA2016, Prevotella sp. MSX73, Prevotella sp. P4-76, Prevotella sp. P5-119, Prevotella sp. P5-125, Prevotella sp. P5-60, Psychroflexus torquis, Reichenbachiella agariperforans, Riemerella anatipestifer, or Sinomicrobium oceani.
In certain embodiments, the Cas13b is any one of Cas13b disclosed in WO2020/028555 (incorporated herein by reference).
In some embodiments, the Cas13 protein is a Cas13c protein. In some embodiments, the Cas13c protein is from a species of the genus Fusobacterium or Anaerosalibacter. In certain embodiments, the Cas13c protein is from a species of Fusobacterium necrophorum (such as Fn subsp. funduliforme ATCC 51357, Fn DJ-2, Fn BFTR-1, Fn subsp. Funduliforme), Fusobacterium perfoetens (such as Fp ATCC 29250), Fusobacterium ulcerans (such as Fu ATCC 49185), or Anaerosalibacter sp. ND1.
In certain embodiments, the Cas13c is any one of Cas13c disclosed in WO2020/028555 (incorporated herein by reference).
In some embodiments, the Cas13 protein is a Cas13d protein. In some embodiments, the Cas13d protein is from a species of the genus Eubacterium or Ruminococcus. In certain embodiments, the Cas13d protein is from a species of Eubacterium siraeum, Ruminococcus flavefaciens (such as Rfx XPD3002), or Ruminococcus albus. In certain embodiments, Cas13d is CasRx. In certain embodiments, Cas13d has the amino acid sequence of SEQ ID NO: 101.
In certain embodiments, the Cas13d is any one of Cas13d disclosed in WO2020/028555 (incorporated herein by reference).
In some embodiments, the Cas13 protein is a Cas13e protein. In some embodiments, the Cas13e protein is from a species of the genus Planctomycetes. In certain embodiments, the Cas13e protein has an amino acid sequence of SEQ ID NO: 4, 50 or 51. The direct repeat (DR) sequences for the Cas13e of SEQ ID NOs: 50 and 51 are SEQ ID NOs: 57 and 58, respectively.
In some embodiments, the Cas13 protein is a Cas13f protein. In certain embodiments, the Cas13f protein has an amino acid sequence of any one of SEQ ID NOs: 52-56. The direct repeat (DR) sequences for the Cas13f of SEQ ID NOs: 52-56 are SEQ ID NOs: 59-63, respectively.
As used herein, “direct repeat sequence” may refer to the DNA coding sequence in the CRISPR locus, or to the RNA encoded by the same in crRNA. Thus when any of SEQ ID NOs: 57-63 is referred to in the context of an RNA molecule, such as crRNA, each T is understood to represent a U.
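By way of non-limiting illustration only, the convention that each T is read as a U when a direct repeat sequence is referred to in the context of an RNA molecule may be sketched as follows. The function name and the example sequence are hypothetical and do not reproduce any of SEQ ID NOs: 57-63.

```python
# Illustrative sketch only: the function name and example are hypothetical.
def dr_dna_to_rna(dr_dna: str) -> str:
    """Read a direct repeat given as DNA coding sequence as the
    corresponding RNA by replacing every T with U."""
    return dr_dna.upper().replace("T", "U")


# A made-up DNA-form repeat and its RNA-form reading.
example_rna = dr_dna_to_rna("GTTGTAGAAGCTT")
```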
In certain embodiments, the wild-type Cas effector proteins of the invention can be: (i) any one of SEQ ID NOs: 50-56, such as SEQ ID NO: 50; (ii) an ortholog, paralog, homolog of any one of SEQ ID NOs: 50-56; or (iii) a Class 2 type VI effector enzyme having amino acid sequence identity of at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% compared to any one of SEQ ID NOs: 50-56.
In certain embodiments, the Cas13e and Cas13f effector proteins, orthologs, homologs, derivatives and functional fragments thereof are naturally existing. In certain other embodiments, the Cas13e and Cas13f effector proteins, orthologs, homologs, derivatives and functional fragments thereof are not naturally existing, e.g., having at least one amino acid difference compared to a naturally existing sequence.
In certain embodiments, the region spatially close to the endonuclease catalytic domain of the corresponding wild-type Cas13 effector enzyme includes residues within 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 amino acids from any residues of the endonuclease catalytic domain (e.g., an RXXXXH domain) in the primary sequence of the Cas13.
In certain embodiments, the region includes residues within 130, 125, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 amino acids from any residues of the endonuclease catalytic domain (e.g., an RXXXXH domain) in the primary sequence of the Cas13e; residues within 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 amino acids from any residues of the endonuclease catalytic domain (e.g., an RXXXXH domain) in the primary sequence of the Cas13d; or residues within 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 amino acids from any residues of the endonuclease catalytic domain (e.g., an RXXXXH domain) in the primary sequence of the Cas13f.
In certain embodiments, the region spatially close to the endonuclease catalytic domain of the corresponding wild-type Cas13 effector enzyme includes residues more than 100, 110, 120, or 130 residues away from any residues of the endonuclease catalytic domain in the primary sequence of the Cas13, but are spatially within 1-10 or 5 ångström of a residue of the endonuclease catalytic domain.
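By way of non-limiting illustration only, the two ways of defining the region described above (proximity in the primary sequence, and spatial proximity in ångström) may be sketched as follows. The function names, the use of one representative coordinate per residue (e.g., a C-alpha atom), and the toy coordinates are assumptions made for illustration; the sketch does not parse any particular structure file format.

```python
# Illustrative sketch only: names, inputs, and coordinates are hypothetical.
import math


def residues_near_in_sequence(catalytic_positions, sequence_length, max_offset):
    """Return 1-based positions lying within `max_offset` residues of any
    catalytic residue in the primary sequence (catalytic residues included)."""
    near = set()
    for pos in catalytic_positions:
        first = max(1, pos - max_offset)
        last = min(sequence_length, pos + max_offset)
        near.update(range(first, last + 1))
    return sorted(near)


def residues_near_in_space(coords, catalytic_positions, max_distance=10.0):
    """Return 1-based positions whose representative atom lies within
    `max_distance` ångström of any catalytic residue.

    `coords` maps a 1-based residue position to an (x, y, z) tuple in ångström.
    """
    near = []
    for pos, xyz in coords.items():
        if any(math.dist(xyz, coords[c]) <= max_distance
               for c in catalytic_positions):
            near.append(pos)
    return sorted(near)
```

Residues that are distant in the primary sequence yet spatially close can then be obtained, in this sketch, as the positions returned by `residues_near_in_space` that are absent from `residues_near_in_sequence`.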
In certain embodiments, the endonuclease catalytic domain is a HEPN domain, optionally a HEPN domain comprising an RXXXXH motif.
In certain embodiments, the RXXXXH motif comprises a R{N/H/K/Q/R}X1X2X3H sequence (SEQ ID NO: 1024).
In certain embodiments, in the R{N/H/K/Q/R}X1X2X3H sequence (SEQ ID NO: 1025), X1 is R, S, D, E, Q, N, G, or Y; X2 is I, S, T, V, or L; and X3 is L, F, N, Y, V, I, S, D, E, or A.
In certain embodiments, the RXXXXH motif is an N-terminal RXXXXH motif comprising an RNXXXH sequence, such as an RN{Y/F}{F/Y}SH sequence (SEQ ID NO: 64). In certain embodiments, the N-terminal RXXXXH motif has a RNYFSH sequence (SEQ ID NO: 65). In certain embodiments, the N-terminal RXXXXH motif has a RNFYSH sequence (SEQ ID NO: 66). In certain embodiments, the RXXXXH motif is a C-terminal RXXXXH motif comprising an R{N/A/R}{A/K/S/F}{A/L/F}{F/H/L}H sequence (SEQ ID NO: 1026). For example, the C-terminal RXXXXH motif may have a RN(A/K)ALH sequence (SEQ ID NO: 67), or a RAFFHH (SEQ ID NO: 68) or RRAFFH sequence (SEQ ID NO: 69).
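By way of non-limiting illustration only, the motif definitions recited above may be expressed as regular expressions and used to locate candidate RXXXXH motifs in an amino acid sequence. The pattern and function names are hypothetical, and the toy sequences in the usage note are invented for illustration rather than taken from the Sequence Listing.

```python
# Illustrative sketch only: names are hypothetical; patterns transcribe the
# motif definitions recited in the text above.
import re

# R{N/H/K/Q/R}X1X2X3H with X1, X2, X3 restricted as recited.
GENERAL_MOTIF = re.compile(r"R[NHKQR][RSDEQNGY][ISTVL][LFNYVISDEA]H")
# N-terminal motif RN{Y/F}{F/Y}SH.
N_TERMINAL_MOTIF = re.compile(r"RN[YF][FY]SH")
# C-terminal motif R{N/A/R}{A/K/S/F}{A/L/F}{F/H/L}H.
C_TERMINAL_MOTIF = re.compile(r"R[NAR][AKSF][ALF][FHL]H")


def find_motifs(protein: str, pattern) -> list:
    """Return (1-based start position, matched motif) for each
    non-overlapping match of `pattern` in `protein`."""
    return [(m.start() + 1, m.group())
            for m in pattern.finditer(protein.upper())]
```

For example, applying `find_motifs` with `N_TERMINAL_MOTIF` to the invented sequence `MKTRNYFSHGGGGRNAALHLE` locates `RNYFSH` starting at position 4, and applying it with `C_TERMINAL_MOTIF` locates `RNAALH` starting at position 14.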
In certain embodiments, the region comprises, consists essentially of, or consists of: (i) residues corresponding to residues between residues 1-194, 2-187, 227-242, 620-775, or 634-755 of SEQ ID NO: 4 (in certain embodiments, residues corresponding to residues between residues 35-51, 52-67, 156-171, 666-682, or 712-727 of SEQ ID NO: 4); (ii) residues corresponding to the HEPN1-1 domain (e.g., residues 90-292), Helical2 domain (e.g., residues 536-690), and the HEPN2 domain (e.g., residues 690-967) of SEQ ID NO: 101; or (iii) residues corresponding to the HEPN1 domain (e.g., residues 1-168), Helical1 domain, Helical2 domain (e.g., residues 346-477), and the HEPN2 domain (e.g., residues 644-790) of SEQ ID NO: 52.
In certain embodiments, the mutation comprises, consists essentially of, or consists of substitutions, within a stretch of 15-20 consecutive amino acids within the region, of one or more charged or polar residues with a charge-neutral short chain aliphatic residue (such as A). For example, in some embodiments, the stretch is about 16 or 17 residues.
In certain embodiments, the mutation comprises, consists essentially of, or consists of substitutions, within a stretch of 15-20 consecutive amino acids within the region, (a) one or more charged, nitrogen-containing side chain group, bulky (such as F or Y), aliphatic, and/or polar residues to a charge-neutral short chain aliphatic residue (such as A, V, or I); (b) one or more I/L to A substitution(s); and/or (c) one or more A to V substitution(s).
In certain embodiments, substantially all, except for up to 1, 2, or 3, charged and polar residues within the stretch are substituted.
In certain embodiments, a total of about 7, 8, 9, or 10 charged and polar residues within the stretch are substituted.
In certain embodiments, the N- and C-terminal two residues of the stretch are substituted to amino acids the coding sequences of which contain a restriction enzyme recognition sequence. For example, in some embodiments, the N-terminal two residues may be VF, and the C-terminal two residues may be ED, and the restriction enzyme is BpiI. Other suitable RE sites are readily envisioned. The RE sites for the N- and C-terminal ends can be, but need not be, identical.
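The VF/ED example can be checked with a short Python sketch. It assumes the commonly cited BpiI recognition sequence GAAGAC and one particular codon choice per residue; the actual coding sequences are not given in this passage, so both are labeled assumptions here.

```python
BPII_SITE = "GAAGAC"  # assumed BpiI recognition sequence (top strand, 5' to 3')

# One codon per residue, chosen for illustration (an assumption, not from the text).
CODONS = {"V": "GTC", "F": "TTC", "E": "GAA", "D": "GAC"}

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_bpii_site(dna):
    """True if the site is present on either strand."""
    return BPII_SITE in dna or reverse_complement(BPII_SITE) in dna

n_term_coding = CODONS["V"] + CODONS["F"]  # coding sequence for VF
c_term_coding = CODONS["E"] + CODONS["D"]  # coding sequence for ED
```

With these codons, the ED coding sequence carries the site on the top strand and the VF coding sequence carries it on the bottom strand, so the two sites face in opposite orientations.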
In certain embodiments, the one or more charged or polar residues comprise N, Q, R, K, H, D, E, Y, S, and T residues. In certain embodiments, the one or more charged or polar residues comprise R, K, H, N, Y, and/or Q residues.
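A minimal sketch of the substitution scheme described above: within a chosen stretch, charged or polar residues are replaced by a charge-neutral short chain aliphatic residue, optionally leaving a few unchanged. The function name, the 1-based inclusive coordinates, and the rule for which residues are left unsubstituted (here, the last ones in the stretch) are illustrative assumptions.

```python
# Charged or polar residues as listed in the text: N, Q, R, K, H, D, E, Y, S, T.
CHARGED_OR_POLAR = set("NQRKHDEYST")

def substitute_stretch(seq, start, end, replacement="A", keep=0):
    """Substitute charged/polar residues in seq[start..end] (1-based, inclusive).

    keep: number of charged/polar residues to leave unsubstituted
          (the last `keep` in the stretch; an arbitrary illustrative choice).
    Returns (mutated sequence, number of substitutions made).
    """
    stretch = list(seq[start - 1:end])
    targets = [i for i, aa in enumerate(stretch) if aa in CHARGED_OR_POLAR]
    to_change = targets[:max(0, len(targets) - keep)]
    for i in to_change:
        stretch[i] = replacement
    return seq[:start - 1] + "".join(stretch) + seq[end:], len(to_change)
```

For instance, `substitute_stretch("AAKDLLNRAA", 3, 8)` replaces K, D, N, and R with alanine, while `keep=1` leaves the final R in place.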
In certain embodiments, one or more Y residue(s) within said stretch is substituted. In certain embodiments, said one or more Y residue(s) correspond to Y672, Y676, and/or Y715 of wild-type Cas13e.1 (SEQ ID NO: 4). In certain embodiments, said stretch is residues 35-51, 52-67, 156-171, 666-682, or 712-727 of SEQ ID NO: 4.
In certain embodiments, the mutation leads to reduction or elimination of guide sequence-independent collateral RNase activity. In certain embodiments, the mutation comprises charge-neutral short chain aliphatic residue substitution(s) corresponding to any one or more of SEQ ID NOs: 37-39, 45, and 48.
In certain embodiments, the mutation leads to enhanced guide sequence-independent collateral RNase activity compared to the wild-type Cas13. In certain embodiments, the mutation comprises charge-neutral short chain aliphatic residue substitution(s) corresponding to any one or more of SEQ ID NOs: 40-42.
In certain embodiments, the charge-neutral short chain aliphatic residue is A, I, L, V, or G.
In certain embodiments, the charge-neutral short chain aliphatic residue is Ala (A).
In certain embodiments, the mutation comprises, consists essentially of, or consists of substitutions within 2, 3, 4, or 5 said stretches of 15-20 consecutive amino acids within the region.
In certain embodiments, the mutation with reduced collateral activity comprises, consists essentially of, or consists of: (a) substitutions within 1, 2, 3, 4, or 5 of said stretches of 15-20 consecutive amino acids within the region; (b) a mutation corresponding to a Cas13d mutation (e.g., that of Example 4) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof), and exhibits less than about 25% or 27.5% collateral effect of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof); (c) a mutation corresponding to the N1V7, N2V7, N2V8 (cfCas13d), N3V7, or N15V4 mutation of Cas13d; (d) a mutation corresponding to a Cas13d mutation (e.g., that of Example 4) that retains between about 25-75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof), and exhibits less than about 25% or 27.5% collateral effect of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof); (e) a mutation corresponding to the N2V4, N2V5, N4V3, N6V3, N10V6, N15V2, N20V6, or N20-Y910A mutation of Cas13d; (f) a mutation corresponding to a Cas13e mutation (e.g., that of Example 1, 2, or 5) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof), and exhibits less than about 25% collateral effect of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof); (g) a mutation corresponding to the M1V4, M2V2, M2V3, M2V4, M5V1, M6V2, M6V3, M6V4, M7V1, M7V2, M7V3, M7-Y55A, M7-Y61A, M11V1, M12V3, M15V1, M15V2, M15-Y643A, M15-Y647A, M16V1, M16V2, M17V2, M18V2, M18V3, M19V2, M19V3, or M19-IA mutation of Cas13e; (h) a mutation corresponding to a Cas13e mutation (e.g., that of Example 5) that retains between about 25-75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof), and exhibits less than about 25% collateral effect of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof); (i) a mutation corresponding to the M17YY (cfCas13e), M8V4, M9V1, M11V2, M11V3, M13V1, M13V2, M13V3, M15V3, or M20V2 mutation of Cas13e; (j) a mutation corresponding to a Cas13f mutation (e.g., that of Example 12) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof), and exhibits less than about 25 or 27.5% collateral effect of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof); (k) a mutation corresponding to the F7V2, F10V1, F10V4, F40V2, F40V4, F44V2, F10S19, F10S21, F10S24, F10S26, F10S27, F10S33, F10S34, F10S35, F10S36, F10S45, F10S46, F10S48, F10S49, F40S22, F40S23, F40S26, F40S27, or F40S36 mutation of Cas13f; (l) a mutation corresponding to a Cas13f mutation (e.g., that of Example 12) that retains between about 50-75% of guide RNA-specific cleavage of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof), and exhibits less than about 25 or 27.5% collateral effect of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof); and/or (m) a mutation corresponding to the F2V4, F3V1, F3V3, F3V4, F5V2, F5V3, F6V4, F7V1, F38V4, F40V1, F41V1, F41V3, F42V4, F43V1, F10S2, F10S11, F10S12, F10S18, F10S20, F10S23, F10S25, F10S28, F10S43, F10S44, F10S47, F10S50, F10S51, F10S52, F40S7, F40S9, F40S11, F40S21, F40S22, F40S24, F40S28, F40S29, F40S30, F40S35, or F40S37 mutation of Cas13f.
In certain embodiments, the mutation with enhanced collateral activity comprises, consists essentially of, or consists of: (a) substitutions within 1, 2, 3, 4, or 5 of said stretches of 15-20 consecutive amino acids within the region; (b) a mutation corresponding to a Cas13d mutation (e.g., that of Example 4) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101) (or theoretical maximum thereof), and exhibits more than about 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 155%, 160%, 165%, 170%, 175%, 180% or more collateral effect of wild-type Cas13d (such as SEQ ID NO: 101); (c) a mutation corresponding to the N2-Y142A, N4-Y193A, N12-Y604A, or N21V7 mutation of Cas13d in Example 4; (d) a mutation corresponding to a Cas13e mutation (e.g., that of Example 5) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4) (or theoretical maximum thereof), and exhibits more than about 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 155%, 160%, 165%, 170%, 175%, 180% or more collateral effect of wild-type Cas13e (such as SEQ ID NO: 4); (e) a mutation corresponding to the M4V2, M4V3, M4V4, M8V1, M8V2, M9V2, M9V3, M10V1, M10V2, M11V4, M12V2, M14V1, M14V2, M16V3, M18V1, M19-G712A, M19-C727A, M19-T725A, or M21V2 mutation of Cas13e; (f) a mutation corresponding to a Cas13f mutation (e.g., that of Example 12) that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13f (such as SEQ ID NO: 52) (or theoretical maximum thereof), and exhibits more than about 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 155%, 160%, 165%, 170%, 175%, 180% or more collateral effect of wild-type Cas13f (such as SEQ ID NO: 52); (g) a mutation corresponding to the F38V2, F42V1, F46V3, F38S2, F38S4, F38S5, F38S6, F38S7, F38S8, F38S9, F38S10, F38S11, F38S12, F38S13, F38S15, F38S16, F38S17, F40S1, F40S2, F40S3, F40S4, F40S5, F40S6, F40S8, F40S16, F40S18, F46S1, F46S4, F46S6, F46S7, F46S10, F46S14, F46S15, F10S4, F10S5, F10S6, F10S9, F10S10, F10S7, F38S1, F38S13, or F46S2 mutation of Cas13f (e.g., that of Example 12).
The sequences of the mutations and/or variants referenced herein for Cas13d, Cas13e, and Cas13f are described in detail in the examples (such as examples 1, 2, 4, 5, and 12) and the associated sequence listing.
In certain embodiments, more than one (e.g., any combinations of two or more of) such mutations/variants may be present in the same engineered Cas13 effector enzyme.
In certain embodiments, the engineered Cas13 preserves at least about 50%, 60%, 70%, 72.5%, 75%, 80%, 85%, 87.5%, 90%, 95%, 96%, 97%, 97.5%, 98%, or 99% of the guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the target RNA.
In certain embodiments, the engineered Cas13 has at least about 95%, 100%, 105%, 110%, 115%, 120%, 125%, 130%, 135%, 140%, 145%, 150%, 155%, 160% or more of the guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 towards the target RNA. That is, the subject engineered Cas13 variant may have higher guide sequence-specific endonuclease cleavage activity towards the target RNA compared to the wild-type Cas13 from which the variant is derived.
In certain embodiments, the engineered Cas13 lacks at least about 70%, 72.5%, 75%, 77.5%, 80%, 82.5%, 85%, 87.5%, 90%, 92.5%, 95%, 96%, 97%, 98%, 99%, or 100% of the guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the non-target RNA.
In certain embodiments, the engineered Cas13 preserves at least about 80-90% of the guide sequence-specific endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the target RNA, and lacks at least about 95-100% of the guide sequence-independent collateral endonuclease cleavage activity of the wild-type Cas13 (or theoretical maximum thereof) towards the non-target RNA.
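The activity thresholds recited above can be read as a simple classification rule. The following Python sketch is illustrative only: the function name, the category labels, and the default cutoffs (at least about 75% on-target activity, less than about 25% collateral activity for reduced-collateral variants, more than about 110% for enhanced-collateral variants) are one choice among the values recited, and inputs are assumed to be percentages relative to wild-type (wild-type = 100).

```python
def classify_variant(on_target_pct, collateral_pct,
                     min_on_target=75.0, max_collateral=25.0,
                     enhanced_collateral=110.0):
    """Classify a variant from its activities relative to wild-type (in percent).

    All cutoffs are adjustable; the defaults are illustrative picks from the
    ranges recited in the text, not measured data.
    """
    if on_target_pct < min_on_target:
        return "other"  # on-target cleavage not sufficiently preserved
    if collateral_pct < max_collateral:
        return "reduced-collateral"
    if collateral_pct > enhanced_collateral:
        return "enhanced-collateral"
    return "other"
```

A stricter embodiment, such as preserving at least about 80-90% on-target activity while lacking at least about 95-100% of collateral activity, corresponds to calling the function with `min_on_target=80.0` and `max_collateral=5.0`.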
In certain embodiments, the guide RNA-specific and collateral (gRNA-independent) cleavage activity by the engineered Cas13 effector enzymes are measured using methods substantially as described in any of the examples (such as Examples 1, 2, 4, 5 and 12).
In certain embodiments, the engineered Cas13 of the invention has an amino acid sequence at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, or 99.86% identical to any one of SEQ ID NOs: 6-10, and Cas13d (such as SEQ ID NO: 101), excluding any one or more of the regions defined by SEQ ID NOs: 16, 20, 24, 28, and 32, and any of the mutation regions in Example 4 or 5. For example, in the regions outside or excluding SEQ ID NOs: 16, 20, 24, 28, and/or 32, the engineered Cas13 of the invention may differ from the engineered Cas13 of any one of SEQ ID NOs: 6-10 by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more residues, provided that such additional changes do not substantially negatively affect the guide sequence-specific endonuclease activity, and/or do not increase the guide sequence-independent collateral effect.
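Percent identity "excluding" defined regions can be sketched as follows. This is a simplified illustration that assumes the two sequences have already been aligned to equal length (gaps written as "-") and that excluded regions are given as 1-based inclusive alignment columns; identity can be defined in more than one way, and the text does not fix a particular alignment method.

```python
def identity_outside_regions(aligned_a, aligned_b, excluded):
    """Percent identity over alignment columns not covered by `excluded`.

    aligned_a, aligned_b: pre-aligned sequences of equal length ('-' for gaps).
    excluded: iterable of (start, end) column ranges, 1-based and inclusive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set()
    for start, end in excluded:
        skip.update(range(start, end + 1))
    compared = matches = 0
    for col, (a, b) in enumerate(zip(aligned_a, aligned_b), start=1):
        # Ignore excluded columns and columns that are gaps in both sequences.
        if col in skip or (a == "-" and b == "-"):
            continue
        compared += 1
        matches += (a == b)
    return 100.0 * matches / compared if compared else 0.0
```

Masking the mutated stretch before computing identity means that differences introduced deliberately inside that stretch do not count against the identity threshold.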
In certain embodiments, the amino acid sequence contains up to 1, 2, 3, 4, or 5 differences in each of one or more regions defined by SEQ ID NOs: 16, 20, 24, 28, and 32, as compared to SEQ ID NOs: 17, 21, 25, 29, and 33, respectively. For example, additional changes in SEQ ID NOs: 17, 21, 25, 29, and/or 33 are possible without substantially negatively affecting the guide sequence-specific endonuclease activity, and/or without increasing the guide sequence-independent collateral effect.
In certain embodiments, the engineered Cas13 of the invention has the amino acid sequence of any one of SEQ ID NOs: 6-10. In certain embodiments, the engineered Cas13 of the invention has the amino acid sequence of SEQ ID NO: 9 or 10.
In certain embodiments, the engineered Cas13 of the invention further comprises a nuclear localization signal (NLS) sequence or a nuclear export signal (NES). For example, in certain embodiments, the engineered Cas13 may comprise an N- and/or a C-terminal NLS.
In a related aspect, the invention provides additional derivatives of the subject engineered Cas13, such as those either substantially lacking or having enhanced collateral endonuclease activity, such as Cas13e and Cas13f effector proteins based on any one of SEQ ID NOs: 50-56 (e.g., SEQ ID NOs: 6-10), or the above orthologs, homologs, derivatives and functional fragments thereof, which comprises another covalently or non-covalently linked protein or polypeptide or other molecules (such as detection reagents or drug/chemical moieties). Such other proteins/polypeptides/other molecules can be linked through, for example, chemical coupling, gene fusion, or other non-covalent linkage (such as biotin-streptavidin binding). Such derived proteins do not affect the function of the original protein, such as the ability to bind a guide RNA/crRNA of the invention (described herein below) to form a complex, the RNase activity, and the ability to bind to and cleave a target RNA at a specific site, under the guidance of the crRNA that is at least partially complementary to the target RNA. In addition, such derived proteins do retain the characteristics of the subject engineered Cas13 either lacking or having enhanced collateral endonuclease activity.
That is, in certain embodiments, upon binding of the RNP complex of the subject engineered Cas13 (or derivative thereof) to the target RNA, the engineered Cas13 either does not exhibit substantial (or detectable) or has enhanced collateral RNase activity.
Such derivation may be used, for example, to add a nuclear localization signal (NLS, such as SV40 large T antigen NLS) to enhance the ability of the subject Cas13, e.g., Cas13e and Cas13f effector proteins, to enter cell nucleus. Such derivation can also be used to add a targeting molecule or moiety to direct the subject Cas13, e.g., Cas13e and Cas13f effector proteins, to specific cellular or subcellular locations. Such derivation can also be used to add a detectable label to facilitate the detection, monitoring, or purification of the subject Cas13, e.g., Cas13e and Cas13f effector proteins. Such derivation can further be used to add a deamination enzyme moiety (such as one with adenine or cytosine deamination activity) to facilitate RNA base editing.
The derivation can be through adding any of the additional moieties at the N- or C-terminal of the subject Cas13 effector proteins, or internally (e.g., internal fusion or linkage through side chains of internal amino acids).
In a related aspect, the invention provides conjugates of the subject engineered Cas13, such as those either substantially lacking or having enhanced collateral endonuclease activity, such as Cas13e and Cas13f effector proteins based on any one of SEQ ID NOs: 50-56 (e.g., SEQ ID NOs: 6-10), or the above orthologs, homologs, derivatives and functional fragments thereof, which are conjugated with moieties such as other proteins or polypeptides, detectable labels, or combinations thereof. Such conjugated moieties may include, without limitation, localization signals, reporter genes (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), labels (e.g., fluorescent dye such as FITC, or DAPI), NLS, targeting moieties, DNA binding domains (e.g., MBP, Lex A DBD, Gal4 DBD), epitope tags (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc.), transcription activation domains (e.g., VP64 or VPR), transcription inhibition domains (e.g., KRAB moiety or SID moiety), nucleases (e.g., FokI), deamination domain (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), methylase, demethylase, transcription release factor, HDAC, ssRNA cleavage activity, dsRNA cleavage activity, ssDNA cleavage activity, dsDNA cleavage activity, DNA or RNA ligase, any combination thereof, etc.
For example, the conjugate may include one or more NLSs, which can be located at or near N-terminal, C-terminal, internally, or combination thereof. The linkage can be through amino acids (such as D or E, or S or T), amino acid derivatives (such as Ahx, β-Ala, GABA or Ava), or PEG linkage.
In certain embodiments, conjugations do not affect the function of the original engineered protein, such as those either substantially lacking or having enhanced collateral effect, such as the ability to bind a guide RNA/crRNA of the invention (described herein below) to form a complex, and the ability to bind to and cleave a target RNA at a specific site, under the guidance of the crRNA that is at least partially complementary to the target RNA.
In a related aspect, the invention provides fusions of the subject engineered Cas13, such as those either substantially lacking or having enhanced collateral endonuclease activity, such as Cas13e and Cas13f effector proteins based on any one of SEQ ID NOs: 50-56 (e.g., SEQ ID NOs: 6-10), or the above orthologs, homologs, derivatives and functional fragments thereof, which fusions are with moieties such as localization signals, reporter genes (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), NLS, protein targeting moieties, DNA binding domains (e.g., MBP, Lex A DBD, Gal4 DBD), epitope tags (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc), transcription activation domains (e.g., VP64 or VPR), transcription inhibition domains (e.g., KRAB moiety or SID moiety), nucleases (e.g., FokI), deamination domain (e.g., ADAR1, ADAR2, APOBEC, AID, or TAD), methylase, demethylase, transcription release factor, HDAC, ssRNA cleavage activity, dsRNA cleavage activity, ssDNA cleavage activity, dsDNA cleavage activity, DNA or RNA ligase, any combination thereof, etc.
For example, the fusion may include one or more NLSs, which can be located at or near N-terminal, C-terminal, internally, or combination thereof. In certain embodiments, conjugations do not affect the function of the original engineered Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, such as the ability to bind a guide RNA/crRNA of the invention (described herein below) to form a complex, the RNase activity, and the ability to bind to and cleave a target RNA at a specific site, under the guidance of the crRNA that is at least partially complementary to the target RNA.
In another aspect, the invention provides a polynucleotide encoding the engineered Cas13 of the invention. The polynucleotide may comprise: (i) a polynucleotide encoding any one of the engineered Cas13, such as those either substantially lacking or having enhanced collateral effect, e.g., those based on Cas13e or Cas13f effector proteins of SEQ ID NOs: 50-56 (such as SEQ ID NOs: 6-10), or orthologs, homologs, derivatives, functional fragments, fusions thereof; (ii) a polynucleotide of any one of SEQ ID NOs: 11-15; or (iii) a polynucleotide comprising (i) and (ii).
In certain embodiments, the polynucleotide of the invention is codon-optimized for expression in a eukaryote, a mammal (such as a human or a non-human mammal), a plant, an insect, a bird, a reptile, a rodent (e.g., mouse, rat), a fish, a worm/nematode, or a yeast.
In a related aspect, the invention provides a polynucleotide having (i) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10) nucleotides additions, deletions, or substitutions compared to the subject polynucleotide described above; (ii) at least 50%, 60%, 70%, 80%, 90%, 95%, or 97% sequence identity to the subject polynucleotide described above; (iii) hybridize under stringent conditions with the subject polynucleotide described above or any of (i) and (ii); or (iv) is a complement of any of (i)-(iii).
In another related aspect, the invention provides a vector comprising or encompassing any one of the polynucleotides of the invention described herein. The vector can be a cloning vector, or an expression vector. The vector can be a plasmid, phagemid, or cosmid, just to name a few. In certain embodiments, the vector can be used to express, in a mammalian cell such as a human cell, any one of the engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, e.g., the subject engineered Cas13e or Cas13f effector proteins based on SEQ ID NOs: 50-56 (such as SEQ ID NOs: 6-10), or orthologs, homologs, derivatives, functional fragments, fusions thereof; or any of the polynucleotides of the invention; or any of the complexes of the invention.
In certain embodiments, the polynucleotide is operably linked to a promoter and optionally an enhancer. For example, in some embodiments, the promoter is a constitutive promoter, an inducible promoter, a ubiquitous promoter, or a tissue specific promoter. In certain embodiments, the vector is a plasmid. In certain embodiments, the vector is a retroviral vector, a phage vector, an adenoviral vector, a herpes simplex viral (HSV) vector, an AAV vector, or a lentiviral vector. In certain embodiments, the AAV vector is a recombinant AAV vector of the serotype AAV1, AAV2, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, or AAV13.
Another aspect of the invention provides a delivery system comprising (1) a delivery vehicle, and (2) the engineered Cas13 of the invention, the polynucleotide of the invention, or the vector of the invention.
In certain embodiments, the delivery vehicle is a nanoparticle, a liposome, an exosome, a microvesicle, or a gene-gun.
A further aspect of the invention provides a cell or a progeny thereof, comprising the engineered Cas13 of the invention, the polynucleotide of the invention, or the vector of the invention. The cell can be a prokaryote such as E. coli, or a cell from a eukaryote such as yeast, insect, plant, animal (e.g., mammal including human and mouse). The cell can be an isolated primary cell (such as bone marrow cells for ex vivo therapy), or a cell from an established cell line such as tumor cell lines, 293T cells, or stem cells, iPSCs, etc.
In certain embodiments, the cell or progeny thereof is a eukaryotic cell (e.g., a non-human mammalian cell, a human cell, or a plant cell) or a prokaryotic cell (e.g., a bacterial cell).
A further aspect of the invention provides a non-human multicellular eukaryote comprising the cell of the invention.
In certain embodiments, the non-human multicellular eukaryote is an animal (e.g., rodent or primate) model for a human genetic disorder.
In another aspect, the invention provides a complex comprising: (i) a protein composition of any one of the subject engineered Cas13, such as those either substantially lacking or having enhanced collateral endonuclease activity, e.g., engineered Cas13e or Cas13f effector protein, or orthologs, homologs, derivatives, functional fragments thereof, conjugates thereof, or fusions thereof; and (ii) a polynucleotide composition, comprising an isolated polynucleotide comprising a cognate DR sequence for said engineered Cas13 effector enzyme, and a spacer/guide sequence complementary to at least a portion of a target RNA.
In certain embodiments, the DR sequence is at the 3′ end of the spacer sequence.
In certain embodiments, the DR sequence is at the 5′ end of the spacer sequence.
In some embodiments, the polynucleotide composition is the guide RNA/crRNA of the subject engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, e.g., engineered Cas13e or Cas13f system, which does not include a tracrRNA.
In certain embodiments, for use with the subject engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, e.g., the subject engineered Cas13e and Cas13f effector proteins, homologs, orthologs, derivatives, fusions, conjugates, or functional fragments thereof having guide sequence-specific RNase activity, the spacer sequence is at least about 10 nucleotides, or between 10-60, 15-50, 20-50, 25-40, 25-50, or 19-50 nucleotides.
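The guide RNA layout described above (a direct repeat placed 5′ or 3′ of a spacer that is complementary to the target RNA, with a spacer length in a recited range) can be sketched in Python. The function names are hypothetical, the 10-60 nucleotide default is one of the recited ranges, and any DR sequence passed in would be a placeholder rather than an actual cognate DR.

```python
def spacer_for_target(target_rna):
    """Return the RNA reverse complement of a target RNA segment (5' to 3')."""
    complement = str.maketrans("ACGU", "UGCA")
    return target_rna.translate(complement)[::-1]

def build_crrna(dr, spacer, dr_position="5prime", min_len=10, max_len=60):
    """Join a direct repeat (DR) and a spacer into a single guide RNA string.

    dr_position: "5prime" puts the DR at the 5' end of the spacer,
                 "3prime" puts it at the 3' end.
    """
    if not min_len <= len(spacer) <= max_len:
        raise ValueError("spacer length outside the allowed range")
    if dr_position == "5prime":
        return dr + spacer
    if dr_position == "3prime":
        return spacer + dr
    raise ValueError("dr_position must be '5prime' or '3prime'")
```

Which DR orientation applies depends on the particular Cas13 effector; the sketch simply exposes both options recited in the text.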
In a related aspect, the invention provides a eukaryotic cell comprising a subject complex comprising a subject engineered Cas13, said complex comprising: (1) an RNA guide sequence comprising a spacer sequence capable of hybridizing to a target RNA, and a direct repeat (DR) sequence 5′ or 3′ to the spacer sequence; and, (2) a subject engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, such as a subject engineered Cas13e or Cas13f effector enzyme based on a wild-type having an amino acid sequence of any one of SEQ ID NOs: 50-56 (such as SEQ ID NOs: 6-10), or a derivative or functional fragment of said Cas; wherein the Cas, the derivative, and the functional fragment of said Cas, are capable of (i) binding to the RNA guide sequence and (ii) targeting the target RNA.
In another aspect, the invention provides a composition comprising: (i) a first (protein) composition selected from any one of the engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, e.g., engineered Cas13e or Cas13f effector proteins based on SEQ ID NOs: 50-56 (such as SEQ ID NOs: 6-10), or orthologs, homologs, derivatives, conjugates, functional fragments, fusions thereof; and (ii) a second (nucleotide) composition comprising an RNA encompassing a guide RNA/crRNA, particularly a spacer sequence, or a coding sequence for the same. The guide RNA may comprise a DR sequence, and a spacer sequence which can complement or hybridize with a target RNA. The guide RNA can form a complex with the first (protein) composition of (i). In some embodiments, the DR sequence can be the polynucleotide of the invention. In some embodiments, the DR sequence can be at the 5′- or 3′-end of the guide RNA. In some embodiments, the composition (such as (i) and/or (ii)) is non-naturally occurring or modified from a naturally occurring composition. In some embodiments, the target sequence is an RNA from a prokaryote or a eukaryote, such as a non-naturally existing RNA. The target RNA may be present inside a cell, such as in the cytosol or inside an organelle. In some embodiments, the protein composition may have an NLS that can be located at its N- or C-terminal, or internally.
In another aspect, the invention provides a composition comprising one or more vectors of the invention, said one or more vectors comprise: (i) a first polynucleotide that encodes any one of the engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, such as a subject engineered Cas13e or Cas13f effector proteins based on SEQ ID NOs: 50-56 (such as SEQ ID NOs: 6-10), or orthologs, homologs, derivatives, functional fragments, fusions thereof; optionally operably linked to a first regulatory element; and (ii) a second polynucleotide that encodes a guide RNA of the invention; optionally operably linked to a second regulatory element. The first and the second polynucleotides can be on different vectors, or on the same vector. The guide RNA can form a complex with the protein product encoded by the first polynucleotide, and comprises a DR sequence (such as any one of the 4th aspect) and a spacer sequence that can bind to/complement with a target RNA. In some embodiments, the first regulatory element is a promoter, such as an inducible promoter. In some embodiments, the second regulatory element is a promoter, such as an inducible promoter. In some embodiments, the target sequence is an RNA from a prokaryote or a eukaryote, such as a non-naturally existing RNA. The target RNA may be present inside a cell, such as in the cytosol or inside an organelle. In some embodiments, the protein composition may have an NLS that can be located at its N- or C-terminal, or internally.
In some embodiments, the vector is a plasmid. In some embodiments, the vector is a viral vector based on a retrovirus, a replication incompetent retrovirus, adenovirus, replication incompetent adenovirus, or AAV. In some embodiments, the vector can self-replicate in a host cell (e.g., having a bacterial replication origin sequence). In some embodiments, the vector can integrate into a host genome and be replicated therewith. In some embodiments, the vector is a cloning vector. In some embodiments, the vector is an expression vector.
The invention further provides a delivery composition for delivering any of the engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, e.g., a subject engineered Cas13e or Cas13f effector proteins based on SEQ ID NOs: 50-56 (such as SEQ ID NOs: 6-10), or orthologs, homologs, derivatives, conjugates, functional fragments, fusions thereof of the invention; the polynucleotide of the invention; the complex of the invention; the vector of the invention; the cell of the invention, and the composition of the invention. The delivery can be through any method known in the art, such as transfection, lipofection, electroporation, gene gun, microinjection, sonication, calcium phosphate transfection, cation transfection, viral vector delivery, etc., using vehicles such as liposome(s), nanoparticle(s), exosome(s), microvesicle(s), a gene-gun or one or more viral vector(s).
The invention further provides a kit comprising any one or more of the following: any of the engineered Cas13, such as those either substantially lacking or having enhanced collateral activity, e.g., a subject engineered Cas13e or Cas13f effector proteins based on SEQ ID NOs: 50-56 (such as SEQ ID NOs: 6-10), or orthologs, homologs, derivatives, conjugates, functional fragments, fusions thereof of the invention; the polynucleotide of the invention; the complex of the invention; the vector of the invention; the cell of the invention, and the composition of the invention. In some embodiments, the kit may further comprise an instruction for how to use the kit components, and/or how to obtain additional components from a third party for use with the kit components. Any component of the kit can be stored in any suitable container.
Another aspect of the invention provides an engineered Cas13 effector enzyme comprising any one or more mutations as described in any of the Examples, such as Example 1, 2, 4, 5, or 12.
In certain embodiments, the engineered Cas13 effector enzyme exhibits about the same or enhanced guide-RNA-mediated cleavage of a target RNA complementary to the guide RNA, as compared to that of the wild-type Cas13 effector enzyme from which the engineered Cas13 effector enzyme derives (or theoretical maximum thereof).
In certain embodiments, the engineered Cas13 effector enzyme exhibits reduced or diminished guide-RNA independent or collateral cleavage of a non-specific RNA (e.g., one not substantially complementary to the guide RNA), as compared to that of the wild-type Cas13 effector enzyme (or theoretical maximum thereof) from which the engineered Cas13 effector enzyme derives. For example, the engineered Cas13 effector enzyme exhibits about 50%, 40%, 30%, 20%, 15%, 10% or less collateral cleavage compared to that of the wild-type Cas13 effector enzyme (or theoretical maximum thereof) from which the engineered Cas13 effector enzyme derives.
In certain embodiments, the engineered Cas13 effector enzyme exhibits increased guide-RNA independent or collateral cleavage of a non-specific RNA (e.g., one not substantially complementary to the guide RNA), as compared to that of the wild-type Cas13 effector enzyme from which the engineered Cas13 effector enzyme derives. For example, the engineered Cas13 effector enzyme exhibits about 105%, 110%, 120%, 130%, 140%, 150%, 160%, 170%, 180%, 190%, 200% or more collateral cleavage compared to that of the wild-type Cas13 effector enzyme from which the engineered Cas13 effector enzyme derives.
With the inventions generally described herein above, more detailed descriptions for the various aspects of the invention are provided in separate sections below. However, it should be understood that, for simplicity and to reduce redundancy, certain embodiments of the invention are only described under one section or only described in the claims or examples. Thus it should also be understood that any one embodiment of the invention, including those described only under one aspect, section, or only in the claims or examples, can be combined with any other embodiment of the invention, unless specifically disclaimed or the combination is improper.
One aspect of the invention provides engineered Cas13, such as those either substantially lacking or having enhanced collateral activity.
In certain embodiments, the Cas13 effector enzyme is a Class 2, type VI effector enzyme having two strictly conserved RX4-6H (RXXXXH)-like motifs, characteristic of Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) domains. In certain embodiments, the CRISPR Class 2, type VI effectors that contain two HEPN domains have been previously characterized and include, for example, CRISPR Cas13a (C2c2), Cas13b, Cas13c, Cas13d (including the engineered variant CasRx), Cas13e, and Cas13f.
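The RX4-6H pattern described above (an arginine, then four to six residues of any kind, then a histidine) can be located in a protein sequence with a simple scan. The following is a minimal sketch only; the function name and the toy sequences are hypothetical and are not taken from the Sequence Listing. A pattern match alone does not establish that a region is a functional HEPN domain.

```python
def find_hepn_like_motifs(protein_seq, min_gap=4, max_gap=6):
    """Return (1-based start, matched text) for every R-X(4-6)-H candidate.

    Every allowed spacing is reported separately, so flanking RR or HH
    residues yield more than one candidate for the same region.
    """
    seq = protein_seq.upper()
    hits = []
    for i, residue in enumerate(seq):
        if residue != "R":
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1  # index where the histidine would have to sit
            if j < len(seq) and seq[j] == "H":
                hits.append((i + 1, seq[i:j + 1]))
    return hits


# Toy sequences (hypothetical, for illustration only).
print(find_hepn_like_motifs("MKRAAAAHGG"))
print(find_hepn_like_motifs("GRRAAAAHHG"))
```

In the second toy sequence, the doubled R and H residues produce several overlapping candidates, which is why such a scan is best treated as a way to nominate positions for further inspection.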
HEPN domains have been shown to be RNase domains and confer the ability to bind to and cleave a target RNA molecule. The target RNA may be any suitable form of RNA, including but not limited to mRNA, tRNA, ribosomal RNA, non-coding RNA, lncRNA (long non-coding RNA), and nuclear RNA. For example, in some embodiments, the engineered Cas13 proteins recognize and cleave RNA targets located on the coding strand of open reading frames (ORFs).
In one embodiment, the Class 2 type VI Cas13 effector enzyme is of the subtype Type VI-E and VI-F, or Cas13e or Cas13f (such as SEQ ID NOs: 50-56). Direct comparison of the wild-type Type VI-E and VI-F CRISPR-Cas effector proteins with the effector of these other systems shows that Type VI-E and VI-F CRISPR-Cas effector proteins are significantly smaller (e.g., about 20% fewer amino acids) than even the smallest previously identified Type VI-D/Cas13d effectors (see
Class 2, subtypes VI-E and VI-F effectors, like other Cas13 proteins, can be used in a variety of applications, and are particularly suitable for therapeutic applications since they are significantly smaller than other effectors (e.g., CRISPR Cas13a, Cas13b, Cas13c, and Cas13d/CasRx effectors), which allows for the packaging of the nucleic acids encoding the effectors and their guide RNA coding sequences into delivery systems having size limitations, such as AAV vectors. Further, the lack of detectable collateral/non-specific RNase activity of the subject engineered Cas13, upon activation of the guide sequence-specific RNase activity, makes these engineered Cas13 effectors less prone to (if not immune from) potentially dangerous generalized off-target RNA digestion in target cells that are desirably not destroyed.
Exemplary Type VI-D CRISPR-Cas effector proteins include Cas13d, such as SEQ ID NO: 101. Exemplary Type VI-E and VI-F CRISPR-Cas effector proteins are provided in the table below.
In the sequences above, the two RX4-6H (RXXXXH) motifs in each effector are double-underlined. In Cas13e.1, the C-terminal motif may have two possibilities due to the RR and HH sequences flanking the motif. Mutations at one or both such domains may create an RNase dead version (or “dCas”) of the Cas13e and Cas13f effector proteins, homologs, orthologs, fusions, conjugates, derivatives, or functional fragments thereof, while substantially maintaining their ability to bind the guide RNA and the target RNA complementary to the guide RNA.
The corresponding DR coding sequences for the Cas effectors are listed below:
In some embodiments, a subject engineered Cas13 effector enzyme, such as those either substantially lacking or having enhanced collateral activity, is based on a “derivative” of a wild-type Type VI-D, Type VI-E, or Type VI-F CRISPR-Cas effector protein, said derivative having an amino acid sequence with at least about 80% sequence identity to the amino acid sequence of any one of SEQ ID NOs: 50-56 and 101 above (e.g., 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%). Such derivative Cas effectors sharing significant protein sequence identity to any one of SEQ ID NOs: 50-56 and 101 have retained at least one of the functions of the Cas of SEQ ID NOs: 50-56 and 101 (see below), such as the ability to bind to and form a complex with a crRNA comprising at least one of the DR sequences of Cas13d, and SEQ ID NOs: 57-63. For example, a Cas13e.1 derivative may share 85% amino acid sequence identity to SEQ ID NO: 50, 51, 52, 53, 54, 55, or 56, respectively, and retains the ability to bind to and form a complex with a crRNA having a DR sequence of SEQ ID NO: 57, 58, 59, 60, 61, 62, or 63, respectively.
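As an illustration of how a percent sequence identity figure such as "at least about 80%" might be computed, the sketch below counts identical residues across a pre-computed pairwise alignment. It assumes the two inputs are already aligned to equal length with "-" marking gaps, and it assumes one particular denominator (all alignment columns that are not gaps in both sequences). Other conventions exist, and the specification does not fix one here. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over an existing pairwise alignment.

    Assumption: the denominator is the number of alignment columns in
    which at least one sequence has a residue (gap-gap columns are skipped).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def meets_identity_threshold(aligned_a, aligned_b, threshold_pct=80.0):
    """True if the aligned pair reaches the given identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct


# Toy aligned pair (hypothetical): 4 identical columns out of 6.
print(round(percent_identity("MKRL-A", "MKKLTA"), 2))
print(meets_identity_threshold("MKRL-A", "MKKLTA"))
```

In practice the alignment itself would come from a dedicated alignment tool, and the choice of tool and scoring parameters can change the reported value.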
In certain embodiments, the sequence identity between the derivative and the wild-type Cas13 is based on regions outside the regions defined by the mutant regions in Examples 1, 2, 4 and 5, such as SEQ ID NOs: 16, 20, 24, 28, and 32.
In some embodiments, the derivative comprises conserved amino acid residue substitutions. In some embodiments, the derivative comprises only conserved amino acid residue substitutions (i.e., all amino acid substitutions in the derivative are conserved substitutions, and there is no substitution that is not conserved).
In some embodiments, the derivative comprises no more than 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid insertions or deletions into any one of the wild-type sequences of Cas13d, and SEQ ID NOs: 50-56. The insertions and/or deletions may be clustered together, or separated throughout the entire length of the sequences, so long as at least one of the functions of the wild-type sequence is preserved. Such functions may include the ability to bind the guide/crRNA, the RNase activity, and the ability to bind to and/or cleave the target RNA complementary to the guide/crRNA. In some embodiments, the insertions and/or deletions are not present in the RXXXXH motifs, or within 5, 10, 15, or 20 residues from the RXXXXH motifs.
In some embodiments, the derivative has retained the ability to bind guide RNA/crRNA.
In some embodiments, the derivative has retained the guide/crRNA-activated RNase activity.
In some embodiments, the derivative has retained the ability to bind target RNA and/or cleave the target RNA in the presence of the bound guide/crRNA that is complementary in sequence to at least a portion of the target RNA.
In other embodiments, the derivative has completely or partially lost the guide/crRNA-activated RNase activity, due to, for example, mutations in one or more catalytic residues of the RNA-guided RNase. Such derivatives are sometimes referred to as dCas, such as dCas13d and dCas13e.1.
Thus in certain embodiments, the derivative may be modified to have diminished nuclease/RNase activity, e.g., nuclease inactivation of at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or 100% as compared with the counterpart wild type proteins. The nuclease activity can be diminished by several methods known in the art, e.g., introducing mutations into the nuclease (catalytic) domains of the proteins. In some embodiments, catalytic residues for the nuclease activities are identified, and these amino acid residues can be substituted by different amino acid residues (e.g., glycine or alanine) to diminish the nuclease activity. In some embodiments, the amino acid substitution is a conservative amino acid substitution. In some embodiments, the amino acid substitution is a non-conservative amino acid substitution.
In some embodiments, the modification comprises one or more mutations (e.g., amino acid deletions, insertions, or substitutions) in at least one HEPN domain. In some embodiments, there is one, two, three, four, five, six, seven, eight, nine, or more amino acid substitutions in at least one HEPN domain.
For example, in some embodiments, the one or more mutations comprise a substitution (e.g., an alanine substitution) at an amino acid residue corresponding to R84, H89, R739, H744, R740, H745 of SEQ ID NO: 50 or R97, H102, R770, H775 of SEQ ID NO: 51 or R77, H82, R764, H769 of SEQ ID NO: 52, or R79, H84, R766, H771 of SEQ ID NO: 53, or R79, H84, R766, H771 of SEQ ID NO: 54, or R89, H94, R773, H778 of SEQ ID NO: 55, or R89, H94, R777, H782 of SEQ ID NO: 56.
In certain embodiments, the one or more mutations comprise, consist essentially of, or consist of: (a) substitutions within 1, 2, 3, 4, or 5 of said stretches of 15-20 consecutive amino acids within the region; (b) a mutation corresponding to a Cas13d mutation of Example 4 that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101), and exhibits less than about 27.5% collateral effect of wild-type Cas13d (such as SEQ ID NO: 101); (c) a mutation corresponding to the N1V7, N2V7, N2V8 (cfCas13d), N3V7, or N15V4 mutation of Cas13d; (d) a mutation corresponding to a Cas13d mutation of Example 4 that retains between about 25-75% of guide RNA-specific cleavage of wild-type Cas13d (such as SEQ ID NO: 101), and exhibits less than about 27.5% collateral effect of wild-type Cas13d (such as SEQ ID NO: 101); (e) a mutation corresponding to the N2V4, N2V5, N4V3, N6V3, N10V6, N15V2, N20V6, or N20-Y910A mutation of Cas13d; (f) a mutation corresponding to a Cas13e mutation of Example 1, 2, or 5 that retains at least about 75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4), and exhibits less than about 25% collateral effect of wild-type Cas13e (such as SEQ ID NO: 4); (g) a mutation corresponding to the M1V4, M2V2, M2V3, M2V4, M5V1, M6V2, M6V3, M6V4, M7V1, M7V2, M7V3, M7-Y55A, M7-Y61A, M11V1, M12V3, M15V1, M15V2, M15-Y643A, M15-Y647A, M16V1, M16V2, M17V2, M18V2, M18V3, M19V2, M19V3, or M19-IA mutation of Cas13e; (h) a mutation corresponding to a Cas13e mutation of Example 5 that retains between about 25-75% of guide RNA-specific cleavage of wild-type Cas13e (such as SEQ ID NO: 4), and exhibits less than about 25% collateral effect of wild-type Cas13e (such as SEQ ID NO: 4); and/or (i) a mutation corresponding to the M17YY (cfCas13e), M8V4, M9V1, M11V2, M11V3, M13V1, M13V2, M13V3, M15V3, or M20V2 mutation of Cas13e.
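The activity ranges recited above pair a retained fraction of guide RNA-specific cleavage (at least about 75%, or between about 25% and 75%, of wild type) with a ceiling on collateral effect (less than about 27.5% for Cas13d, or less than about 25% for Cas13e). The sketch below shows one way measured values, expressed as percentages of wild type, could be sorted into those ranges. The function name, the category labels, and the hard cutoffs are assumptions for illustration; the text uses "about", so exact boundaries are not fixed by the specification.

```python
def classify_variant(on_target_pct, collateral_pct, collateral_cutoff_pct):
    """Sort a variant into an illustrative activity category.

    on_target_pct: guide RNA-specific cleavage, as % of wild type.
    collateral_pct: collateral cleavage, as % of wild type.
    collateral_cutoff_pct: e.g. 27.5 for Cas13d or 25.0 for Cas13e,
        following the ranges recited in the text.
    """
    if collateral_pct >= collateral_cutoff_pct:
        return "collateral not reduced below cutoff"
    if on_target_pct >= 75.0:
        return "high on-target retention"
    if on_target_pct >= 25.0:
        return "intermediate on-target retention"
    return "low on-target retention"


# Hypothetical measurements, not data from the Examples.
print(classify_variant(80.0, 10.0, 27.5))
print(classify_variant(50.0, 20.0, 25.0))
print(classify_variant(90.0, 30.0, 27.5))
```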
In certain embodiments, the one or more mutations or the two or more mutations may be in a catalytically active domain of the effector protein comprising a HEPN domain, or a catalytically active domain which is homologous to a HEPN domain. In certain embodiments, the effector protein comprises one or more of the following mutations: R84A, H89A, R739A, H744A, R740A, H745A (wherein amino acid positions correspond to amino acid positions of Cas13e.1).
The skilled person will understand that corresponding amino acid positions in different Cas13 proteins, such as different Cas13d, Cas13e and Cas13f proteins, may be mutated to the same effect. In this regard,
In certain embodiments, one or more mutations abolishes catalytic activity of the protein completely or partially (e.g. altered cleavage rate, altered specificity, etc.).
Other exemplary (catalytic) residue mutations include: R97A, H102A, R770A, H775A of Cas13e.2, or R77A, H82A, R764A, H769A of Cas13f.1, or R79A, H84A, R766A, H771A of Cas13f.2, or R79A, H84A, R766A, H771A of Cas13f.3, or R89A, H94A, R773A, H778A of Cas13f.4, or R89A, H94A, R777A, H782A of Cas13f.5. In certain embodiments, any of the R and/or H residues herein may be replaced not by A but by G, V, or I.
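Substitutions written in the form used above (wild-type residue, 1-based position, replacement residue, e.g., R84A) can be applied to a sequence programmatically. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, the toy sequence is not from the Sequence Listing, and the wild-type residue is checked against the sequence before it is replaced so that a mis-numbered position is caught rather than silently applied.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein_seq, substitutions):
    """Return a copy of protein_seq with each point substitution applied."""
    residues = list(protein_seq.upper())
    for label in substitutions:
        parsed = SUBSTITUTION.match(label)
        if parsed is None:
            raise ValueError(f"unrecognized substitution: {label!r}")
        wild_type = parsed.group(1)
        position = int(parsed.group(2))
        replacement = parsed.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{label}: found {residues[position - 1]} at position {position}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Toy sequence with one R-X4-H motif; both motif ends are replaced by alanine.
print(apply_substitutions("MKRAAAAHGG", ["R3A", "H8A"]))
```

The same helper accepts replacements other than alanine (for example G, V, or I) because the replacement residue is read from the label itself.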
The presence of at least one of these mutations results in a derivative having reduced or diminished guide sequence-dependent RNase activity as compared to the corresponding wild-type protein lacking the mutations. The additional presence of any one of the mutations in the subject engineered Cas13 substantially lacking collateral effect can reduce/eliminate off-target effect resulting from non-specific RNA binding.
In certain embodiments, the effector protein as described herein is a “dead” effector protein, such as a dead Cas13e or Cas13f effector protein (i.e. dCas13e and dCas13f). In certain embodiments, the effector protein has one or more mutations in HEPN domain 1 (N-terminal). In certain embodiments, the effector protein has one or more mutations in HEPN domain 2 (C-terminal). In certain embodiments, the effector protein has one or more mutations in HEPN domain 1 and HEPN domain 2.
The inactivated Cas or derivative or functional fragment thereof can be fused or associated with one or more heterologous/functional domains (e.g., via fusion protein, linker peptides, “GS” linkers, etc.). These functional domains can have various activities, e.g., methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, nucleic acid binding activity, base-editing activity, and switch activity (e.g., light inducible). In some embodiments, the functional domains are Krüppel associated box (KRAB), SID (e.g. SID4X), VP64, VPR, VP16, FokI, P65, HSF1, MyoD1, Adenosine Deaminase Acting on RNA such as ADAR1, ADAR2, APOBEC, cytidine deaminase (AID), TAD, mini-SOG, APEX, and biotin-APEX.
In some embodiments, the functional domain is a base editing domain, e.g., ADAR1 (including wild-type or ADAR2DD version thereof, with or without the E1008Q and/or the E488Q mutation(s)), ADAR2 (including wild-type or ADAR2DD version thereof, with or without the E1008Q and/or the E488Q mutation(s)), APOBEC, or AID.
In some embodiments, the functional domain may comprise one or more nuclear localization signal (NLS) domains. The one or more heterologous functional domains may comprise at least two or more NLS domains. The one or more NLS domain(s) may be positioned at or near or in proximity to a terminus of the effector protein (e.g., Cas13e/Cas13f effector proteins) and if two or more NLSs, each of the two may be positioned at or near or in proximity to a terminus of the effector protein (e.g., Cas13e/Cas13f effector proteins).
In some embodiments, at least one or more heterologous functional domains may be at or near the amino-terminus of the effector protein and/or wherein at least one or more heterologous functional domains is at or near the carboxy-terminus of the effector protein. The one or more heterologous functional domains may be fused to the effector protein. The one or more heterologous functional domains may be tethered to the effector protein. The one or more heterologous functional domains may be linked to the effector protein by a linker moiety.
In some embodiments, multiple (e.g., two, three, four, five, six, seven, eight, or more) identical or different functional domains are present.
In some embodiments, the functional domain (e.g., a base editing domain) is further fused to an RNA-binding domain (e.g., MS2).
In some embodiments, the functional domain is associated with or fused via a linker sequence (e.g., a flexible linker sequence or a rigid linker sequence). Exemplary linker sequences and functional domain sequences are provided in the table below.
The positioning of the one or more functional domains on the inactivated Cas proteins is one that allows for correct spatial orientation for the functional domain to affect the target with the attributed functional effect. For example, if the functional domain is a transcription activator (e.g., VP16, VP64, or p65), the transcription activator is placed in a spatial orientation that allows it to affect the transcription of the target. Likewise, a transcription repressor is positioned to affect the transcription of the target, and a nuclease (e.g., FokI) is positioned to cleave or partially cleave the target. In some embodiments, the functional domain is positioned at the N-terminus of the Cas/dCas. In some embodiments, the functional domain is positioned at the C-terminus of the Cas/dCas. In some embodiments, the inactivated CRISPR-associated protein (dCas) is modified to comprise a first functional domain at the N-terminus and a second functional domain at the C-terminus.
Various examples of inactivated CRISPR-associated proteins fused with one or more functional domains and methods of using the same are described, e.g., in International Publication No. WO 2017/219027, which is incorporated herein by reference in its entirety, and in particular with respect to the features described herein.
In some embodiments, instead of using full-length wild-type (SEQ ID NOs: 50-56) or derivative Type VI-E and VI-F Cas effectors, “functional fragments” thereof can be used.
A “functional fragment,” as used herein, refers to a fragment of a wild-type Cas13 protein such as any one of SEQ ID NOs: 50-56 and 101, or a derivative thereof, that has less-than full-length sequence. The deleted residues in the functional fragment can be at the N-terminus, the C-terminus, and/or internally. The functional fragment retains at least one function of the wild-type VI-D, VI-E or VI-F Cas, or at least one function of its derivative. Thus a functional fragment is defined specifically with respect to the function at issue. For example, a functional fragment, wherein the function is the ability to bind crRNA and target RNA, may not be a functional fragment with respect to the RNase function, because losing the RXXXXH motifs at both ends of the Cas may not affect its ability to bind a crRNA and target RNA, but may eliminate/destroy the RNase activity. In certain embodiments, the engineered Cas13 of the invention includes a functional fragment of an engineered Cas13 that substantially retains the corresponding wild-type Cas13's guide sequence-dependent RNase activity, but substantially lacks collateral activity.
In some embodiments, compared to full-length wild-type sequences, the engineered Class 2 type VI effector proteins or derivatives thereof or functional fragments thereof lack about 30, 60, 90, 120, 150, or about 180 residues from the N-terminus.
In some embodiments, compared to full-length wild-type sequences, the engineered Class 2 type VI effector proteins or derivatives thereof or functional fragments thereof lack about 30, 60, 90, 120, or about 150 residues from the C-terminus.
In some embodiments, compared to full-length wild-type sequences, the engineered Class 2 type VI effector proteins or derivatives thereof or functional fragments thereof lack about 30, 60, 90, 120, 150, or about 180 residues from the N-terminus, and lack about 30, 60, 90, 120, or about 150 residues from the C-terminus.
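The terminal truncations described above amount to removing a stated number of residues from one or both ends of the full-length sequence. A minimal sketch follows; the function name and the toy sequence are hypothetical, and whether a given truncated product retains any function is an experimental question that this sketch does not address.

```python
def truncate_termini(protein_seq, n_term=0, c_term=0):
    """Remove n_term residues from the N-terminus and c_term from the C-terminus."""
    if n_term < 0 or c_term < 0:
        raise ValueError("residue counts must be non-negative")
    if n_term + c_term >= len(protein_seq):
        raise ValueError("truncation would remove the entire sequence")
    return protein_seq[n_term:len(protein_seq) - c_term]


# Toy 10-residue sequence: drop 2 residues from the N-terminus, 3 from the C-terminus.
print(truncate_termini("ABCDEFGHIJ", n_term=2, c_term=3))
```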
In some embodiments, the engineered Class 2 Type VI Cas13 effector proteins or derivatives thereof or functional fragments thereof have RNase activity, e.g., guide/crRNA-activated specific RNase activity.
In some embodiments, the engineered Class 2 Type VI Cas13 effector proteins or derivatives thereof or functional fragments thereof have no substantial/detectable collateral RNase activity.
The present disclosure also provides a split version of the engineered Class 2 type VI Cas13 effector enzyme described herein (e.g., a Type VI-D, VI-E or VI-F CRISPR-Cas effector protein). The split version of the engineered Cas13 may be advantageous for delivery. In some embodiments, the engineered Cas13 is split into two parts of the enzyme, which together substantially comprise a functioning engineered Class 2 type VI Cas13.
The split can be done in a way that the catalytic domain(s) are unaffected. The CRISPR-associated protein may function as a nuclease or may be an inactivated enzyme, which is essentially an RNA-binding protein with very little or no catalytic activity (e.g., due to mutation(s) in its catalytic domains). Split enzymes are described, e.g., in Wright et al., “Rational design of a split-Cas9 enzyme complex,” Proc. Nat'l. Acad. Sci. 112(10): 2984-2989, 2015, which is incorporated herein by reference in its entirety.
For example, in some embodiments, the nuclease lobe and α-helical lobe are expressed as separate polypeptides. Although the lobes do not interact on their own, the crRNA recruits them into a ternary complex that recapitulates the activity of full-length CRISPR-associated proteins and catalyzes site-specific cleavage. The use of a modified crRNA abrogates split-enzyme activity by preventing dimerization, allowing for the development of an inducible dimerization system.
In some embodiments, the split CRISPR-associated protein can be fused to a dimerization partner, e.g., by employing rapamycin sensitive dimerization domains. This allows the generation of a chemically inducible CRISPR-associated protein for temporal control of the activity of the protein. The CRISPR-associated protein can thus be rendered chemically inducible by being split into two fragments and rapamycin-sensitive dimerization domains can be used for controlled re-assembly of the protein.
The split point is typically designed in silico and cloned into the constructs. During this process, mutations can be introduced to the split CRISPR-associated protein and non-functional domains can be removed.
In some embodiments, the two parts or fragments of the split CRISPR-associated protein (i.e., the N-terminal and C-terminal fragments), can form a full CRISPR-associated protein, comprising, e.g., at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% of the sequence of the wild-type CRISPR-associated protein.
The CRISPR-associated proteins described herein (e.g., a Type VI-D, VI-E or VI-F CRISPR-Cas effector protein) can be designed to be self-activating or self-inactivating. For example, the target sequence can be introduced into the coding construct of the CRISPR-associated protein. Thus, the CRISPR-associated protein can cleave the target sequence, as well as the construct encoding the protein, thereby self-inactivating its expression. Methods of constructing a self-inactivating CRISPR system are described, e.g., in Epstein and Schaffer, Mol. Ther. 24: S50, 2016, which is incorporated herein by reference in its entirety.
In some other embodiments, an additional crRNA, expressed under the control of a weak promoter (e.g., 7SK promoter), can target the nucleic acid sequence encoding the CRISPR-associated protein to prevent and/or block its expression (e.g., by preventing the transcription and/or translation of the nucleic acid). The transfection of cells with vectors expressing the CRISPR-associated protein, the crRNAs, and crRNAs that target the nucleic acid encoding the CRISPR-associated protein can lead to efficient disruption of the nucleic acid encoding the CRISPR-associated protein and decrease the levels of CRISPR-associated protein, thereby limiting its activity.
In some embodiments, the activity of the CRISPR-associated protein can be modulated through endogenous RNA signatures (e.g., miRNA) in mammalian cells. A CRISPR-associated protein switch can be made by using a miRNA-complementary sequence in the 5′-UTR of mRNA encoding the CRISPR-associated protein. The switches selectively and efficiently respond to miRNA in the target cells. Thus, the switches can differentially control the Cas activity by sensing endogenous miRNA activities within a heterogeneous cell population. Therefore, the switch systems can provide a framework for cell-type selective activity and cell engineering based on intracellular miRNA information (see, e.g., Hirosawa et al., Nucl. Acids Res. 45(13): e118, 2017).
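A miRNA-complementary sequence of the kind placed in the 5′-UTR is the reverse complement of the miRNA, so that the two strands can pair in antiparallel orientation. The sketch below derives such a sequence from a miRNA written 5′ to 3′. It assumes full Watson-Crick complementarity and standard A, C, G, U bases only; the function name and the toy input are hypothetical and do not correspond to any particular miRNA.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna_seq):
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    seq = rna_seq.upper()
    unknown = set(seq) - set(RNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


# Toy input (hypothetical), not a real miRNA sequence.
print(reverse_complement_rna("AUGC"))
```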
The engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity (e.g., engineered Type VI-D, VI-E and VI-F CRISPR-Cas effector proteins) can be inducibly expressed, e.g., their expression can be light-induced or chemically-induced. This mechanism allows for activation of the functional domain in the CRISPR-associated proteins. Light inducibility can be achieved by various methods known in the art, e.g., by designing a fusion complex wherein CRY2 PHR/CIBN pairing is used in split CRISPR-associated proteins (see, e.g., Konermann et al., “Optical control of mammalian endogenous transcription and epigenetic states,” Nature 500:7463, 2013).
Chemical inducibility can be achieved, e.g., by designing a fusion complex wherein FKBP/FRB (FK506 binding protein/FKBP rapamycin binding domain) pairing is used in split CRISPR-associated proteins. Rapamycin is required for forming the fusion complex, thereby activating the CRISPR-associated proteins (see, e.g., Zetsche et al., “A split-Cas9 architecture for inducible genome editing and transcription modulation,” Nature Biotech. 33:2:139-42, 2015).
Furthermore, expression of the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity can be modulated by inducible promoters, e.g., tetracycline or doxycycline controlled transcriptional activation (Tet-On and Tet-Off expression system), hormone inducible gene expression system (e.g., an ecdysone inducible gene expression system), and an arabinose-inducible gene expression system. When delivered as RNA, expression of the RNA targeting effector protein can be modulated via a riboswitch, which can sense a small molecule like tetracycline (see, e.g., Goldfless et al., “Direct and specific chemical control of eukaryotic translation with a synthetic RNA-protein interaction,” Nucl. Acids Res. 40:9: e64-e64, 2012).
Various embodiments of inducible CRISPR-associated proteins and inducible CRISPR systems are described, e.g., in U.S. Pat. No. 8,871,445, US Publication No. 2016/0208243, and International Publication No. WO 2016/205764, each of which is incorporated herein by reference in its entirety.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, include at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) Nuclear Localization Signal (NLS) attached to the N-terminal or C-terminal of the protein. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence of SEQ ID NO: 79; the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence of SEQ ID NO: 80); the c-myc NLS having the amino acid sequence of SEQ ID NO: 81 or 82; the hRNPA1 M9 NLS having the sequence of SEQ ID NO: 83; the sequence of SEQ ID NO: 84 of the IBB domain from importin-alpha; the sequences of SEQ ID NO: 85 or 86 of the myoma T protein; the sequence of SEQ ID NO: 87 of human p53; the sequence of SEQ ID NO: 88 of mouse c-abl IV; the sequences of SEQ ID NO: 89 or 90 of the influenza virus NS1; the sequence of SEQ ID NO: 91 of the Hepatitis virus delta antigen; the sequence of SEQ ID NO: 92 of the mouse Mx1 protein; the sequence of SEQ ID NO: 93 of the human poly(ADP-ribose) polymerase; and the sequence of SEQ ID NO: 94 of the human glucocorticoid receptor. In some embodiments, the CRISPR-associated protein comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) Nuclear Export Signal (NES) attached to the N-terminal or C-terminal of the protein. In a preferred embodiment, a C-terminal and/or N-terminal NLS or NES is attached for optimal expression and nuclear targeting in eukaryotic cells, e.g., human cells.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity are mutated at one or more amino acid residues to alter one or more functional activities.
For example, in some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, are mutated at one or more amino acid residues to alter their helicase activity.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, are mutated at one or more amino acid residues to alter their nuclease activity (e.g., endonuclease activity or exonuclease activity), such as the collateral nuclease activity that is not dependent on guide sequence.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, are mutated at one or more amino acid residues to alter their ability to functionally associate with a guide RNA.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, are mutated at one or more amino acid residues to alter their ability to functionally associate with a target nucleic acid.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity described herein are capable of cleaving a target RNA molecule.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, are mutated at one or more amino acid residues to alter their cleaving activity. For example, in some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, may comprise one or more mutations that render the enzyme incapable of cleaving a target nucleic acid.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, are capable of cleaving the strand of the target nucleic acid that is complementary to the strand to which the guide RNA hybridizes.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity described herein, can be engineered to have a deletion of one or more amino acid residues to reduce the size of the enzyme while retaining one or more desired functional activities (e.g., nuclease activity and the ability to interact functionally with a guide RNA). The truncated engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity, can be advantageously used in combination with delivery systems having load limitations.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity described herein can be fused to one or more peptide tags, including a His-tag, GST-tag, a V5-tag, FLAG-tag, HA-tag, VSV-G-tag, Trx-tag, or myc-tag.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity described herein can be fused to a detectable moiety such as GST, a fluorescent protein (e.g., GFP, HcRed, DsRed, CFP, YFP, or BFP), or an enzyme (such as HRP or CAT).
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity described herein can be fused to MBP, LexA DNA binding domain, or Gal4 DNA-binding domain.
In some embodiments, the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity described herein can be linked to or conjugated with a detectable label such as a fluorescent dye, including FITC and DAPI.
In any of the embodiments herein, the linkage between the engineered Class 2 type VI Cas13 effectors, such as those either substantially lacking or having enhanced collateral activity described herein, and the other moiety can be at the N- or C-terminal of the CRISPR-associated proteins, and sometimes even internally via covalent chemical bonds. The linkage can be effected by any chemical linkage known in the art, such as peptide linkage, linkage through the side chain of amino acids such as D, E, S, T, or amino acid derivatives (Ahx, β-Ala, GABA or Ava), or PEG linkage.
The invention also provides nucleic acids encoding the proteins described herein (e.g., an engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity).
In some embodiments, the nucleic acid is a synthetic nucleic acid. In some embodiments, the nucleic acid is a DNA molecule. In some embodiments, the nucleic acid is an RNA molecule (e.g., an mRNA molecule encoding the engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, derivative or functional fragment thereof). In some embodiments, the mRNA is capped, polyadenylated, substituted with 5-methyl cytidine, substituted with pseudouridine, or a combination thereof.
In some embodiments, the nucleic acid (e.g., DNA) is operably linked to a regulatory element (e.g., a promoter) in order to control the expression of the nucleic acid. In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is an inducible promoter. In some embodiments, the promoter is a cell-specific promoter. In some embodiments, the promoter is an organism-specific promoter.
Suitable promoters are known in the art and include, for example, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, a H1 promoter, retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, a SV40 promoter, a dihydrofolate reductase promoter, and a β-actin promoter. For example, a U6 promoter can be used to regulate the expression of a guide RNA molecule described herein.
In some embodiments, the nucleic acid(s) are present in a vector (e.g., a viral vector or a phage). The vector can be a cloning vector or an expression vector. The vectors can be plasmids, phagemids, cosmids, etc. The vectors may include one or more regulatory elements that allow for the propagation of the vector in a cell of interest (e.g., a bacterial cell or a mammalian cell). In some embodiments, the vector includes a nucleic acid encoding a single component of a CRISPR-associated (Cas) system described herein. In some embodiments, the vector includes multiple nucleic acids, each encoding a component of a CRISPR-associated (Cas) system described herein.
In one aspect, the present disclosure provides nucleic acid sequences that are at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the nucleic acid sequences described herein, i.e., nucleic acid sequences encoding the engineered Class 2 type VI Cas13 protein substantially lacking collateral activity, derivatives, functional fragments, or guide/crRNA, including the DR sequences.
In another aspect, the present disclosure also provides nucleic acid sequences encoding amino acid sequences that are at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequences of the subject engineered Class 2 type VI Cas13 protein substantially lacking collateral activity.
In some embodiments, the nucleic acid sequences have at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides, e.g., contiguous or non-contiguous nucleotides) that is the same as the sequences described herein. In some embodiments, the nucleic acid sequences have at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides, e.g., contiguous or non-contiguous nucleotides) that is different from the sequences described herein.
In related embodiments, the invention provides amino acid sequences having at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acid residues, e.g., contiguous or non-contiguous amino acid residues) that is the same as the sequences described herein. In some embodiments, the amino acid sequences have at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acid residues, e.g., contiguous or non-contiguous amino acid residues) that is different from the sequences described herein.
To determine the percent identity of two amino acid sequences, or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). In general, the length of a reference sequence aligned for comparison purposes should be at least 80% of the length of the reference sequence, and in some embodiments is at least 90%, 95%, or 100% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. For purposes of the present disclosure, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
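As an illustration of the counting step described above, the following Python sketch computes percent identity from two sequences that have already been aligned, with gaps written as "-". The function name and the choice of denominator (every alignment column, so that each gap position counts against identity) are assumptions made for illustration only. The sketch does not perform the alignment itself and does not implement the scoring matrix or gap penalties recited above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: the denominator is the number of alignment columns,
    excluding columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information
        columns += 1
        if a == b:  # identical residue or nucleotide at this position
            matches += 1

    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


# Hypothetical aligned pair: one gap in the first sequence.
identity = percent_identity("GCTG-AC", "GCTGTAC")
print(f"{identity:.1f}% identical")
```

Other conventions divide by the length of the shorter sequence or of the reference sequence; the appropriate denominator depends on the comparison being made.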
The proteins described herein (e.g., an engineered Class 2 type VI Cas13 protein substantially lacking collateral activity) can be delivered or used as either nucleic acid molecules or polypeptides.
In certain embodiments, the nucleic acid molecule encoding the engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, derivatives or functional fragments thereof are codon-optimized for expression in a host cell or organism. The host cell may include established cell lines (such as 293T cells) or isolated primary cells. The nucleic acid can be codon optimized for use in any organism of interest, in particular human cells or bacteria. For example, the nucleic acid can be codon-optimized for any prokaryotes (such as E. coli), or any eukaryotes such as human and other non-human eukaryotes including yeast, worm, insect, plants and algae (including food crop, rice, corn, vegetables, fruits, trees, grasses), vertebrate, fish, non-human mammal (e.g., mice, rats, rabbits, dogs, birds (such as chicken), livestock (cow or cattle, pig, horse, sheep, goat etc.), or non-human primates). Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura et al., Nucl. Acids Res. 28:292, 2000 (incorporated herein by reference in its entirety). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell are also available, such as Gene Forge (Aptagen; Jacobus, Pa.).
An example of a codon-optimized sequence is, in this instance, a sequence optimized for expression in a eukaryote, e.g., humans (i.e., being optimized for expression in humans), or for another eukaryote, animal or mammal as herein discussed; see, e.g., SaCas9 human codon optimized sequence in WO 2014/093622 (PCT/US2013/074667). Whilst this is preferred, it will be appreciated that other examples are possible and codon optimization for a host species other than human, or codon optimization for specific organs, is known. In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at http://www.kazusa.or.jp/codon/ and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000).
Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g., 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding a Cas correspond to the most frequently used codon for a particular amino acid.
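The codon replacement described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular tool: the genetic-code table is deliberately partial (only the amino acids used in the example), and the preferred-codon table holds illustrative placeholder choices rather than values taken from the Codon Usage Database. All function and variable names are hypothetical.

```python
# Partial standard genetic code: only the codons needed for the example.
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# Assumed "most frequently used" codon per amino acid in the host.
# Placeholder values for illustration; a real table would be host-specific.
PREFERRED_CODON = {"M": "ATG", "K": "AAG", "G": "GGC", "L": "CTG", "*": "TGA"}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into consecutive three-nucleotide codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(cds))


def codon_optimize(cds: str) -> str:
    """Replace every codon with the preferred synonymous codon."""
    return "".join(PREFERRED_CODON[CODON_TO_AA[codon]] for codon in split_codons(cds))


native = "ATGAAATTAGGTTAA"          # hypothetical native coding sequence
optimized = codon_optimize(native)
# The encoded amino acid sequence must be unchanged by optimization.
assert translate(native) == translate(optimized)
print(optimized)
```

Replacing every codon with the single most frequent one is the simplest strategy; practical tools typically also weigh factors such as GC content and repeated motifs, which this sketch ignores.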
4. RNA Guides or crRNA
In some embodiments, the CRISPR systems described herein include at least one RNA guide (e.g., a gRNA or a crRNA).
The architecture of multiple RNA guides is known in the art (see, e.g., International Publication Nos. WO 2014/093622 and WO 2015/070083, the entire contents of each of which are incorporated herein by reference).
In some embodiments, the CRISPR systems described herein include multiple RNA guides (e.g., one, two, three, four, five, six, seven, eight, or more RNA guides).
In some embodiments, the RNA guide includes a crRNA. In some embodiments, the RNA guide includes a crRNA but not a tracrRNA.
Sequences for guide RNAs from multiple CRISPR systems are generally known in the art; see, for example, Grissa et al., Nucleic Acids Res. 35 (web server issue): W52-7, 2007; Grissa et al., BMC Bioinformatics 8:172, 2007; Grissa et al., Nucleic Acids Res. 36 (web server issue): W145-8, 2008; and Moller and Liang, PeerJ 5: e3788, 2017; the CRISPR database at: crispr.i2bc.paris-saclay.fr/crispr/BLAST/CRISPRsBlast.php; and MetaCRAST available at: github.com/molleraj/MetaCRAST, all of which are incorporated herein by reference.
In some embodiments, the crRNA includes a direct repeat (DR) sequence and a spacer sequence. In certain embodiments, the crRNA comprises, consists essentially of, or consists of a direct repeat sequence linked to a guide sequence or spacer sequence, preferably at the 3′-end of the spacer sequence.
In general, an engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, forms a complex with the mature crRNA, whose spacer sequence directs the complex to a sequence-specific binding with the target RNA that is complementary to the spacer sequence, and/or hybridizes to the spacer sequence. The resulting complex comprises the engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, and the mature crRNA bound to the target RNA.
The direct repeat sequences for the Cas13 systems are generally well conserved, especially at the ends, with, for example, a GCTG for Cas13e and GCTGT for Cas13f at the 5′-end, reverse complementary to a CAGC for Cas13e and ACAGC for Cas13f at the 3′ end. This conservation suggests strong base pairing for an RNA stem-loop structure that potentially interacts with the protein(s) in the locus.
In some embodiments, the direct repeat sequence, when in RNA, comprises the general secondary structure of 5′-S1a-Ba-S2a-L-S2b-Bb-S1b-3′, wherein segments S1a and S1b are reverse complement sequences and form a first stem (S1) having 4 nucleotides in Cas13e and 5 nucleotides in Cas13f; segments Ba and Bb do not base pair with each other and form a symmetrical or nearly symmetrical bulge (B), and have 5 nucleotides each in Cas13e, and 5 (Ba) and 4 (Bb) or 6 (Ba) and 5 (Bb) nucleotides respectively in Cas13f; segments S2a and S2b are reverse complement sequences and form a second stem (S2) having 5 base pairs in Cas13e and either 6 or 5 base pairs in Cas13f; and L is an 8-nucleotide loop in Cas13e and a 5-nucleotide loop in Cas13f.
In certain embodiments, S1a has a sequence of GCUG in Cas13e and GCUGU in Cas13f.
In certain embodiments, S2a has a sequence of GCCCC in Cas13e and A/G CCUC G/A in Cas13f (wherein the first A or G may be absent).
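The 5′-S1a-Ba-S2a-L-S2b-Bb-S1b-3′ architecture described above can be illustrated with a short Python sketch that splits a direct repeat RNA into its segments and checks that the two stems are reverse complements. The example sequence is a placeholder assembled for illustration: only S1a (GCUG) and S2a (GCCCC) follow the Cas13e sequences recited above, while the bulge and loop sequences are hypothetical and are not taken from any of SEQ ID NOs: 57-63. The function names are likewise assumptions.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def split_direct_repeat(dr: str, s1: int, ba: int, s2: int, loop: int, bb: int) -> dict:
    """Split a DR into S1a, Ba, S2a, L, S2b, Bb, S1b given the segment lengths."""
    names = ["S1a", "Ba", "S2a", "L", "S2b", "Bb", "S1b"]
    lengths = [s1, ba, s2, loop, s2, bb, s1]
    if sum(lengths) != len(dr):
        raise ValueError("segment lengths do not add up to the DR length")
    segments, position = {}, 0
    for name, length in zip(names, lengths):
        segments[name] = dr[position:position + length]
        position += length
    return segments


def stems_pair(segments: dict) -> bool:
    """True if S1a/S1b and S2a/S2b are reverse complements of each other."""
    return (segments["S1b"] == reverse_complement(segments["S1a"])
            and segments["S2b"] == reverse_complement(segments["S2a"]))


# Placeholder Cas13e-style DR: 4-nt S1, 5-nt bulge strands, 5-bp S2, 8-nt loop.
example_dr = ("GCUG" "ACACA" "GCCCC" "AAUUGAAA" "GGGGC" "ACACA" "CAGC")
parts = split_direct_repeat(example_dr, s1=4, ba=5, s2=5, loop=8, bb=5)
print(len(example_dr), parts["S1a"], parts["S2a"], stems_pair(parts))
```

With the Cas13e segment sizes, the segments sum to 4 + 5 + 5 + 8 + 5 + 5 + 4 = 36 nucleotides.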
In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid sequence of SEQ ID NOs: 57-63.
As used herein, “direct repeat sequence” may refer to the DNA coding sequence in the CRISPR locus, or to the RNA encoded by the same in crRNA. Thus when any of SEQ ID NOs: 57-63 is referred to in the context of an RNA molecule, such as crRNA, each T is understood to represent a U.
In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid sequence having up to 1, 2, 3, 4, 5, 6, 7, or 8 nucleotides of deletion, insertion, or substitution of SEQ ID NOs: 57-63. In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid sequence having at least 80%, 85%, 90%, 95%, or 97% of sequence identity with SEQ ID NOs: 57-63 (e.g., due to deletion, insertion, or substitution of nucleotides in SEQ ID NOs: 57-63). In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid sequence that is not identical to any one of SEQ ID NOs: 57-63, but can hybridize with a complement of any one of SEQ ID NOs: 57-63 under stringent hybridization conditions, or can bind to a complement of any one of SEQ ID NOs: 57-63 under physiological conditions.
In certain embodiments, the deletion, insertion, or substitution does not change the overall secondary structure of SEQ ID NOs: 57-63 (e.g., the relative locations and/or sizes of the stems, bulges, and loop do not significantly deviate from those of the original stems, bulges, and loop). For example, the deletion, insertion, or substitution may be in the bulge or loop region so that the overall symmetry of the bulge remains largely the same. The deletion, insertion, or substitution may be in the stems so that the length of the stems does not significantly deviate from that of the original stems (e.g., adding or deleting one base pair in each of the two stems corresponds to 4 total base changes).
In certain embodiments, the deletion, insertion, or substitution results in a derivative DR sequence that may have ±1 or 2 base pair(s) in one or both stems, have ±1, 2, or 3 bases in either or both of the single strands in the bulge, and/or have ±1, 2, 3, or 4 bases in the loop region.
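The size tolerances recited above can be expressed as a simple check, sketched here in Python. The reference sizes are those given above for the Cas13e direct repeat (stems counted in base pairs, bulge strands and loop in nucleotides). The dictionary layout and the function name are assumptions for illustration, and the check covers segment sizes only, not whether a derivative remains functional.

```python
# Cas13e reference sizes: stems in base pairs, bulge strands and loop in nucleotides.
CAS13E_REFERENCE = {"S1": 4, "Ba": 5, "S2": 5, "L": 8, "Bb": 5}

# Maximum size change per segment: up to 2 bp per stem,
# up to 3 bases per bulge strand, up to 4 bases in the loop.
MAX_CHANGE = {"S1": 2, "Ba": 3, "S2": 2, "L": 4, "Bb": 3}


def within_tolerance(variant: dict, reference: dict = CAS13E_REFERENCE) -> bool:
    """True if every segment size of the variant is within the allowed change."""
    return all(
        abs(variant[name] - reference[name]) <= MAX_CHANGE[name]
        for name in reference
    )


# Hypothetical derivative: one extra base pair in S1, one fewer in S2,
# a loop longer by two bases, and one extra base in the Bb strand.
derivative = {"S1": 5, "Ba": 5, "S2": 4, "L": 10, "Bb": 6}
print(within_tolerance(derivative))
```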
In certain embodiments, any of the above direct repeat sequences that is different from any one of SEQ ID NOs: 57-63 retains the ability to function as a direct repeat sequence in the Cas13e or Cas13f proteins, as the DR sequence of SEQ ID NOs: 57-63.
In some embodiments, the direct repeat sequence comprises or consists of a nucleic acid having a nucleic acid sequence of any one of SEQ ID NOs: 57-63, with a truncation of the initial three, four, five, six, seven, or eight 3′ nucleotides.
In classic CRISPR systems, the degree of complementarity between a guide sequence (e.g., a crRNA) and its corresponding target sequence can be about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%. In some embodiments, the degree of complementarity is 90-100%.
The guide RNAs can be about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, 100, 125, 150, 175, 200 or more nucleotides in length. For example, for use in a functional engineered Cas13e or Cas13f effector protein, or homologs, orthologs, derivatives, fusions, conjugates, or functional fragment thereof, the spacer can be between 10-60 nucleotides, 20-50 nucleotides, 25-45 nucleotides, 25-35 nucleotides, or about 27, 28, 29, 30, 31, 32, or 33 nucleotides. For use in dCas version of any of the above, however, the spacer can be between 10-200 nucleotides, 20-150 nucleotides, 25-100 nucleotides, 25-85 nucleotides, 35-75 nucleotides, 45-60 nucleotides, or about 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 nucleotides.
To reduce off-target interactions, e.g., to reduce the guide interacting with a target sequence having low complementarity, mutations can be introduced to the CRISPR systems so that the CRISPR systems can distinguish between target and off-target sequences that have greater than 80%, 85%, 90%, or 95% complementarity. In some embodiments, the degree of complementarity is from 80% to 95%, e.g., about 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% (for example, distinguishing between a target having 18 nucleotides from an off-target of 18 nucleotides having 1, 2, or 3 mismatches). Accordingly, in some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5%, 95%, 95.5%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99%, 99.5%, or 99.9%. In some embodiments, the degree of complementarity is 100%.
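The percentages above follow from simple counting: an 18-nucleotide spacer with 1, 2, or 3 mismatches is complementary at 17/18, 16/18, or 15/18 positions, i.e., about 94.4%, 88.9%, or 83.3%. The Python sketch below makes this explicit. The target sequence is a hypothetical placeholder, the helper names are assumptions, and only Watson-Crick pairs are counted as complementary (G-U wobble pairs are treated as mismatches, which is a simplification).

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")
# Substitutions that always change the base, used to create mismatches.
SWAP = {"A": "C", "C": "A", "G": "U", "U": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def percent_complementarity(spacer: str, target: str) -> float:
    """Percent of spacer positions that pair with the target (same length)."""
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have the same length")
    perfect = reverse_complement(target)  # fully complementary spacer, 5'->3'
    matches = sum(s == p for s, p in zip(spacer, perfect))
    return 100.0 * matches / len(spacer)


def introduce_mismatches(spacer: str, positions: list[int]) -> str:
    """Return a copy of the spacer with the bases at the given indices changed."""
    bases = list(spacer)
    for index in positions:
        bases[index] = SWAP[bases[index]]
    return "".join(bases)


target = "AUGGCUAGCUAGGAUCCA"            # hypothetical 18-nt target RNA
spacer = reverse_complement(target)      # perfectly complementary spacer
for positions in ([], [8], [3, 8], [3, 8, 12]):
    variant = introduce_mismatches(spacer, positions)
    print(len(positions), round(percent_complementarity(variant, target), 1))
```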
It is known in the field that complete complementarity is not required, provided there is sufficient complementarity to be functional. Modulations of cleavage efficiency can be exploited by introduction of mismatches, e.g., one or more mismatches, such as 1 or 2 mismatches between spacer sequence and target sequence, including the position of the mismatch along the spacer/target. The more central (i.e., not at the 3′ or 5′-ends) a mismatch, e.g., a double mismatch, is located; the more cleavage efficiency is affected. Accordingly, by choosing mismatch positions along the spacer sequence, cleavage efficiency can be modulated. For example, if less than 100% cleavage of targets is desired (e.g., in a cell population), 1 or 2 mismatches between spacer and target sequence can be introduced in the spacer sequences.
Type VI CRISPR-Cas effectors have been demonstrated to employ more than one RNA guide, thus enabling these effectors, and systems and complexes that include them, to target multiple nucleic acids. In some embodiments, the CRISPR systems comprising the engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, as described herein, include multiple RNA guides (e.g., two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, thirty, forty, or more RNA guides). In some embodiments, the CRISPR systems described herein include a single RNA strand or a nucleic acid encoding a single RNA strand, wherein the RNA guides are arranged in tandem. The single RNA strand can include multiple copies of the same RNA guide, multiple copies of distinct RNA guides, or combinations thereof. The processing capability of the Type VI-E and VI-F CRISPR-Cas effector proteins described herein enables these effectors to target multiple target nucleic acids (e.g., target RNAs) without a loss of activity. In some embodiments, the Type VI-E and VI-F CRISPR-Cas effector proteins may be delivered in complex with multiple RNA guides directed to different target RNAs. In some embodiments, the engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, may be co-delivered with multiple RNA guides, each specific for a different target nucleic acid. Methods of multiplexing using CRISPR-associated proteins are described, for example, in U.S. Pat. No. 9,790,490 B2, and EP3009511 B1, the entire contents of each of which are expressly incorporated herein by reference.
The spacer length of crRNAs can range from about 10-50 nucleotides, such as 15-50 nucleotides, 20-50 nucleotides, 25-50 nucleotides, or 19-50 nucleotides. In some embodiments, the spacer length of a guide RNA is at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, or at least 22 nucleotides. In some embodiments, the spacer length is from 15 to 17 nucleotides (e.g., 15, 16, or 17 nucleotides), from 17 to 20 nucleotides (e.g., 17, 18, 19, or 20 nucleotides), from 20 to 24 nucleotides (e.g., 20, 21, 22, 23, or 24 nucleotides), from 23 to 25 nucleotides (e.g., 23, 24, or 25 nucleotides), from 24 to 27 nucleotides, from 27 to 30 nucleotides, from 30 to 45 nucleotides (e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, or 45 nucleotides), from 30 or 35 to 40 nucleotides, from 41 to 45 nucleotides, from 45 to 50 nucleotides (e.g., 45, 46, 47, 48, 49, or 50 nucleotides), or longer. In some embodiments, the spacer length is from about 15 to about 42 nucleotides.
In some embodiments, the direct repeat length of the guide RNA is 15-36 nucleotides, is at least 16 nucleotides, is from 16 to 20 nucleotides (e.g., 16, 17, 18, 19, or 20 nucleotides), is from 20-30 nucleotides (e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides), is from 30-40 nucleotides (e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides), or is about 36 nucleotides (e.g., 33, 34, 35, 36, 37, 38, or 39 nucleotides). In some embodiments, the direct repeat length of the guide RNA is 36 nucleotides.
In some embodiments, the overall length of the crRNA/guide RNA is about 36 nucleotides longer than any one of the spacer sequence lengths described herein above. For example, the overall length of the crRNA/guide RNA may be between 45-86 nucleotides, or 60-86 nucleotides, 62-86 nucleotides, or 63-86 nucleotides.
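The length relationship above can be illustrated with a minimal Python sketch that assembles a crRNA from a spacer and a direct repeat given as a DNA coding sequence, reading each T as a U and placing the direct repeat at the 3′-end of the spacer. Both sequences are hypothetical placeholders (the direct repeat is not any of SEQ ID NOs: 57-63), and the function names are assumptions for illustration.

```python
def dna_to_rna(sequence: str) -> str:
    """Read a DNA coding sequence as RNA: each T is understood as a U."""
    return sequence.upper().replace("T", "U")


def assemble_crrna(spacer: str, direct_repeat_dna: str) -> str:
    """Join spacer and direct repeat, with the DR at the 3' end of the spacer."""
    return dna_to_rna(spacer) + dna_to_rna(direct_repeat_dna)


# Placeholder 36-nt direct repeat written as DNA, and a placeholder 30-nt spacer.
direct_repeat_dna = ("GCTG" "ACACA" "GCCCC" "AATTGAAA" "GGGGC" "ACACA" "CAGC")
spacer = "ACGU" * 7 + "AC"

crrna = assemble_crrna(spacer, direct_repeat_dna)
# Overall length = spacer length + direct repeat length (30 + 36 = 66 here).
print(len(spacer), len(direct_repeat_dna), len(crrna))
```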
The crRNA sequences can be modified in a manner that allows for formation of a complex between the crRNA and the engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, and successful binding to the target, while at the same time not allowing for successful nuclease activity (i.e., without nuclease activity/without causing indels). These modified guide sequences are referred to as “dead crRNAs,” “dead guides,” or “dead guide sequences.” These dead guides or dead guide sequences may be catalytically inactive or conformationally inactive with regard to nuclease activity. Dead guide sequences are typically shorter than respective guide sequences that result in active RNA cleavage. In some embodiments, dead guides are 5%, 10%, 20%, 30%, 40%, or 50%, shorter than respective guide RNAs that have nuclease activity. Dead guide sequences of guide RNAs can be from 13 to 15 nucleotides in length (e.g., 13, 14, or 15 nucleotides in length), from 15 to 19 nucleotides in length, or from 17 to 18 nucleotides in length (e.g., 17 nucleotides in length).
Thus, in one aspect, the disclosure provides non-naturally occurring or engineered CRISPR systems including a functional engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity as described herein, and a crRNA, wherein the crRNA comprises a dead crRNA sequence whereby the crRNA is capable of hybridizing to a target sequence such that the CRISPR system is directed to a target RNA of interest in a cell without detectable nuclease activity (e.g., RNase activity).
A detailed description of dead guides is described, e.g., in International Publication No. WO 2016/094872, which is incorporated herein by reference in its entirety.
Guide RNAs (e.g., crRNAs) can be generated as components of inducible systems. The inducible nature of the systems allows for spatio-temporal control of gene editing or gene expression. In some embodiments, the stimuli for the inducible systems include, e.g., electromagnetic radiation, sound energy, chemical energy, and/or thermal energy.
In some embodiments, the transcription of guide RNA (e.g., crRNA) can be modulated by inducible promoters, e.g., tetracycline or doxycycline controlled transcriptional activation (Tet-On and Tet-Off expression systems), hormone inducible gene expression systems (e.g., ecdysone inducible gene expression systems), and arabinose-inducible gene expression systems. Other examples of inducible systems include, e.g., small molecule two-hybrid transcription activation systems (FKBP, ABA, etc.), light inducible systems (Phytochrome, LOV domains, or cryptochrome), or Light Inducible Transcriptional Effector (LITE). These inducible systems are described, e.g., in WO 2016205764 and U.S. Pat. No. 8,795,965, both of which are incorporated herein by reference in their entirety.
Chemical modifications can be applied to the crRNA's phosphate backbone, sugar, and/or base. Backbone modifications such as phosphorothioates modify the charge on the phosphate backbone and aid in the delivery and nuclease resistance of the oligonucleotide (see, e.g., Eckstein, “Phosphorothioates, essential components of therapeutic oligonucleotides,” Nucl. Acid Ther., 24, pp. 374-387, 2014); modifications of sugars, such as 2′-O-methyl (2′-OMe), 2′-F, and locked nucleic acid (LNA), enhance both base pairing and nuclease resistance (see, e.g., Allerson et al. “Fully 2′-modified oligonucleotide duplexes with improved in vitro potency and stability compared to unmodified small interfering RNA,” J. Med. Chem. 48.4: 901-904, 2005). Chemically modified bases such as 2-thiouridine or N6-methyladenosine, among others, can allow for either stronger or weaker base pairing (see, e.g., Bramsen et al., “Development of therapeutic-grade small interfering RNAs by chemical engineering,” Front. Genet., 2012 Aug. 20; 3:154). Additionally, RNA is amenable to both 5′ and 3′ end conjugations with a variety of functional moieties including fluorescent dyes, polyethylene glycol, or proteins.
A wide variety of modifications can be applied to chemically synthesized crRNA molecules. For example, modifying an oligonucleotide with a 2′-OMe to improve nuclease resistance can change the binding energy of Watson-Crick base pairing. Furthermore, a 2′-OMe modification can affect how the oligonucleotide interacts with transfection reagents, proteins or any other molecules in the cell. The effects of these modifications can be determined by empirical testing.
In some embodiments, the crRNA includes one or more phosphorothioate modifications. In some embodiments, the crRNA includes one or more locked nucleic acids for the purpose of enhancing base pairing and/or increasing nuclease resistance.
A summary of these chemical modifications can be found, e.g., in Kelley et al., “Versatility of chemically synthesized guide RNAs for CRISPR-Cas9 genome editing,” J. Biotechnol. 233:74-83, 2016; WO 2016205764; and U.S. Pat. No. 8,795,965 B2; each of which is incorporated by reference in its entirety.
The sequences and the lengths of the RNA guides (e.g., crRNAs) described herein can be optimized. In some embodiments, the optimized length of an RNA guide can be determined by identifying the processed form of crRNA (i.e., a mature crRNA), or by empirical length studies for crRNA tetraloops.
The crRNAs can also include one or more aptamer sequences. Aptamers are oligonucleotide or peptide molecules that have a specific three-dimensional structure and can bind to a specific target molecule. The aptamers can be specific to gene effectors, gene activators, or gene repressors. In some embodiments, the aptamers can be specific to a protein, which in turn is specific to and recruits and/or binds to specific gene effectors, gene activators, or gene repressors. The effectors, activators, or repressors can be present in the form of fusion proteins. In some embodiments, the guide RNA has two or more aptamer sequences that are specific to the same adaptor proteins. In some embodiments, the two or more aptamer sequences are specific to different adaptor proteins. The adaptor proteins can include, e.g., MS2, PP7, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s, and PRR1. Accordingly, in some embodiments, the aptamer is selected from binding proteins specifically binding any one of the adaptor proteins as described herein. In some embodiments, the aptamer sequence is a MS2 binding loop (SEQ ID NO: 95). In some embodiments, the aptamer sequence is a QBeta binding loop (SEQ ID NO: 96). In some embodiments, the aptamer sequence is a PP7 binding loop (SEQ ID NO: 97). A detailed description of aptamers can be found, e.g., in Nowak et al., “Guide RNA engineering for versatile Cas9 functionality,” Nucl. Acid. Res., 44(20):9555-9564, 2016; and WO 2016205764, which are incorporated herein by reference in their entirety.
In certain embodiments, the methods make use of chemically modified guide RNAs. Examples of guide RNA chemical modifications include, without limitation, incorporation of 2′-O-methyl (M), 2′-O-methyl 3′-phosphorothioate (MS), or 2′-O-methyl 3′-thioPACE (MSP) at one or more terminal nucleotides. Such chemically modified guide RNAs can exhibit increased stability and increased activity as compared to unmodified guide RNAs, though on-target vs. off-target specificity is not predictable (see Hendel, Nat Biotechnol. 33(9):985-9, 2015, incorporated by reference). Chemically modified guide RNAs may further include, without limitation, RNAs with phosphorothioate linkages and locked nucleic acid (LNA) nucleotides comprising a methylene bridge between the 2′ and 4′ carbons of the ribose ring.
The invention also encompasses methods for delivering multiple nucleic acid components, wherein each nucleic acid component is specific for a different target locus of interest thereby modifying multiple target loci of interest. The nucleic acid component of the complex may comprise one or more protein-binding RNA aptamers. The one or more aptamers may be capable of binding a bacteriophage coat protein. The bacteriophage coat protein may be selected from the group comprising Qβ, F2, GA, fr, JP501, MS2, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s and PRR1. In certain embodiments, the bacteriophage coat protein is MS2.
The target RNA can be any RNA molecule of interest, including naturally-occurring and engineered RNA molecules. The target RNA can be an mRNA, a tRNA, a ribosomal RNA (rRNA), a microRNA (miRNA), an interfering RNA (siRNA), a ribozyme, a riboswitch, a satellite RNA, a microswitch, a microzyme, or a viral RNA.
In some embodiments, the target nucleic acid is associated with a condition or disease (e.g., an infectious disease or a cancer).
Thus, in some embodiments, the systems described herein can be used to treat a condition or disease by targeting these nucleic acids. For instance, the target nucleic acid associated with a condition or disease may be an RNA molecule that is overexpressed in a diseased cell (e.g., a cancer or tumor cell). The target nucleic acid may also be a toxic RNA and/or a mutated RNA (e.g., an mRNA molecule having a splicing defect or a mutation). The target nucleic acid may also be an RNA that is specific for a particular microorganism (e.g., a pathogenic bacterium).
One aspect of the invention provides a complex of an engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, such as a CRISPR/Cas13e or CRISPR/Cas13f complex, comprising (1) any of the engineered Class 2 type VI Cas13 proteins, such as those either substantially lacking or having enhanced collateral activity (e.g., engineered Cas13e/Cas13f effector proteins, homologs, orthologs, fusions, derivatives, conjugates, or functional fragments thereof as described herein), and (2) any of the guide RNAs described herein, each including a spacer sequence designed to be at least partially complementary to a target RNA, and a DR sequence compatible with the engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity (e.g., Cas13d, Cas13e/Cas13f effector proteins), homologs, orthologs, fusions, derivatives, conjugates, or functional fragments thereof.
In certain embodiments, the complex further comprises the target RNA bound by the guide RNA.
In a related aspect, the invention also provides a cell comprising any of the complexes of the invention. In certain embodiments, the cell is a prokaryote. In certain embodiments, the cell is a eukaryote.
The CRISPR/Cas systems having the engineered Cas13, e.g., an engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, as described herein, have a wide variety of utilities like the corresponding wild-type Cas13-based systems, including modifying (e.g., deleting, inserting, translocating, inactivating, or activating) a target polynucleotide or nucleic acid in a multiplicity of cell types. The CRISPR systems have a broad spectrum of applications in, e.g., tracking and labeling of nucleic acids, enrichment assays (extracting desired sequence from background), controlling interfering RNA or miRNA, detecting circulating tumor DNA, preparing next generation library, drug screening, disease diagnosis and prognosis, and treating various genetic disorders.
Certain engineered Cas13 effector enzymes, as described herein, have enhanced collateral effect compared to the wild-type, and thus may be better alternatives than the wild-type Cas13 effector enzymes for utilities that take advantage of the enhanced collateral activity, such as DNA/RNA detection (e.g., specific high sensitivity enzymatic reporter unlocking (SHERLOCK)). Such engineered Cas13 effector enzymes with enhanced collateral activity are within the scope of one aspect of the invention.
In one aspect, the CRISPR systems described herein can be used in RNA detection. As shown in the examples, wild-type Cas13, such as Cas13e of the invention, exhibits non-specific/collateral RNase activity upon activation of its guide RNA-dependent specific RNase activity when the spacer sequence is about 30 nucleotides. Thus, the engineered CRISPR-associated proteins of the invention with enhanced collateral activity (compared to the wild-type) can be reprogrammed with CRISPR RNAs (crRNAs) to provide a platform for specific RNA sensing. Further, by choosing a specific spacer sequence length, and upon recognition of its RNA target, activated CRISPR-associated proteins engage in enhanced collateral cleavage of nearby non-targeted RNAs. This crRNA-programmed collateral cleavage activity allows the CRISPR systems to detect the presence of a specific RNA by triggering programmed cell death or by nonspecific degradation of labeled RNA.
The SHERLOCK method (Specific High Sensitivity Enzymatic Reporter UnLOCKing) provides an in vitro nucleic acid detection platform with attomolar sensitivity based on nucleic acid amplification and collateral cleavage of a reporter RNA, allowing for real-time detection of the target. To achieve signal detection, the detection can be combined with different isothermal amplification steps. For example, recombinase polymerase amplification (RPA) can be coupled with T7 transcription to convert amplified DNA to RNA for subsequent detection. The combination of amplification by RPA, T7 RNA polymerase transcription of amplified DNA to RNA, and detection of target RNA by collateral RNA cleavage-mediated release of reporter signal is referred to as SHERLOCK. Methods of using CRISPR in SHERLOCK are described in detail, e.g., in Gootenberg, et al. “Nucleic acid detection with CRISPR-Cas13a/C2c2,” Science, 2017 Apr. 28; 356(6336):438-442, which is incorporated herein by reference in its entirety.
The invention described herein provides mutant/variant Class 2, Type VI CRISPR/Cas effector enzymes, especially Type VI-D, -E, and -F Cas mutants/variants having enhanced collateral effect, such that they can be more effective in nucleic acid detection assays based on the collateral effect, such as the SHERLOCK assay. Such mutants include any one described in Examples 1, 2, 4, and 5, as well as
In certain embodiments, such Cas13 mutants having enhanced collateral effect comprise, consist essentially of, or consist of a mutation corresponding to the N2-Y142A, N4-Y193A, N12-Y604A, or N21V7 mutation of Cas13d, or to the M14V2, M16V3, M18V1, M19-G712A, M19-T725A, or M19-C727A mutation of Cas13e.
The CRISPR-associated proteins can be used in Northern blot assays, which use electrophoresis to separate RNA samples by size. The CRISPR-associated proteins can be used to specifically bind and detect the target RNA sequence. The CRISPR-associated proteins can also be fused to a fluorescent protein (e.g., GFP) and used to track RNA localization in living cells. More particularly, the CRISPR-associated proteins can be inactivated in that they no longer cleave RNAs as described above. Thus, CRISPR-associated proteins can be used to determine the localization of the RNA or specific splice variants, the level of mRNA transcripts, up- or down-regulation of transcripts and disease-specific diagnosis. The CRISPR-associated proteins can be used for visualization of RNA in (living) cells using, for example, fluorescent microscopy or flow cytometry, such as fluorescence-activated cell sorting (FACS), which allows for high-throughput screening of cells and recovery of living cells following cell sorting. A detailed description regarding how to detect DNA and RNA can be found, e.g., in International Publication No. WO 2017/070605, which is incorporated herein by reference in its entirety.
In some embodiments, the CRISPR systems described herein can be used in multiplexed error-robust fluorescence in situ hybridization (MERFISH). These methods are described in, e.g., Chen et al., “Spatially resolved, highly multiplexed RNA profiling in single cells,” Science, 2015 Apr. 24; 348(6233):aaa6090, which is incorporated herein by reference in its entirety.
In some embodiments, the CRISPR systems described herein can be used to detect a target RNA in a sample (e.g., a clinical sample, a cell, or a cell lysate). The collateral RNase activity of the engineered Cas13, e.g., Type VI-E and/or VI-F CRISPR-Cas effector proteins described herein, is activated when the effector proteins bind to a target nucleic acid when the spacer sequence is of a specific chosen length (such as about 30 nucleotides). Upon binding to the target RNA of interest, the effector protein cleaves a labeled detector RNA to generate a signal (e.g., an increased signal or a decreased signal) thereby allowing for the qualitative and quantitative detection of the target RNA in the sample. The specific detection and quantification of RNA in the sample allows for a multitude of applications including diagnostics. In some embodiments, the methods include a) contacting a sample with: (i) an RNA guide (e.g., crRNA) and/or a nucleic acid encoding the RNA guide, wherein the RNA guide consists of a direct repeat sequence and a spacer sequence capable of hybridizing to the target RNA; (ii) an engineered Class 2 type VI Cas13 protein with enhanced collateral activity compared to wild-type Cas13, such as a subject engineered Type VI-E or VI-F CRISPR-Cas effector protein (Cas13e or Cas13f) and/or a nucleic acid encoding the effector protein; and (iii) a labeled detector RNA; wherein the effector protein associates with the RNA guide to form a complex; wherein the RNA guide hybridizes to the target RNA; and wherein upon binding of the complex to the target RNA, the effector protein exhibits collateral RNase activity and cleaves the labeled detector RNA; and b) measuring a detectable signal produced by cleavage of the labeled detector RNA, wherein said measuring provides for detection of the single-stranded target RNA in the sample.
In some embodiments, the methods further comprise comparing the detectable signal with a reference signal and determining the amount of target RNA in the sample.
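The comparison of a detectable signal with a reference signal can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that reference signals were measured at known target RNA amounts and that signal varies linearly with amount over that range; the function names, units, and the linear model are hypothetical and are not taken from this disclosure.

```python
from typing import Sequence, Tuple


def fit_standard_curve(amounts: Sequence[float],
                       signals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of signal = slope * amount + intercept.

    `amounts` are known reference target RNA amounts and `signals` are the
    reference signals measured for them (assumed linear relationship).
    """
    n = len(amounts)
    if n != len(signals) or n < 2:
        raise ValueError("need at least two paired reference measurements")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("reference amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_target_amount(sample_signal: float,
                           slope: float,
                           intercept: float) -> float:
    """Invert the fitted line to estimate the amount of target RNA."""
    if slope == 0:
        raise ValueError("flat standard curve; signal carries no information")
    return (sample_signal - intercept) / slope
```

In this sketch a sample signal is converted to an estimated amount by inverting the fitted line; a real assay may call for a non-linear calibration model and for replicate measurements.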
In some embodiments, the measuring is performed using gold nanoparticle detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, or semiconductor-based sensing. In some embodiments, the labeled detector RNA includes a fluorescence-emitting dye pair, a fluorescence resonance energy transfer (FRET) pair, or a quencher/fluor pair. In some embodiments, upon cleavage of the labeled detector RNA by the effector protein, an amount of detectable signal produced by the labeled detector RNA is decreased or increased. In some embodiments, the labeled detector RNA produces a first detectable signal prior to cleavage by the effector protein and a second detectable signal after cleavage by the effector protein. In some embodiments, a detectable signal is produced when the labeled detector RNA is cleaved by the effector protein. In some embodiments, the labeled detector RNA comprises a modified nucleobase, a modified sugar moiety, a modified nucleic acid linkage, or a combination thereof. In some embodiments, the methods include the multi-channel detection of multiple independent target RNAs in a sample (e.g., two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, thirty, forty, or more target RNAs) by using multiple engineered Cas13, such as the engineered Type VI-E and/or VI-F CRISPR-Cas (Cas13e and/or Cas13f) systems of the invention, each including a distinct orthologous effector protein and corresponding RNA guides, allowing for the differentiation of multiple target RNAs in the sample. In some embodiments, the methods include the multi-channel detection of multiple independent target RNAs in a sample, with the use of multiple instances of engineered Cas13, such as engineered Type VI-E and/or VI-F CRISPR-Cas systems of the invention, each containing an orthologous effector protein with differentiable collateral RNase substrates.
Methods of detecting an RNA in a sample using CRISPR-associated proteins are described, for example, in U.S. Patent Publication No. 2017/0362644, the entire contents of which are incorporated herein by reference.
Cellular processes depend on a network of molecular interactions among proteins, RNAs, and DNAs. Accurate detection of protein-DNA and protein-RNA interactions is key to understanding such processes. In vitro proximity labeling techniques employ an affinity tag combined with a reporter group, e.g., a photoactivatable group, to label polypeptides and RNAs in the vicinity of a protein or RNA of interest in vitro. After UV irradiation, the photoactivatable groups react with proteins and other molecules that are in close proximity to the tagged molecules, thereby labelling them. Labelled interacting molecules can subsequently be recovered and identified. The CRISPR-associated proteins can for instance be used to target probes to selected RNA sequences. These applications can also be applied in animal models for in vivo imaging of diseases or difficult-to-culture cell types. The methods of tracking and labeling of nucleic acids are described, e.g., in U.S. Pat. No. 8,795,965, WO 2016205764, and WO 2017070605; each of which is incorporated herein by reference in its entirety.
RNA Isolation, Purification, Enrichment, and/or Depletion
The CRISPR systems (e.g., CRISPR-associated proteins) described herein can be used to isolate and/or purify the RNA. The CRISPR-associated proteins can be fused to an affinity tag that can be used to isolate and/or purify the RNA-CRISPR-associated protein complex. These applications are useful, e.g., for the analysis of gene expression profiles in cells.
In some embodiments, the CRISPR-associated proteins can be used to target a specific noncoding RNA (ncRNA) thereby blocking its activity. In some embodiments, the CRISPR-associated proteins can be used to specifically enrich a particular RNA (including but not limited to increasing stability, etc.), or alternatively, to specifically deplete a particular RNA (e.g., particular splice variants, isoforms, etc.).
These methods are described, e.g., in U.S. Pat. No. 8,795,965, WO 2016205764, and WO 2017070605; each of which is incorporated herein by reference in its entirety.
The CRISPR systems described herein can be used for preparing next generation sequencing (NGS) libraries. For example, to create a cost-effective NGS library, the CRISPR systems can be used to disrupt the coding sequence of a target gene product, and the CRISPR-associated protein transfected clones can be screened simultaneously by next-generation sequencing (e.g., on the Ion Torrent PGM system). A detailed description regarding how to prepare NGS libraries can be found, e.g., in Bell et al., “A high-throughput screening strategy for detecting CRISPR-Cas9 induced mutations using next-generation sequencing,” BMC Genomics, 15.1 (2014): 1002, which is incorporated herein by reference in its entirety.
Microorganisms (e.g., E. coli, yeast, and microalgae) are widely used for synthetic biology. The development of synthetic biology has a wide utility, including various clinical applications. For example, the programmable CRISPR systems can be used to split proteins of toxic domains for targeted cell death, e.g., using cancer-linked RNA as target transcript. Further, pathways involving protein-protein interactions can be influenced in synthetic biological systems with, e.g., fusion complexes with the appropriate effectors such as kinases or enzymes.
In some embodiments, crRNAs that target phage sequences can be introduced into the microorganism. Thus, the disclosure also provides methods of vaccinating a microorganism (e.g., a production strain) against phage infection.
In some embodiments, the CRISPR systems provided herein can be used to engineer microorganisms, e.g., to improve yield or improve fermentation efficiency. For example, the CRISPR systems described herein can be used to engineer microorganisms, such as yeast, to generate biofuel or biopolymers from fermentable sugars, or to degrade plant-derived lignocellulose from agricultural waste as a source of fermentable sugars. More particularly, the methods described herein can be used to modify the expression of endogenous genes required for biofuel production and/or to modify endogenous genes that may interfere with the biofuel synthesis. These methods of engineering microorganisms are described, e.g., in Verwaal et al., “CRISPR/Cpf1 enables fast and simple genome editing of Saccharomyces cerevisiae,” Yeast doi: 10.1002/yea.3278, 2017; and Hlavova et al., “Improving microalgae for biotechnology-from genetics to synthetic biology,” Biotechnol. Adv., 33:1194-203, 2015, both of which are incorporated herein by reference in the entirety.
In some embodiments, the CRISPR systems provided herein can be used to induce death or dormancy of a cell (e.g., a microorganism such as an engineered microorganism). These methods can be used to induce dormancy or death of a multitude of cell types including prokaryotic and eukaryotic cells, including, but not limited to mammalian cells (e.g., cancer cells, or tissue culture cells), protozoans, fungal cells, cells infected with a virus, cells infected with an intracellular bacterium, cells infected with an intracellular protozoan, cells infected with a prion, bacteria (e.g., pathogenic and non-pathogenic bacteria), and unicellular and multicellular parasites. For instance, in the field of synthetic biology it is highly desirable to have mechanisms of controlling engineered microorganisms (e.g., bacteria) in order to prevent their propagation or dissemination. The systems described herein can be used as “kill-switches” to regulate and/or prevent the propagation or dissemination of an engineered microorganism. Further, there is a need in the art for alternatives to current antibiotic treatments. The systems described herein can also be used in applications where it is desirable to kill or control a specific microbial population (e.g., a bacterial population). For example, the systems described herein may include an RNA guide (e.g., a crRNA) that targets a nucleic acid (e.g., an RNA) that is genus-, species-, or strain-specific, and can be delivered to the cell. Upon complexing and binding to the target nucleic acid, the collateral RNase activity of the Type VI-E and/or VI-F CRISPR-Cas effector proteins is activated, leading to the cleavage of non-target RNA within the microorganisms, ultimately resulting in dormancy or death.
In some embodiments, the methods comprise contacting the cell with a system described herein including a Type VI-E and/or VI-F CRISPR-Cas effector protein or a nucleic acid encoding the effector protein, and an RNA guide (e.g., a crRNA) or a nucleic acid encoding the RNA guide, wherein the spacer sequence is complementary to at least 15 nucleotides (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50 or more nucleotides) of a target nucleic acid (e.g., a genus-, strain-, or species-specific RNA guide). Without wishing to be bound by any particular theory, the cleavage of non-target RNA by the Type VI-E and/or VI-F CRISPR-Cas effector proteins may induce programmed cell death, cell toxicity, apoptosis, necrosis, necroptosis, cell death, cell cycle arrest, cell anergy, a reduction of cell growth, or a reduction in cell proliferation. For example, in bacteria, the cleavage of non-target RNA by the Type VI-E and/or VI-F CRISPR-Cas effector proteins may be bacteriostatic or bactericidal.
The CRISPR systems described herein have a wide variety of utility in plants. In some embodiments, the CRISPR systems can be used to engineer the transcriptome of plants (e.g., improving production, making products with desired post-translational modifications, or introducing genes for producing industrial products). In some embodiments, the CRISPR systems can be used to introduce a desired trait to a plant (e.g., without heritable modifications to the genome), or regulate expression of endogenous genes in plant cells or whole plants.
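The requirement that a spacer be complementary to at least 15 nucleotides of a target can be checked computationally. The following Python sketch is one hypothetical way to do so; it assumes that the complementary nucleotides are contiguous and that only perfect Watson-Crick pairs count (G-U wobble pairs are ignored), neither of which is specified above. All function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA (or DNA) sequence as RNA."""
    seq = seq.upper().replace("T", "U")
    if set(seq) - set("ACGU"):
        raise ValueError("sequence contains characters other than A, C, G, U/T")
    return seq.translate(_COMPLEMENT)[::-1]


def longest_complementary_run(spacer: str, target: str) -> int:
    """Length of the longest contiguous spacer stretch that is perfectly
    complementary to some stretch of the target.

    The spacer pairs antiparallel with the target, so the search is a
    longest-common-substring search between the spacer's reverse
    complement and the target.
    """
    probe = reverse_complement_rna(spacer)
    target = target.upper().replace("T", "U")
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if probe[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def meets_min_complementarity(spacer: str, target: str, min_nt: int = 15) -> bool:
    """True if the spacer has at least `min_nt` contiguous complementary bases."""
    return longest_complementary_run(spacer, target) >= min_nt
```

The default threshold of 15 mirrors the minimum recited above; a stricter design could pass `min_nt=30` to match the spacer length discussed for collateral activity.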
In some embodiments, the CRISPR systems can be used to identify, edit, and/or silence genes encoding specific proteins, e.g., allergenic proteins (e.g., allergenic proteins in peanuts, soybeans, lentils, peas, green beans, and mung beans). A detailed description regarding how to identify, edit, and/or silence genes encoding proteins is described, e.g., in Nicolaou et al., “Molecular diagnosis of peanut and legume allergy,” Curr. Opin. Allergy Clin. Immunol. 11(3):222-8, 2011, and WO 2016205764 A1; both of which are incorporated herein by reference in the entirety.
As described herein, pooled CRISPR screening is a powerful tool for identifying genes involved in biological mechanisms such as cell proliferation, drug resistance, and viral infection. Cells are transduced in bulk with a library of guide RNA (gRNA)-encoding vectors described herein, and the distribution of gRNAs is measured before and after applying a selective challenge. Pooled CRISPR screens work well for mechanisms that affect cell survival and proliferation, and they can be extended to measure the activity of individual genes (e.g., by using engineered reporter cell lines). Arrayed CRISPR screens, in which only one gene is targeted at a time, make it possible to use RNA-seq as the readout. In some embodiments, the CRISPR systems as described herein can be used in single-cell CRISPR screens. A detailed description regarding pooled CRISPR screenings can be found, e.g., in Datlinger et al., “Pooled CRISPR screening with single-cell transcriptome read-out,” Nat. Methods. 14(3):297-301, 2017, which is incorporated herein by reference in its entirety.
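The measurement of gRNA distribution before and after a selective challenge is commonly summarized as a per-guide log2 fold change of normalized read counts. The Python sketch below illustrates that calculation; the pseudocount, the normalization to library fractions, and the guide names in the usage note are assumptions made for illustration, not details from this disclosure.

```python
import math
from typing import Dict, Mapping


def guide_log2_fold_changes(before: Mapping[str, int],
                            after: Mapping[str, int],
                            pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-guide log2 fold change of library fraction (after vs. before).

    A positive pseudocount is added to every raw count so that guides
    absent from one sample do not cause division by zero or log(0).
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    guides = sorted(set(before) | set(after))
    if not guides:
        return {}
    b = {g: before.get(g, 0) + pseudocount for g in guides}
    a = {g: after.get(g, 0) + pseudocount for g in guides}
    b_total = sum(b.values())
    a_total = sum(a.values())
    return {
        g: math.log2((a[g] / a_total) / (b[g] / b_total))
        for g in guides
    }
```

For example, with hypothetical counts `{"g1": 99, "g2": 99}` before and `{"g1": 49, "g2": 149}` after selection, guide `g1` is depleted (negative value) and guide `g2` is enriched (positive value).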
The CRISPR systems described herein can be used for in situ saturating mutagenesis. In some embodiments, a pooled guide RNA library can be used to perform in situ saturating mutagenesis for particular genes or regulatory elements. Such methods can reveal critical minimal features and discrete vulnerabilities of these genes or regulatory elements (e.g., enhancers). These methods are described, e.g., in Canver et al., “BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis,” Nature 527(7577):192-7, 2015, which is incorporated herein by reference in its entirety.
The CRISPR systems described herein can have various RNA-related applications, e.g., modulating gene expression, degrading an RNA molecule, inhibiting RNA expression, screening RNA or RNA products, determining functions of lincRNA or non-coding RNA, inducing cell dormancy, inducing cell cycle arrest, reducing cell growth and/or cell proliferation, inducing cell anergy, inducing cell apoptosis, inducing cell necrosis, inducing cell death, and/or inducing programmed cell death. A detailed description of these applications can be found, e.g., in WO 2016/205764 A1, which is incorporated herein by reference in its entirety. In different embodiments, the methods described herein can be performed in vitro, in vivo, or ex vivo.
For example, the CRISPR systems described herein can be administered to a subject having a disease or disorder to target and induce cell death in a cell in a diseased state (e.g., cancer cells or cells infected with an infectious agent). For instance, in some embodiments, the CRISPR systems described herein can be used to target and induce cell death in a cancer cell, wherein the cancer cell is from a subject having a Wilms' tumor, Ewing sarcoma, a neuroendocrine tumor, a glioblastoma, a neuroblastoma, a melanoma, skin cancer, breast cancer, colon cancer, rectal cancer, prostate cancer, liver cancer, renal cancer, pancreatic cancer, lung cancer, biliary cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, medullary thyroid carcinoma, ovarian cancer, glioma, lymphoma, leukemia, myeloma, acute lymphoblastic leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, Hodgkin's lymphoma, non-Hodgkin's lymphoma, or urinary bladder cancer.
The CRISPR systems described herein can be used to modulate gene expression. The CRISPR systems can be used, together with suitable guide RNAs, to target gene expression, via control of RNA processing. The control of RNA processing can include, e.g., RNA processing reactions such as RNA splicing (e.g., alternative splicing), viral replication, and tRNA biosynthesis. The RNA targeting proteins in combination with suitable guide RNAs can also be used to control RNA activation (RNAa). RNA activation is a small RNA-guided and Argonaute (Ago)-dependent gene regulation phenomenon in which promoter-targeted short double-stranded RNAs (dsRNAs) induce target gene expression at the transcriptional/epigenetic level. RNAa leads to the promotion of gene expression, so control of gene expression may be achieved that way through disruption or reduction of RNAa. In some embodiments, the methods include the use of the RNA targeting CRISPR as substitutes for e.g., interfering ribonucleic acids (such as siRNAs, shRNAs, or dsRNAs). The methods of modulating gene expression are described, e.g., in WO 2016205764, which is incorporated herein by reference in its entirety.
Control over interfering RNAs or microRNAs (miRNA) can help reduce off-target effects by reducing the longevity of the interfering RNAs or miRNAs in vivo or in vitro. In some embodiments, the target RNAs can include interfering RNAs, i.e., RNAs involved in the RNA interference pathway, such as small hairpin RNAs (shRNAs), small interfering RNAs (siRNAs), etc. In some embodiments, the target RNAs include, e.g., miRNAs or double stranded RNAs (dsRNA).
In some embodiments, if the RNA targeting protein and suitable guide RNAs are selectively expressed (for example spatially or temporally under the control of a regulated promoter, for example a tissue- or cell cycle-specific promoter and/or enhancer), this can be used to protect the cells or systems (in vivo or in vitro) from RNA interference (RNAi) in those cells. This may be useful in neighboring tissues or cells where RNAi is not required or for the purposes of comparison of the cells or tissues where the CRISPR-associated proteins and suitable crRNAs are and are not expressed (i.e., where the RNAi is not controlled and where it is, respectively). The RNA targeting proteins can be used to control or bind to molecules comprising or consisting of RNAs, such as ribozymes, ribosomes, or riboswitches. In some embodiments, the guide RNAs can recruit the RNA targeting proteins to these molecules so that the RNA targeting proteins are able to bind to them. These methods are described, e.g., in WO 2016205764 and WO 2017070605, both of which are incorporated herein by reference in the entirety.
Riboswitches are regulatory segments of messenger RNAs that bind small molecules and in turn regulate gene expression. This mechanism allows the cell to sense the intracellular concentration of these small molecules. A specific riboswitch typically regulates its adjacent gene by altering the transcription, the translation or the splicing of this gene. Thus, in some embodiments, the riboswitch activity can be controlled by the use of the RNA targeting proteins in combination with suitable guide RNAs to target the riboswitches. This may be achieved through cleavage of, or binding to, the riboswitch. Methods of using CRISPR systems to control riboswitches are described, e.g., in WO 2016205764 and WO 2017070605, both of which are incorporated herein by reference in their entireties.
In some embodiments, the CRISPR-associated proteins described herein can be fused to a base-editing domain, such as ADAR1, ADAR2, APOBEC, or activation-induced cytidine deaminase (AID), and can be used to modify an RNA sequence (e.g., an mRNA). In some embodiments, the CRISPR-associated protein includes one or more mutations (e.g., in a catalytic domain), which renders the subject CRISPR-associated protein incapable of cleaving RNA (e.g., the dCas13 version of the engineered Class 2 type VI Cas13 protein described herein).
In some embodiments, such CRISPR-associated proteins can be used with an RNA-binding fusion polypeptide comprising a base-editing domain (e.g., ADAR1, ADAR2, APOBEC, or AID) fused to an RNA-binding domain, such as MS2 (also known as MS2 coat protein), Qbeta (also known as Qbeta coat protein), or PP7 (also known as PP7 coat protein). The amino acid sequences of the RNA-binding domains MS2, Qbeta, and PP7 are provided below:
MS2 (MS2 coat protein) (SEQ ID NO: 98)
Qbeta (Qbeta coat protein) (SEQ ID NO: 99)
PP7 (PP7 coat protein) (SEQ ID NO: 100)
In some embodiments, the RNA binding domain can bind to a specific sequence (e.g., an aptamer sequence) or secondary structure motifs on a crRNA of the system described herein (e.g., when the crRNA is in an effector-crRNA complex), thereby recruiting the RNA binding fusion polypeptide (which has a base-editing domain) to the effector complex. For example, in some embodiments, the CRISPR system includes a CRISPR-associated protein, a crRNA having an aptamer sequence (e.g., an MS2 binding loop, a Qbeta binding loop, or a PP7 binding loop), and an RNA-binding fusion polypeptide having a base-editing domain fused to an RNA-binding domain that specifically binds to the aptamer sequence. In this system, the CRISPR-associated protein forms a complex with the crRNA having the aptamer sequence. Further, the RNA-binding fusion polypeptide binds to the crRNA (via the aptamer sequence), thereby forming a tripartite complex that can modify a target RNA.
Methods of using CRISPR systems for base editing are described, e.g., in International Publication No. WO 2017/219027, which is incorporated herein by reference in its entirety, and in particular with respect to its discussion of RNA modification.
In some embodiments, an inactivated or dCas13 version of the engineered Class 2 type VI Cas13 protein substantially lacking collateral activity described herein (e.g., an engineered CRISPR-associated protein having one or more further mutations in a catalytic domain) can be used to target and bind to specific splicing sites on RNA transcripts. Binding of the inactivated CRISPR-associated protein to the RNA may sterically inhibit interaction of the spliceosome with the transcript, enabling alteration in the frequency of generation of specific transcript isoforms. Such a method can be used to treat disease through exon skipping such that an exon having a mutation may be skipped in a mature protein. Methods of using CRISPR systems to alter splicing are described, e.g., in International Publication No. WO 2017/219027, which is incorporated herein by reference in its entirety, and in particular with respect to its discussion of RNA splicing.
The CRISPR systems described herein can have various therapeutic applications. Such applications may be based on one or more of the abilities below, both in vitro and in vivo, of the subject engineered Cas13, e.g., engineered CRISPR/Cas13e or Cas13f systems: induce cellular senescence, induce cell cycle arrest, inhibit cell growth and/or proliferation, induce apoptosis, induce necrosis, etc.
In some embodiments, the new engineered CRISPR systems can be used to treat various diseases and disorders, e.g., genetic disorders (e.g., monogenetic diseases), diseases that can be treated by nuclease activity (e.g., Pcsk9 targeting, Duchenne Muscular Dystrophy (DMD), BCL11a targeting), and various cancers, etc.
In some embodiments, the CRISPR systems described herein can be used to edit a target nucleic acid to modify the target nucleic acid (e.g., by inserting, deleting, or mutating one or more nucleic acid residues).
In one aspect, the CRISPR systems described herein can be used for treating a disease caused by overexpression of RNAs, toxic RNAs, and/or mutated RNAs (e.g., splicing defects or truncations). For example, expression of toxic RNAs may be associated with the formation of nuclear inclusions and late-onset degenerative changes in brain, heart, or skeletal muscle. In some embodiments, the disorder is myotonic dystrophy. In myotonic dystrophy, the main pathogenic effect of the toxic RNAs is to sequester binding proteins and compromise the regulation of alternative splicing (see, e.g., Osborne et al., “RNA-dominant diseases,” Hum. Mol. Genet., 2009 Apr. 15; 18(8):1471-81). Myotonic dystrophy (dystrophia myotonica (DM)) is of particular interest to geneticists because it produces an extremely wide range of clinical features. The classical form of DM, which is now called DM type 1 (DM1), is caused by an expansion of CTG repeats in the 3′-untranslated region (UTR) of DMPK, a gene encoding a cytosolic protein kinase. The CRISPR systems as described herein can target overexpressed RNA or toxic RNA, e.g., the DMPK gene or any of the mis-regulated alternative splicing in DM1 skeletal muscle, heart, or brain.
The CRISPR systems described herein can also target trans-acting mutations affecting RNA-dependent functions that cause various diseases such as, e.g., Prader Willi syndrome, Spinal muscular atrophy (SMA), and Dyskeratosis congenita. A list of diseases that can be treated using the CRISPR systems described herein is summarized in Cooper et al., “RNA and disease,” Cell, 136.4 (2009): 777-793, and WO 2016/205764 A1, both of which are incorporated herein by reference in the entirety. Those of skill in this field will understand how to use the new CRISPR systems to treat these diseases.
The CRISPR systems described herein can also be used in the treatment of various tauopathies, including, e.g., primary and secondary tauopathies, such as primary age-related tauopathy (PART)/Neurofibrillary tangle (NFT)-predominant senile dementia (with NFTs similar to those seen in Alzheimer Disease (AD), but without plaques), dementia pugilistica (chronic traumatic encephalopathy), and progressive supranuclear palsy. A useful list of tauopathies and methods of treating these diseases are described, e.g., in WO 2016205764, which is incorporated herein by reference in its entirety.
The CRISPR systems described herein can also be used to target mutations disrupting the cis-acting splicing codes that can cause splicing defects and diseases. These diseases include, e.g., motor neuron degenerative disease that results from deletion of the SMN1 gene (e.g., spinal muscular atrophy), Duchenne Muscular Dystrophy (DMD), frontotemporal dementia and parkinsonism linked to chromosome 17 (FTDP-17), and cystic fibrosis.
The CRISPR systems described herein can further be used for antiviral activity, in particular against RNA viruses. The CRISPR-associated proteins can target the viral RNAs using suitable guide RNAs selected to target viral RNA sequences.
The CRISPR systems described herein can also be used to treat a cancer in a subject (e.g., a human subject). For example, the CRISPR-associated proteins described herein can be programmed with crRNA targeting an RNA molecule that is aberrant (e.g., comprises a point mutation or is alternatively spliced) and found in cancer cells to induce cell death in the cancer cells (e.g., via apoptosis).
The CRISPR systems described herein can also be used to treat an autoimmune disease or disorder in a subject (e.g., a human subject). For example, the CRISPR-associated proteins described herein can be programmed with crRNA targeting an RNA molecule that is aberrant (e.g., comprises a point mutation or is alternatively spliced) and found in cells responsible for causing the autoimmune disease or disorder.
Further, the CRISPR systems described herein can also be used to treat an infectious disease in a subject. For example, the CRISPR-associated proteins described herein can be programmed with crRNA targeting an RNA molecule expressed by an infectious agent (e.g., a bacterium, a virus, a parasite or a protozoan) in order to target and induce cell death in the infectious agent cell. The CRISPR systems may also be used to treat diseases where an intracellular infectious agent infects the cells of a host subject. By programming the CRISPR-associated protein to target an RNA molecule encoded by an infectious agent gene, cells infected with the infectious agent can be targeted and cell death induced.
Furthermore, in vitro RNA sensing assays can be used to detect specific RNA substrates. The CRISPR-associated proteins can be used for RNA-based sensing in living cells. Examples of applications are diagnostics by sensing of, for example, disease-specific RNAs.
A detailed description of therapeutic applications of the CRISPR systems described herein can be found, e.g., in U.S. Pat. No. 8,795,965, EP3009511, WO 2016205764, and WO 2017070605; each of which is incorporated herein by reference in its entirety.
In certain embodiments, the methods of the invention can be used to introduce the CRISPR systems described herein into a cell, and cause the cell and/or its progeny to alter the production of one or more cellular products, such as antibody, starch, ethanol, or any other desired products. Such cells and progenies thereof are within the scope of the invention.
In certain embodiments, the methods and/or the CRISPR systems described herein lead to modification of the translation and/or transcription of one or more RNA products of the cells. For example, the modification may lead to increased transcription/translation/expression of the RNA product. In other embodiments, the modification may lead to decreased transcription/translation/expression of the RNA product.
In certain embodiments, the cell is a prokaryotic cell.
In certain embodiments, the cell is a eukaryotic cell, such as a mammalian cell, including a human cell (a primary human cell or an established human cell line). In certain embodiments, the cell is a non-human mammalian cell, such as a cell from a non-human primate (e.g., monkey), a cow/bull/cattle, sheep, goat, pig, horse, dog, cat, rodent (such as rabbit, mouse, rat, hamster, etc.). In certain embodiments, the cell is from fish (such as salmon), bird (such as poultry bird, including chick, duck, goose), reptile, shellfish (e.g., oyster, clam, lobster, shrimp), insect, worm, yeast, etc. In certain embodiments, the cell is from a plant, such as monocot or dicot. In certain embodiments, the plant is a food crop such as barley, cassava, cotton, groundnuts or peanuts, maize, millet, oil palm fruit, potatoes, pulses, rapeseed or canola, rice, rye, sorghum, soybeans, sugar cane, sugar beets, sunflower, and wheat. In certain embodiments, the plant is a cereal (barley, maize, millet, rice, rye, sorghum, and wheat). In certain embodiments, the plant is a tuber (cassava and potatoes). In certain embodiments, the plant is a sugar crop (sugar beets and sugar cane). In certain embodiments, the plant is an oil-bearing crop (soybeans, groundnuts or peanuts, rapeseed or canola, sunflower, and oil palm fruit). In certain embodiments, the plant is a fiber crop (cotton). In certain embodiments, the plant is a tree (such as a peach or a nectarine tree, an apple or pear tree, a nut tree such as almond or walnut or pistachio tree, or a citrus tree, e.g., orange, grapefruit or lemon tree), a grass, a vegetable, a fruit, or an algae. In certain embodiments, the plant is a nightshade plant; a plant of the genus Brassica; a plant of the genus Lactuca; a plant of the genus Spinacia; a plant of the genus Capsicum; cotton, tobacco, asparagus, carrot, cabbage, broccoli, cauliflower, tomato, eggplant, pepper, lettuce, spinach, strawberry, blueberry, raspberry, blackberry, grape, coffee, cocoa, etc.
A related aspect provides cells or progenies thereof modified by the methods of the invention using the CRISPR systems described herein.
In certain embodiments, the cell is modified in vitro, in vivo, or ex vivo.
In certain embodiments, the cell is a stem cell.
Through this disclosure and the knowledge in the art, the CRISPR systems described herein comprising an engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity (such as Cas13e or Cas13f), or any of the components thereof described herein (Cas13 proteins, derivatives, functional fragments or the various fusions or adducts thereof, and guide RNA/crRNA), nucleic acid molecules thereof, and/or nucleic acid molecules encoding or providing components thereof, can be delivered by various delivery systems such as vectors, e.g., plasmids and viral delivery vectors, using any suitable means in the art. Such methods include (and are not limited to) electroporation, lipofection, microinjection, transfection, sonication, gene gun, etc.
In certain embodiments, the CRISPR-associated proteins and/or any of the RNAs (e.g., guide RNAs or crRNAs) and/or accessory proteins can be delivered using suitable vectors, e.g., plasmids or viral vectors, such as adeno-associated viruses (AAV), lentiviruses, adenoviruses, retroviral vectors, and other viral vectors, or combinations thereof. The proteins and one or more crRNAs can be packaged into one or more vectors, e.g., plasmids or viral vectors. For bacterial applications, the nucleic acids encoding any of the components of the CRISPR systems described herein can be delivered to the bacteria using a phage. Exemplary phages include, but are not limited to, T4 phage, Mu, λ phage, T5 phage, T7 phage, T3 phage, Φ29, M13, MS2, Qβ, and ΦX174.
In some embodiments, the vectors, e.g., plasmids or viral vectors, are delivered to the tissue of interest by, e.g., intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration. Such delivery may be either via a single dose, or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
In certain embodiments, the delivery is via adenoviruses, which can be at a single dose containing at least 1×10⁵ particles (also referred to as particle units, pu) of adenoviruses. In some embodiments, the dose preferably is at least about 1×10⁶ particles, at least about 1×10⁷ particles, at least about 1×10⁸ particles, or at least about 1×10⁹ particles of the adenoviruses. The delivery methods and the doses are described, e.g., in WO 2016205764 A1 and U.S. Pat. No. 8,454,972 B2, both of which are incorporated herein by reference in the entirety.
In some embodiments, the delivery is via plasmids. The dosage can be a sufficient number of plasmids to elicit a response. In some cases, suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg. Plasmids will generally include (i) a promoter; (ii) a sequence encoding a nucleic acid-targeting CRISPR-associated proteins and/or an accessory protein, each operably linked to a promoter (e.g., the same promoter or a different promoter); (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii). The plasmids can also encode the RNA components of a CRISPR complex, but one or more of these may instead be encoded on different vectors. The frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or a person skilled in the art.
In another embodiment, the delivery is via liposomes or lipofection formulations and the like, and can be prepared by methods known to those skilled in the art. Such methods are described, for example, in WO 2016205764 and U.S. Pat. Nos. 5,593,972; 5,589,466; and 5,580,859; each of which is incorporated herein by reference in its entirety.
In some embodiments, the delivery is via nanoparticles or exosomes. For example, exosomes have been shown to be particularly useful in delivering RNA.
A further means of introducing one or more components of the new CRISPR systems to the cell is by using cell penetrating peptides (CPP). In some embodiments, a cell penetrating peptide is linked to the CRISPR-associated proteins. In some embodiments, the CRISPR-associated proteins and/or guide RNAs are coupled to one or more CPPs to effectively transport them inside cells (e.g., plant protoplasts). In some embodiments, the CRISPR-associated proteins and/or guide RNA(s) are encoded by one or more circular or non-circular DNA molecules that are coupled to one or more CPPs for cell delivery.
CPPs are short peptides of fewer than 35 amino acids derived either from proteins or from chimeric sequences capable of transporting biomolecules across cell membrane in a receptor independent manner. CPPs can be cationic peptides, peptides having hydrophobic sequences, amphipathic peptides, peptides having proline-rich and anti-microbial sequences, and chimeric or bipartite peptides. Examples of CPPs include, e.g., Tat (which is a nuclear transcriptional activator protein required for viral replication by HIV type 1), penetratin, Kaposi fibroblast growth factor (FGF) signal peptide sequence, integrin β3 signal peptide sequence, polyarginine peptide Args sequence, Guanine rich-molecular transporters, and sweet arrow peptide. CPPs and methods of using them are described, e.g., in Hällbrink et al., “Prediction of cell-penetrating peptides,” Methods Mol. Biol., 2015; 1324:39-58; Ramakrishna et al., “Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA,” Genome Res., 2014 June; 24(6):1020-7; and WO 2016205764 A1; each of which is incorporated herein by reference in its entirety.
Various delivery methods for the CRISPR systems described herein are also described, e.g., in U.S. Pat. No. 8,795,965, EP3009511, WO 2016205764, and WO 2017070605; each of which is incorporated herein by reference in its entirety.
Another aspect of the invention provides a kit, comprising any two or more components of the subject CRISPR/Cas system described herein comprising an engineered Class 2 type VI Cas13 protein, such as those either substantially lacking or having enhanced collateral activity, such as the Cas13e and Cas13f proteins, derivatives, functional fragments or the various fusions or adducts thereof, guide RNA/crRNA, complexes thereof, vectors encompassing the same, or host encompassing the same.
In certain embodiments, the kit further comprises an instruction to use the components encompassed therein, and/or instructions for combining with additional components that may be available elsewhere.
In certain embodiments, the kit further comprises one or more nucleotides, such as nucleotide(s) corresponding to those useful to insert the guide RNA coding sequence into a vector and operably linking the coding sequence to one or more control elements of the vector.
In certain embodiments, the kit further comprises one or more buffers that may be used to dissolve any of the components, and/or to provide suitable reaction conditions for one or more of the components. Such buffers may include one or more of PBS, HEPES, Tris, MOPS, Na2CO3, NaHCO3, NaB, or combinations thereof. In certain embodiments, the reaction condition includes a proper pH, such as a basic pH. In certain embodiments, the pH is between 7-10.
In certain embodiments, any one or more of the kit components may be stored in a suitable container.
This example demonstrates that collateral effect or non-sequence-specific endonuclease activity of the Cas13 enzymes (e.g., Cas13e) can be largely reduced by introducing mutations that reduce the affinity between Cas13e and potential RNA targets (sequence specific or non-sequence specific targets), thus disproportionally reducing collateral non-sequence-specific endonuclease activity, while substantially maintaining sequence-specific endonuclease activity against the target RNA, partly due to the binding between the guide sequence and the target RNA. See
Using the I-TASSER website (zhanglab.ccmb.med.umich.edu/I-TASSER), the 3D structure of Cas13e was predicted. Further, using the NCBI web tool (ncbi.nlm.nih.gov/Structure/icn3d/full.html), or PyMOL, the predicted structure was visualized. Based on the relevant sequence information, sequences that are spatially close to the two HEPN RXXXXH sequences were analyzed in Cas13e. See
Based on this theory, sequences that are spatially close to the two HEPN domains in Cas13e, e.g., residues 2-187 and 634-755 that are around the two HEPN domains, respectively, as well as the spatially close region between residues 227-242, were systematically mutated (see
In order to facilitate further screening and selection, to the ends of each selected mutagenesis region (see
To facilitate further characterization, an EGFP-mCherry double fluorescent reporting system was constructed (see
The sequences of the EGFP and mCherry reporters are in SEQ ID NOs: 1 and 2. The gRNA is SEQ ID NO: 3. Wild type Cas13e protein is SEQ ID NO: 4, and its codon-optimized polynucleotide coding sequence is SEQ ID NO: 5.
Human HEK293T cells were cultured in 24-well tissue culture plates according to standard methods, before the double-fluorescent reporting system plasmid was transfected into the cells using standard polyethylenimine (PEI) transfection. Transfected cells were then cultured at 37° C. under CO2 for 48 hrs. EGFP and mCherry signals were detected using FACS.
The standard for selecting engineered Cas13e with reduced collateral effect, using the double-fluorescent reporting system, was as follows:
1) mutant/engineered Cas13e has similar/equivalent EGFP signal compared to the wild-type Cas13e, indicating that the guide-sequence-specific cleavage of the target RNA (EGFP) was not/little affected by the mutations in the engineered Cas13e;
2) mutant/engineered Cas13e has similar/equivalent mCherry signal compared to the nuclease dead dCas13e, indicating that the non-sequence-specific cleavage of the non-target RNA (mCherry) was non-existing in the engineered Cas13e, just like dCas13e that is unable to cleave mCherry mRNA.
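The two selection criteria above can be sketched as a simple filter. This is only an illustrative sketch: the text does not quantify "similar/equivalent", so the tolerance `tol`, the `ReporterReading` type, and the example readings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReporterReading:
    """Fluorescence readout for one construct (arbitrary units)."""
    egfp: float     # targeted reporter; low signal = guide-specific cleavage
    mcherry: float  # non-targeted reporter; low signal = collateral cleavage


def passes_selection(mutant, wild_type, dead, tol=0.2):
    """Apply criteria 1) and 2) with a hypothetical tolerance.

    tol is the allowed difference, expressed as a fraction of the dCas13e
    signal in the same channel (an assumption, not a value from the text).
    """
    # Criterion 1: EGFP signal comparable to wild-type Cas13e.
    egfp_like_wild_type = abs(mutant.egfp - wild_type.egfp) <= tol * dead.egfp
    # Criterion 2: mCherry signal comparable to nuclease-dead dCas13e.
    mcherry_like_dead = abs(mutant.mcherry - dead.mcherry) <= tol * dead.mcherry
    return egfp_like_wild_type and mcherry_like_dead


# Made-up readings, for illustration only.
wild_type = ReporterReading(egfp=10.0, mcherry=20.0)
dead = ReporterReading(egfp=100.0, mcherry=100.0)
candidate = ReporterReading(egfp=15.0, mcherry=95.0)
print(passes_selection(candidate, wild_type, dead))
```

Expressing the tolerance relative to the dCas13e signal keeps the comparison meaningful even when the wild-type EGFP signal is close to zero.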
Based on the above standard and further characterization, 5 distinct engineered Cas13e were identified, each with much reduced collateral effect compared to wild-type Cas13e (see
For comparison, in the Mut-6 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 16 and 17, with changed sequences double underlined. The corresponding nucleotide sequences are in SEQ ID NOs: 18 and 19.
VFAAAAAAGLFVASLED
gtcttcgcCgccGcCgccgccGcC
In the Mut-7 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 20 and 21, with changed sequences double underlined. The corresponding nucleotide sequences are in SEQ ID NOs: 22 and 23.
VFAYSAAAWYAAATED
gtcttcgccTACAGCgccgccgccTGGT
In the Mut-12 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 24 and 25, with changed sequences double underlined. The corresponding nucleotide sequences are in SEQ ID NOs: 26 and 27.
VFLAALAGAVAGLAED
gtcttcCTGGccgccCTGgccG
In the Mut-17 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 29, with changed sequences double underlined. The corresponding nucleotide sequences are in SEQ ID NOs: 30 and 31.
VFGAIAAATVYAAGED
gtcttcGGCgccATCgccgccgccA
In the Mut-19 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 32 and 33, with changed sequences double underlined. The corresponding nucleotide sequences are in SEQ ID NOs: 34 and 35.
VFAAIAFAAILAQAED
Based on further characterization, Mut-17 and Mut-19 essentially eliminated the collateral effect of wild-type Cas13e, while maintaining relatively high guide-sequence-specific endonuclease activity.
Further, the method described herein has been shown to be able to identify residues for engineering even though these residues are far away from the HEPN domains in primary sequence, but can be shown to be spatially close to the HEPN domains based on predicted 3D structure (using commonly available tools such as PyMOL or I-TASSER). See
In order to narrow down the key amino acids in the Mut-17 region that affect the bystander effect, a series of 8 mutations in the Mut-17 region were constructed and tested, including M17.5, M17.6, M17.8, M17.9, M17.10, M17.11, M17.12, and M17.13 (see
For comparison, in the M17.5 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 36, with changed sequences double underlined.
In the M17.6 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 37, with changed sequences double underlined.
In the M17.8 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 38, with changed sequences double underlined.
In the M17.9 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 39, with changed sequences double underlined.
In the M17.10 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 40, with changed sequences double underlined.
In the M17.11 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 41, with changed sequences double underlined.
In the M17.12 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 42, with changed sequences double underlined.
In the M17.13 mutation region, the corresponding wild-type sequence and mutant sequence are listed below in SEQ ID NOs: 28 and 43, with changed sequences double underlined.
Based on this further characterization, and consistent with previous results, most tested point mutations within the Mut-17 region do not have significant effect on the guide sequence-dependent RNase activity (see
In contrast, point mutations M17.6, M17.8 and M17.9 (SEQ ID NOs: 37-39) reduced the collateral effect of wild-type Cas13e essentially to the dCas13e.1 level, while the other point mutations retained different degrees of collateral effect compared to wild-type Cas13e.1, including in some cases enhanced collateral effect (see
Similarly, in order to narrow down the key amino acid residues in the Mut-19 region that affect the collateral activity, a series of 6 mutants in the Mut-19 region were constructed and tested (see
For comparison, in the M19.1 mutation region, the corresponding wild-type Cas13e.1 sequence and mutant sequences are listed below in SEQ ID NOs: 32 and 44, with changed sequences double underlined.
In the M19.2 mutation region, the corresponding wild-type Cas13e.1 sequence and mutant sequences are listed below in SEQ ID NOs: 32 and 45, with changed sequences double underlined.
In the M19.3 mutation region, the corresponding wild-type Cas13e.1 sequence and mutant sequences are listed below in SEQ ID NOs: 32 and 46, with changed sequences double underlined.
In the M19.4 mutation region, the corresponding wild-type Cas13e.1 sequence and mutant sequences are listed below in SEQ ID NOs: 32 and 47, with changed sequences double underlined.
In the M19.5 mutation region, the corresponding wild-type Cas13e.1 sequence and mutant sequences are listed below in SEQ ID NOs: 32 and 48, with changed sequences double underlined.
AAHYADFREALAQAMA
In the M19.6 mutation region, the corresponding wild-type Cas13e.1 sequence and mutant sequences are listed below in SEQ ID NOs: 32 and 49, with changed sequences double underlined.
Based on this further characterization, and consistent with previous results, most tested point mutations within the Mut-19 region do not have significant effect on the guide sequence-dependent RNase activity (see
In contrast, point mutations M19.2 and M19.5 (SEQ ID NOs: 45 and 48) reduced the collateral effect of wild-type Cas13e essentially to the dCas13e.1 level, while the other point mutations retained different degrees of collateral effect compared to wild-type Cas13e.1 (see
Collateral RNA degradation by the Cas13 family of effector enzymes has previously been found in glioma cells and flies, but its presence in mammalian cells has not been definitively demonstrated. Based on the fast and sensitive dual-fluorescence reporter system for detecting collateral effects as described herein, this example demonstrates that Cas13 could indeed induce substantial collateral effects in HEK293T cells when targeting either exogenous or endogenous genes. In particular, Cas13d was shown to mediate transcriptome-wide RNA off-target editing, causing cell growth arrest and reducing cell viability.
Specifically, to evaluate the collateral effects of Cas13 in mammalian cells, Cas13 (Cas13a or Cas13d) was co-transfected with EGFP and mCherry coding sequences, together with targeted (against mCherry) or non-targeted (NT, control) guide RNA (gRNA) into HEK293T cells. Expression levels of the targeted mCherry and the non-targeted EGFP were measured 48 hrs after transfection (
It was found that, with three different mCherry gRNAs, both Cas13a and Cas13d not only mediated expected decrease of mCherry fluorescence intensity, but also caused significant decrease of EGFP fluorescence intensity, as compared to NT gRNA (
Together, these findings showed that collateral effects of Cas13-mediated RNA reduction were detectable in the mammalian HEK293T cells when targeting transiently overexpressed exogenous genes.
However, the collateral effects are not limited to transiently overexpressed exogenous genes. The data presented herein also demonstrate that Cas13d could induce collateral effects when targeting endogenous genes in HEK293T cells.
Flow cytometry experiments showed that Cas13d-mediated knockdown induced a substantial collateral cleavage (as indicated by the reduction of EGFP and mCherry fluorescence) when targeting the endogenous RPL4 gene (
Furthermore, by determining the RNA-targeting efficiency on RPL4 with four different gRNAs (gRNA-1 to gRNA-4), consistently robust knockdown for RPL4 with each gRNA by Cas13 targeting was observed, along with notable knockdown of EGFP transcript with RPL4 gRNA-1, gRNA-3 and gRNA-4, but not gRNA-2 (
Regardless, these findings convincingly demonstrate that Cas13-mediated RNA knockdown results in substantial collateral effects in mammalian cells, when targeting either exogenous or endogenous genes.
Consistent with what has been shown in Examples 1 and 2 concerning Cas13e, this example demonstrates that the collateral effects of other Cas13 proteins (e.g., Cas13d or CasRx) can also be diminished (even if not completely eliminated) via mutagenesis, based on the hypothesis that changing the RNA-binding cleft proximal to the catalytic RXXXXH sites in the HEPN domains may selectively decrease promiscuous RNA binding and non-target cleavage while maintaining on-target RNA cleavage.
Specifically, as before, the publicly available online tool I-TASSER was used to predict the 3D structure of Cas13d, and the predicted structure was visualized with PyMOL in order to determine the position of the various structural domains in 3D (see
Then an unbiased screening system was designed based on the dual-fluorescence approach described above, in which coding sequences for EGFP, mCherry, EGFP-targeting gRNA, together with each Cas13 variant, were inserted into one plasmid for expression in 293T cells. In this system, expression of EGFP and expression of mCherry were driven by the same SV40 promoter, in order to ensure roughly equally stable expression of the reporter genes in the transfected host cell. The gRNA was chosen to be specific for EGFP mRNA. Each coding sequence for Cas13d and variants has an N-terminal and a C-terminal nuclear localization signal (NLS), and expression of Cas13d and variants/mutants was driven by the strong CAG promoter.
The EGFP and mCherry coding sequences are SEQ ID NOs: 1 and 2, respectively. The corresponding DNA sequence of the gRNA is SEQ ID NO: 3. The wild-type Cas13d protein sequence is SEQ ID NO: 101. The coding sequence for the wild-type Cas13d is SEQ ID NO: 102. The CAG promoter sequence is SEQ ID NO: 103. The SV40 promoter sequence is SEQ ID NO: 104.
The HEPN1-I, HEPN1-II, and HEPN2 domains of Cas13d, corresponding to residues 77-328 and 458-961, were chosen for generating a Cas13d mutagenesis library. First, these regions were divided into 21 small segments (N1-N21), each with about 36 residues. More specifically, these 21 mutated regions cover HEPN1-I (N1-N6), HEPN1-II (N8-N10), HEPN2 (N14-N21), Helical-1 (N7) and Helical-2 (N10-N14) domains (
To facilitate subsequent selection, a BpiI restriction enzyme recognition site (GTCTTC, corresponding to encoded residues VF; reverse complement GAAGAC, corresponding to encoded residues ED) was introduced at each end of the segments. When producing mutants, all non-Ala residues were substituted by Ala, and all Ala residues were substituted by Val (e.g., replacing all non-alanine to alanine, X>A, and alanine to valine, A>V). About 4-5 total mutations were introduced between the two BpiI sites flanking each segment. The various mutants so generated and their corresponding wild-type sequences (N1L1-N21L, N1R-N21R) are provided below.
AAYAVVAANPLYAAPVQ
AAMLGLAETLEARYFGEAA
AAAANICIQVIHNILAI
AAILAEYIAAAAYAVNNIA
ALDADIIGFGAFATVYT
AAEFAAPEHHRAAFNNNDA
ADEFKDPEHHRAAFNNNDK
AIAAIKAAYDEFDAFLD
AARLAYFGQAFFAAEGRNY
AANYGNACYDALVLLSG
AAHWVVHNNEEEARIARAW
AYALDKALDAEYISTLA
AAYDRITNELTNAFAANSA
ALYDRITNELTNSFSKNSA
VNVNYIVATLGINPVAF
VAQYFRFAIMAEQANLGFN
VGADVSVFSKLMYVLTM
AADAAEINDLLTTLINAFD
AIQSFLKAMPLIGAAAK
AAEEYAFFADAAAIADELR
AAKSFARAGEPAADARR
VAYIDAIRILAANLAYDEL
AALADTFALDENANALA
AAAHGMRNFIINNVIANAR
AHYLARYGDPAALAEAA
AAEAVVAFVLGRIADIQAA
AGAAGKAAIDRYYETCI
AADAGASVSEAVDALTKII
AGMNYAQFAKKRSVIEA
AARENAEREAFAAIISLYL
AVIYHILAAIVAIAARY
AAAFHCVERDAQLYAEAGY
AINLKKLEEKAFSAVAK
AAAGIDETAPDARADVEAE
AAERAKESIASAESAAP
AAYANYIAYSDEAKAEEFT
AQINAEKAKTALNVYLA
AAAWNVIIREDLLRIDNAA
ATLARNKAVHLAVVAYV
AVYIAAIAEVNAYFQLYHY
AMQRAAANERYEKSSGK
AAEYFAAVNDEAAYNDRLL
ALLCAPFAYCIPRFAAL
AAEALFDRNEAAAFDAEAK
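The library design described above (36-residue segments, BpiI sites encoding VF and ED, and the X>A / A>V substitution rule) can be illustrated with a short sketch. The function names are hypothetical, the codon table holds only the four codons needed for the BpiI check, and the even 36-residue tiling is an assumption based on the "about 36 residues" wording rather than the actual segment boundaries.

```python
# Only the four codons needed for the BpiI check; not a full genetic code table.
CODONS = {"GTC": "V", "TTC": "F", "GAA": "E", "GAC": "D"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate in-frame DNA using the minimal codon table above."""
    dna = dna.upper()
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def scan_substitute(residues: str, positions) -> str:
    """Apply the substitution rule at 0-based positions: Ala -> Val, other -> Ala."""
    out = list(residues)
    for i in positions:
        out[i] = "V" if out[i] == "A" else "A"
    return "".join(out)


def tile(first: int, last: int, size: int = 36):
    """Split an inclusive residue range into consecutive (start, end) segments."""
    return [(s, min(s + size - 1, last)) for s in range(first, last + 1, size)]


# BpiI site and its reverse complement encode the flanking dipeptides.
print(translate("GTCTTC"), translate(reverse_complement("GTCTTC")))
# Residues 77-328 and 458-961 tiled in 36-residue steps.
print(len(tile(77, 328)) + len(tile(458, 961)))
# Toy peptide (not from the sequence listing): K -> A and A -> V.
print(scan_substitute("KLAE", [0, 2]))
```

Under the even-tiling assumption, the two residue ranges contain 252 and 504 residues, which gives 7 and 14 segments, matching the 21 segments named N1-N21.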
Using the EGFP-mCherry dual-fluorescence reporter system of the invention, these Cas13d mutants were functionally screened to assess their collateral vs. gRNA-guided cleavage activities. Specifically, according to standard cell culture methods, human HEK293 cells were grown in 24-well tissue culture plates to a suitable density before the cells were transfected with PEI reagents and plasmids that express each mutant Cas13d and the reporter system fluorescent proteins. Transfected cells were cultured at 37° C. in an incubator under 5% CO2 for about 48 hours, before measuring EGFP and mCherry signals in the cells with FACS. Mutants leading to a low percentage of the gRNA-targeted EGFP signal (lower percentage of EGFP+ cells, as a readout for preserved gRNA-guided cleavage) and a high percentage of non-targeted mCherry signal (higher percentage of mCherry+ cells, as a readout for lacking collateral effect) were selected.
In this experiment, dCas13d with no gRNA-guided cleavage was used as a negative control, and the results (mean±s.e.m.) were normalized against that of dCas13d and listed below. Cas13d mutants located at the upper left area of
After normalization of EGFP and mCherry fluorescence intensity by inactive dead Cas13d (dCas13d with R295A, H300A, R849A, and H854A mutations in HEPN domains), it was found that variants with mutation sites in N1, N2, N3, or N15, especially N1V7, N2V7, N2V8, N3V7, and N15V4, exhibited relatively low EGFP fluorescence intensity but high mCherry fluorescence intensity, indicating that these variants retained a high on-target activity but greatly reduced collateral activity (
Overall, these mutants exhibited less than 27.5% collateral effect (e.g., ≥72.5% mCherry+ cells), and ≥75% gRNA-guided cleavage (≤25% EGFP+ cells). They include: N1V7, N2V7, N2V8, N3V7, and N15V4, etc. (see above table and
Further, some of the Cas13d mutants exhibited low collateral effect (e.g., ≤27.5% collateral effect, or ≥72.5% mCherry+ cells), and intermediate gRNA-guided cleavage (e.g., 25%≤EGFP+ cells≤75%), including: N2V4, N2V5, N4V3, N6V3, N10V6, N15V2, N20V6, and N20-Y910A, etc. (see above table and
In other words, the invention has provided mutants having substantially retained (e.g., retaining at least about 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) wild-type level gRNA-guided cleavage, while substantially reducing/eliminating (at least about 72.5%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%) Cas13d collateral effect.
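The percentage thresholds given above can be sketched as a small classifier. This is a minimal sketch, assuming the inputs are the dCas13d-normalized percentages of EGFP+ and mCherry+ cells; the function name and category labels are hypothetical, and the boundary value of 25% EGFP+ cells, which the text places in both groups, is assigned here to the high-cleavage group.

```python
def classify_variant(egfp_pos_pct: float, mcherry_pos_pct: float) -> str:
    """Classify a Cas13d variant from normalized reporter percentages.

    egfp_pos_pct: % EGFP+ cells (targeted reporter), normalized to dCas13d.
    mcherry_pos_pct: % mCherry+ cells (non-targeted reporter), normalized to dCas13d.
    """
    # >= 72.5% mCherry+ cells corresponds to <= 27.5% collateral effect.
    if mcherry_pos_pct < 72.5:
        return "collateral effect retained"
    # <= 25% EGFP+ cells corresponds to >= 75% gRNA-guided cleavage.
    if egfp_pos_pct <= 25.0:
        return "low collateral, high gRNA-guided cleavage"
    if egfp_pos_pct <= 75.0:
        return "low collateral, intermediate gRNA-guided cleavage"
    return "low collateral, weak gRNA-guided cleavage"


# Made-up values, for illustration only.
for egfp, mcherry in [(20.0, 80.0), (50.0, 90.0), (10.0, 30.0)]:
    print(egfp, mcherry, classify_variant(egfp, mcherry))
```

Checking the mCherry threshold first means a variant is never reported as having low collateral effect on the strength of its EGFP knockdown alone.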
Since N2V7 and N2V8 retained relatively high guide RNA-specific cleavage, with essentially eliminated Cas13d collateral effect, and the residues affected by these mutants are very close together, further mutagenesis study in the two regions of these mutants was conducted, by generating a number of additional mutants with single, double, triple, or quadruple combination mutations. The sequences of these mutants and the corresponding wild-type sequences (N2C) are listed below:
ALAEYITNAAYAVNNIS
ALAAYATNAAYAVNNIS
ALAAYITNAAYAVNNAS
ALAEYATNAAYAVNNAS
ALAAYITNAAYAVNNIS
ALAEYATNAAYAVNNIS
ALAEYITNAAYAVNNAS
ALAAYITNAAYVVNNIS
ALVAYITNAAYVVNNIS
Using the same assay above, and after normalizing the data with that of the dCas13d, mutants occupying the upper left corner of
Based on comprehensive analysis of all these mutants, N2V8 (carrying A134V, A140V, A141V, A143V) was believed to have superior characteristics, in that it retained relatively high guide RNA-specific cleavage, while essentially eliminating the Cas13d collateral effect. See data above and
Based on the structure of Cas13d and PyMOL visualization, it was identified that the mutation sites of various effective variants were mainly located in an α-helix proximal to the catalytic sites of the two HEPN domains (RXXXXH-1, RXXXXH-2) (
The identified desired Cas13d mutants with reduced/eliminated collateral effects seem to share the following characteristics:
1. mutations are mainly located within the HEPN1-1 domain (e.g., residues 90-292), Helical2 domain (e.g., residues 536-690), and the HEPN2 domain (e.g., residues 690-967 in Cas13d).
2. in Cas13d, mutations are located within 170 residues of the RXXXXH motif.
3. most mutations, in 3D structure, are in the vicinity of the catalytic activity site formed by the RXXXXH motifs of HEPN1 and HEPN2 domains.
4. for each mutated residue, substitutions by residues other than Ala (especially Val, Gly, and Ile), are similarly effective to reduce/eliminate collateral effect.
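The positional criterion in item 2 above can be sketched in Python as follows. This is a minimal illustration only: the toy sequence is hypothetical and is not a Cas13 sequence, the function names are illustrative, and distance is measured from the first residue of each RXXXXH motif, which is a simplifying assumption.

```python
import re

# Lookahead so that overlapping RXXXXH motifs would also be reported.
RXXXXH = re.compile(r"(?=R.{4}H)")


def motif_positions(protein):
    """Return 1-based start positions of every RXXXXH motif."""
    return [match.start() + 1 for match in RXXXXH.finditer(protein)]


def near_motif(residue, motif_starts, window):
    """True if the residue lies within `window` residues of any motif start."""
    return any(abs(residue - start) <= window for start in motif_starts)


# Hypothetical toy sequence with two motifs, for illustration only.
TOY = "MA" + "RNAKSH" + "G" * 10 + "RQTELH" + "KK"
starts = motif_positions(TOY)
print(starts)
print(near_motif(10, starts, window=7))
```

For Cas13d the window stated above would be 170 residues; the small window here only suits the toy sequence.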
Certain specific positions of the desired mutants in Cas13d are listed below:
AAILAEYIAAAAYAVNNIA
AAAFHCVERDAQLYAEAGY
Interestingly, the majority of variants exhibited either low dual cleavage activity (upper right in
To confirm the elimination of collateral effects by cfCas13d, EGFP was targeted with three other gRNAs, and substantial collateral effects were found to be induced by the wild-type Cas13d, but essentially no collateral effects were induced by cfCas13d (
Next, in vitro cleavage activities of purified Cas13d and cfCas13d proteins on targeted RNAs, in the presence of non-targeted single-strand RNA probes, were investigated. It was found that cfCas13d exhibited consistently efficient on-target activity with essentially no collateral cleavage, whereas wild-type Cas13d showed notable collateral activity (
On the other hand, the above screening also produced multiple mutants with significantly enhanced collateral effect, based on ≥87.5% collateral cleavage efficiency (e.g., ≤12.5% mCherry+ cells) and better gRNA-guided cleavage compared to wild-type (e.g., ≤4% EGFP+ cells). These mutants include: N2-Y142A, N4-Y193A, N12-Y604A, N21V7, etc. Among them, N2-Y142A is located in the Helical2 domain, extending towards the two HEPN domains in the 3D structure. Meanwhile, N4-Y193A and N21V7 are within the HEPN1 and HEPN2 domains, respectively, and are relatively far away from the catalytic active site. The residues involved in these mutants are listed below.
It should be understood that, although Ala was used in the mutagenesis studies herein, other substitutions at the same positions (especially those with small (alkyl) side chains such as Val or Ile, or Gly) also have effects similar to those of Ala substitution. These mutations are expressly contemplated and disclosed herein, and are within the scope of the invention.
This example provides additional Cas13e mutants with reduced/eliminated collateral effect, based on knowledge of Cas13d mutants screening and simulated structural analysis of Cas13e (see
Specifically, a mutagenesis library was developed for Cas13e, covering HEPN1 and HEPN2 domains (
ATIMEAAYEAAIFECAAR
VFEEKAAKVKKMSEKE
AMKKYAAEKEAKFPAK
AVSKQAAKKRELAIDE
AQGARKWCFTIAFNKA
AVNRDKNDGAFVESAAR
AEAYSAADWYDEDTAA
AAKCSTQVANAKAEAA
ARHSPGCLTFTAEDEL
ATAVIIEFPSLFEGAA
ATTVGVVFFASFFAER
AVLAALYGAVSGLAAN
AGQYALTAAALAMYCL
AKKRAAANEGANPKRH
AAIVFAVADYGALYVL
AAAEFAARIAEYFMPH
AKGKIAYHTVYAKGFA
VYNDLQKKCVAVVLVA
AARYIAFREIAAAAMC
AARYIDFREILAQTMC
AEAEAAAVNAVRRAFF
AALKFVIDEFGLFSD
Using the EGFP-mCherry dual-fluorescence reporter system of the invention, these Cas13e mutants were functionally screened to assess their collateral vs. gRNA-guided cleavage activities. Specifically, according to standard cell culture methods, human HEK293 cells were grown in 24-well tissue culture plates to a suitable density before the cells were transfected with PEI reagents and plasmids that express each mutant Cas13e and the reporter system fluorescent proteins. Transfected cells were cultured at 37° C. in an incubator under 5% CO2 for about 48 hours, before measuring EGFP and mCherry signals in the cells with FACS. Mutants leading to a low percentage of the gRNA-targeted EGFP signal (lower percentage of EGFP+ cells, as a readout for preserved gRNA-guided cleavage) and a high percentage of non-targeted mCherry signal (higher percentage of mCherry+ cells, as a readout for lacking collateral effect) were selected.
In this experiment, dCas13e with no gRNA-guided cleavage was used as a negative control, and the results (mean±s.e.m.) were normalized against that of dCas13e and listed below. Cas13e mutants located at the upper left area of
After screening from the mutagenesis library and further different combinations with single, double, triple or quadruple mutations, many mutants with reduced/eliminated collateral effect were identified. For example, Cas13e-M17YY (carrying Y672A, Y676A) exhibited similarly high level of EGFP knockdown and lower mCherry knockdown, compared with wild-type Cas13e (
Overall, these mutants exhibited less than 25% collateral effect (e.g., ≥75% mCherry+ cells), and ≥75% gRNA-guided cleavage (≤25% EGFP+ cells). They include: M1V4, M2V2, M2V3, M2V4, M5V1, M6V2, M6V3, M6V4, M7V1, M7V2, M7V3, M7-Y55A, M7-Y61A, M11V1, M12V3, M15V1, M15V2, M15-Y643A, M15-Y647A, M16V1, M16V2, M17V2, M18V2, M18V3, M19V2, M19V3, M19-IA, etc. (see above table and
Further, some of the Cas13e mutants exhibited low collateral effect (e.g., ≤25% collateral effect, or ≥75% mCherry+ cells), and intermediate gRNA-guided cleavage (e.g., 25%≤EGFP+ cells≤75%), including: M17YY, M8V4, M9V1, M11V2, M11V3, M13V1, M13V2, M13V3, M15V3, M20V2, etc. (see above table and
In other words, the invention has provided mutants having substantially retained (e.g., retaining at least about 50%, 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%) wild-type level gRNA-guided cleavage, while substantially reducing/eliminating (at least about 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%) collateral effect.
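The selection thresholds described for the Cas13e screen (≥75% mCherry+ cells for low collateral effect, ≤25% EGFP+ cells for high gRNA-guided cleavage, and 25% to 75% EGFP+ cells for intermediate cleavage) can be expressed as a short Python sketch. The function name and the example percentages are hypothetical; a value of exactly 25% EGFP+ is assigned to the high-cleavage class here, which is a choice made for this sketch because the two stated ranges share that boundary.

```python
def classify_mutant(egfp_positive_pct, mcherry_positive_pct):
    """Bin a mutant by normalized %EGFP+ (targeted) and %mCherry+ (non-targeted)."""
    low_collateral = mcherry_positive_pct >= 75.0
    if low_collateral and egfp_positive_pct <= 25.0:
        return "low collateral, high gRNA-guided cleavage"
    if low_collateral and egfp_positive_pct <= 75.0:
        return "low collateral, intermediate gRNA-guided cleavage"
    return "other"


# Hypothetical readouts, not data from the screen.
for name, egfp, mcherry in [("mutant A", 12.0, 90.0),
                            ("mutant B", 40.0, 85.0),
                            ("mutant C", 5.0, 30.0)]:
    print(name, "->", classify_mutant(egfp, mcherry))
```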
While not wishing to be bound by any particular theory, the data presented herein seem to suggest the following mechanism for reduced/eliminated collateral effect, partly based on the analysis of the locations of the effective mutants in the 3D structure of the Cas13 effector enzyme based on PyMOL visualization. Specifically, it was found that most mutants with the desired effects (e.g., reduced/eliminated collateral effect) have mutations within the HEPN1/HEPN2 domains, usually near the RXXXXH catalytic active site. It is believed that residues in these regions may participate in binding of Cas13e to the target RNA and/or the non-specific RNA, and that mutations in these residues had differential effects on Cas13e affinity towards different RNA targets, and hence on the cleavage efficiency towards these RNA targets.
The identified desired Cas13e mutants with reduced/eliminated collateral effects seem to share the following characteristics:
1. mutations are located within the HEPN1 domain and the inter-domain linker (IDL) region (e.g., residues 1-194 in Cas13e), and the HEPN2 domain (e.g., residues 620-775 in Cas13e).
2. in Cas13e, mutations are located within 125 residues of the RXXXXH motif.
3. most mutations, in 3D structure, are in the vicinity of the catalytic activity site formed by the RXXXXH motifs of HEPN1 and HEPN2 domains.
4. for each mutated residue, substitutions by residues other than Ala (especially Val, Gly, and Ile), are similarly effective to reduce/eliminate collateral effect. These mutations are expressly contemplated and disclosed herein, and are within the scope of the invention.
Certain specific positions of the desired mutants in Cas13e are listed below:
VFEEKAAKVKKMSEKE
AVNRDKNDGAFVESAAR
AEAYSAADWYDEDTAA
AGQYALTAAALAMYCL
AAIVFAVADYGALYVL
AAAEFAARIAEYFMPH
VYNDLQKKCVAVVLVA
One specific mutant, M17YY, to a large extent has reduced collateral effect compared to the previously identified M17.15-1 and M17.15-2 mutants (Y672A,Y676A) (see
On the other hand, the above screening also produced multiple mutants with significantly enhanced collateral effect, based on ≥60% collateral cleavage efficiency (e.g., ≤40% mCherry+ cells) and better gRNA-guided cleavage compared to wild-type (e.g., ≤5.5% EGFP+ cells). These mutants include: M14V2, M16V3, M18V1, M19-G712A, M19-T725A, M19-C727A, etc. These mutants are mainly located between the two catalytic active sites formed by the RXXXXH motifs. For example, M14V2 is located in the Helical1-1 domain, around the beta-turn towards the two HEPN domains in the 3D structure. Meanwhile, M16V3, M18V1, M19-G712A, M19-T725A, and M19-C727A have mutations in the HEPN2 domain, around/near the alpha-helix and its flanking unstructured regions, all close to the catalytic active site. The residues involved in these mutants are listed below.
AKKRAAANEGANPKRH
AAHYIDFREILAQTMC
It should be understood that, although Ala was used in the mutagenesis studies herein, other substitutions at the same positions (especially those with small (alkyl) side chains such as Val or Ile, or Gly) also have effects similar to those of Ala substitution. These mutations are expressly contemplated and disclosed herein, and are within the scope of the invention.
This experiment, based on using 4 different gRNA (g1-g4) targeting EGFP, demonstrates that cfCas13d has similarly high gRNA-guided target RNA cleavage as the wild-type Cas13d, yet exhibits no significant collateral effect. See
Purified wild-type Cas13d, cfCas13d, and dCas13d were used to assess in vitro collateral effect as well as gRNA-guided target RNA cleavage. The results showed that cfCas13d did not exhibit any detectable collateral effect (
The ssRNA target sequence and crRNA for determining gRNA-directed cleavage are:
ssRNA-cy5-Labeled: 5′-CY5-GGCCAGUGAAUUCGAGCUCGGUACCCGGGGAUCCUCUAGA AAUAUGGAUUACUUGGUAGAACAGCAAUCUACUCGACCUGCAGGCAUGCAAGCUUGGCGU-BHQ2-3′ (SEQ ID NO: 853), and Cas13d-crRNA (SEQ ID NO: 854).
The ssRNA target sequence and crRNA for determining collateral cleavage are: ssRNA (SEQ ID NO: 853), Cas13d-crRNA (SEQ ID NO: 854), and Collateral RNA-FMA-Labeled:
This experiment, based on using 4 different gRNA (g1-g4) targeting EGFP, demonstrates that cfCas13e has similarly high gRNA-guided target RNA cleavage as the wild-type Cas13e, yet exhibits no significant collateral effect. See
Purified wild-type Cas13e, cfCas13e, and dCas13e were used to assess in vitro collateral effect as well as gRNA-guided target RNA cleavage. The results showed that cfCas13e did not exhibit any detectable collateral effect (
The ssRNA target sequence and crRNA for determining gRNA-directed cleavage are:
ssRNA-cy5-Labeled: 5′-CY5-GGCCAGUGAAUUCGAGCUCGGUACCCGGGGAUCCUCUAG AAAUAUGGAUUACUUGGUAGAACAGCAAUCUACUCGACCUGCAGGCAUGCAAGCUUGGCGU-BHQ2-3′ (SEQ ID NO: 859), and Cas13e-crRNA (SEQ ID NO: 860).
The ssRNA target sequence and crRNA for determining collateral cleavage are: ssRNA (SEQ ID NO: 861), Cas13e-crRNA (SEQ ID NO: 862), and collateral RNA-FMA-Labeled:
To evaluate whether the expression level of endogenous genes could affect the extent of collateral effects by Cas13d, a panel of 23 endogenous genes with diverse roles and differential expression levels in mammalian cells was selected. For each transcript, 1-6 gRNAs were then designed (
HEK293 cells were transfected with an all-in-one construct containing Cas13d, EGFP, mCherry, non-target (NT) gRNA, or a gRNA targeting each endogenous gene, and another construct containing BFP driven by CAG promoter. BFP was used here for normalizing transfection efficiency. About 48 hours post-transfection, the EGFP and mCherry fluorescence intensity was examined for the collateral effects and target transcript level for RNA knockdown activity (
In general, higher expression levels of the endogenous genes were associated with more prominent collateral effects induced by Cas13d (
Three individual highly expressed transcripts were selected, with four gRNAs from these endogenous genes for further characterization: RPL4-gRNA1, PPIA-gRNA1, PPIA-gRNA2, and RPS5-gRNA1. A consistent and notable reduction in fluorescence intensity was found in the Cas13d group, but not in the cfCas13d group, when compared with the dCas13d group (
Meanwhile, for one medium expressed and one low expressed transcript with target gRNA: CA2-gRNA1 and B4GALNT1-gRNA1, reduced fluorescence intensity was slightly detectable in Cas13d group, but not in cfCas13d group (
Consistently, both Cas13d and cfCas13d targeting exhibited robust knockdown of these genes, as confirmed by qPCR analysis (
These results indicate that collateral effects induced by Cas13-mediated knockdown were correlated with gene expression levels, and these collateral effects could be eliminated by cfCas13d.
To confirm that RNA interference activity by cfCas13d is still broadly applicable, cfCas13d and Cas13d were tested on 14 randomly selected endogenous transcripts in HEK293 cells. It was found that cfCas13d and Cas13d exhibited comparably efficient RNA knockdown activity (82±2% and 93±1%, respectively), indicating that cfCas13d retained high-level RNA interference activity on most endogenous genes (
Taken together, these results indicate that cfCas13d exhibits high RNA interference activity with rare collateral effects, which would maximize its applications.
On the other hand, multiple low-fidelity Cas13 variants exhibiting increased dual cleavage activity were obtained (bottom left in
To comprehensively detect the collateral effects by Cas13d/cfCas13d-mediated knockdown, transcriptome-wide RNA sequencing (RNA-seq) was performed in Cas13d-, cfCas13d- or dCas13d-treated HEK293 cells.
Significantly widespread off-target transcriptional changes were identified in cells that expressed Cas13d with RPL4 gRNA3 relative to dCas13d control (2007/6750 significant up/down-regulated genes, respectively), along with significant RPL4 on-target knockdown. Scatter plots of differential transcript levels between Cas13d and dCas13d-mediated RPL4, PPIA, CA2, or PPARG knockdown as determined by RNA sequencing (n=3) were not shown. Among these significant changes, 1 out of 11 predicted RPL4 gRNA-dependent off-target transcripts was identified (RPL4P5, a processed pseudogene) (
Compared with dCas13d control, numerous off-target changes induced by Cas13d were found when targeting PPIA, CA2 or PPARG (
Additionally, among the significantly down-regulated changes between the Cas13d group and the dCas13d group, targeting genes with relatively high expression levels (RPL4, PPIA) induced more collateral cleavage than targeting genes with relatively low expression levels (CA2, PPARG), and that collateral cleavage knocked down more RNA transcripts of highly expressed genes than of lowly expressed genes (data not shown: statistically reduced counts of down-regulated transcripts induced by Cas13d-mediated RPL4, PPIA knockdown, compared to dCas13d; reduced counts were correlated to the expression level of endogenous transcripts), in agreement with the previous results (
Compared with Cas13d, cfCas13d remarkably reduced off-target changes when targeting RPL4 (down-regulated genes, 6750 vs. 39), PPIA (9289 vs. 8), CA2 (3519 vs. 18), and PPARG (1601 vs. 52). In addition, cfCas13d could also target predicted gRNA-dependent off-target sites as Cas13d, indicating mutations in cfCas13d decrease collateral off-target cleavage but not gRNA-dependent off-target cleavage (
Those results suggest that cfCas13d almost eliminates off-target edits induced by Cas13d collateral activity, and those gRNA-dependent off-target could be eliminated via optimization of the design on gRNAs.
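The scale of the reduction can be made explicit with a short calculation over the down-regulated gene counts reported above (Cas13d vs. cfCas13d for each target). The Python below is only an arithmetic sketch; the dictionary and function names are illustrative.

```python
# (Cas13d count, cfCas13d count) of down-regulated genes, as reported above.
DOWN_REGULATED = {
    "RPL4": (6750, 39),
    "PPIA": (9289, 8),
    "CA2": (3519, 18),
    "PPARG": (1601, 52),
}


def percent_reduction(cas13d_count, cfcas13d_count):
    """Percentage drop in down-regulated gene count going from Cas13d to cfCas13d."""
    return 100.0 * (1.0 - cfcas13d_count / cas13d_count)


for target, (wild_type_count, cf_count) in DOWN_REGULATED.items():
    print(f"{target}: {percent_reduction(wild_type_count, cf_count):.1f}% fewer "
          "down-regulated genes with cfCas13d")
```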
Further analysis showed that those down-regulated genes induced by CasRx targeting RPL4/PPIA gRNA were mostly distributed in metabolism, biosynthetic process, cell cycle and signal transduction pathways, while cfCasRx exhibited notably decreased off-target changes in these processes (
When targeting RPL4, though some genes were similarly down-regulated (e.g., TP53BP2, ZMPSTE24 and FAM157C) or up-regulated (e.g., PPP1R3F), a large number of unique genes were changed only in either the RPL4-g1 group or the RPL4-g3 group.
Moreover, no overlaps of down-regulated or up-regulated genes were found between PPIA-g1 group and PPIA-g2 group when targeting PPIA. In addition, most of up-regulated genes from Cas13d targeting RPL4/PPIA were enriched in nucleosome assembly and gene expression pathways, related to cellular stress regulation after cleavage events (data not shown—bulk RNA-seq analysis of genes with differential expression level by Cas13d/cfCas13d targeting RPL4/PPIA, showing clustering analysis of genes with up-regulation induced by Cas13d targeting RPL4/PPIA).
These results suggested that collateral effects of Cas13d-mediated RNA reduction may inhibit cell growth, consistent with previous reports that massive degradation of host transcripts induced by Cas13 results in retarded cell growth and dormancy.
These findings showed that cfCas13d maintained high specificity of on-target knockdown but collateral effects induced by Cas13d-mediated RNA knockdown were greatly reduced or even completely eliminated.
To further determine the cellular functional impact due to collateral effects induced by Cas13d-mediated RNA knockdown in vivo, stable cell lines were constructed by using the piggyBac transposon system with doxycycline (dox)-inducible Cas13d/cfCas13d/dCas13d expression targeting RPL4 (
Upon dox treatment, it was found that the cell clone carrying Cas13d had a significant retardation on cell growth and a notable decrease of RPL4 transcripts.
By contrast, the cell clone carrying cfCas13d exhibited no such changes on cell growth, along with a similar significant decrease of RPL4 transcripts (
These findings showed that collateral effects induced by Cas13d-mediated RNA knockdown in HEK293T cells could lead to severe cell growth retardation. Meanwhile, target RNA knockdown with a high-fidelity cfCas13d relieves cell growth stagnation.
Age-related macular degeneration (AMD), a progressive condition that is untreatable in up to 90% of patients, is a leading cause of blindness in the elderly worldwide. The two forms of AMD, wet and dry, are classified based on the presence or absence of blood vessels that have disruptively invaded the retina, respectively. Though wet AMD affects only 10-15% of AMD patients, it emerges abruptly, and rapidly progresses to blindness if left untreated. A detailed understanding of the molecular mechanisms underlying wet AMD has led to several robust FDA-approved therapies.
Wet AMD is typified by choroidal neovascularization (CNV), wherein newly immature blood vessels grow towards the outer retina from the underlying choroid, through a break in the Bruch membrane into the sub-retinal pigment epithelium (sub-RPE) or subretinal space. CNV is a major cause of visual loss.
Research in the late 1980s and early 1990s revealed the central role of VEGF in vascular biology, which led to the development of the first FDA-approved anti-VEGF-A treatment for wet AMD, the monoclonal antibody Avastin (bevacizumab by Genentech). Most recently, in 2011, Eylea (VEGF-TRAP-Eye; aflibercept; Regeneron) received FDA approval for treatment of CNV. Aflibercept is a recombinant fusion protein consisting of VEGF-binding portions from the extracellular domains of human VEGF receptors 1 and 2 that are fused to the Fc portion of the human IgG1 immunoglobulin. It binds to circulating VEGFs and acts like a “VEGF trap” to inhibit the activity of VEGF-A and VEGF-B, as well as placental growth factor (PGF), thus inhibiting the growth of new blood vessels in the choriocapillaris.
In late 2013, Chengdu Kanghong Pharmaceutical Group gained China Food and Drug Administration (CFDA) approval of Conbercept for the treatment of exudative macular degeneration. Like aflibercept, Conbercept is a recombinant fusion protein, composed of the second Ig domain of VEGFR1 and the third and fourth Ig domains of VEGFR2 fused to the constant region (Fc) of human IgG1.
This example utilizes a mouse model of wet AMD to show that cfCas13e, just like wild-type Cas13e, can efficiently knock down VEGFA to reduce CNV.
Two VEGFA-targeting guide RNA molecules, gRNA-1 (g1) and gRNA-2 (g2), were previously identified to be able to direct high efficiency gRNA-guided VEGFA mRNA cleavage and expression knock down in mammalian cells, especially when they are used in combination (g1+g2). The corresponding DNA sequences of the gRNA are: gRNA-1 (g1) (SEQ ID NO: 879) and gRNA-2 (g2) (SEQ ID NO: 880).
In this experiment, the coding sequence for cfCas13e (including two NLS sequences at the N- and C-terminus, under the EFS promoter) and the two gRNAs (g1+g2, under the control of the U6 promoter) were incorporated between the two ITR sequences of an AAV9 viral vector (with AAV9 serotype). Viral particles were injected directly into the mouse subretinal space. After 21 days, laser light was used on the eyes of the experimental mice to imitate UV-induced AMD. Seven days later, the extent of CNV in the experimental animals was determined (see
In
As another control, certain control animals were also treated, at the time of laser treatment, with either Aflibercept or Conbercept (
In this experiment, the ITR sequence for the AAV9 viral vector is SEQ ID NO: 881, and the nucleotide sequence of the EFS promoter used to drive cfCas13e expression is SEQ ID NO: 882.
In summary, by combining analysis of 3D structure and protein sequence, Applicant has designed, constructed, and obtained by screening numerous mutant Cas13 variants with reduced or eliminated collateral effect (as well as variants with enhanced collateral effects). The guide RNA-mediated functions of these Cas13e and Cas13d mutants/variants have been verified by in vitro biochemical reactions, endogenous gene expression knock down in mammalian cells, as well as gene therapy in an in vivo mouse model of AMD.
These results demonstrate that the collateral effects of the Cas13 family proteins, including but not limited to Cas13d and Cas13e, can be engineered according to the methods and examples of the invention by, for example, introducing point mutations in and around the RXXXXH catalytic active sites within the HEPN domains (HEPN1 and HEPN2). These introduced mutations may not affect binding between the respective cfCas13 protein and the cognate gRNA, such that the cfCas13 mutants can still be activated to cleave target RNA in a gRNA-dependent manner. Meanwhile, the cfCas13 mutants have greatly reduced collateral effect compared to the corresponding wild-type Cas13, thus eliminating one significant risk of using Cas13 in gene therapy. A possible (non-limiting) mechanism of how cfCas13 mutants operate is illustrated in
Materials and Methods for the examples are provided below.
The Cas13d (CasRx) gene and gRNA backbone sequences were synthesized by a commercial source. Vectors CAG-Cas13d-p2A-GFP and U6-DR-BpiI-BpiI-DR-EF1α-mCherry were generated to knock down target genes by transient transfection. The gRNA oligos were annealed and ligated into BpiI sites. The gRNA sequences are listed below.
HEK293T cell lines were purchased from Stem Cell Bank, Chinese Academy of Sciences. HEK293T cell lines were cultured with DMEM (Gibco) supplemented with 10% fetal bovine serum (Gibco), 1% penicillin/streptomycin (Thermo Fisher Scientific) and 0.1 mM non-essential amino acids (Gibco) in an incubator at 37° C. with 5% CO2. When cells reached 90% confluence, HEK293T cells were passaged at a ratio of 1:4 to 12-well plates. After 12 hr, 2 μg/well plasmids were transfected into cells with Lipofectamine 3000 (Thermo Fisher Scientific) using the standard protocol. 48 hr after transfection, 50,000 cells positive for both EGFP and mCherry were sorted by BD FACS Aria II for RNA extraction. For the groups of mCherry knockdown, total cells of the 12-well plate were collected for RNA extraction. Flow cytometry results were analyzed with FlowJo V10.5.3. For transgene cell lines, cells were expanded in culture for dox (1 μg/mL) induction.
Total RNA was extracted by adding 500 μL Trizol (Invitrogen) and 200 μL chloroform to the cells. After centrifugation at 12,000 rpm for 15 min at 4° C., the supernatant was transferred to a 1.5 mL RNase-free tube. 100% isopropanol and 75% alcohol were added to precipitate and purify the RNA. cDNA was prepared using HiScript Q RT SuperMix for qPCR (Vazyme, Biotech) according to the manufacturer's instructions.
qPCR reactions were performed with AceQ qPCR SYBR Green Master Mix (Vazyme, Biotech). All of the reagents were precooled in advance. qPCR results were analyzed with the ΔΔCt method.
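For reference, the ΔΔCt analysis mentioned above is commonly computed as 2 raised to the power of minus ΔΔCt. The Python sketch below shows that standard calculation under two assumptions not stated in the text: that a reference (housekeeping) transcript is measured alongside each target, and that amplification efficiency is close to 100%. The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_reference_treated,
                        ct_target_control, ct_reference_control):
    """Fold change of a target transcript (treated vs. control) by 2^-ddCt."""
    delta_ct_treated = ct_target_treated - ct_reference_treated
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target appears 3 cycles later after knockdown.
fold_change = relative_expression(25.0, 18.0, 22.0, 18.0)
print(f"remaining expression: {fold_change:.3f}")
print(f"knockdown: {100.0 * (1.0 - fold_change):.1f}%")
```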
Unbiased all-in-one vectors CAG-Cas13d-U6-DR-gRNA-SV40-EGFP-SV40-mCherry and CMV-Cas13e-SV40-EGFP-SV40-mCherry-U6-DR-gRNA-DR, of which the gRNA targets EGFP, were generated first. Then, 21 BpiI-harbouring Cas13 mutants, each spanning 36 amino acids, were introduced via site-directed mutagenesis by PCR and Gibson Assembly method using NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs).
For Cas13d, to cover all the mutable regions, over a hundred mutants with four or five random amino acid substitutions (replacing all non-alanine to alanine, X>A, and alanine to valine, A>V) were designed and generated by ligating two phosphorylated oligos (one wild-type oligo and the other mutant oligo) into corresponding BpiI-digested backbones.
To identify the roles of amino acids within or near mutants N2V8 and N2V7, one more BpiI-harbouring Cas13 mutant spanning 17 amino acids, N2R, was generated; single, double, triple or quadruple mutations were then introduced by ligating annealed mutant oligos into corresponding BpiI-digested backbones.
For Cas13e, rationally designed mutants with four or five random amino acid substitutions in two regions (M17 and M18) were generated by ligating annealed mutant oligos into corresponding BpiI-digested backbones.
I-TASSER was used to perform the protein structure prediction.
Cas13 mutant screening was conducted in 48-well plates, and consolidation was performed in 24-well plates. The day before transfection for screening, 3×10⁴ cells were plated per well in 0.25 mL of complete growth medium. After 12 hours, 0.5 μg plasmids were transfected into HEK293 cells with 1.25 μg PEI (DNA:PEI=1:2.5).
For 24-well plates, 1×10⁵ cells were plated per well in 0.5 mL of complete growth medium, and 0.8 μg plasmids were transfected into HEK293 cells with 2.5 μg PEI. 48 hours after transfection, cells were analyzed by BD FACS Aria II. Flow cytometry results were analyzed with FlowJo V10.5.3.
Cas13 protein purification was performed according to protocol as previously described. The humanized codon-optimized gene for Cas13d/cfCas13d/Cas13e/cfCas13e was synthesized (Huagene) and cloned into a bacterial expression vector (pC013-Twinstrep-SUMO-huLwCas13a, Plasmid #90097) after the plasmid digestion by BamHI and NotI with NEBuilder HiFi DNA Assembly Cloning Kit (New England Biolabs).
The expression constructs were transformed into BL21 (DE3) (TIANGEN) cells. One liter of LB Broth growth media (Tryptone 10.0 g; Yeast Extract 5.0 g; NaCl 10.0 g, Sangon Biotech) was inoculated with 10 mL of a 12-hr growing culture. Cells were then grown to a cell density A600 of 0.6 at 37° C., and then SUMO-Cas13 protein expression was induced by supplementing with 500 mM IPTG. The induced cells were grown at 16° C. for 16-18 hours before harvest by centrifugation (4,000 rpm, 20 min). Collected cells were resuspended in Buffer W (Strep-Tactin Purification Buffer Set, IBA) and lysed using an ultrasonic homogenizer (Scientz).
Cell debris was removed by centrifugation and the clear lysate was loaded onto a StrepTactin Sepharose High Performance Column (StrepTrap HP, GE Healthcare). Non-specifically bound proteins and contaminants flowed through. The target proteins were eluted with Elution Buffer (Strep-Tactin Purification Buffer Set, IBA). The N-terminal 6× His/Twinstrep-SUMO tag (“6× His” disclosed as SEQ ID NO: 947) was removed by SUMO protease (4° C., >20 hours). Then target proteins were subjected to a final polishing step by gel filtration (S200, GE Healthcare). The purity of >95% was assessed by SDS-PAGE.
Cas13 on-Target and Collateral Cleavage Activity Assay
Fluorescent labeled ssRNA reporter assay for Cas13 nuclease activity was performed as previously described. For on-target cleavage activity analysis, assays were performed with 45 nM purified Cas13d/cfCas13d/Cas13e/cfCas13e, 22.5 nM crRNA, 125 nM quenched fluorescent RNA reporter (Sangon Biotech), 1 μL murine RNase inhibitor (New England Biolabs), 100 ng of background total human RNA (purified from HEK293T cell culture), and varying amounts of input nucleic acid target, unless otherwise indicated, in nuclease assay buffer (40 mM Tris-HCl including 25 mM Tris-HCl, pH7.5 and 25 mM Tris-HCl, pH7.0, 60 mM NaCl, 6 mM MgCl2, pH 7.3). Reactions were allowed to proceed for 1-3 hr at 37° C. on a fluorescent plate reader (Analytik Jena) with fluorescent kinetics measured every 5 min.
For transcriptome sequencing, 35 μg all-in-one plasmids were transfected into HEK293 cells cultured in 10-cm dishes. Then 600,000 dual-positive EGFP+/mCherry+ (top 15%) cells were sorted to make a pool for sequencing. Total RNA was extracted with a TRIZOL-based method, fragmented, and reverse transcribed to cDNA with HiScript Q RT SuperMix for qPCR (Vazyme, Biotech) according to the manufacturer's instructions. The RNA-seq library was generated and quality was assessed using the Illumina HiSeq X Ten platform at Novogene. Differential analysis among cell groups (RPL4 gRNA1, RPL4 gRNA3, PPIA gRNA1, PPIA gRNA2, CA2 gRNA1, and PPARG gRNA1) was done with limma, a count-based method implemented in R, with voom used for normalization. Significantly differentially expressed genes were first screened by a BH-adjusted P value<0.05, then further filtered by a 2-fold change. Enrichment analysis was then performed with GSEA v3.0 (Broad Institute, PreRanked mode), using the t-statistic output from limma as the ranking metric; 1,000 gene-set permutations were set as default, and gene sets were obtained by collecting pathways from KEGG and biological processes from GO. A gene set with an FDR P value<0.05 was considered significantly enriched.
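The gene-filtering step described above (Benjamini-Hochberg adjustment followed by a fold-change cutoff) can be sketched in plain Python. This is not the limma/voom pipeline used in the study; it only illustrates the filtering logic, and the function names, gene labels, P values and log2 fold changes are all hypothetical.

```python
import math


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so monotonicity can be enforced.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def significant_genes(genes, p_values, log2_fold_changes,
                      alpha=0.05, min_fold=2.0):
    """Keep genes with adjusted P < alpha and at least min_fold change."""
    cutoff = math.log2(min_fold)
    adjusted = bh_adjust(p_values)
    return [gene
            for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
            if q < alpha and abs(lfc) >= cutoff]


# Hypothetical inputs for illustration.
print(significant_genes(["A", "B", "C", "D"],
                        [0.001, 0.04, 0.03, 0.5],
                        [1.5, 2.0, -0.5, 3.0]))
```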
Growth curve
Single cell clones with dCas13d/Cas13d/cfCas13d and RPL4 gRNA were plated on a 24-well plate at 2×10⁵ cells/mL with or without dox treatment (1 μg/mL). Cells were collected at 24, 48, 72, 96 and 120 hrs. Cell number was counted by an automated cell counter (C10311, Invitrogen). Experiments were performed for three replicates.
Cell proliferation was assessed by using a colorimetric thiazolyl blue (MTT) assay. Briefly, single cell clones with dCas13d/Cas13d/cfCas13d and RPL4 gRNA were treated with or without dox (1 μg/mL) for 0, 24, 48, 72, 96 or 120 hrs. Then each group of cells was collected and further plated on a 24-well plate at 2×10⁵ cells/mL with or without dox treatment (1 μg/mL). After an incubation period of 24 hrs at 37° C., the tetrazolium salt MTT (Sigma-Chemie) was added to a final concentration of 2 μg/mL, and incubation was continued for 4 hrs. Cells were washed 3 times and finally lysed with dimethyl sulfoxide. Metabolization of MTT directly correlates with the cell number and was quantitated by measuring the absorbance at 550 nm (reference wavelength, 690 nm) by using a microplate reader (type 7500; Cambridge Technology, Watertown, Mass.). Experiments were performed for five replicates.
Statistical tests performed by Graphpad Prism 8 included the two-tailed unpaired two-sample t-test or the log-rank Mantel-Cox test. The respective statistical test used for each figure is noted in the corresponding figure legends and significant statistical differences are noted as *P<0.05, **P<0.01, ***P<0.001. All values are reported as mean±s.e.m.
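As a small illustration of the reporting conventions above, the Python sketch below computes mean±s.e.m. for a set of replicates and maps a P value to the asterisk notation. The analysis itself was performed in Graphpad Prism; this sketch is not that software's implementation, the replicate values are hypothetical, and the "ns" label for non-significant results is an assumption not stated in the text.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def significance_stars(p_value):
    """Map a P value to the asterisk notation used in the figures."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # assumed label for non-significant differences


# Hypothetical replicate values.
mean, sem = mean_sem([1.0, 2.0, 3.0])
print(f"{mean:.2f} ± {sem:.2f}", significance_stars(0.005))
```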
Collateral RNA degradation by the Cas13 family of effector enzymes has previously been found in glioma cells, flies and mammalian cells. Based on the fast and sensitive dual-fluorescence reporter system for detecting collateral effects as described herein, this example demonstrates that Cas13f could indeed induce substantial collateral effects in HEK293T cells. The example also demonstrates that the collateral effects of other Cas13f can also be diminished (if not eliminated) via mutagenesis, based on the finding that changing RNA-binding cleft proximal to catalytic sites RXXXXH in HEPN domains may selectively decrease promiscuous RNA binding and non-target cleavage, while maintaining on-target RNA cleavage.
Specifically, to evaluate the collateral effects of Cas13f in mammalian cells, different Cas13f variants were co-transfected with EGFP and mCherry coding sequences, together with targeted (against EGFP) guide RNA (gRNA) into HEK293T cells. Expression levels of the targeted EGFP and the non-targeted mCherry were measured 48 hrs after transfection (
A publicly available online tool, TASSER, was used to predict the 3D structure of Cas13f, and the predicted structure was visualized with PyMOL in order to determine the position of the various structural domains in 3D (see
An unbiased screening system was then designed based on the dual-fluorescence system described herein, in which coding sequences for EGFP, mCherry, and EGFP-targeting gRNA, together with each Cas13f variant, were inserted into a plasmid for expression in 293T cells. In this system, expression of EGFP and expression of mCherry were driven by the same SV40 promoter, in order to ensure roughly equally stable expression of the reporter genes in the transfected host cell. The gRNA was chosen to be specific for EGFP mRNA. Each coding sequence for Cas13f and its variants has an N-terminal and a C-terminal nuclear localization signal (NLS), and expression of Cas13f and its variants/mutants was driven by the strong CAG promoter.
The EGFP and mCherry coding sequences are SEQ ID NOs: 1 and 2, respectively. The corresponding DNA sequence of the gRNA is SEQ ID NO: 3. The SV40 promoter sequence is SEQ ID NO: 104. The wild-type Cas13f protein sequence is SEQ ID NO: 52. The CAG promoter sequence is SEQ ID NO: 103.
The HEPN1, HEPN2, Helical1 and Helical2 domains of Cas13f were chosen for generating a Cas13f mutagenesis library. First, these regions were divided into 47 small segments (F1-F47), each with about 17 residues (
To facilitate subsequent selection, a BpiI restriction enzyme recognition site (GTCTTC, corresponding to encoded residues VF; reverse complement GAAGAC, corresponding to encoded residues ED) was introduced at each end of the segments. When producing mutants, all non-Ala residues were substituted with Ala (X>A), and all Ala residues were substituted with Val (A>V). About 4-5 total mutations were introduced between the two BpiI sites flanking each segment. The various mutants so generated and their corresponding wild-type sequences are provided below.
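The relationship between the two orientations of the BpiI site and their encoded residues, and the X>A / A>V substitution rule, can be checked with a short Python sketch. The helper names and the example peptide are hypothetical and are used only for illustration; the codon table is deliberately limited to the four standard-genetic-code codons needed here.

```python
# Only the four codons that occur in the two site orientations.
CODONS = {"GTC": "V", "TTC": "F", "GAA": "E", "GAC": "D"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate an in-frame DNA sequence using the small codon table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))

def scan_substitute(residue):
    """Apply the substitution rule: Ala becomes Val, any other residue becomes Ala."""
    return "V" if residue == "A" else "A"

def mutate_positions(segment, positions):
    """Substitute the residues at the given 0-based positions of a peptide segment."""
    residues = list(segment)
    for i in positions:
        residues[i] = scan_substitute(residues[i])
    return "".join(residues)

site = "GTCTTC"
print(translate(site), reverse_complement(site), translate(reverse_complement(site)))
# Hypothetical peptide, not a Cas13f segment: K at index 1 and A at index 2 are mutated.
print(mutate_positions("MKALRY", [1, 2]))
```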
Using the EGFP-mCherry dual-fluorescence reporter system of the invention, these Cas13f mutants were functionally screened to assess their collateral vs. gRNA-guided cleavage activities. Specifically, according to standard cell culture methods, human HEK293 cells were grown in 24-well tissue culture plates to a suitable density before the cells were transfected with PEI reagents and plasmids that express each mutant Cas13f and the reporter system fluorescent proteins. Transfected cells were cultured at 37° C. in an incubator under 5% CO2 for about 48 hours, before EGFP and mCherry signals in the cells were measured by FACS. Mutants leading to a low percentage of the gRNA-targeted EGFP signal (lower percentage of EGFP+ cells, as a readout for preserved gRNA-guided cleavage) and a high percentage of the non-targeted mCherry signal (higher percentage of mCherry+ cells, as a readout for lacking collateral effect) were selected.
In this experiment, dCas13f with no gRNA-guided cleavage was used as a negative control, and the results (mean±s.e.m.) were normalized against that of dCas13f and listed below. Cas13f mutants/variants located at the upper left area of
After normalization of EGFP and mCherry fluorescence intensity against inactive dead Cas13f (dCas13f with R77A, H82A, R764A, and H769A mutations in the HEPN domains), it was found that variants with mutation sites in F10, F38, F40, or F46, especially F10V1, F10V4, F38V2, F40V2, F40V4, F46V1 and F46V3, exhibited relatively low EGFP fluorescence intensity but much higher (or lower) mCherry fluorescence intensity compared to wild-type, indicating that these variants retained a high on-target activity but greatly reduced (or enhanced) collateral activity (
A further mutagenesis study in or near the regions mutated in these variants (F10V1, F10V4, F38V2, F40V2, F40V4, F46V1 and F46V3) was conducted by generating a number of additional mutants with single or multiple (e.g., double, triple, or quadruple) combination mutations. The sequences of these mutants/variants are listed below:
In this experiment, dCas13f with no gRNA-guided cleavage was used as a negative control, and the results (mean±s.e.m.) were normalized against that of dCas13f and listed below. Cas13f mutants located at the upper left area of
Overall, some of the Cas13f mutants exhibited low collateral effect (e.g., ≤25% collateral effect, or ≥75% mCherry+ cells), and high (e.g., EGFP+ cells≤25%) to intermediate gRNA-guided cleavage (e.g., 25%≤EGFP+ cells≤75%) including: F40S23 ((Y666A,Y677A), SEQ ID NO: 1635) and F40S27, etc (see below table and
Other mutants/variants retained high gRNA-guided cleavage (e.g., EGFP+ cells≤25%), but also exhibited higher than wild-type level collateral activity (e.g., ≤25% mCherry+ cells). See tables above. These mutants/variants may be useful for more sensitive detection methods such as SHERLOCK.
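The exemplary thresholds used above to group the mutants can be summarized in a brief Python sketch. It assumes the inputs are percentages of EGFP+ and mCherry+ cells after normalization against dCas13f; the function names, the "intermediate" collateral label, and the "low" cleavage label for more than 75% EGFP+ cells are assumptions added for completeness and are not categories stated in the text.

```python
def classify_cleavage(egfp_pos_pct):
    """gRNA-guided cleavage: fewer EGFP+ cells means stronger on-target cleavage."""
    if egfp_pos_pct <= 25:
        return "high"
    if egfp_pos_pct <= 75:
        return "intermediate"
    return "low"  # assumed label; not defined in the text

def classify_collateral(mcherry_pos_pct):
    """Collateral effect: more mCherry+ cells means weaker collateral activity."""
    if mcherry_pos_pct >= 75:
        return "low"
    if mcherry_pos_pct <= 25:
        return "high"
    return "intermediate"  # assumed label; not defined in the text

def describe(egfp_pos_pct, mcherry_pos_pct):
    """Return (cleavage class, collateral class) for one mutant."""
    return classify_cleavage(egfp_pos_pct), classify_collateral(mcherry_pos_pct)

# Hypothetical percentages (not data from this document).
print(describe(20, 85))   # low collateral effect with high on-target cleavage
print(describe(15, 10))   # high on-target cleavage with high collateral activity
```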
Number | Date | Country | Kind |
---|---|---|---|
PCT/CN2020/119559 | Sep 2020 | CN | national |
PCT/CN2021/079821 | Mar 2021 | CN | national |
The instant application is a continuation application, filed under 35 U.S.C. 111(a), of International Patent Application No. PCT/CN2021/121926, filed on Sep. 29, 2021, which claims foreign priority under 35 U.S.C. 365(b) to International Patent Application No. PCT/CN2021/079821, filed on Mar. 9, 2021, and International Patent Application No. PCT/CN2020/119559, filed on Sep. 30, 2020. The entire contents of each of the above-referenced applications, including any sequence listing and drawings, are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2021/121926 | Sep 2021 | US |
Child | 17836175 | US |