This application contains a Sequence Listing in computer readable form. The computer readable form is incorporated herein by reference. Said ASCII copy, created on Nov. 2, 2021, is named 146401_091527_SL.txt and is 430,186 bytes in size.
Prokaryotes have adaptive immune systems that use CRISPR (clustered regularly interspaced short palindromic repeats) and CRISPR-associated (Cas) proteins for RNA-guided nucleic acid cleavage. These CRISPR-Cas systems confer adaptive immunity against foreign genetic elements in bacteria and archaea via RNA-guided nucleic acid interference. To provide immunity against invaders, processed CRISPR array transcripts (crRNAs) assemble with Cas protein-containing surveillance complexes that recognize nucleic acids bearing sequence complementarity to the invader-derived segment of the crRNA, known as the spacer.
Class 2 CRISPR-Cas systems are streamlined versions in which a single Cas protein (an effector endonuclease protein) bound to RNA is responsible for binding to and cleavage of a targeted sequence. The programmable nature of these minimal systems has facilitated their use as a versatile technology that continues to revolutionize the field of genome manipulation.
There is, however, a need for improved Class 2 CRISPR-Cas RNA-guided endonuclease variants. Provided herein are such variants, methods of making, methods of testing, and methods of using the same.
Provided herein are novel Class 2 Type II, Type V, and Type VI CRISPR-Cas RNA-guided proteins, methods of making, and methods of use. Also provided herein are engineered systems comprising the same.
In various embodiments, provided herein are compositions, pharmaceutical compositions, vectors, host cells, and kits comprising any of the proteins or polynucleotides of the engineered systems described herein.
The disclosure relates to an engineered system that comprises a Class 2 CRISPR-Cas endonuclease or a nucleic acid encoding the endonuclease, and a gRNA or a nucleic acid encoding the gRNA. The Class 2 CRISPR-Cas endonuclease can be a Class 2 Type II CRISPR-Cas endonuclease comprising at least one of the RuvC sequences of Table 7, or a sequence comprising at least 60% sequence identity thereto. The Class 2 CRISPR-Cas endonuclease can be a Class 2 Type V CRISPR-Cas endonuclease comprising at least one of the RuvC sequences of Table 1, or a sequence comprising at least 60% sequence identity thereto. The Class 2 CRISPR-Cas endonuclease can be a Class 2 Type VI CRISPR-Cas endonuclease comprising at least one of the HEPN sequences of Table 4, or a sequence comprising at least 60% sequence identity thereto. The gRNA and the Class 2 CRISPR-Cas endonuclease generally do not naturally occur together. The gRNA can be capable of hybridizing to a target sequence in a target DNA or RNA. The gRNA can be capable of forming a complex with the Class 2 CRISPR-Cas endonuclease.
The engineered system disclosed herein can comprise a Class 2 Type II CRISPR-Cas endonuclease; and a Class 2 Type II CRISPR-Cas gRNA. The gRNA can be a single-molecule gRNA. The gRNA can be a dual-molecule gRNA.
The endonuclease can be a Class 2 Type II CRISPR-Cas endonuclease comprising at least one of the RuvC or HNH sequences of Table 7, or a sequence comprising at least 60% sequence identity thereto, or a Class 2 Type V CRISPR-Cas endonuclease comprising at least one of the RuvC or HNH sequences of Table 1, or a sequence comprising at least 60% sequence identity thereto; in either case, the target is a target DNA.
The endonuclease can be a Class 2 Type VI CRISPR-Cas endonuclease comprising at least one of the HEPN sequences of Table 4, or a sequence comprising at least 60% sequence identity thereto, in which case the target is a target RNA.
The target RNA can be mRNA, tRNA, rRNA, miRNA, or siRNA.
The Class 2 Type II CRISPR-Cas endonuclease can comprise any one of SEQ ID NOS: 16-19, or a sequence comprising at least 60% sequence identity thereto. The Class 2 Type V CRISPR-Cas endonuclease can comprise any one of SEQ ID NOS: 1-7 or 20, or a sequence comprising at least 60% sequence identity thereto. The Class 2 Type VI CRISPR-Cas endonuclease can comprise any one of SEQ ID NOS: 8-15, or a sequence comprising at least 60% sequence identity thereto.
The disclosure relates to an engineered single-molecule gRNA that comprises a targeter-RNA comprising a spacer sequence that is capable of hybridizing with a target sequence in a target DNA; and an activator-RNA that is capable of hybridizing with the targeter-RNA to form a double-stranded RNA duplex, the activator-RNA comprising an activator-RNA. The targeter-RNA and the activator-RNA can be covalently linked to one another. The single-molecule gRNA can be capable of forming a complex with a Class 2 Type II endonuclease. Hybridization of the spacer sequence to the target sequence can target the endonuclease to the target DNA. The Class 2 Type II CRISPR-Cas endonuclease can comprise at least one of the RuvC or HNH sequences of Table 7, or a sequence comprising at least 60% sequence identity thereto. The Class 2 Type II CRISPR-Cas endonuclease can comprise any one of SEQ ID NOS: 16-19, or a sequence comprising at least 60% sequence identity thereto. The targeter-RNA and the activator-RNA can be arranged, in that order, in a 5′ to 3′ orientation. Alternatively, the activator-RNA and the targeter-RNA can be arranged, in that order, in a 5′ to 3′ orientation. The targeter-RNA and the activator-RNA can be covalently linked to one another via a linker. The single-molecule gRNA can comprise one or more sequence modifications compared to a sequence of a corresponding wild type tracrRNA and/or crRNA. The targeter-RNA can comprise a spacer sequence of about 10-50 nucleotides that has 100% complementarity to a sequence in the target DNA. The targeter-RNA can comprise a spacer sequence of about 10-50 nucleotides that has less than 100% complementarity to a sequence in the target DNA.
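The single-molecule gRNA layout described above can be pictured as simple string assembly. The following is a minimal sketch only: the function names `assemble_sgrna` and `spacer_length_ok`, the default linker `GAAA`, and any example sequences are hypothetical assumptions for illustration, not sequences or methods taken from this disclosure.

```python
# Hypothetical sketch: joining a targeter-RNA and an activator-RNA into one
# single-molecule gRNA, written 5'->3'. The default linker "GAAA" is an
# illustrative assumption, not a linker taken from this disclosure.

RNA_BASES = set("ACGU")


def _check_rna(name: str, seq: str, allow_empty: bool = False) -> str:
    """Upper-case seq and reject anything other than A/C/G/U."""
    seq = seq.upper()
    if (not seq and not allow_empty) or set(seq) - RNA_BASES:
        raise ValueError(f"{name} must be an RNA sequence of A/C/G/U")
    return seq


def assemble_sgrna(targeter: str, activator: str, linker: str = "GAAA",
                   targeter_first: bool = True) -> str:
    """Return the covalently linked gRNA as a 5'->3' string.

    targeter_first=True  -> targeter-RNA, linker, activator-RNA
    targeter_first=False -> activator-RNA, linker, targeter-RNA
    An empty linker models direct covalent joining without a linker.
    """
    t = _check_rna("targeter", targeter)
    a = _check_rna("activator", activator)
    link = _check_rna("linker", linker, allow_empty=True)
    return t + link + a if targeter_first else a + link + t


def spacer_length_ok(spacer: str, min_len: int = 10, max_len: int = 50) -> bool:
    """True if the spacer length falls in the ~10-50 nt range recited above."""
    return min_len <= len(spacer) <= max_len


if __name__ == "__main__":
    print(assemble_sgrna("ACGUACGUACGUACGUACGU", "GGAAACC"))
```

Keeping the orientation as a flag mirrors the two 5′ to 3′ arrangements described above; a real design would also need to account for the duplex-forming regions, which this sketch does not model.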
Disclosed herein are methods of modifying a target DNA or RNA. The method can comprise contacting the target DNA or RNA with a CRISPR-Cas endonuclease system disclosed herein, whereby the gRNA hybridizes with the target sequence and the target DNA or RNA is modified. The target can be RNA. The target can be mRNA, tRNA, rRNA, miRNA, or siRNA. The target can be DNA. The target DNA can be extrachromosomal DNA. The target DNA can be part of a chromosome. The target DNA can be part of a chromosome in vitro. The target DNA can be part of a chromosome in vivo. The target DNA or RNA can be outside a cell. The target DNA or RNA can be inside a cell. The target DNA or RNA can comprise a gene and/or its regulatory region.
The cell can be selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
The modifying can comprise introducing a double strand break in a target DNA. The contacting can occur under conditions that are permissive for non-homologous end joining or homology-directed repair. The method can comprise contacting the target DNA with a donor polynucleotide, in which case the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide can integrate into the target DNA. Alternatively, the method may not comprise contacting the cell with a donor polynucleotide, and the target DNA can be modified such that nucleotides within the target DNA are deleted.
Disclosed herein are methods of detecting a target nucleic acid in a sample, the method comprising contacting the sample with: a Class 2 Type V CRISPR-Cas endonuclease comprising at least one of the RuvC sequences of Table 1, or a sequence comprising at least 60% sequence identity thereto, or a Class 2 Type VI CRISPR-Cas endonuclease comprising at least one of the HEPN sequences of Table 4, or a sequence comprising at least 60% sequence identity thereto; a gRNA comprising a spacer sequence that is capable of hybridizing with a target sequence in a target nucleic acid; and a labeled detector that does not hybridize with the spacer sequence of the gRNA; and measuring a detectable signal produced by cleavage of the labeled detector by the endonuclease, thereby detecting the target nucleic acid. The Class 2 Type V CRISPR-Cas endonuclease can comprise any one of SEQ ID NOS: 1-7 or 20, or a sequence comprising at least 60% sequence identity thereto. The Class 2 Type VI CRISPR-Cas endonuclease can comprise any one of SEQ ID NOS: 8-15, or a sequence comprising at least 60% sequence identity thereto. The labeled detector can comprise a labeled single stranded DNA. The labeled detector can comprise a labeled RNA. The labeled RNA can be a single stranded RNA. The labeled detector can comprise a labeled single stranded DNA/RNA chimera. The labeled detector can comprise one or more modified nucleotides. The target nucleic acid can be single stranded DNA. The target nucleic acid can be double stranded DNA. The target nucleic acid can be single stranded RNA. The target nucleic acid can be viral, plant, fungal, or bacterial. The target sequence can be a sequence of a target provided in any of Tables 10a-10f. The target can be a coronavirus. The target can be a SARS-CoV-2 virus. The target nucleic acid can be cDNA. The target nucleic acid can be from a human cell. The target nucleic acid can be from a human fetus or cancer cell. The sample can comprise cells.
The sample can be urine, blood, serum, plasma, lymphatic fluid, cerebrospinal fluid, saliva, nasopharyngeal, oropharyngeal, nasopharyngeal/oropharyngeal, aspirate, or biopsy sample.
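One common way to turn the measured signal from labeled-detector cleavage into a detected/not-detected call is to compare it with no-target controls. The sketch below assumes a mean-plus-k-standard-deviations cutoff (k = 3 by default); that rule, the function name `call_positive`, and the example readings are illustrative assumptions, not a criterion stated in this disclosure.

```python
import statistics
from typing import Sequence, Tuple


def call_positive(sample_signal: float,
                  negative_controls: Sequence[float],
                  k: float = 3.0) -> Tuple[bool, float]:
    """Return (detected, threshold) for one reading of cleaved-detector signal.

    The threshold is mean + k * sample standard deviation of the no-target
    controls; this cutoff rule is an assumption chosen for illustration.
    """
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative-control readings")
    mean = statistics.mean(negative_controls)
    sd = statistics.stdev(negative_controls)
    threshold = mean + k * sd
    return sample_signal > threshold, threshold


if __name__ == "__main__":
    # Made-up fluorescence readings (arbitrary units), for illustration only.
    print(call_positive(150.0, [100.0, 102.0, 98.0]))
```

Other cutoffs (for example, a fixed fold-change over background) are equally plausible; the choice depends on the assay's validated performance.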
The methods disclosed herein can comprise determining an amount of the target nucleic acid present in the sample. Measuring a detectable signal can comprise one or more of: visual based detection, sensor based detection, color detection, gold nanoparticle based detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, and semiconductor-based sensing. The labeled detector can comprise a modified nucleobase, a modified sugar moiety, and/or a modified nucleic acid linkage. The detectable signal can be detectable in less than 15, 30, 45, 60, 90, 120, 150, 180, 210, or 240 minutes. The method can further comprise an amplification step selected from loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), Ramification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), and isothermal multiple displacement amplification (IMDA).
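Determining an amount of target nucleic acid can be sketched as a standard-curve calculation, under the assumption that the initial rate of signal increase is linear in target concentration over the working range: fit the signal slope for each reaction, fit slope against known standards, then invert that line for an unknown. The function names and the linearity assumption are illustrative only; a real assay may need a non-linear fit.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def initial_rate(times_min: Sequence[float], signals: Sequence[float]) -> float:
    """Signal increase per minute, from early (assumed linear) time points."""
    return fit_line(times_min, signals)[0]


def estimate_concentration(rate: float,
                           std_concs: Sequence[float],
                           std_rates: Sequence[float]) -> float:
    """Invert a linear standard curve of rate versus known concentration."""
    slope, intercept = fit_line(std_concs, std_rates)
    if slope == 0:
        raise ValueError("standard curve is flat; cannot invert")
    return (rate - intercept) / slope
```

For example, an unknown whose initial rate falls between two standards is assigned a concentration between theirs; values outside the calibrated range should not be extrapolated.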
The target nucleic acid in the sample can be present at a concentration of less than 100 μM.
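To relate a molar concentration such as the 100 μM bound above to molecule counts, multiply by Avogadro's constant and convert litres to microlitres; 100 μM corresponds to roughly 6.0 × 10^13 copies per μL. The helper name below is hypothetical; the arithmetic is standard.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
UL_PER_L = 1e6            # microlitres per litre


def molar_to_copies_per_ul(conc_molar: float) -> float:
    """Convert a concentration in mol/L to molecules (copies) per microlitre."""
    return conc_molar * AVOGADRO / UL_PER_L


if __name__ == "__main__":
    print(f"{molar_to_copies_per_ul(100e-6):.3e}")  # 100 uM -> prints 6.022e+13
```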
Disclosed herein are endonucleases comprising an amino acid sequence with 30%-99.5% homology to any one of SEQ ID NOs: 1-20.
Disclosed herein are compositions comprising an endonuclease described herein and, optionally, a pharmaceutically acceptable carrier. The composition can comprise an endonuclease and, optionally, a pharmaceutically acceptable carrier, a nucleic acid stabilizing buffer, and/or an endonuclease stabilizing buffer. The endonuclease can be lyophilized, and the composition can optionally further comprise any one or more of a labeled detector, a reverse transcriptase enzyme, and reagents for loop-mediated isothermal amplification.
Also disclosed is a recombinant expression vector comprising a DNA polynucleotide. The recombinant expression vector can comprise a nucleotide sequence encoding a single endonuclease operably linked to a promoter.
Also disclosed are a host cell comprising the DNA polynucleotide, and a kit comprising one or more components of any of the engineered systems described herein. The one or more components can be lyophilized. The one or more components can further comprise a labeled reporter and a gRNA directed to SARS-CoV-2.
Provided herein are novel Class 2 Type II, V, and VI CRISPR-Cas RNA-guided endonucleases, systems, methods of making, and methods of use.
The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, the terms “polynucleotide” and “nucleic acid” encompass single-stranded DNA; double-stranded DNA; multi-stranded DNA; single-stranded RNA; double-stranded RNA; multi-stranded RNA; genomic DNA; cDNA; DNA-RNA hybrids; and a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
By “hybridizable” or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g. RNA, DNA) comprises a sequence of nucleotides that enables it to non-covalently bind, i.e. form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel, manner (i.e., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength.
It is understood that a sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure, a ‘bulge’, and the like).
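For two equal-length, ungapped strands, the pairing rules above (Watson-Crick pairs, optionally G/U pairs, antiparallel orientation) can be turned into a percent-complementarity score as sketched below. The function name is hypothetical, and reading T as U so that DNA targets share one pairing table is a simplifying assumption (it also means a guide G opposite a DNA T is scored as a G/U-type pair). Gapped or bulged hybrids need an alignment method such as those discussed in the following paragraphs.

```python
# Hypothetical helper: percent complementarity of two equal-length strands,
# both written 5'->3'. Antiparallel pairing means guide[i] pairs with the
# target base counted from the target's 3' end.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(guide: str, target: str, allow_gu: bool = True) -> float:
    # Simplifying assumption: T is read as U so DNA targets use the same table.
    g = guide.upper().replace("T", "U")
    t = target.upper().replace("T", "U")
    if not g or len(g) != len(t):
        raise ValueError("strands must be non-empty and of equal length")
    allowed = (WATSON_CRICK | GU_WOBBLE) if allow_gu else WATSON_CRICK
    # Walk the guide 5'->3' against the target 3'->5'.
    paired = sum((a, b) in allowed for a, b in zip(g, reversed(t)))
    return 100.0 * paired / len(g)
```

For example, the guide `AAGG` is fully complementary to the target `CCUU` (or DNA `CCTT`), while against `UCUU` it scores 100% only if the G/U pair is allowed and 75% otherwise.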
Percent complementarity and percent identity or homology between particular stretches of nucleic acid sequences or within nucleic acids can be determined using any convenient method. Example methods include the BLAST (basic local alignment search tool) and PowerBLAST programs (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) and the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), e.g., using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489). Other programs, algorithms, and methods are available to the skilled artisan and may be utilized.
Determination of percent identity between particular stretches of polypeptides can be determined using any convenient method. Several programs, algorithms, and methods are available to the skilled artisan and may be utilized.
Methods of determining sequence similarity or identity between two or more nucleic acid or amino acid sequences are known in the art. Sequence similarity or identity may be determined for an entire length of a nucleic acid or amino acid, or for an indicated portion thereof. Sequence similarity or identity may be determined using standard techniques, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math. 2, 482 (1981), the sequence identity alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48, 443 (1970), the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85, 2444 (1988), computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, Wis.), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 12, 387-395 (1984), or inspection. Another suitable algorithm is the BLAST algorithm, described in Altschul et al., J. Mol. Biol. 215, 403-410 (1990) and Karlin et al., Proc. Natl. Acad. Sci. USA 90, 5873-5877 (1993). Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g. the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAST, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). An exemplary useful BLAST program is the WU-BLAST-2 program which was obtained from Altschul et al., Methods in Enzymology, 266, 460-480 (1996); http://blast.wustl.edu/blast/README.html. WU-BLAST-2 uses several search parameters, which are optionally set to the default values.
The parameters are dynamic values and are established by the program itself depending upon the composition of the particular sequence and composition of the particular database against which the sequence of interest is being searched; however, the values may be adjusted to increase sensitivity. Further, an additional useful algorithm is gapped BLAST as reported by Altschul et al. (1997) Nucleic Acids Res. 25, 3389-3402.
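As an illustration of the Needleman-Wunsch global alignment cited above, the sketch below aligns two sequences and reports percent identity as identical aligned positions divided by alignment length, gaps included. The scoring values (match +1, mismatch -1, linear gap -1), the function names, and that identity definition are simplifying assumptions; BLAST, GAP, and similar programs use substitution matrices, affine gap penalties, and their own identity denominators, so their numbers will generally differ.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str]:
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap inserted in b
                              score[i][j - 1] + gap)   # gap inserted in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str, **scoring: int) -> float:
    """Identical aligned positions / alignment length (gaps counted), x100."""
    aln_a, aln_b = needleman_wunsch(a, b, **scoring)
    if not aln_a:
        raise ValueError("cannot compute identity of two empty sequences")
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)
```

For instance, aligning `ACGT` with `ACT` yields `ACGT` over `AC-T`, i.e., 75% identity under these assumptions. Swapping in the Smith-Waterman recurrence (clamping cell scores at zero and tracing back from the maximum cell) would give the local variant instead.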
The terms “polypeptide” and “protein” are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
A “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, i.e. an “insert”, may be attached so as to bring about the replication of the attached segment in a cell.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise DNA or RNA.
In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. The term “targeting sequence” means the portion of a guide sequence having sufficient complementarity with a target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a Type V endonuclease” includes a plurality of such endonucleases and reference to “the gRNA” or “the guide RNA” includes reference to one or more gRNAs and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
Class 2 CRISPR-Cas systems generally have single-polypeptide multidomain nuclease effectors, and comprise Types II, V, and VI.
Class 2 Type II CRISPR-Cas endonucleases are RNA-guided DNA endonucleases (interchangeably referred to herein as Type II endonucleases and the like). Exemplary Type II endonucleases include Cas9.
Class 2 Type V CRISPR-Cas endonucleases are RNA-guided DNA endonucleases (interchangeably referred to herein as Type V endonucleases and the like) that further possess collateral activity. Exemplary Type V endonucleases include Cas12 (inclusive of all subtypes) and Cas14 (inclusive of all subtypes).
Class 2 Type VI CRISPR-Cas endonucleases are RNA-guided RNA endonucleases (interchangeably referred to herein as Type VI endonucleases and the like) that further possess collateral activity. Exemplary Type VI endonucleases include Cas13 (inclusive of all subtypes). Type VI endonucleases achieve RNA cleavage through conserved basic residues within their two HEPN domains. The target RNA, i.e., the RNA of interest, is the RNA to be targeted; targeting leads to recruitment of the Type VI endonuclease to, and its binding at, the target site of interest on the target RNA.
Accordingly, provided herein are novel Type II, Type V, and Type VI CRISPR-Cas RNA-guided endonucleases.
Provided herein are novel Class 2 Type V CRISPR-Cas RNA-guided endonucleases and their gRNAs, constituting the novel Class 2 Type V CRISPR-Cas RNA-guided systems of the disclosure.
Provided herein are engineered systems comprising: a Class 2 Type V CRISPR-Cas RNA-guided endonuclease of the disclosure and a single guide RNA, wherein the gRNA and the Class 2 Type V CRISPR-Cas RNA-guided endonuclease do not naturally occur together, wherein the gRNA is capable of hybridizing to a target sequence in a target DNA, wherein the gRNA is capable of forming a complex with the Class 2 Type V CRISPR-Cas RNA-guided endonuclease, and wherein the Class 2 Type V CRISPR-Cas RNA-guided endonuclease possesses collateral activity and is capable of collaterally cleaving a single stranded polynucleotide comprising RNA, without the use of a tracrRNA.
The components of the system are described in turn below.
Provided herein are novel Type V CRISPR-Cas RNA-guided endonucleases. In some embodiments, these endonucleases may share certain structural, sequence, and/or functional similarities with any one of the subtypes of Cas12. In some embodiments, these endonucleases may share certain structural, sequence, and/or functional similarities with any one of the subtypes of Cas14.
Type V endonucleases of the disclosure are capable of cleaving target single stranded DNA (e.g. Cas14-like Type V endonucleases) and target double stranded DNA (e.g. Cas12-like Type V endonucleases). Type V endonucleases additionally possess collateral activity.
Without being bound to any theory or mechanism, Type V CRISPR-Cas RNA-guided endonucleases of the disclosure comprise three RuvC motifs responsible for catalytic activity.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any one of the RuvC sequences of Table 1, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any two of the RuvC sequences of Table 1, or sequences comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any three of the RuvC sequences of Table 1, or sequences comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a RuvC I motif selected from the group consisting of SEQ ID NO: 62, SEQ ID NO: 67, SEQ ID NO: 71, SEQ ID NO: 75, SEQ ID NO: 80, SEQ ID NO: 85, SEQ ID NO: 89, and SEQ ID NO: 135, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a RuvC II motif selected from the group consisting of SEQ ID NO: 63, SEQ ID NO: 68, SEQ ID NO: 72, SEQ ID NO: 76, SEQ ID NO: 81, SEQ ID NO: 86, SEQ ID NO: 90, and SEQ ID NO: 136, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a RuvC III motif selected from the group consisting of SEQ ID NO: 64, SEQ ID NO: 69, SEQ ID NO: 73, SEQ ID NO: 77, SEQ ID NO: 82, SEQ ID NO: 87, SEQ ID NO: 91, and SEQ ID NO: 137, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif selected from the group consisting of SEQ ID NO: 62, SEQ ID NO: 67, SEQ ID NO: 71, SEQ ID NO: 75, SEQ ID NO: 80, SEQ ID NO: 85, SEQ ID NO: 89, and SEQ ID NO: 135, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif selected from the group consisting of SEQ ID NO: 63, SEQ ID NO: 68, SEQ ID NO: 72, SEQ ID NO: 76, SEQ ID NO: 81, SEQ ID NO: 86, SEQ ID NO: 90, and SEQ ID NO: 136, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif selected from the group consisting of SEQ ID NO: 64, SEQ ID NO: 69, SEQ ID NO: 73, SEQ ID NO: 77, SEQ ID NO: 82, SEQ ID NO: 87, SEQ ID NO: 91, and SEQ ID NO: 137, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 62, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 63, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 64, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 67, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 68, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 69, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 71, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 72, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 73, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 75, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 76, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 77, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 80, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 81, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 82, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a (1) RuvC I motif comprising the sequence of SEQ ID NO: 85, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 86, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 87, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a (1) RuvC I motif comprising the sequence of SEQ ID NO: 89, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 90, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 91, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a (1) RuvC I motif comprising the sequence of SEQ ID NO: 135, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 136, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 137, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
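As an illustrative sketch only (not a method stated in this disclosure), the percent-identity thresholds recited above can be checked against a pairwise alignment that has already been computed by some aligner. The sketch assumes the alignment is supplied as two equal-length strings with "-" marking gaps, and it divides identical, non-gap columns by the total number of alignment columns (gaps included), similar to the "Identities" figure reported by BLAST; other normalizations exist. The function names and toy motifs are hypothetical and are not the RuvC motifs of the listed SEQ ID NOs.

```python
# Recited identity thresholds (percent), as listed in the embodiments above.
THRESHOLDS = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99.5]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Both strings must be the same length; '-' marks a gap. Identical,
    non-gap columns are counted and divided by the total number of
    alignment columns (gap columns included).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity: float) -> list:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical toy motifs (not real SEQ ID NO sequences).
reference = "ACDEFGHIKL"
variant = "ACDEFGHIKM"
pid = percent_identity(reference, variant)
print(pid)                      # 90.0
print(thresholds_met(pid)[-1])  # 90
```

Because the motifs are short, a single substitution moves identity by several percentage points, so the choice of normalization (alignment length versus reference length) can matter when a variant sits near a threshold.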
Table 1 provides exemplary RuvC I, RuvC II, and RuvC III sequences of the Type V endonucleases of the disclosure.
Table 2 provides exemplary amino acid sequences for certain Type V sequences of the disclosure. Genes were identified from metagenomic samples. Scripts designed to find CRISPR sequences and accompanying genes encoding proteins showing homology with reported Cas enzymes were run on the sequences. Comparative BlastP analyses were performed against sequences deposited in databases (NCBI, LENS), and candidates showing greater than 50% identity with deposited proteins were discarded. The presence of specific domains (e.g., RuvC, HEPN) and catalytic motifs was determined (CD-search, phmmer, UniProt).
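The candidate-filtering criteria described above (discarding candidates with greater than 50% identity to deposited proteins and requiring the expected nuclease domain) can be sketched as a simple post-processing step. This is a hypothetical illustration, not the scripts actually used: it assumes the BlastP and domain-search results have already been parsed into plain Python data, and the `Candidate` class, its field names, and the example records are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical parsed record for one metagenomic candidate protein."""
    name: str
    # Percent identity of each BlastP hit against deposited proteins.
    hit_identities: list = field(default_factory=list)
    # Domains reported by a domain search (e.g. "RuvC", "HEPN").
    domains: set = field(default_factory=set)


MAX_IDENTITY = 50.0  # candidates with any hit above this are discarded


def is_novel(c: Candidate, max_identity: float = MAX_IDENTITY) -> bool:
    # Keep only candidates whose best deposited hit is at or below the cutoff.
    return all(pid <= max_identity for pid in c.hit_identities)


def filter_candidates(candidates, required_domain: str = "RuvC") -> list:
    """Names of candidates that pass the identity cutoff and carry the domain."""
    return [
        c.name
        for c in candidates
        if is_novel(c) and required_domain in c.domains
    ]


candidates = [
    Candidate("cand_A", [42.0, 31.5], {"RuvC"}),
    Candidate("cand_B", [73.2], {"RuvC"}),  # too similar to a deposited protein
    Candidate("cand_C", [], {"HEPN"}),      # no RuvC domain detected
]
print(filter_candidates(candidates))  # ['cand_A']
```

Passing `required_domain="HEPN"` would instead select HEPN-containing candidates, which is the kind of switch a Type VI screen might use.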
SEQ ID NO: 1 represents a novel Type V variant of the disclosure, Type V Cas_1 (1283 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 1, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 1 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 1 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 2 represents a novel Type V variant of the disclosure, Type V Cas_2 (1235 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 2, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 2 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 2 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 3 represents a novel Type V variant of the disclosure, Type V Cas_3 (1259 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 3, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 3 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 3 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 4 represents a novel Type V variant of the disclosure, Type V Cas_4 (1336 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 4, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 4 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 4 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 5 represents a novel Type V variant of the disclosure, Type V Cas_5 (1146 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 5, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 5 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 5 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 6 represents a novel Type V variant of the disclosure, Type V Cas_6 (1167 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 6, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 6 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 6 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 7 represents a novel Type V variant of the disclosure, Type V Cas_7 (1245 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 7, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 7 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 7 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 20 represents a novel Type V variant of the disclosure, Type V Cas_8 (758 amino acids in length).
In some embodiments, the Type V CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 20, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 20 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 20 and proteins with at least 30%-99.5% sequence identity thereto.
Table 3 provides exemplary nucleic acid sequences encoding certain Type V sequences of the disclosure. Also provided are exemplary codon-optimized nucleic acid sequences encoding certain Type V sequences of the disclosure, for production in E. coli systems.
Accordingly, provided herein are exemplary nucleic acid sequences encoding the Type V CRISPR-Cas RNA-guided endonucleases of the disclosure. In some embodiments, a Type V CRISPR-Cas RNA-guided endonuclease is encoded by a nucleic acid sequence comprising or consisting of the sequence of any one of SEQ ID NOs: 21-34 and SEQ ID NOs: 59-60, or a nucleic acid sequence with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments, the Type V endonuclease of the disclosure is catalytically active.
In some embodiments, the Type V endonuclease of the disclosure is catalytically dead, e.g. by introducing mutations in one or more of the RuvC domains.
In some embodiments, the Type V endonuclease of the disclosure targets double-stranded DNA and is a Type V nickase (i.e., it cleaves only one strand of the target DNA).
The Type V endonucleases of the disclosure can be modified to include an aptamer.
The Type V endonuclease of the disclosure can be further fused to additional domains, e.g., catalytic domains, to produce dual-action Cas proteins. In some embodiments, a Type V endonuclease is further fused to a base editor.
Collateral activity of Class 2 Type V CRISPR-Cas RNA-Guided Endonucleases
In addition to the ability to cleave a target sequence in a single-stranded or double-stranded targeted DNA, the Type V endonucleases of the disclosure also possess collateral activity (trans-cleavage activity), i.e., the ability to promiscuously cleave non-targeted DNA or RNA once activated by detection of a target DNA. Without being bound to any theory or mechanism, generally once a Type V endonuclease of the disclosure is activated by a gRNA, which occurs when a sample includes a target sequence to which the gRNA hybridizes (i.e., the sample includes the targeted DNA), the Type V endonuclease can become a nuclease that promiscuously cleaves oligonucleotides (e.g., ssDNAs, RNAs, chimeric RNA/DNAs) not comprising the target sequence of the gRNA (non-target oligonucleotides, to which the guide sequence of the gRNA does not hybridize). Thus, when the targeted DNA (double-stranded or single-stranded) is present in the sample (e.g., in some embodiments, above a threshold amount), the result can be cleavage of single-stranded oligonucleotides (e.g., ssDNAs, ssRNAs, single-stranded chimeric RNA/DNAs) in the sample, which can be detected using any convenient detection method (e.g., using a labeled detector DNA, RNA, or DNA/RNA chimera).
Accordingly, provided herein are methods and compositions for detecting a target DNA (dsDNA or ssDNA) in a sample. Also provided are methods and compositions for cleaving non-target oligonucleotides, which can be utilized as detectors. These embodiments are described in further detail below.
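One way to turn a collateral-cleavage readout into a yes/no call is to compare a sample's reporter signal (e.g., fluorescence released by cleavage of a labeled detector) against no-target control reactions. The sketch below assumes a threshold of the control mean plus k sample standard deviations, which is a common convention but not one stated in this disclosure; the function name, the value of k, and the signal values are hypothetical.

```python
import statistics


def call_target_present(sample_signal: float, negative_controls: list, k: float = 3.0):
    """Call a sample positive if its reporter signal exceeds mean + k*SD of
    no-target control reactions. Returns (is_positive, threshold)."""
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative controls to estimate SD")
    mu = statistics.mean(negative_controls)
    sd = statistics.stdev(negative_controls)  # sample standard deviation
    threshold = mu + k * sd
    return sample_signal > threshold, threshold


# Hypothetical reporter readings in arbitrary fluorescence units.
controls = [100, 110, 90]
print(call_target_present(500, controls))  # (True, 130.0)
print(call_target_present(120, controls))  # (False, 130.0)
```

In practice the threshold rule, replicate number, and k would need to be set empirically for a given endonuclease, detector, and sample type.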
gRNAs for Class 2 Type V CRISPR-Cas RNA-Guided Endonucleases
The present disclosure provides DNA-targeting guide RNAs that direct the activities of the novel Type V endonucleases of the disclosure to a specific target sequence within a target DNA. These DNA-targeting RNAs are referred to herein as “guide RNAs” or “gRNAs”. In some embodiments, a Type V gRNA can comprise a single segment comprising both a spacer (DNA-targeting sequence) and a Cas “protein-binding sequence”, together referred to as a crRNA (e.g., for a Cas12a endonuclease). In other embodiments, a Type V gRNA can comprise a first segment (also referred to herein as a “targeter-RNA”, a “DNA-targeting segment”, or a “DNA-targeting sequence”) and a second segment (also referred to herein as an “activator-RNA”, a “tracrRNA”, or a “protein-binding sequence”). Also provided herein are nucleotide sequences encoding the Type V gRNAs of the disclosure.
i. crRNA/Spacer Sequences for Single-RNA Guided Systems
Certain Type V endonucleases of the disclosure can be guided by a single crRNA (single-RNA guided systems); a prototypic CRISPR-Cas protein of this class includes Cas12a. The crRNA of the Type V single-RNA guided systems of the disclosure comprises a nucleotide sequence that is complementary to a sequence in a target DNA (DNA-targeting sequence or spacer).
The crRNA portion of the Type V gRNAs of the disclosure can have a length of from about 25 nt to about 50 nt. In some embodiments, the length can be about 40-43 nt.
The DNA-targeting spacer sequence of a Type V gRNA generally interacts with a target DNA in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the DNA-targeting sequence may vary and determines the location within the target DNA at which the gRNA and the target DNA will interact. The DNA-targeting sequence of a subject Type V gRNA can be modified (e.g., by genetic engineering) to hybridize to a desired sequence within a target DNA.
The DNA-targeting sequence of a subject Type V gRNA can have a length of from about 8 nucleotides to about 30 nucleotides. For example, the length can be 20-23 nucleotides.
The percent complementarity between the DNA-targeting spacer sequence of the crRNA and the target sequence of the target DNA can be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%). In some embodiments, the percent complementarity between the DNA-targeting sequence of the crRNA and the target sequence of the target DNA is 100% over the 1-23 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA. In some embodiments, the percent complementarity between the DNA-targeting sequence of the crRNA and the target sequence of the target DNA is at least 60% over about 1-23 contiguous nucleotides. In some embodiments, the percent complementarity between the DNA-targeting sequence of the crRNA and the target sequence of the target DNA is 100% over the 1-23 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA and as low as 0% over the remainder. In such a case, the DNA-targeting sequence can be considered to be 1-23 nucleotides in length.
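Percent complementarity can be computed position by position once the antiparallel pairing is taken into account. In this hypothetical sketch the spacer is given 5′ to 3′ in the RNA alphabet, and the target strand (the strand the spacer base-pairs with) is also given 5′ to 3′ in the DNA alphabet, so the target strand is reversed before comparison. Only Watson-Crick pairs (rA:dT, rU:dA, rG:dC, rC:dG) count as matches, and wobble pairs are ignored. The optional `start`/`end` window is indexed from the spacer's 5′ end; how a "5′-most" window in the text maps onto spacer positions is an assumption here, as are the function name and toy sequences.

```python
from typing import Optional

# Watson-Crick pairs: (spacer RNA base, facing target-strand DNA base).
WATSON_CRICK = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(spacer_rna: str, target_strand_dna: str,
                            start: int = 0, end: Optional[int] = None) -> float:
    """Percent of spacer positions in [start, end) that Watson-Crick pair
    with the target strand.

    spacer_rna: spacer written 5'->3' (RNA alphabet).
    target_strand_dna: strand the spacer hybridizes to, written 5'->3'.
    Pairing is antiparallel, so spacer position i faces target position
    len-1-i; reversing the target lines the two up index by index.
    """
    spacer = spacer_rna.upper()
    target = target_strand_dna.upper()
    if len(spacer) != len(target):
        raise ValueError("spacer and target must be the same length")
    facing = target[::-1]  # target strand read 3'->5'
    end = len(spacer) if end is None else end
    if not 0 <= start < end <= len(spacer):
        raise ValueError("invalid window")
    matches = sum(
        1 for i in range(start, end) if (spacer[i], facing[i]) in WATSON_CRICK
    )
    return 100.0 * matches / (end - start)


# Hypothetical toy sequences.
print(percent_complementarity("ACGU", "ACGT"))           # 100.0
print(percent_complementarity("ACGU", "ACGA"))           # 75.0
print(percent_complementarity("ACGU", "ACGA", start=1))  # 100.0
```

The last call illustrates the windowed case described above: a mismatch outside the window does not lower the complementarity computed inside it.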
Generally, a naturally unprocessed pre-crRNA of Type V (single-RNA guided system) comprises a direct repeat and an adjacent spacer (the portion of the crRNA that allows for targeting to a DNA molecule). In some embodiments, direct repeats (partial sequence or entire sequence) from unprocessed pre-crRNA are included in the Type V gRNAs of the disclosure and improve gRNA stability. Exemplary direct repeat sequences include SEQ ID NOs: 61, 70, 74, and 88 (DNA sequences), or SEQ ID NOs: 134, 147, 150, 151, and 153 (RNA sequences). It is noted that while certain exemplary sequences are provided as DNA nucleotides, it is understood that this DNA can then be transcribed into RNA. Accordingly, the mature guides of the disclosure may incorporate the entire or partial sequence of the exemplary direct repeat sequences provided herein; the guides may be composed of DNA nucleotides, analogous RNA nucleotides, or a combination of DNA and RNA nucleotides. Exemplary predicted secondary structures of the pre-crRNAs of the Type V endonucleases of the disclosure are presented in
In some embodiments, the crRNAs include non-naturally occurring, engineered direct repeat sequences.
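Because several exemplary direct repeats are listed as DNA, assembling a mature crRNA involves converting the repeat to RNA and joining it to a spacer. The sketch below assumes a Cas12a-like layout, with the full or partial repeat on the 5′ side of the spacer and a partial repeat keeping the nucleotides adjacent to the spacer; whether that orientation applies to each Type V endonuclease of the disclosure is an assumption. The helper names and toy sequences are hypothetical and are not the repeats of the listed SEQ ID NOs.

```python
def dna_to_rna(seq: str) -> str:
    """Return the RNA equivalent of a DNA sequence (T -> U)."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"non-DNA characters: {sorted(invalid)}")
    return seq.replace("T", "U")


def build_crrna(direct_repeat_dna: str, spacer_dna: str, keep_repeat_3prime=None) -> str:
    """Assemble a crRNA as [repeat][spacer], 5'->3' (assumed Cas12a-like layout).

    keep_repeat_3prime: if given, keep only this many 3'-most repeat
    nucleotides (a partial repeat adjacent to the spacer).
    """
    repeat = dna_to_rna(direct_repeat_dna)
    if keep_repeat_3prime is not None:
        if not 1 <= keep_repeat_3prime <= len(repeat):
            raise ValueError("keep_repeat_3prime out of range")
        repeat = repeat[-keep_repeat_3prime:]
    return repeat + dna_to_rna(spacer_dna)


# Hypothetical toy sequences (not the repeats of the listed SEQ ID NOs).
print(build_crrna("AATTGGCCAA", "GGCATTCCAG"))     # AAUUGGCCAAGGCAUUCCAG
print(build_crrna("AATTGGCCAA", "GGCATTCCAG", 5))  # GCCAAGGCAUUCCAG
```

An engineered, non-naturally occurring repeat can be passed to the same helper, since it only converts and concatenates sequences.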
In some embodiments the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a mammalian organism. In some embodiments the spacer sequence is directed to a target sequence in a non-mammalian organism.
In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence which is a sequence of a human. In some embodiments, the target sequence is a sequence of a non-human primate.
In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a mammalian organism, e.g. a human or non-human primate.
In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a bacterium.
In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a virus.
In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a plant.
The Type V gRNAs of the disclosure can be modified to include an aptamer.
ii. Targeter-RNA/Dual-RNA Guided Systems
The above section notwithstanding, certain Type V endonucleases of the disclosure can be guided by a dual-RNA system that includes a crRNA (targeter RNA) and an auxiliary RNA; a prototypic CRISPR-Cas protein of this class includes Cas12d. Yet other Type V endonucleases of the disclosure can be guided by a dual-RNA system that includes a crRNA (targeter) and a trans-activating crRNA (tracrRNA); a prototypic CRISPR-Cas protein of this class includes Cas14. These components are discussed below.
The targeter-RNA of certain Type V endonuclease gRNAs of the disclosure comprises a nucleotide sequence that is complementary to a sequence in a target DNA (targeting sequence of the gRNA; DNA-targeting sequence; spacer sequence). The targeter-RNA can interchangeably be referred to as a crRNA. The targeter-RNA of a gRNA interacts with a target DNA in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the targeter-RNA may vary and determines the location within the target DNA at which the gRNA and the target DNA will interact. The targeter-RNA of a subject gRNA can be modified (e.g., by genetic engineering) to hybridize to any desired sequence within a target DNA.
The targeter-RNA of the Type V dual-RNA guided systems can have a length of from about 12 nucleotides to about 100 nucleotides. For example, the targeter-RNA can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, or from about 12 nt to about 19 nt. For example, the targeter-RNA can have a length of from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 19 nt to about 70 nt, from about 19 nt to about 80 nt, from about 19 nt to about 90 nt, from about 19 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60 nt, from about 20 nt to about 70 nt, from about 20 nt to about 80 nt, from about 20 nt to about 90 nt, or from about 20 nt to about 100 nt.
Generally, a naturally unprocessed pre-crRNA of Type V (dual-RNA guided system) comprises a direct repeat and an adjacent spacer (the portion of the crRNA that allows for targeting to a DNA molecule). In some embodiments, direct repeats (partial sequence or entire sequence) from unprocessed pre-crRNA are included in the Type V gRNAs of the disclosure and improve gRNA stability. Exemplary direct repeat sequences include SEQ ID NOs: 66, 78, and 83. It is noted that while the exemplary sequences are provided as DNA nucleotides, it is understood that this DNA can then be transcribed into RNA. Accordingly, the mature guides of the disclosure may incorporate the entire or partial sequence of the exemplary direct repeat sequences provided herein; the guides may be composed of DNA nucleotides, analogous RNA nucleotides, or a combination of DNA and RNA nucleotides. Exemplary predicted secondary structures of the pre-crRNAs of the Type V endonucleases (dual-RNA guided systems) of the disclosure are presented in
In some embodiments, the gRNAs of the disclosure include non-naturally occurring, engineered direct repeat sequences which can be incorporated into the engineered gRNAs of the disclosure.
i. Spacer Sequences/Dual-RNA Guided Systems
gRNAs of the disclosure (of the Type V dual-RNA guided systems) comprise spacer sequences complementary to the target DNA. More specifically, the nucleotide sequence of the targeter-RNA that is complementary to a target nucleotide sequence (the DNA-targeting sequence or spacer sequence) of the target DNA can have a length of at least about 12 nt. For example, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA can have a length of at least about 12 nt, at least about 15 nt, at least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about 30 nt, at least about 35 nt, or at least about 40 nt. For example, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, or from about 20 nt to about 60 nt. In some embodiments, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA is 20 nucleotides in length.
In some embodiments, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA is 19 nucleotides in length.
The percent complementarity between the spacer sequence of the targeter-RNA and the target sequence of the target DNA can be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%). In some embodiments, the percent complementarity between the DNA-targeting sequence of the targeter-RNA and the target sequence of the target DNA is 100% over the 1-25 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA. In some embodiments, the percent complementarity between the DNA-targeting sequence of the targeter-RNA and the target sequence of the target DNA is at least 60% over about 1-25 contiguous nucleotides. In some embodiments, the percent complementarity between the DNA-targeting sequence of the targeter-RNA and the target sequence of the target DNA is 100% over the 1-25 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA and as low as 0% over the remainder. In such a case, the DNA-targeting sequence can be considered to be 1-25 nucleotides in length.
In some embodiments the spacer sequence of a Type V dual-RNA guided system of the disclosure is directed to a target sequence in a mammalian organism. In some embodiments the spacer sequence is directed to a target sequence in a non-mammalian organism.
In some embodiments, the spacer sequence of a Type V dual-RNA guided system of the disclosure is directed to a target sequence which is a sequence of a human. In some embodiments, the target sequence is a sequence of a non-human primate.
In some embodiments, the spacer sequence of a Type V dual-RNA guided system of the disclosure is directed to a target sequence of a therapeutic target.
In some embodiments, the spacer sequence of a Type V dual-RNA guided system of the disclosure is directed to a target sequence of a diagnostic target; for example, in such embodiments a labeled, catalytically dead Type V endonuclease of the disclosure and a gRNA directed to a diagnostic target DNA are contacted with the target DNA, a cell comprising the target DNA, or a sample comprising the target DNA.
ii. Activator-RNA/Dual-RNA Guided Systems
The activator-RNA of certain Type V gRNAs of the disclosure binds to its cognate Type V endonuclease of the disclosure (e.g., Type V Cas_8 of the disclosure). The activator-RNA can interchangeably be referred to as a tracrRNA. The gRNA guides the bound Type V endonuclease to a specific nucleotide sequence within the target DNA via the above-described targeter-RNA. The activator-RNA of a Type V gRNA comprises two stretches of nucleotides that are complementary to one another.
iii. Dual-Molecule Type V gRNAs
As noted above, in some embodiments, provided herein are dual-molecule (two-molecule) Type V gRNAs for the novel Type V endonucleases of the disclosure. Such gRNAs comprise two separate RNA molecules (a tracrRNA or auxiliary RNA, and the targeter-RNA or crRNA). Each of the two RNA molecules of a subject dual-molecule gRNA comprises a stretch of nucleotides complementary to a stretch in the other molecule, such that the complementary nucleotides of the two RNA molecules hybridize to form the double-stranded RNA duplex of the gRNA.
A dual-molecule gRNA can be designed to allow for controlled (i.e., conditional) binding of a targeter-RNA with an activator-RNA. Because a dual-molecule gRNA is not functional unless both the activator-RNA and the targeter-RNA are bound in a functional complex with the Type V endonucleases of the disclosure, a dual-molecule gRNA can be made inducible (e.g., drug inducible) by rendering the binding between the activator-RNA and the targeter-RNA inducible. As one non-limiting example, RNA aptamers can be used to regulate (i.e., control) the binding of the activator-RNA with the targeter-RNA. Accordingly, the activator-RNA and/or the targeter-RNA can comprise an RNA aptamer sequence.
The dual-molecule guide can be modified to include an aptamer.
iv. Engineered Single-Molecule Type V Endonuclease gRNAs
In some embodiments, provided herein are engineered Type V gRNAs that comprise a single-molecule gRNA (interchangeably referred to herein as an sgRNA) for the novel Type V endonucleases of the disclosure.
Accordingly provided herein is an engineered single-molecule gRNA, comprising:
a. a targeter-RNA that is capable of hybridizing with a target sequence in a target DNA; and
b. an activator-RNA that is capable of hybridizing with the targeter-RNA to form a double-stranded RNA duplex, the activator-RNA comprising a tracrRNA or auxiliary RNA sequence,
wherein the targeter-RNA and the activator-RNA are covalently linked to one another, wherein the single-molecule gRNA is capable of forming a complex with a novel Type V endonuclease of the disclosure, and wherein hybridization of the targeter-RNA to the target sequence is capable of targeting the Type V endonuclease of the disclosure to the target DNA.
A subject engineered single-molecule gRNA comprises two segments of nucleotides (a targeter-RNA and an activator-RNA) that are complementary to one another, can be covalently linked by intervening nucleotides (“linkers” or “linker nucleotides”), and hybridize to form the double-stranded RNA duplex (dsRNA duplex) of the activator-RNA, thereby resulting in a stem-loop structure. In some embodiments, the targeter-RNA and the activator-RNA are covalently linked via the 3′ end of the targeter-RNA and the 5′ end of the activator-RNA. In other embodiments, the targeter-RNA and the activator-RNA are covalently linked via the 5′ end of the targeter-RNA and the 3′ end of the activator-RNA.
In some embodiments, the targeter-RNA and the activator-RNA are arranged, in that order, in a 5′ to 3′ orientation.
In some embodiments, the activator-RNA and the targeter-RNA are arranged, in that order, in a 5′ to 3′ orientation.
In some embodiments, the single molecule gRNA comprises one or more sequence modifications compared to a sequence of a corresponding wild type tracrRNA and/or crRNA.
In some embodiments, the targeter-RNA and the activator-RNA are covalently linked to one another via a linker.
When present, the linker of a single-molecule gRNA can have a length of from about 3 nucleotides to about 30 nucleotides. In exemplary embodiments, the linker of a single-molecule gRNA is 4, 5, 6, or 7 nt.
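As an illustration only, the covalent linkage described above can be sketched in Python. The helper name `assemble_sgrna`, the placeholder sequences, and the treatment of the "about 3 to about 30 nucleotide" linker range as hard bounds are assumptions made for this sketch; it simply concatenates the segments 5′ to 3′ in either of the two orientations described and does not model folding or binding by a Type V endonuclease.

```python
# Minimal sketch: join a targeter-RNA and an activator-RNA through a linker.
# Function name and sequences are hypothetical placeholders.
RNA_BASES = set("ACGU")
MIN_LINKER_NT, MAX_LINKER_NT = 3, 30  # "about 3 to about 30 nt", treated as hard bounds


def assemble_sgrna(targeter: str, activator: str, linker: str,
                   targeter_first: bool = True) -> str:
    """Return a single-molecule gRNA written 5' to 3'.

    targeter_first=True links the 3' end of the targeter-RNA to the 5' end of
    the activator-RNA; False links the 3' end of the activator-RNA to the
    5' end of the targeter-RNA.
    """
    if not MIN_LINKER_NT <= len(linker) <= MAX_LINKER_NT:
        raise ValueError(f"linker is {len(linker)} nt; expected 3-30 nt")
    segments = {"targeter": targeter, "activator": activator, "linker": linker}
    for name, seq in segments.items():
        if not set(seq.upper()) <= RNA_BASES:
            raise ValueError(f"{name} contains characters other than A, C, G, U")
    order = (targeter, linker, activator) if targeter_first else (activator, linker, targeter)
    return "".join(seg.upper() for seg in order)


if __name__ == "__main__":
    # A 4-nt linker, within the exemplary 4-7 nt range
    print(assemble_sgrna("ACGUACGU", "UUCGAAGG", "GAAA"))
```

Checking linker length and alphabet up front keeps malformed designs from being silently concatenated.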
An exemplary single-molecule gRNA comprises two complementary stretches of nucleotides that hybridize to form a dsRNA duplex. In some embodiments, one of the two complementary stretches of nucleotides of the single-molecule gRNA (or the DNA encoding the stretch) is at least about 60% identical to an activator-RNA. For example, one of the two complementary stretches of nucleotides of the single-molecule gRNA (or the DNA encoding the stretch) is at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical or 100% identical to an activator-RNA.
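The identity thresholds recited above (e.g., at least about 60%) presuppose a way of computing percent identity, which this section does not specify. The sketch below assumes the two sequences have already been aligned to equal length and counts identical positions; `percent_identity` is a simplified, hypothetical helper and not a substitute for a gapped alignment tool such as BLAST.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches. This is a simplification:
    real comparisons normally use a gapped alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences are empty")
    matches = sum(
        1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 60.0) -> bool:
    """True if identity is at least `threshold` percent (e.g., 60, 65, ... 100)."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGUACGUAC", "ACGUACGUAA"))  # one mismatch in 10 positions
```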
The activator-RNA and targeter-RNA segments can be engineered, while ensuring that the structure of the protein-binding domain of the gRNA is conserved. Thus, RNA folding structure of a naturally occurring protein-binding domain of a DNA-targeting RNA can be taken into account in order to design artificial protein-binding domains (either dual-molecule or single-molecule versions).
The activator-RNA in a single-molecule gRNA can have a length of from about 10 nucleotides to about 100 nucleotides. For example, the activator-RNA can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.
Also with regard to both the single-molecule and dual-molecule gRNAs of the disclosure, the dsRNA duplex of the activator-RNA can have a length of from about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the activator-RNA can have a length of from about 6 bp to about 40 bp, from about 6 bp to about 30 bp, from about 6 bp to about 25 bp, from about 6 bp to about 20 bp, from about 6 bp to about 15 bp, from about 8 bp to about 40 bp, from about 8 bp to about 30 bp, from about 8 bp to about 25 bp, from about 8 bp to about 20 bp, or from about 8 bp to about 15 bp. For example, the dsRNA duplex of the activator-RNA can have a length of from about 8 bp to about 10 bp, from about 10 bp to about 15 bp, from about 15 bp to about 18 bp, from about 18 bp to about 20 bp, from about 20 bp to about 25 bp, from about 25 bp to about 30 bp, from about 30 bp to about 35 bp, from about 35 bp to about 40 bp, or from about 40 bp to about 50 bp. In some embodiments, the dsRNA duplex of the activator-RNA has a length of 8-15 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the activator-RNA can be at least about 60%. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the activator-RNA can be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. In some embodiments, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the activator-RNA is 100%.
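Percent complementarity within the dsRNA duplex can be illustrated with a short sketch, assuming both stretches are written 5′ to 3′ and pair antiparallel, so the first nucleotide of one stretch pairs with the last nucleotide of the other. Counting only Watson-Crick pairs by default, with G·U wobble pairs as an optional setting, is a modeling choice made here and not a definition taken from the disclosure; `percent_complementarity` is a hypothetical name.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(strand: str, partner: str, allow_wobble: bool = False) -> float:
    """Percent of positions that base-pair when two equal-length stretches,
    both written 5' to 3', hybridize antiparallel."""
    if len(strand) != len(partner) or not strand:
        raise ValueError("stretches must be non-empty and of equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    # Antiparallel pairing: position i of `strand` meets position -(i+1) of `partner`.
    paired = sum((x, y) in allowed
                 for x, y in zip(strand.upper(), reversed(partner.upper())))
    return 100.0 * paired / len(strand)


if __name__ == "__main__":
    # An 8-bp duplex with one mismatch at the terminal pair
    print(percent_complementarity("AAAAAAAA", "UUUUUUUG"))
```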
In some embodiments, the spacer sequence of a Type V gRNA (whether it is a single-molecule gRNA or a dual-molecule gRNA) of the disclosure is directed to a target sequence in a mammalian organism, e.g., a human or non-human primate. In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a bacterium.
In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a virus. In some embodiments, the spacer sequence of a Type V gRNA of the disclosure is directed to a target sequence in a plant.
In some embodiments, the single-molecule Type V gRNAs of the disclosure can be modified to include an aptamer.
v. gRNA Arrays
In some embodiments, the Type V gRNAs of the disclosure can be provided as gRNA arrays.
Such gRNA arrays of the disclosure include more than one gRNA arrayed in tandem, and can be processed into two or more individual gRNAs. Thus, in some embodiments a precursor Type V gRNA array comprises two or more (e.g., 3 or more, 4 or more, 5 or more, 2, 3, 4, or 5) gRNAs (e.g., arrayed in tandem as precursor molecules). In some embodiments, two or more gRNAs can be present on an array (a precursor gRNA array). A Type V endonuclease of the disclosure can cleave the precursor gRNA array into individual gRNAs.
In some embodiments, a Type V gRNA array includes 2 or more gRNAs (e.g., 3 or more, 4 or more, 5 or more, 6 or more, or 7 or more gRNAs). The gRNAs of a given array can target (i.e., can include guide sequences that hybridize to) different target sites of the same target DNA. In some embodiments, two or more gRNAs of a precursor gRNA array have the same guide sequence. In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target sites within the same target DNA. In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target DNAs.
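The tandem layout of a precursor gRNA array can be sketched as alternating repeat and guide units. In this hypothetical illustration, `build_array` and `split_array` are invented names, the sequences are placeholders, and the cut is placed immediately 5′ of each repeat copy; the actual processing site used by a Type V endonuclease of the disclosure is not specified here, and a guide that itself contains the repeat sequence would be mis-split by this simple approach.

```python
def build_array(repeat: str, guides: list[str]) -> str:
    """Lay out a precursor array 5' to 3' as repeat-guide-repeat-guide..."""
    return "".join(repeat + guide for guide in guides)


def split_array(array: str, repeat: str) -> list[str]:
    """Cut immediately 5' of each repeat copy, returning repeat+guide units.

    Assumes the array starts with a repeat and that no guide contains the
    repeat sequence; both are simplifications for illustration.
    """
    if not array.startswith(repeat):
        raise ValueError("array is expected to begin with a repeat")
    pieces = array.split(repeat)[1:]  # first element is the empty 5' prefix
    return [repeat + piece for piece in pieces]


if __name__ == "__main__":
    REPEAT = "GUUUCAAU"  # hypothetical placeholder repeat
    guides = ["ACGUACGUACGU", "UUGGCCAAUUGG", "ACGUACGUACGU"]  # two identical guides
    precursor = build_array(REPEAT, guides)
    print(split_array(precursor, REPEAT))
```

The third guide repeats the first, mirroring embodiments in which two or more gRNAs of an array share a guide sequence.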
Provided herein are novel Class 2 Type VI CRISPR-Cas RNA-guided proteins and their guide RNAs (a “guide RNA” is interchangeably referred to herein as “gRNA”), constituting the Class 2 Type VI CRISPR-Cas RNA-guided systems of the disclosure.
Accordingly, provided herein are systems comprising (a) Type VI endonuclease, or a nucleic acid encoding the Type VI endonuclease; and (b) a Type VI gRNA, or a nucleic acid encoding the Type VI gRNA, wherein the gRNA and the Type VI endonuclease do not naturally occur together, wherein the gRNA is capable of hybridizing to a target sequence in a target single stranded RNA, and the gRNA is capable of forming a complex with the Type VI endonuclease.
The components of the system are described in turn below.
Also provided herein are novel Type VI CRISPR-Cas RNA-guided endonucleases. In some embodiments, these endonucleases may share certain structural, sequence, and/or functional similarities with any one of the subtypes of Cas13 (e.g., Cas13a, Cas13b). Such Type VI endonucleases are useful for RNA targeting and modification. Type VI endonucleases target ssRNA and require a protospacer flanking sequence (PFS) rather than the protospacer adjacent motif (PAM) that is required for dsDNA unwinding by, e.g., Type II and Type V endonucleases.
Without being bound to any theory or mechanism, the Type VI CRISPR-Cas RNA-guided endonucleases of the disclosure comprise two HEPN motifs, generally of the form E . . . RXXXXH (SEQ ID NO: 93), also referred to as E . . . R-X4-H (SEQ ID NO: 93). The distance between the E residue and the R-X4-H (SEQ ID NO: 93) can be of any length.
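A crude first-pass search for the R-X4-H core of the HEPN motif described above can be written as a regular-expression scan. This is a sketch under stated assumptions: `find_hepn_cores` is a hypothetical name, it reports every, possibly overlapping, R-X4-H match, and it optionally requires a glutamate (E) anywhere upstream, reflecting that the spacing between E and R-X4-H can be of any length. Because such a short pattern will match by chance in a protein of roughly 1,100 residues, hits would normally be confirmed with profile-based tools (e.g., CD-search or phmmer).

```python
import re

# R-X4-H core: arginine, any four residues, histidine. The lookahead lets
# finditer report overlapping matches.
RX4H = re.compile(r"(?=(R.{4}H))")


def find_hepn_cores(protein: str, require_upstream_e: bool = True) -> list[tuple[int, str]]:
    """Return (1-based position, match) for each R-X4-H match.

    If require_upstream_e is True, keep a match only when a glutamate (E)
    occurs somewhere N-terminal to it (E ... R-X4-H, spacing unconstrained).
    """
    seq = protein.upper()
    hits = []
    for m in RX4H.finditer(seq):
        start = m.start()
        if require_upstream_e and "E" not in seq[:start]:
            continue
        hits.append((start + 1, m.group(1)))
    return hits


if __name__ == "__main__":
    toy = "MEAAARAAAAHLLLEKKRCCCCHW"  # hypothetical toy sequence, not a SEQ ID NO
    hits = find_hepn_cores(toy)
    print(hits, "two candidate cores" if len(hits) >= 2 else "fewer than two")
```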
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any one of the HEPN sequences of Table 4, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any two of the HEPN sequences of Table 4, or sequences comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a HEPN motif selected from the group consisting of SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 97, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 102, SEQ ID NO: 104, SEQ ID NO: 105, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 110, SEQ ID NO: 111, SEQ ID NO: 113, and SEQ ID NO: 197, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif selected from the group consisting of SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 97, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 102, SEQ ID NO: 104, SEQ ID NO: 105, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 110, SEQ ID NO: 111, SEQ ID NO: 113, and SEQ ID NO: 197, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif selected from the group consisting of SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 97, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 102, SEQ ID NO: 104, SEQ ID NO: 105, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 110, SEQ ID NO: 111, SEQ ID NO: 113, and SEQ ID NO: 197, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 94, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 95 or SEQ ID NO: 197, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 97, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 95 or SEQ ID NO: 197, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 99, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 100, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 95 or SEQ ID NO: 197, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 102, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 104, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 105, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 107, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 108, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 110, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 111, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a first HEPN motif comprising the amino acid sequence of SEQ ID NO: 99, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and comprises a second HEPN motif comprising the amino acid sequence of SEQ ID NO: 113, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
Table 4 provides exemplary HEPN sequences of the Type VI endonucleases of the disclosure.
Table 5 provides exemplary amino acid sequences for certain Type VI endonucleases of the disclosure. Genes were identified from metagenomic samples. Scripts designed to find CRISPR sequences and accompanying genes encoding proteins with homology to reported Cas enzymes were run on the sequences. Comparative BlastP analyses were performed against sequences deposited in databases (NCBI, LENS), and candidates showing greater than 50% identity (Id %>50) with deposited proteins were discarded. The presence of specific domains (e.g., RuvC, HEPN) and catalytic motifs was determined (CD-search, phmmer, UniProt).
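The candidate-filtering step described above can be sketched as follows, assuming the BlastP percent identities and domain annotations have already been computed elsewhere. The `Candidate` record, its field names, and the requirement for a HEPN annotation when screening Type VI candidates are illustrative assumptions; only the rule of discarding candidates with more than 50% identity to deposited proteins comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    max_db_identity: float  # highest BlastP % identity to deposited proteins (precomputed)
    domains: set[str] = field(default_factory=set)  # e.g., from CD-search / phmmer (precomputed)


def retain_novel(candidates: list[Candidate], max_identity: float = 50.0,
                 required_domain: str = "HEPN") -> list[str]:
    """Discard candidates with > max_identity % identity to deposited proteins,
    and keep only those annotated with the required domain."""
    return [c.name for c in candidates
            if c.max_db_identity <= max_identity and required_domain in c.domains]


if __name__ == "__main__":
    pool = [  # made-up values for illustration
        Candidate("contig_A", 42.0, {"HEPN"}),
        Candidate("contig_B", 63.5, {"HEPN"}),   # too similar to a deposited protein
        Candidate("contig_C", 30.0, {"RuvC"}),   # no HEPN annotation
    ]
    print(retain_novel(pool))
```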
SEQ ID NO: 8 represents a novel Type VI variant of the disclosure, Type VI Cas_1 (1148 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 8, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 8 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 8 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 9 represents a novel Type VI variant of the disclosure, Type VI Cas_2 (1138 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 9, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 9 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 9 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 10 represents a novel Type VI variant of the disclosure, Type VI Cas_3 (1093 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 10, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 10 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 10 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 11 represents a novel Type VI variant of the disclosure, Type VI Cas_4 (1236 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 11, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 11 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 11 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 12 represents a novel Type VI variant of the disclosure, Type VI Cas_5 (1092 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 12, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 12 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 12 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 13 represents a novel Type VI variant of the disclosure, Type VI Cas_6 (1053 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 13, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 13 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 13 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 14 represents a novel Type VI variant of the disclosure, Type VI Cas_7 (1163 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 14, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 14 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 14 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 15 represents a novel Type VI variant of the disclosure, Type VI Cas_8 (1124 amino acids in length).
In some embodiments, the Type VI CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 15, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 15 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 15 and proteins with at least 30%-99.5% sequence identity thereto.
Table 6 provides exemplary nucleic acid sequences for encoding certain Type VI sequences of the disclosure. Also provided are exemplary E. coli codon optimized nucleic acid sequences for encoding certain Type VI sequences of the disclosure.
Accordingly, provided herein are exemplary nucleic acid sequences encoding the Type VI CRISPR-Cas RNA-guided endonucleases of the disclosure. In some embodiments, a Type VI CRISPR-Cas RNA-guided endonuclease is encoded by a nucleic acid sequence comprising or consisting of the sequence of any one of SEQ ID NOs: 35-50, or a nucleic acid sequence with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
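Because a codon-optimized nucleic acid can differ substantially from a native one while encoding the same protein, nucleotide-level identity (e.g., to SEQ ID NOs: 35-50) and the encoded amino acid sequence are separate checks. The sketch below translates DNA with the standard genetic code and compares the encoded proteins; the function names are hypothetical and the sequences shown are short placeholders, not sequences from Table 6.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    seq = dna.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def encode_same_protein(native: str, optimized: str) -> bool:
    """True if both coding sequences translate to the same protein (stop ignored)."""
    return translate(native).rstrip("*") == translate(optimized).rstrip("*")


if __name__ == "__main__":
    native = "ATGGCTAAATAA"     # placeholder: M-A-K-stop
    optimized = "ATGGCCAAGTAA"  # synonymous codons for A and K
    print(translate(native), encode_same_protein(native, optimized))
```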
In some embodiments, the Type VI endonuclease of the disclosure is catalytically active.
In some embodiments, the Type VI endonuclease of the disclosure is catalytically dead, e.g. by introducing mutations in one or both of the HEPN domains.
The Type VI endonucleases of the disclosure can be modified to include an aptamer.
The Type VI endonucleases of the disclosure can be further fused to additional domains, e.g., catalytic domains, to produce dual-action Cas proteins. In some embodiments, a Type VI endonuclease is further fused to a base editor.
In addition to the ability to cleave a target sequence in an ssRNA, the Type VI endonucleases of the disclosure also possess collateral (trans-cleavage) activity, i.e., the ability to promiscuously cleave non-targeted DNA or RNA once activated by detection of a target RNA. Without being bound to any theory or mechanism, generally, once a gRNA-bound Type VI endonuclease of the disclosure is activated, which occurs when a sample includes a target sequence to which the gRNA hybridizes (i.e., the sample includes the targeted ssRNA), the Type VI endonuclease can become a nuclease that promiscuously cleaves oligonucleotides (ssRNAs) not comprising the target sequence of the gRNA (non-target oligonucleotides, to which the guide sequence of the gRNA does not hybridize). Thus, when the targeted ssRNA is present in the sample (e.g., in some embodiments, above a threshold amount), the result can be cleavage of single-stranded reporter oligonucleotides (e.g., labeled reporters) in the sample, which can be detected using any convenient detection method.
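One way a collateral-cleavage reporter readout might be scored is sketched below. The decision rule (mean sample signal above the mean of no-target controls plus k standard deviations, with k = 3 by default) is a common convention assumed for illustration, not a threshold defined in the disclosure; `call_target_present` is a hypothetical name and the signal values are made-up reporter readings.

```python
from statistics import mean, stdev


def call_target_present(sample: list[float], no_target_controls: list[float],
                        k: float = 3.0) -> tuple[bool, float]:
    """Return (positive?, cutoff) for reporter-signal replicates.

    Cutoff = mean(no-target controls) + k * SD(no-target controls); the
    k = 3 default is an assumed convention, not a value from the disclosure.
    """
    if len(no_target_controls) < 2:
        raise ValueError("need at least two control replicates to estimate SD")
    cutoff = mean(no_target_controls) + k * stdev(no_target_controls)
    return mean(sample) > cutoff, cutoff


if __name__ == "__main__":
    controls = [100.0, 110.0, 90.0]  # made-up fluorescence readings
    print(call_target_present([400.0, 420.0, 410.0], controls))
```

Basing the cutoff on replicate no-target controls ties the call to the assay's own background rather than to a fixed signal value.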
Accordingly, provided herein are methods and compositions for detecting a target RNA in a sample. Also provided are methods and compositions for cleaving non-target RNA oligonucleotides, which can be utilized as detectors. These embodiments are described in further detail below.
gRNAs for Class 2 Type VI CRISPR-Cas RNA-Guided Endonucleases
The present disclosure provides RNA-targeting RNAs that direct the activities of the novel Type VI endonucleases of the disclosure to a specific target sequence within a target ssRNA. These RNA-targeting RNAs are also referred to herein as “guide RNAs” or “gRNAs.” Generally, as provided herein, a Type VI gRNA comprises a single segment comprising both a spacer (RNA-targeting sequence) and a Type VI “protein-binding sequence,” together referred to as a crRNA. Also provided herein are nucleotide sequences encoding the Type VI gRNAs of the disclosure.
i. Spacer Sequences
The Type VI endonucleases of the disclosure are guided by a single crRNA (a single guide RNA, or sgRNA), whereas the Type II endonucleases of the disclosure are guided by a dual-RNA system consisting of a crRNA and a trans-activating crRNA (tracrRNA). The crRNA of the Type VI guides of the disclosure comprises a nucleotide sequence that is complementary to a sequence in a target RNA.
The crRNA portion of the Type VI gRNAs of the disclosure can have a length of from about 45 to about 70 nt. In some embodiments, the length can be about 60 to about 65 nt.
The RNA-targeting spacer sequence of a Type VI gRNA generally interacts with a target RNA in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the RNA-targeting sequence may vary and determines the location within the target RNA at which the gRNA and the target RNA will interact. The RNA-targeting sequence of a subject Type VI gRNA can be modified (e.g., by genetic engineering) to hybridize to a desired sequence within a target RNA.
The RNA-targeting sequence of a subject Type VI gRNA can have a length of from about 18 nucleotides to about 30 nucleotides. For example, the length can be 27 nucleotides.
The percent complementarity between the RNA-targeting spacer sequence of the crRNA and the target sequence of the target RNA can be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%). In some embodiments, the percent complementarity between the RNA-targeting sequence of the crRNA and the target sequence of the target RNA is 100% over the 1-27 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target RNA. In some embodiments, the percent complementarity between the RNA-targeting sequence of the crRNA and the target sequence of the target RNA is at least 60% over about 1-27 contiguous nucleotides. In some embodiments, the percent complementarity between the RNA-targeting sequence of the crRNA and the target sequence of the target RNA is 100% over the 1-27 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target RNA and as low as 0% over the remainder. In such a case, the RNA-targeting sequence can be considered to be 1-27 nucleotides in length.
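Spacer design and the windowed complementarity described above can be sketched as follows. The sketch assumes the spacer is the exact reverse complement of a target window written 5′ to 3′, treats the 18 to 30 nucleotide range as hard bounds, and uses hypothetical function names (`design_spacer`, `complementarity_over_5prime`); it counts Watson-Crick pairs only and does not model the PFS or any other requirement of a Type VI endonuclease of the disclosure.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def design_spacer(target_rna: str, start: int, length: int = 27) -> str:
    """Spacer (5'->3') fully complementary to target_rna[start:start+length]."""
    if not 18 <= length <= 30:
        raise ValueError("spacer length outside the ~18-30 nt range")
    window = target_rna.upper()[start:start + length]
    if len(window) != length:
        raise ValueError("window extends past the end of the target RNA")
    return "".join(COMPLEMENT[base] for base in reversed(window))


def complementarity_over_5prime(spacer: str, target_window: str, n: int) -> float:
    """% of the n 5'-most target nucleotides that pair with the spacer.

    Antiparallel pairing: target position i pairs with spacer position -(i+1).
    """
    if len(spacer) != len(target_window):
        raise ValueError("spacer and target window must be the same length")
    if not 1 <= n <= len(target_window):
        raise ValueError("n must be between 1 and the window length")
    pairs = zip(target_window.upper()[:n], reversed(spacer.upper()))
    matched = sum(COMPLEMENT.get(t) == s for t, s in pairs)
    return 100.0 * matched / n


if __name__ == "__main__":
    target = "AAACCCGGGUUUAAACCCGGAUGC"  # placeholder target ssRNA
    spacer = design_spacer(target, 0, 20)
    print(spacer, complementarity_over_5prime(spacer, target[:20], 20))
```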
Generally, a naturally unprocessed pre-crRNA of Type VI comprises a direct repeat and an adjacent spacer (the portion of the crRNA that allows for targeting to an RNA molecule). In some embodiments, direct repeats (partial sequence or entire sequence) from unprocessed pre-crRNA are included in the Type VI gRNAs of the disclosure and improve gRNA stability. Exemplary direct repeat sequences include SEQ ID NOs: 92, 96, 98, 101, 103, 106, 109, and 112 (DNA sequences) or SEQ ID NOs: 154-161 (RNA sequences). It is noted that while the exemplary sequences are provided as DNA nucleotides, it is understood that this DNA can then be transcribed into RNA. Accordingly, the mature guides of the disclosure may incorporate the entire or partial sequence of the exemplary direct repeat sequences provided herein; the guides may be composed of DNA nucleotides, analogous RNA nucleotides, or a combination of DNA and RNA nucleotides. Exemplary predicted secondary structures of the pre-crRNAs of the Type VI endonucleases of the disclosure are presented in
In some embodiments, the crRNAs include non-naturally occurring, engineered direct repeat sequences.
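Assembling a crRNA from a direct repeat provided as DNA and an RNA spacer can be sketched as below. The sketch assumes the listed DNA is the sense-strand equivalent of the repeat (so T is simply replaced by U), takes the 45 to 70 nt crRNA length recited above as hard bounds, and exposes the position of the repeat (5′ or 3′ of the spacer) as a parameter because the arrangement for a given Type VI endonuclease is not stated here. The function names are hypothetical, and the repeat shown is a placeholder, not one of the exemplary direct repeat sequences.

```python
def to_rna(dna: str) -> str:
    """RNA equivalent of a sense-strand DNA sequence written 5' to 3' (T -> U)."""
    return dna.upper().replace("T", "U")


def assemble_crrna(direct_repeat_dna: str, spacer_rna: str, repeat_5prime: bool = True,
                   min_len: int = 45, max_len: int = 70) -> str:
    """Join a direct repeat and a spacer into one crRNA, checking total length."""
    repeat = to_rna(direct_repeat_dna)
    spacer = spacer_rna.upper()
    crrna = repeat + spacer if repeat_5prime else spacer + repeat
    if not min_len <= len(crrna) <= max_len:
        raise ValueError(f"crRNA is {len(crrna)} nt; expected {min_len}-{max_len} nt")
    return crrna


if __name__ == "__main__":
    repeat_dna = "ACGT" * 9                 # hypothetical 36-nt placeholder repeat
    spacer = "CCGGGUUUAAACCCGGGUUUAAACCCG"  # hypothetical 27-nt spacer
    print(len(assemble_crrna(repeat_dna, spacer)))
```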
In some embodiments, the spacer sequence of a Type VI gRNA of the disclosure is directed to a target sequence in a mammalian organism. In some embodiments, the spacer sequence is directed to a target sequence in a non-mammalian organism.
In some embodiments, the spacer sequence of a Type VI gRNA of the disclosure is directed to a target sequence which is a sequence of a human. In some embodiments, the target sequence is a sequence of a non-human primate.
In some embodiments, the spacer sequence of a Type VI gRNA of the disclosure is directed to a target sequence in a mammalian organism, e.g., a human or non-human primate.
In some embodiments, the spacer sequence of a Type VI gRNA of the disclosure is directed to a target sequence in a bacterium.
In some embodiments, the spacer sequence of a Type VI gRNA of the disclosure is directed to a target sequence in a virus.
In some embodiments, the spacer sequence of a Type VI gRNA of the disclosure is directed to a target sequence in a plant.
The Type VI gRNAs of the disclosure can be modified to include an aptamer.
ii. gRNA Arrays
In some embodiments, the Type VI gRNAs of the disclosure can be provided as gRNA arrays.
Such gRNA arrays of the disclosure include more than one gRNA arrayed in tandem, and can be processed into two or more individual gRNAs. Thus, in some embodiments a precursor Type VI gRNA array comprises two or more (e.g., 3 or more, 4 or more, 5 or more, 2, 3, 4, or 5) gRNAs (e.g., arrayed in tandem as precursor molecules). In some embodiments, two or more gRNAs can be present on an array (a precursor gRNA array). A Type VI endonuclease of the disclosure can cleave the precursor gRNA array into individual gRNAs.
In some embodiments, a Type VI gRNA array includes 2 or more gRNAs (e.g., 3 or more, 4 or more, 5 or more, 6 or more, or 7 or more gRNAs). The gRNAs of a given array can target (i.e., can include guide sequences that hybridize to) different target sites of the same target RNA. In some embodiments, two or more gRNAs of a precursor gRNA array have the same guide sequence. In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target sites within the same target RNA. In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target RNAs.
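As a simplified illustration of how a precursor array of tandem units can be resolved into individual gRNAs, the hypothetical Python sketch below splits a string of the form repeat, spacer, repeat, spacer, and so on at each copy of a known direct repeat. It assumes that processing occurs exactly at the 5′ boundary of each repeat and that every processed gRNA retains the full repeat; the actual processing sites of the Type VI endonucleases of the disclosure may differ, and the repeat and spacer strings are toy placeholders rather than sequences of the disclosure.

```python
def split_grna_array(precursor: str, direct_repeat: str) -> list[str]:
    """Split a precursor array (DR-spacer-DR-spacer-...) into DR+spacer units.

    Simplifying assumption: cleavage occurs exactly at the 5' boundary of each
    direct repeat (DR) copy, and each processed gRNA keeps the full repeat.
    """
    starts = []
    pos = precursor.find(direct_repeat)
    while pos != -1:
        starts.append(pos)
        # Resume the search just after this repeat copy.
        pos = precursor.find(direct_repeat, pos + len(direct_repeat))
    if not starts or starts[0] != 0:
        raise ValueError("precursor must begin with the direct repeat")
    ends = starts[1:] + [len(precursor)]
    return [precursor[s:e] for s, e in zip(starts, ends)]


# Toy placeholder repeat and spacers (not sequences of the disclosure).
dr = "GUUUCA"
spacers = ["AAAACCCC", "GGGGUUUU", "ACACACAC"]
array = "".join(dr + s for s in spacers)
print(split_grna_array(array, dr))
# ['GUUUCAAAAACCCC', 'GUUUCAGGGGUUUU', 'GUUUCAACACACAC']
```

The sketch relies on the repeat not recurring inside any spacer; a real design would need to check for that.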
Provided herein are novel Class 2 Type II CRISPR-Cas RNA-guided proteins and their guide RNAs (a “guide RNA” is interchangeably referred to herein as “gRNA”), constituting the Class 2 Type II CRISPR-Cas RNA-guided systems of the disclosure. As used herein, a gRNA may comprise only RNA nucleotides, may comprise RNA and DNA nucleotides, or may comprise only DNA nucleotides, and thus, while referred to as a gRNA, may comprise non-RNA nucleotides.
Accordingly, provided herein are systems comprising (a) a Type II endonuclease, or a nucleic acid encoding the Type II endonuclease; and (b) a Type II gRNA, or a nucleic acid encoding the Type II gRNA, wherein the gRNA and the Type II endonuclease do not naturally occur together, wherein the gRNA is capable of hybridizing to a target sequence in a target DNA, and the gRNA is capable of forming a complex with the Type II endonuclease. It should be understood that
These components are described in turn below.
Provided herein are novel Type II CRISPR-Cas RNA-guided endonucleases. In some embodiments, these endonucleases may share certain structural, sequence, and/or functional similarities with any one of the subtypes of Cas9.
Without being bound to any theory or mechanism, the Type II CRISPR-Cas RNA-guided endonucleases of the disclosure comprise three RuvC motifs and an HNH domain, which are responsible for catalytic activity.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any one of the RuvC sequences of Table 7, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any two of the RuvC sequences of Table 7, or sequences comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises any three of the RuvC sequences of Table 7, or sequences comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises an HNH domain selected from the group consisting of SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, and SEQ ID NO: 141, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a RuvC I motif selected from the group consisting of SEQ ID NO: 116, SEQ ID NO: 121, SEQ ID NO: 126, and SEQ ID NO: 131, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a RuvC II motif selected from the group consisting of SEQ ID NO: 117, SEQ ID NO: 122, SEQ ID NO: 127, and SEQ ID NO: 132, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises a RuvC III motif selected from the group consisting of SEQ ID NO: 118, SEQ ID NO: 123, SEQ ID NO: 128, and SEQ ID NO: 133, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif selected from the group consisting of SEQ ID NO: 116, SEQ ID NO: 121, SEQ ID NO: 126, and SEQ ID NO: 131, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif selected from the group consisting of SEQ ID NO: 117, SEQ ID NO: 122, SEQ ID NO: 127, and SEQ ID NO: 132, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif selected from the group consisting of SEQ ID NO: 118, SEQ ID NO: 123, SEQ ID NO: 128, and SEQ ID NO: 133, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. The Type II CRISPR-Cas RNA-guided endonuclease may further comprise an HNH domain selected from the group consisting of SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, and SEQ ID NO: 141, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 116, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 117, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 118, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. The Type II CRISPR-Cas RNA-guided endonuclease may further comprise an HNH domain selected from the group consisting of SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, and SEQ ID NO: 141, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. In some embodiments, the HNH domain comprises the sequence of SEQ ID NO: 138, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 121, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 122, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 123, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. The Type II CRISPR-Cas RNA-guided endonuclease may further comprise an HNH domain selected from the group consisting of SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, and SEQ ID NO: 141, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. In some embodiments, the HNH domain comprises the sequence of SEQ ID NO: 139, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 126, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 127, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 128, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. The Type II CRISPR-Cas RNA-guided endonuclease may further comprise an HNH domain selected from the group consisting of SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, and SEQ ID NO: 141, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. In some embodiments, the HNH domain comprises the sequence of SEQ ID NO: 140, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
In some embodiments a Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises (1) a RuvC I motif comprising the sequence of SEQ ID NO: 131, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; (2) a RuvC II motif comprising the sequence of SEQ ID NO: 132, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto; and (3) a RuvC III motif comprising the sequence of SEQ ID NO: 133, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. The Type II CRISPR-Cas RNA-guided endonuclease may further comprise an HNH domain selected from the group consisting of SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, and SEQ ID NO: 141, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. In some embodiments, the HNH domain comprises the sequence of SEQ ID NO: 141, or a sequence comprising at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
Table 7 provides exemplary RuvC I, RuvC II, RuvC III, and HNH domain sequences of the Type II endonucleases of the disclosure.
Table 8 shows exemplary amino acid sequences of novel Type II endonucleases of the disclosure. Genes were identified from metagenomic samples. Scripts designed to find CRISPR sequences and accompanying genes encoding proteins with homology to reported Cas enzymes were run on the sequences. Comparative BlastP analyses were performed against sequences deposited in databases (NCBI, LENS), and candidates showing more than 50% identity (Id %) with deposited proteins were discarded. The presence of specific domains (e.g., RuvC, HEPN) and catalytic motifs was determined (CD-search, phmmer, UNIPROT).
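The filtering step described above can be sketched as follows. This is a hypothetical Python illustration, not the scripts actually used: it assumes that the best BlastP hit identity and the set of detected domains have already been obtained for each candidate, and the `Candidate` class, its field names, and the candidate names are illustrative only. Consistent with discarding candidates showing more than 50% identity, candidates at or below the cutoff are kept.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical record for one predicted Cas protein."""
    name: str
    max_identity: float  # best BlastP hit identity to deposited proteins, in %
    domains: set[str] = field(default_factory=set)  # e.g. {"RuvC", "HNH"}


def novel_candidates(candidates, max_identity=50.0, required_domains=frozenset()):
    """Keep candidates at or below the identity cutoff that carry all required domains."""
    return [
        c for c in candidates
        if c.max_identity <= max_identity and set(required_domains) <= c.domains
    ]


pool = [
    Candidate("cand_A", 42.0, {"RuvC", "HNH"}),
    Candidate("cand_B", 73.5, {"RuvC", "HNH"}),  # too similar to a deposited protein
    Candidate("cand_C", 35.0, {"HEPN"}),
]
print([c.name for c in novel_candidates(pool)])  # ['cand_A', 'cand_C']
print([c.name for c in novel_candidates(pool, required_domains={"RuvC"})])  # ['cand_A']
```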
SEQ ID NO: 16 represents a novel Type II variant of the disclosure, Type II Cas_1 (1091 amino acids in length).
In some embodiments, the Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 16, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 16 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 16 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 17 represents a novel Type II variant of the disclosure, Type II Cas_2 (1565 amino acids in length).
In some embodiments, the Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 17, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 17 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 17 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 18 represents a novel Type II variant of the disclosure, Type II Cas_3 (1064 amino acids in length).
In some embodiments, the Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 18, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 18 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 18 and proteins with at least 30%-99.5% sequence identity thereto.
SEQ ID NO: 19 represents a novel Type II variant of the disclosure, Type II Cas_4 (1024 amino acids in length).
In some embodiments, the Type II CRISPR-Cas RNA-guided endonuclease of the disclosure comprises the amino acid sequence of SEQ ID NO: 19, or an amino acid sequence with at least 30%-99.5% sequence identity thereto. Accordingly, provided herein are proteins comprising the amino acid sequence of SEQ ID NO: 19 and proteins with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto. Also provided herein are nucleic acids encoding the proteins comprising the amino acid sequence of SEQ ID NO: 19 and proteins with at least 30%-99.5% sequence identity thereto.
Table 9 provides exemplary nucleic acid sequences for encoding certain Type II sequences of the disclosure. Also provided are exemplary E. coli codon optimized nucleic acid sequences for encoding certain Type II sequences of the disclosure.
Accordingly, provided herein are exemplary nucleic acid sequences encoding the Type II CRISPR-Cas RNA-guided endonucleases of the disclosure. In some embodiments, a Type II CRISPR-Cas RNA-guided endonuclease is encoded by a nucleic acid sequence comprising or consisting of the sequence of any one of SEQ ID NOs: 51-58, or a nucleic acid sequence with at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99.5% sequence identity thereto.
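The sequence-identity thresholds recited above (at least 30% through at least 99.5%) require a convention for scoring two sequences. The disclosure does not specify an alignment tool or scoring rule, so the following Python sketch is only one possible convention, stated as an assumption: the two sequences are already aligned (for example by a separate pairwise alignment program), columns where both sequences carry a gap are ignored, and a gap opposite a residue counts as a non-identical column. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical columns / alignment columns, ignoring
    columns where both sequences have a gap; gap-vs-residue columns count
    as non-identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    # Gap-gap columns were removed, so a == b here means identical residues.
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy aligned protein fragments (not sequences of the disclosure).
identity = percent_identity("MKT-LV", "MKTALI")
print(round(identity, 1))                      # 66.7
print(identity >= 60.0, identity >= 70.0)      # True False
```

Other conventions (for example, dividing by the length of the shorter sequence) can give different values for the same alignment, so the chosen convention should be fixed before applying a threshold.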
In some embodiments, the Type II endonuclease of the disclosure is catalytically active.
In some embodiments, the Type II endonuclease of the disclosure is catalytically dead, e.g., as a result of mutations introduced into one or more of the RuvC domains.
In some embodiments, the Type II endonuclease of the disclosure is a Type II nickase.
The Type II endonucleases of the disclosure can be modified to include an aptamer.
The Type II endonucleases of the disclosure can be further fused to additional domains, e.g., catalytic domains, to produce dual-action Cas proteins. In some embodiments, a Type II endonuclease is further fused to a base editor.
gRNAs for Class 2 Type II CRISPR-Cas RNA-Guided Endonucleases
The present disclosure provides DNA-targeting RNAs that direct the activities of the novel Type II endonucleases of the disclosure to a specific target sequence within a target DNA. These DNA-targeting RNAs are referred to herein as “guide RNAs” or “gRNAs”. Generally, as provided herein, a Type II gRNA comprises a first segment (also referred to herein as a “targeter-RNA”, a “DNA-targeting segment”, or a “DNA-targeting sequence”) and a second segment (also referred to herein as an “activator-RNA” or a “protein-binding sequence”). Also provided herein are nucleotide sequences encoding the Type II gRNAs of the disclosure.
i. Targeter-RNA
The targeter-RNA of a Type II endonuclease gRNA of the disclosure comprises a nucleotide sequence that is complementary to a sequence in a target DNA (the targeting sequence of the gRNA; DNA-targeting sequence; spacer sequence). The targeter-RNA can interchangeably be referred to as a crRNA. The targeter-RNA of a gRNA interacts with a target DNA in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the targeter-RNA may vary and determines the location within the target DNA at which the gRNA and the target DNA will interact. The targeter-RNA of a subject gRNA can be modified (e.g., by genetic engineering) to hybridize to any desired sequence within a target DNA.
Generally, a naturally unprocessed pre-crRNA of Type II comprises a direct repeat and an adjacent spacer (the portion of the crRNA that allows for targeting to a DNA molecule). In some embodiments, direct repeats (partial sequence or entire sequence) from unprocessed pre-crRNA are incorporated into the Type II gRNAs of the disclosure and improve gRNA stability. Exemplary direct repeat sequences include SEQ ID NOs: 115, 120, 125, and 130. It is noted that while the exemplary sequences are provided as DNA nucleotides, it is understood that this DNA can then be transcribed into RNA. Accordingly, the mature guides of the disclosure may incorporate the entire or partial sequence of the exemplary direct repeat sequences provided herein; the guides may be composed of DNA nucleotides, analogous RNA nucleotides, or a combination of DNA and RNA nucleotides. Exemplary predicted secondary structures of the pre-crRNAs of the Type II endonucleases of the disclosure are presented in
The targeter-RNA can have a length of from about 12 nucleotides to about 100 nucleotides. For example, the targeter-RNA can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 40 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, or from about 12 nt to about 19 nt. For example, the targeter-RNA can have a length of from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 19 nt to about 70 nt, from about 19 nt to about 80 nt, from about 19 nt to about 90 nt, from about 19 nt to about 100 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, from about 20 nt to about 60 nt, from about 20 nt to about 70 nt, from about 20 nt to about 80 nt, from about 20 nt to about 90 nt, or from about 20 nt to about 100 nt.
In some embodiments, the gRNAs of the disclosure include a portion of, or the entirety of, the naturally occurring direct repeat sequences, which can be incorporated into the engineered gRNAs of the disclosure. Exemplary naturally occurring Type II direct repeat sequences are provided herein and include SEQ ID NOs: 115, 120, 125, and 130.
In some embodiments, the gRNAs of the disclosure include non-naturally occurring, engineered direct repeat sequences which can be incorporated into the engineered gRNAs of the disclosure.
ii. Spacer Sequences
gRNAs of the disclosure comprise spacer sequences complementary to the target DNA. More specifically, the nucleotide sequence of the targeter-RNA that is complementary to a target nucleotide sequence of the target DNA (the DNA-targeting sequence or spacer sequence) can have a length of at least about 12 nt. For example, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA can have a length of at least about 12 nt, at least about 15 nt, at least about 18 nt, at least about 19 nt, at least about 20 nt, at least about 25 nt, at least about 30 nt, at least about 35 nt, or at least about 40 nt. For example, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA can have a length of from about 12 nucleotides (nt) to about 80 nt, from about 12 nt to about 50 nt, from about 12 nt to about 45 nt, from about 12 nt to about 40 nt, from about 12 nt to about 35 nt, from about 12 nt to about 30 nt, from about 12 nt to about 25 nt, from about 12 nt to about 20 nt, from about 12 nt to about 19 nt, from about 19 nt to about 20 nt, from about 19 nt to about 25 nt, from about 19 nt to about 30 nt, from about 19 nt to about 35 nt, from about 19 nt to about 40 nt, from about 19 nt to about 45 nt, from about 19 nt to about 50 nt, from about 19 nt to about 60 nt, from about 20 nt to about 25 nt, from about 20 nt to about 30 nt, from about 20 nt to about 35 nt, from about 20 nt to about 40 nt, from about 20 nt to about 45 nt, from about 20 nt to about 50 nt, or from about 20 nt to about 60 nt. In some embodiments, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA is 20 nucleotides in length.
In some embodiments, the DNA-targeting sequence of the targeter-RNA that is complementary to a target sequence of the target DNA is 19 nucleotides in length.
The percent complementarity between the spacer sequence of the targeter-RNA and the target sequence of the target DNA can be at least 60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%). In some embodiments, the percent complementarity between the DNA-targeting sequence of the targeter-RNA and the target sequence of the target DNA is 100% over the 1-25 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA. In some embodiments, the percent complementarity between the DNA-targeting sequence of the targeter-RNA and the target sequence of the target DNA is at least 60% over about 1-25 contiguous nucleotides. In some embodiments, the percent complementarity between the DNA-targeting sequence of the targeter-RNA and the target sequence of the target DNA is 100% over the 1-25 contiguous 5′-most nucleotides of the target sequence of the complementary strand of the target DNA and as low as 0% over the remainder. In such a case, the DNA-targeting sequence can be considered to be 1-25 nucleotides in length.
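The statement that a spacer which is fully complementary over only its first stretch "can be considered" to be that long can be illustrated by counting the uninterrupted run of complementarity from one end. The Python sketch below is hypothetical and rests on stated assumptions: "5′-most nucleotides" is taken to mean the 5′ end of the DNA strand to which the spacer hybridizes, both sequences are written 5′ to 3′ with equal length, and only Watson-Crick RNA:DNA pairs count.

```python
# RNA spacer base -> DNA base it pairs with (Watson-Crick only; assumption).
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def effective_targeting_length(spacer_rna: str, target_dna: str) -> int:
    """Length of uninterrupted complementarity starting at the 5' end of target_dna.

    Both sequences are written 5'->3' and have equal length. Pairing is
    antiparallel, so target_dna[i] pairs with spacer_rna[-1 - i].
    """
    spacer_rna, target_dna = spacer_rna.upper(), target_dna.upper()
    if len(spacer_rna) != len(target_dna):
        raise ValueError("sequences must have equal length")
    run = 0
    for s, t in zip(reversed(spacer_rna), target_dna):
        if (s, t) not in RNA_DNA_PAIRS:
            break  # the first mismatch ends the contiguous run
        run += 1
    return run


# Toy sequences (not sequences of the disclosure).
print(effective_targeting_length("ACGUAC", "GTACGT"))  # 6: fully complementary
print(effective_targeting_length("ACGUAC", "GTATGT"))  # 3: mismatch at 4th target base
```

Under this reading, a spacer that is 100% complementary over the first 3 nucleotides and mismatched thereafter would be treated as a 3-nt DNA-targeting sequence.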
In some embodiments, the spacer sequence of a Type II gRNA of the disclosure is directed to a target sequence in a mammalian organism. In some embodiments, the spacer sequence is directed to a target sequence in a non-mammalian organism.
In some embodiments, the spacer sequence of a Type II gRNA of the disclosure is directed to a target sequence which is a sequence of a human. In some embodiments, the target sequence is a sequence of a non-human primate.
In some embodiments, the spacer sequence of a Type II gRNA of the disclosure is directed to a target sequence of a therapeutic target.
In some embodiments, the spacer sequence of a Type II gRNA of the disclosure is directed to a target sequence of a diagnostic target. For example, in such embodiments, a labeled, catalytically dead Type II endonuclease of the disclosure and a gRNA directed to a diagnostic target DNA are contacted with the target DNA, a cell comprising the target DNA, or a sample comprising the target DNA.
iii. Activator-RNA
The activator-RNA of a Type II gRNA of the disclosure binds with its cognate Type II endonuclease of the disclosure. The activator-RNA can interchangeably be referred to as a tracrRNA. The gRNA guides the bound Type II endonuclease to a specific nucleotide sequence within the target DNA via the above-described targeter-RNA. The activator-RNA of a Type II gRNA comprises two stretches of nucleotides that are complementary to one another. Exemplary tracrRNAs are provided herein and include SEQ ID NOs: 114, 119, 124, and 129.
iv. Dual-Molecule Type II gRNAs
In some embodiments, provided herein are dual-molecule (two-molecule) gRNAs for the novel Type II endonucleases of the disclosure. Such gRNAs comprise two separate RNA molecules: the activator-RNA (tracrRNA) and the targeter-RNA (crRNA). Each of the two RNA molecules of a subject dual-molecule gRNA comprises a stretch of nucleotides complementary to a stretch of nucleotides of the other molecule, such that the complementary nucleotides of the two RNA molecules hybridize to form the double-stranded RNA duplex of the gRNA.
A dual-molecule gRNA can be designed to allow for controlled (i.e., conditional) binding of a targeter-RNA with an activator-RNA. Because a dual-molecule gRNA is not functional unless both the activator-RNA and the targeter-RNA are bound in a functional complex with a Type II endonuclease of the disclosure, a dual-molecule gRNA can be made inducible (e.g., drug inducible) by rendering the binding between the activator-RNA and the targeter-RNA inducible. As one non-limiting example, RNA aptamers can be used to regulate (i.e., control) the binding of the activator-RNA with the targeter-RNA. Accordingly, the activator-RNA and/or the targeter-RNA can comprise an RNA aptamer sequence.
The dual-molecule guide can be modified to include an aptamer.
v. Single-Molecule Type II Endonuclease gRNAs
In some embodiments, provided herein are Type II gRNAs that comprise a single-molecule gRNA (interchangeably referred to herein as an sgRNA) for the novel Type II endonucleases of the disclosure.
Accordingly, provided herein is an engineered single-molecule gRNA, comprising:
a. a targeter-RNA that is capable of hybridizing with a target sequence in a target DNA; and
b. an activator-RNA that is capable of hybridizing with the targeter-RNA to form a double-stranded RNA duplex,
wherein the targeter-RNA and the activator-RNA are covalently linked to one another, wherein the single-molecule gRNA is capable of forming a complex with a novel Type II endonuclease of the disclosure, and wherein hybridization of the targeter-RNA to the target sequence is capable of targeting the Type II endonuclease of the disclosure to the target DNA.
A subject single-molecule gRNA comprises two segments of nucleotides (a targeter-RNA and an activator-RNA) that are complementary to one another, can be covalently linked by intervening nucleotides (“linkers” or “linker nucleotides”), and hybridize to form the double-stranded RNA duplex (dsRNA duplex) of the activator-RNA, thereby forming a stem-loop structure. In some embodiments, the targeter-RNA and the activator-RNA are covalently linked via the 3′ end of the targeter-RNA and the 5′ end of the activator-RNA. In other embodiments, the targeter-RNA and the activator-RNA are covalently linked via the 5′ end of the targeter-RNA and the 3′ end of the activator-RNA.
In some embodiments, the targeter-RNA and the activator-RNA are arranged, in that order, in a 5′ to 3′ orientation.
In some embodiments, the activator-RNA and the targeter-RNA are arranged, in that order, in a 5′ to 3′ orientation.
In some embodiments, the single molecule gRNA comprises one or more sequence modifications compared to a sequence of a corresponding wild type tracrRNA and/or crRNA.
In some embodiments, the targeter-RNA and the activator-RNA are covalently linked to one another via a linker.
When present, the linker of a single-molecule gRNA can have a length of from about 3 nucleotides to about 30 nucleotides. In exemplary embodiments, the linker of a single-molecule gRNA is 4, 5, 6, or 7 nt.
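The covalent linkage and the two orientations described above can be illustrated with a minimal Python sketch that concatenates the three segments 5′ to 3′ and checks the linker against the about 3 to about 30 nt range. The function name `assemble_sgrna` and every sequence used with it are hypothetical placeholders, not sequences of the disclosure; the sketch does only string bookkeeping and does not predict folding or activity.

```python
"""Hypothetical sketch: assemble a single-molecule gRNA from its parts."""

VALID_RNA = set("ACGU")


def assemble_sgrna(targeter: str, linker: str, activator: str,
                   targeter_first: bool = True) -> str:
    """Join targeter-RNA, linker, and activator-RNA, written 5'->3'.

    targeter_first=True links the 3' end of the targeter-RNA to the 5' end
    of the activator-RNA; False places the activator-RNA first instead.
    """
    for name, seq in (("targeter", targeter), ("linker", linker),
                      ("activator", activator)):
        # Reject empty strings and any character other than A, C, G, U.
        if not seq or set(seq) - VALID_RNA:
            raise ValueError(f"{name} must be a non-empty A/C/G/U string")
    # Linker length range stated in the text: about 3 to about 30 nt.
    if not 3 <= len(linker) <= 30:
        raise ValueError("linker length outside the 3-30 nt range")
    if targeter_first:
        return targeter + linker + activator
    return activator + linker + targeter
```

For example, `assemble_sgrna("GGAA", "GAAA", "UUCC")` joins two placeholder segments through a 4-nt placeholder linker, within the 4 to 7 nt exemplary range.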
An exemplary single-molecule gRNA comprises two complementary stretches of nucleotides that hybridize to form a dsRNA duplex. In some embodiments, one of the two complementary stretches of nucleotides of the single-molecule gRNA (or the DNA encoding the stretch) is at least about 60% identical to an activator-RNA. For example, one of the two complementary stretches of nucleotides of the single-molecule gRNA (or the DNA encoding the stretch) is at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100% identical to an activator-RNA.
The activator-RNA and targeter-RNA segments can be engineered while ensuring that the structure of the protein-binding domain of the gRNA is conserved. Thus, the RNA folding structure of a naturally occurring protein-binding domain of a DNA-targeting RNA can be taken into account in order to design artificial protein-binding domains (either dual-molecule or single-molecule versions).
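Because the identity threshold above may be applied either to an RNA stretch or to the DNA encoding it, a simple percent-identity check treats T and U as equivalent. The sketch below assumes an ungapped comparison of equal-length sequences; percent identity is more generally computed from a sequence alignment, so this hypothetical `percent_identity` helper is only a simplified illustration.

```python
def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    T is converted to U so that a DNA stretch encoding the RNA can be
    compared directly with the RNA sequence itself.
    """
    q = query.upper().replace("T", "U")
    r = reference.upper().replace("T", "U")
    if not q or len(q) != len(r):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions where the two sequences carry the same base.
    matches = sum(a == b for a, b in zip(q, r))
    return 100.0 * matches / len(q)
```

A stretch would then satisfy the broadest criterion above when `percent_identity(stretch, activator) >= 60.0`.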
The activator-RNA in a single-molecule gRNA can have a length of from about 10 nucleotides to about 100 nucleotides. For example, the activator-RNA can have a length of from about 15 nucleotides (nt) to about 80 nt, from about 15 nt to about 50 nt, from about 15 nt to about 40 nt, from about 15 nt to about 30 nt or from about 15 nt to about 25 nt.
Also with regard to both the single-molecule and dual-molecule gRNAs of the disclosure, the dsRNA duplex of the activator-RNA can have a length of from about 6 base pairs (bp) to about 50 bp. For example, the dsRNA duplex of the activator-RNA can have a length of from about 6 bp to about 40 bp, from about 6 bp to about 30 bp, from about 6 bp to about 25 bp, from about 6 bp to about 20 bp, from about 6 bp to about 15 bp, from about 8 bp to about 40 bp, from about 8 bp to about 30 bp, from about 8 bp to about 25 bp, from about 8 bp to about 20 bp, or from about 8 bp to about 15 bp. For example, the dsRNA duplex of the activator-RNA can have a length of from about 8 bp to about 10 bp, from about 10 bp to about 15 bp, from about 15 bp to about 18 bp, from about 18 bp to about 20 bp, from about 20 bp to about 25 bp, from about 25 bp to about 30 bp, from about 30 bp to about 35 bp, from about 35 bp to about 40 bp, or from about 40 bp to about 50 bp. In some embodiments, the dsRNA duplex of the activator-RNA has a length of 8-15 base pairs. The percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the activator-RNA can be at least about 60%. For example, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the activator-RNA can be at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. In some embodiments, the percent complementarity between the nucleotide sequences that hybridize to form the dsRNA duplex of the activator-RNA is 100%.
In some embodiments, the spacer sequence of a Type II gRNA (whether it is a single-molecule gRNA or a dual-molecule gRNA) of the disclosure is directed to a target sequence in a mammalian organism, e.g., a human or non-human primate. In some embodiments, the spacer sequence of a Type II gRNA of the disclosure is directed to a target sequence in a bacterium.
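The duplex-length and percent-complementarity criteria above can be checked together. The sketch below assumes the two stretches are aligned antiparallel without gaps or bulges and counts only Watson-Crick pairs (A-U, G-C); G-U wobble pairs, which can occur in RNA duplexes, are deliberately not counted. The helper names and the default thresholds (8-15 bp, 60%) are illustrative choices drawn from the embodiments above.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions at which strand_a (5'->3') Watson-Crick pairs
    with strand_b (5'->3') in an ungapped antiparallel alignment."""
    a, b = strand_a.upper(), strand_b.upper()
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    # Position i of strand_a faces position i of strand_b read 3'->5'.
    paired = sum(WATSON_CRICK.get(x) == y for x, y in zip(a, reversed(b)))
    return 100.0 * paired / len(a)


def duplex_meets_criteria(strand_a: str, strand_b: str, min_bp: int = 8,
                          max_bp: int = 15, min_pct: float = 60.0) -> bool:
    """True if the duplex length and complementarity fall in the ranges."""
    return (min_bp <= len(strand_a) <= max_bp
            and percent_complementarity(strand_a, strand_b) >= min_pct)
```

For instance, `percent_complementarity("GGAC", "GUCA")` pairs three of four positions, giving 75%.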
In some embodiments, the spacer sequence of a Type II gRNA of the disclosure is directed to a target sequence in a virus. In some embodiments, the spacer sequence of a Type II gRNA of the disclosure is directed to a target sequence in a plant.
In some embodiments, the single-molecule Type II gRNAs of the disclosure can be modified to include an aptamer.
vi. gRNA Arrays
The Type II gRNAs of the disclosure can be provided as gRNA arrays.
gRNA arrays include more than one gRNA arrayed in tandem and can be processed into two or more individual gRNAs. Thus, in some embodiments a precursor Type II gRNA array comprises two or more (e.g., 3 or more, 4 or more, 5 or more, 2, 3, 4, or 5) gRNAs (e.g., arrayed in tandem as precursor molecules). In some embodiments, two or more gRNAs can be present on an array (a precursor gRNA array). A Type II endonuclease of the disclosure can cleave the precursor gRNA array into individual gRNAs.
In some embodiments a gRNA array includes 2 or more gRNAs (e.g., 3 or more, 4 or more, 5 or more, 6 or more, or 7 or more gRNAs). The gRNAs of a given array can target (i.e., can include guide sequences that hybridize to) different target sites of the same target DNA. In some embodiments, two or more gRNAs of a precursor gRNA array have the same guide sequence. In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target sites within the same target DNA. In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target DNAs.
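The bookkeeping of a tandem precursor array can be sketched in Python by splitting the precursor at copies of a repeat and then tallying which guide sequences are shared. This is a deliberately simplified model: it assumes processing is equivalent to removing exact repeat copies, whereas actual processing sites, and any repeat-derived handle retained on each mature gRNA, depend on the endonuclease. The repeat and spacer sequences in the example are hypothetical placeholders.

```python
from collections import Counter


def split_precursor_array(precursor: str, repeat: str) -> list[str]:
    """Return the spacer segments found between copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    # str.split removes each exact copy of the repeat; empty pieces come
    # from repeats at the ends of the precursor and are discarded.
    return [piece for piece in precursor.split(repeat) if piece]


def summarize_array(spacers: list[str]) -> dict:
    """Count gRNAs, distinct guide sequences, and guides used more than once."""
    counts = Counter(spacers)
    return {
        "n_grnas": len(spacers),
        "n_distinct_guides": len(counts),
        "shared_guides": sorted(s for s, c in counts.items() if c > 1),
    }
```

An array with the layout repeat, spacer 1, repeat, spacer 2, repeat, spacer 1, repeat would yield three gRNAs, two of which share a guide sequence.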
a. Type II and Type V Endonuclease-Mediated Modification of Target DNA
Provided herein are uses of the novel Type II and Type V endonucleases of the disclosure for the modification of a target DNA. In some embodiments, provided herein is a method of modifying a target DNA, the method comprising contacting the target DNA with any one of the Type II or Type V systems described herein.
In some embodiments, the target DNA is part of a chromosome in vitro. In some embodiments, the target DNA is part of a chromosome in vivo.
In some embodiments, the target DNA is part of a chromosome in a cell.
In some embodiments, the target DNA is extrachromosomal DNA.
In some embodiments, the target DNA is in a cell, wherein the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
In some embodiments, the target DNA is the DNA of a parasite.
In some embodiments, the target DNA is a viral DNA.
In some embodiments, the target DNA is a bacterial DNA.
In some embodiments, the modifying comprises introducing a double strand break in the target DNA.
In some embodiments, the contacting occurs under conditions that are permissive for non-homologous end joining or homology-directed repair.
In some embodiments, the method comprises contacting the target DNA with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA.
In some embodiments, the method does not comprise contacting the cell with a donor polynucleotide, wherein the target DNA is modified such that nucleotides within the target DNA are deleted.
b. Type VI Endonuclease-Mediated Modification of Target RNA
Provided herein are uses of the novel Type VI endonucleases of the disclosure for the modification of a target RNA. In some embodiments, provided herein is a method of modifying a target RNA, the method comprising contacting the target RNA with any one of the Type VI systems described herein.
In some embodiments, the target RNA is in vitro. In some embodiments, the target RNA is in vivo.
In some embodiments, the target RNA is in a cell, wherein the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
In some embodiments, the target RNA is the RNA of a parasite.
In some embodiments, the target RNA is a viral RNA.
In some embodiments, the target RNA is a bacterial RNA.
The target RNA may be any suitable form of RNA. This may include, in some embodiments, mRNA. In other embodiments, the target RNA may include tRNA or rRNA. In other embodiments, the target RNA may include miRNA. In other embodiments, the target RNA may include siRNA.
c. Therapeutic Applications (Type II, Type V endonucleases)
The disclosure provides novel Type II and Type V endonucleases, engineered systems, one or more polynucleotides encoding components of said systems, and vector or delivery systems comprising one or more polynucleotides encoding components of said systems, for use in therapeutic methods. The therapeutic methods may comprise gene or genome editing, or gene therapy. The therapeutic methods comprise use and delivery of the novel Type II or Type V endonucleases of the disclosure.
Accordingly, in some embodiments, provided herein is a method of modifying a target DNA, the method comprising contacting a target DNA, a cell comprising the target DNA, or a subject having cells that comprise the target DNA with any one of the Type II and Type V systems described herein. In other embodiments, provided herein is a method of modifying a target RNA, the method comprising contacting a target RNA, a cell comprising the target RNA, or a subject having cells that comprise the target RNA with any one of the Type VI systems described herein.
In some embodiments, the target DNA is part of a chromosome in vitro. In some embodiments, the target DNA is part of a chromosome in vivo.
In some embodiments, the target DNA is part of a chromosome in a cell.
In some embodiments, the target DNA is extrachromosomal DNA.
In some embodiments, the target DNA is in a cell, wherein the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
In some embodiments, the target DNA is outside of a cell.
In some embodiments, the target DNA is inside a cell in vitro.
In some embodiments, the target DNA is inside a cell in vivo.
In some embodiments, the modifying comprises introducing a double strand break in the target DNA.
In some embodiments, the contacting occurs under conditions that are permissive for non-homologous end joining or homology-directed repair.
In some embodiments, the method comprises contacting the target DNA with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA.
In some embodiments, the method does not comprise contacting the cell with a donor polynucleotide, wherein the target DNA is modified such that nucleotides within the target DNA are deleted.
In some embodiments, the therapeutic methods involve modifying a target DNA comprising a target sequence of a gene of interest and/or the regulatory region of the gene of interest, the method comprising delivering to a cell comprising the target DNA: a Type II endonuclease of the disclosure and one or more Type II gRNAs; a Type V endonuclease of the disclosure and one or more Type V gRNAs; one or more polynucleotides encoding the Type II endonuclease of the disclosure and one or more Type II gRNAs; or one or more polynucleotides encoding a Type V endonuclease of the disclosure and one or more Type V gRNAs.
In some embodiments, the gene of interest is within a eukaryotic cell, e.g., a human or non-human primate cell.
In some embodiments, the gene of interest is within a plant cell.
In some embodiments, the delivering comprises delivering to the cell a Type II endonuclease of the disclosure (or one or more polynucleotides encoding the same) and one or more Type II gRNAs.
In some embodiments, the delivering comprises delivering to the cell a Type V endonuclease of the disclosure (or one or more polynucleotides encoding the same) and one or more Type V gRNAs.
In some embodiments, the delivering comprises delivering to the cell one or more polynucleotides encoding the Type II endonuclease of the disclosure and one or more Type II gRNAs.
In some embodiments, the delivering comprises delivering to the cell one or more polynucleotides encoding a Type V endonuclease of the disclosure and one or more Type V gRNAs.
d. Therapeutic Applications (Type VI Endonucleases)
The disclosure provides novel Type VI endonucleases, engineered systems, one or more polynucleotides encoding components of said system, and vector or delivery systems comprising one or more polynucleotides encoding components of said system for use in therapeutic methods.
Accordingly, in some embodiments, provided herein is a method of modifying a target RNA, the method comprising contacting a target RNA, a cell comprising the target RNA, or a subject having cells that comprise the target RNA with any one of the Type VI systems described herein.
In some embodiments, the target RNA is in a cell, wherein the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
In some embodiments, the target RNA is outside of a cell.
In some embodiments, the target RNA is inside a cell in vitro.
In some embodiments, the target RNA is inside a cell in vivo.
The target RNA may be any suitable form of RNA. This may include, in some embodiments, mRNA. In other embodiments, the target RNA may include tRNA or rRNA. In other embodiments, the target RNA may include miRNA. In other embodiments, the target RNA may include siRNA.
In some embodiments, the therapeutic methods involve modifying a target RNA comprising an mRNA encoding a gene of interest and/or the regulatory region of the mRNA of interest, the method comprising delivering to a cell comprising the target RNA a Type VI endonuclease of the disclosure and one or more Type VI gRNAs, or one or more polynucleotides encoding the Type VI endonuclease of the disclosure and one or more Type VI gRNAs.
In some embodiments, the RNA of interest is within a eukaryotic cell, e.g., a human or non-human primate cell.
In some embodiments, the RNA of interest is within a plant cell.
In some embodiments, the delivering comprises delivering to the cell a Type VI endonuclease of the disclosure (or one or more polynucleotides encoding the same) and one or more Type VI gRNAs.
In some embodiments, the delivering comprises delivering to the cell one or more polynucleotides encoding a Type VI endonuclease of the disclosure and one or more Type VI gRNAs.
e. Delivery
Delivery of the Type II, Type V, and Type VI components to a cell can be achieved by any of a variety of delivery methods known to those of skill in the art. As a non-limiting example, the components can be combined with a lipid. As another non-limiting example, the components can be combined with a particle or formulated into a particle, e.g., a nanoparticle.
Methods of introducing a nucleic acid and/or protein into a host cell are known in the art, and any convenient method can be used to introduce a subject nucleic acid (e.g., an expression construct/vector) into a target cell (e.g., prokaryotic cell, eukaryotic cell, plant cell, animal cell, mammalian cell, human cell, and the like). Suitable methods include, e.g., viral infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran-mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like.
A gRNA can be introduced, e.g., as a DNA molecule encoding the gRNA, or can be provided directly as an RNA molecule (or a chimeric/hybrid molecule when applicable).
In some embodiments, the Type II, Type V, or Type VI endonuclease is provided as a nucleic acid (e.g., an mRNA, a DNA, a plasmid, an expression vector, a viral vector, etc.) that encodes the protein.
In some embodiments, the Type II, Type V, or Type VI endonuclease is provided directly as a protein (e.g., without an associated gRNA, or with an associated gRNA, i.e., as a ribonucleoprotein (RNP) complex). Like a gRNA, a Type II, Type V, or Type VI endonuclease of the disclosure can be introduced into (provided to) a cell by any convenient method; such methods are known to those of ordinary skill in the art. As an illustrative example, a Type II, Type V, or Type VI endonuclease of the disclosure can be injected directly into a cell (e.g., with or without a gRNA or a nucleic acid encoding a gRNA). As another example, a pre-formed complex of a Type II, Type V, or Type VI endonuclease and a gRNA can be introduced into a cell (e.g., a eukaryotic cell), e.g., via injection, via nucleofection, or via a protein transduction domain (PTD) conjugated to one or more components (e.g., conjugated to the Type II, Type V, or Type VI endonuclease of the disclosure, or conjugated to a gRNA).
In some embodiments, a nucleic acid (e.g., a gRNA; a nucleic acid comprising a nucleotide sequence encoding a Type II, Type V, or Type VI endonuclease of the disclosure; etc.) and/or a polypeptide (e.g., a Type II, Type V, or Type VI endonuclease of the disclosure) is delivered to a cell (e.g., a target host cell) in a particle, or associated with a particle. In some embodiments, the particle is a nanoparticle.
A Type II, Type V, or Type VI endonuclease of the disclosure (or an mRNA comprising a nucleotide sequence encoding the protein) and/or gRNA (or a nucleic acid such as one or more expression vectors encoding the gRNA) may be delivered simultaneously using particles or lipid envelopes.
f. Target Cells of Interest
Suitable target cells (which can comprise target DNA such as genomic DNA, or target RNA) include, but are not limited to: a bacterial cell; an archaeal cell; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate animal (e.g., a fruit fly, a cnidarian, an echinoderm, a nematode, etc.); a cell of an insect (e.g., a mosquito, a bee, an agricultural pest, etc.); a cell of an arachnid (e.g., a spider, a tick, etc.); a cell from a vertebrate animal (e.g., a fish, an amphibian, a reptile, a bird, a mammal); a cell from a mammal (e.g., a cell from a rodent; a cell from a human; a cell of a non-human mammal; a cell of a rodent (e.g., a mouse, a rat); a cell of a lagomorph (e.g., a rabbit); a cell of an ungulate (e.g., a cow, a horse, a camel, a llama, a vicuna, a sheep, a goat, etc.); a cell of a marine mammal (e.g., a whale, a seal, an elephant seal, a dolphin, a sea lion, etc.)); and the like.
Any type of cell may be of interest (e.g., a stem cell, e.g., an embryonic stem (ES) cell, an induced pluripotent stem cell (iPSC), a germ cell (e.g., an oocyte, a sperm, an oogonium, a spermatogonium, etc.), an adult stem cell, a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.).
Cells may be from cell lines or may be primary cells. Target cells can be unicellular organisms and/or can be grown in culture. If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be conveniently harvested by biopsy.
Because the gRNA provides specificity by hybridizing to the target nucleic acid, a mitotic and/or post-mitotic cell of interest in the disclosed methods may include a cell of any organism (e.g., a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like, a fungal cell (e.g., a yeast cell), an animal cell, a cell of an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell of a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell of a mammal, a cell of a rodent, a cell of a human, etc.).
Plant cells include cells of a monocotyledon and cells of a dicotyledon. The cells can be root cells, leaf cells, cells of the xylem, cells of the phloem, cells of the cambium, apical meristem cells, parenchyma cells, collenchyma cells, sclerenchyma cells, and the like. Plant cells include cells of agricultural crops such as wheat, corn, rice, sorghum, millet, soybean, etc. Plant cells include cells of agricultural fruit and nut plants, e.g., plants that produce apricots, oranges, lemons, apples, plums, pears, almonds, etc.
Non-limiting examples of cells (target cells) include: a prokaryotic cell, a eukaryotic cell, a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a protozoan cell, a cell from a plant (e.g., cells from plant crops, fruits, vegetables, grains, soybean, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, cannabis, tobacco, flowering plants, conifers, gymnosperms, angiosperms, ferns, clubmosses, hornworts, liverworts, mosses, dicotyledons, monocotyledons, etc.), an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like), a cell of a seaweed (e.g., kelp), a fungal cell (e.g., a yeast cell, a cell from a mushroom), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal (e.g., an ungulate (e.g., a pig, a cow, a goat, a sheep); a rodent (e.g., a rat, a mouse); a non-human primate; a human; a feline (e.g., a cat); a canine (e.g., a dog); etc.), and the like. In some embodiments, the cell is a cell that does not originate from a natural organism (e.g., the cell can be a synthetically made cell, also referred to as an artificial cell).
A cell can be an in vitro cell (e.g., an established cultured cell line). A cell can be an ex vivo cell (a cultured cell from an individual). A cell can be an in vivo cell (e.g., a cell in an individual). A cell can be an isolated cell. A cell can be a cell inside of an organism. A cell can be an organism.
Suitable cells include human embryonic stem cells, fetal cardiomyocytes, myofibroblasts, mesenchymal stem cells, autotransplanted expanded cardiomyocytes, adipocytes, totipotent cells, pluripotent cells, blood stem cells, myoblasts, adult stem cells, bone marrow cells, mesenchymal cells, embryonic stem cells, parenchymal cells, epithelial cells, endothelial cells, mesothelial cells, fibroblasts, osteoblasts, chondrocytes, exogenous cells, endogenous cells, stem cells, hematopoietic stem cells, bone-marrow derived progenitor cells, myocardial cells, skeletal cells, fetal cells, undifferentiated cells, multi-potent progenitor cells, unipotent progenitor cells, monocytes, cardiac myoblasts, skeletal myoblasts, macrophages, capillary endothelial cells, xenogenic cells, allogenic cells, and post-natal stem cells.
In some embodiments, the cell is an immune cell, a neuron, an epithelial cell, an endothelial cell, or a stem cell. In some embodiments, the immune cell is a T cell, a B cell, a monocyte, a natural killer cell, a dendritic cell, or a macrophage. In some embodiments, the immune cell is a cytotoxic T cell. In some embodiments, the immune cell is a helper T cell. In some embodiments, the immune cell is a regulatory T cell (Treg).
In some embodiments, the cell is a stem cell. Stem cells include adult stem cells. Adult stem cells are also referred to as somatic stem cells.
Adult stem cells are resident in differentiated tissue, but retain the properties of self-renewal and ability to give rise to multiple cell types, usually cell types typical of the tissue in which the stem cells are found. Numerous examples of somatic stem cells are known to those of skill in the art, including muscle stem cells; hematopoietic stem cells; epithelial stem cells; neural stem cells; mesenchymal stem cells; mammary stem cells; intestinal stem cells; mesodermal stem cells; endothelial stem cells; olfactory stem cells; neural crest stem cells; and the like.
Stem cells of interest include mammalian stem cells, where the term “mammalian” refers to any animal classified as a mammal, including humans; non-human primates; domestic and farm animals; and zoo, laboratory, sports, or pet animals, such as dogs, horses, cats, cows, mice, rats, rabbits, etc. In some embodiments, the stem cell is a human stem cell. In some embodiments, the stem cell is a rodent (e.g., a mouse; a rat) stem cell. In some embodiments, the stem cell is a non-human primate stem cell.
g. Targets
Any gene of interest can serve as a target for modification.
In particular embodiments, the target is a gene or mRNA implicated in cancer. In particular embodiments, the target is a gene or mRNA implicated in an immune disease, e.g. an autoimmune disease. In particular embodiments, the target is a gene or mRNA implicated in a neurodegenerative disease. In particular embodiments, the target is a gene or mRNA implicated in a neuropsychiatric disease. In particular embodiments, the target is a gene or mRNA implicated in a muscular disease. In particular embodiments, the target is a gene or mRNA implicated in a cardiac disease. In particular embodiments, the target is a gene implicated in diabetes. In particular embodiments, the target is a gene implicated in kidney disease.
h. Precursor gRNA Arrays
The therapeutic methods provided herein can include delivery of precursor gRNA arrays. A Type II, Type V, or Type VI endonuclease of the disclosure can cleave a precursor gRNA into a mature gRNA, e.g., by endoribonucleolytic cleavage of the precursor. A Type II, Type V, or Type VI endonuclease of the disclosure can cleave a precursor gRNA array (that includes more than one gRNA arrayed in tandem) into two or more individual gRNAs.
In addition to the ability to cleave a target sequence in a targeted DNA or RNA, the Type V or Type VI endonucleases of the disclosure also possess collateral (trans-cleavage) activity, i.e., the ability to promiscuously cleave non-targeted oligonucleotides once activated by detection of a target DNA or RNA. Without being bound to any theory or mechanism, generally once a Type V or Type VI endonuclease of the disclosure is activated by a gRNA, which occurs when a sample includes a target sequence to which the gRNA hybridizes (i.e., the sample includes the targeted DNA or the targeted RNA), the Type V or Type VI endonuclease becomes a nuclease that promiscuously cleaves single-stranded oligonucleotides (i.e., non-target single-stranded oligonucleotides, to which the guide sequence of the gRNA does not hybridize). Thus, when the targeted DNA (double- or single-stranded) or RNA is present in the sample (e.g., in some embodiments, above a threshold amount), the result can be collateral cleavage of oligonucleotides in the sample, which can be detected using any convenient detection method (e.g., using a labeled single-stranded detector DNA, a labeled detector RNA, or labeled detector DNA/RNA chimeric oligonucleotides).
Accordingly, provided herein are methods and compositions for detecting a target DNA (dsDNA or ssDNA) or RNA in a sample. Also provided are methods and compositions for cleaving non-target oligonucleotides (e.g. used as detectors).
As used herein, a “detector” generally comprises an oligonucleotide of any nature, single- or double-stranded, that does not hybridize with the guide sequence of the gRNA (i.e., the detector is a non-target oligonucleotide). Exemplary detectors include, but are not limited to, ssDNA, dsDNA, ssRNA, ssDNA/RNA chimeras, dsRNA, RNA comprising ss and ds regions, and ss or ds oligonucleotides containing RNA and DNA nucleotides (as used herein, ss = single-stranded and ds = double-stranded). In practice, the substrate preference of the particular CRISPR-Cas protein in question can be determined and the appropriate detector(s) selected accordingly.
The detection methods based on the collateral activity of the Type V or Type VI endonucleases of the disclosure can include:
(a) contacting the sample with: (i) a Type V or Type VI endonuclease of the disclosure; (ii) a gRNA comprising: a region that binds to the Type V or Type VI endonuclease, and a guide sequence that hybridizes with the target DNA; and (iii) a detector that does not hybridize with the guide sequence of the gRNA; and
(b) measuring a detectable signal produced by cleavage of the detector by the Type V or Type VI endonuclease, thereby detecting the target DNA.
Once a subject Type V or Type VI endonuclease is activated by a gRNA, which can occur when the sample includes a target DNA to which the gRNA hybridizes (i.e., the sample includes the targeted sequence in the target DNA), the Type V or Type VI endonuclease can function as a nuclease that non-specifically cleaves detector oligonucleotides (including non-target ss oligonucleotides) present in the sample. Thus, when the target DNA is present in the sample, the result is cleavage of a detector oligonucleotide in the sample, which can be detected using any convenient detection method (e.g., using labeled detector oligonucleotides).
Also provided are methods and compositions for cleaving detector oligonucleotides (e.g., ssDNAs, ssRNAs, ssDNA/RNA chimeras, or detectors comprising ss and ds regions). Such methods can include contacting a population of nucleic acids, wherein said population comprises a target DNA and a plurality of non-target ss oligonucleotides, with: (i) a Type V or Type VI endonuclease of the disclosure; and (ii) a gRNA comprising a region that binds to the Type V or Type VI effector protein and a guide sequence that hybridizes with the target DNA, wherein the Type V or Type VI endonuclease cleaves the non-target ss oligonucleotides.
Accordingly, provided herein is a method of detecting a target DNA or RNA in a sample, the method comprising:
(a) contacting the sample with:
(i) a Type V or Type VI endonuclease of the disclosure;
(ii) a gRNA comprising a spacer sequence that is capable of hybridizing with a target sequence in a target DNA or RNA; and
(iii) a labeled detector oligonucleotide that does not hybridize with the spacer sequence of the gRNA; and
(b) measuring a detectable signal produced by cleavage of the labeled detector oligonucleotide by the Type V or Type VI endonuclease, thereby detecting the target DNA or RNA.
In some embodiments, the contacting step can be carried out in an acellular environment, e.g., outside of a cell. In other embodiments, the contacting step can be carried out inside a cell. The contacting step can be carried out in a cell in vitro. The contacting step can be carried out in a cell in vivo. The contacting step of a detection method can be carried out in a composition comprising divalent metal ions.
The gRNA can be provided as RNA or as a nucleic acid encoding the gRNA (e.g., a DNA such as a recombinant expression vector), as described herein.
The contacting, prior to the measuring step, can last for any period of time, e.g., from 5 seconds to 2 hours or more. In some embodiments the sample is contacted for 45 minutes or less prior to the measuring step. In some embodiments the sample is contacted for 30 minutes or less prior to the measuring step. In some embodiments the sample is contacted for 10 minutes or less prior to the measuring step. In some embodiments the sample is contacted for 5 minutes or less prior to the measuring step. In some embodiments the sample is contacted for 1 minute or less prior to the measuring step. In some embodiments the sample is contacted for from 50 seconds to 60 seconds prior to the measuring step. In some embodiments the sample is contacted for from 40 seconds to 50 seconds prior to the measuring step. In some embodiments the sample is contacted for from 30 seconds to 40 seconds prior to the measuring step. In some embodiments the sample is contacted for from 20 seconds to 30 seconds prior to the measuring step. In some embodiments the sample is contacted for from 10 seconds to 20 seconds prior to the measuring step.
The detection methods provided herein can detect a target DNA or RNA with a high degree of sensitivity. Accordingly, in some embodiments, the detection methods of the disclosure can be used to detect a target DNA or RNA present in a sample comprising a plurality of DNAs or RNAs (including the target DNA or RNA and a plurality of non-target DNAs or RNAs), where the target DNA or RNA is present at one or more copies per 5 to 10^9 copies of the non-target DNAs or RNAs.
In some embodiments, the threshold of detection, for a detection method of detecting a target DNA or RNA in a sample, is 10 nM or less. The term “threshold of detection” is used herein to describe the minimal amount of target DNA or RNA that must be present in a sample in order for detection to occur. In some embodiments, a subject composition or method exhibits an attomolar (aM) sensitivity of detection. In some embodiments, a subject composition or method exhibits a femtomolar (fM) sensitivity of detection. In some embodiments, a subject composition or method exhibits a picomolar (pM) sensitivity of detection. In some embodiments, a subject composition or method exhibits a nanomolar (nM) sensitivity of detection.
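Because thresholds of detection are stated here as molar concentrations, it can help to convert them into copy numbers. The following is a minimal Python sketch of that arithmetic; the function and constant names are illustrative and not part of the disclosure, and the calculation uses only Avogadro's constant and unit conversion. For example, 1 aM corresponds to roughly 0.6 copies per microliter, and the 10 nM threshold mentioned above to roughly 6 × 10^9 copies per microliter.

```python
# Illustrative unit conversion (hypothetical helper, not from the disclosure).
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

# Molar scale factors for the concentration units mentioned in the text.
UNIT_TO_MOLAR = {"nM": 1e-9, "pM": 1e-12, "fM": 1e-15, "aM": 1e-18}


def copies_per_microliter(value: float, unit: str) -> float:
    """Convert a molar concentration of a target into copies per microliter."""
    molar = value * UNIT_TO_MOLAR[unit]  # mol/L
    copies_per_liter = molar * AVOGADRO  # molecules/L
    return copies_per_liter * 1e-6  # 1 L = 1e6 microliters


if __name__ == "__main__":
    for unit in ("aM", "fM", "pM", "nM"):
        print(f"1 {unit} = {copies_per_microliter(1, unit):.3g} copies per microliter")
```

The conversion also makes clear why attomolar sensitivity is demanding: at 1 aM, a 1 microliter reaction contains on average less than one target copy.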
a. Target DNA and RNA
A target DNA can be single stranded (ssDNA) or double stranded (dsDNA). There need not be any preference or requirement for a PAM sequence in a single stranded target DNA. A target RNA can be single stranded RNA.
The target DNA or RNA can be from any source. In some embodiments, the target DNA or RNA is a viral or bacterial DNA or RNA (e.g., a genomic DNA or RNA of a DNA or RNA virus or of a bacterium). As such, the detection method can be used to detect the presence of a viral or bacterial DNA amongst a population of nucleic acids (e.g., in a sample). In the case of an RNA-carrying organism, for example an RNA virus (e.g., a coronavirus), it is understood that a step such as reverse transcription may be carried out on a sample comprising the RNA-carrying organism to generate cDNA, and the cDNA is then the target DNA. Alternatively, the RNA can be detected directly using a Type VI endonuclease of the disclosure.
Exemplary non-limiting sources for target DNA or RNA are provided in Tables 10a-10f. Without being limited to a particular methodology, if the genome of the target is DNA and the CRISPR-Cas enzyme utilized is an RNA-targeting enzyme, an in vitro transcription (IVT) step could be included to transcribe the genome to RNA prior to assessment. Likewise, without being limited to a particular methodology, if the genome of the target is RNA and the CRISPR-Cas enzyme utilized is a DNA-targeting enzyme, a reverse transcription (RT) step could be included to reverse transcribe the genome to DNA prior to assessment.
DNA or RNA obtained from viruses and bacteria related to respiratory infections may also be targeted. A list of targets of interest may include the examples shown in Table 10c.
DNA or RNA obtained from viruses and bacteria related to sexually transmitted diseases (e.g., Neisseria gonorrhoeae) may also be targeted. A list of targets of interest may include the examples shown in Table 10d.
Other DNA or RNA targets may also be targeted. For example, male-specific genes may be targeted to determine the sex of the embryo of a pregnant woman or animal, or to determine the sex of plants and seeds. Examples of further targets of interest may include those shown in Table 10e.
Other miscellaneous targets of interest that provide sources for DNA or RNA targets are shown in Table 10f.
b. Samples
The term “sample” is used herein to mean any sample that includes DNA or RNA (e.g., in order to determine whether a target DNA or RNA is present among a population of DNA or RNAs). As noted above, the DNA can be single-stranded DNA, double-stranded DNA, complementary DNA (cDNA), and the like.
A sample intended for detection comprises a plurality of nucleic acids. Thus, in some embodiments, a sample includes two or more (e.g., 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more) nucleic acids (e.g., DNA or RNAs). A detection method can be used to detect, with high sensitivity, a target DNA or RNA present in a sample (e.g., in a complex mixture of nucleic acids such as DNA or RNAs).
In some embodiments the sample includes 5 or more DNA or RNAs (e.g., 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more DNA or RNAs) that differ from one another in sequence. In some embodiments, the sample includes 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 10^3 or more, 5×10^3 or more, 10^4 or more, 5×10^4 or more, 10^5 or more, 5×10^5 or more, 10^6 or more, 5×10^6 or more, or 10^7 or more DNA or RNAs. In some embodiments, the sample comprises from 10 to 20, from 20 to 50, from 50 to 100, from 100 to 500, from 500 to 10^3, from 10^3 to 5×10^3, from 5×10^3 to 10^4, from 10^4 to 5×10^4, from 5×10^4 to 10^5, from 10^5 to 5×10^5, from 5×10^5 to 10^6, from 10^6 to 5×10^6, or from 5×10^6 to 10^7, or more than 10^7, DNA or RNAs. In some embodiments, the sample comprises from 5 to 10^7 DNA or RNAs (e.g., that differ from one another in sequence) (e.g., from 5 to 10^6, from 5 to 10^5, from 5 to 50,000, from 5 to 30,000, from 10 to 10^6, from 10 to 10^5, from 10 to 50,000, from 10 to 30,000, from 20 to 10^6, from 20 to 10^5, from 20 to 50,000, or from 20 to 30,000 DNA or RNAs).
In some embodiments the sample includes 20 or more DNA or RNAs that differ from one another in sequence. In some embodiments, the sample includes DNA or RNAs from a cell lysate (e.g., a eukaryotic cell lysate, a mammalian cell lysate, a human cell lysate, a prokaryotic cell lysate, a plant cell lysate, and the like). For example, in some embodiments the sample includes DNA or RNA from a cell such as a eukaryotic cell, e.g., a mammalian cell such as a human cell.
The sample can be derived from any source, e.g., the sample can be a synthetic combination of purified DNA or RNAs; the sample can be a cell lysate, a DNA or RNA-enriched cell lysate, or DNA or RNAs isolated and/or purified from a cell lysate. The sample can be from a patient (e.g., for the purpose of diagnosis). The sample can be from permeabilized cells. The sample can be from crosslinked cells. The sample can be in tissue sections.
A sample can include a target DNA or RNA and a plurality of non-target DNA or RNAs. In some embodiments, the target DNA or RNA is present in the sample at one or more copies per 5 to 10^9 copies of the non-target DNA or RNAs.
Suitable samples include, but are not limited to, urine, blood, serum, plasma, lymphatic fluid, cerebrospinal fluid, saliva, nasopharyngeal, oropharyngeal, or nasopharyngeal/oropharyngeal samples, aspirates, and biopsy samples. Thus, the term “sample” with respect to a patient encompasses blood and other liquid samples of biological origin, solid tissue samples such as a biopsy specimen or tissue cultures or cells derived therefrom, and the progeny thereof. Samples also can be samples that have been manipulated in any way after their procurement, such as by treatment with reagents, washing, or enrichment for certain cell populations, such as cancer cells. The samples can be obtained by use of a swab, for example, a nasopharyngeal swab, an oropharyngeal swab, or a nasopharyngeal/oropharyngeal swab. Samples also can be samples that have been enriched for particular types of molecules, e.g., DNA or RNAs. Samples encompass biological samples such as clinical samples (e.g., blood, plasma, serum, aspirate, or cerebrospinal fluid (CSF)), and also include tissue obtained by surgical resection, tissue obtained by biopsy, cells in culture, cell supernatants, cell lysates, tissue samples, organs, bone marrow, and the like. A “biological sample” includes biological fluids derived from cells of interest (e.g., a cancerous cell, an infected cell, etc.), e.g., a sample comprising DNA or RNAs that is obtained from such cells (e.g., a cell lysate or other cell extract comprising DNA or RNAs).
A sample can comprise, or can be obtained from, any of a variety of cells, tissues, organs, or acellular fluids. Suitable sample sources include eukaryotic cells, bacterial cells, and archaeal cells. Suitable sample sources include single-celled organisms and multi-cellular organisms. Suitable sample sources include single-cell eukaryotic organisms; a plant or a plant cell; an algal cell; a fungal cell; an animal cell, tissue, or organ; a cell, tissue, or organ from an invertebrate animal; a cell, tissue, fluid, or organ from a vertebrate animal; a cell, tissue, fluid, or organ from a mammal (e.g., a human; a non-human primate; an ungulate; a feline; a bovine; an ovine; a caprine; etc.). Suitable sample sources include nematodes, protozoans, and the like. Suitable sample sources include parasites such as helminths, malarial parasites, etc.
Suitable sample sources include a cell, tissue, or organism of any of the six kingdoms.
Suitable sources of a sample include cells, fluid, tissue, or organ taken from an organism; from a particular cell or group of cells isolated from an organism; etc. For example, where the organism is a plant, suitable sources include the xylem, the phloem, the cambium layer, leaves, roots, etc. Where the organism is an animal, suitable sources include particular tissues (e.g., lung, liver, heart, kidney, brain, spleen, skin, fetal tissue, etc.), or a particular cell type (e.g., neuronal cells, epithelial cells, endothelial cells, astrocytes, macrophages, glial cells, islet cells, T lymphocytes, B lymphocytes, etc.).
In some embodiments, the source of the sample is (or is suspected of being) a diseased cell, fluid, tissue, or organ.
In some embodiments, the source of the sample is a normal (non-diseased) cell, fluid, tissue, or organ.
In some embodiments, the source of the sample is (or is suspected of being) a pathogen-infected cell, tissue, or organ. For example, the source of a sample can be an individual who may or may not be infected, and the sample could be any biological sample (e.g., blood, saliva, biopsy, plasma, serum, bronchoalveolar lavage, sputum, a fecal sample, cerebrospinal fluid, a fine needle aspirate, a swab sample (e.g., a buccal swab, a cervical swab, a nasal swab), interstitial fluid, synovial fluid, nasal discharge, tears, buffy coat, a mucous membrane sample, an epithelial cell sample (e.g., epithelial cell scraping), etc.) collected from the individual. In some embodiments, the sample is a cell-free liquid sample.
In some embodiments, the sample is a liquid sample that can comprise cells (e.g., urine, blood, serum, plasma, lymphatic fluid, cerebrospinal fluid, saliva, nasopharyngeal, oropharyngeal, or nasopharyngeal/oropharyngeal samples, aspirates, and biopsies). Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, Schistosoma parasites, and the like. “Helminths” include roundworms, heartworms, and phytophagous nematodes (Nematoda), flukes (Trematoda), Acanthocephala, and tapeworms (Cestoda). Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria, and toxoplasmosis. Examples of pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, Plasmodium vivax, Trypanosoma cruzi, and Toxoplasma gondii. Fungal pathogens include, but are not limited to: Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans. Pathogenic viruses include RNA or DNA viruses, e.g., coronavirus (e.g., SARS-CoV, SARS-CoV-2, MERS-CoV); immunodeficiency virus (e.g., HIV); influenza virus; dengue virus; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like.
Pathogenic viruses can include DNA viruses such as: a papovavirus (e.g., human papillomavirus (HPV), polyomavirus); a hepadnavirus (e.g., Hepatitis B Virus (HBV)); a herpesvirus (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); an adenovirus (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); a poxvirus (e.g., smallpox, vaccinia virus, cowpox virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus, tanapox virus, yaba monkey tumor virus, molluscum contagiosum virus (MCV)); a parvovirus (e.g., adeno-associated virus (AAV), Parvovirus B19, human bocavirus, bufavirus, human parv4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; and the like. Pathogens can also include, e.g., any of the foregoing DNA viruses, Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella pneumophila, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum, Haemophilus influenzae type b, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory syncytial virus, varicella-zoster virus, hepatitis B virus, hepatitis C virus, measles virus, adenovirus, human T-cell leukemia viruses, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, blue tongue virus, Sendai virus, feline leukemia virus, Reovirus, polio virus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium, and M. pneumoniae.
c. Measuring a Detectable Signal
The detection method generally includes a step of measuring (e.g., measuring a detectable signal produced by the Type V or Type VI endonuclease of the disclosure). A detectable signal can be any signal that is produced when an ss oligonucleotide is cleaved. The step of detection can involve fluorescence-based detection. The readout of such detection methods can be any convenient readout. Examples of possible readouts include but are not limited to: a measured amount of detectable fluorescent signal; a visual analysis of bands on a gel (e.g., bands that represent cleaved product versus uncleaved substrate); a visual or sensor-based detection of the presence or absence of a color (i.e., a color detection method); the presence or absence of (or a particular amount of) a magnetic signal; and the presence or absence of (or a particular amount of) an electrical signal.
The measuring can in some embodiments be quantitative, e.g., in the sense that the amount of signal detected can be used to determine the amount of target DNA or RNA present in the sample. The measuring can in some embodiments be qualitative, e.g., in the sense that the presence or absence of detectable signal can indicate the presence or absence of targeted DNA or RNA (e.g., virus, SNP, etc.). In some embodiments, a detectable signal will not be present (e.g., above a given threshold level) unless the targeted DNA or RNA(s) (e.g., virus, SNP, etc.) is present above a particular threshold concentration. In some embodiments, the threshold of detection can be titrated by modifying the amount of the Type V or Type VI endonuclease provided.
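A qualitative call of this kind needs a concrete threshold. One common convention, assumed here and not specified by the disclosure, is to set the threshold at the mean signal of no-target control reactions plus three standard deviations. The Python sketch below illustrates that convention; the function names and the factor k = 3 are hypothetical choices.

```python
from statistics import mean, stdev


def detection_threshold(blank_signals, k=3.0):
    """Signal threshold: mean of no-target (blank) controls plus k standard deviations.

    The mean + k*SD rule is an assumed convention used for illustration only.
    At least two blank measurements are needed to compute a standard deviation.
    """
    return mean(blank_signals) + k * stdev(blank_signals)


def is_target_detected(sample_signal, blank_signals, k=3.0):
    """Qualitative call: True if the sample signal exceeds the threshold."""
    return sample_signal > detection_threshold(blank_signals, k)


if __name__ == "__main__":
    blanks = [100.0, 110.0, 90.0]  # hypothetical fluorescence readings (arbitrary units)
    print(detection_threshold(blanks))
    print(is_target_detected(500.0, blanks))
```

With these hypothetical blanks (mean 100, standard deviation 10), the threshold is 130, so a reading of 500 would be called positive and a reading of 120 negative. Raising k reduces false positives at the cost of sensitivity.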
The compositions and methods of this disclosure can be used to detect any DNA or RNA target.
In some embodiments, the detection methods of the disclosure can be used to determine the amount of a target DNA or RNA in a sample (e.g., a sample comprising the target DNA or RNA and a plurality of non-target DNA or RNAs). Determining the amount of a target DNA or RNA in a sample can comprise comparing the amount of detectable signal generated from a test sample to the amount of detectable signal generated from a reference sample. Determining the amount of a target DNA or RNA in a sample can comprise: measuring the detectable signal to generate a test measurement; measuring a detectable signal produced by a reference sample to generate a reference measurement; and comparing the test measurement to the reference measurement to determine an amount of target DNA or RNA present in the sample.
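The comparison with reference measurements can be sketched as a standard curve built from reference samples of known target concentration. The Python below assumes, for illustration only, that the signal is approximately linear in log10(target concentration) over the range spanned by the references; that assumption, the function names, and the example values are hypothetical and would need to be verified for any particular endonuclease, detector, and readout.

```python
import math


def fit_standard_curve(concentrations, signals):
    """Least-squares fit of signal = slope * log10(concentration) + intercept.

    Assumes (hypothetically) a log-linear response over the reference range.
    """
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(signals) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, signals))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def estimate_concentration(test_signal, slope, intercept):
    """Invert the standard curve to estimate the target concentration of a test sample."""
    return 10 ** ((test_signal - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical reference samples: molar concentrations and measured signals.
    ref_conc = [1e-15, 1e-14, 1e-13]
    ref_signal = [100.0, 200.0, 300.0]
    slope, intercept = fit_standard_curve(ref_conc, ref_signal)
    print(estimate_concentration(250.0, slope, intercept))
```

Estimates are only meaningful inside the calibrated range; extrapolating beyond the lowest or highest reference sample is unreliable.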
In some embodiments, the detectable signal is detectable in less than 1, 2, 3, 4, 5, 10, 15, 20, 30, 60, 90, 120, 150, 180, 210, or 240 minutes.
In some embodiments, sensitivity of a subject composition and/or method (e.g., for detecting the presence of a target DNA or RNA, such as viral DNA or RNA or a SNP, in cellular genomic DNA or RNA) can be increased by coupling detection with nucleic acid amplification.
In some embodiments, the nucleic acids in a sample are amplified prior to contact with a Type V or Type VI endonuclease; in particular embodiments, the Type V or Type VI endonuclease remains in an inactive state until amplification has concluded. In some embodiments, the nucleic acids in a sample are amplified simultaneously with contact with the Type V or Type VI endonuclease. Amplification can be carried out using primers. With respect to the overall processing time for the detection method, amplification can occur for 5 seconds or more, up to 240 minutes or more.
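The sensitivity gain from amplification follows from exponential growth in copy number. For cycle-based methods such as PCR, an idealized textbook model is N = N0 × (1 + E)^n, where N0 is the starting copy number, E the per-cycle efficiency (1.0 for perfect doubling), and n the number of cycles. The Python sketch below applies that model only; it ignores the plateau phase of real reactions and does not describe isothermal methods, which are not cycle-based. The function names are illustrative.

```python
import math


def amplified_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized exponential amplification: N = N0 * (1 + E) ** n."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles for initial_copies to reach target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= initial_copies:
        return 0
    n = math.ceil(math.log(target_copies / initial_copies) / math.log(1.0 + efficiency))
    # Correct for floating-point rounding in the logarithms, in either direction.
    while amplified_copies(initial_copies, n, efficiency) < target_copies:
        n += 1
    while n > 0 and amplified_copies(initial_copies, n - 1, efficiency) >= target_copies:
        n -= 1
    return n
```

Under this idealized model, a single starting copy exceeds one million copies after 20 perfectly efficient cycles, since 2^20 = 1,048,576.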
Various amplification methods and components will be known to one of ordinary skill in the art and any convenient method can be used.
Nucleic acid amplification can comprise polymerase chain reaction (PCR), reverse transcription PCR (RT-PCR), quantitative PCR (qPCR), reverse transcription qPCR (RT-qPCR), isothermal PCR, nested PCR, multiplex PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, and thermal asymmetric interlaced PCR (TAIL-PCR).
In some embodiments, the amplification is isothermal amplification. Because isothermal nucleic acid amplification methods do not require thermal cycling, they can be carried out inside or outside of a laboratory environment. Examples of isothermal amplification methods include but are not limited to: loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), ramification amplification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), and isothermal multiple displacement amplification (IMDA).
d. Detector Oligonucleotides
The novel Type V or Type VI endonucleases of the disclosure possess collateral cleavage (trans-cleavage) activity.
In some embodiments, a detection method includes contacting a sample with: i) a Type V or Type VI endonuclease of the disclosure; ii) a gRNA (or precursor gRNA array); and iii) a detector that does not hybridize with the guide sequence of the gRNA. For example, in some embodiments, a detection method includes contacting a sample with a labeled detector that includes a fluorescence-emitting dye pair; the Type V or Type VI endonuclease of the disclosure has the ability to cleave the labeled detector after it is activated (by gRNA hybridizing to a target DNA or RNA); and the detectable signal that is measured is produced by the fluorescence-emitting dye pair. For example, in some embodiments, a detection method includes contacting a sample with a labeled detector comprising a fluorescence resonance energy transfer (FRET) pair or a quencher/fluor pair, or both. In some embodiments, a detection method includes contacting a sample with a labeled detector comprising a FRET pair. In some embodiments, a detection method includes contacting a sample with a labeled detector comprising a fluor/quencher pair.
Fluorescence-emitting dye pairs comprise a FRET pair or a quencher/fluor pair. For both a FRET pair and a quencher/fluor pair, the emission spectrum of one of the dyes overlaps a region of the absorption spectrum of the other dye in the pair. As used herein, the term “fluorescence-emitting dye pair” is a generic term used to encompass both a “fluorescence resonance energy transfer (FRET) pair” and a “quencher/fluor pair”. The term “fluorescence-emitting dye pair” is used interchangeably with the phrase “a FRET pair and/or a quencher/fluor pair.”
In some embodiments (e.g., when the detector includes a FRET pair) the labeled detector produces an amount of detectable signal prior to being cleaved, and the amount of detectable signal that is measured is reduced when the labeled detector is cleaved. In some embodiments, the labeled detector produces a first detectable signal prior to being cleaved (e.g., from a FRET pair) and a second detectable signal when the labeled detector is cleaved (e.g., from a quencher/fluor pair). As such, in some embodiments, the labeled detector comprises a FRET pair and a quencher/fluor pair.
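Why separating the two dyes changes the signal can be illustrated with the Förster relation for FRET efficiency, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the Förster radius of the pair (the distance at which E = 0.5). The Python sketch below evaluates it for hypothetical distances: an intact detector holding the dyes close together versus a cleaved detector whose fragments diffuse apart. R0 = 5 nm and the distances are illustrative assumptions, not values from Table 11, and the model covers only Förster transfer, not contact (static) quenching.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Förster transfer efficiency E = 1 / (1 + (r / R0) ** 6)."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def channel_fractions(distance_nm, forster_radius_nm):
    """Relative (unnormalized) signal expected in the donor and acceptor channels.

    Donor emission scales with (1 - E); sensitized acceptor emission scales with E.
    Quantum-yield differences and direct acceptor excitation are ignored.
    """
    e = fret_efficiency(distance_nm, forster_radius_nm)
    return {"donor": 1.0 - e, "acceptor": e}


if __name__ == "__main__":
    R0 = 5.0  # hypothetical Förster radius, nm
    for label, r in (("intact detector", 3.0), ("cleaved detector", 20.0)):
        print(label, channel_fractions(r, R0))
```

Under these assumptions the intact detector transfers most of its energy to the acceptor (E of about 0.96), while the cleaved detector transfers almost none (E of about 0.0002), so acceptor signal falls and donor signal rises upon cleavage.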
In some embodiments, the labeled detector comprises a FRET pair.
FRET donor and acceptor moieties (FRET pairs) will be known to one of ordinary skill in the art, and any convenient FRET pair (e.g., any convenient donor and acceptor moiety pair) can be used. Examples of suitable FRET pairs include but are not limited to those presented in Table 11. FRET pairs provided in U.S. Pat. No. 10,253,365 are incorporated by reference herein in their entirety. In some embodiments, the FRET pair is 5′ 6-FAM and 3IABkFQ (Iowa Black® FQ).
In some embodiments, a detectable signal is produced when the labeled detector is cleaved (e.g., in some embodiments, the labeled detector comprises a quencher/fluor pair).
Any fluorescent label can be utilized. Examples of fluorescent labels include, but are not limited to: an Alexa Fluor® dye, an ATTO dye (e.g., ATTO 390, ATTO 425, ATTO 465, ATTO 488, ATTO 495, ATTO 514, ATTO 520, ATTO 532, ATTO Rho6G, ATTO 542, ATTO 550, ATTO 565, ATTO Rho3B, ATTO Rho11, ATTO Rho12, ATTO Thio12, ATTO Rho101, ATTO 590, ATTO 594, ATTO Rho13, ATTO 610, ATTO 620, ATTO Rho14, ATTO 633, ATTO 647, ATTO 647N, ATTO 655, ATTO Oxa12, ATTO 665, ATTO 680, ATTO 700, ATTO 725, ATTO 740), a DyLight dye, a cyanine dye (e.g., Cy2, Cy3, Cy3.5, Cy3b, Cy5, Cy5.5, Cy7, Cy7.5), a FluoProbes dye, a Sulfo Cy dye, a Seta dye, an IRIS Dye, a SeTau dye, an SRfluor dye, a Square dye, fluorescein isothiocyanate (FITC), fluorescein amidite (FAM), tetramethylrhodamine (TRITC), Texas Red, Oregon Green, Pacific Blue, Pacific Green, Pacific Orange, quantum dots, and a tethered fluorescent protein.
Examples of quencher moieties include, but are not limited to: a dark quencher, a Black Hole Quencher® (BHQ®) (e.g., BHQ-0, BHQ-1, BHQ-2, BHQ-3), a Qxl quencher, an ATTO quencher (e.g., ATTO 540Q, ATTO 580Q, and ATTO 612Q), dimethylaminoazobenzenesulfonic acid (Dabsyl), Iowa Black RQ, Iowa Black FQ, IRDye QC-1, a QSY dye (e.g., QSY 7, QSY 9, QSY 21), AbsoluteQuencher, Eclipse, and metal clusters such as gold nanoparticles, and the like.
In some embodiments, a quencher moiety is selected from: a dark quencher, a Black Hole Quencher® (BHQ®) (e.g., BHQ-0, BHQ-1, BHQ-2, BHQ-3), a Qxl quencher, an ATTO quencher (e.g., ATTO 540Q, ATTO 580Q, and ATTO 612Q), dimethylaminoazobenzenesulfonic acid (Dabsyl), Iowa Black RQ, Iowa Black FQ, IRDye QC-1, a QSY dye (e.g., QSY 7, QSY 9, QSY 21), AbsoluteQuencher, Eclipse, and a metal cluster.
In some embodiments, cleavage of a labeled detector can be detected by measuring a colorimetric read-out. For example, the liberation of a fluorophore (e.g., liberation from a FRET pair, liberation from a quencher/fluor pair, and the like) can result in a wavelength shift (and thus color shift) of a detectable signal. Thus, in some embodiments, cleavage of a subject labeled detector can be detected by a color shift. Such a shift can be expressed as a loss of an amount of signal of one color (wavelength), a gain in the amount of another color, a change in the ratio of one color to another, and the like.
As provided herein, a labeled detector can be a nucleic acid mimetic. Polynucleotide mimics include peptide nucleic acids (PNAs), locked nucleic acids (LNAs), cyclohexenyl nucleic acids (CeNAs), and morpholino nucleic acids.
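A ratiometric color read-out of this kind can be summarized as the fold change in the ratio of two channel intensities. In the minimal Python sketch below, the channel readings are hypothetical and the function names are illustrative. One reason to use a ratio is that a factor affecting both channels equally (for example, sample volume or overall excitation intensity) cancels out.

```python
def color_ratio(intensity_a, intensity_b):
    """Ratio of channel A to channel B intensity (e.g., two emission wavelengths)."""
    if intensity_b <= 0:
        raise ValueError("channel B intensity must be positive")
    return intensity_a / intensity_b


def ratio_fold_change(before, after):
    """Fold change of the A/B ratio between two (intensity_a, intensity_b) readings."""
    baseline = color_ratio(*before)
    if baseline == 0:
        raise ValueError("baseline ratio is zero; fold change is undefined")
    return color_ratio(*after) / baseline


if __name__ == "__main__":
    uncleaved = (100.0, 400.0)  # hypothetical readings before cleavage
    cleaved = (400.0, 100.0)  # hypothetical readings after cleavage
    print(ratio_fold_change(uncleaved, cleaved))
```

A fold change near 1 indicates no color shift, while a value well above or below 1 indicates a shift toward channel A or channel B, respectively.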
A labeled detector can also include one or more substituted sugar moieties.
A labeled detector may also include modified nucleotides.
e. Positive Controls
The detection methods provided herein can also include a positive control target DNA or RNA. In some embodiments, the methods include using a positive control gRNA that comprises a nucleotide sequence that hybridizes to a control target DNA or RNA. In some embodiments, the positive control target DNA or RNA is provided in various amounts. In some embodiments, the positive control target DNA or RNA is provided in various known concentrations, along with control non-target DNA or RNAs.
f. gRNA Arrays
In some embodiments, the method comprises contacting the sample with a precursor gRNA array, wherein the novel Type V or Type VI endonuclease of the disclosure cleaves the precursor gRNA array to produce said gRNA.
In some embodiments, such a gRNA array includes 2 or more gRNAs (e.g., 3 or more, 4 or more, 5 or more, 6 or more, or 7 or more gRNAs). The gRNAs of a given array can target (i.e., can include guide sequences that hybridize to) different target sites of the same target DNA or RNA (e.g., which can increase sensitivity of detection) and/or can target different target DNA or RNAs (e.g., single nucleotide polymorphisms (SNPs), different strains of a particular virus, etc.); such an array could be used, for example, to detect multiple strains of a virus. In some embodiments, each gRNA of a precursor gRNA array has a different guide sequence.
In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target sites within the same target DNA or RNA. For example, such a scenario can in some embodiments increase sensitivity of detection by activating the Type II, Type V, or Type VI endonuclease of the disclosure when either gRNA hybridizes to the target DNA or RNA. As such, in some embodiments, a subject composition (e.g., kit) or method includes two or more gRNAs (in the context of a precursor gRNA array or not, e.g., the gRNAs can be mature gRNAs).
In some embodiments, the precursor gRNA array comprises two or more gRNAs that target different target DNA or RNAs. For example, such a scenario can result in a positive signal when any one of a family of potential target DNA or RNAs is present. Such an array could be used for targeting a family of transcripts, e.g., based on variation such as single nucleotide polymorphisms (SNPs) (e.g., for diagnostic purposes). Such an array could also be useful for detecting whether any one of a number of different species, strains, isolates, or variants of a bacterium or virus is present. As such, in some embodiments, a subject composition (e.g., kit) or method includes two or more gRNAs (in the context of a precursor gRNA array or not, e.g., the gRNAs can be mature gRNAs).
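The "any one of a family of targets" logic of such an array can be sketched in Python. This is a deliberately simplified model, assumed for illustration: a guide is treated as hybridizing only when the reverse complement of its guide sequence occurs exactly in a target sequence written 5′ to 3′; PAM requirements, mismatch tolerance, and RNA versus DNA chemistry are ignored, and all sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Uppercase and write U as T so RNA guides and DNA targets compare directly."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def hybridizes(guide, target):
    """Toy criterion: exact reverse-complement match of the guide within the target."""
    return reverse_complement(guide) in to_dna(target)


def matched_guides(guides, targets):
    """Guides in the array that would be activated by at least one target present."""
    return [g for g in guides if any(hybridizes(g, t) for t in targets)]


def array_signal(guides, targets):
    """Positive read-out if any guide in the array is activated by any target."""
    return bool(matched_guides(guides, targets))


if __name__ == "__main__":
    guides = ["ACGUUU", "GGGCCA"]  # hypothetical short guide sequences
    print(array_signal(guides, ["CCAAACGTCC"]))
    print(matched_guides(guides, ["CCAAACGTCC"]))
```

In this toy model, a sample containing the site for either guide yields a positive signal; `matched_guides` also reports which guide was responsible, information that a single combined fluorescent read-out would not by itself provide.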
Provided herein are compositions and pharmaceutical compositions comprising the Type II, Type V, or Type VI endonucleases and/or the Type II, Type V, or Type VI gRNAs of the disclosure, which can optionally include a pharmaceutically acceptable carrier and/or a protein stabilizing buffer, and/or a nucleic acid stabilizing buffer. In some embodiments, the Type II, Type V, or Type VI endonucleases and/or the Type II, Type V, or Type VI gRNAs are provided in a lyophilized form.
Provided herein are compositions comprising gRNAs and/or gRNA arrays of the disclosure (compatible for use with Type II, Type V, or Type VI endonucleases of the disclosure), and optionally a protein stabilizing buffer.
Provided herein are proteins comprising an amino acid sequence with 30%-99.5% homology to any one of SEQ ID NOs: 1-20. Also provided herein are compositions comprising these proteins and, optionally, a pharmaceutically acceptable carrier. Also provided herein are these proteins together with, optionally, a protein stabilizing buffer.
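As an illustration of how a percentage range such as 30%-99.5% might be checked, the following sketch computes percent identity from two pre-aligned sequences. It assumes that percent identity is used as the measure (the text above says "homology", which can be defined differently) and that identity is counted over alignment columns where neither sequence has a gap; other conventions (e.g., dividing by full alignment length or by the reference length) give different numbers. The function names are hypothetical.

```python
# Percent identity over gap-free columns of a pairwise alignment.
# Assumption: sequences are already aligned (equal length, '-' for gaps);
# the choice of denominator is one convention among several.


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def within_range(pct: float, low: float = 30.0, high: float = 99.5) -> bool:
    """True if the percentage falls inside the inclusive range [low, high]."""
    return low <= pct <= high


if __name__ == "__main__":
    pct = percent_identity("MKT-AL", "MKTQAV")
    print(pct, within_range(pct))
```

In practice the alignment itself would come from a pairwise aligner; the sketch only shows the arithmetic applied afterwards.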
Provided herein are DNA polynucleotides comprising a sequence that encodes any of the Type II, Type V, or Type VI endonucleases of the disclosure. Also provided are recombinant expression vectors comprising such DNA polynucleotides. In some embodiments, a nucleotide sequence encoding a Type II, Type V, or Type VI endonuclease of the disclosure is operably linked to a promoter. In some embodiments, the nucleic acid encoding the Type II, Type V, or Type VI endonuclease further comprises a sequence encoding a nuclear localization signal (NLS), which is useful for expression in eukaryotic systems.
Provided herein are DNA polynucleotides or RNAs comprising a sequence that encodes any of the gRNAs of the disclosure. Also provided are recombinant expression vectors comprising such DNA polynucleotides. In some embodiments, a nucleotide sequence encoding a gRNA of the disclosure is operably linked to a promoter.
Also provided herein are host cells comprising any of the recombinant vectors provided herein.
Provided herein are kits comprising one or more components of the Type II, Type V, and Type VI engineered systems described herein, useful for a variety of applications including, but not limited to, therapeutic and diagnostic applications.
In some embodiments, provided herein is a kit comprising: (a) a Type II endonuclease of the disclosure, or a nucleic acid encoding the Type II endonuclease; and (b) a Type II gRNA, wherein the gRNA and the Type II endonuclease do not naturally occur together, wherein the gRNA is capable of hybridizing to a target sequence in a target DNA, and the gRNA is capable of forming a complex with the Type II endonuclease.
In some embodiments, provided herein is a kit comprising: (a) a Type V endonuclease, or a nucleic acid encoding the Type V endonuclease; and (b) a Type V gRNA, wherein the gRNA and the Type V endonuclease do not naturally occur together, wherein the gRNA is capable of hybridizing to a target sequence in a single stranded or double stranded target DNA, and the gRNA is capable of forming a complex with the Type V endonuclease.
In exemplary embodiments, provided herein are diagnostic kits. In exemplary embodiments, the reagent components are provided in lyophilized form. In some embodiments, the reagent components are provided individually (either lyophilized or not lyophilized); in other embodiments, the reagent components are provided in a pre-mixed format (either lyophilized or not lyophilized).
By way of example only, the following are exemplary kit reagent components useful for the detection of SARS-CoV-2, an RNA virus, using one of the novel Type V or Type VI endonucleases of the disclosure.
(1) Lyophilized reaction mix containing reagents, SARS-CoV-2 primer sets, and enzymes for reverse transcription and loop-mediated isothermal amplification (RT-LAMP) of a gene of the SARS-CoV-2 genome.
(2) Lyophilized reaction mix containing reagents, control RNase P primer sets, and enzymes for RT-LAMP amplification of the human housekeeping gene RNase P.
(3) Lyophilized reaction mix containing reagents and CRISPR-Cas enzyme gRNA-RNP complexes for detection of a SARS-CoV-2 amplification product. Such mix may also include a labeled reporter, e.g. a 5′FAM-3′Quencher ssRNA or ssDNA-based oligonucleotide reporter, or a 5′FAM-3′Quencher single stranded DNA/RNA chimera-based oligonucleotide reporter.
(4) Lyophilized reaction mix containing reagents and Cas enzyme gRNA-RNP complexes for detection of the RNase P amplification product. Such mix may also include a labeled reporter, e.g. a 5′FAM-3′Quencher RNA-based oligonucleotide reporter.
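The kit pairs a SARS-CoV-2 detection reaction with an RNase P housekeeping-gene control. The text above does not specify how the two readouts are combined, so the following is only a sketch of one plausible interpretation scheme, assumed here for illustration: a viral signal above its threshold is called positive; otherwise a valid negative requires the RNase P control to be detected; if neither is detected the result is invalid. The thresholds and the `interpret` function are hypothetical.

```python
# Hypothetical result-interpretation logic for a target reaction plus an
# RNase P sample-quality control. Signals are assumed to be background-
# corrected endpoint fluorescence values; thresholds are placeholders.
from enum import Enum


class Call(Enum):
    POSITIVE = "SARS-CoV-2 detected"
    NEGATIVE = "SARS-CoV-2 not detected"
    INVALID = "invalid result; repeat test"


def interpret(viral_signal: float, rnase_p_signal: float,
              viral_threshold: float, control_threshold: float) -> Call:
    if viral_signal > viral_threshold:
        # Viral target detected; the control does not change a positive call here.
        return Call.POSITIVE
    if rnase_p_signal > control_threshold:
        # Sample contained amplifiable human RNA, so a negative call is trusted.
        return Call.NEGATIVE
    # Neither reaction produced signal: sampling or amplification may have failed.
    return Call.INVALID


if __name__ == "__main__":
    print(interpret(5000.0, 100.0, 1000.0, 1000.0).value)
```

Whether a positive viral signal should also require a detected control is a design choice; the version above favors not missing true positives.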
The following examples are included for illustrative purposes and are not intended to limit the scope of the invention.
Metagenome sequences were obtained from environmental samples and compiled to construct a database of putative CRISPR-Cas loci. CRISPR arrays were identified using CrisprCasFinder software. The filtering criteria were putative Class 2 Type II, Type V, and Type VI effectors longer than 400 aa that were adjacent to cas genes and CRISPR arrays. Sequences were aligned with Clustal Omega using HMM profiles. Genes were identified from the metagenomic samples: scripts designed to find CRISPR sequences and accompanying genes encoding proteins with homology to reported Cas enzymes were run on the sequences. Comparative BlastP analyses were performed against sequences deposited in databases (NCBI, LENS), and candidates showing more than 50% identity to deposited proteins were discarded. The presence of specific domains (e.g., RuvC, HEPN) and catalytic motifs was determined (CD-search, phmmer, UNIPROT). The novel endonucleases described herein were thereby identified.
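The candidate-selection criteria above can be summarized as a filter over annotated candidates. The sketch below assumes the upstream tools (CRISPR array detection, BlastP, domain searches) have already produced a per-candidate summary; the `Candidate` fields and function names are hypothetical. The length (> 400 aa), adjacency, and identity (discard > 50%) thresholds follow the text, while treating the presence of a RuvC or HEPN domain as a hard filter is an assumption, since the text only states that domains were determined.

```python
# Sketch of the candidate filtering step, applied to precomputed annotations.
# Field names are hypothetical; thresholds follow the described criteria.
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    length_aa: int
    near_cas_genes: bool            # adjacent to cas genes
    near_crispr_array: bool         # adjacent to a CRISPR array
    max_identity_to_known: float    # best BlastP % identity vs. deposited proteins
    domains: set = field(default_factory=set)


def passes_filters(c: Candidate, min_length_aa: int = 400,
                   max_identity_pct: float = 50.0,
                   nuclease_domains: frozenset = frozenset({"RuvC", "HEPN"})) -> bool:
    return (
        c.length_aa > min_length_aa
        and c.near_cas_genes
        and c.near_crispr_array
        and c.max_identity_to_known <= max_identity_pct  # > 50% identity is discarded
        and bool(c.domains & nuclease_domains)           # assumed domain requirement
    )


def select_novel(candidates: list) -> list:
    """Names of candidates that satisfy every filter, in input order."""
    return [c.name for c in candidates if passes_filters(c)]


if __name__ == "__main__":
    demo = [
        Candidate("cand_1", 1283, True, True, 35.0, {"RuvC"}),
        Candidate("cand_2", 350, True, True, 30.0, {"RuvC"}),
    ]
    print(select_novel(demo))
```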
Expression vectors were artificially synthesized. Codon optimization, synthesis, and cloning of the effector plasmids were carried out, and the resulting expression plasmids were transformed into E. coli.
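The text does not describe the codon-optimization method used. As a generic illustration only, the following sketch performs the simplest form of codon optimization: back-translating a protein by assigning each amino acid a single preferred E. coli codon. The codon table is an assumption (approximately the most frequent codon per residue in E. coli K-12) and should be checked against a real codon-usage table; practical tools also balance GC content, avoid repeats, restriction sites, and strong mRNA secondary structure, none of which is done here.

```python
# Naive "one amino acid, one codon" back-translation for E. coli expression.
# PREFERRED_CODON values are assumed, illustrative choices; "*" is the stop codon.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGC", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str, add_stop: bool = True) -> str:
    protein = protein.upper()
    unknown = set(protein) - set(PREFERRED_CODON)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    dna = "".join(PREFERRED_CODON[aa] for aa in protein)
    if add_stop and not protein.endswith("*"):
        dna += PREFERRED_CODON["*"]
    return dna


def translate(dna: str) -> str:
    """Translate codons from the table above only (a round-trip sanity check)."""
    codon_to_aa = {codon: aa for aa, codon in PREFERRED_CODON.items()}
    usable = len(dna) - len(dna) % 3
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_percent(dna: str) -> float:
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    orf = back_translate("MKTAYIAK")
    print(orf, translate(orf), round(gc_percent(orf), 1))
```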
SEQ ID NO: 1 represents a novel Type V variant of the disclosure, Type V Cas_1 (1283 amino acids in length).
The Type V Cas_1 protein was purified via the following scheme. Recombinant protein was expressed in E. coli NiCo21 (DE3) cells (NEB # C2529H) harboring the pET28a/Type V Cas_1-H6× expression plasmid by growth in LB broth at 37° C., followed by induction of expression at 28° C. for 3 hr in the presence of 0.25 mM IPTG. Cells were disrupted by sonication prior to chromatographic purification. Recombinant protein was purified using a HisTrap HP (Ni-NTA) column (GE Healthcare), followed by a HiPrep™ 26/10 Desalting column (GE Healthcare), in which the protein was desalted into storage buffer containing 50 mM Tris-HCl (pH 8), 200 mM NaCl, 20 mM MgCl2, and 1 mM DTT. Protein purity was assessed by Coomassie blue staining after SDS-PAGE on a 10% polyacrylamide gel. Protein concentrations were determined by UV spectroscopy and Qubit protein assay (Invitrogen). Purified proteins were stored at −80° C.
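Determining protein concentration by UV spectroscopy typically relies on the Beer-Lambert law, A280 = ε·c·l. The sketch below estimates ε at 280 nm from the amino acid sequence using the widely used Pace et al. coefficients (5500 M⁻¹cm⁻¹ per Trp, 1490 per Tyr, 125 per cystine, as in ExPASy ProtParam) and then converts an absorbance reading to molar and mass concentration. It assumes either that all Cys residues form cystines or none do, and that the molecular weight (including the His6 tag) is supplied separately; the function names are hypothetical and this is not stated to be the method used above.

```python
# Protein concentration from A280 via Beer-Lambert (A = epsilon * c * l).
# Extinction coefficient estimated from sequence (Pace et al. coefficients).


def extinction_coefficient_280(sequence: str, all_cys_in_cystines: bool = True) -> float:
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1."""
    seq = sequence.upper()
    eps = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if all_cys_in_cystines:
        # Each disulfide (pair of Cys) contributes 125 M^-1 cm^-1.
        eps += 125 * (seq.count("C") // 2)
    return float(eps)


def molar_concentration(a280: float, epsilon: float, path_length_cm: float = 1.0) -> float:
    """Concentration in mol/L from a blank-corrected absorbance reading."""
    return a280 / (epsilon * path_length_cm)


def mg_per_ml(molar: float, molecular_weight_da: float) -> float:
    # mol/L multiplied by g/mol gives g/L, which is numerically equal to mg/mL.
    return molar * molecular_weight_da


if __name__ == "__main__":
    eps = extinction_coefficient_280("MWYCCK")
    c = molar_concentration(0.5, eps)
    print(eps, c)
```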
SEQ ID NO: 2 represents a novel Type V variant of the disclosure, Type V Cas_2 (1235 amino acids in length).
The detection assay was performed at 46° C., 50° C., 52.5° C., and 60° C. using Type V Cas_2 complexes at final concentrations of 150 nM Type V Cas_2:150 nM sgRNA:50 nM activator in a solution containing 1× Binding Buffer (25 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 5 mM DTT, 100 μg/ml BSA, pH 8.8, with 0.5, 1, or 2 mM MnCl2) and 600 nM ssDNA FAMQ reporter substrate in a 40 μl reaction. Reactions were incubated in a qPCR instrument (Bio-Rad) for 100 minutes, with fluorescence measurements taken every minute (ssDNA FQ substrates: λex 485 nm; λem 538 nm). No-template control (NTC) fluorescence values were obtained from reactions carried out in the absence of the ssDNA Hanta target. Results are shown in
SEQ ID NO: 7 represents a novel Type V variant of the disclosure, Type V Cas_7 (1245 amino acids in length).
The detection assay was performed at 30° C. using Type V Cas_3 complexes at final concentrations of 150 nM Type V Cas_3:150 nM sgRNA:10 nM activator in a solution containing 1× Binding Buffer and 625 nM of each ssDNA FAMQ reporter substrate in a 40 μl reaction. Three different commercial binding buffers were tested: NEBuffer 2.1, CutSmart, and Isothermal Amplification Buffer (New England Biolabs). A pH curve (pH 6.8 to 9.6) was prepared using NEBuffer 2.1 as the base (50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA). The salt concentration curve (25-200 mM NaCl) was prepared at pH 7.9 from NEBuffer 2.1 (25-200 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA, pH 7.9). Reactions were incubated for 120 minutes in a Synergy H1 fluorescence plate reader (Bio-Tek), and background-corrected fluorescence values were calculated by subtracting the fluorescence values obtained from reactions carried out in triplicate in the absence of the ssDNA Hanta target. Results are shown in
The detection assay was performed at 30° C. using Type V Cas_4 complexes at final concentrations of 150 nM Type V Cas_4:150 nM sgRNA:10 nM activator in a solution containing 1× Binding Buffer and 625 nM of each ssDNA FAMQ reporter substrate in a 40 μl reaction. Three different commercial binding buffers were tested: NEBuffer 2.1, CutSmart, and Isothermal Amplification Buffer (New England Biolabs). A pH curve (pH 6.8 to 9.5) was prepared using NEBuffer 2.1 as the base (50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA). The salt concentration curve (25-200 mM NaCl) was prepared at pH 7.9 from NEBuffer 2.1 (25-200 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA, pH 7.9). Reactions were incubated for 150 minutes in a Synergy H1 fluorescence plate reader (Bio-Tek), and background-corrected fluorescence values were calculated by subtracting the fluorescence values obtained from reactions carried out in triplicate in the absence of the ssDNA Hanta target.
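The reaction setup and background correction described in these detection assays involve simple arithmetic that can be sketched as follows. The amount of each component follows from final concentration × volume (e.g., 150 nM in 40 μl is 6 pmol), stock volumes follow from C1V1 = C2V2, and background correction subtracts, at each time point, the mean of the triplicate no-target reactions. The stock concentrations and function names are hypothetical, and per-time-point subtraction of the replicate mean is an assumption about how the subtraction was carried out.

```python
# Reaction-setup arithmetic and background correction for a time-course assay.
from statistics import mean


def pmol_in_reaction(final_nM: float, volume_uL: float) -> float:
    # nM * uL = 1e-9 mol/L * 1e-6 L = 1e-15 mol (fmol); divide by 1000 for pmol.
    return final_nM * volume_uL / 1000.0


def stock_volume_uL(final_nM: float, volume_uL: float, stock_nM: float) -> float:
    """Volume of stock to add so the reaction reaches final_nM (C1V1 = C2V2)."""
    return final_nM * volume_uL / stock_nM


def background_corrected(sample: list, no_target_replicates: list) -> list:
    """Subtract, at each time point, the mean of the no-target replicate readings."""
    if any(len(rep) != len(sample) for rep in no_target_replicates):
        raise ValueError("all replicate time courses must match the sample length")
    per_timepoint = zip(*no_target_replicates)
    return [s - mean(vals) for s, vals in zip(sample, per_timepoint)]


if __name__ == "__main__":
    # 150 nM Cas : 150 nM sgRNA : 10 nM activator, 625 nM reporter, 40 uL.
    for label, nM in [("Cas", 150), ("sgRNA", 150), ("activator", 10), ("reporter", 625)]:
        print(label, pmol_in_reaction(nM, 40), "pmol")
```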
The thermal stability assay was performed over a temperature range of 20° C. to 90° C. using 15 μg of Type V Cas_5 protein in a solution containing 1× desalting buffer (50 mM Tris-HCl pH 8, 200 mM NaCl, 20 mM MgCl2, 1 mM DTT) and 10× SYPRO® dye in a 30 μl reaction. The mix was incubated in a qPCR instrument (Bio-Rad) while the temperature was increased from 20° C. to 90° C., with fluorescence measurements taken every 1° C. (SYPRO® dye: λex 300 nm; λem 570 nm). No-protein negative control fluorescence values were obtained from samples without protein. Results are shown in
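A dye-based thermal stability (thermal shift) experiment is commonly summarized by an apparent melting temperature (Tm). The following sketch estimates Tm as the temperature at which the fluorescence rises most steeply (maximum of dF/dT, computed by central differences at 1° C. resolution), optionally after subtracting a no-protein blank point by point. This is one simple approach assumed here for illustration; fitting a sigmoidal (e.g., Boltzmann) model is a common alternative, and the function name is hypothetical.

```python
# Apparent Tm from a melt curve as the temperature of maximum dF/dT.
import math


def melting_temperature(temps: list, fluorescence: list, blank: list = None):
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need >= 3 points with matching temperature/fluorescence")
    if blank is not None:
        if len(blank) != len(fluorescence):
            raise ValueError("blank must have one value per temperature")
        # Remove signal contributed by dye/buffer alone (no-protein control).
        fluorescence = [f - b for f, b in zip(fluorescence, blank)]
    best_temp, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        # Central-difference estimate of dF/dT at temps[i].
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


if __name__ == "__main__":
    # Synthetic sigmoidal melt curve centred at 55 degrees C (illustrative data only).
    t = list(range(20, 91))
    f = [1000 + 5000 / (1 + math.exp(-(x - 55) / 2.0)) for x in t]
    print(melting_temperature(t, f))
```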
The present application claims the benefit of U.S. Provisional Application No. 63/109,302 filed on Nov. 3, 2020, the entire contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
63109302 | Nov 2020 | US

 | Number | Date | Country
---|---|---|---
Parent | PCT/US2021/057798 | Nov 2021 | US
Child | 17541398 | | US